diff --git "a/stage1/open-stage1.log" "b/stage1/open-stage1.log"
new file mode 100644
--- /dev/null
+++ "b/stage1/open-stage1.log"
@@ -0,0 +1,51160 @@
+2025-10-24 09:17:43,466 - root - INFO - Starting training.
+2025-10-24 09:17:43,471 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json
+2025-10-24 09:17:43,619 - root - INFO - Starting training.
+2025-10-24 09:17:43,619 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json
+2025-10-24 09:17:43,728 - root - INFO - Starting training.
+2025-10-24 09:17:43,728 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json
+2025-10-24 09:17:43,897 - root - INFO - Starting training.
+2025-10-24 09:17:43,897 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json
+2025-10-24 09:17:44,018 - root - INFO - Starting training.
+2025-10-24 09:17:44,018 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json
+2025-10-24 09:17:44,059 - root - INFO - Starting training.
+2025-10-24 09:17:44,059 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json
+2025-10-24 09:17:44,145 - root - INFO - Starting training.
+2025-10-24 09:17:44,146 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json
+2025-10-24 09:17:44,185 - root - INFO - Starting training.
+2025-10-24 09:17:44,185 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:17:45,265 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:17:45,266 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:17:45,267 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:17:45,276 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:17:45,277 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:17:45,278 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:17:45,400 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:17:45,401 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:17:45,402 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:17:45,451 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:17:45,453 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:17:45,453 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:17:45,656 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:17:45,657 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:17:45,658 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:17:45,723 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, 
init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:17:45,776 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:17:45,776 - root - INFO - GPU capacity: NVIDIA B200 (0) with 178.36GiB memory +2025-10-24 09:17:45,780 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:17:45,784 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:17:45,785 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:17:45,786 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:17:45,809 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:17:45,810 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:17:45,810 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:17:45,811 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:17:45,811 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:17:45,812 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:17:45,815 - root - INFO - Applied FSDP to the model +2025-10-24 09:17:45,816 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, 
bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + 
(feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, 
bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + 
(w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:17:45,835 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:17:45,864 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:17:45,886 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:17:45,886 - root - INFO - GPU capacity: NVIDIA B200 (3) with 178.36GiB memory +2025-10-24 09:17:45,889 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:17:45,914 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:17:45,915 - root - INFO - GPU capacity: NVIDIA B200 (4) with 178.36GiB memory +2025-10-24 09:17:45,918 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:17:45,922 - root - INFO - Applied FSDP to the model +2025-10-24 09:17:45,923 - root - INFO - Model after parallelization model=FSDPTransformer( + 
(tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + 
(wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:17:45,937 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:17:45,951 - root - INFO - Applied FSDP to the model +2025-10-24 09:17:45,951 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + 
(attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + 
(wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, 
bias=False) +) + +2025-10-24 09:17:45,987 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:17:45,987 - root - INFO - GPU capacity: NVIDIA B200 (7) with 178.36GiB memory +2025-10-24 09:17:45,990 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:17:46,023 - root - INFO - Applied FSDP to the model +2025-10-24 09:17:46,023 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:17:46,074 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:17:46,125 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:17:46,125 - root - INFO - GPU capacity: NVIDIA B200 (1) with 178.36GiB memory +2025-10-24 09:17:46,128 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:17:46,166 - root - INFO - Applied 
FSDP to the model +2025-10-24 09:17:46,167 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + 
(attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + 
(wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:17:46,213 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:17:46,225 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:17:46,227 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, 
mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:17:46,266 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:17:46,267 - root - INFO - GPU capacity: NVIDIA B200 (6) with 178.36GiB memory +2025-10-24 09:17:46,270 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:17:46,275 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:17:46,275 - root - INFO - GPU capacity: NVIDIA B200 (5) with 178.36GiB memory +2025-10-24 09:17:46,277 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:17:46,277 - root - INFO - GPU capacity: NVIDIA B200 (2) with 178.36GiB memory +2025-10-24 09:17:46,278 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:17:46,281 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:17:46,304 - root - INFO - Applied FSDP to the model +2025-10-24 09:17:46,305 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, 
out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, 
bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + 
(wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + 
(feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, 
out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, 
out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( 
+ (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:17:46,311 - root - INFO - Applied FSDP to the model +2025-10-24 09:17:46,312 - root - INFO - Model after 
parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + 
(ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): 
TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, 
out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, 
out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, 
out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, 
out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): 
Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): 
Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:17:46,314 - root - INFO - Applied FSDP to the model +2025-10-24 09:17:46,314 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, 
bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, 
out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, 
out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( 
+ (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + 
(wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:18:10,285 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:18:10,285 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:18:10,285 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:18:10,285 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 
09:18:10,285 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:18:10,286 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:18:10,298 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:18:10,298 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
+ warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +2025-10-24 09:18:10,731 - root - INFO - Loaded cached document counts in 9.179115295410156e-05 seconds +2025-10-24 09:18:10,731 - root - INFO - Loaded cached document counts in 8.511543273925781e-05 seconds +2025-10-24 09:18:10,731 - root - INFO - Loaded cached document counts in 9.5367431640625e-05 seconds +2025-10-24 09:18:10,731 - root - INFO - Loaded cached document counts in 8.368492126464844e-05 seconds +2025-10-24 09:18:10,731 - root - INFO - Loaded cached document counts in 0.00010013580322265625 seconds +2025-10-24 09:18:10,731 - root - INFO - Loaded cached document counts in 9.036064147949219e-05 seconds +2025-10-24 09:18:10,732 - root - INFO - Loaded cached document counts in 9.679794311523438e-05 seconds +2025-10-24 09:18:10,732 - root - INFO - Loaded cached document counts in 0.00011205673217773438 seconds +2025-10-24 09:18:10,732 - root - INFO - Worker 0 responsible for docs: [('/work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet', 0, 945398)] +2025-10-24 09:18:10,732 - root - INFO - Total docs: 945399 +2025-10-24 09:18:10,732 - root - INFO - Worker 0 assembled subdataset iterator for /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/, 1 of 1 +No valid checkpoint detected at jobs/munin-7b-open-stage1/checkpoints/dataloader, dataset starting from scratch. 
+2025-10-24 09:18:10,733 - root - INFO - Nodecay weight: tok_embeddings.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.0._orig_mod.attention.wq.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.0._orig_mod.attention.wk.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.0._orig_mod.attention.wv.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.0._orig_mod.attention.wo.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,733 - root - INFO - Nodecay weight: layers.0._orig_mod.attention_norm.weight +2025-10-24 09:18:10,733 - root - INFO - Nodecay weight: layers.0._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.1._orig_mod.attention.wq.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.1._orig_mod.attention.wk.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.1._orig_mod.attention.wv.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.1._orig_mod.attention.wo.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,733 - root - INFO - Nodecay weight: layers.1._orig_mod.attention_norm.weight +2025-10-24 09:18:10,733 - root - INFO - Nodecay weight: layers.1._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.2._orig_mod.attention.wq.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.2._orig_mod.attention.wk.weight +2025-10-24 
09:18:10,733 - root - INFO - Decay weight: layers.2._orig_mod.attention.wv.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.2._orig_mod.attention.wo.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,733 - root - INFO - Nodecay weight: layers.2._orig_mod.attention_norm.weight +2025-10-24 09:18:10,733 - root - INFO - Nodecay weight: layers.2._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.3._orig_mod.attention.wq.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.3._orig_mod.attention.wk.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.3._orig_mod.attention.wv.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.3._orig_mod.attention.wo.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,733 - root - INFO - Nodecay weight: layers.3._orig_mod.attention_norm.weight +2025-10-24 09:18:10,733 - root - INFO - Nodecay weight: layers.3._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,733 - root - INFO - Decay weight: layers.4._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.4._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.4._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.4._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w1.weight +2025-10-24 
09:18:10,734 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.4._orig_mod.attention_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.4._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.5._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.5._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.5._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.5._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.5._orig_mod.attention_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.5._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.6._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.6._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.6._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.6._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.6._orig_mod.attention_norm.weight 
+2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.6._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.7._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.7._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.7._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.7._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.7._orig_mod.attention_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.7._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.8._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.8._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.8._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.8._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.8._orig_mod.attention_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.8._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.9._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.9._orig_mod.attention.wk.weight 
+2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.9._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.9._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.9._orig_mod.attention_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.9._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.10._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.10._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.10._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.10._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.10._orig_mod.attention_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.10._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.11._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.11._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.11._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.11._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: 
layers.11._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.11._orig_mod.attention_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.11._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.12._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.12._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.12._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.12._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.12._orig_mod.attention_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.12._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.13._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.13._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.13._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.13._orig_mod.attention.wo.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,734 - 
root - INFO - Nodecay weight: layers.13._orig_mod.attention_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Nodecay weight: layers.13._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.14._orig_mod.attention.wq.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.14._orig_mod.attention.wk.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.14._orig_mod.attention.wv.weight +2025-10-24 09:18:10,734 - root - INFO - Decay weight: layers.14._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.14._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.14._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.15._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.15._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.15._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.15._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.15._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.15._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.16._orig_mod.attention.wq.weight 
+2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.16._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.16._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.16._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.16._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.16._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.17._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.17._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.17._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.17._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.17._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.17._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.18._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.18._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.18._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: 
layers.18._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.18._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.18._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.19._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.19._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.19._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.19._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.19._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.19._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.20._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.20._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.20._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.20._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root 
- INFO - Decay weight: layers.20._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.20._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.20._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.21._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.21._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.21._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.21._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.21._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.21._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.22._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.22._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.22._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.22._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.22._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.22._orig_mod.ffn_norm.weight 
+2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.23._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.23._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.23._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.23._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.23._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.23._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.24._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.24._orig_mod.attention.wk.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.24._orig_mod.attention.wv.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.24._orig_mod.attention.wo.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.24._orig_mod.attention_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Nodecay weight: layers.24._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.25._orig_mod.attention.wq.weight +2025-10-24 09:18:10,735 - root - INFO - Decay weight: layers.25._orig_mod.attention.wk.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: 
layers.25._orig_mod.attention.wv.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.25._orig_mod.attention.wo.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.25._orig_mod.attention_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.25._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.26._orig_mod.attention.wq.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.26._orig_mod.attention.wk.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.26._orig_mod.attention.wv.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.26._orig_mod.attention.wo.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.26._orig_mod.attention_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.26._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.27._orig_mod.attention.wq.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.27._orig_mod.attention.wk.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.27._orig_mod.attention.wv.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.27._orig_mod.attention.wo.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,736 - root - 
INFO - Decay weight: layers.27._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.27._orig_mod.attention_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.27._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.28._orig_mod.attention.wq.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.28._orig_mod.attention.wk.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.28._orig_mod.attention.wv.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.28._orig_mod.attention.wo.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.28._orig_mod.attention_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.28._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.29._orig_mod.attention.wq.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.29._orig_mod.attention.wk.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.29._orig_mod.attention.wv.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.29._orig_mod.attention.wo.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.29._orig_mod.attention_norm.weight 
+2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.29._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.30._orig_mod.attention.wq.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.30._orig_mod.attention.wk.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.30._orig_mod.attention.wv.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.30._orig_mod.attention.wo.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.30._orig_mod.attention_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.30._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.31._orig_mod.attention.wq.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.31._orig_mod.attention.wk.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.31._orig_mod.attention.wv.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.31._orig_mod.attention.wo.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w1.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w2.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w3.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.31._orig_mod.attention_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: layers.31._orig_mod.ffn_norm.weight +2025-10-24 09:18:10,736 - root - INFO - Nodecay weight: norm.weight +2025-10-24 09:18:10,736 - root - INFO - Decay weight: output.weight +2025-10-24 09:18:11,299 - root - 
INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:18:11,305 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:18:11,306 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:18:11,349 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:18:11,356 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:18:11,357 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:18:11,360 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:18:11,364 - root - INFO - Checkpointing active. 
Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:18:11,397 - root - INFO - Forcing load from /work/training/maester/comma-v0.1-2t-dcp/ +2025-10-24 09:18:11,397 - root - INFO - Forcing load from /work/training/maester/comma-v0.1-2t-dcp/ +2025-10-24 09:18:11,423 - root - INFO - Forcing load from /work/training/maester/comma-v0.1-2t-dcp/ +2025-10-24 09:18:11,423 - root - INFO - Forcing load from /work/training/maester/comma-v0.1-2t-dcp/ +2025-10-24 09:18:11,424 - root - INFO - Forcing load from /work/training/maester/comma-v0.1-2t-dcp/ +2025-10-24 09:18:11,424 - root - INFO - Forcing load from /work/training/maester/comma-v0.1-2t-dcp/ +2025-10-24 09:18:11,424 - root - INFO - Forcing load from /work/training/maester/comma-v0.1-2t-dcp/ +2025-10-24 09:18:11,424 - root - INFO - Forcing load from /work/training/maester/comma-v0.1-2t-dcp/ +2025-10-24 09:18:19,420 - root - INFO - Loaded model-only checkpoint from forced path in 8.00 seconds +2025-10-24 09:18:19,420 - root - INFO - Loaded model-only checkpoint from forced path in 8.02 seconds +2025-10-24 09:18:19,420 - root - INFO - Loaded model-only checkpoint from forced path in 8.00 seconds +2025-10-24 09:18:19,420 - root - INFO - Loaded model-only checkpoint from forced path in 8.00 seconds +2025-10-24 09:18:19,420 - root - INFO - Loaded model-only checkpoint from forced path in 8.02 seconds +2025-10-24 09:18:19,420 - root - INFO - Loaded model-only checkpoint from forced path in 8.00 seconds +2025-10-24 09:18:19,421 - root - INFO - Loaded model-only checkpoint from forced path in 8.00 seconds +2025-10-24 09:18:19,423 - root - INFO - Loaded model-only checkpoint from forced path in 8.00 seconds +2025-10-24 09:18:19,443 - root - INFO - Training starts at step 0 +2025-10-24 09:18:19,444 - root - INFO - Profiling active. 
Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:18:19,444 - root - INFO - Training starts at step 0 +2025-10-24 09:18:19,444 - root - INFO - Training starts at step 0 +2025-10-24 09:18:19,444 - root - INFO - Training starts at step 0 +2025-10-24 09:18:19,444 - root - INFO - Training starts at step 0 +2025-10-24 09:18:19,444 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:18:19,444 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:18:19,444 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:18:19,444 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:18:19,444 - root - INFO - Training starts at step 0 +2025-10-24 09:18:19,445 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:18:19,453 - root - INFO - Training starts at step 0 +2025-10-24 09:18:19,453 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:18:19,454 - root - INFO - Training starts at step 0 +2025-10-24 09:18:19,454 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:18:19,658 - root - INFO - ParquetDataset: entering epoch 0 +2025-10-24 09:18:19,658 - root - INFO - Worker 0 opening new file /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. 
+ warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. 
+ warnings.warn( +2025-10-24 09:18:54,256 - root - INFO - Step 1: lr=2.00E-08, loss= 2.1574 (max= 2.6790), tps=941, mfu=1.96%, memory: 146.36GiB(82.06%) time/data_loading=4.78s (max=5.34s, 15.33%) +2025-10-24 09:18:54,256 - root - INFO - Step 1: lr=2.00E-08, loss= 2.1574 (max= 2.6790), tps=941, mfu=1.96%, memory: 146.36GiB(82.06%) time/data_loading=4.78s (max=5.34s, 15.33%) +2025-10-24 09:18:54,256 - root - INFO - Step 1: lr=2.00E-08, loss= 2.1574 (max= 2.6790), tps=941, mfu=1.96%, memory: 146.36GiB(82.06%) time/data_loading=4.78s (max=5.34s, 15.33%) +2025-10-24 09:18:54,256 - root - INFO - Step 1: lr=2.00E-08, loss= 2.1574 (max= 2.6790), tps=941, mfu=1.96%, memory: 146.36GiB(82.06%) time/data_loading=4.78s (max=5.34s, 15.33%) +2025-10-24 09:18:54,256 - root - INFO - Step 1: lr=2.00E-08, loss= 2.1574 (max= 2.6790), tps=942, mfu=1.96%, memory: 146.36GiB(82.06%) time/data_loading=4.78s (max=5.34s, 15.33%) +2025-10-24 09:18:54,256 - root - INFO - Step 1: lr=2.00E-08, loss= 2.1574 (max= 2.6790), tps=941, mfu=1.96%, memory: 146.36GiB(82.06%) time/data_loading=4.78s (max=5.34s, 15.33%) +2025-10-24 09:18:54,257 - root - INFO - Step 1: lr=2.00E-08, loss= 2.1574 (max= 2.6790), tps=941, mfu=1.96%, memory: 146.36GiB(82.06%) time/data_loading=4.78s (max=5.34s, 15.33%) +2025-10-24 09:18:54,257 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-24 09:18:54,257 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-24 09:18:54,257 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-24 09:18:54,257 - root - INFO - Step 1: lr=2.00E-08, loss= 2.1574 (max= 2.6790), tps=942, mfu=1.96%, memory: 146.36GiB(82.06%) time/data_loading=4.78s (max=5.34s, 15.33%) +2025-10-24 09:18:54,257 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-24 09:18:54,257 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 
0:01:40 +2025-10-24 09:18:54,257 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-24 09:18:54,257 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-24 09:18:54,257 - root - INFO - Synchronizing and adjusting timeout for all ProcessGroups to 0:01:40 +2025-10-24 09:19:08,540 - root - INFO - Step 10: lr=1.10E-07, loss= 2.2783 (max= 3.3495), tps=20651, mfu=43.03%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:08,540 - root - INFO - Step 10: lr=1.10E-07, loss= 2.2783 (max= 3.3495), tps=20651, mfu=43.03%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:08,541 - root - INFO - Step 10: lr=1.10E-07, loss= 2.2783 (max= 3.3495), tps=20651, mfu=43.03%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:08,541 - root - INFO - Step 10: lr=1.10E-07, loss= 2.2783 (max= 3.3495), tps=20651, mfu=43.03%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:08,541 - root - INFO - Step 10: lr=1.10E-07, loss= 2.2783 (max= 3.3495), tps=20651, mfu=43.03%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:08,541 - root - INFO - Step 10: lr=1.10E-07, loss= 2.2783 (max= 3.3495), tps=20651, mfu=43.03%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:08,541 - root - INFO - Step 10: lr=1.10E-07, loss= 2.2783 (max= 3.3495), tps=20651, mfu=43.03%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:08,541 - root - INFO - Step 10: lr=1.10E-07, loss= 2.2783 (max= 3.3495), tps=20651, mfu=43.03%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:08,619 - root - INFO - Dumping traces at step 10 +2025-10-24 09:19:08,624 - root - INFO - Dumping traces at step 10 +2025-10-24 09:19:08,625 - root - INFO - Dumping traces at step 
10 +2025-10-24 09:19:08,627 - root - INFO - Dumping traces at step 10 +2025-10-24 09:19:08,629 - root - INFO - Dumping traces at step 10 +2025-10-24 09:19:08,636 - root - INFO - Dumping traces at step 10 +2025-10-24 09:19:08,639 - root - INFO - Dumping traces at step 10 +2025-10-24 09:19:08,639 - root - INFO - Dumping traces at step 10 +2025-10-24 09:19:08,673 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:08,674 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:08,677 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:08,679 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:08,680 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:08,690 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:08,690 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:08,690 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:24,616 - root - INFO - Step 20: lr=2.10E-07, loss= 2.2349 (max= 3.5130), tps=20389, mfu=42.48%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:24,616 - root - INFO - Step 20: lr=2.10E-07, loss= 2.2349 (max= 3.5130), tps=20389, mfu=42.48%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:24,616 - root - INFO - Step 20: lr=2.10E-07, loss= 2.2349 (max= 3.5130), tps=20389, mfu=42.48%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:24,616 - root - INFO - Step 20: lr=2.10E-07, loss= 2.2349 (max= 3.5130), tps=20389, mfu=42.48%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:24,616 - root - INFO - Step 20: lr=2.10E-07, loss= 2.2349 (max= 3.5130), tps=20389, mfu=42.48%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:24,616 - root - INFO - Step 20: lr=2.10E-07, loss= 2.2349 (max= 3.5130), 
tps=20389, mfu=42.48%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:24,616 - root - INFO - Step 20: lr=2.10E-07, loss= 2.2349 (max= 3.5130), tps=20389, mfu=42.48%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:24,616 - root - INFO - Step 20: lr=2.10E-07, loss= 2.2349 (max= 3.5130), tps=20389, mfu=42.48%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:24,668 - root - INFO - Dumping traces at step 20 +2025-10-24 09:19:24,668 - root - INFO - Dumping traces at step 20 +2025-10-24 09:19:24,668 - root - INFO - Dumping traces at step 20 +2025-10-24 09:19:24,668 - root - INFO - Dumping traces at step 20 +2025-10-24 09:19:24,672 - root - INFO - Dumping traces at step 20 +2025-10-24 09:19:24,672 - root - INFO - Dumping traces at step 20 +2025-10-24 09:19:24,674 - root - INFO - Dumping traces at step 20 +2025-10-24 09:19:24,676 - root - INFO - Dumping traces at step 20 +2025-10-24 09:19:24,722 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:24,726 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:24,727 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:24,727 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:24,727 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:24,730 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:24,731 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:24,733 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:40,695 - root - INFO - Step 30: lr=3.10E-07, loss= 2.2319 (max= 3.5391), tps=20383, mfu=42.47%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:40,695 - root - INFO - Step 30: lr=3.10E-07, loss= 2.2319 (max= 3.5391), tps=20383, mfu=42.47%, memory: 154.21GiB(86.46%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 09:19:40,695 - root - INFO - Step 30: lr=3.10E-07, loss= 2.2319 (max= 3.5391), tps=20383, mfu=42.47%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:40,695 - root - INFO - Step 30: lr=3.10E-07, loss= 2.2319 (max= 3.5391), tps=20383, mfu=42.47%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:40,696 - root - INFO - Step 30: lr=3.10E-07, loss= 2.2319 (max= 3.5391), tps=20383, mfu=42.47%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:40,696 - root - INFO - Step 30: lr=3.10E-07, loss= 2.2319 (max= 3.5391), tps=20383, mfu=42.47%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:40,696 - root - INFO - Step 30: lr=3.10E-07, loss= 2.2319 (max= 3.5391), tps=20383, mfu=42.47%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:40,696 - root - INFO - Step 30: lr=3.10E-07, loss= 2.2319 (max= 3.5391), tps=20384, mfu=42.47%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:40,750 - root - INFO - Dumping traces at step 30 +2025-10-24 09:19:40,750 - root - INFO - Dumping traces at step 30 +2025-10-24 09:19:40,750 - root - INFO - Dumping traces at step 30 +2025-10-24 09:19:40,750 - root - INFO - Dumping traces at step 30 +2025-10-24 09:19:40,753 - root - INFO - Dumping traces at step 30 +2025-10-24 09:19:40,753 - root - INFO - Dumping traces at step 30 +2025-10-24 09:19:40,753 - root - INFO - Dumping traces at step 30 +2025-10-24 09:19:40,756 - root - INFO - Dumping traces at step 30 +2025-10-24 09:19:40,804 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 09:19:40,806 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:40,806 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:40,809 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 
09:19:40,811 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:40,815 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:40,816 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:40,820 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 09:19:56,767 - root - INFO - Step 40: lr=4.10E-07, loss= 2.2058 (max= 3.3490), tps=20393, mfu=42.49%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:56,767 - root - INFO - Step 40: lr=4.10E-07, loss= 2.2058 (max= 3.3490), tps=20393, mfu=42.49%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:56,767 - root - INFO - Step 40: lr=4.10E-07, loss= 2.2058 (max= 3.3490), tps=20393, mfu=42.49%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:56,767 - root - INFO - Step 40: lr=4.10E-07, loss= 2.2058 (max= 3.3490), tps=20393, mfu=42.49%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:56,768 - root - INFO - Step 40: lr=4.10E-07, loss= 2.2058 (max= 3.3490), tps=20393, mfu=42.49%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:56,768 - root - INFO - Step 40: lr=4.10E-07, loss= 2.2058 (max= 3.3490), tps=20393, mfu=42.49%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:56,768 - root - INFO - Step 40: lr=4.10E-07, loss= 2.2058 (max= 3.3490), tps=20394, mfu=42.49%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:19:56,768 - root - INFO - Step 40: lr=4.10E-07, loss= 2.2058 (max= 3.3490), tps=20393, mfu=42.49%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:20:12,732 - root - INFO - Step 50: lr=5.10E-07, loss= 2.2516 (max= 3.1370), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:12,732 - root - INFO - 
Step 50: lr=5.10E-07, loss= 2.2516 (max= 3.1370), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:12,733 - root - INFO - Step 50: lr=5.10E-07, loss= 2.2516 (max= 3.1370), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:12,733 - root - INFO - Step 50: lr=5.10E-07, loss= 2.2516 (max= 3.1370), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:12,733 - root - INFO - Step 50: lr=5.10E-07, loss= 2.2516 (max= 3.1370), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:12,733 - root - INFO - Step 50: lr=5.10E-07, loss= 2.2516 (max= 3.1370), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:12,733 - root - INFO - Step 50: lr=5.10E-07, loss= 2.2516 (max= 3.1370), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:12,733 - root - INFO - Step 50: lr=5.10E-07, loss= 2.2516 (max= 3.1370), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:28,764 - root - INFO - Step 60: lr=6.10E-07, loss= 2.1957 (max= 2.7201), tps=20444, mfu=42.60%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:28,764 - root - INFO - Step 60: lr=6.10E-07, loss= 2.1957 (max= 2.7201), tps=20444, mfu=42.60%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:28,764 - root - INFO - Step 60: lr=6.10E-07, loss= 2.1957 (max= 2.7201), tps=20444, mfu=42.60%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:28,764 - root - INFO - Step 60: lr=6.10E-07, loss= 2.1957 (max= 2.7201), tps=20444, mfu=42.60%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:28,764 - 
root - INFO - Step 60: lr=6.10E-07, loss= 2.1957 (max= 2.7201), tps=20444, mfu=42.60%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:28,764 - root - INFO - Step 60: lr=6.10E-07, loss= 2.1957 (max= 2.7201), tps=20444, mfu=42.60%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:28,764 - root - INFO - Step 60: lr=6.10E-07, loss= 2.1957 (max= 2.7201), tps=20444, mfu=42.60%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:28,764 - root - INFO - Step 60: lr=6.10E-07, loss= 2.1957 (max= 2.7201), tps=20445, mfu=42.60%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:44,703 - root - INFO - Step 70: lr=7.10E-07, loss= 2.1892 (max= 2.8706), tps=20562, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:44,703 - root - INFO - Step 70: lr=7.10E-07, loss= 2.1892 (max= 2.8706), tps=20562, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:44,703 - root - INFO - Step 70: lr=7.10E-07, loss= 2.1892 (max= 2.8706), tps=20562, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:44,703 - root - INFO - Step 70: lr=7.10E-07, loss= 2.1892 (max= 2.8706), tps=20562, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:44,703 - root - INFO - Step 70: lr=7.10E-07, loss= 2.1892 (max= 2.8706), tps=20562, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:44,704 - root - INFO - Step 70: lr=7.10E-07, loss= 2.1892 (max= 2.8706), tps=20562, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:20:44,704 - root - INFO - Step 70: lr=7.10E-07, loss= 2.1892 (max= 2.8706), tps=20562, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
09:20:44,704 - root - INFO - Step 70: lr=7.10E-07, loss= 2.1892 (max= 2.8706), tps=20563, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:00,698 - root - INFO - Step 80: lr=8.10E-07, loss= 2.1393 (max= 2.7737), tps=20491, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:00,698 - root - INFO - Step 80: lr=8.10E-07, loss= 2.1393 (max= 2.7737), tps=20491, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:00,698 - root - INFO - Step 80: lr=8.10E-07, loss= 2.1393 (max= 2.7737), tps=20491, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:00,698 - root - INFO - Step 80: lr=8.10E-07, loss= 2.1393 (max= 2.7737), tps=20491, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:00,698 - root - INFO - Step 80: lr=8.10E-07, loss= 2.1393 (max= 2.7737), tps=20491, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:00,699 - root - INFO - Step 80: lr=8.10E-07, loss= 2.1393 (max= 2.7737), tps=20491, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:00,699 - root - INFO - Step 80: lr=8.10E-07, loss= 2.1393 (max= 2.7737), tps=20491, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:00,699 - root - INFO - Step 80: lr=8.10E-07, loss= 2.1393 (max= 2.7737), tps=20491, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:16,643 - root - INFO - Step 90: lr=9.10E-07, loss= 2.1636 (max= 2.6520), tps=20556, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:16,643 - root - INFO - Step 90: lr=9.10E-07, loss= 2.1636 (max= 2.6520), tps=20556, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 
0.00%) +2025-10-24 09:21:16,643 - root - INFO - Step 90: lr=9.10E-07, loss= 2.1636 (max= 2.6520), tps=20556, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:16,643 - root - INFO - Step 90: lr=9.10E-07, loss= 2.1636 (max= 2.6520), tps=20556, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:16,643 - root - INFO - Step 90: lr=9.10E-07, loss= 2.1636 (max= 2.6520), tps=20556, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:16,643 - root - INFO - Step 90: lr=9.10E-07, loss= 2.1636 (max= 2.6520), tps=20556, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:16,643 - root - INFO - Step 90: lr=9.10E-07, loss= 2.1636 (max= 2.6520), tps=20556, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:16,643 - root - INFO - Step 90: lr=9.10E-07, loss= 2.1636 (max= 2.6520), tps=20556, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:32,610 - root - INFO - Step 100: lr=1.01E-06, loss= 2.1499 (max= 2.7327), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:32,610 - root - INFO - Step 100: lr=1.01E-06, loss= 2.1499 (max= 2.7327), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:32,610 - root - INFO - Step 100: lr=1.01E-06, loss= 2.1499 (max= 2.7327), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:32,610 - root - INFO - Step 100: lr=1.01E-06, loss= 2.1499 (max= 2.7327), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:32,610 - root - INFO - Step 100: lr=1.01E-06, loss= 2.1499 (max= 2.7327), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:32,610 - root - INFO - Step 100: lr=1.01E-06, loss= 2.1499 (max= 2.7327), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:32,610 - root - INFO - Step 100: lr=1.01E-06, loss= 2.1499 (max= 2.7327), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:32,610 - root - INFO - Step 100: lr=1.01E-06, loss= 2.1499 (max= 2.7327), tps=20528, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:21:48,573 - root - INFO - Step 110: lr=1.11E-06, loss= 2.0969 (max= 2.6265), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:21:48,573 - root - INFO - Step 110: lr=1.11E-06, loss= 2.0969 (max= 2.6265), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:21:48,573 - root - INFO - Step 110: lr=1.11E-06, loss= 2.0969 (max= 2.6265), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:21:48,573 - root - INFO - Step 110: lr=1.11E-06, loss= 2.0969 (max= 2.6265), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:21:48,573 - root - INFO - Step 110: lr=1.11E-06, loss= 2.0969 (max= 2.6265), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:21:48,573 - root - INFO - Step 110: lr=1.11E-06, loss= 2.0969 (max= 2.6265), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:21:48,573 - root - INFO - Step 110: lr=1.11E-06, loss= 2.0969 (max= 2.6265), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:21:48,573 - root - INFO - Step 110: lr=1.11E-06, loss= 2.0969 (max= 2.6265), tps=20532, mfu=42.78%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:04,538 - root - INFO - Step 120: lr=1.21E-06, loss= 2.1002 (max= 2.8394), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:04,538 - root - INFO - Step 120: lr=1.21E-06, loss= 2.1002 (max= 2.8394), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:04,538 - root - INFO - Step 120: lr=1.21E-06, loss= 2.1002 (max= 2.8394), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:04,538 - root - INFO - Step 120: lr=1.21E-06, loss= 2.1002 (max= 2.8394), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:04,538 - root - INFO - Step 120: lr=1.21E-06, loss= 2.1002 (max= 2.8394), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:04,538 - root - INFO - Step 120: lr=1.21E-06, loss= 2.1002 (max= 2.8394), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:04,538 - root - INFO - Step 120: lr=1.21E-06, loss= 2.1002 (max= 2.8394), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:04,538 - root - INFO - Step 120: lr=1.21E-06, loss= 2.1002 (max= 2.8394), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:20,481 - root - INFO - Step 130: lr=1.31E-06, loss= 2.0874 (max= 2.5793), tps=20557, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:20,481 - root - INFO - Step 130: lr=1.31E-06, loss= 2.0874 (max= 2.5793), tps=20557, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:20,481 - root - INFO - Step 130: lr=1.31E-06, loss= 2.0874 (max= 
2.5793), tps=20557, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:20,481 - root - INFO - Step 130: lr=1.31E-06, loss= 2.0874 (max= 2.5793), tps=20557, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:20,481 - root - INFO - Step 130: lr=1.31E-06, loss= 2.0874 (max= 2.5793), tps=20557, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:20,481 - root - INFO - Step 130: lr=1.31E-06, loss= 2.0874 (max= 2.5793), tps=20557, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:20,481 - root - INFO - Step 130: lr=1.31E-06, loss= 2.0874 (max= 2.5793), tps=20558, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:20,481 - root - INFO - Step 130: lr=1.31E-06, loss= 2.0874 (max= 2.5793), tps=20558, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:22:36,436 - root - INFO - Step 140: lr=1.41E-06, loss= 2.1086 (max= 2.7948), tps=20543, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:36,436 - root - INFO - Step 140: lr=1.41E-06, loss= 2.1086 (max= 2.7948), tps=20543, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:36,436 - root - INFO - Step 140: lr=1.41E-06, loss= 2.1086 (max= 2.7948), tps=20543, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:36,436 - root - INFO - Step 140: lr=1.41E-06, loss= 2.1086 (max= 2.7948), tps=20543, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:36,436 - root - INFO - Step 140: lr=1.41E-06, loss= 2.1086 (max= 2.7948), tps=20543, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:36,436 - root - INFO - Step 140: 
lr=1.41E-06, loss= 2.1086 (max= 2.7948), tps=20543, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:36,436 - root - INFO - Step 140: lr=1.41E-06, loss= 2.1086 (max= 2.7948), tps=20542, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:36,436 - root - INFO - Step 140: lr=1.41E-06, loss= 2.1086 (max= 2.7948), tps=20543, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:52,388 - root - INFO - Step 150: lr=1.51E-06, loss= 2.0520 (max= 2.5698), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:52,388 - root - INFO - Step 150: lr=1.51E-06, loss= 2.0520 (max= 2.5698), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:52,388 - root - INFO - Step 150: lr=1.51E-06, loss= 2.0520 (max= 2.5698), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:52,388 - root - INFO - Step 150: lr=1.51E-06, loss= 2.0520 (max= 2.5698), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:52,388 - root - INFO - Step 150: lr=1.51E-06, loss= 2.0520 (max= 2.5698), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:52,388 - root - INFO - Step 150: lr=1.51E-06, loss= 2.0520 (max= 2.5698), tps=20546, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:52,388 - root - INFO - Step 150: lr=1.51E-06, loss= 2.0520 (max= 2.5698), tps=20546, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:22:52,388 - root - INFO - Step 150: lr=1.51E-06, loss= 2.0520 (max= 2.5698), tps=20546, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:23:08,374 - 
root - INFO - Step 160: lr=1.61E-06, loss= 2.0082 (max= 2.5952), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:08,374 - root - INFO - Step 160: lr=1.61E-06, loss= 2.0082 (max= 2.5952), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:08,374 - root - INFO - Step 160: lr=1.61E-06, loss= 2.0082 (max= 2.5952), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:08,374 - root - INFO - Step 160: lr=1.61E-06, loss= 2.0082 (max= 2.5952), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:08,374 - root - INFO - Step 160: lr=1.61E-06, loss= 2.0082 (max= 2.5952), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:08,374 - root - INFO - Step 160: lr=1.61E-06, loss= 2.0082 (max= 2.5952), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:08,375 - root - INFO - Step 160: lr=1.61E-06, loss= 2.0082 (max= 2.5952), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:08,375 - root - INFO - Step 160: lr=1.61E-06, loss= 2.0082 (max= 2.5952), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:17,128 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6602672 +2025-10-24 09:23:24,335 - root - INFO - Step 170: lr=1.71E-06, loss= 2.0238 (max= 2.7102), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:23:24,335 - root - INFO - Step 170: lr=1.71E-06, loss= 2.0238 (max= 2.7102), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-24 09:23:24,335 - root - INFO - Step 170: lr=1.71E-06, loss= 2.0238 (max= 2.7102), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:23:24,335 - root - INFO - Step 170: lr=1.71E-06, loss= 2.0238 (max= 2.7102), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:23:24,335 - root - INFO - Step 170: lr=1.71E-06, loss= 2.0238 (max= 2.7102), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:23:24,335 - root - INFO - Step 170: lr=1.71E-06, loss= 2.0238 (max= 2.7102), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:23:24,336 - root - INFO - Step 170: lr=1.71E-06, loss= 2.0238 (max= 2.7102), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:23:24,336 - root - INFO - Step 170: lr=1.71E-06, loss= 2.0238 (max= 2.7102), tps=20535, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:23:40,371 - root - INFO - Step 180: lr=1.81E-06, loss= 1.9973 (max= 2.4340), tps=20439, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:40,371 - root - INFO - Step 180: lr=1.81E-06, loss= 1.9973 (max= 2.4340), tps=20439, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:40,371 - root - INFO - Step 180: lr=1.81E-06, loss= 1.9973 (max= 2.4340), tps=20439, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:40,371 - root - INFO - Step 180: lr=1.81E-06, loss= 1.9973 (max= 2.4340), tps=20439, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:40,371 - root - INFO - Step 180: lr=1.81E-06, loss= 1.9973 (max= 2.4340), tps=20439, mfu=42.58%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:40,371 - root - INFO - Step 180: lr=1.81E-06, loss= 1.9973 (max= 2.4340), tps=20439, mfu=42.58%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:40,371 - root - INFO - Step 180: lr=1.81E-06, loss= 1.9973 (max= 2.4340), tps=20439, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:40,371 - root - INFO - Step 180: lr=1.81E-06, loss= 1.9973 (max= 2.4340), tps=20439, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:56,338 - root - INFO - Step 190: lr=1.91E-06, loss= 1.9829 (max= 2.5251), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:56,338 - root - INFO - Step 190: lr=1.91E-06, loss= 1.9829 (max= 2.5251), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:56,338 - root - INFO - Step 190: lr=1.91E-06, loss= 1.9829 (max= 2.5251), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:56,338 - root - INFO - Step 190: lr=1.91E-06, loss= 1.9829 (max= 2.5251), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:56,338 - root - INFO - Step 190: lr=1.91E-06, loss= 1.9829 (max= 2.5251), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:56,338 - root - INFO - Step 190: lr=1.91E-06, loss= 1.9829 (max= 2.5251), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:56,338 - root - INFO - Step 190: lr=1.91E-06, loss= 1.9829 (max= 2.5251), tps=20527, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:23:56,338 - root - INFO - Step 190: lr=1.91E-06, loss= 1.9829 (max= 2.5251), tps=20527, mfu=42.77%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:24:12,300 - root - INFO - Step 200: lr=2.01E-06, loss= 1.9265 (max= 2.5550), tps=20533, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:12,300 - root - INFO - Step 200: lr=2.01E-06, loss= 1.9265 (max= 2.5550), tps=20533, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:12,300 - root - INFO - Step 200: lr=2.01E-06, loss= 1.9265 (max= 2.5550), tps=20533, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:12,300 - root - INFO - Step 200: lr=2.01E-06, loss= 1.9265 (max= 2.5550), tps=20533, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:12,300 - root - INFO - Step 200: lr=2.01E-06, loss= 1.9265 (max= 2.5550), tps=20533, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:12,300 - root - INFO - Step 200: lr=2.01E-06, loss= 1.9265 (max= 2.5550), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:12,300 - root - INFO - Step 200: lr=2.01E-06, loss= 1.9265 (max= 2.5550), tps=20533, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:12,301 - root - INFO - Step 200: lr=2.01E-06, loss= 1.9265 (max= 2.5550), tps=20533, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:28,317 - root - INFO - Step 210: lr=2.11E-06, loss= 1.9234 (max= 2.6031), tps=20463, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:28,317 - root - INFO - Step 210: lr=2.11E-06, loss= 1.9234 (max= 2.6031), tps=20463, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:28,317 - root - INFO - Step 210: lr=2.11E-06, loss= 1.9234 (max= 
2.6031), tps=20463, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:28,317 - root - INFO - Step 210: lr=2.11E-06, loss= 1.9234 (max= 2.6031), tps=20463, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:28,317 - root - INFO - Step 210: lr=2.11E-06, loss= 1.9234 (max= 2.6031), tps=20463, mfu=42.64%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:28,317 - root - INFO - Step 210: lr=2.11E-06, loss= 1.9234 (max= 2.6031), tps=20463, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:28,317 - root - INFO - Step 210: lr=2.11E-06, loss= 1.9234 (max= 2.6031), tps=20463, mfu=42.64%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:28,317 - root - INFO - Step 210: lr=2.11E-06, loss= 1.9234 (max= 2.6031), tps=20463, mfu=42.64%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:44,301 - root - INFO - Step 220: lr=2.21E-06, loss= 1.9677 (max= 2.4126), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:44,301 - root - INFO - Step 220: lr=2.21E-06, loss= 1.9677 (max= 2.4126), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:44,301 - root - INFO - Step 220: lr=2.21E-06, loss= 1.9677 (max= 2.4126), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:44,301 - root - INFO - Step 220: lr=2.21E-06, loss= 1.9677 (max= 2.4126), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:44,301 - root - INFO - Step 220: lr=2.21E-06, loss= 1.9677 (max= 2.4126), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:44,301 - root - INFO - Step 220: 
lr=2.21E-06, loss= 1.9677 (max= 2.4126), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:44,301 - root - INFO - Step 220: lr=2.21E-06, loss= 1.9677 (max= 2.4126), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:24:44,301 - root - INFO - Step 220: lr=2.21E-06, loss= 1.9677 (max= 2.4126), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:00,298 - root - INFO - Step 230: lr=2.31E-06, loss= 1.9499 (max= 2.3514), tps=20487, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:00,298 - root - INFO - Step 230: lr=2.31E-06, loss= 1.9499 (max= 2.3514), tps=20487, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:00,298 - root - INFO - Step 230: lr=2.31E-06, loss= 1.9499 (max= 2.3514), tps=20487, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:00,298 - root - INFO - Step 230: lr=2.31E-06, loss= 1.9499 (max= 2.3514), tps=20487, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:00,299 - root - INFO - Step 230: lr=2.31E-06, loss= 1.9499 (max= 2.3514), tps=20487, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:00,299 - root - INFO - Step 230: lr=2.31E-06, loss= 1.9499 (max= 2.3514), tps=20487, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:00,299 - root - INFO - Step 230: lr=2.31E-06, loss= 1.9499 (max= 2.3514), tps=20488, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:00,299 - root - INFO - Step 230: lr=2.31E-06, loss= 1.9499 (max= 2.3514), tps=20488, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:16,237 - 
root - INFO - Step 240: lr=2.41E-06, loss= 1.9078 (max= 2.2498), tps=20563, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:16,237 - root - INFO - Step 240: lr=2.41E-06, loss= 1.9078 (max= 2.2498), tps=20563, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:16,237 - root - INFO - Step 240: lr=2.41E-06, loss= 1.9078 (max= 2.2498), tps=20563, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:16,237 - root - INFO - Step 240: lr=2.41E-06, loss= 1.9078 (max= 2.2498), tps=20563, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:16,237 - root - INFO - Step 240: lr=2.41E-06, loss= 1.9078 (max= 2.2498), tps=20563, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:16,237 - root - INFO - Step 240: lr=2.41E-06, loss= 1.9078 (max= 2.2498), tps=20563, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:16,238 - root - INFO - Step 240: lr=2.41E-06, loss= 1.9078 (max= 2.2498), tps=20563, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:16,238 - root - INFO - Step 240: lr=2.41E-06, loss= 1.9078 (max= 2.2498), tps=20563, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:32,226 - root - INFO - Step 250: lr=2.51E-06, loss= 1.8522 (max= 2.4450), tps=20498, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:32,226 - root - INFO - Step 250: lr=2.51E-06, loss= 1.8522 (max= 2.4450), tps=20498, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:32,226 - root - INFO - Step 250: lr=2.51E-06, loss= 1.8522 (max= 2.4450), tps=20498, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 09:25:32,227 - root - INFO - Step 250: lr=2.51E-06, loss= 1.8522 (max= 2.4450), tps=20498, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:32,227 - root - INFO - Step 250: lr=2.51E-06, loss= 1.8522 (max= 2.4450), tps=20498, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:32,227 - root - INFO - Step 250: lr=2.51E-06, loss= 1.8522 (max= 2.4450), tps=20498, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:32,227 - root - INFO - Step 250: lr=2.51E-06, loss= 1.8522 (max= 2.4450), tps=20498, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:32,227 - root - INFO - Step 250: lr=2.51E-06, loss= 1.8522 (max= 2.4450), tps=20498, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:25:48,181 - root - INFO - Step 260: lr=2.61E-06, loss= 1.8492 (max= 2.3701), tps=20542, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:48,181 - root - INFO - Step 260: lr=2.61E-06, loss= 1.8492 (max= 2.3701), tps=20542, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:48,181 - root - INFO - Step 260: lr=2.61E-06, loss= 1.8492 (max= 2.3701), tps=20542, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:48,181 - root - INFO - Step 260: lr=2.61E-06, loss= 1.8492 (max= 2.3701), tps=20542, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:48,182 - root - INFO - Step 260: lr=2.61E-06, loss= 1.8492 (max= 2.3701), tps=20542, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:48,182 - root - INFO - Step 260: lr=2.61E-06, loss= 1.8492 (max= 2.3701), tps=20542, mfu=42.80%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:48,182 - root - INFO - Step 260: lr=2.61E-06, loss= 1.8492 (max= 2.3701), tps=20542, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:25:48,182 - root - INFO - Step 260: lr=2.61E-06, loss= 1.8492 (max= 2.3701), tps=20542, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:26:04,160 - root - INFO - Step 270: lr=2.71E-06, loss= 1.8413 (max= 2.4652), tps=20512, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:04,160 - root - INFO - Step 270: lr=2.71E-06, loss= 1.8413 (max= 2.4652), tps=20512, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:04,160 - root - INFO - Step 270: lr=2.71E-06, loss= 1.8413 (max= 2.4652), tps=20512, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:04,160 - root - INFO - Step 270: lr=2.71E-06, loss= 1.8413 (max= 2.4652), tps=20512, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:04,160 - root - INFO - Step 270: lr=2.71E-06, loss= 1.8413 (max= 2.4652), tps=20512, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:04,160 - root - INFO - Step 270: lr=2.71E-06, loss= 1.8413 (max= 2.4652), tps=20512, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:04,160 - root - INFO - Step 270: lr=2.71E-06, loss= 1.8413 (max= 2.4652), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:04,160 - root - INFO - Step 270: lr=2.71E-06, loss= 1.8413 (max= 2.4652), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:20,154 - root - INFO - Step 280: lr=2.81E-06, loss= 1.8747 (max= 2.5246), tps=20492, mfu=42.70%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:20,154 - root - INFO - Step 280: lr=2.81E-06, loss= 1.8747 (max= 2.5246), tps=20492, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:20,154 - root - INFO - Step 280: lr=2.81E-06, loss= 1.8747 (max= 2.5246), tps=20492, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:20,154 - root - INFO - Step 280: lr=2.81E-06, loss= 1.8747 (max= 2.5246), tps=20492, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:20,154 - root - INFO - Step 280: lr=2.81E-06, loss= 1.8747 (max= 2.5246), tps=20492, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:20,154 - root - INFO - Step 280: lr=2.81E-06, loss= 1.8747 (max= 2.5246), tps=20492, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:20,154 - root - INFO - Step 280: lr=2.81E-06, loss= 1.8747 (max= 2.5246), tps=20492, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:20,154 - root - INFO - Step 280: lr=2.81E-06, loss= 1.8747 (max= 2.5246), tps=20492, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:36,135 - root - INFO - Step 290: lr=2.91E-06, loss= 1.8316 (max= 2.3769), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:36,135 - root - INFO - Step 290: lr=2.91E-06, loss= 1.8316 (max= 2.3769), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:36,135 - root - INFO - Step 290: lr=2.91E-06, loss= 1.8316 (max= 2.3769), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:36,135 - root - INFO - Step 290: lr=2.91E-06, loss= 1.8316 (max= 
2.3769), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:36,136 - root - INFO - Step 290: lr=2.91E-06, loss= 1.8316 (max= 2.3769), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:36,136 - root - INFO - Step 290: lr=2.91E-06, loss= 1.8316 (max= 2.3769), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:36,136 - root - INFO - Step 290: lr=2.91E-06, loss= 1.8316 (max= 2.3769), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:36,136 - root - INFO - Step 290: lr=2.91E-06, loss= 1.8316 (max= 2.3769), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:52,133 - root - INFO - Step 300: lr=3.01E-06, loss= 1.8308 (max= 2.3991), tps=20488, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:52,133 - root - INFO - Step 300: lr=3.01E-06, loss= 1.8308 (max= 2.3991), tps=20488, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:52,133 - root - INFO - Step 300: lr=3.01E-06, loss= 1.8308 (max= 2.3991), tps=20488, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:52,133 - root - INFO - Step 300: lr=3.01E-06, loss= 1.8308 (max= 2.3991), tps=20488, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:52,133 - root - INFO - Step 300: lr=3.01E-06, loss= 1.8308 (max= 2.3991), tps=20488, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:52,133 - root - INFO - Step 300: lr=3.01E-06, loss= 1.8308 (max= 2.3991), tps=20488, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:52,133 - root - INFO - Step 300: 
lr=3.01E-06, loss= 1.8308 (max= 2.3991), tps=20487, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:26:52,133 - root - INFO - Step 300: lr=3.01E-06, loss= 1.8308 (max= 2.3991), tps=20489, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:08,104 - root - INFO - Step 310: lr=3.11E-06, loss= 1.8085 (max= 2.3613), tps=20523, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:08,104 - root - INFO - Step 310: lr=3.11E-06, loss= 1.8085 (max= 2.3613), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:08,104 - root - INFO - Step 310: lr=3.11E-06, loss= 1.8085 (max= 2.3613), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:08,104 - root - INFO - Step 310: lr=3.11E-06, loss= 1.8085 (max= 2.3613), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:08,104 - root - INFO - Step 310: lr=3.11E-06, loss= 1.8085 (max= 2.3613), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:08,104 - root - INFO - Step 310: lr=3.11E-06, loss= 1.8085 (max= 2.3613), tps=20523, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:08,104 - root - INFO - Step 310: lr=3.11E-06, loss= 1.8085 (max= 2.3613), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:08,104 - root - INFO - Step 310: lr=3.11E-06, loss= 1.8085 (max= 2.3613), tps=20523, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:24,066 - root - INFO - Step 320: lr=3.21E-06, loss= 1.8057 (max= 2.2014), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:24,066 - 
root - INFO - Step 320: lr=3.21E-06, loss= 1.8057 (max= 2.2014), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:24,066 - root - INFO - Step 320: lr=3.21E-06, loss= 1.8057 (max= 2.2014), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:24,066 - root - INFO - Step 320: lr=3.21E-06, loss= 1.8057 (max= 2.2014), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:24,066 - root - INFO - Step 320: lr=3.21E-06, loss= 1.8057 (max= 2.2014), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:24,067 - root - INFO - Step 320: lr=3.21E-06, loss= 1.8057 (max= 2.2014), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:24,067 - root - INFO - Step 320: lr=3.21E-06, loss= 1.8057 (max= 2.2014), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:24,067 - root - INFO - Step 320: lr=3.21E-06, loss= 1.8057 (max= 2.2014), tps=20532, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:40,031 - root - INFO - Step 330: lr=3.31E-06, loss= 1.7491 (max= 2.1197), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:40,031 - root - INFO - Step 330: lr=3.31E-06, loss= 1.7491 (max= 2.1197), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:40,031 - root - INFO - Step 330: lr=3.31E-06, loss= 1.7491 (max= 2.1197), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:40,031 - root - INFO - Step 330: lr=3.31E-06, loss= 1.7491 (max= 2.1197), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 09:27:40,032 - root - INFO - Step 330: lr=3.31E-06, loss= 1.7491 (max= 2.1197), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:40,032 - root - INFO - Step 330: lr=3.31E-06, loss= 1.7491 (max= 2.1197), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:40,032 - root - INFO - Step 330: lr=3.31E-06, loss= 1.7491 (max= 2.1197), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:40,032 - root - INFO - Step 330: lr=3.31E-06, loss= 1.7491 (max= 2.1197), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:55,977 - root - INFO - Step 340: lr=3.41E-06, loss= 1.7538 (max= 2.3379), tps=20555, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:55,977 - root - INFO - Step 340: lr=3.41E-06, loss= 1.7538 (max= 2.3379), tps=20555, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:55,977 - root - INFO - Step 340: lr=3.41E-06, loss= 1.7538 (max= 2.3379), tps=20555, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:55,977 - root - INFO - Step 340: lr=3.41E-06, loss= 1.7538 (max= 2.3379), tps=20555, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:55,977 - root - INFO - Step 340: lr=3.41E-06, loss= 1.7538 (max= 2.3379), tps=20555, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:55,977 - root - INFO - Step 340: lr=3.41E-06, loss= 1.7538 (max= 2.3379), tps=20555, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:55,978 - root - INFO - Step 340: lr=3.41E-06, loss= 1.7538 (max= 2.3379), tps=20555, mfu=42.83%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:27:55,978 - root - INFO - Step 340: lr=3.41E-06, loss= 1.7538 (max= 2.3379), tps=20554, mfu=42.83%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:11,957 - root - INFO - Step 350: lr=3.51E-06, loss= 1.7666 (max= 2.3258), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:11,957 - root - INFO - Step 350: lr=3.51E-06, loss= 1.7666 (max= 2.3258), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:11,957 - root - INFO - Step 350: lr=3.51E-06, loss= 1.7666 (max= 2.3258), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:11,957 - root - INFO - Step 350: lr=3.51E-06, loss= 1.7666 (max= 2.3258), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:11,957 - root - INFO - Step 350: lr=3.51E-06, loss= 1.7666 (max= 2.3258), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:11,957 - root - INFO - Step 350: lr=3.51E-06, loss= 1.7666 (max= 2.3258), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:11,958 - root - INFO - Step 350: lr=3.51E-06, loss= 1.7666 (max= 2.3258), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:11,958 - root - INFO - Step 350: lr=3.51E-06, loss= 1.7666 (max= 2.3258), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:27,992 - root - INFO - Step 360: lr=3.61E-06, loss= 1.7095 (max= 2.2061), tps=20440, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:27,992 - root - INFO - Step 360: lr=3.61E-06, loss= 1.7095 (max= 2.2061), tps=20440, mfu=42.59%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:27,992 - root - INFO - Step 360: lr=3.61E-06, loss= 1.7095 (max= 2.2061), tps=20440, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:27,992 - root - INFO - Step 360: lr=3.61E-06, loss= 1.7095 (max= 2.2061), tps=20440, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:27,992 - root - INFO - Step 360: lr=3.61E-06, loss= 1.7095 (max= 2.2061), tps=20440, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:27,992 - root - INFO - Step 360: lr=3.61E-06, loss= 1.7095 (max= 2.2061), tps=20440, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:27,992 - root - INFO - Step 360: lr=3.61E-06, loss= 1.7095 (max= 2.2061), tps=20440, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:27,992 - root - INFO - Step 360: lr=3.61E-06, loss= 1.7095 (max= 2.2061), tps=20440, mfu=42.59%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:43,960 - root - INFO - Step 370: lr=3.71E-06, loss= 1.7296 (max= 2.3105), tps=20525, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:43,960 - root - INFO - Step 370: lr=3.71E-06, loss= 1.7296 (max= 2.3105), tps=20525, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:43,960 - root - INFO - Step 370: lr=3.71E-06, loss= 1.7296 (max= 2.3105), tps=20525, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:43,960 - root - INFO - Step 370: lr=3.71E-06, loss= 1.7296 (max= 2.3105), tps=20525, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:43,960 - root - INFO - Step 370: lr=3.71E-06, loss= 1.7296 (max= 
2.3105), tps=20525, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:43,960 - root - INFO - Step 370: lr=3.71E-06, loss= 1.7296 (max= 2.3105), tps=20525, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:43,960 - root - INFO - Step 370: lr=3.71E-06, loss= 1.7296 (max= 2.3105), tps=20525, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:43,960 - root - INFO - Step 370: lr=3.71E-06, loss= 1.7296 (max= 2.3105), tps=20525, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:28:59,928 - root - INFO - Step 380: lr=3.81E-06, loss= 1.7133 (max= 2.1270), tps=20526, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:59,928 - root - INFO - Step 380: lr=3.81E-06, loss= 1.7133 (max= 2.1270), tps=20526, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:59,928 - root - INFO - Step 380: lr=3.81E-06, loss= 1.7133 (max= 2.1270), tps=20526, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:59,928 - root - INFO - Step 380: lr=3.81E-06, loss= 1.7133 (max= 2.1270), tps=20526, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:59,928 - root - INFO - Step 380: lr=3.81E-06, loss= 1.7133 (max= 2.1270), tps=20525, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:59,928 - root - INFO - Step 380: lr=3.81E-06, loss= 1.7133 (max= 2.1270), tps=20526, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:59,928 - root - INFO - Step 380: lr=3.81E-06, loss= 1.7133 (max= 2.1270), tps=20525, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:28:59,928 - root - INFO - Step 380: 
lr=3.81E-06, loss= 1.7133 (max= 2.1270), tps=20526, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:08,671 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4017313 +2025-10-24 09:29:15,955 - root - INFO - Step 390: lr=3.91E-06, loss= 1.7035 (max= 2.5247), tps=20450, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:15,955 - root - INFO - Step 390: lr=3.91E-06, loss= 1.7035 (max= 2.5247), tps=20450, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:15,955 - root - INFO - Step 390: lr=3.91E-06, loss= 1.7035 (max= 2.5247), tps=20450, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:15,955 - root - INFO - Step 390: lr=3.91E-06, loss= 1.7035 (max= 2.5247), tps=20450, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:15,955 - root - INFO - Step 390: lr=3.91E-06, loss= 1.7035 (max= 2.5247), tps=20450, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:15,955 - root - INFO - Step 390: lr=3.91E-06, loss= 1.7035 (max= 2.5247), tps=20450, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:15,956 - root - INFO - Step 390: lr=3.91E-06, loss= 1.7035 (max= 2.5247), tps=20449, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:15,956 - root - INFO - Step 390: lr=3.91E-06, loss= 1.7035 (max= 2.5247), tps=20450, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:31,954 - root - INFO - Step 400: lr=4.01E-06, loss= 1.7031 (max= 2.3904), tps=20486, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
09:29:31,954 - root - INFO - Step 400: lr=4.01E-06, loss= 1.7031 (max= 2.3904), tps=20486, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:31,954 - root - INFO - Step 400: lr=4.01E-06, loss= 1.7031 (max= 2.3904), tps=20486, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:31,954 - root - INFO - Step 400: lr=4.01E-06, loss= 1.7031 (max= 2.3904), tps=20486, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:31,954 - root - INFO - Step 400: lr=4.01E-06, loss= 1.7031 (max= 2.3904), tps=20486, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:31,954 - root - INFO - Step 400: lr=4.01E-06, loss= 1.7031 (max= 2.3904), tps=20486, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:31,954 - root - INFO - Step 400: lr=4.01E-06, loss= 1.7031 (max= 2.3904), tps=20487, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:31,954 - root - INFO - Step 400: lr=4.01E-06, loss= 1.7031 (max= 2.3904), tps=20487, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:29:47,903 - root - INFO - Step 410: lr=4.11E-06, loss= 1.6603 (max= 2.2089), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:29:47,903 - root - INFO - Step 410: lr=4.11E-06, loss= 1.6603 (max= 2.2089), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:29:47,903 - root - INFO - Step 410: lr=4.11E-06, loss= 1.6603 (max= 2.2089), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:29:47,903 - root - INFO - Step 410: lr=4.11E-06, loss= 1.6603 (max= 2.2089), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 09:29:47,903 - root - INFO - Step 410: lr=4.11E-06, loss= 1.6603 (max= 2.2089), tps=20549, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:29:47,903 - root - INFO - Step 410: lr=4.11E-06, loss= 1.6603 (max= 2.2089), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:29:47,903 - root - INFO - Step 410: lr=4.11E-06, loss= 1.6603 (max= 2.2089), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:29:47,903 - root - INFO - Step 410: lr=4.11E-06, loss= 1.6603 (max= 2.2089), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:03,952 - root - INFO - Step 420: lr=4.21E-06, loss= 1.6907 (max= 2.0866), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:03,952 - root - INFO - Step 420: lr=4.21E-06, loss= 1.6907 (max= 2.0866), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:03,952 - root - INFO - Step 420: lr=4.21E-06, loss= 1.6907 (max= 2.0866), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:03,952 - root - INFO - Step 420: lr=4.21E-06, loss= 1.6907 (max= 2.0866), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:03,952 - root - INFO - Step 420: lr=4.21E-06, loss= 1.6907 (max= 2.0866), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:03,952 - root - INFO - Step 420: lr=4.21E-06, loss= 1.6907 (max= 2.0866), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:03,952 - root - INFO - Step 420: lr=4.21E-06, loss= 1.6907 (max= 2.0866), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:03,952 - root - INFO - Step 420: lr=4.21E-06, loss= 1.6907 (max= 2.0866), tps=20422, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:17,453 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4815785 +2025-10-24 09:30:19,925 - root - INFO - Step 430: lr=4.31E-06, loss= 1.6798 (max= 2.1047), tps=20518, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:19,926 - root - INFO - Step 430: lr=4.31E-06, loss= 1.6798 (max= 2.1047), tps=20518, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:19,926 - root - INFO - Step 430: lr=4.31E-06, loss= 1.6798 (max= 2.1047), tps=20518, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:19,926 - root - INFO - Step 430: lr=4.31E-06, loss= 1.6798 (max= 2.1047), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:19,926 - root - INFO - Step 430: lr=4.31E-06, loss= 1.6798 (max= 2.1047), tps=20518, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:19,926 - root - INFO - Step 430: lr=4.31E-06, loss= 1.6798 (max= 2.1047), tps=20518, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:19,926 - root - INFO - Step 430: lr=4.31E-06, loss= 1.6798 (max= 2.1047), tps=20518, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:19,926 - root - INFO - Step 430: lr=4.31E-06, loss= 1.6798 (max= 2.1047), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:35,926 - root - INFO - Step 440: lr=4.41E-06, loss= 1.6682 (max= 2.1236), tps=20484, 
mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:35,926 - root - INFO - Step 440: lr=4.41E-06, loss= 1.6682 (max= 2.1236), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:35,926 - root - INFO - Step 440: lr=4.41E-06, loss= 1.6682 (max= 2.1236), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:35,926 - root - INFO - Step 440: lr=4.41E-06, loss= 1.6682 (max= 2.1236), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:35,926 - root - INFO - Step 440: lr=4.41E-06, loss= 1.6682 (max= 2.1236), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:35,926 - root - INFO - Step 440: lr=4.41E-06, loss= 1.6682 (max= 2.1236), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:35,926 - root - INFO - Step 440: lr=4.41E-06, loss= 1.6682 (max= 2.1236), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:35,926 - root - INFO - Step 440: lr=4.41E-06, loss= 1.6682 (max= 2.1236), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:30:51,901 - root - INFO - Step 450: lr=4.51E-06, loss= 1.6786 (max= 2.1621), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:51,901 - root - INFO - Step 450: lr=4.51E-06, loss= 1.6786 (max= 2.1621), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:51,901 - root - INFO - Step 450: lr=4.51E-06, loss= 1.6786 (max= 2.1621), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:51,901 - root - INFO - Step 450: lr=4.51E-06, loss= 1.6786 
(max= 2.1621), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:51,901 - root - INFO - Step 450: lr=4.51E-06, loss= 1.6786 (max= 2.1621), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:51,901 - root - INFO - Step 450: lr=4.51E-06, loss= 1.6786 (max= 2.1621), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:51,902 - root - INFO - Step 450: lr=4.51E-06, loss= 1.6786 (max= 2.1621), tps=20515, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:30:51,902 - root - INFO - Step 450: lr=4.51E-06, loss= 1.6786 (max= 2.1621), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:31:07,872 - root - INFO - Step 460: lr=4.61E-06, loss= 1.7238 (max= 2.1145), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:07,872 - root - INFO - Step 460: lr=4.61E-06, loss= 1.7238 (max= 2.1145), tps=20523, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:07,872 - root - INFO - Step 460: lr=4.61E-06, loss= 1.7238 (max= 2.1145), tps=20523, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:07,872 - root - INFO - Step 460: lr=4.61E-06, loss= 1.7238 (max= 2.1145), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:07,872 - root - INFO - Step 460: lr=4.61E-06, loss= 1.7238 (max= 2.1145), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:07,872 - root - INFO - Step 460: lr=4.61E-06, loss= 1.7238 (max= 2.1145), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:07,872 - root - INFO - Step 460: 
lr=4.61E-06, loss= 1.7238 (max= 2.1145), tps=20522, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:07,872 - root - INFO - Step 460: lr=4.61E-06, loss= 1.7238 (max= 2.1145), tps=20523, mfu=42.76%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:23,878 - root - INFO - Step 470: lr=4.71E-06, loss= 1.6942 (max= 2.0696), tps=20476, mfu=42.66%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:31:23,878 - root - INFO - Step 470: lr=4.71E-06, loss= 1.6942 (max= 2.0696), tps=20475, mfu=42.66%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:31:23,878 - root - INFO - Step 470: lr=4.71E-06, loss= 1.6942 (max= 2.0696), tps=20476, mfu=42.66%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:31:23,879 - root - INFO - Step 470: lr=4.71E-06, loss= 1.6942 (max= 2.0696), tps=20476, mfu=42.66%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:31:23,879 - root - INFO - Step 470: lr=4.71E-06, loss= 1.6942 (max= 2.0696), tps=20476, mfu=42.66%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:31:23,879 - root - INFO - Step 470: lr=4.71E-06, loss= 1.6942 (max= 2.0696), tps=20475, mfu=42.66%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:31:23,879 - root - INFO - Step 470: lr=4.71E-06, loss= 1.6942 (max= 2.0696), tps=20475, mfu=42.66%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:31:23,879 - root - INFO - Step 470: lr=4.71E-06, loss= 1.6942 (max= 2.0696), tps=20476, mfu=42.66%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:31:39,798 - root - INFO - Step 480: lr=4.81E-06, loss= 1.6517 (max= 2.1076), tps=20588, mfu=42.89%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:39,798 - 
root - INFO - Step 480: lr=4.81E-06, loss= 1.6517 (max= 2.1076), tps=20588, mfu=42.89%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:39,798 - root - INFO - Step 480: lr=4.81E-06, loss= 1.6517 (max= 2.1076), tps=20588, mfu=42.89%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:39,798 - root - INFO - Step 480: lr=4.81E-06, loss= 1.6517 (max= 2.1076), tps=20588, mfu=42.89%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:39,798 - root - INFO - Step 480: lr=4.81E-06, loss= 1.6517 (max= 2.1076), tps=20587, mfu=42.89%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:39,798 - root - INFO - Step 480: lr=4.81E-06, loss= 1.6517 (max= 2.1076), tps=20587, mfu=42.89%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:39,799 - root - INFO - Step 480: lr=4.81E-06, loss= 1.6517 (max= 2.1076), tps=20587, mfu=42.89%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:39,799 - root - INFO - Step 480: lr=4.81E-06, loss= 1.6517 (max= 2.1076), tps=20588, mfu=42.90%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:55,746 - root - INFO - Step 490: lr=4.91E-06, loss= 1.6689 (max= 2.0950), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:55,746 - root - INFO - Step 490: lr=4.91E-06, loss= 1.6689 (max= 2.0950), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:55,746 - root - INFO - Step 490: lr=4.91E-06, loss= 1.6689 (max= 2.0950), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:55,746 - root - INFO - Step 490: lr=4.91E-06, loss= 1.6689 (max= 2.0950), tps=20551, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 09:31:55,746 - root - INFO - Step 490: lr=4.91E-06, loss= 1.6689 (max= 2.0950), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:55,746 - root - INFO - Step 490: lr=4.91E-06, loss= 1.6689 (max= 2.0950), tps=20551, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:55,746 - root - INFO - Step 490: lr=4.91E-06, loss= 1.6689 (max= 2.0950), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:31:55,746 - root - INFO - Step 490: lr=4.91E-06, loss= 1.6689 (max= 2.0950), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:32:11,687 - root - INFO - Step 500: lr=5.01E-06, loss= 1.6422 (max= 2.0509), tps=20561, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:11,687 - root - INFO - Step 500: lr=5.01E-06, loss= 1.6422 (max= 2.0509), tps=20561, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:11,687 - root - INFO - Step 500: lr=5.01E-06, loss= 1.6422 (max= 2.0509), tps=20561, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:11,687 - root - INFO - Step 500: lr=5.01E-06, loss= 1.6422 (max= 2.0509), tps=20561, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:11,687 - root - INFO - Step 500: lr=5.01E-06, loss= 1.6422 (max= 2.0509), tps=20561, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:11,687 - root - INFO - Step 500: lr=5.01E-06, loss= 1.6422 (max= 2.0509), tps=20560, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:11,687 - root - INFO - Step 500: lr=5.01E-06, loss= 1.6422 (max= 2.0509), tps=20560, mfu=42.84%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:11,687 - root - INFO - Step 500: lr=5.01E-06, loss= 1.6422 (max= 2.0509), tps=20561, mfu=42.84%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:27,640 - root - INFO - Step 510: lr=5.11E-06, loss= 1.6371 (max= 2.1134), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:32:27,640 - root - INFO - Step 510: lr=5.11E-06, loss= 1.6371 (max= 2.1134), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:32:27,640 - root - INFO - Step 510: lr=5.11E-06, loss= 1.6371 (max= 2.1134), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:32:27,640 - root - INFO - Step 510: lr=5.11E-06, loss= 1.6371 (max= 2.1134), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:32:27,640 - root - INFO - Step 510: lr=5.11E-06, loss= 1.6371 (max= 2.1134), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:32:27,640 - root - INFO - Step 510: lr=5.11E-06, loss= 1.6371 (max= 2.1134), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:32:27,640 - root - INFO - Step 510: lr=5.11E-06, loss= 1.6371 (max= 2.1134), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:32:27,640 - root - INFO - Step 510: lr=5.11E-06, loss= 1.6371 (max= 2.1134), tps=20545, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:32:43,603 - root - INFO - Step 520: lr=5.21E-06, loss= 1.6263 (max= 2.0942), tps=20531, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:43,603 - root - INFO - Step 520: lr=5.21E-06, loss= 1.6263 (max= 2.0942), tps=20531, mfu=42.78%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:43,603 - root - INFO - Step 520: lr=5.21E-06, loss= 1.6263 (max= 2.0942), tps=20531, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:43,604 - root - INFO - Step 520: lr=5.21E-06, loss= 1.6263 (max= 2.0942), tps=20531, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:43,604 - root - INFO - Step 520: lr=5.21E-06, loss= 1.6263 (max= 2.0942), tps=20531, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:43,604 - root - INFO - Step 520: lr=5.21E-06, loss= 1.6263 (max= 2.0942), tps=20531, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:43,604 - root - INFO - Step 520: lr=5.21E-06, loss= 1.6263 (max= 2.0942), tps=20531, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:43,604 - root - INFO - Step 520: lr=5.21E-06, loss= 1.6263 (max= 2.0942), tps=20531, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:59,569 - root - INFO - Step 530: lr=5.31E-06, loss= 1.6680 (max= 2.2144), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:59,569 - root - INFO - Step 530: lr=5.31E-06, loss= 1.6680 (max= 2.2144), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:59,569 - root - INFO - Step 530: lr=5.31E-06, loss= 1.6680 (max= 2.2144), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:59,569 - root - INFO - Step 530: lr=5.31E-06, loss= 1.6680 (max= 2.2144), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:59,569 - root - INFO - Step 530: lr=5.31E-06, loss= 1.6680 (max= 
2.2144), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:59,569 - root - INFO - Step 530: lr=5.31E-06, loss= 1.6680 (max= 2.2144), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:59,569 - root - INFO - Step 530: lr=5.31E-06, loss= 1.6680 (max= 2.2144), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:32:59,569 - root - INFO - Step 530: lr=5.31E-06, loss= 1.6680 (max= 2.2144), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:33:15,593 - root - INFO - Step 540: lr=5.41E-06, loss= 1.6262 (max= 2.0930), tps=20454, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:15,593 - root - INFO - Step 540: lr=5.41E-06, loss= 1.6262 (max= 2.0930), tps=20454, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:15,593 - root - INFO - Step 540: lr=5.41E-06, loss= 1.6262 (max= 2.0930), tps=20454, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:15,593 - root - INFO - Step 540: lr=5.41E-06, loss= 1.6262 (max= 2.0930), tps=20453, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:15,593 - root - INFO - Step 540: lr=5.41E-06, loss= 1.6262 (max= 2.0930), tps=20454, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:15,593 - root - INFO - Step 540: lr=5.41E-06, loss= 1.6262 (max= 2.0930), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:15,593 - root - INFO - Step 540: lr=5.41E-06, loss= 1.6262 (max= 2.0930), tps=20454, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:15,593 - root - INFO - Step 540: 
lr=5.41E-06, loss= 1.6262 (max= 2.0930), tps=20454, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:31,635 - root - INFO - Step 550: lr=5.51E-06, loss= 1.6013 (max= 2.0658), tps=20431, mfu=42.57%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:31,635 - root - INFO - Step 550: lr=5.51E-06, loss= 1.6013 (max= 2.0658), tps=20430, mfu=42.57%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:31,635 - root - INFO - Step 550: lr=5.51E-06, loss= 1.6013 (max= 2.0658), tps=20431, mfu=42.57%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:31,635 - root - INFO - Step 550: lr=5.51E-06, loss= 1.6013 (max= 2.0658), tps=20431, mfu=42.57%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:31,635 - root - INFO - Step 550: lr=5.51E-06, loss= 1.6013 (max= 2.0658), tps=20430, mfu=42.57%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:31,635 - root - INFO - Step 550: lr=5.51E-06, loss= 1.6013 (max= 2.0658), tps=20430, mfu=42.57%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:31,635 - root - INFO - Step 550: lr=5.51E-06, loss= 1.6013 (max= 2.0658), tps=20430, mfu=42.57%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:31,636 - root - INFO - Step 550: lr=5.51E-06, loss= 1.6013 (max= 2.0658), tps=20431, mfu=42.57%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:47,659 - root - INFO - Step 560: lr=5.61E-06, loss= 1.6241 (max= 2.0608), tps=20453, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:47,659 - root - INFO - Step 560: lr=5.61E-06, loss= 1.6241 (max= 2.0608), tps=20453, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:47,659 - 
root - INFO - Step 560: lr=5.61E-06, loss= 1.6241 (max= 2.0608), tps=20453, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:47,659 - root - INFO - Step 560: lr=5.61E-06, loss= 1.6241 (max= 2.0608), tps=20453, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:47,659 - root - INFO - Step 560: lr=5.61E-06, loss= 1.6241 (max= 2.0608), tps=20453, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:47,659 - root - INFO - Step 560: lr=5.61E-06, loss= 1.6241 (max= 2.0608), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:47,659 - root - INFO - Step 560: lr=5.61E-06, loss= 1.6241 (max= 2.0608), tps=20453, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:33:47,660 - root - INFO - Step 560: lr=5.61E-06, loss= 1.6241 (max= 2.0608), tps=20454, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:34:03,709 - root - INFO - Step 570: lr=5.71E-06, loss= 1.6128 (max= 2.0187), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:03,709 - root - INFO - Step 570: lr=5.71E-06, loss= 1.6128 (max= 2.0187), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:03,709 - root - INFO - Step 570: lr=5.71E-06, loss= 1.6128 (max= 2.0187), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:03,709 - root - INFO - Step 570: lr=5.71E-06, loss= 1.6128 (max= 2.0187), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:03,709 - root - INFO - Step 570: lr=5.71E-06, loss= 1.6128 (max= 2.0187), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 09:34:03,709 - root - INFO - Step 570: lr=5.71E-06, loss= 1.6128 (max= 2.0187), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:03,709 - root - INFO - Step 570: lr=5.71E-06, loss= 1.6128 (max= 2.0187), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:03,709 - root - INFO - Step 570: lr=5.71E-06, loss= 1.6128 (max= 2.0187), tps=20421, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:19,689 - root - INFO - Step 580: lr=5.81E-06, loss= 1.6406 (max= 2.1058), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:19,689 - root - INFO - Step 580: lr=5.81E-06, loss= 1.6406 (max= 2.1058), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:19,689 - root - INFO - Step 580: lr=5.81E-06, loss= 1.6406 (max= 2.1058), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:19,689 - root - INFO - Step 580: lr=5.81E-06, loss= 1.6406 (max= 2.1058), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:19,689 - root - INFO - Step 580: lr=5.81E-06, loss= 1.6406 (max= 2.1058), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:19,689 - root - INFO - Step 580: lr=5.81E-06, loss= 1.6406 (max= 2.1058), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:19,689 - root - INFO - Step 580: lr=5.81E-06, loss= 1.6406 (max= 2.1058), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:19,690 - root - INFO - Step 580: lr=5.81E-06, loss= 1.6406 (max= 2.1058), tps=20510, mfu=42.73%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:35,664 - root - INFO - Step 590: lr=5.91E-06, loss= 1.5961 (max= 2.0633), tps=20517, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:35,664 - root - INFO - Step 590: lr=5.91E-06, loss= 1.5961 (max= 2.0633), tps=20517, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:35,664 - root - INFO - Step 590: lr=5.91E-06, loss= 1.5961 (max= 2.0633), tps=20517, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:35,664 - root - INFO - Step 590: lr=5.91E-06, loss= 1.5961 (max= 2.0633), tps=20517, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:35,664 - root - INFO - Step 590: lr=5.91E-06, loss= 1.5961 (max= 2.0633), tps=20517, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:35,664 - root - INFO - Step 590: lr=5.91E-06, loss= 1.5961 (max= 2.0633), tps=20518, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:35,664 - root - INFO - Step 590: lr=5.91E-06, loss= 1.5961 (max= 2.0633), tps=20517, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:35,664 - root - INFO - Step 590: lr=5.91E-06, loss= 1.5961 (max= 2.0633), tps=20518, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:51,686 - root - INFO - Step 600: lr=6.01E-06, loss= 1.6276 (max= 2.1592), tps=20455, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:51,686 - root - INFO - Step 600: lr=6.01E-06, loss= 1.6276 (max= 2.1592), tps=20456, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:51,686 - root - INFO - Step 600: lr=6.01E-06, loss= 1.6276 (max= 2.1592), tps=20455, mfu=42.62%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:51,686 - root - INFO - Step 600: lr=6.01E-06, loss= 1.6276 (max= 2.1592), tps=20456, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:51,686 - root - INFO - Step 600: lr=6.01E-06, loss= 1.6276 (max= 2.1592), tps=20455, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:51,686 - root - INFO - Step 600: lr=6.01E-06, loss= 1.6276 (max= 2.1592), tps=20455, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:51,686 - root - INFO - Step 600: lr=6.01E-06, loss= 1.6276 (max= 2.1592), tps=20455, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:34:51,687 - root - INFO - Step 600: lr=6.01E-06, loss= 1.6276 (max= 2.1592), tps=20456, mfu=42.62%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:07,622 - root - INFO - Step 610: lr=6.11E-06, loss= 1.6387 (max= 2.1363), tps=20567, mfu=42.85%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:07,622 - root - INFO - Step 610: lr=6.11E-06, loss= 1.6387 (max= 2.1363), tps=20567, mfu=42.85%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:07,622 - root - INFO - Step 610: lr=6.11E-06, loss= 1.6387 (max= 2.1363), tps=20567, mfu=42.85%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:07,622 - root - INFO - Step 610: lr=6.11E-06, loss= 1.6387 (max= 2.1363), tps=20567, mfu=42.85%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:07,622 - root - INFO - Step 610: lr=6.11E-06, loss= 1.6387 (max= 2.1363), tps=20567, mfu=42.85%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:07,622 - root - INFO - Step 610: lr=6.11E-06, loss= 1.6387 (max= 
2.1363), tps=20567, mfu=42.85%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:07,622 - root - INFO - Step 610: lr=6.11E-06, loss= 1.6387 (max= 2.1363), tps=20567, mfu=42.85%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:07,622 - root - INFO - Step 610: lr=6.11E-06, loss= 1.6387 (max= 2.1363), tps=20568, mfu=42.85%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:23,587 - root - INFO - Step 620: lr=6.21E-06, loss= 1.6168 (max= 2.2049), tps=20528, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:23,588 - root - INFO - Step 620: lr=6.21E-06, loss= 1.6168 (max= 2.2049), tps=20528, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:23,588 - root - INFO - Step 620: lr=6.21E-06, loss= 1.6168 (max= 2.2049), tps=20528, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:23,588 - root - INFO - Step 620: lr=6.21E-06, loss= 1.6168 (max= 2.2049), tps=20528, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:23,588 - root - INFO - Step 620: lr=6.21E-06, loss= 1.6168 (max= 2.2049), tps=20528, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:23,588 - root - INFO - Step 620: lr=6.21E-06, loss= 1.6168 (max= 2.2049), tps=20528, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:23,588 - root - INFO - Step 620: lr=6.21E-06, loss= 1.6168 (max= 2.2049), tps=20528, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:23,588 - root - INFO - Step 620: lr=6.21E-06, loss= 1.6168 (max= 2.2049), tps=20529, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:39,543 - root - INFO - Step 630: 
lr=6.31E-06, loss= 1.6258 (max= 2.1862), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:35:39,544 - root - INFO - Step 630: lr=6.31E-06, loss= 1.6258 (max= 2.1862), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:35:39,544 - root - INFO - Step 630: lr=6.31E-06, loss= 1.6258 (max= 2.1862), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:35:39,544 - root - INFO - Step 630: lr=6.31E-06, loss= 1.6258 (max= 2.1862), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:35:39,544 - root - INFO - Step 630: lr=6.31E-06, loss= 1.6258 (max= 2.1862), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:35:39,544 - root - INFO - Step 630: lr=6.31E-06, loss= 1.6258 (max= 2.1862), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:35:39,544 - root - INFO - Step 630: lr=6.31E-06, loss= 1.6258 (max= 2.1862), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:35:39,544 - root - INFO - Step 630: lr=6.31E-06, loss= 1.6258 (max= 2.1862), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:35:55,530 - root - INFO - Step 640: lr=6.41E-06, loss= 1.5834 (max= 2.1089), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:55,531 - root - INFO - Step 640: lr=6.41E-06, loss= 1.5834 (max= 2.1089), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:55,531 - root - INFO - Step 640: lr=6.41E-06, loss= 1.5834 (max= 2.1089), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:55,531 - 
root - INFO - Step 640: lr=6.41E-06, loss= 1.5834 (max= 2.1089), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:55,531 - root - INFO - Step 640: lr=6.41E-06, loss= 1.5834 (max= 2.1089), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:55,531 - root - INFO - Step 640: lr=6.41E-06, loss= 1.5834 (max= 2.1089), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:55,531 - root - INFO - Step 640: lr=6.41E-06, loss= 1.5834 (max= 2.1089), tps=20502, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:35:55,532 - root - INFO - Step 640: lr=6.41E-06, loss= 1.5834 (max= 2.1089), tps=20501, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:11,447 - root - INFO - Step 650: lr=6.51E-06, loss= 1.5870 (max= 2.0460), tps=20592, mfu=42.90%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:11,447 - root - INFO - Step 650: lr=6.51E-06, loss= 1.5870 (max= 2.0460), tps=20592, mfu=42.90%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:11,447 - root - INFO - Step 650: lr=6.51E-06, loss= 1.5870 (max= 2.0460), tps=20592, mfu=42.90%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:11,447 - root - INFO - Step 650: lr=6.51E-06, loss= 1.5870 (max= 2.0460), tps=20592, mfu=42.90%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:11,447 - root - INFO - Step 650: lr=6.51E-06, loss= 1.5870 (max= 2.0460), tps=20592, mfu=42.90%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:11,447 - root - INFO - Step 650: lr=6.51E-06, loss= 1.5870 (max= 2.0460), tps=20592, mfu=42.90%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 09:36:11,447 - root - INFO - Step 650: lr=6.51E-06, loss= 1.5870 (max= 2.0460), tps=20592, mfu=42.90%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:11,448 - root - INFO - Step 650: lr=6.51E-06, loss= 1.5870 (max= 2.0460), tps=20593, mfu=42.91%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:27,431 - root - INFO - Step 660: lr=6.61E-06, loss= 1.6274 (max= 2.1911), tps=20504, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:27,432 - root - INFO - Step 660: lr=6.61E-06, loss= 1.6274 (max= 2.1911), tps=20504, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:27,432 - root - INFO - Step 660: lr=6.61E-06, loss= 1.6274 (max= 2.1911), tps=20504, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:27,432 - root - INFO - Step 660: lr=6.61E-06, loss= 1.6274 (max= 2.1911), tps=20504, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:27,432 - root - INFO - Step 660: lr=6.61E-06, loss= 1.6274 (max= 2.1911), tps=20504, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:27,432 - root - INFO - Step 660: lr=6.61E-06, loss= 1.6274 (max= 2.1911), tps=20504, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:27,432 - root - INFO - Step 660: lr=6.61E-06, loss= 1.6274 (max= 2.1911), tps=20504, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:27,432 - root - INFO - Step 660: lr=6.61E-06, loss= 1.6274 (max= 2.1911), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:43,413 - root - INFO - Step 670: lr=6.71E-06, loss= 1.5599 (max= 2.0787), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:43,414 - root - INFO - Step 670: lr=6.71E-06, loss= 1.5599 (max= 2.0787), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:43,414 - root - INFO - Step 670: lr=6.71E-06, loss= 1.5599 (max= 2.0787), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:43,414 - root - INFO - Step 670: lr=6.71E-06, loss= 1.5599 (max= 2.0787), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:43,414 - root - INFO - Step 670: lr=6.71E-06, loss= 1.5599 (max= 2.0787), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:43,414 - root - INFO - Step 670: lr=6.71E-06, loss= 1.5599 (max= 2.0787), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:43,414 - root - INFO - Step 670: lr=6.71E-06, loss= 1.5599 (max= 2.0787), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:43,414 - root - INFO - Step 670: lr=6.71E-06, loss= 1.5599 (max= 2.0787), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:36:59,439 - root - INFO - Step 680: lr=6.81E-06, loss= 1.5811 (max= 1.9267), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:59,439 - root - INFO - Step 680: lr=6.81E-06, loss= 1.5811 (max= 1.9267), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:59,439 - root - INFO - Step 680: lr=6.81E-06, loss= 1.5811 (max= 1.9267), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:59,439 - root - INFO - Step 680: lr=6.81E-06, loss= 1.5811 (max= 1.9267), tps=20452, mfu=42.61%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:59,439 - root - INFO - Step 680: lr=6.81E-06, loss= 1.5811 (max= 1.9267), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:59,439 - root - INFO - Step 680: lr=6.81E-06, loss= 1.5811 (max= 1.9267), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:59,439 - root - INFO - Step 680: lr=6.81E-06, loss= 1.5811 (max= 1.9267), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:36:59,440 - root - INFO - Step 680: lr=6.81E-06, loss= 1.5811 (max= 1.9267), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:37:15,416 - root - INFO - Step 690: lr=6.91E-06, loss= 1.5900 (max= 2.2149), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:15,416 - root - INFO - Step 690: lr=6.91E-06, loss= 1.5900 (max= 2.2149), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:15,416 - root - INFO - Step 690: lr=6.91E-06, loss= 1.5900 (max= 2.2149), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:15,416 - root - INFO - Step 690: lr=6.91E-06, loss= 1.5900 (max= 2.2149), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:15,416 - root - INFO - Step 690: lr=6.91E-06, loss= 1.5900 (max= 2.2149), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:15,417 - root - INFO - Step 690: lr=6.91E-06, loss= 1.5900 (max= 2.2149), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:15,417 - root - INFO - Step 690: lr=6.91E-06, loss= 1.5900 (max= 
2.2149), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:15,417 - root - INFO - Step 690: lr=6.91E-06, loss= 1.5900 (max= 2.2149), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:31,441 - root - INFO - Step 700: lr=7.01E-06, loss= 1.5782 (max= 2.1362), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:31,441 - root - INFO - Step 700: lr=7.01E-06, loss= 1.5782 (max= 2.1362), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:31,441 - root - INFO - Step 700: lr=7.01E-06, loss= 1.5782 (max= 2.1362), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:31,441 - root - INFO - Step 700: lr=7.01E-06, loss= 1.5782 (max= 2.1362), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:31,441 - root - INFO - Step 700: lr=7.01E-06, loss= 1.5782 (max= 2.1362), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:31,441 - root - INFO - Step 700: lr=7.01E-06, loss= 1.5782 (max= 2.1362), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:31,441 - root - INFO - Step 700: lr=7.01E-06, loss= 1.5782 (max= 2.1362), tps=20452, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:31,442 - root - INFO - Step 700: lr=7.01E-06, loss= 1.5782 (max= 2.1362), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:37:47,423 - root - INFO - Step 710: lr=7.11E-06, loss= 1.5958 (max= 2.1134), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:37:47,423 - root - INFO - Step 710: 
lr=7.11E-06, loss= 1.5958 (max= 2.1134), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:37:47,423 - root - INFO - Step 710: lr=7.11E-06, loss= 1.5958 (max= 2.1134), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:37:47,423 - root - INFO - Step 710: lr=7.11E-06, loss= 1.5958 (max= 2.1134), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:37:47,423 - root - INFO - Step 710: lr=7.11E-06, loss= 1.5958 (max= 2.1134), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:37:47,424 - root - INFO - Step 710: lr=7.11E-06, loss= 1.5958 (max= 2.1134), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:37:47,424 - root - INFO - Step 710: lr=7.11E-06, loss= 1.5958 (max= 2.1134), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:37:47,424 - root - INFO - Step 710: lr=7.11E-06, loss= 1.5958 (max= 2.1134), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:03,400 - root - INFO - Step 720: lr=7.21E-06, loss= 1.5610 (max= 2.0235), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:03,400 - root - INFO - Step 720: lr=7.21E-06, loss= 1.5610 (max= 2.0235), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:03,400 - root - INFO - Step 720: lr=7.21E-06, loss= 1.5610 (max= 2.0235), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:03,400 - root - INFO - Step 720: lr=7.21E-06, loss= 1.5610 (max= 2.0235), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:03,400 - 
root - INFO - Step 720: lr=7.21E-06, loss= 1.5610 (max= 2.0235), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:03,400 - root - INFO - Step 720: lr=7.21E-06, loss= 1.5610 (max= 2.0235), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:03,400 - root - INFO - Step 720: lr=7.21E-06, loss= 1.5610 (max= 2.0235), tps=20514, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:03,401 - root - INFO - Step 720: lr=7.21E-06, loss= 1.5610 (max= 2.0235), tps=20515, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:19,383 - root - INFO - Step 730: lr=7.31E-06, loss= 1.5897 (max= 2.0582), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:38:19,383 - root - INFO - Step 730: lr=7.31E-06, loss= 1.5897 (max= 2.0582), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:38:19,383 - root - INFO - Step 730: lr=7.31E-06, loss= 1.5897 (max= 2.0582), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:38:19,383 - root - INFO - Step 730: lr=7.31E-06, loss= 1.5897 (max= 2.0582), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:38:19,383 - root - INFO - Step 730: lr=7.31E-06, loss= 1.5897 (max= 2.0582), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:38:19,383 - root - INFO - Step 730: lr=7.31E-06, loss= 1.5897 (max= 2.0582), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:38:19,384 - root - INFO - Step 730: lr=7.31E-06, loss= 1.5897 (max= 2.0582), tps=20506, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 09:38:19,384 - root - INFO - Step 730: lr=7.31E-06, loss= 1.5897 (max= 2.0582), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:38:35,356 - root - INFO - Step 740: lr=7.41E-06, loss= 1.6003 (max= 2.0101), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:35,356 - root - INFO - Step 740: lr=7.41E-06, loss= 1.6003 (max= 2.0101), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:35,356 - root - INFO - Step 740: lr=7.41E-06, loss= 1.6003 (max= 2.0101), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:35,356 - root - INFO - Step 740: lr=7.41E-06, loss= 1.6003 (max= 2.0101), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:35,356 - root - INFO - Step 740: lr=7.41E-06, loss= 1.6003 (max= 2.0101), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:35,356 - root - INFO - Step 740: lr=7.41E-06, loss= 1.6003 (max= 2.0101), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:35,356 - root - INFO - Step 740: lr=7.41E-06, loss= 1.6003 (max= 2.0101), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:35,357 - root - INFO - Step 740: lr=7.41E-06, loss= 1.6003 (max= 2.0101), tps=20519, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:51,304 - root - INFO - Step 750: lr=7.51E-06, loss= 1.5213 (max= 2.1027), tps=20551, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:51,304 - root - INFO - Step 750: lr=7.51E-06, loss= 1.5213 (max= 2.1027), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:51,304 - root - INFO - Step 750: lr=7.51E-06, loss= 1.5213 (max= 2.1027), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:51,304 - root - INFO - Step 750: lr=7.51E-06, loss= 1.5213 (max= 2.1027), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:51,304 - root - INFO - Step 750: lr=7.51E-06, loss= 1.5213 (max= 2.1027), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:51,304 - root - INFO - Step 750: lr=7.51E-06, loss= 1.5213 (max= 2.1027), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:51,304 - root - INFO - Step 750: lr=7.51E-06, loss= 1.5213 (max= 2.1027), tps=20551, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:38:51,305 - root - INFO - Step 750: lr=7.51E-06, loss= 1.5213 (max= 2.1027), tps=20552, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:39:07,260 - root - INFO - Step 760: lr=7.61E-06, loss= 1.5477 (max= 1.9877), tps=20540, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:07,260 - root - INFO - Step 760: lr=7.61E-06, loss= 1.5477 (max= 1.9877), tps=20540, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:07,260 - root - INFO - Step 760: lr=7.61E-06, loss= 1.5477 (max= 1.9877), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:07,260 - root - INFO - Step 760: lr=7.61E-06, loss= 1.5477 (max= 1.9877), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:07,260 - root - INFO - Step 760: lr=7.61E-06, loss= 1.5477 (max= 1.9877), tps=20541, mfu=42.80%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:07,260 - root - INFO - Step 760: lr=7.61E-06, loss= 1.5477 (max= 1.9877), tps=20540, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:07,261 - root - INFO - Step 760: lr=7.61E-06, loss= 1.5477 (max= 1.9877), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:07,261 - root - INFO - Step 760: lr=7.61E-06, loss= 1.5477 (max= 1.9877), tps=20541, mfu=42.80%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:23,298 - root - INFO - Step 770: lr=7.71E-06, loss= 1.5390 (max= 2.2469), tps=20436, mfu=42.58%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:23,298 - root - INFO - Step 770: lr=7.71E-06, loss= 1.5390 (max= 2.2469), tps=20436, mfu=42.58%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:23,298 - root - INFO - Step 770: lr=7.71E-06, loss= 1.5390 (max= 2.2469), tps=20437, mfu=42.58%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:23,298 - root - INFO - Step 770: lr=7.71E-06, loss= 1.5390 (max= 2.2469), tps=20437, mfu=42.58%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:23,298 - root - INFO - Step 770: lr=7.71E-06, loss= 1.5390 (max= 2.2469), tps=20437, mfu=42.58%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:23,298 - root - INFO - Step 770: lr=7.71E-06, loss= 1.5390 (max= 2.2469), tps=20436, mfu=42.58%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:23,298 - root - INFO - Step 770: lr=7.71E-06, loss= 1.5390 (max= 2.2469), tps=20437, mfu=42.58%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:23,298 - root - INFO - Step 770: lr=7.71E-06, loss= 1.5390 (max= 
2.2469), tps=20437, mfu=42.58%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:39,247 - root - INFO - Step 780: lr=7.81E-06, loss= 1.5253 (max= 2.4012), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:39,247 - root - INFO - Step 780: lr=7.81E-06, loss= 1.5253 (max= 2.4012), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:39,247 - root - INFO - Step 780: lr=7.81E-06, loss= 1.5253 (max= 2.4012), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:39,247 - root - INFO - Step 780: lr=7.81E-06, loss= 1.5253 (max= 2.4012), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:39,247 - root - INFO - Step 780: lr=7.81E-06, loss= 1.5253 (max= 2.4012), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:39,247 - root - INFO - Step 780: lr=7.81E-06, loss= 1.5253 (max= 2.4012), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:39,247 - root - INFO - Step 780: lr=7.81E-06, loss= 1.5253 (max= 2.4012), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:39,247 - root - INFO - Step 780: lr=7.81E-06, loss= 1.5253 (max= 2.4012), tps=20550, mfu=42.82%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:55,242 - root - INFO - Step 790: lr=7.91E-06, loss= 1.5404 (max= 2.1533), tps=20490, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:55,242 - root - INFO - Step 790: lr=7.91E-06, loss= 1.5404 (max= 2.1533), tps=20489, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:55,243 - root - INFO - Step 790: 
lr=7.91E-06, loss= 1.5404 (max= 2.1533), tps=20490, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:55,243 - root - INFO - Step 790: lr=7.91E-06, loss= 1.5404 (max= 2.1533), tps=20490, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:55,243 - root - INFO - Step 790: lr=7.91E-06, loss= 1.5404 (max= 2.1533), tps=20490, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:55,243 - root - INFO - Step 790: lr=7.91E-06, loss= 1.5404 (max= 2.1533), tps=20490, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:55,243 - root - INFO - Step 790: lr=7.91E-06, loss= 1.5404 (max= 2.1533), tps=20490, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:39:55,243 - root - INFO - Step 790: lr=7.91E-06, loss= 1.5404 (max= 2.1533), tps=20490, mfu=42.69%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:40:11,259 - root - INFO - Step 800: lr=8.01E-06, loss= 1.5734 (max= 2.0560), tps=20462, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:11,260 - root - INFO - Step 800: lr=8.01E-06, loss= 1.5734 (max= 2.0560), tps=20462, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:11,260 - root - INFO - Step 800: lr=8.01E-06, loss= 1.5734 (max= 2.0560), tps=20462, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:11,260 - root - INFO - Step 800: lr=8.01E-06, loss= 1.5734 (max= 2.0560), tps=20463, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:11,260 - root - INFO - Step 800: lr=8.01E-06, loss= 1.5734 (max= 2.0560), tps=20462, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:11,260 - 
root - INFO - Step 800: lr=8.01E-06, loss= 1.5734 (max= 2.0560), tps=20462, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:11,260 - root - INFO - Step 800: lr=8.01E-06, loss= 1.5734 (max= 2.0560), tps=20462, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:11,260 - root - INFO - Step 800: lr=8.01E-06, loss= 1.5734 (max= 2.0560), tps=20463, mfu=42.63%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:27,250 - root - INFO - Step 810: lr=8.11E-06, loss= 1.5451 (max= 2.0783), tps=20496, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:27,250 - root - INFO - Step 810: lr=8.11E-06, loss= 1.5451 (max= 2.0783), tps=20496, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:27,250 - root - INFO - Step 810: lr=8.11E-06, loss= 1.5451 (max= 2.0783), tps=20496, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:27,250 - root - INFO - Step 810: lr=8.11E-06, loss= 1.5451 (max= 2.0783), tps=20496, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:27,250 - root - INFO - Step 810: lr=8.11E-06, loss= 1.5451 (max= 2.0783), tps=20497, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:27,250 - root - INFO - Step 810: lr=8.11E-06, loss= 1.5451 (max= 2.0783), tps=20496, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:27,250 - root - INFO - Step 810: lr=8.11E-06, loss= 1.5451 (max= 2.0783), tps=20496, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:27,251 - root - INFO - Step 810: lr=8.11E-06, loss= 1.5451 (max= 2.0783), tps=20497, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 09:40:43,237 - root - INFO - Step 820: lr=8.21E-06, loss= 1.5645 (max= 2.0004), tps=20501, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:43,237 - root - INFO - Step 820: lr=8.21E-06, loss= 1.5645 (max= 2.0004), tps=20501, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:43,237 - root - INFO - Step 820: lr=8.21E-06, loss= 1.5645 (max= 2.0004), tps=20501, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:43,237 - root - INFO - Step 820: lr=8.21E-06, loss= 1.5645 (max= 2.0004), tps=20501, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:43,237 - root - INFO - Step 820: lr=8.21E-06, loss= 1.5645 (max= 2.0004), tps=20501, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:43,237 - root - INFO - Step 820: lr=8.21E-06, loss= 1.5645 (max= 2.0004), tps=20501, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:43,237 - root - INFO - Step 820: lr=8.21E-06, loss= 1.5645 (max= 2.0004), tps=20501, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:43,238 - root - INFO - Step 820: lr=8.21E-06, loss= 1.5645 (max= 2.0004), tps=20501, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:59,201 - root - INFO - Step 830: lr=8.31E-06, loss= 1.5510 (max= 2.6914), tps=20530, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:59,201 - root - INFO - Step 830: lr=8.31E-06, loss= 1.5510 (max= 2.6914), tps=20530, mfu=42.77%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:59,201 - root - INFO - Step 830: lr=8.31E-06, loss= 1.5510 (max= 2.6914), tps=20531, mfu=42.78%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:59,202 - root - INFO - Step 830: lr=8.31E-06, loss= 1.5510 (max= 2.6914), tps=20530, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:59,202 - root - INFO - Step 830: lr=8.31E-06, loss= 1.5510 (max= 2.6914), tps=20530, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:59,202 - root - INFO - Step 830: lr=8.31E-06, loss= 1.5510 (max= 2.6914), tps=20530, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:59,202 - root - INFO - Step 830: lr=8.31E-06, loss= 1.5510 (max= 2.6914), tps=20530, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:40:59,202 - root - INFO - Step 830: lr=8.31E-06, loss= 1.5510 (max= 2.6914), tps=20531, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:15,177 - root - INFO - Step 840: lr=8.41E-06, loss= 1.5170 (max= 2.1228), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:15,177 - root - INFO - Step 840: lr=8.41E-06, loss= 1.5170 (max= 2.1228), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:15,177 - root - INFO - Step 840: lr=8.41E-06, loss= 1.5170 (max= 2.1228), tps=20516, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:15,177 - root - INFO - Step 840: lr=8.41E-06, loss= 1.5170 (max= 2.1228), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:15,177 - root - INFO - Step 840: lr=8.41E-06, loss= 1.5170 (max= 2.1228), tps=20515, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:15,177 - root - INFO - Step 840: lr=8.41E-06, loss= 1.5170 (max= 2.1228), tps=20516, mfu=42.75%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:15,177 - root - INFO - Step 840: lr=8.41E-06, loss= 1.5170 (max= 2.1228), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:15,177 - root - INFO - Step 840: lr=8.41E-06, loss= 1.5170 (max= 2.1228), tps=20516, mfu=42.75%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:31,201 - root - INFO - Step 850: lr=8.51E-06, loss= 1.5001 (max= 1.8377), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:31,202 - root - INFO - Step 850: lr=8.51E-06, loss= 1.5001 (max= 1.8377), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:31,202 - root - INFO - Step 850: lr=8.51E-06, loss= 1.5001 (max= 1.8377), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:31,202 - root - INFO - Step 850: lr=8.51E-06, loss= 1.5001 (max= 1.8377), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:31,202 - root - INFO - Step 850: lr=8.51E-06, loss= 1.5001 (max= 1.8377), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:31,202 - root - INFO - Step 850: lr=8.51E-06, loss= 1.5001 (max= 1.8377), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:31,202 - root - INFO - Step 850: lr=8.51E-06, loss= 1.5001 (max= 1.8377), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:31,202 - root - INFO - Step 850: lr=8.51E-06, loss= 1.5001 (max= 1.8377), tps=20453, mfu=42.61%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:47,212 - root - INFO - Step 860: lr=8.61E-06, loss= 1.5499 (max= 
2.0518), tps=20471, mfu=42.65%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:47,212 - root - INFO - Step 860: lr=8.61E-06, loss= 1.5499 (max= 2.0518), tps=20471, mfu=42.65%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:47,212 - root - INFO - Step 860: lr=8.61E-06, loss= 1.5499 (max= 2.0518), tps=20471, mfu=42.65%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:47,212 - root - INFO - Step 860: lr=8.61E-06, loss= 1.5499 (max= 2.0518), tps=20471, mfu=42.65%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:47,212 - root - INFO - Step 860: lr=8.61E-06, loss= 1.5499 (max= 2.0518), tps=20471, mfu=42.65%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:47,212 - root - INFO - Step 860: lr=8.61E-06, loss= 1.5499 (max= 2.0518), tps=20471, mfu=42.65%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:47,212 - root - INFO - Step 860: lr=8.61E-06, loss= 1.5499 (max= 2.0518), tps=20471, mfu=42.65%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:41:47,212 - root - INFO - Step 860: lr=8.61E-06, loss= 1.5499 (max= 2.0518), tps=20471, mfu=42.65%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:03,196 - root - INFO - Step 870: lr=8.71E-06, loss= 1.5115 (max= 1.9154), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:03,196 - root - INFO - Step 870: lr=8.71E-06, loss= 1.5115 (max= 1.9154), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:03,196 - root - INFO - Step 870: lr=8.71E-06, loss= 1.5115 (max= 1.9154), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:03,196 - root - INFO - Step 870: 
lr=8.71E-06, loss= 1.5115 (max= 1.9154), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:03,196 - root - INFO - Step 870: lr=8.71E-06, loss= 1.5115 (max= 1.9154), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:03,196 - root - INFO - Step 870: lr=8.71E-06, loss= 1.5115 (max= 1.9154), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:03,196 - root - INFO - Step 870: lr=8.71E-06, loss= 1.5115 (max= 1.9154), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:03,196 - root - INFO - Step 870: lr=8.71E-06, loss= 1.5115 (max= 1.9154), tps=20505, mfu=42.72%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:19,196 - root - INFO - Step 880: lr=8.81E-06, loss= 1.5464 (max= 1.9610), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:19,196 - root - INFO - Step 880: lr=8.81E-06, loss= 1.5464 (max= 1.9610), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:19,196 - root - INFO - Step 880: lr=8.81E-06, loss= 1.5464 (max= 1.9610), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:19,196 - root - INFO - Step 880: lr=8.81E-06, loss= 1.5464 (max= 1.9610), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:19,196 - root - INFO - Step 880: lr=8.81E-06, loss= 1.5464 (max= 1.9610), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:19,196 - root - INFO - Step 880: lr=8.81E-06, loss= 1.5464 (max= 1.9610), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:19,196 - 
root - INFO - Step 880: lr=8.81E-06, loss= 1.5464 (max= 1.9610), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:19,197 - root - INFO - Step 880: lr=8.81E-06, loss= 1.5464 (max= 1.9610), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:35,189 - root - INFO - Step 890: lr=8.91E-06, loss= 1.5067 (max= 1.8988), tps=20493, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:35,190 - root - INFO - Step 890: lr=8.91E-06, loss= 1.5067 (max= 1.8988), tps=20493, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:35,190 - root - INFO - Step 890: lr=8.91E-06, loss= 1.5067 (max= 1.8988), tps=20493, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:35,190 - root - INFO - Step 890: lr=8.91E-06, loss= 1.5067 (max= 1.8988), tps=20493, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:35,190 - root - INFO - Step 890: lr=8.91E-06, loss= 1.5067 (max= 1.8988), tps=20493, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:35,190 - root - INFO - Step 890: lr=8.91E-06, loss= 1.5067 (max= 1.8988), tps=20493, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:35,190 - root - INFO - Step 890: lr=8.91E-06, loss= 1.5067 (max= 1.8988), tps=20493, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:35,190 - root - INFO - Step 890: lr=8.91E-06, loss= 1.5067 (max= 1.8988), tps=20493, mfu=42.70%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:51,167 - root - INFO - Step 900: lr=9.01E-06, loss= 1.5176 (max= 2.0546), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 09:42:51,168 - root - INFO - Step 900: lr=9.01E-06, loss= 1.5176 (max= 2.0546), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:51,168 - root - INFO - Step 900: lr=9.01E-06, loss= 1.5176 (max= 2.0546), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:51,168 - root - INFO - Step 900: lr=9.01E-06, loss= 1.5176 (max= 2.0546), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:51,168 - root - INFO - Step 900: lr=9.01E-06, loss= 1.5176 (max= 2.0546), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:51,168 - root - INFO - Step 900: lr=9.01E-06, loss= 1.5176 (max= 2.0546), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:51,168 - root - INFO - Step 900: lr=9.01E-06, loss= 1.5176 (max= 2.0546), tps=20512, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:42:51,168 - root - INFO - Step 900: lr=9.01E-06, loss= 1.5176 (max= 2.0546), tps=20513, mfu=42.74%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:07,170 - root - INFO - Step 910: lr=9.11E-06, loss= 1.5297 (max= 1.9281), tps=20481, mfu=42.67%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:07,170 - root - INFO - Step 910: lr=9.11E-06, loss= 1.5297 (max= 1.9281), tps=20481, mfu=42.67%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:07,170 - root - INFO - Step 910: lr=9.11E-06, loss= 1.5297 (max= 1.9281), tps=20481, mfu=42.67%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:07,170 - root - INFO - Step 910: lr=9.11E-06, loss= 1.5297 (max= 1.9281), tps=20481, mfu=42.67%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:07,170 - root - INFO - Step 910: lr=9.11E-06, loss= 1.5297 (max= 1.9281), tps=20482, mfu=42.67%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:07,170 - root - INFO - Step 910: lr=9.11E-06, loss= 1.5297 (max= 1.9281), tps=20481, mfu=42.67%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:07,170 - root - INFO - Step 910: lr=9.11E-06, loss= 1.5297 (max= 1.9281), tps=20481, mfu=42.67%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:07,171 - root - INFO - Step 910: lr=9.11E-06, loss= 1.5297 (max= 1.9281), tps=20482, mfu=42.67%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:23,219 - root - INFO - Step 920: lr=9.21E-06, loss= 1.4940 (max= 1.8788), tps=20422, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:23,219 - root - INFO - Step 920: lr=9.21E-06, loss= 1.4940 (max= 1.8788), tps=20422, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:23,219 - root - INFO - Step 920: lr=9.21E-06, loss= 1.4940 (max= 1.8788), tps=20422, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:23,219 - root - INFO - Step 920: lr=9.21E-06, loss= 1.4940 (max= 1.8788), tps=20422, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:23,219 - root - INFO - Step 920: lr=9.21E-06, loss= 1.4940 (max= 1.8788), tps=20422, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:23,219 - root - INFO - Step 920: lr=9.21E-06, loss= 1.4940 (max= 1.8788), tps=20422, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:23,219 - root - INFO - Step 920: lr=9.21E-06, loss= 1.4940 (max= 1.8788), tps=20422, mfu=42.55%, 
memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:23,219 - root - INFO - Step 920: lr=9.21E-06, loss= 1.4940 (max= 1.8788), tps=20423, mfu=42.55%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:39,220 - root - INFO - Step 930: lr=9.31E-06, loss= 1.5157 (max= 1.9363), tps=20483, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:39,220 - root - INFO - Step 930: lr=9.31E-06, loss= 1.5157 (max= 1.9363), tps=20483, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:39,220 - root - INFO - Step 930: lr=9.31E-06, loss= 1.5157 (max= 1.9363), tps=20483, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:39,220 - root - INFO - Step 930: lr=9.31E-06, loss= 1.5157 (max= 1.9363), tps=20483, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:39,220 - root - INFO - Step 930: lr=9.31E-06, loss= 1.5157 (max= 1.9363), tps=20483, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:39,220 - root - INFO - Step 930: lr=9.31E-06, loss= 1.5157 (max= 1.9363), tps=20483, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:39,220 - root - INFO - Step 930: lr=9.31E-06, loss= 1.5157 (max= 1.9363), tps=20483, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:39,220 - root - INFO - Step 930: lr=9.31E-06, loss= 1.5157 (max= 1.9363), tps=20484, mfu=42.68%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:55,207 - root - INFO - Step 940: lr=9.41E-06, loss= 1.5241 (max= 2.0692), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:55,208 - root - INFO - Step 940: lr=9.41E-06, loss= 1.5241 (max= 
2.0692), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:55,208 - root - INFO - Step 940: lr=9.41E-06, loss= 1.5241 (max= 2.0692), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:55,208 - root - INFO - Step 940: lr=9.41E-06, loss= 1.5241 (max= 2.0692), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:55,208 - root - INFO - Step 940: lr=9.41E-06, loss= 1.5241 (max= 2.0692), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:55,208 - root - INFO - Step 940: lr=9.41E-06, loss= 1.5241 (max= 2.0692), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:55,208 - root - INFO - Step 940: lr=9.41E-06, loss= 1.5241 (max= 2.0692), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:43:55,208 - root - INFO - Step 940: lr=9.41E-06, loss= 1.5241 (max= 2.0692), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:11,196 - root - INFO - Step 950: lr=9.51E-06, loss= 1.5353 (max= 1.9043), tps=20499, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:11,196 - root - INFO - Step 950: lr=9.51E-06, loss= 1.5353 (max= 1.9043), tps=20499, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:11,196 - root - INFO - Step 950: lr=9.51E-06, loss= 1.5353 (max= 1.9043), tps=20499, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:11,196 - root - INFO - Step 950: lr=9.51E-06, loss= 1.5353 (max= 1.9043), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:11,196 - root - INFO - Step 950: 
lr=9.51E-06, loss= 1.5353 (max= 1.9043), tps=20499, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:11,196 - root - INFO - Step 950: lr=9.51E-06, loss= 1.5353 (max= 1.9043), tps=20499, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:11,196 - root - INFO - Step 950: lr=9.51E-06, loss= 1.5353 (max= 1.9043), tps=20499, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:11,196 - root - INFO - Step 950: lr=9.51E-06, loss= 1.5353 (max= 1.9043), tps=20500, mfu=42.71%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:27,177 - root - INFO - Step 960: lr=9.61E-06, loss= 1.5118 (max= 1.9790), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:27,178 - root - INFO - Step 960: lr=9.61E-06, loss= 1.5118 (max= 1.9790), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:27,178 - root - INFO - Step 960: lr=9.61E-06, loss= 1.5118 (max= 1.9790), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:27,178 - root - INFO - Step 960: lr=9.61E-06, loss= 1.5118 (max= 1.9790), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:27,178 - root - INFO - Step 960: lr=9.61E-06, loss= 1.5118 (max= 1.9790), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:27,178 - root - INFO - Step 960: lr=9.61E-06, loss= 1.5118 (max= 1.9790), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:27,178 - root - INFO - Step 960: lr=9.61E-06, loss= 1.5118 (max= 1.9790), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:27,178 - 
root - INFO - Step 960: lr=9.61E-06, loss= 1.5118 (max= 1.9790), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:43,139 - root - INFO - Step 970: lr=9.71E-06, loss= 1.4983 (max= 1.8905), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:43,139 - root - INFO - Step 970: lr=9.71E-06, loss= 1.4983 (max= 1.8905), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:43,139 - root - INFO - Step 970: lr=9.71E-06, loss= 1.4983 (max= 1.8905), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:43,139 - root - INFO - Step 970: lr=9.71E-06, loss= 1.4983 (max= 1.8905), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:43,139 - root - INFO - Step 970: lr=9.71E-06, loss= 1.4983 (max= 1.8905), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:43,139 - root - INFO - Step 970: lr=9.71E-06, loss= 1.4983 (max= 1.8905), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:43,139 - root - INFO - Step 970: lr=9.71E-06, loss= 1.4983 (max= 1.8905), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:43,139 - root - INFO - Step 970: lr=9.71E-06, loss= 1.4983 (max= 1.8905), tps=20534, mfu=42.78%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:59,098 - root - INFO - Step 980: lr=9.81E-06, loss= 1.5042 (max= 1.9141), tps=20536, mfu=42.79%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:59,098 - root - INFO - Step 980: lr=9.81E-06, loss= 1.5042 (max= 1.9141), tps=20537, mfu=42.79%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 09:44:59,098 - root - INFO - Step 980: lr=9.81E-06, loss= 1.5042 (max= 1.9141), tps=20537, mfu=42.79%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:59,098 - root - INFO - Step 980: lr=9.81E-06, loss= 1.5042 (max= 1.9141), tps=20537, mfu=42.79%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:59,098 - root - INFO - Step 980: lr=9.81E-06, loss= 1.5042 (max= 1.9141), tps=20537, mfu=42.79%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:59,098 - root - INFO - Step 980: lr=9.81E-06, loss= 1.5042 (max= 1.9141), tps=20537, mfu=42.79%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:59,098 - root - INFO - Step 980: lr=9.81E-06, loss= 1.5042 (max= 1.9141), tps=20536, mfu=42.79%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:44:59,099 - root - INFO - Step 980: lr=9.81E-06, loss= 1.5042 (max= 1.9141), tps=20537, mfu=42.79%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:45:15,080 - root - INFO - Step 990: lr=9.91E-06, loss= 1.5556 (max= 1.9317), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:45:15,080 - root - INFO - Step 990: lr=9.91E-06, loss= 1.5556 (max= 1.9317), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:45:15,080 - root - INFO - Step 990: lr=9.91E-06, loss= 1.5556 (max= 1.9317), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:45:15,080 - root - INFO - Step 990: lr=9.91E-06, loss= 1.5556 (max= 1.9317), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:45:15,080 - root - INFO - Step 990: lr=9.91E-06, loss= 1.5556 (max= 1.9317), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:45:15,080 - root - INFO - Step 990: lr=9.91E-06, loss= 1.5556 (max= 1.9317), tps=20507, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:45:15,080 - root - INFO - Step 990: lr=9.91E-06, loss= 1.5556 (max= 1.9317), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:45:15,080 - root - INFO - Step 990: lr=9.91E-06, loss= 1.5556 (max= 1.9317), tps=20508, mfu=42.73%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-1000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-1000! Save time: 3.6267969608306885 +2025-10-24 09:45:31,031 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.5419 (max= 2.0260), tps=20548, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:45:31,031 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.5419 (max= 2.0260), tps=20548, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:45:31,031 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.5419 (max= 2.0260), tps=20547, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:45:31,031 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.5419 (max= 2.0260), tps=20548, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:45:31,031 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.5419 (max= 2.0260), tps=20548, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:45:31,031 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.5419 (max= 2.0260), tps=20548, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:45:31,031 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-24 
09:45:31,031 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-24 09:45:31,031 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:45:31,031 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-24 09:45:31,031 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-24 09:45:31,031 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:45:31,031 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:45:31,031 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:45:31,031 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-24 09:45:31,031 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:45:31,031 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-24 09:45:31,031 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.5419 (max= 2.0260), tps=20548, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:45:31,031 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:45:31,031 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-24 09:45:31,031 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:45:31,031 - root - INFO - Step 1000: lr=1.00E-05, loss= 1.5419 (max= 2.0260), tps=20548, mfu=42.81%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:45:31,031 - root - INFO - Saving a full checkpoint at step 1000 +2025-10-24 09:45:31,031 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:45:55,297 - root - INFO - Finished saving the checkpoint in 24.27 seconds +2025-10-24 09:45:55,306 - root - INFO - Finished saving the checkpoint in 24.28 seconds +2025-10-24 09:45:55,306 - root - INFO - Finished saving the checkpoint in 24.28 seconds +2025-10-24 09:45:55,306 - root - INFO - Finished saving the checkpoint in 24.28 seconds +2025-10-24 09:45:55,306 - root - INFO - Finished saving the checkpoint in 24.28 seconds +2025-10-24 09:45:55,306 - root - INFO - Finished saving the checkpoint in 24.28 seconds +2025-10-24 09:45:55,306 - root - INFO - Finished saving the checkpoint in 24.28 seconds +2025-10-24 09:45:55,307 - root - INFO - Finished saving the checkpoint in 24.28 seconds +2025-10-24 09:46:08,778 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2240880 +2025-10-24 09:46:11,249 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.0488), tps=8148, mfu=16.98%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:46:11,249 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.0488), tps=8148, mfu=16.98%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:46:11,249 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.0488), tps=8148, mfu=16.98%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:46:11,249 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.0488), tps=8148, mfu=16.98%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:46:11,249 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.0488), tps=8148, mfu=16.98%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:46:11,249 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.0488), tps=8148, 
mfu=16.98%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:46:11,249 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.0488), tps=8148, mfu=16.98%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:46:11,249 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.0488), tps=8148, mfu=16.98%, memory: 154.21GiB(86.46%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 09:48:22,540 - root - INFO - Starting training. +2025-10-24 09:48:22,540 - root - INFO - Starting training. +2025-10-24 09:48:22,540 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:48:22,540 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:48:22,666 - root - INFO - Starting training. +2025-10-24 09:48:22,666 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:48:22,688 - root - INFO - Starting training. +2025-10-24 09:48:22,688 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:48:22,735 - root - INFO - Starting training. +2025-10-24 09:48:22,735 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:48:22,797 - root - INFO - Starting training. +2025-10-24 09:48:22,797 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:48:22,797 - root - INFO - Starting training. +2025-10-24 09:48:22,797 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:48:22,810 - root - INFO - Starting training. 
+2025-10-24 09:48:22,810 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:48:23,885 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:48:23,886 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:48:23,887 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:48:24,239 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:48:24,289 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:48:24,290 - root - INFO - GPU capacity: NVIDIA B200 (0) with 178.36GiB memory +2025-10-24 09:48:24,293 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:48:24,328 - root - INFO - Applied FSDP to the model +2025-10-24 09:48:24,328 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() 
+ (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): 
TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, 
out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, 
out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, 
out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, 
out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): 
Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): 
Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): 
RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:48:24,384 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:48:24,386 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:48:24,386 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:48:24,457 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:48:24,459 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:48:24,459 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:48:24,717 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:48:24,767 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:48:24,767 - root - INFO - GPU capacity: NVIDIA B200 (6) with 178.36GiB memory +2025-10-24 09:48:24,771 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:48:24,804 - root - INFO - Applied FSDP to the model +2025-10-24 09:48:24,805 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:48:24,811 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:48:24,812 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:48:24,813 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:48:24,826 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:48:24,875 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:48:24,876 - root - INFO - GPU capacity: NVIDIA B200 (5) with 178.36GiB memory +2025-10-24 09:48:24,879 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:48:24,888 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:48:24,889 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:48:24,890 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:48:24,912 - root - INFO - Applied FSDP to the model +2025-10-24 09:48:24,912 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( 
+ (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): 
Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:48:24,956 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:48:24,958 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:48:24,958 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:48:24,990 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:48:24,991 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:48:24,992 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:48:25,003 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:48:25,004 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:48:25,005 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:48:25,138 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', 
enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:48:25,188 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:48:25,189 - root - INFO - GPU capacity: NVIDIA B200 (4) with 178.36GiB memory +2025-10-24 09:48:25,192 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:48:25,225 - root - INFO - Applied FSDP to the model +2025-10-24 09:48:25,226 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): 
Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:48:25,257 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:48:25,308 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:48:25,308 - root - INFO - GPU capacity: NVIDIA B200 (7) with 178.36GiB memory +2025-10-24 09:48:25,311 - root - INFO - Compiling each 
TransformerBlock with torch.compile +2025-10-24 09:48:25,332 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:48:25,333 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:48:25,346 - root - INFO - Applied FSDP to the model +2025-10-24 09:48:25,346 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, 
bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + 
(feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, 
bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + 
(w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:48:25,359 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, 
max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:48:25,382 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:48:25,382 - root - INFO - GPU capacity: NVIDIA B200 (1) with 178.36GiB memory +2025-10-24 09:48:25,384 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:48:25,384 - root - INFO - GPU capacity: NVIDIA B200 (3) with 178.36GiB memory +2025-10-24 09:48:25,386 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:48:25,388 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:48:25,409 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:48:25,410 - root - INFO - GPU capacity: NVIDIA B200 (2) with 178.36GiB memory +2025-10-24 09:48:25,413 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:48:25,419 - root - INFO - Applied FSDP to the model +2025-10-24 09:48:25,420 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + 
(_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): 
Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, 
bias=False) +) + +2025-10-24 09:48:25,421 - root - INFO - Applied FSDP to the model +2025-10-24 09:48:25,421 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, 
out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( 
+ (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + 
(wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:48:25,446 - root - INFO - Applied FSDP to the model +2025-10-24 09:48:25,447 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, 
bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + 
(feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, 
bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + 
(w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:48:49,281 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:48:49,282 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:48:49,282 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 
09:48:49,282 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:48:49,282 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:48:49,282 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:48:49,282 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:48:49,282 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
+ warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +2025-10-24 09:48:49,697 - root - INFO - Loaded cached document counts in 0.00011658668518066406 seconds +2025-10-24 09:48:49,697 - root - INFO - Loaded cached document counts in 0.00011777877807617188 seconds +2025-10-24 09:48:49,697 - root - INFO - Loaded cached document counts in 0.000118255615234375 seconds +2025-10-24 09:48:49,697 - root - INFO - Loaded cached document counts in 0.0001266002655029297 seconds +2025-10-24 09:48:49,697 - root - INFO - Loaded cached document counts in 0.00011610984802246094 seconds +2025-10-24 09:48:49,697 - root - INFO - Loaded cached document counts in 0.0001232624053955078 seconds +2025-10-24 09:48:49,697 - root - INFO - Loaded cached document counts in 0.00013637542724609375 seconds +2025-10-24 09:48:49,697 - root - INFO - Loaded cached document counts in 0.0001628398895263672 seconds +2025-10-24 09:48:49,698 - root - INFO - Worker 0 responsible for docs: [('/work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet', 0, 945398)] +2025-10-24 09:48:49,698 - root - INFO - Total docs: 945399 +2025-10-24 09:48:49,698 - root - INFO - Worker 0 assembled subdataset iterator for /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/, 1 of 1 +Dataset checkpoint detected at jobs/munin-7b-open-stage1/checkpoints/dataloader/step-1000 +Dataset checkpoint loaded! 
Load time: 12.691347599029541 +2025-10-24 09:49:02,390 - root - INFO - Nodecay weight: tok_embeddings.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.0._orig_mod.attention.wq.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.0._orig_mod.attention.wk.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.0._orig_mod.attention.wv.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.0._orig_mod.attention.wo.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,390 - root - INFO - Nodecay weight: layers.0._orig_mod.attention_norm.weight +2025-10-24 09:49:02,390 - root - INFO - Nodecay weight: layers.0._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.1._orig_mod.attention.wq.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.1._orig_mod.attention.wk.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.1._orig_mod.attention.wv.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.1._orig_mod.attention.wo.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,390 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.1._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.1._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.2._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: 
layers.2._orig_mod.attention.wk.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.2._orig_mod.attention.wv.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.2._orig_mod.attention.wo.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.2._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.2._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.3._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.3._orig_mod.attention.wk.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.3._orig_mod.attention.wv.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.3._orig_mod.attention.wo.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.3._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.3._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.4._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.4._orig_mod.attention.wk.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.4._orig_mod.attention.wv.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.4._orig_mod.attention.wo.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: 
layers.4._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.4._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.4._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.5._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.5._orig_mod.attention.wk.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.5._orig_mod.attention.wv.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.5._orig_mod.attention.wo.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.5._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.5._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.6._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.6._orig_mod.attention.wk.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.6._orig_mod.attention.wv.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.6._orig_mod.attention.wo.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay 
weight: layers.6._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.6._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.7._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.7._orig_mod.attention.wk.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.7._orig_mod.attention.wv.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.7._orig_mod.attention.wo.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.7._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.7._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.8._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.8._orig_mod.attention.wk.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.8._orig_mod.attention.wv.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.8._orig_mod.attention.wo.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.8._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.8._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.9._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay 
weight: layers.9._orig_mod.attention.wk.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.9._orig_mod.attention.wv.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.9._orig_mod.attention.wo.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.9._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.9._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.10._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.10._orig_mod.attention.wk.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.10._orig_mod.attention.wv.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.10._orig_mod.attention.wo.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.10._orig_mod.attention_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Nodecay weight: layers.10._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.11._orig_mod.attention.wq.weight +2025-10-24 09:49:02,391 - root - INFO - Decay weight: layers.11._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.11._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.11._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - 
INFO - Decay weight: layers.11._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.11._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.11._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.12._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.12._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.12._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.12._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.12._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.12._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.13._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.13._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.13._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.13._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w3.weight 
+2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.13._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.13._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.14._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.14._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.14._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.14._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.14._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.14._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.15._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.15._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.15._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.15._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.15._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.15._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: 
layers.16._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.16._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.16._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.16._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.16._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.16._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.17._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.17._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.17._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.17._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.17._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.17._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.18._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.18._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.18._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - 
INFO - Decay weight: layers.18._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.18._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.18._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.19._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.19._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.19._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.19._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.19._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.19._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.20._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.20._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.20._orig_mod.attention.wv.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.20._orig_mod.attention.wo.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w2.weight 
+2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.20._orig_mod.attention_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Nodecay weight: layers.20._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.21._orig_mod.attention.wq.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.21._orig_mod.attention.wk.weight +2025-10-24 09:49:02,392 - root - INFO - Decay weight: layers.21._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.21._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.21._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.21._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.22._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.22._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.22._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.22._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.22._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: 
layers.22._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.23._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.23._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.23._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.23._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.23._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.23._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.24._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.24._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.24._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.24._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.24._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.24._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.25._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.25._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - 
Decay weight: layers.25._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.25._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.25._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.25._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.26._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.26._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.26._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.26._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.26._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.26._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.27._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.27._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.27._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.27._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w1.weight +2025-10-24 
09:49:02,393 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.27._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.27._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.28._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.28._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.28._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.28._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.28._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.28._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.29._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.29._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.29._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.29._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: 
layers.29._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.29._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.30._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.30._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.30._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.30._orig_mod.attention.wo.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.30._orig_mod.attention_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Nodecay weight: layers.30._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.31._orig_mod.attention.wq.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.31._orig_mod.attention.wk.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.31._orig_mod.attention.wv.weight +2025-10-24 09:49:02,393 - root - INFO - Decay weight: layers.31._orig_mod.attention.wo.weight +2025-10-24 09:49:02,394 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w1.weight +2025-10-24 09:49:02,394 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w2.weight +2025-10-24 09:49:02,394 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w3.weight +2025-10-24 09:49:02,394 - root - INFO - Nodecay weight: layers.31._orig_mod.attention_norm.weight +2025-10-24 09:49:02,394 - root - INFO - Nodecay weight: layers.31._orig_mod.ffn_norm.weight +2025-10-24 09:49:02,394 - root - INFO - Nodecay weight: norm.weight +2025-10-24 09:49:02,394 - root - INFO - Decay weight: 
output.weight +2025-10-24 09:49:02,957 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:49:03,066 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:49:03,131 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:49:03,144 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:49:03,148 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:49:03,156 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:49:03,255 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:49:03,304 - root - INFO - Checkpointing active. 
Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:49:03,336 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:49:03,336 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:49:03,336 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:49:03,337 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:49:03,337 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:49:03,337 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:49:03,369 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:49:03,369 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:49:10,546 - root - INFO - Finished loading the checkpoint in 7.21 seconds +2025-10-24 09:49:10,547 - root - INFO - Finished loading the checkpoint in 7.18 seconds +2025-10-24 09:49:10,547 - root - INFO - Finished loading the checkpoint in 7.21 seconds +2025-10-24 09:49:10,547 - root - INFO - Finished loading the checkpoint in 7.21 seconds +2025-10-24 09:49:10,547 - root - INFO - Finished loading the checkpoint in 7.21 seconds +2025-10-24 09:49:10,547 - root - INFO - Finished loading the checkpoint in 7.21 seconds +2025-10-24 09:49:10,547 - root - INFO - Finished loading the checkpoint in 7.21 seconds +2025-10-24 
09:49:10,548 - root - INFO - Finished loading the checkpoint in 7.18 seconds +2025-10-24 09:49:10,580 - root - INFO - Training starts at step 1000 +2025-10-24 09:49:10,581 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:49:10,581 - root - INFO - Training starts at step 1000 +2025-10-24 09:49:10,581 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:49:10,583 - root - INFO - Training starts at step 1000 +2025-10-24 09:49:10,583 - root - INFO - Training starts at step 1000 +2025-10-24 09:49:10,583 - root - INFO - Training starts at step 1000 +2025-10-24 09:49:10,584 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:49:10,584 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:49:10,584 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:49:10,584 - root - INFO - Training starts at step 1000 +2025-10-24 09:49:10,584 - root - INFO - Training starts at step 1000 +2025-10-24 09:49:10,584 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:49:10,584 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:49:10,584 - root - INFO - Training starts at step 1000 +2025-10-24 09:49:10,584 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:49:10,585 - root - INFO - Worker 0 opening new file /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. 
+ warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. 
+ warnings.warn( +2025-10-24 09:49:42,267 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.4001), tps=10344, mfu=21.55%, memory: 112.04GiB(62.82%) time/data_loading=0.16s (max=0.21s, 13.11%) +2025-10-24 09:49:42,267 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.4001), tps=10344, mfu=21.55%, memory: 112.04GiB(62.82%) time/data_loading=0.16s (max=0.21s, 13.11%) +2025-10-24 09:49:42,267 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.4001), tps=10343, mfu=21.55%, memory: 112.04GiB(62.82%) time/data_loading=0.16s (max=0.21s, 13.11%) +2025-10-24 09:49:42,267 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.4001), tps=10344, mfu=21.55%, memory: 112.04GiB(62.82%) time/data_loading=0.16s (max=0.21s, 13.11%) +2025-10-24 09:49:42,267 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.4001), tps=10344, mfu=21.55%, memory: 112.04GiB(62.82%) time/data_loading=0.16s (max=0.21s, 13.11%) +2025-10-24 09:49:42,267 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.4001), tps=10344, mfu=21.55%, memory: 112.04GiB(62.82%) time/data_loading=0.16s (max=0.21s, 13.11%) +2025-10-24 09:49:42,267 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.4001), tps=10344, mfu=21.55%, memory: 112.04GiB(62.82%) time/data_loading=0.16s (max=0.21s, 13.11%) +2025-10-24 09:49:42,267 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 2.4001), tps=10343, mfu=21.55%, memory: 112.04GiB(62.82%) time/data_loading=0.16s (max=0.21s, 13.11%) +2025-10-24 09:49:42,547 - root - INFO - Dumping traces at step 10 +2025-10-24 09:49:42,558 - root - INFO - Dumping traces at step 10 +2025-10-24 09:49:42,560 - root - INFO - Dumping traces at step 10 +2025-10-24 09:49:42,563 - root - INFO - Dumping traces at step 10 +2025-10-24 09:49:42,565 - root - INFO - Dumping traces at step 10 +2025-10-24 09:49:42,577 - root - INFO - Dumping traces at step 10 +2025-10-24 09:49:42,581 - root - INFO - Dumping traces at step 10 +2025-10-24 09:49:42,582 
- root - INFO - Dumping traces at step 10 +2025-10-24 09:49:42,631 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-24 09:49:42,643 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:49:42,644 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-24 09:49:42,649 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:49:42,650 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-24 09:49:42,665 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:49:42,667 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:49:42,669 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:49:47,296 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2240880 +2025-10-24 09:49:59,345 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.1790), tps=19191, mfu=39.99%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:49:59,345 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.1790), tps=19191, mfu=39.99%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:49:59,345 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.1790), tps=19191, mfu=39.99%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:49:59,345 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.1790), tps=19191, mfu=39.99%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:49:59,345 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.1790), tps=19191, mfu=39.99%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:49:59,345 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.1790), tps=19191, mfu=39.99%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 09:49:59,345 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.1790), tps=19191, mfu=39.99%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:49:59,345 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.1790), tps=19191, mfu=39.99%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:49:59,433 - root - INFO - Dumping traces at step 20 +2025-10-24 09:49:59,433 - root - INFO - Dumping traces at step 20 +2025-10-24 09:49:59,433 - root - INFO - Dumping traces at step 20 +2025-10-24 09:49:59,434 - root - INFO - Dumping traces at step 20 +2025-10-24 09:49:59,435 - root - INFO - Dumping traces at step 20 +2025-10-24 09:49:59,441 - root - INFO - Dumping traces at step 20 +2025-10-24 09:49:59,441 - root - INFO - Dumping traces at step 20 +2025-10-24 09:49:59,442 - root - INFO - Dumping traces at step 20 +2025-10-24 09:49:59,520 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:49:59,523 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:49:59,528 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:49:59,528 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:49:59,541 - root - INFO - Finished dumping traces in 0.11 seconds +2025-10-24 09:49:59,541 - root - INFO - Finished dumping traces in 0.10 seconds +2025-10-24 09:49:59,541 - root - INFO - Finished dumping traces in 0.10 seconds +2025-10-24 09:49:59,544 - root - INFO - Finished dumping traces in 0.10 seconds +2025-10-24 09:50:16,212 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.4937), tps=19431, mfu=40.49%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:16,212 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.4937), tps=19431, mfu=40.49%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:16,212 - root - INFO - Step 1030: lr=1.00E-05, loss= 
1.4891 (max= 2.4937), tps=19431, mfu=40.49%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:16,212 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.4937), tps=19431, mfu=40.49%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:16,212 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.4937), tps=19431, mfu=40.49%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:16,212 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.4937), tps=19431, mfu=40.49%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:16,212 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.4937), tps=19431, mfu=40.49%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:16,212 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.4937), tps=19431, mfu=40.49%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:16,303 - root - INFO - Dumping traces at step 30 +2025-10-24 09:50:16,303 - root - INFO - Dumping traces at step 30 +2025-10-24 09:50:16,303 - root - INFO - Dumping traces at step 30 +2025-10-24 09:50:16,303 - root - INFO - Dumping traces at step 30 +2025-10-24 09:50:16,304 - root - INFO - Dumping traces at step 30 +2025-10-24 09:50:16,304 - root - INFO - Dumping traces at step 30 +2025-10-24 09:50:16,304 - root - INFO - Dumping traces at step 30 +2025-10-24 09:50:16,306 - root - INFO - Dumping traces at step 30 +2025-10-24 09:50:16,387 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-24 09:50:16,389 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:50:16,395 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:50:16,395 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:50:16,396 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 
09:50:16,397 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:50:16,397 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:50:16,397 - root - INFO - Finished dumping traces in 0.09 seconds +2025-10-24 09:50:33,083 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4984 (max= 2.2787), tps=19427, mfu=40.48%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:33,083 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4984 (max= 2.2787), tps=19427, mfu=40.48%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:33,083 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4984 (max= 2.2787), tps=19427, mfu=40.48%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:33,083 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4984 (max= 2.2787), tps=19427, mfu=40.48%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:33,083 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4984 (max= 2.2787), tps=19427, mfu=40.48%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:33,083 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4984 (max= 2.2787), tps=19427, mfu=40.48%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:33,083 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4984 (max= 2.2787), tps=19427, mfu=40.48%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:33,083 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4984 (max= 2.2787), tps=19427, mfu=40.48%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:49,709 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4735 (max= 2.5500), tps=19713, mfu=41.07%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:49,709 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4735 (max= 2.5500), tps=19713, 
mfu=41.07%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:49,709 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4735 (max= 2.5500), tps=19713, mfu=41.07%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:49,709 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4735 (max= 2.5500), tps=19712, mfu=41.07%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:49,709 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4735 (max= 2.5500), tps=19712, mfu=41.07%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:49,709 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4735 (max= 2.5500), tps=19712, mfu=41.07%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:49,709 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4735 (max= 2.5500), tps=19713, mfu=41.07%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:50:49,709 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4735 (max= 2.5500), tps=19713, mfu=41.07%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:06,355 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5094 (max= 2.5728), tps=19689, mfu=41.02%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:06,355 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5094 (max= 2.5728), tps=19689, mfu=41.02%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:06,355 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5094 (max= 2.5728), tps=19689, mfu=41.02%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:06,355 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5094 (max= 2.5728), tps=19689, mfu=41.02%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:06,355 - root - INFO - Step 1060: lr=1.00E-05, 
loss= 1.5094 (max= 2.5728), tps=19689, mfu=41.02%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:06,355 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5094 (max= 2.5728), tps=19689, mfu=41.02%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:06,355 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5094 (max= 2.5728), tps=19689, mfu=41.02%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:06,355 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5094 (max= 2.5728), tps=19689, mfu=41.02%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:22,985 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4741 (max= 2.0750), tps=19709, mfu=41.06%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:22,985 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4741 (max= 2.0750), tps=19709, mfu=41.06%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:22,985 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4741 (max= 2.0750), tps=19709, mfu=41.06%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:22,985 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4741 (max= 2.0750), tps=19709, mfu=41.06%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:22,985 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4741 (max= 2.0750), tps=19709, mfu=41.06%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:22,985 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4741 (max= 2.0750), tps=19709, mfu=41.06%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:22,985 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4741 (max= 2.0750), tps=19709, mfu=41.06%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:22,985 - 
root - INFO - Step 1070: lr=1.00E-05, loss= 1.4741 (max= 2.0750), tps=19709, mfu=41.06%, memory: 112.04GiB(62.82%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 09:51:56,379 - root - INFO - Starting training. +2025-10-24 09:51:56,379 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:51:56,382 - root - INFO - Starting training. +2025-10-24 09:51:56,382 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:51:56,567 - root - INFO - Starting training. +2025-10-24 09:51:56,567 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:51:56,569 - root - INFO - Starting training. +2025-10-24 09:51:56,569 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:51:56,579 - root - INFO - Starting training. +2025-10-24 09:51:56,579 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:51:56,593 - root - INFO - Starting training. +2025-10-24 09:51:56,594 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:51:56,606 - root - INFO - Starting training. +2025-10-24 09:51:56,606 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:51:56,606 - root - INFO - Starting training. 
+2025-10-24 09:51:56,606 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 09:51:57,350 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:51:57,351 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:51:57,352 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:51:57,871 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:51:57,873 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:51:57,873 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:51:57,940 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:51:57,990 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:51:57,991 - root - INFO - GPU capacity: NVIDIA B200 (0) with 178.36GiB memory +2025-10-24 09:51:57,994 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:51:58,008 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:51:58,009 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:51:58,010 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:51:58,028 - root - INFO - Applied FSDP to the model +2025-10-24 09:51:58,029 - root - INFO - Model after parallelization model=FSDPTransformer( + 
(tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + 
(wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:51:58,209 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:51:58,260 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:51:58,261 - root - INFO - GPU capacity: NVIDIA B200 (2) with 178.36GiB memory +2025-10-24 09:51:58,264 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:51:58,298 - root - INFO - Applied FSDP to the model +2025-10-24 09:51:58,299 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:51:58,391 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:51:58,441 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:51:58,441 - root - INFO - GPU capacity: NVIDIA B200 (6) with 178.36GiB memory +2025-10-24 09:51:58,445 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:51:58,477 - root - INFO - Applied FSDP to the model +2025-10-24 09:51:58,478 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + 
(_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): 
Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, 
bias=False) +) + +2025-10-24 09:51:58,669 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:51:58,670 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:51:58,671 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:51:58,716 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:51:58,717 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:51:58,717 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:51:58,718 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:51:58,719 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:51:58,719 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:51:58,727 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:51:58,729 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:51:58,729 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 09:51:58,729 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:51:58,731 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 09:51:58,732 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 09:51:59,064 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, 
norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:51:59,072 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:51:59,107 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:51:59,118 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:51:59,119 - root - INFO - GPU capacity: NVIDIA B200 (3) with 178.36GiB memory +2025-10-24 09:51:59,122 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:51:59,122 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:51:59,123 - root - INFO - GPU capacity: NVIDIA B200 (4) with 178.36GiB memory +2025-10-24 09:51:59,126 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:51:59,136 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) 
+2025-10-24 09:51:59,157 - root - INFO - Applied FSDP to the model +2025-10-24 09:51:59,157 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, 
bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:51:59,158 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:51:59,158 - root - INFO - GPU capacity: NVIDIA B200 (5) with 178.36GiB memory +2025-10-24 09:51:59,159 - root - INFO - Applied FSDP to the model +2025-10-24 09:51:59,160 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): 
TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, 
out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, 
bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + 
(wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + 
(feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, 
out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, 
out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( 
+ (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:51:59,161 - root - INFO - 
Compiling each TransformerBlock with torch.compile +2025-10-24 09:51:59,186 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:51:59,186 - root - INFO - GPU capacity: NVIDIA B200 (7) with 178.36GiB memory +2025-10-24 09:51:59,190 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:51:59,195 - root - INFO - Applied FSDP to the model +2025-10-24 09:51:59,195 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, 
out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, 
bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + 
(wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + 
(feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, 
out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, 
out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( 
+ (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:51:59,223 - root - INFO - Applied FSDP to the model +2025-10-24 09:51:59,224 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): 
RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + 
(_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): 
Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:51:59,231 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 09:51:59,281 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 09:51:59,281 - root - INFO - GPU capacity: NVIDIA B200 (1) with 178.36GiB memory +2025-10-24 09:51:59,285 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 09:51:59,320 - root - INFO - Applied FSDP to the model +2025-10-24 09:51:59,321 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 09:52:23,363 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:52:23,364 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:52:23,364 - root - 
INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:52:23,364 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:52:23,364 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:52:23,364 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:52:23,365 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 09:52:23,365 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. 
Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +2025-10-24 09:52:24,145 - root - INFO - Loaded cached document counts in 0.00012159347534179688 seconds +2025-10-24 09:52:24,146 - root - INFO - Loaded cached document counts in 9.632110595703125e-05 seconds +2025-10-24 09:52:24,146 - root - INFO - Loaded cached document counts in 0.00010466575622558594 seconds +2025-10-24 09:52:24,146 - root - INFO - Loaded cached document counts in 7.891654968261719e-05 seconds +2025-10-24 09:52:24,146 - root - INFO - Loaded cached document counts in 7.796287536621094e-05 seconds +2025-10-24 09:52:24,146 - root - INFO - Loaded cached document counts in 6.532669067382812e-05 seconds +2025-10-24 09:52:24,146 - root - INFO - Loaded cached document counts in 5.650520324707031e-05 seconds +2025-10-24 09:52:24,146 - root - INFO - Loaded cached document counts in 7.963180541992188e-05 seconds +2025-10-24 09:52:24,147 - root - INFO - Worker 0 responsible for docs: [('/work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet', 0, 945398)] +2025-10-24 09:52:24,147 - root - INFO - Total docs: 945399 +2025-10-24 09:52:24,147 - root - INFO - Worker 0 assembled subdataset iterator for /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/, 1 of 1 +Dataset checkpoint detected at jobs/munin-7b-open-stage1/checkpoints/dataloader/step-1000 +Dataset checkpoint loaded! 
Load time: 12.709259748458862 +2025-10-24 09:52:36,858 - root - INFO - Nodecay weight: tok_embeddings.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.0._orig_mod.attention.wq.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.0._orig_mod.attention.wk.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.0._orig_mod.attention.wv.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.0._orig_mod.attention.wo.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,859 - root - INFO - Nodecay weight: layers.0._orig_mod.attention_norm.weight +2025-10-24 09:52:36,859 - root - INFO - Nodecay weight: layers.0._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.1._orig_mod.attention.wq.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.1._orig_mod.attention.wk.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.1._orig_mod.attention.wv.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.1._orig_mod.attention.wo.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,859 - root - INFO - Nodecay weight: layers.1._orig_mod.attention_norm.weight +2025-10-24 09:52:36,859 - root - INFO - Nodecay weight: layers.1._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.2._orig_mod.attention.wq.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: 
layers.2._orig_mod.attention.wk.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.2._orig_mod.attention.wv.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.2._orig_mod.attention.wo.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,859 - root - INFO - Nodecay weight: layers.2._orig_mod.attention_norm.weight +2025-10-24 09:52:36,859 - root - INFO - Nodecay weight: layers.2._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.3._orig_mod.attention.wq.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.3._orig_mod.attention.wk.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.3._orig_mod.attention.wv.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.3._orig_mod.attention.wo.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,859 - root - INFO - Nodecay weight: layers.3._orig_mod.attention_norm.weight +2025-10-24 09:52:36,859 - root - INFO - Nodecay weight: layers.3._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.4._orig_mod.attention.wq.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.4._orig_mod.attention.wk.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.4._orig_mod.attention.wv.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.4._orig_mod.attention.wo.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: 
layers.4._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,859 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,859 - root - INFO - Nodecay weight: layers.4._orig_mod.attention_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.4._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.5._orig_mod.attention.wq.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.5._orig_mod.attention.wk.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.5._orig_mod.attention.wv.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.5._orig_mod.attention.wo.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.5._orig_mod.attention_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.5._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.6._orig_mod.attention.wq.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.6._orig_mod.attention.wk.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.6._orig_mod.attention.wv.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.6._orig_mod.attention.wo.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay 
weight: layers.6._orig_mod.attention_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.6._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.7._orig_mod.attention.wq.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.7._orig_mod.attention.wk.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.7._orig_mod.attention.wv.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.7._orig_mod.attention.wo.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.7._orig_mod.attention_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.7._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.8._orig_mod.attention.wq.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.8._orig_mod.attention.wk.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.8._orig_mod.attention.wv.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.8._orig_mod.attention.wo.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.8._orig_mod.attention_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.8._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.9._orig_mod.attention.wq.weight +2025-10-24 09:52:36,860 - root - INFO - Decay 
weight: layers.9._orig_mod.attention.wk.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.9._orig_mod.attention.wv.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.9._orig_mod.attention.wo.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.9._orig_mod.attention_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.9._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.10._orig_mod.attention.wq.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.10._orig_mod.attention.wk.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.10._orig_mod.attention.wv.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.10._orig_mod.attention.wo.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.10._orig_mod.attention_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.10._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.11._orig_mod.attention.wq.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.11._orig_mod.attention.wk.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.11._orig_mod.attention.wv.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.11._orig_mod.attention.wo.weight +2025-10-24 09:52:36,860 - root - 
INFO - Decay weight: layers.11._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,860 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.11._orig_mod.attention_norm.weight +2025-10-24 09:52:36,860 - root - INFO - Nodecay weight: layers.11._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.12._orig_mod.attention.wq.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.12._orig_mod.attention.wk.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.12._orig_mod.attention.wv.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.12._orig_mod.attention.wo.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.12._orig_mod.attention_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.12._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.13._orig_mod.attention.wq.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.13._orig_mod.attention.wk.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.13._orig_mod.attention.wv.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.13._orig_mod.attention.wo.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w3.weight 
+2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.13._orig_mod.attention_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.13._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.14._orig_mod.attention.wq.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.14._orig_mod.attention.wk.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.14._orig_mod.attention.wv.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.14._orig_mod.attention.wo.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.14._orig_mod.attention_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.14._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.15._orig_mod.attention.wq.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.15._orig_mod.attention.wk.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.15._orig_mod.attention.wv.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.15._orig_mod.attention.wo.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.15._orig_mod.attention_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.15._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: 
layers.16._orig_mod.attention.wq.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.16._orig_mod.attention.wk.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.16._orig_mod.attention.wv.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.16._orig_mod.attention.wo.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.16._orig_mod.attention_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.16._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.17._orig_mod.attention.wq.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.17._orig_mod.attention.wk.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.17._orig_mod.attention.wv.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.17._orig_mod.attention.wo.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.17._orig_mod.attention_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.17._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.18._orig_mod.attention.wq.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.18._orig_mod.attention.wk.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.18._orig_mod.attention.wv.weight +2025-10-24 09:52:36,861 - root - 
INFO - Decay weight: layers.18._orig_mod.attention.wo.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.18._orig_mod.attention_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Nodecay weight: layers.18._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.19._orig_mod.attention.wq.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.19._orig_mod.attention.wk.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.19._orig_mod.attention.wv.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.19._orig_mod.attention.wo.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,861 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.19._orig_mod.attention_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.19._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.20._orig_mod.attention.wq.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.20._orig_mod.attention.wk.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.20._orig_mod.attention.wv.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.20._orig_mod.attention.wo.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w2.weight 
+2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.20._orig_mod.attention_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.20._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.21._orig_mod.attention.wq.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.21._orig_mod.attention.wk.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.21._orig_mod.attention.wv.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.21._orig_mod.attention.wo.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.21._orig_mod.attention_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.21._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.22._orig_mod.attention.wq.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.22._orig_mod.attention.wk.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.22._orig_mod.attention.wv.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.22._orig_mod.attention.wo.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.22._orig_mod.attention_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: 
layers.22._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.23._orig_mod.attention.wq.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.23._orig_mod.attention.wk.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.23._orig_mod.attention.wv.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.23._orig_mod.attention.wo.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.23._orig_mod.attention_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.23._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.24._orig_mod.attention.wq.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.24._orig_mod.attention.wk.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.24._orig_mod.attention.wv.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.24._orig_mod.attention.wo.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.24._orig_mod.attention_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.24._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.25._orig_mod.attention.wq.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.25._orig_mod.attention.wk.weight +2025-10-24 09:52:36,862 - root - INFO - 
Decay weight: layers.25._orig_mod.attention.wv.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.25._orig_mod.attention.wo.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.25._orig_mod.attention_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.25._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.26._orig_mod.attention.wq.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.26._orig_mod.attention.wk.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.26._orig_mod.attention.wv.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.26._orig_mod.attention.wo.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.26._orig_mod.attention_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Nodecay weight: layers.26._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.27._orig_mod.attention.wq.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.27._orig_mod.attention.wk.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.27._orig_mod.attention.wv.weight +2025-10-24 09:52:36,862 - root - INFO - Decay weight: layers.27._orig_mod.attention.wo.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w1.weight +2025-10-24 
09:52:36,863 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: layers.27._orig_mod.attention_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: layers.27._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.28._orig_mod.attention.wq.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.28._orig_mod.attention.wk.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.28._orig_mod.attention.wv.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.28._orig_mod.attention.wo.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: layers.28._orig_mod.attention_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: layers.28._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.29._orig_mod.attention.wq.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.29._orig_mod.attention.wk.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.29._orig_mod.attention.wv.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.29._orig_mod.attention.wo.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: 
layers.29._orig_mod.attention_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: layers.29._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.30._orig_mod.attention.wq.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.30._orig_mod.attention.wk.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.30._orig_mod.attention.wv.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.30._orig_mod.attention.wo.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: layers.30._orig_mod.attention_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: layers.30._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.31._orig_mod.attention.wq.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.31._orig_mod.attention.wk.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.31._orig_mod.attention.wv.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.31._orig_mod.attention.wo.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w1.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w2.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w3.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: layers.31._orig_mod.attention_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: layers.31._orig_mod.ffn_norm.weight +2025-10-24 09:52:36,863 - root - INFO - Nodecay weight: norm.weight +2025-10-24 09:52:36,863 - root - INFO - Decay weight: 
output.weight +2025-10-24 09:52:37,385 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:52:37,458 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:52:37,487 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:52:37,564 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:52:37,700 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:52:37,787 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:52:37,824 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:52:37,947 - root - INFO - Checkpointing active. 
Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 09:52:37,952 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:52:37,996 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:52:37,997 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:52:37,997 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:52:37,998 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:52:37,998 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:52:37,998 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:52:37,998 - root - INFO - Loading the checkpoint at step 1000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 09:52:46,935 - root - INFO - Finished loading the checkpoint in 8.94 seconds +2025-10-24 09:52:46,936 - root - INFO - Finished loading the checkpoint in 8.94 seconds +2025-10-24 09:52:46,936 - root - INFO - Finished loading the checkpoint in 8.94 seconds +2025-10-24 09:52:46,936 - root - INFO - Finished loading the checkpoint in 8.94 seconds +2025-10-24 09:52:46,937 - root - INFO - Finished loading the checkpoint in 8.94 seconds +2025-10-24 09:52:46,937 - root - INFO - Finished loading the checkpoint in 8.94 seconds +2025-10-24 09:52:46,937 - root - INFO - Finished loading the checkpoint in 8.94 seconds +2025-10-24 
09:52:46,938 - root - INFO - Finished loading the checkpoint in 8.99 seconds +2025-10-24 09:52:46,969 - root - INFO - Training starts at step 1000 +2025-10-24 09:52:46,969 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:52:46,970 - root - INFO - Training starts at step 1000 +2025-10-24 09:52:46,971 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:52:46,971 - root - INFO - Training starts at step 1000 +2025-10-24 09:52:46,972 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:52:46,971 - root - INFO - Training starts at step 1000 +2025-10-24 09:52:46,972 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:52:46,972 - root - INFO - Training starts at step 1000 +2025-10-24 09:52:46,972 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:52:46,972 - root - INFO - Training starts at step 1000 +2025-10-24 09:52:46,973 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:52:46,980 - root - INFO - Training starts at step 1000 +2025-10-24 09:52:46,980 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:52:46,983 - root - INFO - Training starts at step 1000 +2025-10-24 09:52:46,983 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 09:52:46,983 - root - INFO - Worker 0 opening new file /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. 
+ warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. + warnings.warn( +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/_inductor/lowering.py:1917: UserWarning: Torchinductor does not support code generation for complex operators. Performance may be worse than eager. 
+ warnings.warn( +2025-10-24 09:53:15,276 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 3.1524), tps=11580, mfu=24.13%, memory: 78.53GiB(44.03%) time/data_loading=0.05s (max=0.08s, 11.53%) +2025-10-24 09:53:15,276 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 3.1524), tps=11579, mfu=24.12%, memory: 78.53GiB(44.03%) time/data_loading=0.05s (max=0.08s, 11.53%) +2025-10-24 09:53:15,276 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 3.1524), tps=11583, mfu=24.13%, memory: 78.53GiB(44.03%) time/data_loading=0.05s (max=0.08s, 11.53%) +2025-10-24 09:53:15,276 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 3.1524), tps=11579, mfu=24.13%, memory: 78.53GiB(44.03%) time/data_loading=0.05s (max=0.08s, 11.53%) +2025-10-24 09:53:15,276 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 3.1524), tps=11580, mfu=24.13%, memory: 78.53GiB(44.03%) time/data_loading=0.05s (max=0.08s, 11.53%) +2025-10-24 09:53:15,276 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 3.1524), tps=11580, mfu=24.13%, memory: 78.53GiB(44.03%) time/data_loading=0.05s (max=0.08s, 11.53%) +2025-10-24 09:53:15,276 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 3.1524), tps=11579, mfu=24.13%, memory: 78.53GiB(44.03%) time/data_loading=0.05s (max=0.08s, 11.53%) +2025-10-24 09:53:15,276 - root - INFO - Step 1010: lr=1.00E-05, loss= 1.5061 (max= 3.1524), tps=11584, mfu=24.14%, memory: 78.53GiB(44.03%) time/data_loading=0.05s (max=0.08s, 11.53%) +2025-10-24 09:53:15,506 - root - INFO - Dumping traces at step 10 +2025-10-24 09:53:15,507 - root - INFO - Dumping traces at step 10 +2025-10-24 09:53:15,509 - root - INFO - Dumping traces at step 10 +2025-10-24 09:53:15,522 - root - INFO - Dumping traces at step 10 +2025-10-24 09:53:15,525 - root - INFO - Dumping traces at step 10 +2025-10-24 09:53:15,527 - root - INFO - Dumping traces at step 10 +2025-10-24 09:53:15,528 - root - INFO - Dumping traces at step 10 +2025-10-24 09:53:15,529 - root - 
INFO - Dumping traces at step 10 +2025-10-24 09:53:15,638 - root - INFO - Finished dumping traces in 0.13 seconds +2025-10-24 09:53:15,646 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:15,647 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:15,664 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:15,665 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:15,668 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:15,669 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:15,669 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:24,890 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2240880 +2025-10-24 09:53:33,737 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.6973), tps=17753, mfu=36.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:33,737 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.6973), tps=17754, mfu=36.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:33,737 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.6973), tps=17753, mfu=36.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:33,737 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.6973), tps=17753, mfu=36.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:33,737 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.6973), tps=17753, mfu=36.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:33,737 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.6973), tps=17753, mfu=36.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:33,737 
- root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.6973), tps=17753, mfu=36.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:33,737 - root - INFO - Step 1020: lr=1.00E-05, loss= 1.5192 (max= 2.6973), tps=17753, mfu=36.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:33,905 - root - INFO - Dumping traces at step 20 +2025-10-24 09:53:33,906 - root - INFO - Dumping traces at step 20 +2025-10-24 09:53:33,908 - root - INFO - Dumping traces at step 20 +2025-10-24 09:53:33,910 - root - INFO - Dumping traces at step 20 +2025-10-24 09:53:33,910 - root - INFO - Dumping traces at step 20 +2025-10-24 09:53:33,918 - root - INFO - Dumping traces at step 20 +2025-10-24 09:53:33,919 - root - INFO - Dumping traces at step 20 +2025-10-24 09:53:33,924 - root - INFO - Dumping traces at step 20 +2025-10-24 09:53:34,038 - root - INFO - Finished dumping traces in 0.13 seconds +2025-10-24 09:53:34,043 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:34,049 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:34,050 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:34,058 - root - INFO - Finished dumping traces in 0.15 seconds +2025-10-24 09:53:34,059 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:34,059 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:34,059 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:52,118 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.7790), tps=17831, mfu=37.15%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:52,118 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.7790), tps=17831, mfu=37.15%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:52,118 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.7790), tps=17831, 
mfu=37.15%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:52,118 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.7790), tps=17831, mfu=37.15%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:52,118 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.7790), tps=17831, mfu=37.15%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:52,118 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.7790), tps=17830, mfu=37.15%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:52,118 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.7790), tps=17831, mfu=37.15%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:52,118 - root - INFO - Step 1030: lr=1.00E-05, loss= 1.4891 (max= 2.7790), tps=17831, mfu=37.15%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:53:52,292 - root - INFO - Dumping traces at step 30 +2025-10-24 09:53:52,293 - root - INFO - Dumping traces at step 30 +2025-10-24 09:53:52,296 - root - INFO - Dumping traces at step 30 +2025-10-24 09:53:52,297 - root - INFO - Dumping traces at step 30 +2025-10-24 09:53:52,297 - root - INFO - Dumping traces at step 30 +2025-10-24 09:53:52,308 - root - INFO - Dumping traces at step 30 +2025-10-24 09:53:52,312 - root - INFO - Dumping traces at step 30 +2025-10-24 09:53:52,313 - root - INFO - Dumping traces at step 30 +2025-10-24 09:53:52,422 - root - INFO - Finished dumping traces in 0.13 seconds +2025-10-24 09:53:52,429 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:52,431 - root - INFO - Finished dumping traces in 0.13 seconds +2025-10-24 09:53:52,432 - root - INFO - Finished dumping traces in 0.13 seconds +2025-10-24 09:53:52,433 - root - INFO - Finished dumping traces in 0.14 seconds +2025-10-24 09:53:52,437 - root - INFO - Finished dumping 
traces in 0.13 seconds +2025-10-24 09:53:52,442 - root - INFO - Finished dumping traces in 0.13 seconds +2025-10-24 09:53:52,444 - root - INFO - Finished dumping traces in 0.13 seconds +2025-10-24 09:54:10,461 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4986 (max= 3.1497), tps=17868, mfu=37.23%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:10,461 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4986 (max= 3.1497), tps=17868, mfu=37.23%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:10,461 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4986 (max= 3.1497), tps=17868, mfu=37.23%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:10,461 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4986 (max= 3.1497), tps=17868, mfu=37.23%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:10,461 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4986 (max= 3.1497), tps=17868, mfu=37.23%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:10,461 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4986 (max= 3.1497), tps=17868, mfu=37.23%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:10,461 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4986 (max= 3.1497), tps=17868, mfu=37.23%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:10,461 - root - INFO - Step 1040: lr=1.00E-05, loss= 1.4986 (max= 3.1497), tps=17868, mfu=37.23%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:28,446 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4737 (max= 4.0457), tps=18223, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:28,446 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4737 (max= 4.0457), tps=18223, mfu=37.97%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:28,446 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4737 (max= 4.0457), tps=18223, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:28,446 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4737 (max= 4.0457), tps=18223, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:28,446 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4737 (max= 4.0457), tps=18223, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:28,446 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4737 (max= 4.0457), tps=18224, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:28,446 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4737 (max= 4.0457), tps=18223, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:28,446 - root - INFO - Step 1050: lr=1.00E-05, loss= 1.4737 (max= 4.0457), tps=18223, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:54:46,448 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5095 (max= 3.6945), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:54:46,448 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5095 (max= 3.6945), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:54:46,448 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5095 (max= 3.6945), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:54:46,448 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5095 (max= 3.6945), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:54:46,449 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5095 (max= 3.6945), tps=18206, mfu=37.93%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:54:46,449 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5095 (max= 3.6945), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:54:46,449 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5095 (max= 3.6945), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:54:46,449 - root - INFO - Step 1060: lr=1.00E-05, loss= 1.5095 (max= 3.6945), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:55:04,448 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4743 (max= 2.5475), tps=18208, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:55:04,448 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4743 (max= 2.5475), tps=18208, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:55:04,449 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4743 (max= 2.5475), tps=18208, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:55:04,449 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4743 (max= 2.5475), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:55:04,449 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4743 (max= 2.5475), tps=18208, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:55:04,449 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4743 (max= 2.5475), tps=18208, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:55:04,449 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4743 (max= 2.5475), tps=18208, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:55:04,449 - root - INFO - Step 1070: lr=1.00E-05, loss= 1.4743 (max= 
2.5475), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:55:22,468 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.4671 (max= 2.3588), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:22,469 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.4671 (max= 2.3588), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:22,469 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.4671 (max= 2.3588), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:22,469 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.4671 (max= 2.3588), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:22,469 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.4671 (max= 2.3588), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:22,469 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.4671 (max= 2.3588), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:22,469 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.4671 (max= 2.3588), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:22,469 - root - INFO - Step 1080: lr=1.00E-05, loss= 1.4671 (max= 2.3588), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:40,521 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.4854 (max= 2.8857), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:40,521 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.4854 (max= 2.8857), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:40,521 - root - INFO - Step 1090: 
lr=1.00E-05, loss= 1.4854 (max= 2.8857), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:40,522 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.4854 (max= 2.8857), tps=18154, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:40,522 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.4854 (max= 2.8857), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:40,522 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.4854 (max= 2.8857), tps=18154, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:40,522 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.4854 (max= 2.8857), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:40,522 - root - INFO - Step 1090: lr=1.00E-05, loss= 1.4854 (max= 2.8857), tps=18154, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:58,523 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.4923 (max= 3.5254), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:58,523 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.4923 (max= 3.5254), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:58,523 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.4923 (max= 3.5254), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:58,523 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.4923 (max= 3.5254), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:58,523 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.4923 (max= 3.5254), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:58,523 - 
root - INFO - Step 1100: lr=1.00E-05, loss= 1.4923 (max= 3.5254), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:58,523 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.4923 (max= 3.5254), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:55:58,524 - root - INFO - Step 1100: lr=1.00E-05, loss= 1.4923 (max= 3.5254), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:56:16,564 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.5224 (max= 2.7422), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:16,564 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.5224 (max= 2.7422), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:16,564 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.5224 (max= 2.7422), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:16,564 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.5224 (max= 2.7422), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:16,564 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.5224 (max= 2.7422), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:16,564 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.5224 (max= 2.7422), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:16,564 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.5224 (max= 2.7422), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:16,564 - root - INFO - Step 1110: lr=1.00E-05, loss= 1.5224 (max= 2.7422), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 09:56:34,563 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.5108 (max= 2.7012), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:34,563 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.5108 (max= 2.7012), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:34,563 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.5108 (max= 2.7012), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:34,563 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.5108 (max= 2.7012), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:34,563 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.5108 (max= 2.7012), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:34,563 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.5108 (max= 2.7012), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:34,563 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.5108 (max= 2.7012), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:34,564 - root - INFO - Step 1120: lr=1.00E-05, loss= 1.5108 (max= 2.7012), tps=18209, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:52,576 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.4803 (max= 3.5235), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:52,577 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.4803 (max= 3.5235), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:52,577 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.4803 (max= 3.5235), tps=18195, mfu=37.91%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:52,577 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.4803 (max= 3.5235), tps=18195, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:52,577 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.4803 (max= 3.5235), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:52,577 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.4803 (max= 3.5235), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:52,577 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.4803 (max= 3.5235), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:56:52,577 - root - INFO - Step 1130: lr=1.00E-05, loss= 1.4803 (max= 3.5235), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:57:10,579 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.4972 (max= 2.6044), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:10,579 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.4972 (max= 2.6044), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:10,579 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.4972 (max= 2.6044), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:10,579 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.4972 (max= 2.6044), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:10,579 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.4972 (max= 2.6044), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:10,580 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.4972 (max= 2.6044), tps=18205, mfu=37.93%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:10,580 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.4972 (max= 2.6044), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:10,580 - root - INFO - Step 1140: lr=1.00E-05, loss= 1.4972 (max= 2.6044), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:28,571 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.4368 (max= 3.0254), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:28,571 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.4368 (max= 3.0254), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:28,571 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.4368 (max= 3.0254), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:28,571 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.4368 (max= 3.0254), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:28,571 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.4368 (max= 3.0254), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:28,572 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.4368 (max= 3.0254), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:28,572 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.4368 (max= 3.0254), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:28,572 - root - INFO - Step 1150: lr=1.00E-05, loss= 1.4368 (max= 3.0254), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:46,593 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.4733 (max= 
2.6711), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:46,594 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.4733 (max= 2.6711), tps=18186, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:46,594 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.4733 (max= 2.6711), tps=18186, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:46,594 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.4733 (max= 2.6711), tps=18186, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:46,594 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.4733 (max= 2.6711), tps=18186, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:46,594 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.4733 (max= 2.6711), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:46,594 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.4733 (max= 2.6711), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:57:46,594 - root - INFO - Step 1160: lr=1.00E-05, loss= 1.4733 (max= 2.6711), tps=18186, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:58:04,616 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.4587 (max= 2.9527), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:04,616 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.4587 (max= 2.9527), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:04,616 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.4587 (max= 2.9527), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:04,616 - root - INFO - Step 1170: 
lr=1.00E-05, loss= 1.4587 (max= 2.9527), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:04,616 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.4587 (max= 2.9527), tps=18186, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:04,616 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.4587 (max= 2.9527), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:04,617 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.4587 (max= 2.9527), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:04,617 - root - INFO - Step 1170: lr=1.00E-05, loss= 1.4587 (max= 2.9527), tps=18185, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:22,672 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.4638 (max= 3.9418), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:22,672 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.4638 (max= 3.9418), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:22,672 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.4638 (max= 3.9418), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:22,672 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.4638 (max= 3.9418), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:22,672 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.4638 (max= 3.9418), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:22,672 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.4638 (max= 3.9418), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:22,672 - 
root - INFO - Step 1180: lr=1.00E-05, loss= 1.4638 (max= 3.9418), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:22,673 - root - INFO - Step 1180: lr=1.00E-05, loss= 1.4638 (max= 3.9418), tps=18153, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:40,648 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.4767 (max= 2.6884), tps=18232, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:58:40,648 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.4767 (max= 2.6884), tps=18232, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:58:40,648 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.4767 (max= 2.6884), tps=18232, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:58:40,648 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.4767 (max= 2.6884), tps=18232, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:58:40,648 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.4767 (max= 2.6884), tps=18232, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:58:40,648 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.4767 (max= 2.6884), tps=18232, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:58:40,649 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.4767 (max= 2.6884), tps=18232, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:58:40,649 - root - INFO - Step 1190: lr=1.00E-05, loss= 1.4767 (max= 2.6884), tps=18232, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:58:58,655 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.5021 (max= 2.7379), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 09:58:58,656 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.5021 (max= 2.7379), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:58,656 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.5021 (max= 2.7379), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:58,656 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.5021 (max= 2.7379), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:58,656 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.5021 (max= 2.7379), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:58,656 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.5021 (max= 2.7379), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:58,656 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.5021 (max= 2.7379), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:58:58,656 - root - INFO - Step 1200: lr=1.00E-05, loss= 1.5021 (max= 2.7379), tps=18202, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:59:16,662 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.4790 (max= 2.7495), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:16,662 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.4790 (max= 2.7495), tps=18202, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:16,662 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.4790 (max= 2.7495), tps=18202, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:16,662 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.4790 (max= 2.7495), tps=18202, mfu=37.92%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:16,662 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.4790 (max= 2.7495), tps=18202, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:16,662 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.4790 (max= 2.7495), tps=18202, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:16,662 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.4790 (max= 2.7495), tps=18202, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:16,663 - root - INFO - Step 1210: lr=1.00E-05, loss= 1.4790 (max= 2.7495), tps=18202, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:34,667 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.4812 (max= 2.9840), tps=18202, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:34,668 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.4812 (max= 2.9840), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:34,668 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.4812 (max= 2.9840), tps=18202, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:34,668 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.4812 (max= 2.9840), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:34,668 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.4812 (max= 2.9840), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:34,668 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.4812 (max= 2.9840), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:34,668 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.4812 (max= 2.9840), tps=18202, mfu=37.92%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:34,668 - root - INFO - Step 1220: lr=1.00E-05, loss= 1.4812 (max= 2.9840), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 09:59:52,697 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.4301 (max= 2.7614), tps=18179, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:59:52,697 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.4301 (max= 2.7614), tps=18179, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:59:52,697 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.4301 (max= 2.7614), tps=18179, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:59:52,697 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.4301 (max= 2.7614), tps=18179, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:59:52,697 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.4301 (max= 2.7614), tps=18179, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:59:52,697 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.4301 (max= 2.7614), tps=18179, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:59:52,697 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.4301 (max= 2.7614), tps=18179, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 09:59:52,697 - root - INFO - Step 1230: lr=1.00E-05, loss= 1.4301 (max= 2.7614), tps=18179, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:00:10,699 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.4240 (max= 2.6085), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:10,699 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.4240 (max= 
2.6085), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:10,699 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.4240 (max= 2.6085), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:10,700 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.4240 (max= 2.6085), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:10,700 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.4240 (max= 2.6085), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:10,700 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.4240 (max= 2.6085), tps=18205, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:10,700 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.4240 (max= 2.6085), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:10,700 - root - INFO - Step 1240: lr=1.00E-05, loss= 1.4240 (max= 2.6085), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:28,716 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.4440 (max= 2.7752), tps=18191, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:00:28,716 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.4440 (max= 2.7752), tps=18191, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:00:28,716 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.4440 (max= 2.7752), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:00:28,716 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.4440 (max= 2.7752), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:00:28,716 - root - INFO - Step 1250: 
lr=1.00E-05, loss= 1.4440 (max= 2.7752), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:00:28,716 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.4440 (max= 2.7752), tps=18191, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:00:28,716 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.4440 (max= 2.7752), tps=18191, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:00:28,717 - root - INFO - Step 1250: lr=1.00E-05, loss= 1.4440 (max= 2.7752), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:00:46,717 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.4487 (max= 3.0723), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:46,717 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.4487 (max= 3.0723), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:46,717 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.4487 (max= 3.0723), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:46,717 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.4487 (max= 3.0723), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:46,717 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.4487 (max= 3.0723), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:46,717 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.4487 (max= 3.0723), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:46,717 - root - INFO - Step 1260: lr=1.00E-05, loss= 1.4487 (max= 3.0723), tps=18207, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:00:46,718 - 
root - INFO - Step 1260: lr=1.00E-05, loss= 1.4487 (max= 3.0723), tps=18207, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:04,756 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.4639 (max= 2.6111), tps=18169, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:04,756 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.4639 (max= 2.6111), tps=18169, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:04,756 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.4639 (max= 2.6111), tps=18169, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:04,756 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.4639 (max= 2.6111), tps=18169, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:04,756 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.4639 (max= 2.6111), tps=18169, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:04,756 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.4639 (max= 2.6111), tps=18169, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:04,757 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.4639 (max= 2.6111), tps=18169, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:04,757 - root - INFO - Step 1270: lr=1.00E-05, loss= 1.4639 (max= 2.6111), tps=18169, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:22,804 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.4600 (max= 2.6791), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:22,804 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.4600 (max= 2.6791), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 10:01:22,804 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.4600 (max= 2.6791), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:22,804 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.4600 (max= 2.6791), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:22,804 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.4600 (max= 2.6791), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:22,804 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.4600 (max= 2.6791), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:22,804 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.4600 (max= 2.6791), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:22,805 - root - INFO - Step 1280: lr=1.00E-05, loss= 1.4600 (max= 2.6791), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:40,815 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.4682 (max= 3.4672), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:40,815 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.4682 (max= 3.4672), tps=18198, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:40,815 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.4682 (max= 3.4672), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:40,815 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.4682 (max= 3.4672), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:40,815 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.4682 (max= 3.4672), tps=18198, mfu=37.91%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:40,815 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.4682 (max= 3.4672), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:40,815 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.4682 (max= 3.4672), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:40,816 - root - INFO - Step 1290: lr=1.00E-05, loss= 1.4682 (max= 3.4672), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:58,789 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.4339 (max= 2.6655), tps=18234, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:58,790 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.4339 (max= 2.6655), tps=18234, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:58,790 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.4339 (max= 2.6655), tps=18234, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:58,790 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.4339 (max= 2.6655), tps=18234, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:58,790 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.4339 (max= 2.6655), tps=18234, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:58,790 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.4339 (max= 2.6655), tps=18234, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:58,790 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.4339 (max= 2.6655), tps=18234, mfu=37.99%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:01:58,790 - root - INFO - Step 1300: lr=1.00E-05, loss= 1.4339 (max= 2.6655), tps=18234, mfu=37.99%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:02:16,817 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.4671 (max= 3.1190), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:02:16,817 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.4671 (max= 3.1190), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:02:16,817 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.4671 (max= 3.1190), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:02:16,817 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.4671 (max= 3.1190), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:02:16,817 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.4671 (max= 3.1190), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:02:16,817 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.4671 (max= 3.1190), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:02:16,817 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.4671 (max= 3.1190), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:02:16,818 - root - INFO - Step 1310: lr=1.00E-05, loss= 1.4671 (max= 3.1190), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:02:34,830 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.4273 (max= 2.8808), tps=18195, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:34,830 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.4273 (max= 2.8808), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:34,830 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.4273 (max= 
2.8808), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:34,830 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.4273 (max= 2.8808), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:34,830 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.4273 (max= 2.8808), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:34,830 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.4273 (max= 2.8808), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:34,830 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.4273 (max= 2.8808), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:34,830 - root - INFO - Step 1320: lr=1.00E-05, loss= 1.4273 (max= 2.8808), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:52,849 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.4684 (max= 2.6588), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:52,849 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.4684 (max= 2.6588), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:52,849 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.4684 (max= 2.6588), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:52,849 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.4684 (max= 2.6588), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:52,849 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.4684 (max= 2.6588), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:52,849 - root - INFO - Step 1330: 
lr=1.00E-05, loss= 1.4684 (max= 2.6588), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:52,849 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.4684 (max= 2.6588), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:02:52,850 - root - INFO - Step 1330: lr=1.00E-05, loss= 1.4684 (max= 2.6588), tps=18189, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:03:10,869 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.4888 (max= 2.5608), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:03:10,869 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.4888 (max= 2.5608), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:03:10,869 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.4888 (max= 2.5608), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:03:10,869 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.4888 (max= 2.5608), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:03:10,869 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.4888 (max= 2.5608), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:03:10,869 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.4888 (max= 2.5608), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:03:10,869 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.4888 (max= 2.5608), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:03:10,870 - root - INFO - Step 1340: lr=1.00E-05, loss= 1.4888 (max= 2.5608), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:03:31,776 - 
root - INFO - Step 1350: lr=1.00E-05, loss= 1.4481 (max= 2.4525), tps=15676, mfu=32.66%, memory: 78.53GiB(44.03%) time/data_loading=0.01s (max=0.08s, 14.99%) +2025-10-24 10:03:31,776 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.4481 (max= 2.4525), tps=15676, mfu=32.66%, memory: 78.53GiB(44.03%) time/data_loading=0.01s (max=0.08s, 14.99%) +2025-10-24 10:03:31,776 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.4481 (max= 2.4525), tps=15676, mfu=32.66%, memory: 78.53GiB(44.03%) time/data_loading=0.01s (max=0.08s, 14.99%) +2025-10-24 10:03:31,776 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.4481 (max= 2.4525), tps=15676, mfu=32.66%, memory: 78.53GiB(44.03%) time/data_loading=0.01s (max=0.08s, 14.99%) +2025-10-24 10:03:31,776 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.4481 (max= 2.4525), tps=15676, mfu=32.66%, memory: 78.53GiB(44.03%) time/data_loading=0.01s (max=0.08s, 14.99%) +2025-10-24 10:03:31,776 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.4481 (max= 2.4525), tps=15676, mfu=32.66%, memory: 78.53GiB(44.03%) time/data_loading=0.01s (max=0.08s, 14.99%) +2025-10-24 10:03:31,776 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.4481 (max= 2.4525), tps=15676, mfu=32.66%, memory: 78.53GiB(44.03%) time/data_loading=0.01s (max=0.08s, 14.99%) +2025-10-24 10:03:31,777 - root - INFO - Step 1350: lr=1.00E-05, loss= 1.4481 (max= 2.4525), tps=15676, mfu=32.66%, memory: 78.53GiB(44.03%) time/data_loading=0.01s (max=0.08s, 14.99%) +2025-10-24 10:03:49,784 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.4693 (max= 2.8204), tps=18199, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:03:49,785 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.4693 (max= 2.8204), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:03:49,785 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.4693 (max= 2.8204), tps=18199, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 
0.03%) +2025-10-24 10:03:49,785 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.4693 (max= 2.8204), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:03:49,785 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.4693 (max= 2.8204), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:03:49,785 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.4693 (max= 2.8204), tps=18199, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:03:49,785 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.4693 (max= 2.8204), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:03:49,785 - root - INFO - Step 1360: lr=1.00E-05, loss= 1.4693 (max= 2.8204), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:07,817 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.4126 (max= 2.8331), tps=18176, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:04:07,817 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.4126 (max= 2.8331), tps=18176, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:04:07,817 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.4126 (max= 2.8331), tps=18176, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:04:07,817 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.4126 (max= 2.8331), tps=18176, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:04:07,817 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.4126 (max= 2.8331), tps=18176, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:04:07,817 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.4126 (max= 2.8331), tps=18176, mfu=37.87%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:04:07,817 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.4126 (max= 2.8331), tps=18176, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:04:07,817 - root - INFO - Step 1370: lr=1.00E-05, loss= 1.4126 (max= 2.8331), tps=18176, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:04:25,859 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.4357 (max= 2.6978), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:25,859 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.4357 (max= 2.6978), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:25,859 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.4357 (max= 2.6978), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:25,859 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.4357 (max= 2.6978), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:25,859 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.4357 (max= 2.6978), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:25,859 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.4357 (max= 2.6978), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:25,859 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.4357 (max= 2.6978), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:25,860 - root - INFO - Step 1380: lr=1.00E-05, loss= 1.4357 (max= 2.6978), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:43,897 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.4625 (max= 2.7611), tps=18170, mfu=37.86%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:43,897 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.4625 (max= 2.7611), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:43,897 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.4625 (max= 2.7611), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:43,897 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.4625 (max= 2.7611), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:43,898 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.4625 (max= 2.7611), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:43,898 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.4625 (max= 2.7611), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:43,898 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.4625 (max= 2.7611), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:04:43,899 - root - INFO - Step 1390: lr=1.00E-05, loss= 1.4625 (max= 2.7611), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:05:01,905 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.4327 (max= 2.4997), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:05:01,905 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.4327 (max= 2.4997), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:05:01,906 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.4327 (max= 2.4997), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:05:01,906 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.4327 (max= 
2.4997), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:05:01,906 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.4327 (max= 2.4997), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:05:01,906 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.4327 (max= 2.4997), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:05:01,906 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.4327 (max= 2.4997), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:05:01,906 - root - INFO - Step 1400: lr=1.00E-05, loss= 1.4327 (max= 2.4997), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:05:19,954 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.4492 (max= 3.3768), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:05:19,954 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.4492 (max= 3.3768), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:05:19,954 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.4492 (max= 3.3768), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:05:19,954 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.4492 (max= 3.3768), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:05:19,954 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.4492 (max= 3.3768), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:05:19,954 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.4492 (max= 3.3768), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:05:19,954 - root - INFO - Step 1410: 
lr=1.00E-05, loss= 1.4492 (max= 3.3768), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:05:19,955 - root - INFO - Step 1410: lr=1.00E-05, loss= 1.4492 (max= 3.3768), tps=18159, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:05:37,932 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.4305 (max= 2.5200), tps=18231, mfu=37.98%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:37,932 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.4305 (max= 2.5200), tps=18231, mfu=37.98%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:37,932 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.4305 (max= 2.5200), tps=18231, mfu=37.98%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:37,932 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.4305 (max= 2.5200), tps=18231, mfu=37.98%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:37,932 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.4305 (max= 2.5200), tps=18231, mfu=37.98%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:37,932 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.4305 (max= 2.5200), tps=18231, mfu=37.98%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:37,932 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.4305 (max= 2.5200), tps=18231, mfu=37.98%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:37,933 - root - INFO - Step 1420: lr=1.00E-05, loss= 1.4305 (max= 2.5200), tps=18231, mfu=37.98%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:55,957 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.4247 (max= 2.5375), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:55,957 - 
root - INFO - Step 1430: lr=1.00E-05, loss= 1.4247 (max= 2.5375), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:55,957 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.4247 (max= 2.5375), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:55,957 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.4247 (max= 2.5375), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:55,957 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.4247 (max= 2.5375), tps=18183, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:55,957 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.4247 (max= 2.5375), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:55,957 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.4247 (max= 2.5375), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:05:55,957 - root - INFO - Step 1430: lr=1.00E-05, loss= 1.4247 (max= 2.5375), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:06:13,974 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.4517 (max= 2.6659), tps=18191, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:13,974 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.4517 (max= 2.6659), tps=18191, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:13,974 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.4517 (max= 2.6659), tps=18191, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:13,974 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.4517 (max= 2.6659), tps=18191, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 10:06:13,974 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.4517 (max= 2.6659), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:13,974 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.4517 (max= 2.6659), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:13,975 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.4517 (max= 2.6659), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:13,975 - root - INFO - Step 1440: lr=1.00E-05, loss= 1.4517 (max= 2.6659), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:32,017 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.4169 (max= 3.0729), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:06:32,017 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.4169 (max= 3.0729), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:06:32,017 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.4169 (max= 3.0729), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:06:32,017 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.4169 (max= 3.0729), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:06:32,017 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.4169 (max= 3.0729), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:06:32,017 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.4169 (max= 3.0729), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:06:32,017 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.4169 (max= 3.0729), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:06:32,018 - root - INFO - Step 1450: lr=1.00E-05, loss= 1.4169 (max= 3.0729), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:06:50,025 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.4643 (max= 3.0892), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:50,025 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.4643 (max= 3.0892), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:50,025 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.4643 (max= 3.0892), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:50,025 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.4643 (max= 3.0892), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:50,025 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.4643 (max= 3.0892), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:50,025 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.4643 (max= 3.0892), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:50,025 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.4643 (max= 3.0892), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:06:50,026 - root - INFO - Step 1460: lr=1.00E-05, loss= 1.4643 (max= 3.0892), tps=18200, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:06,010 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:990296 +2025-10-24 10:07:08,027 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.4188 (max= 2.4900), tps=18206, 
mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:08,027 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.4188 (max= 2.4900), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:08,028 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.4188 (max= 2.4900), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:08,028 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.4188 (max= 2.4900), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:08,028 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.4188 (max= 2.4900), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:08,028 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.4188 (max= 2.4900), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:08,028 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.4188 (max= 2.4900), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:08,028 - root - INFO - Step 1470: lr=1.00E-05, loss= 1.4188 (max= 2.4900), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:26,047 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.4107 (max= 3.2619), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:26,047 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.4107 (max= 3.2619), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:26,047 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.4107 (max= 3.2619), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:26,047 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.4107 
(max= 3.2619), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:26,048 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.4107 (max= 3.2619), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:26,048 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.4107 (max= 3.2619), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:26,048 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.4107 (max= 3.2619), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:26,048 - root - INFO - Step 1480: lr=1.00E-05, loss= 1.4107 (max= 3.2619), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:44,068 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.4460 (max= 2.5600), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:44,068 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.4460 (max= 2.5600), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:44,068 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.4460 (max= 2.5600), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:44,068 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.4460 (max= 2.5600), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:44,068 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.4460 (max= 2.5600), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:44,068 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.4460 (max= 2.5600), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:44,069 - root - INFO - Step 1490: 
lr=1.00E-05, loss= 1.4460 (max= 2.5600), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:07:44,069 - root - INFO - Step 1490: lr=1.00E-05, loss= 1.4460 (max= 2.5600), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:08:02,050 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.4111 (max= 2.6997), tps=18226, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:02,051 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.4111 (max= 2.6997), tps=18226, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:02,051 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.4111 (max= 2.6997), tps=18226, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:02,051 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.4111 (max= 2.6997), tps=18226, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:02,051 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.4111 (max= 2.6997), tps=18226, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:02,051 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.4111 (max= 2.6997), tps=18226, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:02,051 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.4111 (max= 2.6997), tps=18226, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:02,051 - root - INFO - Step 1500: lr=1.00E-05, loss= 1.4111 (max= 2.6997), tps=18226, mfu=37.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:20,039 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.4504 (max= 2.7191), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:20,039 - 
root - INFO - Step 1510: lr=1.00E-05, loss= 1.4504 (max= 2.7191), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:20,039 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.4504 (max= 2.7191), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:20,039 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.4504 (max= 2.7191), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:20,039 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.4504 (max= 2.7191), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:20,039 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.4504 (max= 2.7191), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:20,039 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.4504 (max= 2.7191), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:20,039 - root - INFO - Step 1510: lr=1.00E-05, loss= 1.4504 (max= 2.7191), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:38,046 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.4390 (max= 2.5723), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:08:38,046 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.4390 (max= 2.5723), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:08:38,046 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.4390 (max= 2.5723), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:08:38,046 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.4390 (max= 2.5723), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 10:08:38,046 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.4390 (max= 2.5723), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:08:38,046 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.4390 (max= 2.5723), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:08:38,046 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.4390 (max= 2.5723), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:08:38,046 - root - INFO - Step 1520: lr=1.00E-05, loss= 1.4390 (max= 2.5723), tps=18201, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:08:56,056 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.4374 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:56,056 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.4374 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:56,056 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.4374 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:56,056 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.4374 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:56,056 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.4374 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:56,056 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.4374 (max= 2.8266), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:56,056 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.4374 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:08:56,056 - root - INFO - Step 1530: lr=1.00E-05, loss= 1.4374 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:14,090 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.4100 (max= 3.4848), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:14,090 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.4100 (max= 3.4848), tps=18175, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:14,090 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.4100 (max= 3.4848), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:14,090 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.4100 (max= 3.4848), tps=18175, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:14,090 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.4100 (max= 3.4848), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:14,090 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.4100 (max= 3.4848), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:14,090 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.4100 (max= 3.4848), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:14,090 - root - INFO - Step 1540: lr=1.00E-05, loss= 1.4100 (max= 3.4848), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:32,081 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.4402 (max= 3.4825), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:09:32,081 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.4402 (max= 3.4825), tps=18217, mfu=37.95%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:09:32,081 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.4402 (max= 3.4825), tps=18217, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:09:32,081 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.4402 (max= 3.4825), tps=18217, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:09:32,081 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.4402 (max= 3.4825), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:09:32,081 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.4402 (max= 3.4825), tps=18217, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:09:32,082 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.4402 (max= 3.4825), tps=18217, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:09:32,082 - root - INFO - Step 1550: lr=1.00E-05, loss= 1.4402 (max= 3.4825), tps=18217, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:09:50,115 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.4520 (max= 3.4519), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:50,115 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.4520 (max= 3.4519), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:50,115 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.4520 (max= 3.4519), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:50,115 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.4520 (max= 3.4519), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:50,115 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.4520 (max= 
3.4519), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:50,115 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.4520 (max= 3.4519), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:50,115 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.4520 (max= 3.4519), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:09:50,115 - root - INFO - Step 1560: lr=1.00E-05, loss= 1.4520 (max= 3.4519), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:08,155 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.4321 (max= 2.5743), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:08,155 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.4321 (max= 2.5743), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:08,155 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.4321 (max= 2.5743), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:08,155 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.4321 (max= 2.5743), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:08,155 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.4321 (max= 2.5743), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:08,155 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.4321 (max= 2.5743), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:08,155 - root - INFO - Step 1570: lr=1.00E-05, loss= 1.4321 (max= 2.5743), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:08,155 - root - INFO - Step 1570: 
lr=1.00E-05, loss= 1.4321 (max= 2.5743), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:26,190 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.4312 (max= 2.7261), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:26,191 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.4312 (max= 2.7261), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:26,191 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.4312 (max= 2.7261), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:26,191 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.4312 (max= 2.7261), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:26,191 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.4312 (max= 2.7261), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:26,191 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.4312 (max= 2.7261), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:26,191 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.4312 (max= 2.7261), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:26,191 - root - INFO - Step 1580: lr=1.00E-05, loss= 1.4312 (max= 2.7261), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:44,246 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.4142 (max= 2.7173), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:44,246 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.4142 (max= 2.7173), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:44,246 - 
root - INFO - Step 1590: lr=1.00E-05, loss= 1.4142 (max= 2.7173), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:44,246 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.4142 (max= 2.7173), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:44,246 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.4142 (max= 2.7173), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:44,246 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.4142 (max= 2.7173), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:44,246 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.4142 (max= 2.7173), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:10:44,246 - root - INFO - Step 1590: lr=1.00E-05, loss= 1.4142 (max= 2.7173), tps=18152, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:02,281 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.4186 (max= 2.9612), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:02,281 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.4186 (max= 2.9612), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:02,282 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.4186 (max= 2.9612), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:02,282 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.4186 (max= 2.9612), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:02,282 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.4186 (max= 2.9612), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 10:11:02,282 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.4186 (max= 2.9612), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:02,282 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.4186 (max= 2.9612), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:02,282 - root - INFO - Step 1600: lr=1.00E-05, loss= 1.4186 (max= 2.9612), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:20,323 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.4223 (max= 2.4785), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:20,323 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.4223 (max= 2.4785), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:20,323 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.4223 (max= 2.4785), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:20,323 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.4223 (max= 2.4785), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:20,323 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.4223 (max= 2.4785), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:20,323 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.4223 (max= 2.4785), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:20,323 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.4223 (max= 2.4785), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:20,323 - root - INFO - Step 1610: lr=1.00E-05, loss= 1.4223 (max= 2.4785), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:11:38,334 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.4018 (max= 2.6775), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:38,335 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.4018 (max= 2.6775), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:38,335 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.4018 (max= 2.6775), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:38,335 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.4018 (max= 2.6775), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:38,335 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.4018 (max= 2.6775), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:38,335 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.4018 (max= 2.6775), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:38,335 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.4018 (max= 2.6775), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:38,335 - root - INFO - Step 1620: lr=1.00E-05, loss= 1.4018 (max= 2.6775), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:56,350 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.4100 (max= 3.6525), tps=18191, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:56,351 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.4100 (max= 3.6525), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:56,351 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.4100 (max= 3.6525), tps=18192, mfu=37.90%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:56,351 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.4100 (max= 3.6525), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:56,351 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.4100 (max= 3.6525), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:56,351 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.4100 (max= 3.6525), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:56,351 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.4100 (max= 3.6525), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:11:56,351 - root - INFO - Step 1630: lr=1.00E-05, loss= 1.4100 (max= 3.6525), tps=18192, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:12:14,387 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.4163 (max= 2.9972), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:12:14,388 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.4163 (max= 2.9972), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:12:14,388 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.4163 (max= 2.9972), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:12:14,388 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.4163 (max= 2.9972), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:12:14,388 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.4163 (max= 2.9972), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:12:14,388 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.4163 (max= 
2.9972), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:12:14,388 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.4163 (max= 2.9972), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:12:14,389 - root - INFO - Step 1640: lr=1.00E-05, loss= 1.4163 (max= 2.9972), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:12:32,432 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.4238 (max= 2.5201), tps=18164, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:12:32,432 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.4238 (max= 2.5201), tps=18164, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:12:32,432 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.4238 (max= 2.5201), tps=18164, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:12:32,432 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.4238 (max= 2.5201), tps=18164, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:12:32,432 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.4238 (max= 2.5201), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:12:32,432 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.4238 (max= 2.5201), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:12:32,432 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.4238 (max= 2.5201), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:12:32,432 - root - INFO - Step 1650: lr=1.00E-05, loss= 1.4238 (max= 2.5201), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.05%) +2025-10-24 10:12:50,424 - root - INFO - Step 1660: 
lr=1.00E-05, loss= 1.4065 (max= 2.9731), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:12:50,424 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.4065 (max= 2.9731), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:12:50,424 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.4065 (max= 2.9731), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:12:50,424 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.4065 (max= 2.9731), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:12:50,424 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.4065 (max= 2.9731), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:12:50,424 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.4065 (max= 2.9731), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:12:50,424 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.4065 (max= 2.9731), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:12:50,424 - root - INFO - Step 1660: lr=1.00E-05, loss= 1.4065 (max= 2.9731), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:13:09,242 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.4452 (max= 3.0064), tps=17417, mfu=36.29%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:13:09,242 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.4452 (max= 3.0064), tps=17416, mfu=36.29%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:13:09,242 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.4452 (max= 3.0064), tps=17416, mfu=36.29%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:13:09,242 - 
root - INFO - Step 1670: lr=1.00E-05, loss= 1.4452 (max= 3.0064), tps=17416, mfu=36.29%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:13:09,242 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.4452 (max= 3.0064), tps=17416, mfu=36.29%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:13:09,242 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.4452 (max= 3.0064), tps=17416, mfu=36.29%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:13:09,242 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.4452 (max= 3.0064), tps=17416, mfu=36.29%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:13:09,243 - root - INFO - Step 1670: lr=1.00E-05, loss= 1.4452 (max= 3.0064), tps=17416, mfu=36.29%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:13:27,279 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.3910 (max= 2.8583), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:13:27,279 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.3910 (max= 2.8583), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:13:27,280 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.3910 (max= 2.8583), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:13:27,280 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.3910 (max= 2.8583), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:13:27,280 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.3910 (max= 2.8583), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:13:27,280 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.3910 (max= 2.8583), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 10:13:27,280 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.3910 (max= 2.8583), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:13:27,280 - root - INFO - Step 1680: lr=1.00E-05, loss= 1.3910 (max= 2.8583), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:13:45,342 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.3682 (max= 2.3836), tps=18146, mfu=37.81%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.09%) +2025-10-24 10:13:45,342 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.3682 (max= 2.3836), tps=18146, mfu=37.81%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.09%) +2025-10-24 10:13:45,342 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.3682 (max= 2.3836), tps=18146, mfu=37.81%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.09%) +2025-10-24 10:13:45,342 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.3682 (max= 2.3836), tps=18146, mfu=37.81%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.09%) +2025-10-24 10:13:45,342 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.3682 (max= 2.3836), tps=18147, mfu=37.81%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.09%) +2025-10-24 10:13:45,342 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.3682 (max= 2.3836), tps=18146, mfu=37.81%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.09%) +2025-10-24 10:13:45,343 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.3682 (max= 2.3836), tps=18146, mfu=37.81%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.09%) +2025-10-24 10:13:45,343 - root - INFO - Step 1690: lr=1.00E-05, loss= 1.3682 (max= 2.3836), tps=18146, mfu=37.81%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.09%) +2025-10-24 10:14:03,812 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.4055 (max= 2.4482), tps=17745, mfu=36.97%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.07%) +2025-10-24 10:14:03,812 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.4055 (max= 2.4482), tps=17745, mfu=36.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.07%) +2025-10-24 10:14:03,812 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.4055 (max= 2.4482), tps=17745, mfu=36.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.07%) +2025-10-24 10:14:03,812 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.4055 (max= 2.4482), tps=17745, mfu=36.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.07%) +2025-10-24 10:14:03,812 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.4055 (max= 2.4482), tps=17745, mfu=36.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.07%) +2025-10-24 10:14:03,812 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.4055 (max= 2.4482), tps=17745, mfu=36.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.07%) +2025-10-24 10:14:03,812 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.4055 (max= 2.4482), tps=17745, mfu=36.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.07%) +2025-10-24 10:14:03,813 - root - INFO - Step 1700: lr=1.00E-05, loss= 1.4055 (max= 2.4482), tps=17746, mfu=36.97%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.07%) +2025-10-24 10:14:21,916 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.4320 (max= 3.7543), tps=18104, mfu=37.72%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:14:21,916 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.4320 (max= 3.7543), tps=18104, mfu=37.72%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:14:21,916 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.4320 (max= 3.7543), tps=18104, mfu=37.72%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:14:21,916 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.4320 (max= 3.7543), tps=18104, mfu=37.72%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:14:21,916 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.4320 (max= 3.7543), tps=18104, mfu=37.72%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:14:21,916 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.4320 (max= 3.7543), tps=18104, mfu=37.72%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:14:21,916 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.4320 (max= 3.7543), tps=18104, mfu=37.72%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:14:21,916 - root - INFO - Step 1710: lr=1.00E-05, loss= 1.4320 (max= 3.7543), tps=18104, mfu=37.72%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 10:14:40,057 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.4156 (max= 2.6500), tps=18067, mfu=37.64%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:40,057 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.4156 (max= 2.6500), tps=18067, mfu=37.64%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:40,057 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.4156 (max= 2.6500), tps=18067, mfu=37.64%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:40,057 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.4156 (max= 2.6500), tps=18067, mfu=37.64%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:40,057 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.4156 (max= 2.6500), tps=18067, mfu=37.64%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:40,058 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.4156 (max= 2.6500), tps=18066, mfu=37.64%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:40,058 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.4156 (max= 
2.6500), tps=18067, mfu=37.64%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:40,058 - root - INFO - Step 1720: lr=1.00E-05, loss= 1.4156 (max= 2.6500), tps=18066, mfu=37.64%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:58,852 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.4278 (max= 3.4249), tps=17438, mfu=36.33%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:58,852 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.4278 (max= 3.4249), tps=17438, mfu=36.33%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:58,852 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.4278 (max= 3.4249), tps=17438, mfu=36.33%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:58,852 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.4278 (max= 3.4249), tps=17437, mfu=36.33%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:58,853 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.4278 (max= 3.4249), tps=17438, mfu=36.33%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:58,853 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.4278 (max= 3.4249), tps=17437, mfu=36.33%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:58,853 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.4278 (max= 3.4249), tps=17438, mfu=36.33%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:14:58,853 - root - INFO - Step 1730: lr=1.00E-05, loss= 1.4278 (max= 3.4249), tps=17438, mfu=36.33%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:15:23,746 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.4284 (max= 2.6127), tps=13165, mfu=27.43%, memory: 78.53GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.61%) +2025-10-24 10:15:23,746 - root - INFO - Step 1740: 
lr=1.00E-05, loss= 1.4284 (max= 2.6127), tps=13165, mfu=27.43%, memory: 78.53GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.61%) +2025-10-24 10:15:23,746 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.4284 (max= 2.6127), tps=13165, mfu=27.43%, memory: 78.53GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.61%) +2025-10-24 10:15:23,746 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.4284 (max= 2.6127), tps=13165, mfu=27.43%, memory: 78.53GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.61%) +2025-10-24 10:15:23,746 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.4284 (max= 2.6127), tps=13165, mfu=27.43%, memory: 78.53GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.61%) +2025-10-24 10:15:23,746 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.4284 (max= 2.6127), tps=13165, mfu=27.43%, memory: 78.53GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.61%) +2025-10-24 10:15:23,746 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.4284 (max= 2.6127), tps=13165, mfu=27.43%, memory: 78.53GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.61%) +2025-10-24 10:15:23,746 - root - INFO - Step 1740: lr=1.00E-05, loss= 1.4284 (max= 2.6127), tps=13165, mfu=27.43%, memory: 78.53GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.61%) +2025-10-24 10:15:41,786 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.3901 (max= 2.6121), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:15:41,786 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.3901 (max= 2.6121), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:15:41,786 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.3901 (max= 2.6121), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:15:41,786 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.3901 (max= 2.6121), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 
10:15:41,786 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.3901 (max= 2.6121), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:15:41,786 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.3901 (max= 2.6121), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:15:41,786 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.3901 (max= 2.6121), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:15:41,786 - root - INFO - Step 1750: lr=1.00E-05, loss= 1.3901 (max= 2.6121), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:15:59,812 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.4317 (max= 2.4749), tps=18181, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:15:59,812 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.4317 (max= 2.4749), tps=18181, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:15:59,812 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.4317 (max= 2.4749), tps=18181, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:15:59,813 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.4317 (max= 2.4749), tps=18181, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:15:59,812 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.4317 (max= 2.4749), tps=18181, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:15:59,813 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.4317 (max= 2.4749), tps=18181, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:15:59,813 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.4317 (max= 2.4749), tps=18181, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 10:15:59,813 - root - INFO - Step 1760: lr=1.00E-05, loss= 1.4317 (max= 2.4749), tps=18181, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:16:17,814 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.4505 (max= 3.1607), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:17,814 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.4505 (max= 3.1607), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:17,814 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.4505 (max= 3.1607), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:17,814 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.4505 (max= 3.1607), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:17,814 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.4505 (max= 3.1607), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:17,814 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.4505 (max= 3.1607), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:17,815 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.4505 (max= 3.1607), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:17,815 - root - INFO - Step 1770: lr=1.00E-05, loss= 1.4505 (max= 3.1607), tps=18206, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:35,819 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.4219 (max= 3.0698), tps=18204, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:16:35,819 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.4219 (max= 3.0698), tps=18204, mfu=37.93%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:16:35,819 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.4219 (max= 3.0698), tps=18204, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:16:35,819 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.4219 (max= 3.0698), tps=18204, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:16:35,819 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.4219 (max= 3.0698), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:16:35,819 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.4219 (max= 3.0698), tps=18204, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:16:35,819 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.4219 (max= 3.0698), tps=18204, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:16:35,819 - root - INFO - Step 1780: lr=1.00E-05, loss= 1.4219 (max= 3.0698), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:16:53,847 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.3728 (max= 2.8624), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:53,847 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.3728 (max= 2.8624), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:53,847 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.3728 (max= 2.8624), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:53,847 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.3728 (max= 2.8624), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:53,847 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.3728 (max= 2.8624), tps=18180, mfu=37.88%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:53,847 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.3728 (max= 2.8624), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:53,847 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.3728 (max= 2.8624), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:16:53,847 - root - INFO - Step 1790: lr=1.00E-05, loss= 1.3728 (max= 2.8624), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:11,845 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.4296 (max= 3.0054), tps=18210, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:17:11,845 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.4296 (max= 3.0054), tps=18210, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:17:11,845 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.4296 (max= 3.0054), tps=18210, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:17:11,845 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.4296 (max= 3.0054), tps=18210, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:17:11,845 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.4296 (max= 3.0054), tps=18210, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:17:11,845 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.4296 (max= 3.0054), tps=18210, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:17:11,845 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.4296 (max= 3.0054), tps=18210, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:17:11,845 - root - INFO - Step 1800: lr=1.00E-05, loss= 1.4296 (max= 
3.0054), tps=18210, mfu=37.94%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:17:29,879 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.3996 (max= 2.6855), tps=18174, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:29,879 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.3996 (max= 2.6855), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:29,879 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.3996 (max= 2.6855), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:29,879 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.3996 (max= 2.6855), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:29,879 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.3996 (max= 2.6855), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:29,879 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.3996 (max= 2.6855), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:29,879 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.3996 (max= 2.6855), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:29,879 - root - INFO - Step 1810: lr=1.00E-05, loss= 1.3996 (max= 2.6855), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:47,943 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.4496 (max= 2.6532), tps=18143, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:47,943 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.4496 (max= 2.6532), tps=18143, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:47,944 - root - INFO - Step 1820: 
lr=1.00E-05, loss= 1.4496 (max= 2.6532), tps=18143, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:47,944 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.4496 (max= 2.6532), tps=18143, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:47,944 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.4496 (max= 2.6532), tps=18143, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:47,944 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.4496 (max= 2.6532), tps=18143, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:47,944 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.4496 (max= 2.6532), tps=18143, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:17:47,944 - root - INFO - Step 1820: lr=1.00E-05, loss= 1.4496 (max= 2.6532), tps=18143, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:06,130 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.4341 (max= 2.5079), tps=18021, mfu=37.55%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:18:06,130 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.4341 (max= 2.5079), tps=18021, mfu=37.55%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:18:06,130 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.4341 (max= 2.5079), tps=18021, mfu=37.55%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:18:06,130 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.4341 (max= 2.5079), tps=18021, mfu=37.55%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:18:06,130 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.4341 (max= 2.5079), tps=18021, mfu=37.55%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:18:06,130 - 
root - INFO - Step 1830: lr=1.00E-05, loss= 1.4341 (max= 2.5079), tps=18021, mfu=37.55%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:18:06,130 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.4341 (max= 2.5079), tps=18021, mfu=37.55%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:18:06,130 - root - INFO - Step 1830: lr=1.00E-05, loss= 1.4341 (max= 2.5079), tps=18021, mfu=37.55%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:18:14,887 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:1780277 +2025-10-24 10:18:17,118 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:1033588 +2025-10-24 10:18:24,122 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.4175 (max= 2.8172), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:24,123 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.4175 (max= 2.8172), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:24,123 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.4175 (max= 2.8172), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:24,123 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.4175 (max= 2.8172), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:24,123 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.4175 (max= 2.8172), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:24,123 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.4175 (max= 2.8172), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 
0.03%) +2025-10-24 10:18:24,123 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.4175 (max= 2.8172), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:24,123 - root - INFO - Step 1840: lr=1.00E-05, loss= 1.4175 (max= 2.8172), tps=18216, mfu=37.95%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:42,148 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.4205 (max= 2.7238), tps=18182, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:42,148 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.4205 (max= 2.7238), tps=18182, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:42,148 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.4205 (max= 2.7238), tps=18182, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:42,148 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.4205 (max= 2.7238), tps=18182, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:42,148 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.4205 (max= 2.7238), tps=18182, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:42,148 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.4205 (max= 2.7238), tps=18182, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:42,148 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.4205 (max= 2.7238), tps=18182, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:18:42,148 - root - INFO - Step 1850: lr=1.00E-05, loss= 1.4205 (max= 2.7238), tps=18182, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:19:00,183 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.4366 (max= 2.5347), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:00,183 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.4366 (max= 2.5347), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:00,183 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.4366 (max= 2.5347), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:00,183 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.4366 (max= 2.5347), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:00,183 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.4366 (max= 2.5347), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:00,183 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.4366 (max= 2.5347), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:00,183 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.4366 (max= 2.5347), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:00,183 - root - INFO - Step 1860: lr=1.00E-05, loss= 1.4366 (max= 2.5347), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:18,197 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.3818 (max= 2.6945), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:18,197 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.3818 (max= 2.6945), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:18,197 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.3818 (max= 2.6945), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:18,197 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.3818 (max= 2.6945), tps=18194, mfu=37.91%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:18,197 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.3818 (max= 2.6945), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:18,197 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.3818 (max= 2.6945), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:18,197 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.3818 (max= 2.6945), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:18,197 - root - INFO - Step 1870: lr=1.00E-05, loss= 1.3818 (max= 2.6945), tps=18193, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:36,221 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.4028 (max= 3.0822), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:36,221 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.4028 (max= 3.0822), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:36,221 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.4028 (max= 3.0822), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:36,221 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.4028 (max= 3.0822), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:36,221 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.4028 (max= 3.0822), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:36,221 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.4028 (max= 3.0822), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:36,221 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.4028 (max= 
3.0822), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:36,221 - root - INFO - Step 1880: lr=1.00E-05, loss= 1.4028 (max= 3.0822), tps=18184, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:54,238 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.3948 (max= 3.6789), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:54,238 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.3948 (max= 3.6789), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:54,238 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.3948 (max= 3.6789), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:54,238 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.3948 (max= 3.6789), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:54,238 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.3948 (max= 3.6789), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:54,238 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.3948 (max= 3.6789), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:54,238 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.3948 (max= 3.6789), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:19:54,238 - root - INFO - Step 1890: lr=1.00E-05, loss= 1.3948 (max= 3.6789), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:12,253 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.4300 (max= 2.5143), tps=18193, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:12,253 - root - INFO - Step 1900: 
lr=1.00E-05, loss= 1.4300 (max= 2.5143), tps=18193, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:12,253 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.4300 (max= 2.5143), tps=18193, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:12,253 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.4300 (max= 2.5143), tps=18193, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:12,253 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.4300 (max= 2.5143), tps=18193, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:12,253 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.4300 (max= 2.5143), tps=18193, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:12,254 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.4300 (max= 2.5143), tps=18193, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:12,254 - root - INFO - Step 1900: lr=1.00E-05, loss= 1.4300 (max= 2.5143), tps=18193, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:30,244 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.4023 (max= 2.5917), tps=18217, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:30,244 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.4023 (max= 2.5917), tps=18217, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:30,244 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.4023 (max= 2.5917), tps=18217, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:30,244 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.4023 (max= 2.5917), tps=18217, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:30,244 - 
root - INFO - Step 1910: lr=1.00E-05, loss= 1.4023 (max= 2.5917), tps=18217, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:30,244 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.4023 (max= 2.5917), tps=18217, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:30,245 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.4023 (max= 2.5917), tps=18217, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:30,245 - root - INFO - Step 1910: lr=1.00E-05, loss= 1.4023 (max= 2.5917), tps=18217, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:48,298 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.3908 (max= 2.5445), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:48,298 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.3908 (max= 2.5445), tps=18154, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:48,298 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.3908 (max= 2.5445), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:48,298 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.3908 (max= 2.5445), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:48,298 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.3908 (max= 2.5445), tps=18154, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:48,298 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.3908 (max= 2.5445), tps=18154, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:20:48,298 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.3908 (max= 2.5445), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:20:48,298 - root - INFO - Step 1920: lr=1.00E-05, loss= 1.3908 (max= 2.5445), tps=18154, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:06,342 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.4096 (max= 2.4736), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:06,342 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.4096 (max= 2.4736), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:06,342 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.4096 (max= 2.4736), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:06,342 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.4096 (max= 2.4736), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:06,342 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.4096 (max= 2.4736), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:06,342 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.4096 (max= 2.4736), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:06,342 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.4096 (max= 2.4736), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:06,342 - root - INFO - Step 1930: lr=1.00E-05, loss= 1.4096 (max= 2.4736), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:24,394 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.4300 (max= 3.3498), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:24,394 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.4300 (max= 3.3498), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:24,394 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.4300 (max= 3.3498), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:24,394 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.4300 (max= 3.3498), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:24,394 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.4300 (max= 3.3498), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:24,394 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.4300 (max= 3.3498), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:24,395 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.4300 (max= 3.3498), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:24,394 - root - INFO - Step 1940: lr=1.00E-05, loss= 1.4300 (max= 3.3498), tps=18155, mfu=37.83%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:42,601 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.3689 (max= 2.7486), tps=18001, mfu=37.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:42,601 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.3689 (max= 2.7486), tps=18001, mfu=37.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:42,601 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.3689 (max= 2.7486), tps=18001, mfu=37.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:42,601 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.3689 (max= 2.7486), tps=18001, mfu=37.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:42,601 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.3689 (max= 2.7486), tps=18001, mfu=37.51%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:42,601 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.3689 (max= 2.7486), tps=18001, mfu=37.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:42,601 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.3689 (max= 2.7486), tps=18001, mfu=37.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:21:42,601 - root - INFO - Step 1950: lr=1.00E-05, loss= 1.3689 (max= 2.7486), tps=18001, mfu=37.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:00,621 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.4067 (max= 3.1148), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:00,621 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.4067 (max= 3.1148), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:00,621 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.4067 (max= 3.1148), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:00,621 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.4067 (max= 3.1148), tps=18188, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:00,621 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.4067 (max= 3.1148), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:00,621 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.4067 (max= 3.1148), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:00,621 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.4067 (max= 3.1148), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:00,622 - root - INFO - Step 1960: lr=1.00E-05, loss= 1.4067 (max= 
3.1148), tps=18187, mfu=37.89%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:18,655 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.4027 (max= 2.6892), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:18,655 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.4027 (max= 2.6892), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:18,655 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.4027 (max= 2.6892), tps=18174, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:18,655 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.4027 (max= 2.6892), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:18,655 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.4027 (max= 2.6892), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:18,656 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.4027 (max= 2.6892), tps=18174, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:18,656 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.4027 (max= 2.6892), tps=18174, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:18,656 - root - INFO - Step 1970: lr=1.00E-05, loss= 1.4027 (max= 2.6892), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:36,695 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.4019 (max= 2.3489), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:36,695 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.4019 (max= 2.3489), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:36,695 - root - INFO - Step 1980: 
lr=1.00E-05, loss= 1.4019 (max= 2.3489), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:36,695 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.4019 (max= 2.3489), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:36,696 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.4019 (max= 2.3489), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:36,696 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.4019 (max= 2.3489), tps=18167, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:36,696 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.4019 (max= 2.3489), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:36,696 - root - INFO - Step 1980: lr=1.00E-05, loss= 1.4019 (max= 2.3489), tps=18168, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:54,738 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.4149 (max= 2.4648), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:54,738 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.4149 (max= 2.4648), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:54,738 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.4149 (max= 2.4648), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:54,738 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.4149 (max= 2.4648), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:54,738 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.4149 (max= 2.4648), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:54,738 - 
root - INFO - Step 1990: lr=1.00E-05, loss= 1.4149 (max= 2.4648), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:54,738 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.4149 (max= 2.4648), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:22:54,738 - root - INFO - Step 1990: lr=1.00E-05, loss= 1.4149 (max= 2.4648), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-2000 +2025-10-24 10:23:12,783 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.4143 (max= 2.6227), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:23:12,783 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.4143 (max= 2.6227), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:23:12,783 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.4143 (max= 2.6227), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:23:12,783 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.4143 (max= 2.6227), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:23:12,783 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-24 10:23:12,783 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-24 10:23:12,783 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.4143 (max= 2.6227), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:23:12,783 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:23:12,783 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:23:12,783 - root - 
INFO - Saving a full checkpoint at step 2000 +2025-10-24 10:23:12,783 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-24 10:23:12,783 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.4143 (max= 2.6227), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:23:12,783 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-24 10:23:12,783 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:23:12,783 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:23:12,783 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:23:12,783 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-24 10:23:12,783 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:23:12,783 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.4143 (max= 2.6227), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:23:12,783 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-24 10:23:12,783 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:23:12,784 - root - INFO - Step 2000: lr=1.00E-05, loss= 1.4143 (max= 2.6227), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:23:12,785 - root - INFO - Saving a full checkpoint at step 2000 +2025-10-24 10:23:12,785 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-2000! 
Save time: 4.664669752120972 +2025-10-24 10:23:33,849 - root - INFO - Finished saving the checkpoint in 21.07 seconds +2025-10-24 10:23:33,857 - root - INFO - Finished saving the checkpoint in 21.07 seconds +2025-10-24 10:23:33,858 - root - INFO - Finished saving the checkpoint in 21.07 seconds +2025-10-24 10:23:33,859 - root - INFO - Finished saving the checkpoint in 21.08 seconds +2025-10-24 10:23:33,859 - root - INFO - Finished saving the checkpoint in 21.08 seconds +2025-10-24 10:23:33,861 - root - INFO - Finished saving the checkpoint in 21.08 seconds +2025-10-24 10:23:33,861 - root - INFO - Finished saving the checkpoint in 21.08 seconds +2025-10-24 10:23:33,862 - root - INFO - Finished saving the checkpoint in 21.08 seconds +2025-10-24 10:23:51,782 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.3985 (max= 2.3526), tps=8403, mfu=17.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:23:51,782 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.3985 (max= 2.3526), tps=8403, mfu=17.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:23:51,782 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.3985 (max= 2.3526), tps=8403, mfu=17.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:23:51,782 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.3985 (max= 2.3526), tps=8403, mfu=17.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:23:51,782 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.3985 (max= 2.3526), tps=8403, mfu=17.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:23:51,783 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.3985 (max= 2.3526), tps=8403, mfu=17.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:23:51,783 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.3985 (max= 2.3526), tps=8403, mfu=17.51%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:23:51,783 - root - INFO - Step 2010: lr=1.00E-05, loss= 1.3985 (max= 2.3526), tps=8403, mfu=17.51%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:24:09,830 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.4255 (max= 2.8679), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:09,830 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.4255 (max= 2.8679), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:09,830 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.4255 (max= 2.8679), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:09,830 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.4255 (max= 2.8679), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:09,830 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.4255 (max= 2.8679), tps=18166, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:09,830 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.4255 (max= 2.8679), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:09,830 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.4255 (max= 2.8679), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:09,833 - root - INFO - Step 2020: lr=1.00E-05, loss= 1.4255 (max= 2.8679), tps=18161, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:27,860 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.3886 (max= 2.4044), tps=18178, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:27,860 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.3886 (max= 2.4044), tps=18178, mfu=37.87%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:27,860 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.3886 (max= 2.4044), tps=18178, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:27,860 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.3886 (max= 2.4044), tps=18178, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:27,860 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.3886 (max= 2.4044), tps=18178, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:27,860 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.3886 (max= 2.4044), tps=18181, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:27,860 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.3886 (max= 2.4044), tps=18178, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:27,860 - root - INFO - Step 2030: lr=1.00E-05, loss= 1.3886 (max= 2.4044), tps=18178, mfu=37.87%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:24:45,904 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.3866 (max= 2.3568), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:24:45,904 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.3866 (max= 2.3568), tps=18164, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:24:45,904 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.3866 (max= 2.3568), tps=18164, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:24:45,904 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.3866 (max= 2.3568), tps=18164, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:24:45,904 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.3866 (max= 
2.3568), tps=18164, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:24:45,905 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.3866 (max= 2.3568), tps=18164, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:24:45,905 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.3866 (max= 2.3568), tps=18163, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:24:45,905 - root - INFO - Step 2040: lr=1.00E-05, loss= 1.3866 (max= 2.3568), tps=18164, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:03,893 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.4144 (max= 2.7958), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:03,893 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.4144 (max= 2.7958), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:03,893 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.4144 (max= 2.7958), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:03,893 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.4144 (max= 2.7958), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:03,893 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.4144 (max= 2.7958), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:03,893 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.4144 (max= 2.7958), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:03,893 - root - INFO - Step 2050: lr=1.00E-05, loss= 1.4144 (max= 2.7958), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:03,894 - root - INFO - Step 2050: 
lr=1.00E-05, loss= 1.4144 (max= 2.7958), tps=18220, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:21,930 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.3833 (max= 2.5823), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:21,930 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.3833 (max= 2.5823), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:21,930 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.3833 (max= 2.5823), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:21,930 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.3833 (max= 2.5823), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:21,930 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.3833 (max= 2.5823), tps=18172, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:21,930 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.3833 (max= 2.5823), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:21,930 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.3833 (max= 2.5823), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:21,930 - root - INFO - Step 2060: lr=1.00E-05, loss= 1.3833 (max= 2.5823), tps=18171, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:39,978 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.3915 (max= 2.6895), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:39,978 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.3915 (max= 2.6895), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:39,978 - 
root - INFO - Step 2070: lr=1.00E-05, loss= 1.3915 (max= 2.6895), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:39,978 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.3915 (max= 2.6895), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:39,978 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.3915 (max= 2.6895), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:39,978 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.3915 (max= 2.6895), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:39,978 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.3915 (max= 2.6895), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:39,978 - root - INFO - Step 2070: lr=1.00E-05, loss= 1.3915 (max= 2.6895), tps=18160, mfu=37.84%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:25:58,015 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.4018 (max= 3.3197), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:58,015 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.4018 (max= 3.3197), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:58,016 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.4018 (max= 3.3197), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:58,016 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.4018 (max= 3.3197), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:58,016 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.4018 (max= 3.3197), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:25:58,016 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.4018 (max= 3.3197), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:58,016 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.4018 (max= 3.3197), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:25:58,016 - root - INFO - Step 2080: lr=1.00E-05, loss= 1.4018 (max= 3.3197), tps=18170, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:16,029 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.4063 (max= 3.1029), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:16,029 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.4063 (max= 3.1029), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:16,029 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.4063 (max= 3.1029), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:16,029 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.4063 (max= 3.1029), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:16,030 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.4063 (max= 3.1029), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:16,030 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.4063 (max= 3.1029), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:16,030 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.4063 (max= 3.1029), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:16,030 - root - INFO - Step 2090: lr=1.00E-05, loss= 1.4063 (max= 3.1029), tps=18194, mfu=37.91%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:34,019 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.4150 (max= 2.5632), tps=18219, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:34,019 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.4150 (max= 2.5632), tps=18219, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:34,019 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.4150 (max= 2.5632), tps=18219, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:34,019 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.4150 (max= 2.5632), tps=18219, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:34,019 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.4150 (max= 2.5632), tps=18219, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:34,019 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.4150 (max= 2.5632), tps=18219, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:34,019 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.4150 (max= 2.5632), tps=18219, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:34,019 - root - INFO - Step 2100: lr=1.00E-05, loss= 1.4150 (max= 2.5632), tps=18219, mfu=37.96%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:52,031 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.4144 (max= 4.0602), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:52,031 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.4144 (max= 4.0602), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:52,031 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.4144 (max= 4.0602), tps=18197, mfu=37.91%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:52,031 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.4144 (max= 4.0602), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:52,031 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.4144 (max= 4.0602), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:52,031 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.4144 (max= 4.0602), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:52,031 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.4144 (max= 4.0602), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:26:52,031 - root - INFO - Step 2110: lr=1.00E-05, loss= 1.4144 (max= 4.0602), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:27:10,036 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.4063 (max= 2.5335), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:27:10,036 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.4063 (max= 2.5335), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:27:10,036 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.4063 (max= 2.5335), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:27:10,036 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.4063 (max= 2.5335), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:27:10,036 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.4063 (max= 2.5335), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:27:10,036 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.4063 (max= 
2.5335), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:27:10,036 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.4063 (max= 2.5335), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:27:10,036 - root - INFO - Step 2120: lr=1.00E-05, loss= 1.4063 (max= 2.5335), tps=18203, mfu=37.93%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:27:28,070 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.4132 (max= 3.4430), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:28,070 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.4132 (max= 3.4430), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:28,070 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.4132 (max= 3.4430), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:28,071 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.4132 (max= 3.4430), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:28,071 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.4132 (max= 3.4430), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:28,071 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.4132 (max= 3.4430), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:28,071 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.4132 (max= 3.4430), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:28,071 - root - INFO - Step 2130: lr=1.00E-05, loss= 1.4132 (max= 3.4430), tps=18173, mfu=37.86%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:46,098 - root - INFO - Step 2140: 
lr=1.00E-05, loss= 1.4309 (max= 3.0014), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:46,098 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.4309 (max= 3.0014), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:46,098 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.4309 (max= 3.0014), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:46,098 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.4309 (max= 3.0014), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:46,098 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.4309 (max= 3.0014), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:46,098 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.4309 (max= 3.0014), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:46,098 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.4309 (max= 3.0014), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:27:46,098 - root - INFO - Step 2140: lr=1.00E-05, loss= 1.4309 (max= 3.0014), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:04,126 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.4120 (max= 3.2652), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:04,126 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.4120 (max= 3.2652), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:04,126 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.4120 (max= 3.2652), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:04,126 - 
root - INFO - Step 2150: lr=1.00E-05, loss= 1.4120 (max= 3.2652), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:04,126 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.4120 (max= 3.2652), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:04,126 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.4120 (max= 3.2652), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:04,126 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.4120 (max= 3.2652), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:04,127 - root - INFO - Step 2150: lr=1.00E-05, loss= 1.4120 (max= 3.2652), tps=18180, mfu=37.88%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:22,138 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.4076 (max= 3.4379), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:22,138 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.4076 (max= 3.4379), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:22,138 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.4076 (max= 3.4379), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:22,138 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.4076 (max= 3.4379), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:22,138 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.4076 (max= 3.4379), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:22,138 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.4076 (max= 3.4379), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 10:28:22,138 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.4076 (max= 3.4379), tps=18196, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:22,138 - root - INFO - Step 2160: lr=1.00E-05, loss= 1.4076 (max= 3.4379), tps=18197, mfu=37.91%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:40,156 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.4098 (max= 2.5731), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:40,156 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.4098 (max= 2.5731), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:40,156 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.4098 (max= 2.5731), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:40,156 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.4098 (max= 2.5731), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:40,156 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.4098 (max= 2.5731), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:40,156 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.4098 (max= 2.5731), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:40,156 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.4098 (max= 2.5731), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:40,156 - root - INFO - Step 2170: lr=1.00E-05, loss= 1.4098 (max= 2.5731), tps=18190, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:28:58,175 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.4373 (max= 2.4259), tps=18189, mfu=37.90%, memory: 78.53GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:28:58,175 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.4373 (max= 2.4259), tps=18189, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:28:58,175 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.4373 (max= 2.4259), tps=18189, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:28:58,175 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.4373 (max= 2.4259), tps=18189, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:28:58,175 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.4373 (max= 2.4259), tps=18189, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:28:58,175 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.4373 (max= 2.4259), tps=18189, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:28:58,175 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.4373 (max= 2.4259), tps=18189, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:28:58,176 - root - INFO - Step 2180: lr=1.00E-05, loss= 1.4373 (max= 2.4259), tps=18188, mfu=37.90%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:16,229 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.4197 (max= 3.9108), tps=18153, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:16,229 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.4197 (max= 3.9108), tps=18154, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:16,229 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.4197 (max= 3.9108), tps=18153, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:16,229 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.4197 (max= 3.9108), tps=18154, mfu=37.82%, 
memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:16,229 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.4197 (max= 3.9108), tps=18154, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:16,230 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.4197 (max= 3.9108), tps=18154, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:16,230 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.4197 (max= 3.9108), tps=18154, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:16,230 - root - INFO - Step 2190: lr=1.00E-05, loss= 1.4197 (max= 3.9108), tps=18153, mfu=37.82%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:34,297 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.3693 (max= 2.4807), tps=18140, mfu=37.79%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:34,297 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.3693 (max= 2.4807), tps=18140, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:34,297 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.3693 (max= 2.4807), tps=18140, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:34,297 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.3693 (max= 2.4807), tps=18140, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:34,297 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.3693 (max= 2.4807), tps=18140, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:34,297 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.3693 (max= 2.4807), tps=18140, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:34,297 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.3693 (max= 
2.4807), tps=18140, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:34,297 - root - INFO - Step 2200: lr=1.00E-05, loss= 1.3693 (max= 2.4807), tps=18140, mfu=37.80%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:52,339 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.3999 (max= 3.0286), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:52,339 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.3999 (max= 3.0286), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:52,339 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.3999 (max= 3.0286), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:52,340 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.3999 (max= 3.0286), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:52,340 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.3999 (max= 3.0286), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:52,340 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.3999 (max= 3.0286), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:52,340 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.3999 (max= 3.0286), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:29:52,340 - root - INFO - Step 2210: lr=1.00E-05, loss= 1.3999 (max= 3.0286), tps=18165, mfu=37.85%, memory: 78.53GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:10,396 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.4018 (max= 2.2646), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:10,396 - root - INFO - Step 2220: 
lr=1.00E-05, loss= 1.4018 (max= 2.2646), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:10,396 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.4018 (max= 2.2646), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:10,396 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.4018 (max= 2.2646), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:10,397 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.4018 (max= 2.2646), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:10,397 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.4018 (max= 2.2646), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:10,397 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.4018 (max= 2.2646), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:10,397 - root - INFO - Step 2220: lr=1.00E-05, loss= 1.4018 (max= 2.2646), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:28,430 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.3817 (max= 2.3148), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:28,431 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.3817 (max= 2.3148), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:28,431 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.3817 (max= 2.3148), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:28,431 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.3817 (max= 2.3148), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:28,431 - 
root - INFO - Step 2230: lr=1.00E-05, loss= 1.3817 (max= 2.3148), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:28,431 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.3817 (max= 2.3148), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:28,431 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.3817 (max= 2.3148), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:28,431 - root - INFO - Step 2230: lr=1.00E-05, loss= 1.3817 (max= 2.3148), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:46,442 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.4155 (max= 2.7230), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:46,442 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.4155 (max= 2.7230), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:46,442 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.4155 (max= 2.7230), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:46,442 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.4155 (max= 2.7230), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:46,442 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.4155 (max= 2.7230), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:46,442 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.4155 (max= 2.7230), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:30:46,442 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.4155 (max= 2.7230), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:30:46,442 - root - INFO - Step 2240: lr=1.00E-05, loss= 1.4155 (max= 2.7230), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:31:04,510 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.4049 (max= 2.5348), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:31:04,510 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.4049 (max= 2.5348), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:31:04,510 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.4049 (max= 2.5348), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:31:04,510 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.4049 (max= 2.5348), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:31:04,510 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.4049 (max= 2.5348), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:31:04,510 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.4049 (max= 2.5348), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:31:04,510 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.4049 (max= 2.5348), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:31:04,510 - root - INFO - Step 2250: lr=1.00E-05, loss= 1.4049 (max= 2.5348), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:31:22,559 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.4017 (max= 2.5553), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:22,559 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.4017 (max= 2.5553), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:22,560 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.4017 (max= 2.5553), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:22,560 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.4017 (max= 2.5553), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:22,560 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.4017 (max= 2.5553), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:22,560 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.4017 (max= 2.5553), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:22,560 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.4017 (max= 2.5553), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:22,560 - root - INFO - Step 2260: lr=1.00E-05, loss= 1.4017 (max= 2.5553), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:40,552 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.4250 (max= 2.4524), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:40,552 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.4250 (max= 2.4524), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:40,552 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.4250 (max= 2.4524), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:40,553 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.4250 (max= 2.4524), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:40,553 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.4250 (max= 2.4524), tps=18216, mfu=37.95%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:40,553 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.4250 (max= 2.4524), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:40,553 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.4250 (max= 2.4524), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:40,553 - root - INFO - Step 2270: lr=1.00E-05, loss= 1.4250 (max= 2.4524), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:58,568 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.3954 (max= 2.4549), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:58,568 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.3954 (max= 2.4549), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:58,569 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.3954 (max= 2.4549), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:58,569 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.3954 (max= 2.4549), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:58,569 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.3954 (max= 2.4549), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:58,569 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.3954 (max= 2.4549), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:58,569 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.3954 (max= 2.4549), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:31:58,569 - root - INFO - Step 2280: lr=1.00E-05, loss= 1.3954 (max= 
2.4549), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:16,548 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.3608 (max= 2.3837), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:16,548 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.3608 (max= 2.3837), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:16,548 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.3608 (max= 2.3837), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:16,548 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.3608 (max= 2.3837), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:16,548 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.3608 (max= 2.3837), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:16,548 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.3608 (max= 2.3837), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:16,548 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.3608 (max= 2.3837), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:16,548 - root - INFO - Step 2290: lr=1.00E-05, loss= 1.3608 (max= 2.3837), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:34,600 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.3538 (max= 2.4049), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:34,600 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.3538 (max= 2.4049), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:34,600 - root - INFO - Step 2300: 
lr=1.00E-05, loss= 1.3538 (max= 2.4049), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:34,600 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.3538 (max= 2.4049), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:34,600 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.3538 (max= 2.4049), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:34,600 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.3538 (max= 2.4049), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:34,600 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.3538 (max= 2.4049), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:34,600 - root - INFO - Step 2300: lr=1.00E-05, loss= 1.3538 (max= 2.4049), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:52,603 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.3726 (max= 2.4679), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:52,603 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.3726 (max= 2.4679), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:52,604 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.3726 (max= 2.4679), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:52,604 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.3726 (max= 2.4679), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:52,604 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.3726 (max= 2.4679), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:52,604 - 
root - INFO - Step 2310: lr=1.00E-05, loss= 1.3726 (max= 2.4679), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:52,604 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.3726 (max= 2.4679), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:32:52,604 - root - INFO - Step 2310: lr=1.00E-05, loss= 1.3726 (max= 2.4679), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:33:10,638 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.3843 (max= 2.4642), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:10,638 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.3843 (max= 2.4642), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:10,639 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.3843 (max= 2.4642), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:10,639 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.3843 (max= 2.4642), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:10,639 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.3843 (max= 2.4642), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:10,639 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.3843 (max= 2.4642), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:10,639 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.3843 (max= 2.4642), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:10,639 - root - INFO - Step 2320: lr=1.00E-05, loss= 1.3843 (max= 2.4642), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:33:28,661 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.3700 (max= 2.8604), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:28,661 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.3700 (max= 2.8604), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:28,661 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.3700 (max= 2.8604), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:28,661 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.3700 (max= 2.8604), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:28,661 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.3700 (max= 2.8604), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:28,661 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.3700 (max= 2.8604), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:28,661 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.3700 (max= 2.8604), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:28,661 - root - INFO - Step 2330: lr=1.00E-05, loss= 1.3700 (max= 2.8604), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:46,729 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.3766 (max= 2.5350), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:46,729 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.3766 (max= 2.5350), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:46,729 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.3766 (max= 2.5350), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:46,729 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.3766 (max= 2.5350), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:46,729 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.3766 (max= 2.5350), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:46,729 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.3766 (max= 2.5350), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:46,729 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.3766 (max= 2.5350), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:33:46,729 - root - INFO - Step 2340: lr=1.00E-05, loss= 1.3766 (max= 2.5350), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:04,762 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.4121 (max= 2.5739), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:04,762 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.4121 (max= 2.5739), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:04,762 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.4121 (max= 2.5739), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:04,762 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.4121 (max= 2.5739), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:04,762 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.4121 (max= 2.5739), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:04,762 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.4121 (max= 2.5739), tps=18175, mfu=37.87%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:04,762 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.4121 (max= 2.5739), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:04,762 - root - INFO - Step 2350: lr=1.00E-05, loss= 1.4121 (max= 2.5739), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:22,767 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.3767 (max= 2.6270), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:22,767 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.3767 (max= 2.6270), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:22,767 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.3767 (max= 2.6270), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:22,767 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.3767 (max= 2.6270), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:22,767 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.3767 (max= 2.6270), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:22,767 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.3767 (max= 2.6270), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:22,767 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.3767 (max= 2.6270), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:22,767 - root - INFO - Step 2360: lr=1.00E-05, loss= 1.3767 (max= 2.6270), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:34:40,776 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.3830 (max= 
2.5120), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:40,776 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.3830 (max= 2.5120), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:40,776 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.3830 (max= 2.5120), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:40,777 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.3830 (max= 2.5120), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:40,777 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.3830 (max= 2.5120), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:40,777 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.3830 (max= 2.5120), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:40,777 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.3830 (max= 2.5120), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:40,777 - root - INFO - Step 2370: lr=1.00E-05, loss= 1.3830 (max= 2.5120), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:58,786 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.3975 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:58,786 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.3975 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:58,786 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.3975 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:58,786 - root - INFO - Step 2380: 
lr=1.00E-05, loss= 1.3975 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:58,786 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.3975 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:58,787 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.3975 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:58,787 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.3975 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:34:58,787 - root - INFO - Step 2380: lr=1.00E-05, loss= 1.3975 (max= 2.8266), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:16,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.3914 (max= 2.6178), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:16,789 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.3914 (max= 2.6178), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:16,790 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.3914 (max= 2.6178), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:16,790 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.3914 (max= 2.6178), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:16,790 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.3914 (max= 2.6178), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:16,790 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.3914 (max= 2.6178), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:16,790 - 
root - INFO - Step 2390: lr=1.00E-05, loss= 1.3914 (max= 2.6178), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:16,790 - root - INFO - Step 2390: lr=1.00E-05, loss= 1.3914 (max= 2.6178), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:34,815 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.3652 (max= 3.1046), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:34,815 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.3652 (max= 3.1046), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:34,815 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.3652 (max= 3.1046), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:34,816 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.3652 (max= 3.1046), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:34,816 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.3652 (max= 3.1046), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:34,816 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.3652 (max= 3.1046), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:34,816 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.3652 (max= 3.1046), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:34,816 - root - INFO - Step 2400: lr=1.00E-05, loss= 1.3652 (max= 3.1046), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:52,826 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.3666 (max= 2.6062), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:35:52,826 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.3666 (max= 2.6062), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:52,826 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.3666 (max= 2.6062), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:52,826 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.3666 (max= 2.6062), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:52,826 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.3666 (max= 2.6062), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:52,826 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.3666 (max= 2.6062), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:52,826 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.3666 (max= 2.6062), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:35:52,826 - root - INFO - Step 2410: lr=1.00E-05, loss= 1.3666 (max= 2.6062), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:10,881 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.3900 (max= 3.1369), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:36:10,881 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.3900 (max= 3.1369), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:36:10,882 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.3900 (max= 3.1369), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:36:10,882 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.3900 (max= 3.1369), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:36:10,882 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.3900 (max= 3.1369), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:36:10,882 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.3900 (max= 3.1369), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:36:10,882 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.3900 (max= 3.1369), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:36:10,882 - root - INFO - Step 2420: lr=1.00E-05, loss= 1.3900 (max= 3.1369), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:36:28,992 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.3707 (max= 2.2913), tps=18097, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:28,992 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.3707 (max= 2.2913), tps=18097, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:28,992 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.3707 (max= 2.2913), tps=18097, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:28,992 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.3707 (max= 2.2913), tps=18097, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:28,992 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.3707 (max= 2.2913), tps=18097, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:28,992 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.3707 (max= 2.2913), tps=18097, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:28,992 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.3707 (max= 2.2913), tps=18097, mfu=37.71%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:28,992 - root - INFO - Step 2430: lr=1.00E-05, loss= 1.3707 (max= 2.2913), tps=18097, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:48,563 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.3871 (max= 3.2734), tps=16747, mfu=34.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:48,563 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.3871 (max= 3.2734), tps=16747, mfu=34.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:48,563 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.3871 (max= 3.2734), tps=16747, mfu=34.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:48,563 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.3871 (max= 3.2734), tps=16747, mfu=34.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:48,563 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.3871 (max= 3.2734), tps=16748, mfu=34.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:48,564 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.3871 (max= 3.2734), tps=16748, mfu=34.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:48,564 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.3871 (max= 3.2734), tps=16747, mfu=34.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:36:48,564 - root - INFO - Step 2440: lr=1.00E-05, loss= 1.3871 (max= 3.2734), tps=16747, mfu=34.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:08,824 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.3636 (max= 2.8453), tps=16177, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:08,824 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.3636 (max= 
2.8453), tps=16177, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:08,824 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.3636 (max= 2.8453), tps=16177, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:08,824 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.3636 (max= 2.8453), tps=16177, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:08,824 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.3636 (max= 2.8453), tps=16177, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:08,824 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.3636 (max= 2.8453), tps=16177, mfu=33.70%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:08,824 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.3636 (max= 2.8453), tps=16177, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:08,825 - root - INFO - Step 2450: lr=1.00E-05, loss= 1.3636 (max= 2.8453), tps=16177, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:29,060 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.4086 (max= 3.5438), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:29,060 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.4086 (max= 3.5438), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:29,060 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.4086 (max= 3.5438), tps=16197, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:29,060 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.4086 (max= 3.5438), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:29,060 - root - INFO - Step 2460: 
lr=1.00E-05, loss= 1.4086 (max= 3.5438), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:29,060 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.4086 (max= 3.5438), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:29,060 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.4086 (max= 3.5438), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:29,060 - root - INFO - Step 2460: lr=1.00E-05, loss= 1.4086 (max= 3.5438), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:37:56,177 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.3610 (max= 2.4927), tps=12086, mfu=25.18%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 26.42%) +2025-10-24 10:37:56,177 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.3610 (max= 2.4927), tps=12086, mfu=25.18%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 26.42%) +2025-10-24 10:37:56,177 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.3610 (max= 2.4927), tps=12086, mfu=25.18%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 26.42%) +2025-10-24 10:37:56,177 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.3610 (max= 2.4927), tps=12086, mfu=25.18%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 26.42%) +2025-10-24 10:37:56,177 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.3610 (max= 2.4927), tps=12086, mfu=25.18%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 26.42%) +2025-10-24 10:37:56,177 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.3610 (max= 2.4927), tps=12086, mfu=25.18%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 26.42%) +2025-10-24 10:37:56,178 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.3610 (max= 2.4927), tps=12086, mfu=25.18%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 26.42%) +2025-10-24 
10:37:56,178 - root - INFO - Step 2470: lr=1.00E-05, loss= 1.3610 (max= 2.4927), tps=12086, mfu=25.18%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 26.42%) +2025-10-24 10:38:16,399 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.3455 (max= 2.9554), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:16,399 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.3455 (max= 2.9554), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:16,400 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.3455 (max= 2.9554), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:16,400 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.3455 (max= 2.9554), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:16,400 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.3455 (max= 2.9554), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:16,400 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.3455 (max= 2.9554), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:16,400 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.3455 (max= 2.9554), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:16,400 - root - INFO - Step 2480: lr=1.00E-05, loss= 1.3455 (max= 2.9554), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:36,675 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.3799 (max= 2.7021), tps=16165, mfu=33.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:36,676 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.3799 (max= 2.7021), tps=16165, mfu=33.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 10:38:36,676 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.3799 (max= 2.7021), tps=16165, mfu=33.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:36,676 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.3799 (max= 2.7021), tps=16165, mfu=33.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:36,676 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.3799 (max= 2.7021), tps=16165, mfu=33.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:36,676 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.3799 (max= 2.7021), tps=16165, mfu=33.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:36,676 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.3799 (max= 2.7021), tps=16165, mfu=33.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:36,676 - root - INFO - Step 2490: lr=1.00E-05, loss= 1.3799 (max= 2.7021), tps=16165, mfu=33.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:56,933 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.3568 (max= 2.6496), tps=16180, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:56,933 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.3568 (max= 2.6496), tps=16180, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:56,933 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.3568 (max= 2.6496), tps=16180, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:56,934 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.3568 (max= 2.6496), tps=16180, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:56,934 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.3568 (max= 2.6496), tps=16180, mfu=33.71%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:56,934 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.3568 (max= 2.6496), tps=16180, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:56,934 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.3568 (max= 2.6496), tps=16180, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:38:56,934 - root - INFO - Step 2500: lr=1.00E-05, loss= 1.3568 (max= 2.6496), tps=16180, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:16,191 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.3738 (max= 2.8526), tps=17019, mfu=35.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:16,191 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.3738 (max= 2.8526), tps=17019, mfu=35.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:16,191 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.3738 (max= 2.8526), tps=17019, mfu=35.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:16,191 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.3738 (max= 2.8526), tps=17019, mfu=35.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:16,191 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.3738 (max= 2.8526), tps=17019, mfu=35.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:16,191 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.3738 (max= 2.8526), tps=17019, mfu=35.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:16,191 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.3738 (max= 2.8526), tps=17019, mfu=35.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:16,191 - root - INFO - Step 2510: lr=1.00E-05, loss= 1.3738 (max= 2.8526), tps=17019, mfu=35.46%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:34,228 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.3977 (max= 2.6383), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:34,228 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.3977 (max= 2.6383), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:34,228 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.3977 (max= 2.6383), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:34,228 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.3977 (max= 2.6383), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:34,228 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.3977 (max= 2.6383), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:34,228 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.3977 (max= 2.6383), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:34,228 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.3977 (max= 2.6383), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:34,228 - root - INFO - Step 2520: lr=1.00E-05, loss= 1.3977 (max= 2.6383), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:39:52,270 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.3867 (max= 2.3751), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:39:52,270 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.3867 (max= 2.3751), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:39:52,270 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.3867 (max= 
2.3751), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:39:52,270 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.3867 (max= 2.3751), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:39:52,271 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.3867 (max= 2.3751), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:39:52,271 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.3867 (max= 2.3751), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:39:52,271 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.3867 (max= 2.3751), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:39:52,271 - root - INFO - Step 2530: lr=1.00E-05, loss= 1.3867 (max= 2.3751), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:40:10,286 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.3614 (max= 2.4856), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:10,286 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.3614 (max= 2.4856), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:10,286 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.3614 (max= 2.4856), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:10,286 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.3614 (max= 2.4856), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:10,286 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.3614 (max= 2.4856), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:10,286 - root - INFO - Step 2540: 
lr=1.00E-05, loss= 1.3614 (max= 2.4856), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:10,286 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.3614 (max= 2.4856), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:10,286 - root - INFO - Step 2540: lr=1.00E-05, loss= 1.3614 (max= 2.4856), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:28,273 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.3667 (max= 2.6699), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:28,273 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.3667 (max= 2.6699), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:28,273 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.3667 (max= 2.6699), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:28,273 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.3667 (max= 2.6699), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:28,273 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.3667 (max= 2.6699), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:28,273 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.3667 (max= 2.6699), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:28,273 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.3667 (max= 2.6699), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:28,273 - root - INFO - Step 2550: lr=1.00E-05, loss= 1.3667 (max= 2.6699), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:48,190 - 
root - INFO - Step 2560: lr=1.00E-05, loss= 1.3760 (max= 3.4654), tps=16456, mfu=34.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:48,190 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.3760 (max= 3.4654), tps=16456, mfu=34.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:48,190 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.3760 (max= 3.4654), tps=16456, mfu=34.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:48,190 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.3760 (max= 3.4654), tps=16456, mfu=34.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:48,191 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.3760 (max= 3.4654), tps=16456, mfu=34.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:48,191 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.3760 (max= 3.4654), tps=16456, mfu=34.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:48,191 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.3760 (max= 3.4654), tps=16457, mfu=34.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:40:48,191 - root - INFO - Step 2560: lr=1.00E-05, loss= 1.3760 (max= 3.4654), tps=16456, mfu=34.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:08,445 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.3812 (max= 2.3621), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:08,445 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.3812 (max= 2.3621), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:08,445 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.3812 (max= 2.3621), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:41:08,445 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.3812 (max= 2.3621), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:08,445 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.3812 (max= 2.3621), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:08,445 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.3812 (max= 2.3621), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:08,445 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.3812 (max= 2.3621), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:08,445 - root - INFO - Step 2570: lr=1.00E-05, loss= 1.3812 (max= 2.3621), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:28,646 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.3579 (max= 2.3892), tps=16225, mfu=33.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:28,646 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.3579 (max= 2.3892), tps=16225, mfu=33.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:28,646 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.3579 (max= 2.3892), tps=16225, mfu=33.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:28,647 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.3579 (max= 2.3892), tps=16225, mfu=33.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:28,647 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.3579 (max= 2.3892), tps=16225, mfu=33.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:28,647 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.3579 (max= 2.3892), tps=16225, mfu=33.80%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:28,647 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.3579 (max= 2.3892), tps=16225, mfu=33.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:28,647 - root - INFO - Step 2580: lr=1.00E-05, loss= 1.3579 (max= 2.3892), tps=16225, mfu=33.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:48,905 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.3541 (max= 2.3409), tps=16179, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:48,905 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.3541 (max= 2.3409), tps=16179, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:48,905 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.3541 (max= 2.3409), tps=16179, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:48,905 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.3541 (max= 2.3409), tps=16179, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:48,905 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.3541 (max= 2.3409), tps=16179, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:48,905 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.3541 (max= 2.3409), tps=16179, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:48,905 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.3541 (max= 2.3409), tps=16179, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:41:48,905 - root - INFO - Step 2590: lr=1.00E-05, loss= 1.3541 (max= 2.3409), tps=16179, mfu=33.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:09,123 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.3455 (max= 2.4499), tps=16212, mfu=33.78%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:09,123 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.3455 (max= 2.4499), tps=16212, mfu=33.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:09,123 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.3455 (max= 2.4499), tps=16212, mfu=33.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:09,123 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.3455 (max= 2.4499), tps=16212, mfu=33.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:09,123 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.3455 (max= 2.4499), tps=16212, mfu=33.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:09,123 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.3455 (max= 2.4499), tps=16212, mfu=33.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:09,123 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.3455 (max= 2.4499), tps=16212, mfu=33.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:09,123 - root - INFO - Step 2600: lr=1.00E-05, loss= 1.3455 (max= 2.4499), tps=16212, mfu=33.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:29,392 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.3418 (max= 3.1401), tps=16171, mfu=33.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:29,392 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.3418 (max= 3.1401), tps=16171, mfu=33.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:29,392 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.3418 (max= 3.1401), tps=16171, mfu=33.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:29,392 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.3418 (max= 
3.1401), tps=16171, mfu=33.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:29,392 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.3418 (max= 3.1401), tps=16171, mfu=33.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:29,392 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.3418 (max= 3.1401), tps=16171, mfu=33.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:29,392 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.3418 (max= 3.1401), tps=16171, mfu=33.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:29,393 - root - INFO - Step 2610: lr=1.00E-05, loss= 1.3418 (max= 3.1401), tps=16170, mfu=33.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:49,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.4043 (max= 2.6565), tps=16195, mfu=33.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:49,630 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.4043 (max= 2.6565), tps=16195, mfu=33.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:49,631 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.4043 (max= 2.6565), tps=16195, mfu=33.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:49,631 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.4043 (max= 2.6565), tps=16195, mfu=33.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:49,631 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.4043 (max= 2.6565), tps=16195, mfu=33.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:49,631 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.4043 (max= 2.6565), tps=16195, mfu=33.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:49,631 - root - INFO - Step 2620: 
lr=1.00E-05, loss= 1.4043 (max= 2.6565), tps=16195, mfu=33.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:42:49,631 - root - INFO - Step 2620: lr=1.00E-05, loss= 1.4043 (max= 2.6565), tps=16196, mfu=33.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:09,300 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.3467 (max= 2.3036), tps=16662, mfu=34.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:09,300 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.3467 (max= 2.3036), tps=16662, mfu=34.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:09,300 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.3467 (max= 2.3036), tps=16662, mfu=34.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:09,301 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.3467 (max= 2.3036), tps=16662, mfu=34.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:09,301 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.3467 (max= 2.3036), tps=16662, mfu=34.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:09,301 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.3467 (max= 2.3036), tps=16662, mfu=34.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:09,301 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.3467 (max= 2.3036), tps=16662, mfu=34.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:09,301 - root - INFO - Step 2630: lr=1.00E-05, loss= 1.3467 (max= 2.3036), tps=16662, mfu=34.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:27,332 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.3798 (max= 2.5915), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:27,332 - 
root - INFO - Step 2640: lr=1.00E-05, loss= 1.3798 (max= 2.5915), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:27,332 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.3798 (max= 2.5915), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:27,332 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.3798 (max= 2.5915), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:27,332 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.3798 (max= 2.5915), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:27,332 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.3798 (max= 2.5915), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:27,332 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.3798 (max= 2.5915), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:27,332 - root - INFO - Step 2640: lr=1.00E-05, loss= 1.3798 (max= 2.5915), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:45,347 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.3474 (max= 2.7793), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:45,347 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.3474 (max= 2.7793), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:45,347 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.3474 (max= 2.7793), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:45,347 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.3474 (max= 2.7793), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:43:45,347 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.3474 (max= 2.7793), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:45,347 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.3474 (max= 2.7793), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:45,348 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.3474 (max= 2.7793), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:43:45,348 - root - INFO - Step 2650: lr=1.00E-05, loss= 1.3474 (max= 2.7793), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:03,372 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.3656 (max= 2.4230), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:03,372 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.3656 (max= 2.4230), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:03,372 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.3656 (max= 2.4230), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:03,372 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.3656 (max= 2.4230), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:03,373 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.3656 (max= 2.4230), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:03,373 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.3656 (max= 2.4230), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:03,373 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.3656 (max= 2.4230), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:03,373 - root - INFO - Step 2660: lr=1.00E-05, loss= 1.3656 (max= 2.4230), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:21,385 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.3928 (max= 2.4427), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:21,385 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.3928 (max= 2.4427), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:21,385 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.3928 (max= 2.4427), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:21,385 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.3928 (max= 2.4427), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:21,386 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.3928 (max= 2.4427), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:21,386 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.3928 (max= 2.4427), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:21,386 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.3928 (max= 2.4427), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:21,386 - root - INFO - Step 2670: lr=1.00E-05, loss= 1.3928 (max= 2.4427), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:39,389 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.3731 (max= 3.6176), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:39,389 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.3731 (max= 3.6176), tps=18204, mfu=37.93%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:39,389 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.3731 (max= 3.6176), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:39,389 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.3731 (max= 3.6176), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:39,389 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.3731 (max= 3.6176), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:39,389 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.3731 (max= 3.6176), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:39,389 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.3731 (max= 3.6176), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:39,389 - root - INFO - Step 2680: lr=1.00E-05, loss= 1.3731 (max= 3.6176), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:44:57,450 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.3620 (max= 2.4899), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:57,451 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.3620 (max= 2.4899), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:57,451 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.3620 (max= 2.4899), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:57,451 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.3620 (max= 2.4899), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:57,451 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.3620 (max= 
2.4899), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:57,451 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.3620 (max= 2.4899), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:57,451 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.3620 (max= 2.4899), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:44:57,451 - root - INFO - Step 2690: lr=1.00E-05, loss= 1.3620 (max= 2.4899), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:45:15,504 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.3849 (max= 2.5231), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:15,505 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.3849 (max= 2.5231), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:15,505 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.3849 (max= 2.5231), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:15,505 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.3849 (max= 2.5231), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:15,505 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.3849 (max= 2.5231), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:15,505 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.3849 (max= 2.5231), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:15,505 - root - INFO - Step 2700: lr=1.00E-05, loss= 1.3849 (max= 2.5231), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:15,505 - root - INFO - Step 2700: 
lr=1.00E-05, loss= 1.3849 (max= 2.5231), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:33,535 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.3864 (max= 2.5990), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:33,536 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.3864 (max= 2.5990), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:33,536 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.3864 (max= 2.5990), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:33,536 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.3864 (max= 2.5990), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:33,536 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.3864 (max= 2.5990), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:33,536 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.3864 (max= 2.5990), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:33,536 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.3864 (max= 2.5990), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:33,536 - root - INFO - Step 2710: lr=1.00E-05, loss= 1.3864 (max= 2.5990), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:51,568 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.3541 (max= 2.3497), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:51,568 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.3541 (max= 2.3497), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:51,568 - 
root - INFO - Step 2720: lr=1.00E-05, loss= 1.3541 (max= 2.3497), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:51,569 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.3541 (max= 2.3497), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:51,569 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.3541 (max= 2.3497), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:51,569 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.3541 (max= 2.3497), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:51,569 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.3541 (max= 2.3497), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:45:51,569 - root - INFO - Step 2720: lr=1.00E-05, loss= 1.3541 (max= 2.3497), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:09,620 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.3823 (max= 2.7590), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:09,621 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.3823 (max= 2.7590), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:09,621 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.3823 (max= 2.7590), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:09,621 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.3823 (max= 2.7590), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:09,621 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.3823 (max= 2.7590), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:46:09,621 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.3823 (max= 2.7590), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:09,621 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.3823 (max= 2.7590), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:09,621 - root - INFO - Step 2730: lr=1.00E-05, loss= 1.3823 (max= 2.7590), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:27,658 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.3528 (max= 3.3922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:27,658 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.3528 (max= 3.3922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:27,658 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.3528 (max= 3.3922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:27,658 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.3528 (max= 3.3922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:27,658 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.3528 (max= 3.3922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:27,658 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.3528 (max= 3.3922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:27,658 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.3528 (max= 3.3922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:27,658 - root - INFO - Step 2740: lr=1.00E-05, loss= 1.3528 (max= 3.3922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:45,714 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.3728 (max= 2.3593), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:45,714 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.3728 (max= 2.3593), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:45,714 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.3728 (max= 2.3593), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:45,714 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.3728 (max= 2.3593), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:45,714 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.3728 (max= 2.3593), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:45,714 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.3728 (max= 2.3593), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:45,714 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.3728 (max= 2.3593), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:46:45,714 - root - INFO - Step 2750: lr=1.00E-05, loss= 1.3728 (max= 2.3593), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:03,751 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.3711 (max= 2.5763), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:03,751 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.3711 (max= 2.5763), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:03,751 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.3711 (max= 2.5763), tps=18171, mfu=37.86%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:03,751 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.3711 (max= 2.5763), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:03,751 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.3711 (max= 2.5763), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:03,751 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.3711 (max= 2.5763), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:03,751 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.3711 (max= 2.5763), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:03,751 - root - INFO - Step 2760: lr=1.00E-05, loss= 1.3711 (max= 2.5763), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:21,819 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.4129 (max= 2.7709), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:21,820 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.4129 (max= 2.7709), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:21,820 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.4129 (max= 2.7709), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:21,820 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.4129 (max= 2.7709), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:21,820 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.4129 (max= 2.7709), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:21,820 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.4129 (max= 
2.7709), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:21,820 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.4129 (max= 2.7709), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:21,820 - root - INFO - Step 2770: lr=1.00E-05, loss= 1.4129 (max= 2.7709), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:39,849 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.3874 (max= 2.4562), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:39,849 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.3874 (max= 2.4562), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:39,849 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.3874 (max= 2.4562), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:39,849 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.3874 (max= 2.4562), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:39,850 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.3874 (max= 2.4562), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:39,850 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.3874 (max= 2.4562), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:39,850 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.3874 (max= 2.4562), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:39,850 - root - INFO - Step 2780: lr=1.00E-05, loss= 1.3874 (max= 2.4562), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:57,865 - root - INFO - Step 2790: 
lr=1.00E-05, loss= 1.3617 (max= 2.5490), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:57,865 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.3617 (max= 2.5490), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:57,865 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.3617 (max= 2.5490), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:57,865 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.3617 (max= 2.5490), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:57,865 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.3617 (max= 2.5490), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:57,865 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.3617 (max= 2.5490), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:57,865 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.3617 (max= 2.5490), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:47:57,865 - root - INFO - Step 2790: lr=1.00E-05, loss= 1.3617 (max= 2.5490), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:15,913 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.4060 (max= 2.8332), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:15,913 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.4060 (max= 2.8332), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:15,913 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.4060 (max= 2.8332), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:15,914 - 
root - INFO - Step 2800: lr=1.00E-05, loss= 1.4060 (max= 2.8332), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:15,914 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.4060 (max= 2.8332), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:15,914 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.4060 (max= 2.8332), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:15,914 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.4060 (max= 2.8332), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:15,914 - root - INFO - Step 2800: lr=1.00E-05, loss= 1.4060 (max= 2.8332), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:33,920 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.3768 (max= 3.3311), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:33,920 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.3768 (max= 3.3311), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:33,921 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.3768 (max= 3.3311), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:33,921 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.3768 (max= 3.3311), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:33,921 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.3768 (max= 3.3311), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:33,921 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.3768 (max= 3.3311), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:48:33,921 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.3768 (max= 3.3311), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:33,921 - root - INFO - Step 2810: lr=1.00E-05, loss= 1.3768 (max= 3.3311), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:51,950 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.3614 (max= 2.3267), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:51,951 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.3614 (max= 2.3267), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:51,951 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.3614 (max= 2.3267), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:51,951 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.3614 (max= 2.3267), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:51,951 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.3614 (max= 2.3267), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:51,951 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.3614 (max= 2.3267), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:51,951 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.3614 (max= 2.3267), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:48:51,951 - root - INFO - Step 2820: lr=1.00E-05, loss= 1.3614 (max= 2.3267), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:09,971 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.3720 (max= 2.4711), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:09,971 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.3720 (max= 2.4711), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:09,971 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.3720 (max= 2.4711), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:09,971 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.3720 (max= 2.4711), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:09,971 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.3720 (max= 2.4711), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:09,971 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.3720 (max= 2.4711), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:09,971 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.3720 (max= 2.4711), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:09,971 - root - INFO - Step 2830: lr=1.00E-05, loss= 1.3720 (max= 2.4711), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:28,020 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.3777 (max= 2.4212), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:28,021 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.3777 (max= 2.4212), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:28,021 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.3777 (max= 2.4212), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:28,021 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.3777 (max= 2.4212), tps=18158, mfu=37.83%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:28,021 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.3777 (max= 2.4212), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:28,021 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.3777 (max= 2.4212), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:28,021 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.3777 (max= 2.4212), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:28,021 - root - INFO - Step 2840: lr=1.00E-05, loss= 1.3777 (max= 2.4212), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:49:46,060 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.3787 (max= 2.3498), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:49:46,060 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.3787 (max= 2.3498), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:49:46,060 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.3787 (max= 2.3498), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:49:46,060 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.3787 (max= 2.3498), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:49:46,060 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.3787 (max= 2.3498), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:49:46,060 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.3787 (max= 2.3498), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:49:46,060 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.3787 (max= 
2.3498), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:49:46,060 - root - INFO - Step 2850: lr=1.00E-05, loss= 1.3787 (max= 2.3498), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:50:04,086 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.3681 (max= 2.5872), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:04,086 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.3681 (max= 2.5872), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:04,086 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.3681 (max= 2.5872), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:04,086 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.3681 (max= 2.5872), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:04,086 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.3681 (max= 2.5872), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:04,086 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.3681 (max= 2.5872), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:04,086 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.3681 (max= 2.5872), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:04,086 - root - INFO - Step 2860: lr=1.00E-05, loss= 1.3681 (max= 2.5872), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:22,087 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.3756 (max= 2.8151), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:50:22,087 - root - INFO - Step 2870: 
lr=1.00E-05, loss= 1.3756 (max= 2.8151), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:50:22,087 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.3756 (max= 2.8151), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:50:22,087 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.3756 (max= 2.8151), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:50:22,087 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.3756 (max= 2.8151), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:50:22,087 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.3756 (max= 2.8151), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:50:22,087 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.3756 (max= 2.8151), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:50:22,087 - root - INFO - Step 2870: lr=1.00E-05, loss= 1.3756 (max= 2.8151), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:50:40,115 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.3811 (max= 2.4073), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:40,115 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.3811 (max= 2.4073), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:40,115 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.3811 (max= 2.4073), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:40,115 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.3811 (max= 2.4073), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:40,115 - 
root - INFO - Step 2880: lr=1.00E-05, loss= 1.3811 (max= 2.4073), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:40,115 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.3811 (max= 2.4073), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:40,115 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.3811 (max= 2.4073), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:40,115 - root - INFO - Step 2880: lr=1.00E-05, loss= 1.3811 (max= 2.4073), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:58,153 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.3291 (max= 3.5069), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:58,153 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.3291 (max= 3.5069), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:58,153 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.3291 (max= 3.5069), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:58,153 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.3291 (max= 3.5069), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:58,153 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.3291 (max= 3.5069), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:58,153 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.3291 (max= 3.5069), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:50:58,153 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.3291 (max= 3.5069), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:50:58,153 - root - INFO - Step 2890: lr=1.00E-05, loss= 1.3291 (max= 3.5069), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:51:16,169 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.3635 (max= 2.5560), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:16,169 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.3635 (max= 2.5560), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:16,170 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.3635 (max= 2.5560), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:16,170 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.3635 (max= 2.5560), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:16,170 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.3635 (max= 2.5560), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:16,170 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.3635 (max= 2.5560), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:16,170 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.3635 (max= 2.5560), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:16,170 - root - INFO - Step 2900: lr=1.00E-05, loss= 1.3635 (max= 2.5560), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:34,176 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.3709 (max= 2.4610), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:34,176 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.3709 (max= 2.4610), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:34,176 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.3709 (max= 2.4610), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:34,176 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.3709 (max= 2.4610), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:34,176 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.3709 (max= 2.4610), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:34,176 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.3709 (max= 2.4610), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:34,176 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.3709 (max= 2.4610), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:34,176 - root - INFO - Step 2910: lr=1.00E-05, loss= 1.3709 (max= 2.4610), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:51:52,213 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.3318 (max= 2.7830), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:51:52,213 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.3318 (max= 2.7830), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:51:52,214 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.3318 (max= 2.7830), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:51:52,214 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.3318 (max= 2.7830), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:51:52,214 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.3318 (max= 2.7830), tps=18170, mfu=37.86%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:51:52,214 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.3318 (max= 2.7830), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:51:52,214 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.3318 (max= 2.7830), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:51:52,214 - root - INFO - Step 2920: lr=1.00E-05, loss= 1.3318 (max= 2.7830), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:10,228 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.4017 (max= 2.3189), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:10,228 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.4017 (max= 2.3189), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:10,228 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.4017 (max= 2.3189), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:10,228 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.4017 (max= 2.3189), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:10,228 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.4017 (max= 2.3189), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:10,228 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.4017 (max= 2.3189), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:10,228 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.4017 (max= 2.3189), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:10,228 - root - INFO - Step 2930: lr=1.00E-05, loss= 1.4017 (max= 
2.3189), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:28,253 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.3977 (max= 3.1568), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:28,253 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.3977 (max= 3.1568), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:28,253 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.3977 (max= 3.1568), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:28,253 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.3977 (max= 3.1568), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:28,253 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.3977 (max= 3.1568), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:28,253 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.3977 (max= 3.1568), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:28,253 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.3977 (max= 3.1568), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:28,253 - root - INFO - Step 2940: lr=1.00E-05, loss= 1.3977 (max= 3.1568), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:52:46,260 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.3377 (max= 2.4890), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:52:46,260 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.3377 (max= 2.4890), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:52:46,260 - root - INFO - Step 2950: 
lr=1.00E-05, loss= 1.3377 (max= 2.4890), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:52:46,261 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.3377 (max= 2.4890), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:52:46,261 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.3377 (max= 2.4890), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:52:46,261 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.3377 (max= 2.4890), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:52:46,261 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.3377 (max= 2.4890), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:52:46,261 - root - INFO - Step 2950: lr=1.00E-05, loss= 1.3377 (max= 2.4890), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:53:04,288 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.3776 (max= 2.3594), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:04,288 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.3776 (max= 2.3594), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:04,288 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.3776 (max= 2.3594), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:04,288 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.3776 (max= 2.3594), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:04,288 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.3776 (max= 2.3594), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:04,288 - 
root - INFO - Step 2960: lr=1.00E-05, loss= 1.3776 (max= 2.3594), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:04,288 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.3776 (max= 2.3594), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:04,289 - root - INFO - Step 2960: lr=1.00E-05, loss= 1.3776 (max= 2.3594), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:22,312 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.3294 (max= 2.3032), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:22,312 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.3294 (max= 2.3032), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:22,312 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.3294 (max= 2.3032), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:22,313 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.3294 (max= 2.3032), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:22,313 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.3294 (max= 2.3032), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:22,313 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.3294 (max= 2.3032), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:22,313 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.3294 (max= 2.3032), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:22,313 - root - INFO - Step 2970: lr=1.00E-05, loss= 1.3294 (max= 2.3032), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:53:40,328 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.3484 (max= 2.4910), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:40,328 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.3484 (max= 2.4910), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:40,328 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.3484 (max= 2.4910), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:40,328 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.3484 (max= 2.4910), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:40,328 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.3484 (max= 2.4910), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:40,328 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.3484 (max= 2.4910), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:40,328 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.3484 (max= 2.4910), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:40,328 - root - INFO - Step 2980: lr=1.00E-05, loss= 1.3484 (max= 2.4910), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:58,372 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.3540 (max= 2.6056), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:58,372 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.3540 (max= 2.6056), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:58,372 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.3540 (max= 2.6056), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:58,373 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.3540 (max= 2.6056), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:58,373 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.3540 (max= 2.6056), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:58,373 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.3540 (max= 2.6056), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:58,373 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.3540 (max= 2.6056), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:53:58,373 - root - INFO - Step 2990: lr=1.00E-05, loss= 1.3540 (max= 2.6056), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-3000 +2025-10-24 10:54:16,392 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.3873 (max= 3.5477), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:54:16,392 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.3873 (max= 3.5477), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:54:16,392 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.3873 (max= 3.5477), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:54:16,392 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-24 10:54:16,392 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:54:16,392 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-24 10:54:16,392 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 
'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:54:16,392 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-24 10:54:16,392 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:54:16,393 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.3873 (max= 3.5477), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:54:16,393 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-24 10:54:16,393 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:54:16,393 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.3873 (max= 3.5477), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:54:16,393 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.3873 (max= 3.5477), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:54:16,393 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.3873 (max= 3.5477), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:54:16,393 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-24 10:54:16,393 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:54:16,393 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-24 10:54:16,393 - root - INFO - Step 3000: lr=1.00E-05, loss= 1.3873 (max= 3.5477), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:54:16,393 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:54:16,393 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-24 10:54:16,393 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 10:54:16,393 - root - INFO - Saving a full checkpoint at step 3000 +2025-10-24 10:54:16,393 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-3000! Save time: 4.3983683586120605 +2025-10-24 10:54:31,122 - root - INFO - Finished saving the checkpoint in 14.73 seconds +2025-10-24 10:54:31,130 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 10:54:31,130 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 10:54:31,130 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 10:54:31,130 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 10:54:31,131 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 10:54:31,131 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 10:54:31,131 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 10:54:49,071 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.3919 (max= 2.7600), tps=10028, mfu=20.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:54:49,071 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.3919 (max= 2.7600), tps=10028, mfu=20.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:54:49,071 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.3919 (max= 2.7600), tps=10028, mfu=20.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:54:49,071 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.3919 (max= 2.7600), tps=10029, mfu=20.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:54:49,071 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.3919 (max= 2.7600), tps=10029, mfu=20.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:54:49,071 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.3919 (max= 2.7600), tps=10029, mfu=20.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:54:49,071 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.3919 (max= 2.7600), tps=10029, mfu=20.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:54:49,071 - root - INFO - Step 3010: lr=1.00E-05, loss= 1.3919 (max= 2.7600), tps=10029, mfu=20.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 10:55:07,054 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.3839 (max= 3.0679), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:07,054 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.3839 (max= 3.0679), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:07,054 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.3839 (max= 3.0679), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:07,054 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.3839 (max= 3.0679), tps=18226, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:07,054 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.3839 (max= 3.0679), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:07,054 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.3839 (max= 3.0679), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:07,054 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.3839 (max= 3.0679), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:07,054 - root - INFO - Step 3020: lr=1.00E-05, loss= 1.3839 (max= 3.0679), tps=18226, mfu=37.97%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:25,072 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.3689 (max= 2.3415), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:55:25,072 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.3689 (max= 2.3415), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:55:25,072 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.3689 (max= 2.3415), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:55:25,072 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.3689 (max= 2.3415), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:55:25,072 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.3689 (max= 2.3415), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:55:25,072 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.3689 (max= 2.3415), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:55:25,072 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.3689 (max= 2.3415), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:55:25,072 - root - INFO - Step 3030: lr=1.00E-05, loss= 1.3689 (max= 2.3415), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:55:43,088 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.3644 (max= 2.4918), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:43,088 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.3644 (max= 2.4918), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:43,088 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.3644 (max= 
2.4918), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:43,088 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.3644 (max= 2.4918), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:43,088 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.3644 (max= 2.4918), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:43,088 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.3644 (max= 2.4918), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:43,088 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.3644 (max= 2.4918), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:55:43,088 - root - INFO - Step 3040: lr=1.00E-05, loss= 1.3644 (max= 2.4918), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:01,106 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.3796 (max= 2.5906), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:56:01,106 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.3796 (max= 2.5906), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:56:01,106 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.3796 (max= 2.5906), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:56:01,106 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.3796 (max= 2.5906), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:56:01,106 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.3796 (max= 2.5906), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:56:01,106 - root - INFO - Step 3050: 
lr=1.00E-05, loss= 1.3796 (max= 2.5906), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:56:01,106 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.3796 (max= 2.5906), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:56:01,107 - root - INFO - Step 3050: lr=1.00E-05, loss= 1.3796 (max= 2.5906), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:56:19,131 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.3538 (max= 2.6736), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:19,131 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.3538 (max= 2.6736), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:19,131 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.3538 (max= 2.6736), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:19,131 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.3538 (max= 2.6736), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:19,131 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.3538 (max= 2.6736), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:19,131 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.3538 (max= 2.6736), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:19,131 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.3538 (max= 2.6736), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:19,131 - root - INFO - Step 3060: lr=1.00E-05, loss= 1.3538 (max= 2.6736), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:37,134 - 
root - INFO - Step 3070: lr=1.00E-05, loss= 1.3562 (max= 2.3969), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:37,134 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.3562 (max= 2.3969), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:37,134 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.3562 (max= 2.3969), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:37,134 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.3562 (max= 2.3969), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:37,134 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.3562 (max= 2.3969), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:37,134 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.3562 (max= 2.3969), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:37,134 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.3562 (max= 2.3969), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:37,135 - root - INFO - Step 3070: lr=1.00E-05, loss= 1.3562 (max= 2.3969), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:55,143 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.3513 (max= 2.6894), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:55,143 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.3513 (max= 2.6894), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:55,143 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.3513 (max= 2.6894), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 10:56:55,144 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.3513 (max= 2.6894), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:55,144 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.3513 (max= 2.6894), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:55,144 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.3513 (max= 2.6894), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:55,144 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.3513 (max= 2.6894), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:56:55,144 - root - INFO - Step 3080: lr=1.00E-05, loss= 1.3513 (max= 2.6894), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:13,193 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.4070 (max= 2.4176), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:13,193 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.4070 (max= 2.4176), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:13,193 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.4070 (max= 2.4176), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:13,193 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.4070 (max= 2.4176), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:13,193 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.4070 (max= 2.4176), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:13,193 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.4070 (max= 2.4176), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:13,193 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.4070 (max= 2.4176), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:13,193 - root - INFO - Step 3090: lr=1.00E-05, loss= 1.4070 (max= 2.4176), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:31,218 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.3473 (max= 2.5110), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:31,218 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.3473 (max= 2.5110), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:31,218 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.3473 (max= 2.5110), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:31,218 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.3473 (max= 2.5110), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:31,218 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.3473 (max= 2.5110), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:31,218 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.3473 (max= 2.5110), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:31,218 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.3473 (max= 2.5110), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:31,218 - root - INFO - Step 3100: lr=1.00E-05, loss= 1.3473 (max= 2.5110), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:48,065 - root - WARNING - Empty document detected at 
/work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4818066 +2025-10-24 10:57:49,234 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.4073 (max= 2.5509), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:49,234 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.4073 (max= 2.5509), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:49,234 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.4073 (max= 2.5509), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:49,234 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.4073 (max= 2.5509), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:49,234 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.4073 (max= 2.5509), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:49,234 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.4073 (max= 2.5509), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:49,234 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.4073 (max= 2.5509), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:57:49,234 - root - INFO - Step 3110: lr=1.00E-05, loss= 1.4073 (max= 2.5509), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:07,250 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.3853 (max= 2.4259), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:58:07,250 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.3853 (max= 2.4259), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:58:07,251 - root - INFO - 
Step 3120: lr=1.00E-05, loss= 1.3853 (max= 2.4259), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:58:07,251 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.3853 (max= 2.4259), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:58:07,251 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.3853 (max= 2.4259), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:58:07,251 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.3853 (max= 2.4259), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:58:07,251 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.3853 (max= 2.4259), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:58:07,251 - root - INFO - Step 3120: lr=1.00E-05, loss= 1.3853 (max= 2.4259), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:58:25,257 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.3705 (max= 2.4904), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:25,257 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.3705 (max= 2.4904), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:25,257 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.3705 (max= 2.4904), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:25,257 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.3705 (max= 2.4904), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:25,257 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.3705 (max= 2.4904), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 
10:58:25,257 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.3705 (max= 2.4904), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:25,257 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.3705 (max= 2.4904), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:25,257 - root - INFO - Step 3130: lr=1.00E-05, loss= 1.3705 (max= 2.4904), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:43,272 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.3702 (max= 3.0054), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:43,272 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.3702 (max= 3.0054), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:43,272 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.3702 (max= 3.0054), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:43,272 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.3702 (max= 3.0054), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:43,272 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.3702 (max= 3.0054), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:43,272 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.3702 (max= 3.0054), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:43,272 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.3702 (max= 3.0054), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:58:43,272 - root - INFO - Step 3140: lr=1.00E-05, loss= 1.3702 (max= 3.0054), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 10:59:01,313 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.3861 (max= 2.0415), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:01,313 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.3861 (max= 2.0415), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:01,313 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.3861 (max= 2.0415), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:01,313 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.3861 (max= 2.0415), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:01,313 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.3861 (max= 2.0415), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:01,313 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.3861 (max= 2.0415), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:01,313 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.3861 (max= 2.0415), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:01,313 - root - INFO - Step 3150: lr=1.00E-05, loss= 1.3861 (max= 2.0415), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:19,336 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.3658 (max= 2.7075), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:19,336 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.3658 (max= 2.7075), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:19,336 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.3658 (max= 2.7075), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:19,336 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.3658 (max= 2.7075), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:19,337 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.3658 (max= 2.7075), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:19,337 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.3658 (max= 2.7075), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:19,337 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.3658 (max= 2.7075), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:19,337 - root - INFO - Step 3160: lr=1.00E-05, loss= 1.3658 (max= 2.7075), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:37,371 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.3829 (max= 2.5304), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:59:37,371 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.3829 (max= 2.5304), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:59:37,371 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.3829 (max= 2.5304), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:59:37,371 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.3829 (max= 2.5304), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:59:37,371 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.3829 (max= 2.5304), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:59:37,371 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.3829 (max= 2.5304), tps=18173, mfu=37.86%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:59:37,371 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.3829 (max= 2.5304), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:59:37,371 - root - INFO - Step 3170: lr=1.00E-05, loss= 1.3829 (max= 2.5304), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 10:59:55,415 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.3700 (max= 2.8380), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:55,415 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.3700 (max= 2.8380), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:55,415 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.3700 (max= 2.8380), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:55,415 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.3700 (max= 2.8380), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:55,415 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.3700 (max= 2.8380), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:55,415 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.3700 (max= 2.8380), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:55,415 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.3700 (max= 2.8380), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 10:59:55,415 - root - INFO - Step 3180: lr=1.00E-05, loss= 1.3700 (max= 2.8380), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:13,458 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.3659 (max= 
2.5198), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:00:13,458 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.3659 (max= 2.5198), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:00:13,458 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.3659 (max= 2.5198), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:00:13,458 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.3659 (max= 2.5198), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:00:13,458 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.3659 (max= 2.5198), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:00:13,459 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.3659 (max= 2.5198), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:00:13,459 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.3659 (max= 2.5198), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:00:13,459 - root - INFO - Step 3190: lr=1.00E-05, loss= 1.3659 (max= 2.5198), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:00:31,471 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.3655 (max= 2.2321), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:31,471 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.3655 (max= 2.2321), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:31,471 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.3655 (max= 2.2321), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:31,471 - root - INFO - Step 3200: 
lr=1.00E-05, loss= 1.3655 (max= 2.2321), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:31,471 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.3655 (max= 2.2321), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:31,471 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.3655 (max= 2.2321), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:31,471 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.3655 (max= 2.2321), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:31,471 - root - INFO - Step 3200: lr=1.00E-05, loss= 1.3655 (max= 2.2321), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:49,481 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.3479 (max= 2.3965), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:49,481 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.3479 (max= 2.3965), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:49,481 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.3479 (max= 2.3965), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:49,481 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.3479 (max= 2.3965), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:49,481 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.3479 (max= 2.3965), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:49,482 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.3479 (max= 2.3965), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:49,482 - 
root - INFO - Step 3210: lr=1.00E-05, loss= 1.3479 (max= 2.3965), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:00:49,482 - root - INFO - Step 3210: lr=1.00E-05, loss= 1.3479 (max= 2.3965), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:07,495 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.3657 (max= 2.3828), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:07,495 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.3657 (max= 2.3828), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:07,495 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.3657 (max= 2.3828), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:07,495 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.3657 (max= 2.3828), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:07,495 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.3657 (max= 2.3828), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:07,495 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.3657 (max= 2.3828), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:07,495 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.3657 (max= 2.3828), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:07,495 - root - INFO - Step 3220: lr=1.00E-05, loss= 1.3657 (max= 2.3828), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:25,519 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.3630 (max= 3.5211), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:01:25,519 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.3630 (max= 3.5211), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:25,519 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.3630 (max= 3.5211), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:25,519 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.3630 (max= 3.5211), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:25,519 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.3630 (max= 3.5211), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:25,519 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.3630 (max= 3.5211), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:25,519 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.3630 (max= 3.5211), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:25,519 - root - INFO - Step 3230: lr=1.00E-05, loss= 1.3630 (max= 3.5211), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:43,535 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.3378 (max= 2.3578), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:43,535 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.3378 (max= 2.3578), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:43,535 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.3378 (max= 2.3578), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:43,535 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.3378 (max= 2.3578), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:43,535 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.3378 (max= 2.3578), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:43,535 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.3378 (max= 2.3578), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:43,535 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.3378 (max= 2.3578), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:01:43,535 - root - INFO - Step 3240: lr=1.00E-05, loss= 1.3378 (max= 2.3578), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:03,319 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.3304 (max= 2.3059), tps=16566, mfu=34.51%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.05s, 10.03%) +2025-10-24 11:02:03,319 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.3304 (max= 2.3059), tps=16565, mfu=34.51%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.05s, 10.03%) +2025-10-24 11:02:03,319 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.3304 (max= 2.3059), tps=16566, mfu=34.51%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.05s, 10.03%) +2025-10-24 11:02:03,319 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.3304 (max= 2.3059), tps=16565, mfu=34.51%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.05s, 10.03%) +2025-10-24 11:02:03,319 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.3304 (max= 2.3059), tps=16566, mfu=34.52%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.05s, 10.03%) +2025-10-24 11:02:03,319 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.3304 (max= 2.3059), tps=16566, mfu=34.52%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.05s, 10.03%) +2025-10-24 11:02:03,319 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.3304 (max= 2.3059), tps=16565, 
mfu=34.51%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.05s, 10.03%) +2025-10-24 11:02:03,319 - root - INFO - Step 3250: lr=1.00E-05, loss= 1.3304 (max= 2.3059), tps=16566, mfu=34.52%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.05s, 10.03%) +2025-10-24 11:02:21,354 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.3752 (max= 2.5552), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:21,355 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.3752 (max= 2.5552), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:21,355 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.3752 (max= 2.5552), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:21,355 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.3752 (max= 2.5552), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:21,355 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.3752 (max= 2.5552), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:21,355 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.3752 (max= 2.5552), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:21,355 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.3752 (max= 2.5552), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:21,355 - root - INFO - Step 3260: lr=1.00E-05, loss= 1.3752 (max= 2.5552), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:39,365 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.3676 (max= 2.4211), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:39,365 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.3676 
(max= 2.4211), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:39,365 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.3676 (max= 2.4211), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:39,365 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.3676 (max= 2.4211), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:39,365 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.3676 (max= 2.4211), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:39,365 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.3676 (max= 2.4211), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:39,365 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.3676 (max= 2.4211), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:39,365 - root - INFO - Step 3270: lr=1.00E-05, loss= 1.3676 (max= 2.4211), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:02:57,371 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.3543 (max= 2.5336), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:02:57,371 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.3543 (max= 2.5336), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:02:57,371 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.3543 (max= 2.5336), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:02:57,371 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.3543 (max= 2.5336), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:02:57,371 - root - INFO - Step 3280: 
lr=1.00E-05, loss= 1.3543 (max= 2.5336), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:02:57,372 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.3543 (max= 2.5336), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:02:57,372 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.3543 (max= 2.5336), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:02:57,372 - root - INFO - Step 3280: lr=1.00E-05, loss= 1.3543 (max= 2.5336), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:15,373 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.3612 (max= 2.5774), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:15,373 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.3612 (max= 2.5774), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:15,373 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.3612 (max= 2.5774), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:15,373 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.3612 (max= 2.5774), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:15,373 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.3612 (max= 2.5774), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:15,373 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.3612 (max= 2.5774), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:15,373 - root - INFO - Step 3290: lr=1.00E-05, loss= 1.3612 (max= 2.5774), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:15,373 - 
root - INFO - Step 3290: lr=1.00E-05, loss= 1.3612 (max= 2.5774), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:33,406 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.3531 (max= 2.3024), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:03:33,406 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.3531 (max= 2.3024), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:03:33,406 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.3531 (max= 2.3024), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:03:33,406 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.3531 (max= 2.3024), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:03:33,407 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.3531 (max= 2.3024), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:03:33,407 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.3531 (max= 2.3024), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:03:33,407 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.3531 (max= 2.3024), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:03:33,407 - root - INFO - Step 3300: lr=1.00E-05, loss= 1.3531 (max= 2.3024), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:03:51,441 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.3537 (max= 2.5734), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:51,441 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.3537 (max= 2.5734), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:03:51,441 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.3537 (max= 2.5734), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:51,441 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.3537 (max= 2.5734), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:51,441 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.3537 (max= 2.5734), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:51,441 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.3537 (max= 2.5734), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:51,441 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.3537 (max= 2.5734), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:03:51,441 - root - INFO - Step 3310: lr=1.00E-05, loss= 1.3537 (max= 2.5734), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:09,456 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.3653 (max= 3.3018), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:09,456 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.3653 (max= 3.3018), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:09,456 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.3653 (max= 3.3018), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:09,456 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.3653 (max= 3.3018), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:09,456 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.3653 (max= 3.3018), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:09,456 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.3653 (max= 3.3018), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:09,456 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.3653 (max= 3.3018), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:09,456 - root - INFO - Step 3320: lr=1.00E-05, loss= 1.3653 (max= 3.3018), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:27,454 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.3334 (max= 2.4813), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:27,454 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.3334 (max= 2.4813), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:27,454 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.3334 (max= 2.4813), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:27,454 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.3334 (max= 2.4813), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:27,454 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.3334 (max= 2.4813), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:27,454 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.3334 (max= 2.4813), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:27,454 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.3334 (max= 2.4813), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:27,455 - root - INFO - Step 3330: lr=1.00E-05, loss= 1.3334 (max= 2.4813), tps=18209, mfu=37.94%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:45,456 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.3940 (max= 2.4873), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:45,456 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.3940 (max= 2.4873), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:45,456 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.3940 (max= 2.4873), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:45,456 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.3940 (max= 2.4873), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:45,456 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.3940 (max= 2.4873), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:45,457 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.3940 (max= 2.4873), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:45,456 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.3940 (max= 2.4873), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:04:45,457 - root - INFO - Step 3340: lr=1.00E-05, loss= 1.3940 (max= 2.4873), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:03,472 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.3100 (max= 2.3986), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:03,472 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.3100 (max= 2.3986), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:03,472 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.3100 (max= 
2.3986), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:03,472 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.3100 (max= 2.3986), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:03,472 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.3100 (max= 2.3986), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:03,472 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.3100 (max= 2.3986), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:03,472 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.3100 (max= 2.3986), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:03,472 - root - INFO - Step 3350: lr=1.00E-05, loss= 1.3100 (max= 2.3986), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:21,655 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.3281 (max= 2.5506), tps=18024, mfu=37.55%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:21,655 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.3281 (max= 2.5506), tps=18025, mfu=37.55%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:21,655 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.3281 (max= 2.5506), tps=18025, mfu=37.55%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:21,655 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.3281 (max= 2.5506), tps=18024, mfu=37.55%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:21,655 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.3281 (max= 2.5506), tps=18025, mfu=37.55%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:21,655 - root - INFO - Step 3360: 
lr=1.00E-05, loss= 1.3281 (max= 2.5506), tps=18025, mfu=37.55%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:21,655 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.3281 (max= 2.5506), tps=18025, mfu=37.55%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:21,655 - root - INFO - Step 3360: lr=1.00E-05, loss= 1.3281 (max= 2.5506), tps=18025, mfu=37.55%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:39,672 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.3370 (max= 3.7468), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:05:39,673 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.3370 (max= 3.7468), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:05:39,673 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.3370 (max= 3.7468), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:05:39,673 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.3370 (max= 3.7468), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:05:39,673 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.3370 (max= 3.7468), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:05:39,673 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.3370 (max= 3.7468), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:05:39,673 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.3370 (max= 3.7468), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:05:39,673 - root - INFO - Step 3370: lr=1.00E-05, loss= 1.3370 (max= 3.7468), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:05:57,677 - 
root - INFO - Step 3380: lr=1.00E-05, loss= 1.3385 (max= 2.6634), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:57,677 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.3385 (max= 2.6634), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:57,677 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.3385 (max= 2.6634), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:57,677 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.3385 (max= 2.6634), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:57,677 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.3385 (max= 2.6634), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:57,677 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.3385 (max= 2.6634), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:57,678 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.3385 (max= 2.6634), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:05:57,678 - root - INFO - Step 3380: lr=1.00E-05, loss= 1.3385 (max= 2.6634), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:15,675 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.3131 (max= 2.6352), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:15,675 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.3131 (max= 2.6352), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:15,675 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.3131 (max= 2.6352), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:06:15,675 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.3131 (max= 2.6352), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:15,675 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.3131 (max= 2.6352), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:15,675 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.3131 (max= 2.6352), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:15,675 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.3131 (max= 2.6352), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:15,675 - root - INFO - Step 3390: lr=1.00E-05, loss= 1.3131 (max= 2.6352), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:33,690 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.3602 (max= 2.3776), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:06:33,690 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.3602 (max= 2.3776), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:06:33,690 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.3602 (max= 2.3776), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:06:33,690 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.3602 (max= 2.3776), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:06:33,691 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.3602 (max= 2.3776), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:06:33,691 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.3602 (max= 2.3776), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:06:33,691 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.3602 (max= 2.3776), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:06:33,691 - root - INFO - Step 3400: lr=1.00E-05, loss= 1.3602 (max= 2.3776), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:06:51,697 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.3473 (max= 2.6059), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:51,698 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.3473 (max= 2.6059), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:51,698 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.3473 (max= 2.6059), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:51,698 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.3473 (max= 2.6059), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:51,698 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.3473 (max= 2.6059), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:51,698 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.3473 (max= 2.6059), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:51,698 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.3473 (max= 2.6059), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:06:51,698 - root - INFO - Step 3410: lr=1.00E-05, loss= 1.3473 (max= 2.6059), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:09,714 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.3829 (max= 2.3935), tps=18191, mfu=37.90%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:09,714 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.3829 (max= 2.3935), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:09,714 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.3829 (max= 2.3935), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:09,714 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.3829 (max= 2.3935), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:09,714 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.3829 (max= 2.3935), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:09,715 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.3829 (max= 2.3935), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:09,715 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.3829 (max= 2.3935), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:09,715 - root - INFO - Step 3420: lr=1.00E-05, loss= 1.3829 (max= 2.3935), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:27,759 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.3489 (max= 2.5284), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:27,759 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.3489 (max= 2.5284), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:27,759 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.3489 (max= 2.5284), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:27,760 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.3489 (max= 
2.5284), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:27,760 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.3489 (max= 2.5284), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:27,760 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.3489 (max= 2.5284), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:27,760 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.3489 (max= 2.5284), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:27,760 - root - INFO - Step 3430: lr=1.00E-05, loss= 1.3489 (max= 2.5284), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:45,781 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.3382 (max= 2.3817), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:45,781 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.3382 (max= 2.3817), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:45,781 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.3382 (max= 2.3817), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:45,781 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.3382 (max= 2.3817), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:45,781 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.3382 (max= 2.3817), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:45,781 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.3382 (max= 2.3817), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:45,781 - root - INFO - Step 3440: 
lr=1.00E-05, loss= 1.3382 (max= 2.3817), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:07:45,781 - root - INFO - Step 3440: lr=1.00E-05, loss= 1.3382 (max= 2.3817), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:03,792 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.3662 (max= 3.6553), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:03,792 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.3662 (max= 3.6553), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:03,792 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.3662 (max= 3.6553), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:03,792 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.3662 (max= 3.6553), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:03,792 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.3662 (max= 3.6553), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:03,792 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.3662 (max= 3.6553), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:03,792 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.3662 (max= 3.6553), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:03,792 - root - INFO - Step 3450: lr=1.00E-05, loss= 1.3662 (max= 3.6553), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:21,844 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.3481 (max= 3.6265), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:21,844 - 
root - INFO - Step 3460: lr=1.00E-05, loss= 1.3481 (max= 3.6265), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:21,844 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.3481 (max= 3.6265), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:21,844 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.3481 (max= 3.6265), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:21,844 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.3481 (max= 3.6265), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:21,844 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.3481 (max= 3.6265), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:21,844 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.3481 (max= 3.6265), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:21,844 - root - INFO - Step 3460: lr=1.00E-05, loss= 1.3481 (max= 3.6265), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:39,907 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.3430 (max= 2.2027), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:39,907 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.3430 (max= 2.2027), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:39,907 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.3430 (max= 2.2027), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:39,907 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.3430 (max= 2.2027), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:08:39,907 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.3430 (max= 2.2027), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:39,907 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.3430 (max= 2.2027), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:39,907 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.3430 (max= 2.2027), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:39,907 - root - INFO - Step 3470: lr=1.00E-05, loss= 1.3430 (max= 2.2027), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:57,934 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.3713 (max= 2.5427), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:57,934 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.3713 (max= 2.5427), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:57,934 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.3713 (max= 2.5427), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:57,934 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.3713 (max= 2.5427), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:57,934 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.3713 (max= 2.5427), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:57,934 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.3713 (max= 2.5427), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:57,934 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.3713 (max= 2.5427), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:08:57,934 - root - INFO - Step 3480: lr=1.00E-05, loss= 1.3713 (max= 2.5427), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:15,956 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.3424 (max= 2.7621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:15,956 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.3424 (max= 2.7621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:15,956 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.3424 (max= 2.7621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:15,956 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.3424 (max= 2.7621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:15,956 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.3424 (max= 2.7621), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:15,957 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.3424 (max= 2.7621), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:15,957 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.3424 (max= 2.7621), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:15,957 - root - INFO - Step 3490: lr=1.00E-05, loss= 1.3424 (max= 2.7621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:33,998 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.3640 (max= 2.2767), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:33,998 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.3640 (max= 2.2767), tps=18166, mfu=37.85%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:33,998 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.3640 (max= 2.2767), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:33,998 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.3640 (max= 2.2767), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:33,998 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.3640 (max= 2.2767), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:33,998 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.3640 (max= 2.2767), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:33,998 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.3640 (max= 2.2767), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:33,998 - root - INFO - Step 3500: lr=1.00E-05, loss= 1.3640 (max= 2.2767), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:52,034 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.3655 (max= 2.4811), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:52,034 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.3655 (max= 2.4811), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:52,034 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.3655 (max= 2.4811), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:52,034 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.3655 (max= 2.4811), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:52,034 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.3655 (max= 
2.4811), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:52,034 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.3655 (max= 2.4811), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:52,034 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.3655 (max= 2.4811), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:09:52,034 - root - INFO - Step 3510: lr=1.00E-05, loss= 1.3655 (max= 2.4811), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:10,051 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.3319 (max= 2.6579), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:10:10,051 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.3319 (max= 2.6579), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:10:10,051 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.3319 (max= 2.6579), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:10:10,051 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.3319 (max= 2.6579), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:10:10,051 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.3319 (max= 2.6579), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:10:10,051 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.3319 (max= 2.6579), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:10:10,051 - root - INFO - Step 3520: lr=1.00E-05, loss= 1.3319 (max= 2.6579), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:10:10,051 - root - INFO - Step 3520: 
lr=1.00E-05, loss= 1.3319 (max= 2.6579), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:10:28,078 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.3313 (max= 2.2898), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:28,078 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.3313 (max= 2.2898), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:28,078 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.3313 (max= 2.2898), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:28,078 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.3313 (max= 2.2898), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:28,078 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.3313 (max= 2.2898), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:28,078 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.3313 (max= 2.2898), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:28,078 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.3313 (max= 2.2898), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:28,078 - root - INFO - Step 3530: lr=1.00E-05, loss= 1.3313 (max= 2.2898), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:46,107 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.3342 (max= 2.6642), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:46,107 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.3342 (max= 2.6642), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:46,107 - 
root - INFO - Step 3540: lr=1.00E-05, loss= 1.3342 (max= 2.6642), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:46,107 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.3342 (max= 2.6642), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:46,107 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.3342 (max= 2.6642), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:46,107 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.3342 (max= 2.6642), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:46,107 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.3342 (max= 2.6642), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:10:46,107 - root - INFO - Step 3540: lr=1.00E-05, loss= 1.3342 (max= 2.6642), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:04,140 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.3628 (max= 3.8134), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:04,140 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.3628 (max= 3.8134), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:04,140 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.3628 (max= 3.8134), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:04,140 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.3628 (max= 3.8134), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:04,140 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.3628 (max= 3.8134), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:11:04,140 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.3628 (max= 3.8134), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:04,140 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.3628 (max= 3.8134), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:04,140 - root - INFO - Step 3550: lr=1.00E-05, loss= 1.3628 (max= 3.8134), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:22,158 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.3334 (max= 2.3060), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:22,158 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.3334 (max= 2.3060), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:22,158 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.3334 (max= 2.3060), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:22,158 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.3334 (max= 2.3060), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:22,159 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.3334 (max= 2.3060), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:22,159 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.3334 (max= 2.3060), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:22,159 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.3334 (max= 2.3060), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:22,159 - root - INFO - Step 3560: lr=1.00E-05, loss= 1.3334 (max= 2.3060), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:40,137 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.3501 (max= 2.5871), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:40,137 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.3501 (max= 2.5871), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:40,137 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.3501 (max= 2.5871), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:40,137 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.3501 (max= 2.5871), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:40,137 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.3501 (max= 2.5871), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:40,137 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.3501 (max= 2.5871), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:40,137 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.3501 (max= 2.5871), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:40,137 - root - INFO - Step 3570: lr=1.00E-05, loss= 1.3501 (max= 2.5871), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:58,137 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.3385 (max= 3.5390), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:58,137 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.3385 (max= 3.5390), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:58,137 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.3385 (max= 3.5390), tps=18208, mfu=37.94%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:58,137 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.3385 (max= 3.5390), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:58,137 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.3385 (max= 3.5390), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:58,137 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.3385 (max= 3.5390), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:58,137 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.3385 (max= 3.5390), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:11:58,137 - root - INFO - Step 3580: lr=1.00E-05, loss= 1.3385 (max= 3.5390), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:16,166 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.3667 (max= 2.9078), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:16,166 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.3667 (max= 2.9078), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:16,166 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.3667 (max= 2.9078), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:16,166 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.3667 (max= 2.9078), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:16,166 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.3667 (max= 2.9078), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:16,166 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.3667 (max= 
2.9078), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:16,166 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.3667 (max= 2.9078), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:16,166 - root - INFO - Step 3590: lr=1.00E-05, loss= 1.3667 (max= 2.9078), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:34,183 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.3273 (max= 2.3238), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:34,183 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.3273 (max= 2.3238), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:34,183 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.3273 (max= 2.3238), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:34,183 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.3273 (max= 2.3238), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:34,183 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.3273 (max= 2.3238), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:34,183 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.3273 (max= 2.3238), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:34,183 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.3273 (max= 2.3238), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:34,183 - root - INFO - Step 3600: lr=1.00E-05, loss= 1.3273 (max= 2.3238), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:12:52,201 - root - INFO - Step 3610: 
lr=1.00E-05, loss= 1.3254 (max= 2.2326), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:12:52,201 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.3254 (max= 2.2326), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:12:52,201 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.3254 (max= 2.2326), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:12:52,201 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.3254 (max= 2.2326), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:12:52,201 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.3254 (max= 2.2326), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:12:52,201 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.3254 (max= 2.2326), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:12:52,201 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.3254 (max= 2.2326), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:12:52,201 - root - INFO - Step 3610: lr=1.00E-05, loss= 1.3254 (max= 2.2326), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:13:10,227 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.3709 (max= 2.6694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:10,228 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.3709 (max= 2.6694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:10,228 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.3709 (max= 2.6694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:10,228 - 
root - INFO - Step 3620: lr=1.00E-05, loss= 1.3709 (max= 2.6694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:10,228 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.3709 (max= 2.6694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:10,228 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.3709 (max= 2.6694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:10,228 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.3709 (max= 2.6694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:10,228 - root - INFO - Step 3620: lr=1.00E-05, loss= 1.3709 (max= 2.6694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:28,275 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.3482 (max= 2.3008), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:28,276 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.3482 (max= 2.3008), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:28,276 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.3482 (max= 2.3008), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:28,276 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.3482 (max= 2.3008), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:28,276 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.3482 (max= 2.3008), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:28,276 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.3482 (max= 2.3008), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:13:28,276 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.3482 (max= 2.3008), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:28,276 - root - INFO - Step 3630: lr=1.00E-05, loss= 1.3482 (max= 2.3008), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:46,328 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.3445 (max= 2.4361), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:46,328 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.3445 (max= 2.4361), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:46,328 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.3445 (max= 2.4361), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:46,328 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.3445 (max= 2.4361), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:46,328 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.3445 (max= 2.4361), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:46,328 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.3445 (max= 2.4361), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:46,329 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.3445 (max= 2.4361), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:13:46,329 - root - INFO - Step 3640: lr=1.00E-05, loss= 1.3445 (max= 2.4361), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:05,075 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.3481 (max= 2.4113), tps=17482, mfu=36.42%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.02s, 5.11%) +2025-10-24 11:14:05,075 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.3481 (max= 2.4113), tps=17482, mfu=36.42%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 5.11%) +2025-10-24 11:14:05,075 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.3481 (max= 2.4113), tps=17482, mfu=36.42%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 5.11%) +2025-10-24 11:14:05,075 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.3481 (max= 2.4113), tps=17482, mfu=36.42%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 5.11%) +2025-10-24 11:14:05,075 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.3481 (max= 2.4113), tps=17482, mfu=36.42%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 5.11%) +2025-10-24 11:14:05,075 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.3481 (max= 2.4113), tps=17483, mfu=36.43%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 5.11%) +2025-10-24 11:14:05,076 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.3481 (max= 2.4113), tps=17482, mfu=36.42%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 5.11%) +2025-10-24 11:14:05,076 - root - INFO - Step 3650: lr=1.00E-05, loss= 1.3481 (max= 2.4113), tps=17482, mfu=36.42%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 5.11%) +2025-10-24 11:14:23,059 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.3289 (max= 2.6257), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:23,059 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.3289 (max= 2.6257), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:23,059 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.3289 (max= 2.6257), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:23,059 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.3289 (max= 2.6257), tps=18224, mfu=37.97%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:23,059 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.3289 (max= 2.6257), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:23,059 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.3289 (max= 2.6257), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:23,059 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.3289 (max= 2.6257), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:23,059 - root - INFO - Step 3660: lr=1.00E-05, loss= 1.3289 (max= 2.6257), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:41,089 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.3657 (max= 2.4367), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:41,089 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.3657 (max= 2.4367), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:41,089 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.3657 (max= 2.4367), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:41,089 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.3657 (max= 2.4367), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:41,089 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.3657 (max= 2.4367), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:41,089 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.3657 (max= 2.4367), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:41,089 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.3657 (max= 
2.4367), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:41,089 - root - INFO - Step 3670: lr=1.00E-05, loss= 1.3657 (max= 2.4367), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:59,148 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.3275 (max= 3.2250), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:59,148 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.3275 (max= 3.2250), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:59,148 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.3275 (max= 3.2250), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:59,148 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.3275 (max= 3.2250), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:59,148 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.3275 (max= 3.2250), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:59,148 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.3275 (max= 3.2250), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:59,148 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.3275 (max= 3.2250), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:14:59,148 - root - INFO - Step 3680: lr=1.00E-05, loss= 1.3275 (max= 3.2250), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:17,183 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.3750 (max= 2.5085), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:17,183 - root - INFO - Step 3690: 
lr=1.00E-05, loss= 1.3750 (max= 2.5085), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:17,183 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.3750 (max= 2.5085), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:17,183 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.3750 (max= 2.5085), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:17,183 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.3750 (max= 2.5085), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:17,183 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.3750 (max= 2.5085), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:17,183 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.3750 (max= 2.5085), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:17,183 - root - INFO - Step 3690: lr=1.00E-05, loss= 1.3750 (max= 2.5085), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:35,182 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.3339 (max= 3.2574), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:35,182 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.3339 (max= 3.2574), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:35,182 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.3339 (max= 3.2574), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:35,182 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.3339 (max= 3.2574), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:35,182 - 
root - INFO - Step 3700: lr=1.00E-05, loss= 1.3339 (max= 3.2574), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:35,182 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.3339 (max= 3.2574), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:35,182 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.3339 (max= 3.2574), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:35,182 - root - INFO - Step 3700: lr=1.00E-05, loss= 1.3339 (max= 3.2574), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:53,197 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.3640 (max= 2.7248), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:53,198 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.3640 (max= 2.7248), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:53,198 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.3640 (max= 2.7248), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:53,198 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.3640 (max= 2.7248), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:53,198 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.3640 (max= 2.7248), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:53,198 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.3640 (max= 2.7248), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:15:53,198 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.3640 (max= 2.7248), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:15:53,198 - root - INFO - Step 3710: lr=1.00E-05, loss= 1.3640 (max= 2.7248), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:11,212 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.3343 (max= 2.4502), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:11,212 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.3343 (max= 2.4502), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:11,212 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.3343 (max= 2.4502), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:11,212 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.3343 (max= 2.4502), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:11,212 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.3343 (max= 2.4502), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:11,212 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.3343 (max= 2.4502), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:11,212 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.3343 (max= 2.4502), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:11,212 - root - INFO - Step 3720: lr=1.00E-05, loss= 1.3343 (max= 2.4502), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:29,240 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.3419 (max= 3.0801), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:29,240 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.3419 (max= 3.0801), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:29,240 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.3419 (max= 3.0801), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:29,240 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.3419 (max= 3.0801), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:29,240 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.3419 (max= 3.0801), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:29,240 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.3419 (max= 3.0801), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:29,240 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.3419 (max= 3.0801), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:29,240 - root - INFO - Step 3730: lr=1.00E-05, loss= 1.3419 (max= 3.0801), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:16:47,262 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.3373 (max= 3.2293), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:16:47,262 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.3373 (max= 3.2293), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:16:47,262 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.3373 (max= 3.2293), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:16:47,262 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.3373 (max= 3.2293), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:16:47,262 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.3373 (max= 3.2293), tps=18186, mfu=37.89%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:16:47,262 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.3373 (max= 3.2293), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:16:47,262 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.3373 (max= 3.2293), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:16:47,262 - root - INFO - Step 3740: lr=1.00E-05, loss= 1.3373 (max= 3.2293), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:05,231 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.3379 (max= 3.1546), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:05,231 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.3379 (max= 3.1546), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:05,232 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.3379 (max= 3.1546), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:05,232 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.3379 (max= 3.1546), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:05,232 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.3379 (max= 3.1546), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:05,232 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.3379 (max= 3.1546), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:05,232 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.3379 (max= 3.1546), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:05,232 - root - INFO - Step 3750: lr=1.00E-05, loss= 1.3379 (max= 
3.1546), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:23,271 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.3414 (max= 2.8053), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:23,271 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.3414 (max= 2.8053), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:23,271 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.3414 (max= 2.8053), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:23,271 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.3414 (max= 2.8053), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:23,271 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.3414 (max= 2.8053), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:23,271 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.3414 (max= 2.8053), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:23,271 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.3414 (max= 2.8053), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:23,271 - root - INFO - Step 3760: lr=1.00E-05, loss= 1.3414 (max= 2.8053), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:41,276 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.3561 (max= 3.8990), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:17:41,277 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.3561 (max= 3.8990), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:17:41,277 - root - INFO - Step 3770: 
lr=1.00E-05, loss= 1.3561 (max= 3.8990), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:17:41,277 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.3561 (max= 3.8990), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:17:41,277 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.3561 (max= 3.8990), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:17:41,277 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.3561 (max= 3.8990), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:17:41,277 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.3561 (max= 3.8990), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:17:41,277 - root - INFO - Step 3770: lr=1.00E-05, loss= 1.3561 (max= 3.8990), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:17:59,315 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.3350 (max= 3.3857), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:59,315 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.3350 (max= 3.3857), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:59,315 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.3350 (max= 3.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:59,315 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.3350 (max= 3.3857), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:59,315 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.3350 (max= 3.3857), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:59,315 - 
root - INFO - Step 3780: lr=1.00E-05, loss= 1.3350 (max= 3.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:59,315 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.3350 (max= 3.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:17:59,315 - root - INFO - Step 3780: lr=1.00E-05, loss= 1.3350 (max= 3.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:18:17,340 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.3449 (max= 2.2434), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:17,340 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.3449 (max= 2.2434), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:17,340 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.3449 (max= 2.2434), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:17,340 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.3449 (max= 2.2434), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:17,340 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.3449 (max= 2.2434), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:17,340 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.3449 (max= 2.2434), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:17,340 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.3449 (max= 2.2434), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:17,340 - root - INFO - Step 3790: lr=1.00E-05, loss= 1.3449 (max= 2.2434), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:18:35,350 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.3871 (max= 2.3463), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:18:35,350 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.3871 (max= 2.3463), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:18:35,350 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.3871 (max= 2.3463), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:18:35,350 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.3871 (max= 2.3463), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:18:35,350 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.3871 (max= 2.3463), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:18:35,350 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.3871 (max= 2.3463), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:18:35,350 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.3871 (max= 2.3463), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:18:35,350 - root - INFO - Step 3800: lr=1.00E-05, loss= 1.3871 (max= 2.3463), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:18:53,399 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.3401 (max= 2.6268), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:53,399 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.3401 (max= 2.6268), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:53,399 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.3401 (max= 2.6268), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:53,399 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.3401 (max= 2.6268), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:53,400 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.3401 (max= 2.6268), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:53,400 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.3401 (max= 2.6268), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:53,400 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.3401 (max= 2.6268), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:18:53,400 - root - INFO - Step 3810: lr=1.00E-05, loss= 1.3401 (max= 2.6268), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:19:11,458 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.3101 (max= 2.4133), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:11,458 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.3101 (max= 2.4133), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:11,458 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.3101 (max= 2.4133), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:11,458 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.3101 (max= 2.4133), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:11,459 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.3101 (max= 2.4133), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:11,459 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.3101 (max= 2.4133), tps=18149, mfu=37.81%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:11,459 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.3101 (max= 2.4133), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:11,459 - root - INFO - Step 3820: lr=1.00E-05, loss= 1.3101 (max= 2.4133), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:29,470 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.3219 (max= 2.2803), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:29,470 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.3219 (max= 2.2803), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:29,470 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.3219 (max= 2.2803), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:29,470 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.3219 (max= 2.2803), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:29,470 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.3219 (max= 2.2803), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:29,470 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.3219 (max= 2.2803), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:29,470 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.3219 (max= 2.2803), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:29,470 - root - INFO - Step 3830: lr=1.00E-05, loss= 1.3219 (max= 2.2803), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:47,542 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.3015 (max= 
2.5361), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:47,543 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.3015 (max= 2.5361), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:47,543 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.3015 (max= 2.5361), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:47,543 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.3015 (max= 2.5361), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:47,543 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.3015 (max= 2.5361), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:47,543 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.3015 (max= 2.5361), tps=18135, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:47,543 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.3015 (max= 2.5361), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:19:47,543 - root - INFO - Step 3840: lr=1.00E-05, loss= 1.3015 (max= 2.5361), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:05,594 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.3608 (max= 2.5885), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:05,594 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.3608 (max= 2.5885), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:05,594 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.3608 (max= 2.5885), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:05,594 - root - INFO - Step 3850: 
lr=1.00E-05, loss= 1.3608 (max= 2.5885), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:05,594 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.3608 (max= 2.5885), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:05,594 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.3608 (max= 2.5885), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:05,594 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.3608 (max= 2.5885), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:05,594 - root - INFO - Step 3850: lr=1.00E-05, loss= 1.3608 (max= 2.5885), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:23,601 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.2799 (max= 2.4472), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:23,601 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.2799 (max= 2.4472), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:23,601 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.2799 (max= 2.4472), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:23,601 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.2799 (max= 2.4472), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:23,602 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.2799 (max= 2.4472), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:23,602 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.2799 (max= 2.4472), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:23,602 - 
root - INFO - Step 3860: lr=1.00E-05, loss= 1.2799 (max= 2.4472), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:23,602 - root - INFO - Step 3860: lr=1.00E-05, loss= 1.2799 (max= 2.4472), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:41,603 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.3239 (max= 2.5755), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:41,603 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.3239 (max= 2.5755), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:41,603 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.3239 (max= 2.5755), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:41,603 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.3239 (max= 2.5755), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:41,603 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.3239 (max= 2.5755), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:41,603 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.3239 (max= 2.5755), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:41,603 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.3239 (max= 2.5755), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:41,603 - root - INFO - Step 3870: lr=1.00E-05, loss= 1.3239 (max= 2.5755), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:20:59,625 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.3155 (max= 2.5612), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:20:59,625 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.3155 (max= 2.5612), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:59,625 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.3155 (max= 2.5612), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:59,626 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.3155 (max= 2.5612), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:59,626 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.3155 (max= 2.5612), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:59,626 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.3155 (max= 2.5612), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:59,626 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.3155 (max= 2.5612), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:20:59,626 - root - INFO - Step 3880: lr=1.00E-05, loss= 1.3155 (max= 2.5612), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:17,638 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.3364 (max= 2.6219), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:17,638 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.3364 (max= 2.6219), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:17,638 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.3364 (max= 2.6219), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:17,638 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.3364 (max= 2.6219), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:17,638 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.3364 (max= 2.6219), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:17,638 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.3364 (max= 2.6219), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:17,639 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.3364 (max= 2.6219), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:17,639 - root - INFO - Step 3890: lr=1.00E-05, loss= 1.3364 (max= 2.6219), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:35,729 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.3448 (max= 2.7679), tps=18116, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:35,729 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.3448 (max= 2.7679), tps=18116, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:35,729 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.3448 (max= 2.7679), tps=18116, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:35,729 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.3448 (max= 2.7679), tps=18116, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:35,729 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.3448 (max= 2.7679), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:35,729 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.3448 (max= 2.7679), tps=18116, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:35,729 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.3448 (max= 2.7679), tps=18116, mfu=37.75%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:35,730 - root - INFO - Step 3900: lr=1.00E-05, loss= 1.3448 (max= 2.7679), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:55,668 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.3595 (max= 2.7582), tps=16438, mfu=34.25%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:55,668 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.3595 (max= 2.7582), tps=16438, mfu=34.25%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:55,668 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.3595 (max= 2.7582), tps=16438, mfu=34.25%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:55,669 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.3595 (max= 2.7582), tps=16438, mfu=34.25%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:55,669 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.3595 (max= 2.7582), tps=16438, mfu=34.25%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:55,669 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.3595 (max= 2.7582), tps=16438, mfu=34.25%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:55,669 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.3595 (max= 2.7582), tps=16438, mfu=34.25%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:21:55,669 - root - INFO - Step 3910: lr=1.00E-05, loss= 1.3595 (max= 2.7582), tps=16438, mfu=34.25%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:15,892 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.3332 (max= 2.7031), tps=16207, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:15,892 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.3332 (max= 
2.7031), tps=16207, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:15,892 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.3332 (max= 2.7031), tps=16207, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:15,893 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.3332 (max= 2.7031), tps=16207, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:15,893 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.3332 (max= 2.7031), tps=16207, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:15,893 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.3332 (max= 2.7031), tps=16207, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:15,893 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.3332 (max= 2.7031), tps=16207, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:15,893 - root - INFO - Step 3920: lr=1.00E-05, loss= 1.3332 (max= 2.7031), tps=16207, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:36,127 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.3111 (max= 2.2869), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:36,127 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.3111 (max= 2.2869), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:36,127 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.3111 (max= 2.2869), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:36,127 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.3111 (max= 2.2869), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:36,127 - root - INFO - Step 3930: 
lr=1.00E-05, loss= 1.3111 (max= 2.2869), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:36,127 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.3111 (max= 2.2869), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:36,127 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.3111 (max= 2.2869), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:36,127 - root - INFO - Step 3930: lr=1.00E-05, loss= 1.3111 (max= 2.2869), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:56,378 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.3288 (max= 2.2894), tps=16185, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:56,378 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.3288 (max= 2.2894), tps=16185, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:56,379 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.3288 (max= 2.2894), tps=16185, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:56,379 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.3288 (max= 2.2894), tps=16185, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:56,379 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.3288 (max= 2.2894), tps=16185, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:56,379 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.3288 (max= 2.2894), tps=16185, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:56,379 - root - INFO - Step 3940: lr=1.00E-05, loss= 1.3288 (max= 2.2894), tps=16185, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:22:56,379 - 
root - INFO - Step 3940: lr=1.00E-05, loss= 1.3288 (max= 2.2894), tps=16185, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:16,613 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.3314 (max= 2.4957), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:16,614 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.3314 (max= 2.4957), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:16,614 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.3314 (max= 2.4957), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:16,614 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.3314 (max= 2.4957), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:16,614 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.3314 (max= 2.4957), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:16,614 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.3314 (max= 2.4957), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:16,614 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.3314 (max= 2.4957), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:16,614 - root - INFO - Step 3950: lr=1.00E-05, loss= 1.3314 (max= 2.4957), tps=16198, mfu=33.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:36,836 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.3065 (max= 2.4570), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:36,836 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.3065 (max= 2.4570), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:23:36,837 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.3065 (max= 2.4570), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:36,837 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.3065 (max= 2.4570), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:36,837 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.3065 (max= 2.4570), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:36,837 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.3065 (max= 2.4570), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:36,837 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.3065 (max= 2.4570), tps=16208, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:36,837 - root - INFO - Step 3960: lr=1.00E-05, loss= 1.3065 (max= 2.4570), tps=16209, mfu=33.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:57,090 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.3312 (max= 2.5506), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:57,091 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.3312 (max= 2.5506), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:57,091 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.3312 (max= 2.5506), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:57,091 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.3312 (max= 2.5506), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:57,091 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.3312 (max= 2.5506), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:57,091 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.3312 (max= 2.5506), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:57,091 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.3312 (max= 2.5506), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:23:57,091 - root - INFO - Step 3970: lr=1.00E-05, loss= 1.3312 (max= 2.5506), tps=16183, mfu=33.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:16,722 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.3063 (max= 2.3728), tps=16695, mfu=34.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:16,722 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.3063 (max= 2.3728), tps=16695, mfu=34.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:16,722 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.3063 (max= 2.3728), tps=16695, mfu=34.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:16,722 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.3063 (max= 2.3728), tps=16695, mfu=34.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:16,722 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.3063 (max= 2.3728), tps=16695, mfu=34.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:16,722 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.3063 (max= 2.3728), tps=16695, mfu=34.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:16,722 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.3063 (max= 2.3728), tps=16695, mfu=34.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:16,723 - root - INFO - Step 3980: lr=1.00E-05, loss= 1.3063 (max= 2.3728), tps=16695, mfu=34.78%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:34,742 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.3272 (max= 2.4930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:34,742 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.3272 (max= 2.4930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:34,742 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.3272 (max= 2.4930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:34,743 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.3272 (max= 2.4930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:34,743 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.3272 (max= 2.4930), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:34,743 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.3272 (max= 2.4930), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:34,743 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.3272 (max= 2.4930), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:24:34,743 - root - INFO - Step 3990: lr=1.00E-05, loss= 1.3272 (max= 2.4930), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-4000 +2025-10-24 11:24:52,743 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.3372 (max= 2.7936), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:24:52,743 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-24 11:24:52,743 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 
'optimizer', 'lr_scheduler']) +2025-10-24 11:24:52,743 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.3372 (max= 2.7936), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:24:52,743 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.3372 (max= 2.7936), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:24:52,743 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-24 11:24:52,743 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:24:52,744 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-24 11:24:52,744 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:24:52,744 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.3372 (max= 2.7936), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:24:52,744 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-24 11:24:52,744 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:24:52,744 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.3372 (max= 2.7936), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:24:52,744 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.3372 (max= 2.7936), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:24:52,744 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-24 11:24:52,744 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:24:52,744 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-24 11:24:52,744 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:24:52,744 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.3372 (max= 2.7936), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:24:52,744 - root - INFO - Step 4000: lr=1.00E-05, loss= 1.3372 (max= 2.7936), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:24:52,744 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-24 11:24:52,744 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:24:52,744 - root - INFO - Saving a full checkpoint at step 4000 +2025-10-24 11:24:52,744 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-4000! Save time: 4.532243251800537 +2025-10-24 11:25:06,517 - root - INFO - Finished saving the checkpoint in 13.77 seconds +2025-10-24 11:25:06,524 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-24 11:25:06,526 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-24 11:25:06,526 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-24 11:25:06,526 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-24 11:25:06,526 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-24 11:25:06,527 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-24 11:25:06,527 - root - INFO - Finished saving the checkpoint in 13.78 seconds +2025-10-24 11:25:25,491 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.3396 (max= 2.4757), tps=10008, mfu=20.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:25:25,491 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.3396 (max= 2.4757), tps=10008, 
mfu=20.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:25:25,491 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.3396 (max= 2.4757), tps=10008, mfu=20.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:25:25,491 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.3396 (max= 2.4757), tps=10008, mfu=20.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:25:25,491 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.3396 (max= 2.4757), tps=10008, mfu=20.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:25:25,492 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.3396 (max= 2.4757), tps=10008, mfu=20.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:25:25,492 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.3396 (max= 2.4757), tps=10008, mfu=20.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:25:25,492 - root - INFO - Step 4010: lr=1.00E-05, loss= 1.3396 (max= 2.4757), tps=10008, mfu=20.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:25:44,646 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.2882 (max= 2.4753), tps=17110, mfu=35.65%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:25:44,647 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.2882 (max= 2.4753), tps=17110, mfu=35.65%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:25:44,647 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.2882 (max= 2.4753), tps=17110, mfu=35.65%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:25:44,647 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.2882 (max= 2.4753), tps=17110, mfu=35.65%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:25:44,647 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.2882 
(max= 2.4753), tps=17110, mfu=35.65%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:25:44,647 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.2882 (max= 2.4753), tps=17110, mfu=35.65%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:25:44,647 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.2882 (max= 2.4753), tps=17110, mfu=35.65%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:25:44,647 - root - INFO - Step 4020: lr=1.00E-05, loss= 1.2882 (max= 2.4753), tps=17110, mfu=35.65%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:02,682 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.3067 (max= 2.3637), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:02,683 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.3067 (max= 2.3637), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:02,683 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.3067 (max= 2.3637), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:02,683 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.3067 (max= 2.3637), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:02,683 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.3067 (max= 2.3637), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:02,683 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.3067 (max= 2.3637), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:02,683 - root - INFO - Step 4030: lr=1.00E-05, loss= 1.3067 (max= 2.3637), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:02,683 - root - INFO - Step 4030: 
lr=1.00E-05, loss= 1.3067 (max= 2.3637), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:20,692 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.2955 (max= 2.5470), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:20,692 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.2955 (max= 2.5470), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:20,692 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.2955 (max= 2.5470), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:20,693 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.2955 (max= 2.5470), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:20,693 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.2955 (max= 2.5470), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:20,693 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.2955 (max= 2.5470), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:20,693 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.2955 (max= 2.5470), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:20,693 - root - INFO - Step 4040: lr=1.00E-05, loss= 1.2955 (max= 2.5470), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:38,692 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.3198 (max= 3.2196), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:38,692 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.3198 (max= 3.2196), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:38,692 - 
root - INFO - Step 4050: lr=1.00E-05, loss= 1.3198 (max= 3.2196), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:38,692 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.3198 (max= 3.2196), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:38,692 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.3198 (max= 3.2196), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:38,693 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.3198 (max= 3.2196), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:38,693 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.3198 (max= 3.2196), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:38,693 - root - INFO - Step 4050: lr=1.00E-05, loss= 1.3198 (max= 3.2196), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:26:56,915 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.3228 (max= 2.8760), tps=17985, mfu=37.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:56,915 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.3228 (max= 2.8760), tps=17985, mfu=37.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:56,915 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.3228 (max= 2.8760), tps=17986, mfu=37.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:56,915 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.3228 (max= 2.8760), tps=17986, mfu=37.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:56,915 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.3228 (max= 2.8760), tps=17986, mfu=37.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:26:56,915 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.3228 (max= 2.8760), tps=17986, mfu=37.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:56,915 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.3228 (max= 2.8760), tps=17986, mfu=37.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:26:56,915 - root - INFO - Step 4060: lr=1.00E-05, loss= 1.3228 (max= 2.8760), tps=17986, mfu=37.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:14,971 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.3336 (max= 2.3500), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:14,971 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.3336 (max= 2.3500), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:14,972 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.3336 (max= 2.3500), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:14,972 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.3336 (max= 2.3500), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:14,972 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.3336 (max= 2.3500), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:14,972 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.3336 (max= 2.3500), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:14,972 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.3336 (max= 2.3500), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:14,972 - root - INFO - Step 4070: lr=1.00E-05, loss= 1.3336 (max= 2.3500), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:33,058 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.3194 (max= 2.9527), tps=18120, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:33,058 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.3194 (max= 2.9527), tps=18120, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:33,059 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.3194 (max= 2.9527), tps=18120, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:33,059 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.3194 (max= 2.9527), tps=18120, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:33,059 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.3194 (max= 2.9527), tps=18120, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:33,059 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.3194 (max= 2.9527), tps=18120, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:33,059 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.3194 (max= 2.9527), tps=18121, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:33,059 - root - INFO - Step 4080: lr=1.00E-05, loss= 1.3194 (max= 2.9527), tps=18120, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:51,063 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.3198 (max= 2.9450), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:51,063 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.3198 (max= 2.9450), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:51,063 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.3198 (max= 2.9450), tps=18203, mfu=37.93%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:51,063 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.3198 (max= 2.9450), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:51,063 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.3198 (max= 2.9450), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:51,063 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.3198 (max= 2.9450), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:51,063 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.3198 (max= 2.9450), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:27:51,063 - root - INFO - Step 4090: lr=1.00E-05, loss= 1.3198 (max= 2.9450), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:09,097 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.3370 (max= 2.4619), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:09,098 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.3370 (max= 2.4619), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:09,098 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.3370 (max= 2.4619), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:09,098 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.3370 (max= 2.4619), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:09,098 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.3370 (max= 2.4619), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:09,098 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.3370 (max= 
2.4619), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:09,098 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.3370 (max= 2.4619), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:09,098 - root - INFO - Step 4100: lr=1.00E-05, loss= 1.3370 (max= 2.4619), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:27,111 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.3115 (max= 2.3806), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:28:27,111 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.3115 (max= 2.3806), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:28:27,112 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.3115 (max= 2.3806), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:28:27,112 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.3115 (max= 2.3806), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:28:27,112 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.3115 (max= 2.3806), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:28:27,112 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.3115 (max= 2.3806), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:28:27,112 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.3115 (max= 2.3806), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:28:27,112 - root - INFO - Step 4110: lr=1.00E-05, loss= 1.3115 (max= 2.3806), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:28:45,143 - root - INFO - Step 4120: 
lr=1.00E-05, loss= 1.3366 (max= 2.4406), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:45,143 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.3366 (max= 2.4406), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:45,144 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.3366 (max= 2.4406), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:45,144 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.3366 (max= 2.4406), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:45,144 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.3366 (max= 2.4406), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:45,144 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.3366 (max= 2.4406), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:45,144 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.3366 (max= 2.4406), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:28:45,144 - root - INFO - Step 4120: lr=1.00E-05, loss= 1.3366 (max= 2.4406), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:03,194 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.3449 (max= 3.4843), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:03,194 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.3449 (max= 3.4843), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:03,194 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.3449 (max= 3.4843), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:03,194 - 
root - INFO - Step 4130: lr=1.00E-05, loss= 1.3449 (max= 3.4843), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:03,195 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.3449 (max= 3.4843), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:03,195 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.3449 (max= 3.4843), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:03,195 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.3449 (max= 3.4843), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:03,195 - root - INFO - Step 4130: lr=1.00E-05, loss= 1.3449 (max= 3.4843), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:21,256 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.3112 (max= 2.5750), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:21,256 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.3112 (max= 2.5750), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:21,257 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.3112 (max= 2.5750), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:21,257 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.3112 (max= 2.5750), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:21,257 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.3112 (max= 2.5750), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:21,257 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.3112 (max= 2.5750), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:29:21,257 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.3112 (max= 2.5750), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:21,257 - root - INFO - Step 4140: lr=1.00E-05, loss= 1.3112 (max= 2.5750), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:39,493 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.3157 (max= 2.4346), tps=17972, mfu=37.45%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:29:39,493 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.3157 (max= 2.4346), tps=17972, mfu=37.45%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:29:39,493 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.3157 (max= 2.4346), tps=17973, mfu=37.45%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:29:39,493 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.3157 (max= 2.4346), tps=17973, mfu=37.45%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:29:39,493 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.3157 (max= 2.4346), tps=17973, mfu=37.45%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:29:39,493 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.3157 (max= 2.4346), tps=17973, mfu=37.45%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:29:39,493 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.3157 (max= 2.4346), tps=17973, mfu=37.45%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:29:39,493 - root - INFO - Step 4150: lr=1.00E-05, loss= 1.3157 (max= 2.4346), tps=17972, mfu=37.44%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:29:57,599 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.3127 (max= 2.5442), tps=18101, mfu=37.71%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:57,599 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.3127 (max= 2.5442), tps=18101, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:57,599 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.3127 (max= 2.5442), tps=18101, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:57,599 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.3127 (max= 2.5442), tps=18101, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:57,599 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.3127 (max= 2.5442), tps=18101, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:57,599 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.3127 (max= 2.5442), tps=18101, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:57,599 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.3127 (max= 2.5442), tps=18101, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:29:57,600 - root - INFO - Step 4160: lr=1.00E-05, loss= 1.3127 (max= 2.5442), tps=18101, mfu=37.71%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:15,633 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.3352 (max= 2.8695), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:15,634 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.3352 (max= 2.8695), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:15,634 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.3352 (max= 2.8695), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:15,634 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.3352 (max= 2.8695), tps=18173, mfu=37.86%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:15,634 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.3352 (max= 2.8695), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:15,634 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.3352 (max= 2.8695), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:15,634 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.3352 (max= 2.8695), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:15,634 - root - INFO - Step 4170: lr=1.00E-05, loss= 1.3352 (max= 2.8695), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:33,676 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.3150 (max= 2.4154), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:33,677 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.3150 (max= 2.4154), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:33,677 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.3150 (max= 2.4154), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:33,677 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.3150 (max= 2.4154), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:33,677 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.3150 (max= 2.4154), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:33,677 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.3150 (max= 2.4154), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:33,677 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.3150 (max= 
2.4154), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:33,677 - root - INFO - Step 4180: lr=1.00E-05, loss= 1.3150 (max= 2.4154), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:51,690 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.3401 (max= 3.7452), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:51,690 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.3401 (max= 3.7452), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:51,690 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.3401 (max= 3.7452), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:51,690 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.3401 (max= 3.7452), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:51,690 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.3401 (max= 3.7452), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:51,690 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.3401 (max= 3.7452), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:51,690 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.3401 (max= 3.7452), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:30:51,690 - root - INFO - Step 4190: lr=1.00E-05, loss= 1.3401 (max= 3.7452), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:31:09,732 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.3315 (max= 2.5834), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:09,733 - root - INFO - Step 4200: 
lr=1.00E-05, loss= 1.3315 (max= 2.5834), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:09,733 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.3315 (max= 2.5834), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:09,733 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.3315 (max= 2.5834), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:09,733 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.3315 (max= 2.5834), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:09,733 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.3315 (max= 2.5834), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:09,733 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.3315 (max= 2.5834), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:09,733 - root - INFO - Step 4200: lr=1.00E-05, loss= 1.3315 (max= 2.5834), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:27,754 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.3024 (max= 2.1805), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:27,754 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.3024 (max= 2.1805), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:27,754 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.3024 (max= 2.1805), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:27,754 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.3024 (max= 2.1805), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:27,754 - 
root - INFO - Step 4210: lr=1.00E-05, loss= 1.3024 (max= 2.1805), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:27,754 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.3024 (max= 2.1805), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:27,754 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.3024 (max= 2.1805), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:27,754 - root - INFO - Step 4210: lr=1.00E-05, loss= 1.3024 (max= 2.1805), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:45,765 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.3516 (max= 2.4924), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:45,765 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.3516 (max= 2.4924), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:45,765 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.3516 (max= 2.4924), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:45,765 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.3516 (max= 2.4924), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:45,765 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.3516 (max= 2.4924), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:45,765 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.3516 (max= 2.4924), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:31:45,765 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.3516 (max= 2.4924), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:31:45,765 - root - INFO - Step 4220: lr=1.00E-05, loss= 1.3516 (max= 2.4924), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:32:03,766 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.3463 (max= 2.7262), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:03,766 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.3463 (max= 2.7262), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:03,766 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.3463 (max= 2.7262), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:03,766 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.3463 (max= 2.7262), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:03,766 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.3463 (max= 2.7262), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:03,766 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.3463 (max= 2.7262), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:03,766 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.3463 (max= 2.7262), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:03,766 - root - INFO - Step 4230: lr=1.00E-05, loss= 1.3463 (max= 2.7262), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:21,779 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.3178 (max= 2.5841), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:21,779 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.3178 (max= 2.5841), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:21,779 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.3178 (max= 2.5841), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:21,779 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.3178 (max= 2.5841), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:21,779 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.3178 (max= 2.5841), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:21,779 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.3178 (max= 2.5841), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:21,779 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.3178 (max= 2.5841), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:21,780 - root - INFO - Step 4240: lr=1.00E-05, loss= 1.3178 (max= 2.5841), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:39,835 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.2866 (max= 2.5070), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:39,835 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.2866 (max= 2.5070), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:39,835 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.2866 (max= 2.5070), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:39,835 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.2866 (max= 2.5070), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:39,835 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.2866 (max= 2.5070), tps=18152, mfu=37.82%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:39,835 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.2866 (max= 2.5070), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:39,835 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.2866 (max= 2.5070), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:39,835 - root - INFO - Step 4250: lr=1.00E-05, loss= 1.2866 (max= 2.5070), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:57,854 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.2929 (max= 2.5113), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:57,855 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.2929 (max= 2.5113), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:57,855 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.2929 (max= 2.5113), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:57,855 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.2929 (max= 2.5113), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:57,855 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.2929 (max= 2.5113), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:57,855 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.2929 (max= 2.5113), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:57,855 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.2929 (max= 2.5113), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:32:57,855 - root - INFO - Step 4260: lr=1.00E-05, loss= 1.2929 (max= 
2.5113), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:15,872 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.2890 (max= 2.7088), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:15,872 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.2890 (max= 2.7088), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:15,872 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.2890 (max= 2.7088), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:15,872 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.2890 (max= 2.7088), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:15,872 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.2890 (max= 2.7088), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:15,872 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.2890 (max= 2.7088), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:15,873 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.2890 (max= 2.7088), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:15,873 - root - INFO - Step 4270: lr=1.00E-05, loss= 1.2890 (max= 2.7088), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:33,891 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.3349 (max= 2.3917), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:33,891 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.3349 (max= 2.3917), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:33,891 - root - INFO - Step 4280: 
lr=1.00E-05, loss= 1.3349 (max= 2.3917), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:33,891 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.3349 (max= 2.3917), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:33,891 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.3349 (max= 2.3917), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:33,892 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.3349 (max= 2.3917), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:33,892 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.3349 (max= 2.3917), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:33,892 - root - INFO - Step 4280: lr=1.00E-05, loss= 1.3349 (max= 2.3917), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:51,925 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.3279 (max= 2.5869), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:51,926 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.3279 (max= 2.5869), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:51,926 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.3279 (max= 2.5869), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:51,926 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.3279 (max= 2.5869), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:51,926 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.3279 (max= 2.5869), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:51,926 - 
root - INFO - Step 4290: lr=1.00E-05, loss= 1.3279 (max= 2.5869), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:51,926 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.3279 (max= 2.5869), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:33:51,926 - root - INFO - Step 4290: lr=1.00E-05, loss= 1.3279 (max= 2.5869), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:09,952 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.3085 (max= 2.6142), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:09,952 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.3085 (max= 2.6142), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:09,952 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.3085 (max= 2.6142), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:09,952 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.3085 (max= 2.6142), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:09,952 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.3085 (max= 2.6142), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:09,953 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.3085 (max= 2.6142), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:09,953 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.3085 (max= 2.6142), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:09,953 - root - INFO - Step 4300: lr=1.00E-05, loss= 1.3085 (max= 2.6142), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:34:27,987 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.3241 (max= 2.4682), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:27,987 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.3241 (max= 2.4682), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:27,987 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.3241 (max= 2.4682), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:27,987 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.3241 (max= 2.4682), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:27,987 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.3241 (max= 2.4682), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:27,987 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.3241 (max= 2.4682), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:27,988 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.3241 (max= 2.4682), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:27,988 - root - INFO - Step 4310: lr=1.00E-05, loss= 1.3241 (max= 2.4682), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:46,024 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.3164 (max= 2.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:46,025 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.3164 (max= 2.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:46,025 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.3164 (max= 2.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:46,025 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.3164 (max= 2.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:46,025 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.3164 (max= 2.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:46,025 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.3164 (max= 2.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:46,025 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.3164 (max= 2.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:34:46,025 - root - INFO - Step 4320: lr=1.00E-05, loss= 1.3164 (max= 2.3857), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:04,067 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.3091 (max= 2.3756), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:04,067 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.3091 (max= 2.3756), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:04,067 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.3091 (max= 2.3756), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:04,067 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.3091 (max= 2.3756), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:04,067 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.3091 (max= 2.3756), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:04,068 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.3091 (max= 2.3756), tps=18165, mfu=37.85%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:04,068 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.3091 (max= 2.3756), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:04,068 - root - INFO - Step 4330: lr=1.00E-05, loss= 1.3091 (max= 2.3756), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:22,115 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.2887 (max= 2.3542), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:22,115 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.2887 (max= 2.3542), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:22,115 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.2887 (max= 2.3542), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:22,115 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.2887 (max= 2.3542), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:22,115 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.2887 (max= 2.3542), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:22,115 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.2887 (max= 2.3542), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:22,115 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.2887 (max= 2.3542), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:22,115 - root - INFO - Step 4340: lr=1.00E-05, loss= 1.2887 (max= 2.3542), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:40,157 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.3093 (max= 
2.7287), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:40,157 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.3093 (max= 2.7287), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:40,157 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.3093 (max= 2.7287), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:40,157 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.3093 (max= 2.7287), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:40,157 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.3093 (max= 2.7287), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:40,157 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.3093 (max= 2.7287), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:40,158 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.3093 (max= 2.7287), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:40,158 - root - INFO - Step 4350: lr=1.00E-05, loss= 1.3093 (max= 2.7287), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:58,179 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.3082 (max= 2.3530), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:58,179 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.3082 (max= 2.3530), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:58,179 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.3082 (max= 2.3530), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:58,179 - root - INFO - Step 4360: 
lr=1.00E-05, loss= 1.3082 (max= 2.3530), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:58,179 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.3082 (max= 2.3530), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:58,179 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.3082 (max= 2.3530), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:58,179 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.3082 (max= 2.3530), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:35:58,179 - root - INFO - Step 4360: lr=1.00E-05, loss= 1.3082 (max= 2.3530), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:36:16,213 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.3079 (max= 2.3515), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:16,214 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.3079 (max= 2.3515), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:16,214 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.3079 (max= 2.3515), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:16,214 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.3079 (max= 2.3515), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:16,214 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.3079 (max= 2.3515), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:16,214 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.3079 (max= 2.3515), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:16,214 - 
root - INFO - Step 4370: lr=1.00E-05, loss= 1.3079 (max= 2.3515), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:16,214 - root - INFO - Step 4370: lr=1.00E-05, loss= 1.3079 (max= 2.3515), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:34,243 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.2744 (max= 2.4090), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:34,243 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.2744 (max= 2.4090), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:34,243 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.2744 (max= 2.4090), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:34,243 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.2744 (max= 2.4090), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:34,244 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.2744 (max= 2.4090), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:34,244 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.2744 (max= 2.4090), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:34,244 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.2744 (max= 2.4090), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:34,244 - root - INFO - Step 4380: lr=1.00E-05, loss= 1.2744 (max= 2.4090), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:36:52,252 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.2960 (max= 2.8217), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:36:52,252 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.2960 (max= 2.8217), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:36:52,252 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.2960 (max= 2.8217), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:36:52,252 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.2960 (max= 2.8217), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:36:52,253 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.2960 (max= 2.8217), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:36:52,253 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.2960 (max= 2.8217), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:36:52,253 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.2960 (max= 2.8217), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:36:52,253 - root - INFO - Step 4390: lr=1.00E-05, loss= 1.2960 (max= 2.8217), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:37:10,276 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.3389 (max= 3.5005), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:10,276 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.3389 (max= 3.5005), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:10,276 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.3389 (max= 3.5005), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:10,276 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.3389 (max= 3.5005), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:10,276 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.3389 (max= 3.5005), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:10,276 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.3389 (max= 3.5005), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:10,276 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.3389 (max= 3.5005), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:10,276 - root - INFO - Step 4400: lr=1.00E-05, loss= 1.3389 (max= 3.5005), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:28,305 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.3134 (max= 2.5882), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:28,305 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.3134 (max= 2.5882), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:28,305 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.3134 (max= 2.5882), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:28,305 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.3134 (max= 2.5882), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:28,306 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.3134 (max= 2.5882), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:28,306 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.3134 (max= 2.5882), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:28,306 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.3134 (max= 2.5882), tps=18178, mfu=37.88%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:28,306 - root - INFO - Step 4410: lr=1.00E-05, loss= 1.3134 (max= 2.5882), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:37:46,317 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.3898 (max= 3.7067), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:37:46,317 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.3898 (max= 3.7067), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:37:46,317 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.3898 (max= 3.7067), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:37:46,317 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.3898 (max= 3.7067), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:37:46,317 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.3898 (max= 3.7067), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:37:46,317 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.3898 (max= 3.7067), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:37:46,317 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.3898 (max= 3.7067), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:37:46,317 - root - INFO - Step 4420: lr=1.00E-05, loss= 1.3898 (max= 3.7067), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:38:04,375 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.3161 (max= 2.4450), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:04,375 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.3161 (max= 
2.4450), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:04,375 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.3161 (max= 2.4450), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:04,375 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.3161 (max= 2.4450), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:04,375 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.3161 (max= 2.4450), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:04,375 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.3161 (max= 2.4450), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:04,375 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.3161 (max= 2.4450), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:04,375 - root - INFO - Step 4430: lr=1.00E-05, loss= 1.3161 (max= 2.4450), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:22,379 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.3231 (max= 2.3592), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:22,379 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.3231 (max= 2.3592), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:22,379 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.3231 (max= 2.3592), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:22,379 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.3231 (max= 2.3592), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:22,379 - root - INFO - Step 4440: 
lr=1.00E-05, loss= 1.3231 (max= 2.3592), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:22,379 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.3231 (max= 2.3592), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:22,379 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.3231 (max= 2.3592), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:22,379 - root - INFO - Step 4440: lr=1.00E-05, loss= 1.3231 (max= 2.3592), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:40,376 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.2850 (max= 2.3145), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:40,377 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.2850 (max= 2.3145), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:40,377 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.2850 (max= 2.3145), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:40,377 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.2850 (max= 2.3145), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:40,377 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.2850 (max= 2.3145), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:40,377 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.2850 (max= 2.3145), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:40,377 - root - INFO - Step 4450: lr=1.00E-05, loss= 1.2850 (max= 2.3145), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:40,377 - 
root - INFO - Step 4450: lr=1.00E-05, loss= 1.2850 (max= 2.3145), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:38:58,353 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.3296 (max= 2.4329), tps=18231, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:38:58,353 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.3296 (max= 2.4329), tps=18231, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:38:58,353 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.3296 (max= 2.4329), tps=18231, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:38:58,354 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.3296 (max= 2.4329), tps=18231, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:38:58,354 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.3296 (max= 2.4329), tps=18232, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:38:58,354 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.3296 (max= 2.4329), tps=18231, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:38:58,354 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.3296 (max= 2.4329), tps=18232, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:38:58,354 - root - INFO - Step 4460: lr=1.00E-05, loss= 1.3296 (max= 2.4329), tps=18232, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:16,378 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.3266 (max= 2.3852), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:39:16,379 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.3266 (max= 2.3852), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:39:16,379 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.3266 (max= 2.3852), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:39:16,379 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.3266 (max= 2.3852), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:39:16,379 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.3266 (max= 2.3852), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:39:16,379 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.3266 (max= 2.3852), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:39:16,379 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.3266 (max= 2.3852), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:39:16,379 - root - INFO - Step 4470: lr=1.00E-05, loss= 1.3266 (max= 2.3852), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:39:34,399 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.2508 (max= 2.2571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:34,399 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.2508 (max= 2.2571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:34,399 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.2508 (max= 2.2571), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:34,399 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.2508 (max= 2.2571), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:34,400 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.2508 (max= 2.2571), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:34,400 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.2508 (max= 2.2571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:34,400 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.2508 (max= 2.2571), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:34,400 - root - INFO - Step 4480: lr=1.00E-05, loss= 1.2508 (max= 2.2571), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:52,429 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.3270 (max= 2.4211), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:52,429 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.3270 (max= 2.4211), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:52,429 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.3270 (max= 2.4211), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:52,429 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.3270 (max= 2.4211), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:52,429 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.3270 (max= 2.4211), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:52,429 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.3270 (max= 2.4211), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:52,429 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.3270 (max= 2.4211), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:39:52,429 - root - INFO - Step 4490: lr=1.00E-05, loss= 1.3270 (max= 2.4211), tps=18178, mfu=37.87%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:10,439 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.3081 (max= 2.6643), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:10,439 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.3081 (max= 2.6643), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:10,439 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.3081 (max= 2.6643), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:10,440 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.3081 (max= 2.6643), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:10,440 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.3081 (max= 2.6643), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:10,440 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.3081 (max= 2.6643), tps=18198, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:10,440 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.3081 (max= 2.6643), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:10,440 - root - INFO - Step 4500: lr=1.00E-05, loss= 1.3081 (max= 2.6643), tps=18198, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:28,454 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.2852 (max= 2.3557), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:28,454 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.2852 (max= 2.3557), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:28,454 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.2852 (max= 
2.3557), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:28,454 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.2852 (max= 2.3557), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:28,454 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.2852 (max= 2.3557), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:28,454 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.2852 (max= 2.3557), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:28,454 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.2852 (max= 2.3557), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:28,454 - root - INFO - Step 4510: lr=1.00E-05, loss= 1.2852 (max= 2.3557), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:46,449 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.3123 (max= 2.3552), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:46,449 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.3123 (max= 2.3552), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:46,449 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.3123 (max= 2.3552), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:46,450 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.3123 (max= 2.3552), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:46,450 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.3123 (max= 2.3552), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:46,450 - root - INFO - Step 4520: 
lr=1.00E-05, loss= 1.3123 (max= 2.3552), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:46,450 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.3123 (max= 2.3552), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:40:46,450 - root - INFO - Step 4520: lr=1.00E-05, loss= 1.3123 (max= 2.3552), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:04,463 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.3050 (max= 2.3374), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:41:04,463 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.3050 (max= 2.3374), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:41:04,463 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.3050 (max= 2.3374), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:41:04,463 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.3050 (max= 2.3374), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:41:04,463 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.3050 (max= 2.3374), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:41:04,463 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.3050 (max= 2.3374), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:41:04,463 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.3050 (max= 2.3374), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:41:04,464 - root - INFO - Step 4530: lr=1.00E-05, loss= 1.3050 (max= 2.3374), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:41:22,525 - 
root - INFO - Step 4540: lr=1.00E-05, loss= 1.3083 (max= 2.3372), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:22,525 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.3083 (max= 2.3372), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:22,525 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.3083 (max= 2.3372), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:22,525 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.3083 (max= 2.3372), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:22,525 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.3083 (max= 2.3372), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:22,525 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.3083 (max= 2.3372), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:22,525 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.3083 (max= 2.3372), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:22,525 - root - INFO - Step 4540: lr=1.00E-05, loss= 1.3083 (max= 2.3372), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:40,535 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.3017 (max= 2.4112), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:40,535 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.3017 (max= 2.4112), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:40,535 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.3017 (max= 2.4112), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:41:40,535 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.3017 (max= 2.4112), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:40,535 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.3017 (max= 2.4112), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:40,535 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.3017 (max= 2.4112), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:40,535 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.3017 (max= 2.4112), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:40,535 - root - INFO - Step 4550: lr=1.00E-05, loss= 1.3017 (max= 2.4112), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:58,530 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.3150 (max= 2.5502), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:58,530 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.3150 (max= 2.5502), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:58,530 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.3150 (max= 2.5502), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:58,531 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.3150 (max= 2.5502), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:58,531 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.3150 (max= 2.5502), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:58,531 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.3150 (max= 2.5502), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:58,531 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.3150 (max= 2.5502), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:41:58,531 - root - INFO - Step 4560: lr=1.00E-05, loss= 1.3150 (max= 2.5502), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:16,527 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.3109 (max= 2.5487), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:42:16,527 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.3109 (max= 2.5487), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:42:16,528 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.3109 (max= 2.5487), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:42:16,528 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.3109 (max= 2.5487), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:42:16,528 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.3109 (max= 2.5487), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:42:16,528 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.3109 (max= 2.5487), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:42:16,528 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.3109 (max= 2.5487), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:42:16,528 - root - INFO - Step 4570: lr=1.00E-05, loss= 1.3109 (max= 2.5487), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:42:34,535 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.3188 (max= 2.6044), tps=18201, mfu=37.92%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:34,535 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.3188 (max= 2.6044), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:34,535 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.3188 (max= 2.6044), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:34,535 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.3188 (max= 2.6044), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:34,535 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.3188 (max= 2.6044), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:34,535 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.3188 (max= 2.6044), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:34,535 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.3188 (max= 2.6044), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:34,535 - root - INFO - Step 4580: lr=1.00E-05, loss= 1.3188 (max= 2.6044), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:52,558 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.3071 (max= 2.5307), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:52,558 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.3071 (max= 2.5307), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:52,558 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.3071 (max= 2.5307), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:52,558 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.3071 (max= 
2.5307), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:52,558 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.3071 (max= 2.5307), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:52,558 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.3071 (max= 2.5307), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:52,558 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.3071 (max= 2.5307), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:42:52,558 - root - INFO - Step 4590: lr=1.00E-05, loss= 1.3071 (max= 2.5307), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:10,683 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.2913 (max= 2.8660), tps=18084, mfu=37.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:10,683 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.2913 (max= 2.8660), tps=18084, mfu=37.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:10,683 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.2913 (max= 2.8660), tps=18084, mfu=37.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:10,683 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.2913 (max= 2.8660), tps=18084, mfu=37.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:10,683 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.2913 (max= 2.8660), tps=18084, mfu=37.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:10,683 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.2913 (max= 2.8660), tps=18084, mfu=37.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:10,683 - root - INFO - Step 4600: 
lr=1.00E-05, loss= 1.2913 (max= 2.8660), tps=18084, mfu=37.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:10,683 - root - INFO - Step 4600: lr=1.00E-05, loss= 1.2913 (max= 2.8660), tps=18084, mfu=37.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:28,760 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.3208 (max= 2.8930), tps=18130, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:28,760 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.3208 (max= 2.8930), tps=18130, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:28,760 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.3208 (max= 2.8930), tps=18130, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:28,760 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.3208 (max= 2.8930), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:28,760 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.3208 (max= 2.8930), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:28,760 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.3208 (max= 2.8930), tps=18130, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:28,760 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.3208 (max= 2.8930), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:28,760 - root - INFO - Step 4610: lr=1.00E-05, loss= 1.3208 (max= 2.8930), tps=18130, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:46,789 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.2732 (max= 3.0344), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:46,789 - 
root - INFO - Step 4620: lr=1.00E-05, loss= 1.2732 (max= 3.0344), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:46,789 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.2732 (max= 3.0344), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:46,789 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.2732 (max= 3.0344), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:46,789 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.2732 (max= 3.0344), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:46,789 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.2732 (max= 3.0344), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:46,789 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.2732 (max= 3.0344), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:43:46,789 - root - INFO - Step 4620: lr=1.00E-05, loss= 1.2732 (max= 3.0344), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:04,812 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.2820 (max= 3.6760), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:44:04,812 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.2820 (max= 3.6760), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:44:04,812 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.2820 (max= 3.6760), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:44:04,812 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.2820 (max= 3.6760), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:44:04,812 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.2820 (max= 3.6760), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:44:04,812 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.2820 (max= 3.6760), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:44:04,812 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.2820 (max= 3.6760), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:44:04,813 - root - INFO - Step 4630: lr=1.00E-05, loss= 1.2820 (max= 3.6760), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:44:22,854 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.2884 (max= 2.3522), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:22,854 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.2884 (max= 2.3522), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:22,854 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.2884 (max= 2.3522), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:22,854 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.2884 (max= 2.3522), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:22,855 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.2884 (max= 2.3522), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:22,855 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.2884 (max= 2.3522), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:22,855 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.2884 (max= 2.3522), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:22,855 - root - INFO - Step 4640: lr=1.00E-05, loss= 1.2884 (max= 2.3522), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:40,855 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.2827 (max= 2.2299), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:40,855 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.2827 (max= 2.2299), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:40,855 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.2827 (max= 2.2299), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:40,855 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.2827 (max= 2.2299), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:40,855 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.2827 (max= 2.2299), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:40,855 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.2827 (max= 2.2299), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:40,855 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.2827 (max= 2.2299), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:40,855 - root - INFO - Step 4650: lr=1.00E-05, loss= 1.2827 (max= 2.2299), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:58,862 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.2904 (max= 2.1373), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:58,862 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.2904 (max= 2.1373), tps=18200, mfu=37.92%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:58,862 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.2904 (max= 2.1373), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:58,863 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.2904 (max= 2.1373), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:58,863 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.2904 (max= 2.1373), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:58,863 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.2904 (max= 2.1373), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:58,863 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.2904 (max= 2.1373), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:44:58,863 - root - INFO - Step 4660: lr=1.00E-05, loss= 1.2904 (max= 2.1373), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:45:16,878 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.3339 (max= 2.2194), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:16,878 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.3339 (max= 2.2194), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:16,878 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.3339 (max= 2.2194), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:16,878 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.3339 (max= 2.2194), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:16,878 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.3339 (max= 
2.2194), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:16,878 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.3339 (max= 2.2194), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:16,878 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.3339 (max= 2.2194), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:16,878 - root - INFO - Step 4670: lr=1.00E-05, loss= 1.3339 (max= 2.2194), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:34,885 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.3171 (max= 2.1665), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:34,885 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.3171 (max= 2.1665), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:34,885 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.3171 (max= 2.1665), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:34,885 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.3171 (max= 2.1665), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:34,885 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.3171 (max= 2.1665), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:34,886 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.3171 (max= 2.1665), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:34,886 - root - INFO - Step 4680: lr=1.00E-05, loss= 1.3171 (max= 2.1665), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:34,886 - root - INFO - Step 4680: 
lr=1.00E-05, loss= 1.3171 (max= 2.1665), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:45:52,911 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.2924 (max= 2.4128), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:45:52,911 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.2924 (max= 2.4128), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:45:52,911 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.2924 (max= 2.4128), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:45:52,911 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.2924 (max= 2.4128), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:45:52,911 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.2924 (max= 2.4128), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:45:52,911 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.2924 (max= 2.4128), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:45:52,911 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.2924 (max= 2.4128), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:45:52,911 - root - INFO - Step 4690: lr=1.00E-05, loss= 1.2924 (max= 2.4128), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:46:10,935 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.2819 (max= 2.3786), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:10,935 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.2819 (max= 2.3786), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:10,935 - 
root - INFO - Step 4700: lr=1.00E-05, loss= 1.2819 (max= 2.3786), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:10,935 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.2819 (max= 2.3786), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:10,935 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.2819 (max= 2.3786), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:10,935 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.2819 (max= 2.3786), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:10,935 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.2819 (max= 2.3786), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:10,935 - root - INFO - Step 4700: lr=1.00E-05, loss= 1.2819 (max= 2.3786), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:28,906 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.3243 (max= 2.7987), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:28,906 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.3243 (max= 2.7987), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:28,906 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.3243 (max= 2.7987), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:28,906 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.3243 (max= 2.7987), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:28,906 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.3243 (max= 2.7987), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:46:28,906 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.3243 (max= 2.7987), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:28,906 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.3243 (max= 2.7987), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:28,906 - root - INFO - Step 4710: lr=1.00E-05, loss= 1.3243 (max= 2.7987), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:46,946 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.2967 (max= 3.2369), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:46,946 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.2967 (max= 3.2369), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:46,946 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.2967 (max= 3.2369), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:46,946 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.2967 (max= 3.2369), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:46,946 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.2967 (max= 3.2369), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:46,946 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.2967 (max= 3.2369), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:46,946 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.2967 (max= 3.2369), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:46:46,946 - root - INFO - Step 4720: lr=1.00E-05, loss= 1.2967 (max= 3.2369), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:47:04,954 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.2983 (max= 2.5688), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:04,954 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.2983 (max= 2.5688), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:04,954 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.2983 (max= 2.5688), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:04,954 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.2983 (max= 2.5688), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:04,954 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.2983 (max= 2.5688), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:04,954 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.2983 (max= 2.5688), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:04,954 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.2983 (max= 2.5688), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:04,955 - root - INFO - Step 4730: lr=1.00E-05, loss= 1.2983 (max= 2.5688), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:22,984 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.3259 (max= 3.6799), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:22,984 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.3259 (max= 3.6799), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:22,984 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.3259 (max= 3.6799), tps=18177, mfu=37.87%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:22,985 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.3259 (max= 3.6799), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:22,985 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.3259 (max= 3.6799), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:22,985 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.3259 (max= 3.6799), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:22,985 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.3259 (max= 3.6799), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:22,985 - root - INFO - Step 4740: lr=1.00E-05, loss= 1.3259 (max= 3.6799), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:40,996 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.2922 (max= 2.3996), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:40,996 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.2922 (max= 2.3996), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:40,996 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.2922 (max= 2.3996), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:40,996 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.2922 (max= 2.3996), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:40,996 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.2922 (max= 2.3996), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:40,996 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.2922 (max= 
2.3996), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:40,996 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.2922 (max= 2.3996), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:40,996 - root - INFO - Step 4750: lr=1.00E-05, loss= 1.2922 (max= 2.3996), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:59,022 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.3135 (max= 2.1625), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:59,022 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.3135 (max= 2.1625), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:59,022 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.3135 (max= 2.1625), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:59,022 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.3135 (max= 2.1625), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:59,022 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.3135 (max= 2.1625), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:59,022 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.3135 (max= 2.1625), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:59,022 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.3135 (max= 2.1625), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:47:59,023 - root - INFO - Step 4760: lr=1.00E-05, loss= 1.3135 (max= 2.1625), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:48:17,076 - root - INFO - Step 4770: 
lr=1.00E-05, loss= 1.3120 (max= 3.7730), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:17,076 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.3120 (max= 3.7730), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:17,076 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.3120 (max= 3.7730), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:17,076 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.3120 (max= 3.7730), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:17,076 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.3120 (max= 3.7730), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:17,076 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.3120 (max= 3.7730), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:17,076 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.3120 (max= 3.7730), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:17,077 - root - INFO - Step 4770: lr=1.00E-05, loss= 1.3120 (max= 3.7730), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:35,085 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.2993 (max= 2.6189), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:35,085 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.2993 (max= 2.6189), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:35,085 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.2993 (max= 2.6189), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:35,085 - 
root - INFO - Step 4780: lr=1.00E-05, loss= 1.2993 (max= 2.6189), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:35,085 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.2993 (max= 2.6189), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:35,085 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.2993 (max= 2.6189), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:35,085 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.2993 (max= 2.6189), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:35,085 - root - INFO - Step 4780: lr=1.00E-05, loss= 1.2993 (max= 2.6189), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:53,104 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.3095 (max= 2.6997), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:53,104 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.3095 (max= 2.6997), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:53,104 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.3095 (max= 2.6997), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:53,104 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.3095 (max= 2.6997), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:53,104 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.3095 (max= 2.6997), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:53,104 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.3095 (max= 2.6997), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:48:53,104 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.3095 (max= 2.6997), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:48:53,104 - root - INFO - Step 4790: lr=1.00E-05, loss= 1.3095 (max= 2.6997), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:49:11,126 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.2958 (max= 2.4884), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:49:11,126 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.2958 (max= 2.4884), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:49:11,126 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.2958 (max= 2.4884), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:49:11,126 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.2958 (max= 2.4884), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:49:11,126 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.2958 (max= 2.4884), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:49:11,126 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.2958 (max= 2.4884), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:49:11,126 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.2958 (max= 2.4884), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:49:11,126 - root - INFO - Step 4800: lr=1.00E-05, loss= 1.2958 (max= 2.4884), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:49:29,169 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.3395 (max= 2.6634), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:29,169 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.3395 (max= 2.6634), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:29,170 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.3395 (max= 2.6634), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:29,170 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.3395 (max= 2.6634), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:29,170 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.3395 (max= 2.6634), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:29,170 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.3395 (max= 2.6634), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:29,170 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.3395 (max= 2.6634), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:29,170 - root - INFO - Step 4810: lr=1.00E-05, loss= 1.3395 (max= 2.6634), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:47,202 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.3204 (max= 3.5312), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:47,202 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.3204 (max= 3.5312), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:47,202 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.3204 (max= 3.5312), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:47,202 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.3204 (max= 3.5312), tps=18175, mfu=37.87%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:47,202 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.3204 (max= 3.5312), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:47,202 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.3204 (max= 3.5312), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:47,202 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.3204 (max= 3.5312), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:49:47,203 - root - INFO - Step 4820: lr=1.00E-05, loss= 1.3204 (max= 3.5312), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:05,192 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.3621 (max= 2.7539), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:05,192 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.3621 (max= 2.7539), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:05,192 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.3621 (max= 2.7539), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:05,192 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.3621 (max= 2.7539), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:05,192 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.3621 (max= 2.7539), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:05,193 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.3621 (max= 2.7539), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:05,193 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.3621 (max= 
2.7539), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:05,193 - root - INFO - Step 4830: lr=1.00E-05, loss= 1.3621 (max= 2.7539), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:23,175 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.2794 (max= 2.2226), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:23,175 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.2794 (max= 2.2226), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:23,175 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.2794 (max= 2.2226), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:23,176 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.2794 (max= 2.2226), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:23,176 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.2794 (max= 2.2226), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:23,176 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.2794 (max= 2.2226), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:23,176 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.2794 (max= 2.2226), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:23,176 - root - INFO - Step 4840: lr=1.00E-05, loss= 1.2794 (max= 2.2226), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:41,191 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.3034 (max= 2.3802), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:41,192 - root - INFO - Step 4850: 
lr=1.00E-05, loss= 1.3034 (max= 2.3802), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:41,192 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.3034 (max= 2.3802), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:41,192 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.3034 (max= 2.3802), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:41,192 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.3034 (max= 2.3802), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:41,192 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.3034 (max= 2.3802), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:41,192 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.3034 (max= 2.3802), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:41,192 - root - INFO - Step 4850: lr=1.00E-05, loss= 1.3034 (max= 2.3802), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:59,223 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.3029 (max= 2.0713), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:59,224 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.3029 (max= 2.0713), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:59,224 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.3029 (max= 2.0713), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:59,224 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.3029 (max= 2.0713), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:59,224 - 
root - INFO - Step 4860: lr=1.00E-05, loss= 1.3029 (max= 2.0713), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:59,224 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.3029 (max= 2.0713), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:59,224 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.3029 (max= 2.0713), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:50:59,224 - root - INFO - Step 4860: lr=1.00E-05, loss= 1.3029 (max= 2.0713), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:51:17,238 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.3044 (max= 2.3990), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:17,238 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.3044 (max= 2.3990), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:17,238 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.3044 (max= 2.3990), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:17,238 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.3044 (max= 2.3990), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:17,238 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.3044 (max= 2.3990), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:17,238 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.3044 (max= 2.3990), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:17,238 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.3044 (max= 2.3990), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 11:51:17,238 - root - INFO - Step 4870: lr=1.00E-05, loss= 1.3044 (max= 2.3990), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:35,253 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.2924 (max= 2.5013), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:51:35,254 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.2924 (max= 2.5013), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:51:35,254 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.2924 (max= 2.5013), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:51:35,254 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.2924 (max= 2.5013), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:51:35,254 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.2924 (max= 2.5013), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:51:35,254 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.2924 (max= 2.5013), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:51:35,254 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.2924 (max= 2.5013), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:51:35,254 - root - INFO - Step 4880: lr=1.00E-05, loss= 1.2924 (max= 2.5013), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:51:53,251 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.3076 (max= 2.5424), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:53,252 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.3076 (max= 2.5424), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:53,252 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.3076 (max= 2.5424), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:53,252 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.3076 (max= 2.5424), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:53,252 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.3076 (max= 2.5424), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:53,252 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.3076 (max= 2.5424), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:53,252 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.3076 (max= 2.5424), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:51:53,252 - root - INFO - Step 4890: lr=1.00E-05, loss= 1.3076 (max= 2.5424), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:52:11,255 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.3196 (max= 2.4482), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:11,255 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.3196 (max= 2.4482), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:11,255 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.3196 (max= 2.4482), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:11,255 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.3196 (max= 2.4482), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:11,255 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.3196 (max= 2.4482), tps=18205, mfu=37.93%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:11,255 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.3196 (max= 2.4482), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:11,255 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.3196 (max= 2.4482), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:11,255 - root - INFO - Step 4900: lr=1.00E-05, loss= 1.3196 (max= 2.4482), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:29,258 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.2985 (max= 2.6912), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:29,258 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.2985 (max= 2.6912), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:29,258 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.2985 (max= 2.6912), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:29,258 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.2985 (max= 2.6912), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:29,258 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.2985 (max= 2.6912), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:29,258 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.2985 (max= 2.6912), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:29,258 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.2985 (max= 2.6912), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:29,259 - root - INFO - Step 4910: lr=1.00E-05, loss= 1.2985 (max= 
2.6912), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:47,281 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.2915 (max= 2.4095), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:47,281 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.2915 (max= 2.4095), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:47,281 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.2915 (max= 2.4095), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:47,281 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.2915 (max= 2.4095), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:47,281 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.2915 (max= 2.4095), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:47,281 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.2915 (max= 2.4095), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:47,281 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.2915 (max= 2.4095), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:52:47,281 - root - INFO - Step 4920: lr=1.00E-05, loss= 1.2915 (max= 2.4095), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:05,316 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.3086 (max= 3.7740), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:05,316 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.3086 (max= 3.7740), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:05,316 - root - INFO - Step 4930: 
lr=1.00E-05, loss= 1.3086 (max= 3.7740), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:05,316 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.3086 (max= 3.7740), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:05,316 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.3086 (max= 3.7740), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:05,316 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.3086 (max= 3.7740), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:05,316 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.3086 (max= 3.7740), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:05,316 - root - INFO - Step 4930: lr=1.00E-05, loss= 1.3086 (max= 3.7740), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:23,362 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.2756 (max= 2.4442), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:23,362 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.2756 (max= 2.4442), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:23,362 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.2756 (max= 2.4442), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:23,362 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.2756 (max= 2.4442), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:23,362 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.2756 (max= 2.4442), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:23,362 - 
root - INFO - Step 4940: lr=1.00E-05, loss= 1.2756 (max= 2.4442), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:23,362 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.2756 (max= 2.4442), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:23,362 - root - INFO - Step 4940: lr=1.00E-05, loss= 1.2756 (max= 2.4442), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:41,378 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.2585 (max= 3.2437), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:41,378 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.2585 (max= 3.2437), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:41,378 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.2585 (max= 3.2437), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:41,378 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.2585 (max= 3.2437), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:41,378 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.2585 (max= 3.2437), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:41,378 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.2585 (max= 3.2437), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:41,378 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.2585 (max= 3.2437), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:41,378 - root - INFO - Step 4950: lr=1.00E-05, loss= 1.2585 (max= 3.2437), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:53:59,381 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.2965 (max= 2.6022), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:59,381 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.2965 (max= 2.6022), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:59,382 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.2965 (max= 2.6022), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:59,382 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.2965 (max= 2.6022), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:59,382 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.2965 (max= 2.6022), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:59,382 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.2965 (max= 2.6022), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:59,382 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.2965 (max= 2.6022), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:53:59,382 - root - INFO - Step 4960: lr=1.00E-05, loss= 1.2965 (max= 2.6022), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:17,404 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.2575 (max= 2.4621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:17,404 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.2575 (max= 2.4621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:17,404 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.2575 (max= 2.4621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:17,405 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.2575 (max= 2.4621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:17,405 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.2575 (max= 2.4621), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:17,405 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.2575 (max= 2.4621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:17,405 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.2575 (max= 2.4621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:17,405 - root - INFO - Step 4970: lr=1.00E-05, loss= 1.2575 (max= 2.4621), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:35,445 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.2974 (max= 2.4690), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:35,445 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.2974 (max= 2.4690), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:35,446 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.2974 (max= 2.4690), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:35,446 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.2974 (max= 2.4690), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:35,446 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.2974 (max= 2.4690), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:35,446 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.2974 (max= 2.4690), tps=18166, mfu=37.85%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:35,446 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.2974 (max= 2.4690), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:35,446 - root - INFO - Step 4980: lr=1.00E-05, loss= 1.2974 (max= 2.4690), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:53,478 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.3174 (max= 2.6046), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:53,478 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.3174 (max= 2.6046), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:53,478 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.3174 (max= 2.6046), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:53,478 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.3174 (max= 2.6046), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:53,478 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.3174 (max= 2.6046), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:53,478 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.3174 (max= 2.6046), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:53,478 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.3174 (max= 2.6046), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:54:53,478 - root - INFO - Step 4990: lr=1.00E-05, loss= 1.3174 (max= 2.6046), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-5000 +2025-10-24 
11:55:11,494 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.2889 (max= 2.3637), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:55:11,494 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-24 11:55:11,494 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:55:11,494 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.2889 (max= 2.3637), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:55:11,494 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.2889 (max= 2.3637), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:55:11,494 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-24 11:55:11,494 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:55:11,494 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-24 11:55:11,494 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:55:11,494 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.2889 (max= 2.3637), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:55:11,494 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.2889 (max= 2.3637), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:55:11,494 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.2889 (max= 2.3637), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:55:11,494 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-24 11:55:11,494 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 
11:55:11,494 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-24 11:55:11,494 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:55:11,494 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-24 11:55:11,494 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:55:11,494 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.2889 (max= 2.3637), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:55:11,494 - root - INFO - Step 5000: lr=1.00E-05, loss= 1.2889 (max= 2.3637), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:55:11,494 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-24 11:55:11,495 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 11:55:11,495 - root - INFO - Saving a full checkpoint at step 5000 +2025-10-24 11:55:11,495 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-5000! 
Save time: 4.590330600738525 +2025-10-24 11:55:26,480 - root - INFO - Finished saving the checkpoint in 14.99 seconds +2025-10-24 11:55:26,487 - root - INFO - Finished saving the checkpoint in 14.99 seconds +2025-10-24 11:55:26,488 - root - INFO - Finished saving the checkpoint in 14.99 seconds +2025-10-24 11:55:26,489 - root - INFO - Finished saving the checkpoint in 14.99 seconds +2025-10-24 11:55:26,489 - root - INFO - Finished saving the checkpoint in 14.99 seconds +2025-10-24 11:55:26,489 - root - INFO - Finished saving the checkpoint in 14.99 seconds +2025-10-24 11:55:26,490 - root - INFO - Finished saving the checkpoint in 15.00 seconds +2025-10-24 11:55:26,491 - root - INFO - Finished saving the checkpoint in 15.00 seconds +2025-10-24 11:55:44,501 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.3191 (max= 3.6804), tps=9929, mfu=20.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:55:44,501 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.3191 (max= 3.6804), tps=9929, mfu=20.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:55:44,501 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.3191 (max= 3.6804), tps=9929, mfu=20.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:55:44,501 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.3191 (max= 3.6804), tps=9929, mfu=20.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:55:44,501 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.3191 (max= 3.6804), tps=9929, mfu=20.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:55:44,501 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.3191 (max= 3.6804), tps=9929, mfu=20.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:55:44,501 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.3191 (max= 3.6804), tps=9929, mfu=20.69%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:55:44,501 - root - INFO - Step 5010: lr=1.00E-05, loss= 1.3191 (max= 3.6804), tps=9929, mfu=20.69%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 11:56:02,546 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.2646 (max= 2.4668), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:02,546 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.2646 (max= 2.4668), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:02,546 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.2646 (max= 2.4668), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:02,546 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.2646 (max= 2.4668), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:02,546 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.2646 (max= 2.4668), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:02,546 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.2646 (max= 2.4668), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:02,546 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.2646 (max= 2.4668), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:02,546 - root - INFO - Step 5020: lr=1.00E-05, loss= 1.2646 (max= 2.4668), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:20,569 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.2859 (max= 2.3760), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:20,569 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.2859 (max= 2.3760), tps=18184, mfu=37.89%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:20,569 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.2859 (max= 2.3760), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:20,569 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.2859 (max= 2.3760), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:20,569 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.2859 (max= 2.3760), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:20,569 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.2859 (max= 2.3760), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:20,570 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.2859 (max= 2.3760), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:20,570 - root - INFO - Step 5030: lr=1.00E-05, loss= 1.2859 (max= 2.3760), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:38,574 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.3012 (max= 2.3628), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:38,574 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.3012 (max= 2.3628), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:38,574 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.3012 (max= 2.3628), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:38,574 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.3012 (max= 2.3628), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:38,574 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.3012 (max= 
2.3628), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:38,575 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.3012 (max= 2.3628), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:38,575 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.3012 (max= 2.3628), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:38,575 - root - INFO - Step 5040: lr=1.00E-05, loss= 1.3012 (max= 2.3628), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:56,566 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.2739 (max= 2.4654), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:56,566 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.2739 (max= 2.4654), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:56,566 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.2739 (max= 2.4654), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:56,566 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.2739 (max= 2.4654), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:56,566 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.2739 (max= 2.4654), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:56,567 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.2739 (max= 2.4654), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:56,567 - root - INFO - Step 5050: lr=1.00E-05, loss= 1.2739 (max= 2.4654), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:56:56,567 - root - INFO - Step 5050: 
lr=1.00E-05, loss= 1.2739 (max= 2.4654), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:14,602 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.2663 (max= 2.4126), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:14,602 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.2663 (max= 2.4126), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:14,602 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.2663 (max= 2.4126), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:14,603 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.2663 (max= 2.4126), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:14,603 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.2663 (max= 2.4126), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:14,603 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.2663 (max= 2.4126), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:14,603 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.2663 (max= 2.4126), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:14,603 - root - INFO - Step 5060: lr=1.00E-05, loss= 1.2663 (max= 2.4126), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:32,622 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.3095 (max= 2.4442), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:57:32,622 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.3095 (max= 2.4442), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:57:32,622 - 
root - INFO - Step 5070: lr=1.00E-05, loss= 1.3095 (max= 2.4442), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:57:32,622 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.3095 (max= 2.4442), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:57:32,622 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.3095 (max= 2.4442), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:57:32,622 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.3095 (max= 2.4442), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:57:32,622 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.3095 (max= 2.4442), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:57:32,622 - root - INFO - Step 5070: lr=1.00E-05, loss= 1.3095 (max= 2.4442), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:57:50,642 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.2485 (max= 2.3666), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:50,642 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.2485 (max= 2.3666), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:50,642 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.2485 (max= 2.3666), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:50,642 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.2485 (max= 2.3666), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:50,642 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.2485 (max= 2.3666), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 11:57:50,642 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.2485 (max= 2.3666), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:50,642 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.2485 (max= 2.3666), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:57:50,642 - root - INFO - Step 5080: lr=1.00E-05, loss= 1.2485 (max= 2.3666), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:58:08,666 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.2771 (max= 2.5504), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:08,666 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.2771 (max= 2.5504), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:08,666 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.2771 (max= 2.5504), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:08,666 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.2771 (max= 2.5504), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:08,666 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.2771 (max= 2.5504), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:08,666 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.2771 (max= 2.5504), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:08,666 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.2771 (max= 2.5504), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:08,666 - root - INFO - Step 5090: lr=1.00E-05, loss= 1.2771 (max= 2.5504), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:26,685 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.2752 (max= 2.3903), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:58:26,685 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.2752 (max= 2.3903), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:58:26,685 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.2752 (max= 2.3903), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:58:26,685 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.2752 (max= 2.3903), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:58:26,685 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.2752 (max= 2.3903), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:58:26,685 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.2752 (max= 2.3903), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:58:26,685 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.2752 (max= 2.3903), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:58:26,685 - root - INFO - Step 5100: lr=1.00E-05, loss= 1.2752 (max= 2.3903), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:58:44,761 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.3095 (max= 2.2756), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:44,761 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.3095 (max= 2.2756), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:44,761 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.3095 (max= 2.2756), tps=18131, mfu=37.78%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:44,761 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.3095 (max= 2.2756), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:44,761 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.3095 (max= 2.2756), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:44,762 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.3095 (max= 2.2756), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:44,762 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.3095 (max= 2.2756), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:58:44,762 - root - INFO - Step 5110: lr=1.00E-05, loss= 1.3095 (max= 2.2756), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:02,775 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.3165 (max= 2.4858), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:02,775 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.3165 (max= 2.4858), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:02,775 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.3165 (max= 2.4858), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:02,775 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.3165 (max= 2.4858), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:02,775 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.3165 (max= 2.4858), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:02,775 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.3165 (max= 
2.4858), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:02,775 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.3165 (max= 2.4858), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:02,775 - root - INFO - Step 5120: lr=1.00E-05, loss= 1.3165 (max= 2.4858), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:20,817 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.2898 (max= 2.2297), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:20,817 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.2898 (max= 2.2297), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:20,817 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.2898 (max= 2.2297), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:20,817 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.2898 (max= 2.2297), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:20,817 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.2898 (max= 2.2297), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:20,817 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.2898 (max= 2.2297), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:20,817 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.2898 (max= 2.2297), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:20,817 - root - INFO - Step 5130: lr=1.00E-05, loss= 1.2898 (max= 2.2297), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 11:59:38,864 - root - INFO - Step 5140: 
lr=1.00E-05, loss= 1.3026 (max= 2.7514), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:38,864 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.3026 (max= 2.7514), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:38,865 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.3026 (max= 2.7514), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:38,865 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.3026 (max= 2.7514), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:38,865 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.3026 (max= 2.7514), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:38,865 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.3026 (max= 2.7514), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:38,865 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.3026 (max= 2.7514), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:38,865 - root - INFO - Step 5140: lr=1.00E-05, loss= 1.3026 (max= 2.7514), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:56,840 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.2918 (max= 2.4661), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:56,840 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.2918 (max= 2.4661), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:56,840 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.2918 (max= 2.4661), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:56,840 - 
root - INFO - Step 5150: lr=1.00E-05, loss= 1.2918 (max= 2.4661), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:56,840 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.2918 (max= 2.4661), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:56,840 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.2918 (max= 2.4661), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:56,840 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.2918 (max= 2.4661), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 11:59:56,840 - root - INFO - Step 5150: lr=1.00E-05, loss= 1.2918 (max= 2.4661), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:14,863 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.3046 (max= 2.5119), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:00:14,864 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.3046 (max= 2.5119), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:00:14,864 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.3046 (max= 2.5119), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:00:14,864 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.3046 (max= 2.5119), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:00:14,864 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.3046 (max= 2.5119), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:00:14,864 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.3046 (max= 2.5119), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:00:14,864 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.3046 (max= 2.5119), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:00:14,864 - root - INFO - Step 5160: lr=1.00E-05, loss= 1.3046 (max= 2.5119), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:00:32,922 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.2615 (max= 2.3210), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:32,922 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.2615 (max= 2.3210), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:32,922 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.2615 (max= 2.3210), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:32,922 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.2615 (max= 2.3210), tps=18150, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:32,922 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.2615 (max= 2.3210), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:32,922 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.2615 (max= 2.3210), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:32,922 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.2615 (max= 2.3210), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:32,922 - root - INFO - Step 5170: lr=1.00E-05, loss= 1.2615 (max= 2.3210), tps=18150, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:50,966 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.2631 (max= 2.3456), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:50,966 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.2631 (max= 2.3456), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:50,966 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.2631 (max= 2.3456), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:50,966 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.2631 (max= 2.3456), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:50,966 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.2631 (max= 2.3456), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:50,966 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.2631 (max= 2.3456), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:50,966 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.2631 (max= 2.3456), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:00:50,966 - root - INFO - Step 5180: lr=1.00E-05, loss= 1.2631 (max= 2.3456), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:08,960 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.2583 (max= 2.5549), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:08,960 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.2583 (max= 2.5549), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:08,960 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.2583 (max= 2.5549), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:08,960 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.2583 (max= 2.5549), tps=18215, mfu=37.95%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:08,960 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.2583 (max= 2.5549), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:08,960 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.2583 (max= 2.5549), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:08,960 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.2583 (max= 2.5549), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:08,960 - root - INFO - Step 5190: lr=1.00E-05, loss= 1.2583 (max= 2.5549), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:26,984 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.2662 (max= 2.6576), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:26,984 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.2662 (max= 2.6576), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:26,984 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.2662 (max= 2.6576), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:26,984 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.2662 (max= 2.6576), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:26,984 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.2662 (max= 2.6576), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:26,984 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.2662 (max= 2.6576), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:26,984 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.2662 (max= 
2.6576), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:26,984 - root - INFO - Step 5200: lr=1.00E-05, loss= 1.2662 (max= 2.6576), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:45,015 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.2867 (max= 2.2984), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:45,015 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.2867 (max= 2.2984), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:45,016 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.2867 (max= 2.2984), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:45,016 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.2867 (max= 2.2984), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:45,016 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.2867 (max= 2.2984), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:45,016 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.2867 (max= 2.2984), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:45,016 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.2867 (max= 2.2984), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:45,016 - root - INFO - Step 5210: lr=1.00E-05, loss= 1.2867 (max= 2.2984), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:01:52,450 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:239785 +2025-10-24 12:02:03,041 - root - INFO - Step 5220: 
lr=1.00E-05, loss= 1.2802 (max= 3.4058), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:03,041 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.2802 (max= 3.4058), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:03,041 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.2802 (max= 3.4058), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:03,041 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.2802 (max= 3.4058), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:03,041 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.2802 (max= 3.4058), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:03,041 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.2802 (max= 3.4058), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:03,041 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.2802 (max= 3.4058), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:03,041 - root - INFO - Step 5220: lr=1.00E-05, loss= 1.2802 (max= 3.4058), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:21,067 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.2950 (max= 2.4811), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:21,068 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.2950 (max= 2.4811), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:21,068 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.2950 (max= 2.4811), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:21,068 - 
root - INFO - Step 5230: lr=1.00E-05, loss= 1.2950 (max= 2.4811), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:21,068 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.2950 (max= 2.4811), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:21,068 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.2950 (max= 2.4811), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:21,068 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.2950 (max= 2.4811), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:21,068 - root - INFO - Step 5230: lr=1.00E-05, loss= 1.2950 (max= 2.4811), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:39,073 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.2857 (max= 2.3775), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:39,073 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.2857 (max= 2.3775), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:39,073 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.2857 (max= 2.3775), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:39,073 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.2857 (max= 2.3775), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:39,073 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.2857 (max= 2.3775), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:39,073 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.2857 (max= 2.3775), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:02:39,073 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.2857 (max= 2.3775), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:39,073 - root - INFO - Step 5240: lr=1.00E-05, loss= 1.2857 (max= 2.3775), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:02:57,071 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.2996 (max= 2.4867), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:57,071 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.2996 (max= 2.4867), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:57,071 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.2996 (max= 2.4867), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:57,071 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.2996 (max= 2.4867), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:57,071 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.2996 (max= 2.4867), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:57,071 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.2996 (max= 2.4867), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:57,071 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.2996 (max= 2.4867), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:02:57,071 - root - INFO - Step 5250: lr=1.00E-05, loss= 1.2996 (max= 2.4867), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:15,123 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.2772 (max= 2.4795), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:15,123 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.2772 (max= 2.4795), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:15,123 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.2772 (max= 2.4795), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:15,123 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.2772 (max= 2.4795), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:15,123 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.2772 (max= 2.4795), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:15,123 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.2772 (max= 2.4795), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:15,124 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.2772 (max= 2.4795), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:15,124 - root - INFO - Step 5260: lr=1.00E-05, loss= 1.2772 (max= 2.4795), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:33,175 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.2592 (max= 2.1883), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:33,175 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.2592 (max= 2.1883), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:33,175 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.2592 (max= 2.1883), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:33,175 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.2592 (max= 2.1883), tps=18156, mfu=37.83%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:33,175 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.2592 (max= 2.1883), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:33,175 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.2592 (max= 2.1883), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:33,175 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.2592 (max= 2.1883), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:33,175 - root - INFO - Step 5270: lr=1.00E-05, loss= 1.2592 (max= 2.1883), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:51,205 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.2730 (max= 2.2661), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:51,205 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.2730 (max= 2.2661), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:51,205 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.2730 (max= 2.2661), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:51,205 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.2730 (max= 2.2661), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:51,205 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.2730 (max= 2.2661), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:51,205 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.2730 (max= 2.2661), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:51,206 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.2730 (max= 
2.2661), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:03:51,206 - root - INFO - Step 5280: lr=1.00E-05, loss= 1.2730 (max= 2.2661), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:04:09,215 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.2901 (max= 3.3066), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:04:09,215 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.2901 (max= 3.3066), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:04:09,215 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.2901 (max= 3.3066), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:04:09,215 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.2901 (max= 3.3066), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:04:09,215 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.2901 (max= 3.3066), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:04:09,215 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.2901 (max= 3.3066), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:04:09,215 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.2901 (max= 3.3066), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:04:09,215 - root - INFO - Step 5290: lr=1.00E-05, loss= 1.2901 (max= 3.3066), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:04:27,237 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.3025 (max= 2.5281), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:27,237 - root - INFO - Step 5300: 
lr=1.00E-05, loss= 1.3025 (max= 2.5281), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:27,237 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.3025 (max= 2.5281), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:27,237 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.3025 (max= 2.5281), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:27,237 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.3025 (max= 2.5281), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:27,237 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.3025 (max= 2.5281), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:27,237 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.3025 (max= 2.5281), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:27,237 - root - INFO - Step 5300: lr=1.00E-05, loss= 1.3025 (max= 2.5281), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:45,265 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.3026 (max= 2.2538), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:45,265 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.3026 (max= 2.2538), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:45,265 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.3026 (max= 2.2538), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:45,265 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.3026 (max= 2.2538), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:45,265 - 
root - INFO - Step 5310: lr=1.00E-05, loss= 1.3026 (max= 2.2538), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:45,265 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.3026 (max= 2.2538), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:45,265 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.3026 (max= 2.2538), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:04:45,265 - root - INFO - Step 5310: lr=1.00E-05, loss= 1.3026 (max= 2.2538), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:03,279 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.2829 (max= 3.3143), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:03,279 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.2829 (max= 3.3143), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:03,279 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.2829 (max= 3.3143), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:03,279 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.2829 (max= 3.3143), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:03,279 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.2829 (max= 3.3143), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:03,279 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.2829 (max= 3.3143), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:03,279 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.2829 (max= 3.3143), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 12:05:03,279 - root - INFO - Step 5320: lr=1.00E-05, loss= 1.2829 (max= 3.3143), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:21,301 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.2971 (max= 2.5950), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:21,301 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.2971 (max= 2.5950), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:21,301 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.2971 (max= 2.5950), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:21,302 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.2971 (max= 2.5950), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:21,302 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.2971 (max= 2.5950), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:21,302 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.2971 (max= 2.5950), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:21,302 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.2971 (max= 2.5950), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:21,302 - root - INFO - Step 5330: lr=1.00E-05, loss= 1.2971 (max= 2.5950), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:05:39,322 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.2583 (max= 2.4698), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:39,322 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.2583 (max= 2.4698), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:39,322 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.2583 (max= 2.4698), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:39,322 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.2583 (max= 2.4698), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:39,322 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.2583 (max= 2.4698), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:39,322 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.2583 (max= 2.4698), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:39,322 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.2583 (max= 2.4698), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:39,322 - root - INFO - Step 5340: lr=1.00E-05, loss= 1.2583 (max= 2.4698), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:57,319 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.2950 (max= 3.8807), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:57,319 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.2950 (max= 3.8807), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:57,319 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.2950 (max= 3.8807), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:57,319 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.2950 (max= 3.8807), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:57,319 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.2950 (max= 3.8807), tps=18211, mfu=37.94%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:57,319 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.2950 (max= 3.8807), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:57,319 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.2950 (max= 3.8807), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:05:57,319 - root - INFO - Step 5350: lr=1.00E-05, loss= 1.2950 (max= 3.8807), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:15,359 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.2871 (max= 2.4185), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:15,359 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.2871 (max= 2.4185), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:15,359 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.2871 (max= 2.4185), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:15,359 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.2871 (max= 2.4185), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:15,359 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.2871 (max= 2.4185), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:15,359 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.2871 (max= 2.4185), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:15,359 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.2871 (max= 2.4185), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:15,359 - root - INFO - Step 5360: lr=1.00E-05, loss= 1.2871 (max= 
2.4185), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:33,415 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.2591 (max= 2.3855), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:33,415 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.2591 (max= 2.3855), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:33,415 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.2591 (max= 2.3855), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:33,415 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.2591 (max= 2.3855), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:33,415 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.2591 (max= 2.3855), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:33,415 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.2591 (max= 2.3855), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:33,415 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.2591 (max= 2.3855), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:33,415 - root - INFO - Step 5370: lr=1.00E-05, loss= 1.2591 (max= 2.3855), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:06:51,379 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.2752 (max= 2.3044), tps=18244, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:06:51,379 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.2752 (max= 2.3044), tps=18244, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:06:51,379 - root - INFO - Step 5380: 
lr=1.00E-05, loss= 1.2752 (max= 2.3044), tps=18244, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:06:51,379 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.2752 (max= 2.3044), tps=18244, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:06:51,379 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.2752 (max= 2.3044), tps=18244, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:06:51,379 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.2752 (max= 2.3044), tps=18244, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:06:51,380 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.2752 (max= 2.3044), tps=18244, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:06:51,380 - root - INFO - Step 5380: lr=1.00E-05, loss= 1.2752 (max= 2.3044), tps=18244, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:09,385 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.2868 (max= 3.2718), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:09,385 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.2868 (max= 3.2718), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:09,385 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.2868 (max= 3.2718), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:09,385 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.2868 (max= 3.2718), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:09,385 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.2868 (max= 3.2718), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:09,385 - 
root - INFO - Step 5390: lr=1.00E-05, loss= 1.2868 (max= 3.2718), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:09,385 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.2868 (max= 3.2718), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:09,385 - root - INFO - Step 5390: lr=1.00E-05, loss= 1.2868 (max= 3.2718), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:27,445 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.2860 (max= 2.7868), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:27,445 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.2860 (max= 2.7868), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:27,445 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.2860 (max= 2.7868), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:27,445 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.2860 (max= 2.7868), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:27,445 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.2860 (max= 2.7868), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:27,445 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.2860 (max= 2.7868), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:27,445 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.2860 (max= 2.7868), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:07:27,445 - root - INFO - Step 5400: lr=1.00E-05, loss= 1.2860 (max= 2.7868), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 12:07:45,443 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.2796 (max= 2.5094), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:07:45,443 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.2796 (max= 2.5094), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:07:45,443 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.2796 (max= 2.5094), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:07:45,443 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.2796 (max= 2.5094), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:07:45,443 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.2796 (max= 2.5094), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:07:45,443 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.2796 (max= 2.5094), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:07:45,443 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.2796 (max= 2.5094), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:07:45,443 - root - INFO - Step 5410: lr=1.00E-05, loss= 1.2796 (max= 2.5094), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:03,459 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.2706 (max= 2.4222), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:03,459 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.2706 (max= 2.4222), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:03,459 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.2706 (max= 2.4222), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:03,459 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.2706 (max= 2.4222), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:03,459 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.2706 (max= 2.4222), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:03,459 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.2706 (max= 2.4222), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:03,459 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.2706 (max= 2.4222), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:03,459 - root - INFO - Step 5420: lr=1.00E-05, loss= 1.2706 (max= 2.4222), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:21,437 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.2712 (max= 2.6842), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:21,437 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.2712 (max= 2.6842), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:21,437 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.2712 (max= 2.6842), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:21,437 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.2712 (max= 2.6842), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:21,437 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.2712 (max= 2.6842), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:21,437 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.2712 (max= 2.6842), tps=18230, mfu=37.98%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:21,437 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.2712 (max= 2.6842), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:21,438 - root - INFO - Step 5430: lr=1.00E-05, loss= 1.2712 (max= 2.6842), tps=18230, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:39,486 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.3060 (max= 2.2755), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:39,486 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.3060 (max= 2.2755), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:39,486 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.3060 (max= 2.2755), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:39,486 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.3060 (max= 2.2755), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:39,487 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.3060 (max= 2.2755), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:39,487 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.3060 (max= 2.2755), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:39,487 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.3060 (max= 2.2755), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:39,487 - root - INFO - Step 5440: lr=1.00E-05, loss= 1.3060 (max= 2.2755), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:08:57,508 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.3095 (max= 
2.5420), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:57,509 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.3095 (max= 2.5420), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:57,509 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.3095 (max= 2.5420), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:57,509 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.3095 (max= 2.5420), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:57,509 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.3095 (max= 2.5420), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:57,509 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.3095 (max= 2.5420), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:57,509 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.3095 (max= 2.5420), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:08:57,509 - root - INFO - Step 5450: lr=1.00E-05, loss= 1.3095 (max= 2.5420), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:15,501 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.2762 (max= 2.3335), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:15,502 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.2762 (max= 2.3335), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:15,502 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.2762 (max= 2.3335), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:15,502 - root - INFO - Step 5460: 
lr=1.00E-05, loss= 1.2762 (max= 2.3335), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:15,502 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.2762 (max= 2.3335), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:15,502 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.2762 (max= 2.3335), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:15,502 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.2762 (max= 2.3335), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:15,502 - root - INFO - Step 5460: lr=1.00E-05, loss= 1.2762 (max= 2.3335), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:33,534 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.2515 (max= 2.2682), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:33,534 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.2515 (max= 2.2682), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:33,534 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.2515 (max= 2.2682), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:33,534 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.2515 (max= 2.2682), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:33,534 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.2515 (max= 2.2682), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:33,534 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.2515 (max= 2.2682), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:33,534 - 
root - INFO - Step 5470: lr=1.00E-05, loss= 1.2515 (max= 2.2682), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:33,534 - root - INFO - Step 5470: lr=1.00E-05, loss= 1.2515 (max= 2.2682), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:09:51,564 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.2907 (max= 3.6180), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:09:51,565 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.2907 (max= 3.6180), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:09:51,565 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.2907 (max= 3.6180), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:09:51,565 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.2907 (max= 3.6180), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:09:51,565 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.2907 (max= 3.6180), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:09:51,565 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.2907 (max= 3.6180), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:09:51,565 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.2907 (max= 3.6180), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:09:51,565 - root - INFO - Step 5480: lr=1.00E-05, loss= 1.2907 (max= 3.6180), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:10:09,573 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.2760 (max= 2.4187), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 12:10:09,573 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.2760 (max= 2.4187), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:10:09,573 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.2760 (max= 2.4187), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:10:09,574 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.2760 (max= 2.4187), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:10:09,574 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.2760 (max= 2.4187), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:10:09,574 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.2760 (max= 2.4187), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:10:09,574 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.2760 (max= 2.4187), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:10:09,574 - root - INFO - Step 5490: lr=1.00E-05, loss= 1.2760 (max= 2.4187), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:10:27,625 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.2944 (max= 2.6238), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:27,625 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.2944 (max= 2.6238), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:27,625 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.2944 (max= 2.6238), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:27,626 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.2944 (max= 2.6238), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:27,626 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.2944 (max= 2.6238), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:27,626 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.2944 (max= 2.6238), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:27,626 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.2944 (max= 2.6238), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:27,626 - root - INFO - Step 5500: lr=1.00E-05, loss= 1.2944 (max= 2.6238), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:45,673 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.2879 (max= 3.4782), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:45,673 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.2879 (max= 3.4782), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:45,673 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.2879 (max= 3.4782), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:45,673 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.2879 (max= 3.4782), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:45,673 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.2879 (max= 3.4782), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:45,673 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.2879 (max= 3.4782), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:45,673 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.2879 (max= 3.4782), tps=18160, mfu=37.84%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:10:45,673 - root - INFO - Step 5510: lr=1.00E-05, loss= 1.2879 (max= 3.4782), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:03,700 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.2759 (max= 2.3619), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:03,700 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.2759 (max= 2.3619), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:03,700 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.2759 (max= 2.3619), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:03,701 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.2759 (max= 2.3619), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:03,701 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.2759 (max= 2.3619), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:03,701 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.2759 (max= 2.3619), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:03,701 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.2759 (max= 2.3619), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:03,701 - root - INFO - Step 5520: lr=1.00E-05, loss= 1.2759 (max= 2.3619), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:21,686 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.2849 (max= 2.2919), tps=18222, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:11:21,687 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.2849 (max= 
2.2919), tps=18222, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:11:21,687 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.2849 (max= 2.2919), tps=18222, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:11:21,687 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.2849 (max= 2.2919), tps=18222, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:11:21,687 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.2849 (max= 2.2919), tps=18222, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:11:21,687 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.2849 (max= 2.2919), tps=18222, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:11:21,687 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.2849 (max= 2.2919), tps=18222, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:11:21,687 - root - INFO - Step 5530: lr=1.00E-05, loss= 1.2849 (max= 2.2919), tps=18222, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:11:39,940 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.2903 (max= 2.0633), tps=17957, mfu=37.41%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:39,940 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.2903 (max= 2.0633), tps=17957, mfu=37.41%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:39,940 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.2903 (max= 2.0633), tps=17957, mfu=37.41%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:39,940 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.2903 (max= 2.0633), tps=17957, mfu=37.41%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:39,941 - root - INFO - Step 5540: 
lr=1.00E-05, loss= 1.2903 (max= 2.0633), tps=17957, mfu=37.41%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:39,941 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.2903 (max= 2.0633), tps=17957, mfu=37.41%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:39,941 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.2903 (max= 2.0633), tps=17957, mfu=37.41%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:39,941 - root - INFO - Step 5540: lr=1.00E-05, loss= 1.2903 (max= 2.0633), tps=17957, mfu=37.41%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:58,198 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.2750 (max= 2.5687), tps=17951, mfu=37.40%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:58,199 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.2750 (max= 2.5687), tps=17951, mfu=37.40%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:58,199 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.2750 (max= 2.5687), tps=17951, mfu=37.40%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:58,199 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.2750 (max= 2.5687), tps=17951, mfu=37.40%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:58,199 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.2750 (max= 2.5687), tps=17951, mfu=37.40%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:58,199 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.2750 (max= 2.5687), tps=17951, mfu=37.40%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:58,199 - root - INFO - Step 5550: lr=1.00E-05, loss= 1.2750 (max= 2.5687), tps=17950, mfu=37.40%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:11:58,199 - 
root - INFO - Step 5550: lr=1.00E-05, loss= 1.2750 (max= 2.5687), tps=17950, mfu=37.40%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:12:16,190 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.2644 (max= 2.3050), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:16,191 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.2644 (max= 2.3050), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:16,191 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.2644 (max= 2.3050), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:16,191 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.2644 (max= 2.3050), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:16,191 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.2644 (max= 2.3050), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:16,191 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.2644 (max= 2.3050), tps=18217, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:16,191 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.2644 (max= 2.3050), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:16,191 - root - INFO - Step 5560: lr=1.00E-05, loss= 1.2644 (max= 2.3050), tps=18217, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:34,693 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.3096 (max= 2.3215), tps=17715, mfu=36.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:12:34,694 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.3096 (max= 2.3215), tps=17715, mfu=36.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:12:34,694 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.3096 (max= 2.3215), tps=17715, mfu=36.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:12:34,694 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.3096 (max= 2.3215), tps=17715, mfu=36.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:12:34,694 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.3096 (max= 2.3215), tps=17715, mfu=36.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:12:34,694 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.3096 (max= 2.3215), tps=17715, mfu=36.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:12:34,694 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.3096 (max= 2.3215), tps=17715, mfu=36.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:12:34,694 - root - INFO - Step 5570: lr=1.00E-05, loss= 1.3096 (max= 2.3215), tps=17715, mfu=36.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:12:52,762 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.2686 (max= 2.4550), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:52,763 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.2686 (max= 2.4550), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:52,763 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.2686 (max= 2.4550), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:52,763 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.2686 (max= 2.4550), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:52,763 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.2686 (max= 2.4550), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:52,763 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.2686 (max= 2.4550), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:52,763 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.2686 (max= 2.4550), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:12:52,763 - root - INFO - Step 5580: lr=1.00E-05, loss= 1.2686 (max= 2.4550), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:13:10,769 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.2920 (max= 2.2146), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:10,769 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.2920 (max= 2.2146), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:10,769 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.2920 (max= 2.2146), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:10,769 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.2920 (max= 2.2146), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:10,769 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.2920 (max= 2.2146), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:10,769 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.2920 (max= 2.2146), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:10,769 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.2920 (max= 2.2146), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:10,770 - root - INFO - Step 5590: lr=1.00E-05, loss= 1.2920 (max= 2.2146), tps=18202, mfu=37.92%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:28,782 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.3108 (max= 2.2364), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:28,782 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.3108 (max= 2.2364), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:28,782 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.3108 (max= 2.2364), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:28,782 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.3108 (max= 2.2364), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:28,782 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.3108 (max= 2.2364), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:28,782 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.3108 (max= 2.2364), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:28,782 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.3108 (max= 2.2364), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:28,783 - root - INFO - Step 5600: lr=1.00E-05, loss= 1.3108 (max= 2.2364), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:46,765 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.2768 (max= 2.5599), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:46,765 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.2768 (max= 2.5599), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:46,765 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.2768 (max= 
2.5599), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:46,765 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.2768 (max= 2.5599), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:46,766 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.2768 (max= 2.5599), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:46,766 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.2768 (max= 2.5599), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:46,766 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.2768 (max= 2.5599), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:13:46,766 - root - INFO - Step 5610: lr=1.00E-05, loss= 1.2768 (max= 2.5599), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:14:10,276 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.2684 (max= 2.3787), tps=13940, mfu=29.04%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.14s, 24.28%) +2025-10-24 12:14:10,276 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.2684 (max= 2.3787), tps=13940, mfu=29.04%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.14s, 24.28%) +2025-10-24 12:14:10,276 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.2684 (max= 2.3787), tps=13940, mfu=29.04%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.14s, 24.28%) +2025-10-24 12:14:10,276 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.2684 (max= 2.3787), tps=13940, mfu=29.04%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.14s, 24.28%) +2025-10-24 12:14:10,276 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.2684 (max= 2.3787), tps=13940, mfu=29.04%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.14s, 24.28%) +2025-10-24 12:14:10,276 - root - INFO - Step 5620: 
lr=1.00E-05, loss= 1.2684 (max= 2.3787), tps=13940, mfu=29.04%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.14s, 24.28%) +2025-10-24 12:14:10,276 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.2684 (max= 2.3787), tps=13940, mfu=29.04%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.14s, 24.28%) +2025-10-24 12:14:10,276 - root - INFO - Step 5620: lr=1.00E-05, loss= 1.2684 (max= 2.3787), tps=13940, mfu=29.04%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.14s, 24.28%) +2025-10-24 12:14:28,298 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.2595 (max= 2.4108), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:14:28,298 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.2595 (max= 2.4108), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:14:28,298 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.2595 (max= 2.4108), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:14:28,298 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.2595 (max= 2.4108), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:14:28,298 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.2595 (max= 2.4108), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:14:28,298 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.2595 (max= 2.4108), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:14:28,299 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.2595 (max= 2.4108), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:14:28,299 - root - INFO - Step 5630: lr=1.00E-05, loss= 1.2595 (max= 2.4108), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:14:46,328 
- root - INFO - Step 5640: lr=1.00E-05, loss= 1.2753 (max= 2.3413), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:14:46,328 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.2753 (max= 2.3413), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:14:46,328 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.2753 (max= 2.3413), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:14:46,328 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.2753 (max= 2.3413), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:14:46,328 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.2753 (max= 2.3413), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:14:46,328 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.2753 (max= 2.3413), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:14:46,328 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.2753 (max= 2.3413), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:14:46,328 - root - INFO - Step 5640: lr=1.00E-05, loss= 1.2753 (max= 2.3413), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:04,383 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.2861 (max= 2.2117), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:04,383 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.2861 (max= 2.2117), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:04,383 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.2861 (max= 2.2117), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:15:04,383 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.2861 (max= 2.2117), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:04,384 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.2861 (max= 2.2117), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:04,384 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.2861 (max= 2.2117), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:04,384 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.2861 (max= 2.2117), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:04,384 - root - INFO - Step 5650: lr=1.00E-05, loss= 1.2861 (max= 2.2117), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:22,426 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.2928 (max= 2.4778), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:22,426 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.2928 (max= 2.4778), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:22,426 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.2928 (max= 2.4778), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:22,426 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.2928 (max= 2.4778), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:22,426 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.2928 (max= 2.4778), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:22,426 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.2928 (max= 2.4778), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:22,426 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.2928 (max= 2.4778), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:22,426 - root - INFO - Step 5660: lr=1.00E-05, loss= 1.2928 (max= 2.4778), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:40,460 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.2787 (max= 3.1387), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:40,460 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.2787 (max= 3.1387), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:40,460 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.2787 (max= 3.1387), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:40,460 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.2787 (max= 3.1387), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:40,460 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.2787 (max= 3.1387), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:40,460 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.2787 (max= 3.1387), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:40,460 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.2787 (max= 3.1387), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:40,460 - root - INFO - Step 5670: lr=1.00E-05, loss= 1.2787 (max= 3.1387), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:58,454 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.2988 (max= 2.3891), tps=18214, mfu=37.95%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:58,454 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.2988 (max= 2.3891), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:58,454 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.2988 (max= 2.3891), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:58,454 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.2988 (max= 2.3891), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:58,454 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.2988 (max= 2.3891), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:58,454 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.2988 (max= 2.3891), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:58,454 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.2988 (max= 2.3891), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:15:58,454 - root - INFO - Step 5680: lr=1.00E-05, loss= 1.2988 (max= 2.3891), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:16,459 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.2798 (max= 2.1367), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:16,459 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.2798 (max= 2.1367), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:16,459 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.2798 (max= 2.1367), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:16,459 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.2798 (max= 
2.1367), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:16,459 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.2798 (max= 2.1367), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:16,459 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.2798 (max= 2.1367), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:16,459 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.2798 (max= 2.1367), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:16,459 - root - INFO - Step 5690: lr=1.00E-05, loss= 1.2798 (max= 2.1367), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:34,492 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.2853 (max= 2.3708), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:16:34,492 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.2853 (max= 2.3708), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:16:34,492 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.2853 (max= 2.3708), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:16:34,492 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.2853 (max= 2.3708), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:16:34,492 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.2853 (max= 2.3708), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:16:34,492 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.2853 (max= 2.3708), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:16:34,492 - root - INFO - Step 5700: 
lr=1.00E-05, loss= 1.2853 (max= 2.3708), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:16:34,492 - root - INFO - Step 5700: lr=1.00E-05, loss= 1.2853 (max= 2.3708), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:16:52,537 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.3150 (max= 2.1262), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:52,537 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.3150 (max= 2.1262), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:52,537 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.3150 (max= 2.1262), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:52,537 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.3150 (max= 2.1262), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:52,538 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.3150 (max= 2.1262), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:52,538 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.3150 (max= 2.1262), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:52,538 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.3150 (max= 2.1262), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:16:52,538 - root - INFO - Step 5710: lr=1.00E-05, loss= 1.3150 (max= 2.1262), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:10,552 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.2764 (max= 2.0799), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:10,552 - 
root - INFO - Step 5720: lr=1.00E-05, loss= 1.2764 (max= 2.0799), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:10,552 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.2764 (max= 2.0799), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:10,552 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.2764 (max= 2.0799), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:10,552 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.2764 (max= 2.0799), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:10,553 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.2764 (max= 2.0799), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:10,553 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.2764 (max= 2.0799), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:10,553 - root - INFO - Step 5720: lr=1.00E-05, loss= 1.2764 (max= 2.0799), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:28,588 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.3161 (max= 2.2454), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:28,588 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.3161 (max= 2.2454), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:28,588 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.3161 (max= 2.2454), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:28,588 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.3161 (max= 2.2454), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:17:28,588 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.3161 (max= 2.2454), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:28,588 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.3161 (max= 2.2454), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:28,588 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.3161 (max= 2.2454), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:28,588 - root - INFO - Step 5730: lr=1.00E-05, loss= 1.3161 (max= 2.2454), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:46,603 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.2858 (max= 2.8508), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:46,603 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.2858 (max= 2.8508), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:46,603 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.2858 (max= 2.8508), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:46,603 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.2858 (max= 2.8508), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:46,603 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.2858 (max= 2.8508), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:46,603 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.2858 (max= 2.8508), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:46,603 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.2858 (max= 2.8508), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:17:46,604 - root - INFO - Step 5740: lr=1.00E-05, loss= 1.2858 (max= 2.8508), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:18:04,631 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.2616 (max= 2.3321), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:04,631 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.2616 (max= 2.3321), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:04,631 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.2616 (max= 2.3321), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:04,631 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.2616 (max= 2.3321), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:04,631 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.2616 (max= 2.3321), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:04,631 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.2616 (max= 2.3321), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:04,631 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.2616 (max= 2.3321), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:04,631 - root - INFO - Step 5750: lr=1.00E-05, loss= 1.2616 (max= 2.3321), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:22,635 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.2713 (max= 2.3196), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:22,635 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.2713 (max= 2.3196), tps=18204, mfu=37.93%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:22,635 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.2713 (max= 2.3196), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:22,635 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.2713 (max= 2.3196), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:22,635 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.2713 (max= 2.3196), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:22,635 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.2713 (max= 2.3196), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:22,635 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.2713 (max= 2.3196), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:22,635 - root - INFO - Step 5760: lr=1.00E-05, loss= 1.2713 (max= 2.3196), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:40,656 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.2973 (max= 2.2930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:40,656 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.2973 (max= 2.2930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:40,656 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.2973 (max= 2.2930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:40,656 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.2973 (max= 2.2930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:40,656 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.2973 (max= 
2.2930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:40,656 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.2973 (max= 2.2930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:40,656 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.2973 (max= 2.2930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:40,656 - root - INFO - Step 5770: lr=1.00E-05, loss= 1.2973 (max= 2.2930), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:18:58,644 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.3022 (max= 2.9389), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:18:58,644 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.3022 (max= 2.9389), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:18:58,644 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.3022 (max= 2.9389), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:18:58,644 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.3022 (max= 2.9389), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:18:58,645 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.3022 (max= 2.9389), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:18:58,645 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.3022 (max= 2.9389), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:18:58,645 - root - INFO - Step 5780: lr=1.00E-05, loss= 1.3022 (max= 2.9389), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:18:58,645 - root - INFO - Step 5780: 
lr=1.00E-05, loss= 1.3022 (max= 2.9389), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:16,675 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.2856 (max= 2.7056), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:16,675 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.2856 (max= 2.7056), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:16,675 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.2856 (max= 2.7056), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:16,675 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.2856 (max= 2.7056), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:16,676 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.2856 (max= 2.7056), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:16,676 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.2856 (max= 2.7056), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:16,676 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.2856 (max= 2.7056), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:16,676 - root - INFO - Step 5790: lr=1.00E-05, loss= 1.2856 (max= 2.7056), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:34,712 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.2881 (max= 3.5011), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:34,712 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.2881 (max= 3.5011), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:34,712 - 
root - INFO - Step 5800: lr=1.00E-05, loss= 1.2881 (max= 3.5011), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:34,712 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.2881 (max= 3.5011), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:34,712 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.2881 (max= 3.5011), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:34,712 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.2881 (max= 3.5011), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:34,712 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.2881 (max= 3.5011), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:34,712 - root - INFO - Step 5800: lr=1.00E-05, loss= 1.2881 (max= 3.5011), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:19:52,713 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.3043 (max= 3.1877), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:19:52,713 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.3043 (max= 3.1877), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:19:52,713 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.3043 (max= 3.1877), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:19:52,713 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.3043 (max= 3.1877), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:19:52,713 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.3043 (max= 3.1877), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 12:19:52,713 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.3043 (max= 3.1877), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:19:52,713 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.3043 (max= 3.1877), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:19:52,713 - root - INFO - Step 5810: lr=1.00E-05, loss= 1.3043 (max= 3.1877), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:10,738 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.3015 (max= 3.6410), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:20:10,738 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.3015 (max= 3.6410), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:20:10,738 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.3015 (max= 3.6410), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:20:10,738 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.3015 (max= 3.6410), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:20:10,738 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.3015 (max= 3.6410), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:20:10,738 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.3015 (max= 3.6410), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:20:10,738 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.3015 (max= 3.6410), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:20:10,738 - root - INFO - Step 5820: lr=1.00E-05, loss= 1.3015 (max= 3.6410), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:20:28,725 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.3181 (max= 2.7748), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:28,725 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.3181 (max= 2.7748), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:28,725 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.3181 (max= 2.7748), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:28,725 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.3181 (max= 2.7748), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:28,725 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.3181 (max= 2.7748), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:28,725 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.3181 (max= 2.7748), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:28,725 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.3181 (max= 2.7748), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:28,725 - root - INFO - Step 5830: lr=1.00E-05, loss= 1.3181 (max= 2.7748), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:46,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.3010 (max= 2.2929), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:46,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.3010 (max= 2.2929), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:46,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.3010 (max= 2.2929), tps=18193, mfu=37.90%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:46,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.3010 (max= 2.2929), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:46,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.3010 (max= 2.2929), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:46,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.3010 (max= 2.2929), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:46,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.3010 (max= 2.2929), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:20:46,740 - root - INFO - Step 5840: lr=1.00E-05, loss= 1.3010 (max= 2.2929), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:21:04,813 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.3035 (max= 2.5379), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:04,814 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.3035 (max= 2.5379), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:04,814 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.3035 (max= 2.5379), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:04,814 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.3035 (max= 2.5379), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:04,814 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.3035 (max= 2.5379), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:04,814 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.3035 (max= 
2.5379), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:04,814 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.3035 (max= 2.5379), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:04,814 - root - INFO - Step 5850: lr=1.00E-05, loss= 1.3035 (max= 2.5379), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:22,862 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.2955 (max= 2.2850), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:22,862 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.2955 (max= 2.2850), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:22,862 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.2955 (max= 2.2850), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:22,862 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.2955 (max= 2.2850), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:22,863 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.2955 (max= 2.2850), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:22,863 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.2955 (max= 2.2850), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:22,863 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.2955 (max= 2.2850), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:22,863 - root - INFO - Step 5860: lr=1.00E-05, loss= 1.2955 (max= 2.2850), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:40,871 - root - INFO - Step 5870: 
lr=1.00E-05, loss= 1.3016 (max= 2.2580), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:40,871 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.3016 (max= 2.2580), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:40,871 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.3016 (max= 2.2580), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:40,871 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.3016 (max= 2.2580), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:40,871 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.3016 (max= 2.2580), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:40,871 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.3016 (max= 2.2580), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:40,871 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.3016 (max= 2.2580), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:40,871 - root - INFO - Step 5870: lr=1.00E-05, loss= 1.3016 (max= 2.2580), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:58,917 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.3080 (max= 2.4338), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:58,917 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.3080 (max= 2.4338), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:58,917 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.3080 (max= 2.4338), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:58,917 - 
root - INFO - Step 5880: lr=1.00E-05, loss= 1.3080 (max= 2.4338), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:58,917 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.3080 (max= 2.4338), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:58,917 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.3080 (max= 2.4338), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:58,917 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.3080 (max= 2.4338), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:21:58,917 - root - INFO - Step 5880: lr=1.00E-05, loss= 1.3080 (max= 2.4338), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:22:16,950 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.2664 (max= 2.4476), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:16,950 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.2664 (max= 2.4476), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:16,950 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.2664 (max= 2.4476), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:16,950 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.2664 (max= 2.4476), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:16,950 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.2664 (max= 2.4476), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:16,951 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.2664 (max= 2.4476), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 12:22:16,951 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.2664 (max= 2.4476), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:16,951 - root - INFO - Step 5890: lr=1.00E-05, loss= 1.2664 (max= 2.4476), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:35,018 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.2708 (max= 2.5009), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:22:35,018 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.2708 (max= 2.5009), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:22:35,018 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.2708 (max= 2.5009), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:22:35,018 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.2708 (max= 2.5009), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:22:35,018 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.2708 (max= 2.5009), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:22:35,018 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.2708 (max= 2.5009), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:22:35,018 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.2708 (max= 2.5009), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:22:35,018 - root - INFO - Step 5900: lr=1.00E-05, loss= 1.2708 (max= 2.5009), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:22:53,054 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.2612 (max= 2.3376), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:53,054 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.2612 (max= 2.3376), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:53,054 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.2612 (max= 2.3376), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:53,054 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.2612 (max= 2.3376), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:53,054 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.2612 (max= 2.3376), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:53,054 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.2612 (max= 2.3376), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:53,054 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.2612 (max= 2.3376), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:22:53,055 - root - INFO - Step 5910: lr=1.00E-05, loss= 1.2612 (max= 2.3376), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:23:11,569 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.3092 (max= 3.6851), tps=17702, mfu=36.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:11,569 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.3092 (max= 3.6851), tps=17702, mfu=36.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:11,569 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.3092 (max= 3.6851), tps=17702, mfu=36.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:11,569 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.3092 (max= 3.6851), tps=17702, mfu=36.88%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:11,569 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.3092 (max= 3.6851), tps=17702, mfu=36.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:11,569 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.3092 (max= 3.6851), tps=17702, mfu=36.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:11,569 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.3092 (max= 3.6851), tps=17702, mfu=36.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:11,570 - root - INFO - Step 5920: lr=1.00E-05, loss= 1.3092 (max= 3.6851), tps=17702, mfu=36.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:29,596 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.3033 (max= 2.7662), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:23:29,596 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.3033 (max= 2.7662), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:23:29,596 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.3033 (max= 2.7662), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:23:29,596 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.3033 (max= 2.7662), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:23:29,596 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.3033 (max= 2.7662), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:23:29,596 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.3033 (max= 2.7662), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:23:29,596 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.3033 (max= 
2.7662), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:23:29,596 - root - INFO - Step 5930: lr=1.00E-05, loss= 1.3033 (max= 2.7662), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:23:47,909 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.2834 (max= 2.1133), tps=17898, mfu=37.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:47,909 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.2834 (max= 2.1133), tps=17899, mfu=37.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:47,909 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.2834 (max= 2.1133), tps=17899, mfu=37.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:47,909 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.2834 (max= 2.1133), tps=17899, mfu=37.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:47,909 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.2834 (max= 2.1133), tps=17899, mfu=37.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:47,909 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.2834 (max= 2.1133), tps=17899, mfu=37.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:47,909 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.2834 (max= 2.1133), tps=17899, mfu=37.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:23:47,909 - root - INFO - Step 5940: lr=1.00E-05, loss= 1.2834 (max= 2.1133), tps=17899, mfu=37.29%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:06,637 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.3285 (max= 2.2815), tps=17500, mfu=36.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:06,637 - root - INFO - Step 5950: 
lr=1.00E-05, loss= 1.3285 (max= 2.2815), tps=17500, mfu=36.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:06,637 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.3285 (max= 2.2815), tps=17500, mfu=36.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:06,637 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.3285 (max= 2.2815), tps=17500, mfu=36.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:06,637 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.3285 (max= 2.2815), tps=17500, mfu=36.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:06,637 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.3285 (max= 2.2815), tps=17500, mfu=36.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:06,637 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.3285 (max= 2.2815), tps=17500, mfu=36.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:06,637 - root - INFO - Step 5950: lr=1.00E-05, loss= 1.3285 (max= 2.2815), tps=17500, mfu=36.46%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:24,689 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.2865 (max= 2.1691), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:24:24,689 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.2865 (max= 2.1691), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:24:24,689 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.2865 (max= 2.1691), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:24:24,689 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.2865 (max= 2.1691), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:24:24,689 - 
root - INFO - Step 5960: lr=1.00E-05, loss= 1.2865 (max= 2.1691), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:24:24,689 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.2865 (max= 2.1691), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:24:24,689 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.2865 (max= 2.1691), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:24:24,689 - root - INFO - Step 5960: lr=1.00E-05, loss= 1.2865 (max= 2.1691), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:24:43,141 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.2957 (max= 2.3788), tps=17764, mfu=37.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:43,141 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.2957 (max= 2.3788), tps=17765, mfu=37.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:43,141 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.2957 (max= 2.3788), tps=17765, mfu=37.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:43,141 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.2957 (max= 2.3788), tps=17765, mfu=37.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:43,141 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.2957 (max= 2.3788), tps=17764, mfu=37.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:43,141 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.2957 (max= 2.3788), tps=17765, mfu=37.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:24:43,141 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.2957 (max= 2.3788), tps=17765, mfu=37.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:24:43,141 - root - INFO - Step 5970: lr=1.00E-05, loss= 1.2957 (max= 2.3788), tps=17765, mfu=37.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:01,236 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.2907 (max= 2.2541), tps=18112, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:25:01,236 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.2907 (max= 2.2541), tps=18112, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:25:01,236 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.2907 (max= 2.2541), tps=18112, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:25:01,236 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.2907 (max= 2.2541), tps=18112, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:25:01,236 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.2907 (max= 2.2541), tps=18112, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:25:01,236 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.2907 (max= 2.2541), tps=18112, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:25:01,237 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.2907 (max= 2.2541), tps=18112, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:25:01,237 - root - INFO - Step 5980: lr=1.00E-05, loss= 1.2907 (max= 2.2541), tps=18112, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:25:19,252 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.2869 (max= 3.5005), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:19,252 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.2869 (max= 3.5005), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:19,252 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.2869 (max= 3.5005), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:19,252 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.2869 (max= 3.5005), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:19,252 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.2869 (max= 3.5005), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:19,252 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.2869 (max= 3.5005), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:19,252 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.2869 (max= 3.5005), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:19,253 - root - INFO - Step 5990: lr=1.00E-05, loss= 1.2869 (max= 3.5005), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-6000 +2025-10-24 12:25:37,235 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.2647 (max= 2.2483), tps=18226, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:37,235 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-24 12:25:37,235 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:25:37,235 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.2647 (max= 2.2483), tps=18226, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:37,235 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.2647 (max= 2.2483), tps=18226, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:25:37,235 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-24 12:25:37,235 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.2647 (max= 2.2483), tps=18226, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:37,235 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-24 12:25:37,235 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.2647 (max= 2.2483), tps=18226, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:37,235 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:25:37,235 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:25:37,235 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-24 12:25:37,235 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-24 12:25:37,235 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:25:37,235 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:25:37,235 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.2647 (max= 2.2483), tps=18226, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:37,235 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-24 12:25:37,235 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:25:37,235 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.2647 (max= 2.2483), tps=18226, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:37,235 - root - INFO - Step 6000: lr=1.00E-05, loss= 1.2647 (max= 2.2483), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:25:37,236 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-24 12:25:37,236 - root - INFO - Saving a full checkpoint at step 6000 +2025-10-24 12:25:37,236 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:25:37,236 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-6000! Save time: 4.487380504608154 +2025-10-24 12:25:51,673 - root - INFO - Finished saving the checkpoint in 14.44 seconds +2025-10-24 12:25:51,680 - root - INFO - Finished saving the checkpoint in 14.45 seconds +2025-10-24 12:25:51,681 - root - INFO - Finished saving the checkpoint in 14.45 seconds +2025-10-24 12:25:51,681 - root - INFO - Finished saving the checkpoint in 14.45 seconds +2025-10-24 12:25:51,682 - root - INFO - Finished saving the checkpoint in 14.45 seconds +2025-10-24 12:25:51,682 - root - INFO - Finished saving the checkpoint in 14.45 seconds +2025-10-24 12:25:51,682 - root - INFO - Finished saving the checkpoint in 14.45 seconds +2025-10-24 12:25:51,683 - root - INFO - Finished saving the checkpoint in 14.45 seconds +2025-10-24 12:26:09,629 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.2636 (max= 2.2369), tps=10116, mfu=21.08%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:26:09,630 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.2636 (max= 2.2369), tps=10116, mfu=21.08%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:26:09,630 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.2636 (max= 2.2369), tps=10116, mfu=21.08%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:26:09,630 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.2636 (max= 2.2369), tps=10116, mfu=21.08%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:26:09,630 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.2636 (max= 2.2369), tps=10116, mfu=21.08%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:26:09,630 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.2636 (max= 2.2369), tps=10116, mfu=21.08%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:26:09,630 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.2636 (max= 2.2369), tps=10116, mfu=21.08%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:26:09,630 - root - INFO - Step 6010: lr=1.00E-05, loss= 1.2636 (max= 2.2369), tps=10116, mfu=21.08%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:26:27,627 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.2739 (max= 3.6806), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:27,627 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.2739 (max= 3.6806), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:27,628 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.2739 (max= 3.6806), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:27,628 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.2739 (max= 3.6806), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:27,628 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.2739 (max= 3.6806), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:27,628 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.2739 (max= 3.6806), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:27,628 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.2739 (max= 3.6806), 
tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:27,628 - root - INFO - Step 6020: lr=1.00E-05, loss= 1.2739 (max= 3.6806), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:45,603 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.2848 (max= 2.6452), tps=18232, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:45,603 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.2848 (max= 2.6452), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:45,603 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.2848 (max= 2.6452), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:45,603 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.2848 (max= 2.6452), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:45,603 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.2848 (max= 2.6452), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:45,604 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.2848 (max= 2.6452), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:45,604 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.2848 (max= 2.6452), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:26:45,604 - root - INFO - Step 6030: lr=1.00E-05, loss= 1.2848 (max= 2.6452), tps=18233, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:03,649 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.3027 (max= 3.5142), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:03,649 - root - INFO - Step 6040: lr=1.00E-05, 
loss= 1.3027 (max= 3.5142), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:03,649 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.3027 (max= 3.5142), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:03,649 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.3027 (max= 3.5142), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:03,649 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.3027 (max= 3.5142), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:03,649 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.3027 (max= 3.5142), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:03,649 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.3027 (max= 3.5142), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:03,649 - root - INFO - Step 6040: lr=1.00E-05, loss= 1.3027 (max= 3.5142), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:21,713 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.2709 (max= 2.6768), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:21,713 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.2709 (max= 2.6768), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:21,713 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.2709 (max= 2.6768), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:21,713 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.2709 (max= 2.6768), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:21,713 - root - INFO - 
Step 6050: lr=1.00E-05, loss= 1.2709 (max= 2.6768), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:21,714 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.2709 (max= 2.6768), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:21,714 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.2709 (max= 2.6768), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:21,714 - root - INFO - Step 6050: lr=1.00E-05, loss= 1.2709 (max= 2.6768), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:39,752 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.2842 (max= 2.2777), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:39,752 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.2842 (max= 2.2777), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:39,752 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.2842 (max= 2.2777), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:39,752 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.2842 (max= 2.2777), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:39,752 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.2842 (max= 2.2777), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:39,752 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.2842 (max= 2.2777), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:27:39,752 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.2842 (max= 2.2777), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 
12:27:39,752 - root - INFO - Step 6060: lr=1.00E-05, loss= 1.2842 (max= 2.2777), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:04,798 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.2221 (max= 2.2407), tps=13085, mfu=27.26%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.83%) +2025-10-24 12:28:04,798 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.2221 (max= 2.2407), tps=13085, mfu=27.26%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.83%) +2025-10-24 12:28:04,798 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.2221 (max= 2.2407), tps=13085, mfu=27.26%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.83%) +2025-10-24 12:28:04,798 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.2221 (max= 2.2407), tps=13085, mfu=27.26%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.83%) +2025-10-24 12:28:04,798 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.2221 (max= 2.2407), tps=13085, mfu=27.26%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.83%) +2025-10-24 12:28:04,798 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.2221 (max= 2.2407), tps=13085, mfu=27.26%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.83%) +2025-10-24 12:28:04,798 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.2221 (max= 2.2407), tps=13085, mfu=27.26%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.83%) +2025-10-24 12:28:04,798 - root - INFO - Step 6070: lr=1.00E-05, loss= 1.2221 (max= 2.2407), tps=13085, mfu=27.26%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.18s, 28.83%) +2025-10-24 12:28:22,805 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.2806 (max= 3.6218), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:22,805 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.2806 (max= 3.6218), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:22,805 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.2806 (max= 3.6218), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:22,805 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.2806 (max= 3.6218), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:22,805 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.2806 (max= 3.6218), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:22,805 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.2806 (max= 3.6218), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:22,805 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.2806 (max= 3.6218), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:22,805 - root - INFO - Step 6080: lr=1.00E-05, loss= 1.2806 (max= 3.6218), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:40,854 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.2867 (max= 2.2965), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:40,855 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.2867 (max= 2.2965), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:40,855 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.2867 (max= 2.2965), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:40,855 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.2867 (max= 2.2965), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:40,855 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.2867 (max= 2.2965), tps=18158, mfu=37.83%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:40,855 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.2867 (max= 2.2965), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:40,855 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.2867 (max= 2.2965), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:28:40,855 - root - INFO - Step 6090: lr=1.00E-05, loss= 1.2867 (max= 2.2965), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:00,429 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.3198 (max= 2.6910), tps=16742, mfu=34.88%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.04s, 9.12%) +2025-10-24 12:29:00,430 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.3198 (max= 2.6910), tps=16743, mfu=34.88%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.04s, 9.12%) +2025-10-24 12:29:00,430 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.3198 (max= 2.6910), tps=16743, mfu=34.88%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.04s, 9.12%) +2025-10-24 12:29:00,430 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.3198 (max= 2.6910), tps=16743, mfu=34.88%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.04s, 9.12%) +2025-10-24 12:29:00,430 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.3198 (max= 2.6910), tps=16743, mfu=34.88%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.04s, 9.12%) +2025-10-24 12:29:00,430 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.3198 (max= 2.6910), tps=16743, mfu=34.88%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.04s, 9.12%) +2025-10-24 12:29:00,430 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.3198 (max= 2.6910), tps=16743, mfu=34.88%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.04s, 9.12%) +2025-10-24 12:29:00,430 - root - INFO - Step 6100: lr=1.00E-05, loss= 1.3198 (max= 
2.6910), tps=16743, mfu=34.88%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.04s, 9.12%) +2025-10-24 12:29:18,637 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.2836 (max= 3.3886), tps=18000, mfu=37.50%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:18,637 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.2836 (max= 3.3886), tps=18000, mfu=37.50%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:18,637 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.2836 (max= 3.3886), tps=18000, mfu=37.50%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:18,637 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.2836 (max= 3.3886), tps=18000, mfu=37.50%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:18,637 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.2836 (max= 3.3886), tps=18000, mfu=37.50%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:18,638 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.2836 (max= 3.3886), tps=18000, mfu=37.50%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:18,638 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.2836 (max= 3.3886), tps=18000, mfu=37.50%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:18,638 - root - INFO - Step 6110: lr=1.00E-05, loss= 1.2836 (max= 3.3886), tps=18000, mfu=37.50%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:37,207 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.3030 (max= 3.4970), tps=17649, mfu=36.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:37,207 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.3030 (max= 3.4970), tps=17649, mfu=36.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:37,207 - root - INFO - Step 6120: 
lr=1.00E-05, loss= 1.3030 (max= 3.4970), tps=17649, mfu=36.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:37,207 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.3030 (max= 3.4970), tps=17650, mfu=36.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:37,207 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.3030 (max= 3.4970), tps=17649, mfu=36.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:37,207 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.3030 (max= 3.4970), tps=17650, mfu=36.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:37,207 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.3030 (max= 3.4970), tps=17650, mfu=36.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:37,207 - root - INFO - Step 6120: lr=1.00E-05, loss= 1.3030 (max= 3.4970), tps=17650, mfu=36.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:29:55,790 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.2980 (max= 2.4127), tps=17637, mfu=36.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:29:55,790 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.2980 (max= 2.4127), tps=17637, mfu=36.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:29:55,790 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.2980 (max= 2.4127), tps=17637, mfu=36.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:29:55,790 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.2980 (max= 2.4127), tps=17637, mfu=36.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:29:55,790 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.2980 (max= 2.4127), tps=17637, mfu=36.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:29:55,790 - 
root - INFO - Step 6130: lr=1.00E-05, loss= 1.2980 (max= 2.4127), tps=17637, mfu=36.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:29:55,790 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.2980 (max= 2.4127), tps=17637, mfu=36.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:29:55,790 - root - INFO - Step 6130: lr=1.00E-05, loss= 1.2980 (max= 2.4127), tps=17637, mfu=36.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:13,816 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.3189 (max= 3.3358), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:30:13,816 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.3189 (max= 3.3358), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:30:13,816 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.3189 (max= 3.3358), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:30:13,816 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.3189 (max= 3.3358), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:30:13,816 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.3189 (max= 3.3358), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:30:13,816 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.3189 (max= 3.3358), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:30:13,816 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.3189 (max= 3.3358), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:30:13,816 - root - INFO - Step 6140: lr=1.00E-05, loss= 1.3189 (max= 3.3358), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:30:31,848 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.2945 (max= 2.3907), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:31,848 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.2945 (max= 2.3907), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:31,848 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.2945 (max= 2.3907), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:31,848 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.2945 (max= 2.3907), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:31,848 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.2945 (max= 2.3907), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:31,848 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.2945 (max= 2.3907), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:31,848 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.2945 (max= 2.3907), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:31,848 - root - INFO - Step 6150: lr=1.00E-05, loss= 1.2945 (max= 2.3907), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:49,857 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.3039 (max= 2.2017), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:49,857 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.3039 (max= 2.2017), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:49,857 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.3039 (max= 2.2017), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:49,857 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.3039 (max= 2.2017), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:49,857 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.3039 (max= 2.2017), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:49,858 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.3039 (max= 2.2017), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:49,858 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.3039 (max= 2.2017), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:30:49,858 - root - INFO - Step 6160: lr=1.00E-05, loss= 1.3039 (max= 2.2017), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:07,862 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.3154 (max= 2.5956), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:07,862 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.3154 (max= 2.5956), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:07,862 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.3154 (max= 2.5956), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:07,862 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.3154 (max= 2.5956), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:07,862 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.3154 (max= 2.5956), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:07,862 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.3154 (max= 2.5956), tps=18203, mfu=37.93%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:07,862 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.3154 (max= 2.5956), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:07,862 - root - INFO - Step 6170: lr=1.00E-05, loss= 1.3154 (max= 2.5956), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:25,893 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.3185 (max= 2.1943), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:25,894 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.3185 (max= 2.1943), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:25,894 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.3185 (max= 2.1943), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:25,894 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.3185 (max= 2.1943), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:25,894 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.3185 (max= 2.1943), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:25,894 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.3185 (max= 2.1943), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:25,894 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.3185 (max= 2.1943), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:25,894 - root - INFO - Step 6180: lr=1.00E-05, loss= 1.3185 (max= 2.1943), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:31:43,933 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.2984 (max= 
2.2626), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:31:43,933 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.2984 (max= 2.2626), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:31:43,933 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.2984 (max= 2.2626), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:31:43,933 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.2984 (max= 2.2626), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:31:43,934 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.2984 (max= 2.2626), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:31:43,934 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.2984 (max= 2.2626), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:31:43,934 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.2984 (max= 2.2626), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:31:43,934 - root - INFO - Step 6190: lr=1.00E-05, loss= 1.2984 (max= 2.2626), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:32:01,975 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.2828 (max= 3.6483), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:01,975 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.2828 (max= 3.6483), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:01,975 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.2828 (max= 3.6483), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:01,975 - root - INFO - Step 6200: 
lr=1.00E-05, loss= 1.2828 (max= 3.6483), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:01,975 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.2828 (max= 3.6483), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:01,976 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.2828 (max= 3.6483), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:01,976 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.2828 (max= 3.6483), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:01,976 - root - INFO - Step 6200: lr=1.00E-05, loss= 1.2828 (max= 3.6483), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:19,973 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.2636 (max= 2.3629), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:19,973 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.2636 (max= 2.3629), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:19,973 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.2636 (max= 2.3629), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:19,973 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.2636 (max= 2.3629), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:19,973 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.2636 (max= 2.3629), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:19,973 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.2636 (max= 2.3629), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:19,973 - 
root - INFO - Step 6210: lr=1.00E-05, loss= 1.2636 (max= 2.3629), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:19,973 - root - INFO - Step 6210: lr=1.00E-05, loss= 1.2636 (max= 2.3629), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:37,973 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.2658 (max= 2.4403), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:37,973 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.2658 (max= 2.4403), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:37,973 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.2658 (max= 2.4403), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:37,973 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.2658 (max= 2.4403), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:37,973 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.2658 (max= 2.4403), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:37,974 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.2658 (max= 2.4403), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:37,974 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.2658 (max= 2.4403), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:37,974 - root - INFO - Step 6220: lr=1.00E-05, loss= 1.2658 (max= 2.4403), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:54,388 - root - WARNING - Empty document detected at 
/work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6971605 +2025-10-24 12:32:55,995 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.2685 (max= 2.4184), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:55,995 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.2685 (max= 2.4184), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:55,995 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.2685 (max= 2.4184), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:55,995 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.2685 (max= 2.4184), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:55,995 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.2685 (max= 2.4184), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:55,995 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.2685 (max= 2.4184), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:55,995 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.2685 (max= 2.4184), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:32:55,995 - root - INFO - Step 6230: lr=1.00E-05, loss= 1.2685 (max= 2.4184), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:15,161 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.2915 (max= 2.7375), tps=17099, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.03s, 7.20%) +2025-10-24 12:33:15,162 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.2915 (max= 2.7375), tps=17099, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.03s, 7.20%) +2025-10-24 12:33:15,162 - root - INFO - 
Step 6240: lr=1.00E-05, loss= 1.2915 (max= 2.7375), tps=17099, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.03s, 7.20%) +2025-10-24 12:33:15,162 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.2915 (max= 2.7375), tps=17099, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.03s, 7.20%) +2025-10-24 12:33:15,162 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.2915 (max= 2.7375), tps=17099, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.03s, 7.20%) +2025-10-24 12:33:15,162 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.2915 (max= 2.7375), tps=17100, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.03s, 7.20%) +2025-10-24 12:33:15,162 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.2915 (max= 2.7375), tps=17100, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.03s, 7.20%) +2025-10-24 12:33:15,162 - root - INFO - Step 6240: lr=1.00E-05, loss= 1.2915 (max= 2.7375), tps=17100, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.03s, 7.20%) +2025-10-24 12:33:33,189 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.2721 (max= 2.0902), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:33,189 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.2721 (max= 2.0902), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:33,189 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.2721 (max= 2.0902), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:33,189 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.2721 (max= 2.0902), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:33,189 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.2721 (max= 2.0902), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 
12:33:33,189 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.2721 (max= 2.0902), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:33,189 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.2721 (max= 2.0902), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:33,189 - root - INFO - Step 6250: lr=1.00E-05, loss= 1.2721 (max= 2.0902), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:51,215 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.2939 (max= 3.1774), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:51,215 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.2939 (max= 3.1774), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:51,215 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.2939 (max= 3.1774), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:51,215 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.2939 (max= 3.1774), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:51,215 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.2939 (max= 3.1774), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:51,215 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.2939 (max= 3.1774), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:51,215 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.2939 (max= 3.1774), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:33:51,215 - root - INFO - Step 6260: lr=1.00E-05, loss= 1.2939 (max= 3.1774), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 12:34:09,252 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.3059 (max= 3.6330), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:09,252 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.3059 (max= 3.6330), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:09,252 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.3059 (max= 3.6330), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:09,252 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.3059 (max= 3.6330), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:09,252 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.3059 (max= 3.6330), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:09,252 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.3059 (max= 3.6330), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:09,252 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.3059 (max= 3.6330), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:09,253 - root - INFO - Step 6270: lr=1.00E-05, loss= 1.3059 (max= 3.6330), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:27,272 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.2710 (max= 2.8516), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:27,272 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.2710 (max= 2.8516), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:27,272 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.2710 (max= 2.8516), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:27,272 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.2710 (max= 2.8516), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:27,272 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.2710 (max= 2.8516), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:27,273 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.2710 (max= 2.8516), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:27,273 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.2710 (max= 2.8516), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:27,273 - root - INFO - Step 6280: lr=1.00E-05, loss= 1.2710 (max= 2.8516), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:45,322 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.2947 (max= 2.6291), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:45,322 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.2947 (max= 2.6291), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:45,322 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.2947 (max= 2.6291), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:45,322 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.2947 (max= 2.6291), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:45,322 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.2947 (max= 2.6291), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:45,322 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.2947 (max= 2.6291), tps=18158, mfu=37.83%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:45,322 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.2947 (max= 2.6291), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:34:45,322 - root - INFO - Step 6290: lr=1.00E-05, loss= 1.2947 (max= 2.6291), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:03,396 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.2501 (max= 2.4830), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:03,396 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.2501 (max= 2.4830), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:03,396 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.2501 (max= 2.4830), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:03,396 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.2501 (max= 2.4830), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:03,396 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.2501 (max= 2.4830), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:03,396 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.2501 (max= 2.4830), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:03,396 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.2501 (max= 2.4830), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:03,396 - root - INFO - Step 6300: lr=1.00E-05, loss= 1.2501 (max= 2.4830), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:21,395 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.2812 (max= 
2.5696), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:21,395 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.2812 (max= 2.5696), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:21,395 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.2812 (max= 2.5696), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:21,395 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.2812 (max= 2.5696), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:21,395 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.2812 (max= 2.5696), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:21,395 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.2812 (max= 2.5696), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:21,395 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.2812 (max= 2.5696), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:21,395 - root - INFO - Step 6310: lr=1.00E-05, loss= 1.2812 (max= 2.5696), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:39,443 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.2659 (max= 2.3490), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:39,443 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.2659 (max= 2.3490), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:39,443 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.2659 (max= 2.3490), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:39,443 - root - INFO - Step 6320: 
lr=1.00E-05, loss= 1.2659 (max= 2.3490), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:39,443 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.2659 (max= 2.3490), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:39,443 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.2659 (max= 2.3490), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:39,443 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.2659 (max= 2.3490), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:39,443 - root - INFO - Step 6320: lr=1.00E-05, loss= 1.2659 (max= 2.3490), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:57,477 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.2783 (max= 2.4601), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:57,477 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.2783 (max= 2.4601), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:57,477 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.2783 (max= 2.4601), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:57,478 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.2783 (max= 2.4601), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:57,478 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.2783 (max= 2.4601), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:57,478 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.2783 (max= 2.4601), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:57,478 - 
root - INFO - Step 6330: lr=1.00E-05, loss= 1.2783 (max= 2.4601), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:35:57,478 - root - INFO - Step 6330: lr=1.00E-05, loss= 1.2783 (max= 2.4601), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:15,492 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.2785 (max= 3.3079), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:36:15,492 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.2785 (max= 3.3079), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:36:15,492 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.2785 (max= 3.3079), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:36:15,492 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.2785 (max= 3.3079), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:36:15,492 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.2785 (max= 3.3079), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:36:15,492 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.2785 (max= 3.3079), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:36:15,492 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.2785 (max= 3.3079), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:36:15,492 - root - INFO - Step 6340: lr=1.00E-05, loss= 1.2785 (max= 3.3079), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:36:33,516 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.2593 (max= 2.6319), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 12:36:33,516 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.2593 (max= 2.6319), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:33,516 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.2593 (max= 2.6319), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:33,516 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.2593 (max= 2.6319), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:33,516 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.2593 (max= 2.6319), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:33,516 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.2593 (max= 2.6319), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:33,517 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.2593 (max= 2.6319), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:33,517 - root - INFO - Step 6350: lr=1.00E-05, loss= 1.2593 (max= 2.6319), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:51,545 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.2631 (max= 3.6997), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:51,545 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.2631 (max= 3.6997), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:51,545 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.2631 (max= 3.6997), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:51,545 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.2631 (max= 3.6997), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:51,545 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.2631 (max= 3.6997), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:51,545 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.2631 (max= 3.6997), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:51,545 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.2631 (max= 3.6997), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:36:51,545 - root - INFO - Step 6360: lr=1.00E-05, loss= 1.2631 (max= 3.6997), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:09,565 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.2678 (max= 2.4595), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:09,566 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.2678 (max= 2.4595), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:09,566 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.2678 (max= 2.4595), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:09,566 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.2678 (max= 2.4595), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:09,566 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.2678 (max= 2.4595), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:09,566 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.2678 (max= 2.4595), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:09,566 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.2678 (max= 2.4595), tps=18187, mfu=37.89%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:09,566 - root - INFO - Step 6370: lr=1.00E-05, loss= 1.2678 (max= 2.4595), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:27,599 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.2608 (max= 2.4251), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:27,599 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.2608 (max= 2.4251), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:27,599 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.2608 (max= 2.4251), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:27,599 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.2608 (max= 2.4251), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:27,599 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.2608 (max= 2.4251), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:27,599 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.2608 (max= 2.4251), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:27,599 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.2608 (max= 2.4251), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:27,600 - root - INFO - Step 6380: lr=1.00E-05, loss= 1.2608 (max= 2.4251), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:36,802 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:7302095 +2025-10-24 12:37:45,616 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.2844 (max= 
3.0263), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:45,616 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.2844 (max= 3.0263), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:45,616 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.2844 (max= 3.0263), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:45,616 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.2844 (max= 3.0263), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:45,616 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.2844 (max= 3.0263), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:45,616 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.2844 (max= 3.0263), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:45,616 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.2844 (max= 3.0263), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:37:45,616 - root - INFO - Step 6390: lr=1.00E-05, loss= 1.2844 (max= 3.0263), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:03,635 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.3012 (max= 2.5870), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:03,635 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.3012 (max= 2.5870), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:03,635 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.3012 (max= 2.5870), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:03,635 - root - INFO - Step 6400: 
lr=1.00E-05, loss= 1.3012 (max= 2.5870), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:03,635 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.3012 (max= 2.5870), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:03,635 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.3012 (max= 2.5870), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:03,635 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.3012 (max= 2.5870), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:03,635 - root - INFO - Step 6400: lr=1.00E-05, loss= 1.3012 (max= 2.5870), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:21,638 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.2425 (max= 2.4050), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:21,638 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.2425 (max= 2.4050), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:21,638 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.2425 (max= 2.4050), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:21,638 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.2425 (max= 2.4050), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:21,639 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.2425 (max= 2.4050), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:21,639 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.2425 (max= 2.4050), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:21,639 - 
root - INFO - Step 6410: lr=1.00E-05, loss= 1.2425 (max= 2.4050), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:21,639 - root - INFO - Step 6410: lr=1.00E-05, loss= 1.2425 (max= 2.4050), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:39,675 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.2399 (max= 2.2677), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:39,675 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.2399 (max= 2.2677), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:39,675 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.2399 (max= 2.2677), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:39,675 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.2399 (max= 2.2677), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:39,675 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.2399 (max= 2.2677), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:39,675 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.2399 (max= 2.2677), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:39,675 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.2399 (max= 2.2677), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:39,675 - root - INFO - Step 6420: lr=1.00E-05, loss= 1.2399 (max= 2.2677), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:57,696 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.3082 (max= 2.4265), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 12:38:57,696 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.3082 (max= 2.4265), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:57,696 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.3082 (max= 2.4265), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:57,696 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.3082 (max= 2.4265), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:57,696 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.3082 (max= 2.4265), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:57,696 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.3082 (max= 2.4265), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:57,696 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.3082 (max= 2.4265), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:38:57,696 - root - INFO - Step 6430: lr=1.00E-05, loss= 1.3082 (max= 2.4265), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:15,698 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.2836 (max= 2.6167), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:15,698 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.2836 (max= 2.6167), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:15,698 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.2836 (max= 2.6167), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:15,698 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.2836 (max= 2.6167), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:15,698 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.2836 (max= 2.6167), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:15,699 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.2836 (max= 2.6167), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:15,699 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.2836 (max= 2.6167), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:15,699 - root - INFO - Step 6440: lr=1.00E-05, loss= 1.2836 (max= 2.6167), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:33,734 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.2573 (max= 2.5324), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:33,734 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.2573 (max= 2.5324), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:33,734 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.2573 (max= 2.5324), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:33,734 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.2573 (max= 2.5324), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:33,734 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.2573 (max= 2.5324), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:33,734 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.2573 (max= 2.5324), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:33,734 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.2573 (max= 2.5324), tps=18172, mfu=37.86%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:33,734 - root - INFO - Step 6450: lr=1.00E-05, loss= 1.2573 (max= 2.5324), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:51,773 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.2567 (max= 2.4043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:51,773 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.2567 (max= 2.4043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:51,773 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.2567 (max= 2.4043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:51,773 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.2567 (max= 2.4043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:51,774 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.2567 (max= 2.4043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:51,774 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.2567 (max= 2.4043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:51,774 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.2567 (max= 2.4043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:39:51,774 - root - INFO - Step 6460: lr=1.00E-05, loss= 1.2567 (max= 2.4043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:09,797 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.2770 (max= 2.2835), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:09,797 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.2770 (max= 
2.2835), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:09,797 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.2770 (max= 2.2835), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:09,797 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.2770 (max= 2.2835), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:09,797 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.2770 (max= 2.2835), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:09,797 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.2770 (max= 2.2835), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:09,797 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.2770 (max= 2.2835), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:09,797 - root - INFO - Step 6470: lr=1.00E-05, loss= 1.2770 (max= 2.2835), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:27,815 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.2704 (max= 2.6402), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:27,815 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.2704 (max= 2.6402), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:27,815 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.2704 (max= 2.6402), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:27,815 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.2704 (max= 2.6402), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:27,815 - root - INFO - Step 6480: 
lr=1.00E-05, loss= 1.2704 (max= 2.6402), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:27,816 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.2704 (max= 2.6402), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:27,816 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.2704 (max= 2.6402), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:27,816 - root - INFO - Step 6480: lr=1.00E-05, loss= 1.2704 (max= 2.6402), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:40:45,842 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.2562 (max= 2.4479), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:40:45,842 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.2562 (max= 2.4479), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:40:45,842 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.2562 (max= 2.4479), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:40:45,842 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.2562 (max= 2.4479), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:40:45,842 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.2562 (max= 2.4479), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:40:45,842 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.2562 (max= 2.4479), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:40:45,842 - root - INFO - Step 6490: lr=1.00E-05, loss= 1.2562 (max= 2.4479), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:40:45,842 - 
root - INFO - Step 6490: lr=1.00E-05, loss= 1.2562 (max= 2.4479), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:41:03,862 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.2611 (max= 2.1443), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:03,862 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.2611 (max= 2.1443), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:03,862 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.2611 (max= 2.1443), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:03,862 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.2611 (max= 2.1443), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:03,862 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.2611 (max= 2.1443), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:03,862 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.2611 (max= 2.1443), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:03,862 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.2611 (max= 2.1443), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:03,862 - root - INFO - Step 6500: lr=1.00E-05, loss= 1.2611 (max= 2.1443), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:21,855 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.2907 (max= 3.1581), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:21,855 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.2907 (max= 3.1581), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 12:41:21,855 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.2907 (max= 3.1581), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:21,855 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.2907 (max= 3.1581), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:21,855 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.2907 (max= 3.1581), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:21,855 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.2907 (max= 3.1581), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:21,855 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.2907 (max= 3.1581), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:21,855 - root - INFO - Step 6510: lr=1.00E-05, loss= 1.2907 (max= 3.1581), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:39,886 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.2664 (max= 2.5068), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:39,886 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.2664 (max= 2.5068), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:39,886 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.2664 (max= 2.5068), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:39,886 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.2664 (max= 2.5068), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:39,886 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.2664 (max= 2.5068), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:39,886 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.2664 (max= 2.5068), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:39,886 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.2664 (max= 2.5068), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:39,886 - root - INFO - Step 6520: lr=1.00E-05, loss= 1.2664 (max= 2.5068), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:41:57,911 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.2594 (max= 2.1308), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:41:57,911 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.2594 (max= 2.1308), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:41:57,911 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.2594 (max= 2.1308), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:41:57,911 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.2594 (max= 2.1308), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:41:57,911 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.2594 (max= 2.1308), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:41:57,911 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.2594 (max= 2.1308), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:41:57,911 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.2594 (max= 2.1308), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:41:57,911 - root - INFO - Step 6530: lr=1.00E-05, loss= 1.2594 (max= 2.1308), tps=18183, mfu=37.88%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:15,914 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.2882 (max= 3.2145), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:15,914 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.2882 (max= 3.2145), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:15,914 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.2882 (max= 3.2145), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:15,914 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.2882 (max= 3.2145), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:15,914 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.2882 (max= 3.2145), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:15,914 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.2882 (max= 3.2145), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:15,915 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.2882 (max= 3.2145), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:15,915 - root - INFO - Step 6540: lr=1.00E-05, loss= 1.2882 (max= 3.2145), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:33,945 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.2774 (max= 2.4483), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:42:33,946 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.2774 (max= 2.4483), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:42:33,946 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.2774 (max= 
2.4483), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:42:33,946 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.2774 (max= 2.4483), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:42:33,946 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.2774 (max= 2.4483), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:42:33,946 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.2774 (max= 2.4483), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:42:33,946 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.2774 (max= 2.4483), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:42:33,946 - root - INFO - Step 6550: lr=1.00E-05, loss= 1.2774 (max= 2.4483), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:42:51,973 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.2659 (max= 2.4299), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:51,973 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.2659 (max= 2.4299), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:51,973 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.2659 (max= 2.4299), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:51,973 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.2659 (max= 2.4299), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:51,973 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.2659 (max= 2.4299), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:51,974 - root - INFO - Step 6560: 
lr=1.00E-05, loss= 1.2659 (max= 2.4299), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:51,974 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.2659 (max= 2.4299), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:42:51,974 - root - INFO - Step 6560: lr=1.00E-05, loss= 1.2659 (max= 2.4299), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:09,996 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.2871 (max= 2.7145), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:09,996 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.2871 (max= 2.7145), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:09,996 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.2871 (max= 2.7145), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:09,996 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.2871 (max= 2.7145), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:09,996 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.2871 (max= 2.7145), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:09,996 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.2871 (max= 2.7145), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:09,996 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.2871 (max= 2.7145), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:09,996 - root - INFO - Step 6570: lr=1.00E-05, loss= 1.2871 (max= 2.7145), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:27,984 - 
root - INFO - Step 6580: lr=1.00E-05, loss= 1.2867 (max= 2.7368), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:43:27,984 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.2867 (max= 2.7368), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:43:27,984 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.2867 (max= 2.7368), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:43:27,984 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.2867 (max= 2.7368), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:43:27,984 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.2867 (max= 2.7368), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:43:27,984 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.2867 (max= 2.7368), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:43:27,984 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.2867 (max= 2.7368), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:43:27,984 - root - INFO - Step 6580: lr=1.00E-05, loss= 1.2867 (max= 2.7368), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:43:46,041 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.2938 (max= 2.3065), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:46,041 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.2938 (max= 2.3065), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:46,041 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.2938 (max= 2.3065), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:43:46,041 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.2938 (max= 2.3065), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:46,041 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.2938 (max= 2.3065), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:46,041 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.2938 (max= 2.3065), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:46,041 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.2938 (max= 2.3065), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:43:46,041 - root - INFO - Step 6590: lr=1.00E-05, loss= 1.2938 (max= 2.3065), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:04,034 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.2507 (max= 2.1729), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:04,034 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.2507 (max= 2.1729), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:04,034 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.2507 (max= 2.1729), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:04,035 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.2507 (max= 2.1729), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:04,035 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.2507 (max= 2.1729), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:04,035 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.2507 (max= 2.1729), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:04,035 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.2507 (max= 2.1729), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:04,035 - root - INFO - Step 6600: lr=1.00E-05, loss= 1.2507 (max= 2.1729), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:22,045 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.2563 (max= 2.3412), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:22,045 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.2563 (max= 2.3412), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:22,045 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.2563 (max= 2.3412), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:22,045 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.2563 (max= 2.3412), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:22,045 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.2563 (max= 2.3412), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:22,045 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.2563 (max= 2.3412), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:22,046 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.2563 (max= 2.3412), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:22,046 - root - INFO - Step 6610: lr=1.00E-05, loss= 1.2563 (max= 2.3412), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:40,058 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.2781 (max= 2.8899), tps=18195, mfu=37.91%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:40,058 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.2781 (max= 2.8899), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:40,058 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.2781 (max= 2.8899), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:40,058 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.2781 (max= 2.8899), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:40,058 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.2781 (max= 2.8899), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:40,058 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.2781 (max= 2.8899), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:40,058 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.2781 (max= 2.8899), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:40,058 - root - INFO - Step 6620: lr=1.00E-05, loss= 1.2781 (max= 2.8899), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:58,081 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.2747 (max= 3.8411), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:58,081 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.2747 (max= 3.8411), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:58,081 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.2747 (max= 3.8411), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:58,081 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.2747 (max= 
3.8411), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:58,081 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.2747 (max= 3.8411), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:58,081 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.2747 (max= 3.8411), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:58,081 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.2747 (max= 3.8411), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:44:58,081 - root - INFO - Step 6630: lr=1.00E-05, loss= 1.2747 (max= 3.8411), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:16,123 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.2857 (max= 2.3428), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:16,123 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.2857 (max= 2.3428), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:16,123 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.2857 (max= 2.3428), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:16,123 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.2857 (max= 2.3428), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:16,123 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.2857 (max= 2.3428), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:16,123 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.2857 (max= 2.3428), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:16,123 - root - INFO - Step 6640: 
lr=1.00E-05, loss= 1.2857 (max= 2.3428), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:16,123 - root - INFO - Step 6640: lr=1.00E-05, loss= 1.2857 (max= 2.3428), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:34,182 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.2590 (max= 2.4884), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:34,182 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.2590 (max= 2.4884), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:34,182 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.2590 (max= 2.4884), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:34,182 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.2590 (max= 2.4884), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:34,182 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.2590 (max= 2.4884), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:34,182 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.2590 (max= 2.4884), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:34,182 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.2590 (max= 2.4884), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:34,182 - root - INFO - Step 6650: lr=1.00E-05, loss= 1.2590 (max= 2.4884), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:52,227 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.2764 (max= 2.6243), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:52,227 - 
root - INFO - Step 6660: lr=1.00E-05, loss= 1.2764 (max= 2.6243), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:52,227 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.2764 (max= 2.6243), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:52,227 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.2764 (max= 2.6243), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:52,227 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.2764 (max= 2.6243), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:52,227 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.2764 (max= 2.6243), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:52,227 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.2764 (max= 2.6243), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:45:52,227 - root - INFO - Step 6660: lr=1.00E-05, loss= 1.2764 (max= 2.6243), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:46:10,262 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.3008 (max= 2.6475), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:10,262 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.3008 (max= 2.6475), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:10,262 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.3008 (max= 2.6475), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:10,262 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.3008 (max= 2.6475), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 12:46:10,262 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.3008 (max= 2.6475), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:10,262 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.3008 (max= 2.6475), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:10,262 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.3008 (max= 2.6475), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:10,262 - root - INFO - Step 6670: lr=1.00E-05, loss= 1.3008 (max= 2.6475), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:28,266 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.2972 (max= 2.3528), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:28,266 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.2972 (max= 2.3528), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:28,266 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.2972 (max= 2.3528), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:28,266 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.2972 (max= 2.3528), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:28,266 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.2972 (max= 2.3528), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:28,266 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.2972 (max= 2.3528), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:28,266 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.2972 (max= 2.3528), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:28,267 - root - INFO - Step 6680: lr=1.00E-05, loss= 1.2972 (max= 2.3528), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:46:46,335 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.2850 (max= 2.3043), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:46:46,335 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.2850 (max= 2.3043), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:46:46,335 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.2850 (max= 2.3043), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:46:46,335 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.2850 (max= 2.3043), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:46:46,335 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.2850 (max= 2.3043), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:46:46,335 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.2850 (max= 2.3043), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:46:46,335 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.2850 (max= 2.3043), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:46:46,335 - root - INFO - Step 6690: lr=1.00E-05, loss= 1.2850 (max= 2.3043), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:47:04,342 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.2512 (max= 2.0930), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:04,342 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.2512 (max= 2.0930), tps=18201, mfu=37.92%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:04,342 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.2512 (max= 2.0930), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:04,342 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.2512 (max= 2.0930), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:04,342 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.2512 (max= 2.0930), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:04,342 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.2512 (max= 2.0930), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:04,342 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.2512 (max= 2.0930), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:04,342 - root - INFO - Step 6700: lr=1.00E-05, loss= 1.2512 (max= 2.0930), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:22,330 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.2721 (max= 2.5656), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:47:22,330 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.2721 (max= 2.5656), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:47:22,330 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.2721 (max= 2.5656), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:47:22,330 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.2721 (max= 2.5656), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:47:22,330 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.2721 (max= 
2.5656), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:47:22,331 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.2721 (max= 2.5656), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:47:22,331 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.2721 (max= 2.5656), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:47:22,331 - root - INFO - Step 6710: lr=1.00E-05, loss= 1.2721 (max= 2.5656), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:47:40,296 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.2495 (max= 2.0803), tps=18243, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:40,296 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.2495 (max= 2.0803), tps=18243, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:40,296 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.2495 (max= 2.0803), tps=18243, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:40,296 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.2495 (max= 2.0803), tps=18243, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:40,296 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.2495 (max= 2.0803), tps=18243, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:40,296 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.2495 (max= 2.0803), tps=18243, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:40,296 - root - INFO - Step 6720: lr=1.00E-05, loss= 1.2495 (max= 2.0803), tps=18243, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:40,296 - root - INFO - Step 6720: 
lr=1.00E-05, loss= 1.2495 (max= 2.0803), tps=18243, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:58,321 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.2614 (max= 2.3565), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:58,321 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.2614 (max= 2.3565), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:58,321 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.2614 (max= 2.3565), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:58,321 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.2614 (max= 2.3565), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:58,321 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.2614 (max= 2.3565), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:58,321 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.2614 (max= 2.3565), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:58,321 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.2614 (max= 2.3565), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:47:58,321 - root - INFO - Step 6730: lr=1.00E-05, loss= 1.2614 (max= 2.3565), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:48:16,335 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.2955 (max= 2.4149), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:48:16,335 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.2955 (max= 2.4149), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:48:16,335 - 
root - INFO - Step 6740: lr=1.00E-05, loss= 1.2955 (max= 2.4149), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:48:16,335 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.2955 (max= 2.4149), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:48:16,335 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.2955 (max= 2.4149), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:48:16,335 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.2955 (max= 2.4149), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:48:16,335 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.2955 (max= 2.4149), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:48:16,335 - root - INFO - Step 6740: lr=1.00E-05, loss= 1.2955 (max= 2.4149), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:48:41,618 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.2710 (max= 2.3584), tps=12962, mfu=27.01%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.19s, 29.59%) +2025-10-24 12:48:41,618 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.2710 (max= 2.3584), tps=12962, mfu=27.01%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.19s, 29.59%) +2025-10-24 12:48:41,618 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.2710 (max= 2.3584), tps=12962, mfu=27.01%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.19s, 29.59%) +2025-10-24 12:48:41,618 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.2710 (max= 2.3584), tps=12962, mfu=27.01%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.19s, 29.59%) +2025-10-24 12:48:41,618 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.2710 (max= 2.3584), tps=12962, mfu=27.01%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.19s, 
29.59%) +2025-10-24 12:48:41,618 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.2710 (max= 2.3584), tps=12962, mfu=27.01%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.19s, 29.59%) +2025-10-24 12:48:41,618 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.2710 (max= 2.3584), tps=12962, mfu=27.01%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.19s, 29.59%) +2025-10-24 12:48:41,618 - root - INFO - Step 6750: lr=1.00E-05, loss= 1.2710 (max= 2.3584), tps=12962, mfu=27.01%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.19s, 29.59%) +2025-10-24 12:48:59,615 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.2573 (max= 2.5767), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:48:59,615 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.2573 (max= 2.5767), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:48:59,615 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.2573 (max= 2.5767), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:48:59,616 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.2573 (max= 2.5767), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:48:59,616 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.2573 (max= 2.5767), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:48:59,616 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.2573 (max= 2.5767), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:48:59,616 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.2573 (max= 2.5767), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:48:59,616 - root - INFO - Step 6760: lr=1.00E-05, loss= 1.2573 (max= 2.5767), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:17,659 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.2538 (max= 2.5159), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:17,659 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.2538 (max= 2.5159), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:17,659 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.2538 (max= 2.5159), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:17,659 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.2538 (max= 2.5159), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:17,659 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.2538 (max= 2.5159), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:17,659 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.2538 (max= 2.5159), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:17,659 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.2538 (max= 2.5159), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:17,659 - root - INFO - Step 6770: lr=1.00E-05, loss= 1.2538 (max= 2.5159), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:35,685 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.2468 (max= 2.5437), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:49:35,685 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.2468 (max= 2.5437), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:49:35,685 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.2468 (max= 2.5437), tps=18182, mfu=37.88%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:49:35,685 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.2468 (max= 2.5437), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:49:35,685 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.2468 (max= 2.5437), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:49:35,685 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.2468 (max= 2.5437), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:49:35,685 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.2468 (max= 2.5437), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:49:35,685 - root - INFO - Step 6780: lr=1.00E-05, loss= 1.2468 (max= 2.5437), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:49:53,681 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.2998 (max= 2.3733), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:53,681 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.2998 (max= 2.3733), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:53,681 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.2998 (max= 2.3733), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:53,681 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.2998 (max= 2.3733), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:53,681 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.2998 (max= 2.3733), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:53,681 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.2998 (max= 
2.3733), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:53,681 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.2998 (max= 2.3733), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:49:53,681 - root - INFO - Step 6790: lr=1.00E-05, loss= 1.2998 (max= 2.3733), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:11,662 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.2607 (max= 2.3600), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:11,662 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.2607 (max= 2.3600), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:11,662 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.2607 (max= 2.3600), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:11,662 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.2607 (max= 2.3600), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:11,662 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.2607 (max= 2.3600), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:11,662 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.2607 (max= 2.3600), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:11,662 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.2607 (max= 2.3600), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:11,662 - root - INFO - Step 6800: lr=1.00E-05, loss= 1.2607 (max= 2.3600), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:29,662 - root - INFO - Step 6810: 
lr=1.00E-05, loss= 1.2496 (max= 2.8433), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:29,662 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.2496 (max= 2.8433), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:29,662 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.2496 (max= 2.8433), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:29,662 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.2496 (max= 2.8433), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:29,662 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.2496 (max= 2.8433), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:29,662 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.2496 (max= 2.8433), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:29,662 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.2496 (max= 2.8433), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:29,662 - root - INFO - Step 6810: lr=1.00E-05, loss= 1.2496 (max= 2.8433), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:50:47,654 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.2897 (max= 3.5810), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:50:47,654 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.2897 (max= 3.5810), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:50:47,654 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.2897 (max= 3.5810), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:50:47,654 - 
root - INFO - Step 6820: lr=1.00E-05, loss= 1.2897 (max= 3.5810), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:50:47,654 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.2897 (max= 3.5810), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:50:47,654 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.2897 (max= 3.5810), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:50:47,654 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.2897 (max= 3.5810), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:50:47,654 - root - INFO - Step 6820: lr=1.00E-05, loss= 1.2897 (max= 3.5810), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:05,670 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.2794 (max= 2.3524), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:05,670 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.2794 (max= 2.3524), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:05,670 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.2794 (max= 2.3524), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:05,670 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.2794 (max= 2.3524), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:05,670 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.2794 (max= 2.3524), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:05,670 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.2794 (max= 2.3524), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:51:05,670 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.2794 (max= 2.3524), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:05,670 - root - INFO - Step 6830: lr=1.00E-05, loss= 1.2794 (max= 2.3524), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:23,695 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.2865 (max= 2.3232), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:23,695 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.2865 (max= 2.3232), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:23,695 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.2865 (max= 2.3232), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:23,695 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.2865 (max= 2.3232), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:23,695 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.2865 (max= 2.3232), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:23,695 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.2865 (max= 2.3232), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:23,696 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.2865 (max= 2.3232), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:23,696 - root - INFO - Step 6840: lr=1.00E-05, loss= 1.2865 (max= 2.3232), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:41,722 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.2904 (max= 2.6134), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:51:41,722 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.2904 (max= 2.6134), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:51:41,722 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.2904 (max= 2.6134), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:51:41,722 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.2904 (max= 2.6134), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:51:41,722 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.2904 (max= 2.6134), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:51:41,722 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.2904 (max= 2.6134), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:51:41,722 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.2904 (max= 2.6134), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:51:41,722 - root - INFO - Step 6850: lr=1.00E-05, loss= 1.2904 (max= 2.6134), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:51:59,750 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.2655 (max= 2.2095), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:59,750 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.2655 (max= 2.2095), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:59,750 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.2655 (max= 2.2095), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:59,750 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.2655 (max= 2.2095), tps=18179, mfu=37.88%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:59,750 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.2655 (max= 2.2095), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:59,750 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.2655 (max= 2.2095), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:59,750 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.2655 (max= 2.2095), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:51:59,750 - root - INFO - Step 6860: lr=1.00E-05, loss= 1.2655 (max= 2.2095), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:17,804 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.2477 (max= 2.1748), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:17,804 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.2477 (max= 2.1748), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:17,804 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.2477 (max= 2.1748), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:17,804 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.2477 (max= 2.1748), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:17,804 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.2477 (max= 2.1748), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:17,804 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.2477 (max= 2.1748), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:17,805 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.2477 (max= 
2.1748), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:17,805 - root - INFO - Step 6870: lr=1.00E-05, loss= 1.2477 (max= 2.1748), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:35,777 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.2273 (max= 2.1998), tps=18236, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:35,777 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.2273 (max= 2.1998), tps=18236, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:35,777 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.2273 (max= 2.1998), tps=18236, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:35,777 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.2273 (max= 2.1998), tps=18236, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:35,777 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.2273 (max= 2.1998), tps=18236, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:35,777 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.2273 (max= 2.1998), tps=18236, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:35,777 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.2273 (max= 2.1998), tps=18236, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:35,777 - root - INFO - Step 6880: lr=1.00E-05, loss= 1.2273 (max= 2.1998), tps=18236, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:53,784 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.2875 (max= 2.3301), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:53,784 - root - INFO - Step 6890: 
lr=1.00E-05, loss= 1.2875 (max= 2.3301), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:53,784 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.2875 (max= 2.3301), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:53,784 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.2875 (max= 2.3301), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:53,784 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.2875 (max= 2.3301), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:53,785 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.2875 (max= 2.3301), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:53,785 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.2875 (max= 2.3301), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:52:53,785 - root - INFO - Step 6890: lr=1.00E-05, loss= 1.2875 (max= 2.3301), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:11,817 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.2804 (max= 2.5333), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:11,817 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.2804 (max= 2.5333), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:11,817 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.2804 (max= 2.5333), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:11,817 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.2804 (max= 2.5333), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:11,817 - 
root - INFO - Step 6900: lr=1.00E-05, loss= 1.2804 (max= 2.5333), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:11,817 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.2804 (max= 2.5333), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:11,817 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.2804 (max= 2.5333), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:11,817 - root - INFO - Step 6900: lr=1.00E-05, loss= 1.2804 (max= 2.5333), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:29,836 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.2628 (max= 2.6741), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:29,836 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.2628 (max= 2.6741), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:29,836 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.2628 (max= 2.6741), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:29,836 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.2628 (max= 2.6741), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:29,836 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.2628 (max= 2.6741), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:29,836 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.2628 (max= 2.6741), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:29,837 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.2628 (max= 2.6741), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:53:29,837 - root - INFO - Step 6910: lr=1.00E-05, loss= 1.2628 (max= 2.6741), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:47,863 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.2693 (max= 2.3693), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:47,863 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.2693 (max= 2.3693), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:47,863 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.2693 (max= 2.3693), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:47,863 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.2693 (max= 2.3693), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:47,863 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.2693 (max= 2.3693), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:47,863 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.2693 (max= 2.3693), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:47,863 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.2693 (max= 2.3693), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:53:47,863 - root - INFO - Step 6920: lr=1.00E-05, loss= 1.2693 (max= 2.3693), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:05,865 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.2556 (max= 2.8889), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:05,865 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.2556 (max= 2.8889), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:05,865 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.2556 (max= 2.8889), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:05,865 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.2556 (max= 2.8889), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:05,865 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.2556 (max= 2.8889), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:05,865 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.2556 (max= 2.8889), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:05,865 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.2556 (max= 2.8889), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:05,865 - root - INFO - Step 6930: lr=1.00E-05, loss= 1.2556 (max= 2.8889), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:23,875 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.2809 (max= 2.1047), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:23,875 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.2809 (max= 2.1047), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:23,875 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.2809 (max= 2.1047), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:23,875 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.2809 (max= 2.1047), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:23,875 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.2809 (max= 2.1047), tps=18197, mfu=37.91%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:23,875 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.2809 (max= 2.1047), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:23,876 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.2809 (max= 2.1047), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:23,876 - root - INFO - Step 6940: lr=1.00E-05, loss= 1.2809 (max= 2.1047), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:41,917 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.2305 (max= 2.1820), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:41,917 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.2305 (max= 2.1820), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:41,917 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.2305 (max= 2.1820), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:41,917 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.2305 (max= 2.1820), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:41,917 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.2305 (max= 2.1820), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:41,917 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.2305 (max= 2.1820), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:41,917 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.2305 (max= 2.1820), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:41,917 - root - INFO - Step 6950: lr=1.00E-05, loss= 1.2305 (max= 
2.1820), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:59,941 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.2618 (max= 2.7374), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:59,941 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.2618 (max= 2.7374), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:59,941 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.2618 (max= 2.7374), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:59,941 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.2618 (max= 2.7374), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:59,941 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.2618 (max= 2.7374), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:59,941 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.2618 (max= 2.7374), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:59,941 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.2618 (max= 2.7374), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:54:59,941 - root - INFO - Step 6960: lr=1.00E-05, loss= 1.2618 (max= 2.7374), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:17,960 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.2662 (max= 2.2094), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:17,960 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.2662 (max= 2.2094), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:17,960 - root - INFO - Step 6970: 
lr=1.00E-05, loss= 1.2662 (max= 2.2094), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:17,960 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.2662 (max= 2.2094), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:17,960 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.2662 (max= 2.2094), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:17,960 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.2662 (max= 2.2094), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:17,960 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.2662 (max= 2.2094), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:17,960 - root - INFO - Step 6970: lr=1.00E-05, loss= 1.2662 (max= 2.2094), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:35,978 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.2507 (max= 2.2523), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:35,978 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.2507 (max= 2.2523), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:35,978 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.2507 (max= 2.2523), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:35,978 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.2507 (max= 2.2523), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:35,978 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.2507 (max= 2.2523), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:35,978 - 
root - INFO - Step 6980: lr=1.00E-05, loss= 1.2507 (max= 2.2523), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:35,978 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.2507 (max= 2.2523), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:35,978 - root - INFO - Step 6980: lr=1.00E-05, loss= 1.2507 (max= 2.2523), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:46,996 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6116677 +2025-10-24 12:55:53,996 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.2625 (max= 2.2062), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:53,996 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.2625 (max= 2.2062), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:53,996 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.2625 (max= 2.2062), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:53,996 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.2625 (max= 2.2062), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:53,996 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.2625 (max= 2.2062), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:53,996 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.2625 (max= 2.2062), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:55:53,996 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.2625 (max= 2.2062), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 
0.02%) +2025-10-24 12:55:53,996 - root - INFO - Step 6990: lr=1.00E-05, loss= 1.2625 (max= 2.2062), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-7000 +2025-10-24 12:56:11,995 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.2706 (max= 2.5870), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:56:11,995 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.2706 (max= 2.5870), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:56:11,995 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.2706 (max= 2.5870), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:56:11,995 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-24 12:56:11,995 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:56:11,995 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-24 12:56:11,995 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-24 12:56:11,995 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:56:11,995 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:56:11,995 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.2706 (max= 2.5870), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:56:11,995 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.2706 (max= 2.5870), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:56:11,995 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-24 12:56:11,995 - root - INFO - Step 7000: lr=1.00E-05, 
loss= 1.2706 (max= 2.5870), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:56:11,995 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:56:11,995 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-24 12:56:11,995 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:56:11,995 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-24 12:56:11,995 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:56:11,995 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.2706 (max= 2.5870), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:56:11,996 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-24 12:56:11,996 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 12:56:11,996 - root - INFO - Step 7000: lr=1.00E-05, loss= 1.2706 (max= 2.5870), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:56:11,996 - root - INFO - Saving a full checkpoint at step 7000 +2025-10-24 12:56:11,996 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-7000! 
Save time: 4.605895042419434 +2025-10-24 12:56:26,272 - root - INFO - Finished saving the checkpoint in 14.28 seconds +2025-10-24 12:56:26,279 - root - INFO - Finished saving the checkpoint in 14.28 seconds +2025-10-24 12:56:26,279 - root - INFO - Finished saving the checkpoint in 14.28 seconds +2025-10-24 12:56:26,280 - root - INFO - Finished saving the checkpoint in 14.28 seconds +2025-10-24 12:56:26,280 - root - INFO - Finished saving the checkpoint in 14.28 seconds +2025-10-24 12:56:26,281 - root - INFO - Finished saving the checkpoint in 14.29 seconds +2025-10-24 12:56:26,281 - root - INFO - Finished saving the checkpoint in 14.29 seconds +2025-10-24 12:56:26,282 - root - INFO - Finished saving the checkpoint in 14.29 seconds +2025-10-24 12:56:44,210 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.2668 (max= 2.2839), tps=10173, mfu=21.20%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:56:44,210 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.2668 (max= 2.2839), tps=10173, mfu=21.19%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:56:44,210 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.2668 (max= 2.2839), tps=10173, mfu=21.19%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:56:44,210 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.2668 (max= 2.2839), tps=10173, mfu=21.20%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:56:44,210 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.2668 (max= 2.2839), tps=10173, mfu=21.19%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:56:44,210 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.2668 (max= 2.2839), tps=10173, mfu=21.20%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:56:44,210 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.2668 (max= 2.2839), tps=10173, mfu=21.20%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:56:44,210 - root - INFO - Step 7010: lr=1.00E-05, loss= 1.2668 (max= 2.2839), tps=10173, mfu=21.20%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 12:57:02,199 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.2681 (max= 2.4726), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:02,200 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.2681 (max= 2.4726), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:02,200 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.2681 (max= 2.4726), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:02,200 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.2681 (max= 2.4726), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:02,200 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.2681 (max= 2.4726), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:02,200 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.2681 (max= 2.4726), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:02,200 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.2681 (max= 2.4726), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:02,200 - root - INFO - Step 7020: lr=1.00E-05, loss= 1.2681 (max= 2.4726), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:20,243 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.2952 (max= 2.6708), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:20,244 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.2952 (max= 2.6708), tps=18163, mfu=37.84%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:20,244 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.2952 (max= 2.6708), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:20,244 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.2952 (max= 2.6708), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:20,244 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.2952 (max= 2.6708), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:20,244 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.2952 (max= 2.6708), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:20,244 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.2952 (max= 2.6708), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:20,244 - root - INFO - Step 7030: lr=1.00E-05, loss= 1.2952 (max= 2.6708), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:57:38,279 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.2402 (max= 2.0636), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:38,280 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.2402 (max= 2.0636), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:38,280 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.2402 (max= 2.0636), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:38,280 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.2402 (max= 2.0636), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:38,280 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.2402 (max= 
2.0636), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:38,280 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.2402 (max= 2.0636), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:38,280 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.2402 (max= 2.0636), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:38,280 - root - INFO - Step 7040: lr=1.00E-05, loss= 1.2402 (max= 2.0636), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:48,875 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2104044 +2025-10-24 12:57:56,314 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.2842 (max= 2.6227), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:56,315 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.2842 (max= 2.6227), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:56,315 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.2842 (max= 2.6227), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:56,315 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.2842 (max= 2.6227), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:56,315 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.2842 (max= 2.6227), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:56,315 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.2842 (max= 2.6227), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:56,315 - root - INFO - Step 7050: 
lr=1.00E-05, loss= 1.2842 (max= 2.6227), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:57:56,315 - root - INFO - Step 7050: lr=1.00E-05, loss= 1.2842 (max= 2.6227), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:14,342 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.2669 (max= 2.8871), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:14,342 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.2669 (max= 2.8871), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:14,342 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.2669 (max= 2.8871), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:14,342 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.2669 (max= 2.8871), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:14,342 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.2669 (max= 2.8871), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:14,342 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.2669 (max= 2.8871), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:14,343 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.2669 (max= 2.8871), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:14,343 - root - INFO - Step 7060: lr=1.00E-05, loss= 1.2669 (max= 2.8871), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:32,355 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.2556 (max= 2.3869), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:32,355 - 
root - INFO - Step 7070: lr=1.00E-05, loss= 1.2556 (max= 2.3869), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:32,355 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.2556 (max= 2.3869), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:32,355 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.2556 (max= 2.3869), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:32,355 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.2556 (max= 2.3869), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:32,355 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.2556 (max= 2.3869), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:32,355 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.2556 (max= 2.3869), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:32,355 - root - INFO - Step 7070: lr=1.00E-05, loss= 1.2556 (max= 2.3869), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:58:50,376 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.3192 (max= 2.6256), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:58:50,376 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.3192 (max= 2.6256), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:58:50,376 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.3192 (max= 2.6256), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:58:50,376 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.3192 (max= 2.6256), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 12:58:50,376 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.3192 (max= 2.6256), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:58:50,376 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.3192 (max= 2.6256), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:58:50,376 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.3192 (max= 2.6256), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:58:50,376 - root - INFO - Step 7080: lr=1.00E-05, loss= 1.3192 (max= 2.6256), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:08,395 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.2731 (max= 2.3814), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:08,396 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.2731 (max= 2.3814), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:08,396 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.2731 (max= 2.3814), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:08,396 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.2731 (max= 2.3814), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:08,396 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.2731 (max= 2.3814), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:08,396 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.2731 (max= 2.3814), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:08,396 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.2731 (max= 2.3814), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:08,396 - root - INFO - Step 7090: lr=1.00E-05, loss= 1.2731 (max= 2.3814), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:26,395 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.3147 (max= 2.8276), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:26,396 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.3147 (max= 2.8276), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:26,396 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.3147 (max= 2.8276), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:26,396 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.3147 (max= 2.8276), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:26,396 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.3147 (max= 2.8276), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:26,396 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.3147 (max= 2.8276), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:26,396 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.3147 (max= 2.8276), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:26,396 - root - INFO - Step 7100: lr=1.00E-05, loss= 1.3147 (max= 2.8276), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 12:59:44,406 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.2754 (max= 2.6284), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:59:44,406 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.2754 (max= 2.6284), tps=18198, mfu=37.91%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:59:44,406 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.2754 (max= 2.6284), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:59:44,406 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.2754 (max= 2.6284), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:59:44,406 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.2754 (max= 2.6284), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:59:44,406 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.2754 (max= 2.6284), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:59:44,406 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.2754 (max= 2.6284), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 12:59:44,406 - root - INFO - Step 7110: lr=1.00E-05, loss= 1.2754 (max= 2.6284), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:02,424 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.3109 (max= 3.1127), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:02,424 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.3109 (max= 3.1127), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:02,424 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.3109 (max= 3.1127), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:02,424 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.3109 (max= 3.1127), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:02,424 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.3109 (max= 
3.1127), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:02,424 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.3109 (max= 3.1127), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:02,425 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.3109 (max= 3.1127), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:02,425 - root - INFO - Step 7120: lr=1.00E-05, loss= 1.3109 (max= 3.1127), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:20,469 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.2709 (max= 2.1427), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:20,470 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.2709 (max= 2.1427), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:20,470 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.2709 (max= 2.1427), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:20,470 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.2709 (max= 2.1427), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:20,470 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.2709 (max= 2.1427), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:20,470 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.2709 (max= 2.1427), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:20,470 - root - INFO - Step 7130: lr=1.00E-05, loss= 1.2709 (max= 2.1427), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:20,470 - root - INFO - Step 7130: 
lr=1.00E-05, loss= 1.2709 (max= 2.1427), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:38,498 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.2531 (max= 2.1964), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:38,499 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.2531 (max= 2.1964), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:38,499 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.2531 (max= 2.1964), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:38,499 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.2531 (max= 2.1964), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:38,499 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.2531 (max= 2.1964), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:38,499 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.2531 (max= 2.1964), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:38,499 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.2531 (max= 2.1964), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:38,499 - root - INFO - Step 7140: lr=1.00E-05, loss= 1.2531 (max= 2.1964), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:00:42,747 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:5597378 +2025-10-24 13:00:56,524 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.2988 (max= 2.3024), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 
13:00:56,524 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.2988 (max= 2.3024), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:56,524 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.2988 (max= 2.3024), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:56,524 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.2988 (max= 2.3024), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:56,524 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.2988 (max= 2.3024), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:56,524 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.2988 (max= 2.3024), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:56,524 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.2988 (max= 2.3024), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:00:56,524 - root - INFO - Step 7150: lr=1.00E-05, loss= 1.2988 (max= 2.3024), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:14,564 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.2638 (max= 2.1842), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:14,564 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.2638 (max= 2.1842), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:14,564 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.2638 (max= 2.1842), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:14,564 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.2638 (max= 2.1842), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 13:01:14,565 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.2638 (max= 2.1842), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:14,565 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.2638 (max= 2.1842), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:14,565 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.2638 (max= 2.1842), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:14,565 - root - INFO - Step 7160: lr=1.00E-05, loss= 1.2638 (max= 2.1842), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:32,545 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.2510 (max= 2.2495), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:01:32,546 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.2510 (max= 2.2495), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:01:32,546 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.2510 (max= 2.2495), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:01:32,546 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.2510 (max= 2.2495), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:01:32,546 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.2510 (max= 2.2495), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:01:32,546 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.2510 (max= 2.2495), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:01:32,546 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.2510 (max= 2.2495), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:01:32,546 - root - INFO - Step 7170: lr=1.00E-05, loss= 1.2510 (max= 2.2495), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:01:50,599 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.2698 (max= 2.2654), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:50,599 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.2698 (max= 2.2654), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:50,599 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.2698 (max= 2.2654), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:50,599 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.2698 (max= 2.2654), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:50,599 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.2698 (max= 2.2654), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:50,599 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.2698 (max= 2.2654), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:50,599 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.2698 (max= 2.2654), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:01:50,599 - root - INFO - Step 7180: lr=1.00E-05, loss= 1.2698 (max= 2.2654), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:02:08,642 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.3023 (max= 2.0901), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:08,642 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.3023 (max= 2.0901), tps=18164, mfu=37.85%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:08,642 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.3023 (max= 2.0901), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:08,642 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.3023 (max= 2.0901), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:08,642 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.3023 (max= 2.0901), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:08,642 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.3023 (max= 2.0901), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:08,643 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.3023 (max= 2.0901), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:08,643 - root - INFO - Step 7190: lr=1.00E-05, loss= 1.3023 (max= 2.0901), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:26,661 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.2645 (max= 2.2956), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:02:26,661 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.2645 (max= 2.2956), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:02:26,661 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.2645 (max= 2.2956), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:02:26,661 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.2645 (max= 2.2956), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:02:26,661 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.2645 (max= 
2.2956), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:02:26,661 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.2645 (max= 2.2956), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:02:26,662 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.2645 (max= 2.2956), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:02:26,662 - root - INFO - Step 7200: lr=1.00E-05, loss= 1.2645 (max= 2.2956), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:02:44,674 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.2555 (max= 2.2660), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:44,674 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.2555 (max= 2.2660), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:44,674 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.2555 (max= 2.2660), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:44,674 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.2555 (max= 2.2660), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:44,674 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.2555 (max= 2.2660), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:44,674 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.2555 (max= 2.2660), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:44,674 - root - INFO - Step 7210: lr=1.00E-05, loss= 1.2555 (max= 2.2660), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:02:44,675 - root - INFO - Step 7210: 
lr=1.00E-05, loss= 1.2555 (max= 2.2660), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:03:02,675 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.2916 (max= 2.2254), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:02,676 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.2916 (max= 2.2254), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:02,676 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.2916 (max= 2.2254), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:02,676 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.2916 (max= 2.2254), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:02,676 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.2916 (max= 2.2254), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:02,676 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.2916 (max= 2.2254), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:02,676 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.2916 (max= 2.2254), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:02,676 - root - INFO - Step 7220: lr=1.00E-05, loss= 1.2916 (max= 2.2254), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:20,703 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.2668 (max= 2.5306), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:20,703 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.2668 (max= 2.5306), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:20,703 - 
root - INFO - Step 7230: lr=1.00E-05, loss= 1.2668 (max= 2.5306), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:20,703 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.2668 (max= 2.5306), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:20,703 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.2668 (max= 2.5306), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:20,703 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.2668 (max= 2.5306), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:20,703 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.2668 (max= 2.5306), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:20,703 - root - INFO - Step 7230: lr=1.00E-05, loss= 1.2668 (max= 2.5306), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:38,699 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.2800 (max= 2.6232), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:38,699 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.2800 (max= 2.6232), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:38,699 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.2800 (max= 2.6232), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:38,699 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.2800 (max= 2.6232), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:38,699 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.2800 (max= 2.6232), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:03:38,699 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.2800 (max= 2.6232), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:38,699 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.2800 (max= 2.6232), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:38,699 - root - INFO - Step 7240: lr=1.00E-05, loss= 1.2800 (max= 2.6232), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:56,735 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.2699 (max= 2.3855), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:56,735 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.2699 (max= 2.3855), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:56,735 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.2699 (max= 2.3855), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:56,735 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.2699 (max= 2.3855), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:56,735 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.2699 (max= 2.3855), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:56,735 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.2699 (max= 2.3855), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:56,735 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.2699 (max= 2.3855), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:03:56,735 - root - INFO - Step 7250: lr=1.00E-05, loss= 1.2699 (max= 2.3855), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:14,768 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.2898 (max= 3.5392), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:14,769 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.2898 (max= 3.5392), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:14,769 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.2898 (max= 3.5392), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:14,769 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.2898 (max= 3.5392), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:14,769 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.2898 (max= 3.5392), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:14,769 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.2898 (max= 3.5392), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:14,769 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.2898 (max= 3.5392), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:14,769 - root - INFO - Step 7260: lr=1.00E-05, loss= 1.2898 (max= 3.5392), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:32,775 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.2785 (max= 2.0666), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:32,776 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.2785 (max= 2.0666), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:32,776 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.2785 (max= 2.0666), tps=18201, mfu=37.92%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:32,776 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.2785 (max= 2.0666), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:32,776 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.2785 (max= 2.0666), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:32,776 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.2785 (max= 2.0666), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:32,776 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.2785 (max= 2.0666), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:32,776 - root - INFO - Step 7270: lr=1.00E-05, loss= 1.2785 (max= 2.0666), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:50,796 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.2708 (max= 3.7747), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:50,796 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.2708 (max= 3.7747), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:50,796 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.2708 (max= 3.7747), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:50,796 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.2708 (max= 3.7747), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:50,796 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.2708 (max= 3.7747), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:50,796 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.2708 (max= 
3.7747), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:50,797 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.2708 (max= 3.7747), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:04:50,797 - root - INFO - Step 7280: lr=1.00E-05, loss= 1.2708 (max= 3.7747), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:05:08,866 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.2593 (max= 2.0746), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:08,866 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.2593 (max= 2.0746), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:08,866 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.2593 (max= 2.0746), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:08,866 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.2593 (max= 2.0746), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:08,866 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.2593 (max= 2.0746), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:08,866 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.2593 (max= 2.0746), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:08,866 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.2593 (max= 2.0746), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:08,866 - root - INFO - Step 7290: lr=1.00E-05, loss= 1.2593 (max= 2.0746), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:26,917 - root - INFO - Step 7300: 
lr=1.00E-05, loss= 1.2609 (max= 2.4419), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:26,917 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.2609 (max= 2.4419), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:26,917 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.2609 (max= 2.4419), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:26,917 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.2609 (max= 2.4419), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:26,917 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.2609 (max= 2.4419), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:26,917 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.2609 (max= 2.4419), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:26,918 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.2609 (max= 2.4419), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:26,918 - root - INFO - Step 7300: lr=1.00E-05, loss= 1.2609 (max= 2.4419), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:05:44,929 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.2864 (max= 2.2371), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:05:44,930 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.2864 (max= 2.2371), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:05:44,930 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.2864 (max= 2.2371), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:05:44,930 - 
root - INFO - Step 7310: lr=1.00E-05, loss= 1.2864 (max= 2.2371), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:05:44,930 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.2864 (max= 2.2371), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:05:44,930 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.2864 (max= 2.2371), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:05:44,930 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.2864 (max= 2.2371), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:05:44,930 - root - INFO - Step 7310: lr=1.00E-05, loss= 1.2864 (max= 2.2371), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:02,889 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.2797 (max= 2.3796), tps=18249, mfu=38.02%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:02,889 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.2797 (max= 2.3796), tps=18249, mfu=38.02%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:02,889 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.2797 (max= 2.3796), tps=18249, mfu=38.02%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:02,889 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.2797 (max= 2.3796), tps=18249, mfu=38.02%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:02,889 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.2797 (max= 2.3796), tps=18249, mfu=38.02%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:02,889 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.2797 (max= 2.3796), tps=18249, mfu=38.02%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:06:02,889 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.2797 (max= 2.3796), tps=18250, mfu=38.02%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:02,889 - root - INFO - Step 7320: lr=1.00E-05, loss= 1.2797 (max= 2.3796), tps=18250, mfu=38.02%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:20,916 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.2755 (max= 2.2882), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:20,917 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.2755 (max= 2.2882), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:20,917 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.2755 (max= 2.2882), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:20,917 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.2755 (max= 2.2882), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:20,917 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.2755 (max= 2.2882), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:20,917 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.2755 (max= 2.2882), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:20,917 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.2755 (max= 2.2882), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:20,917 - root - INFO - Step 7330: lr=1.00E-05, loss= 1.2755 (max= 2.2882), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:38,959 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.2693 (max= 2.1361), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:38,959 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.2693 (max= 2.1361), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:38,959 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.2693 (max= 2.1361), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:38,959 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.2693 (max= 2.1361), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:38,959 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.2693 (max= 2.1361), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:38,959 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.2693 (max= 2.1361), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:38,959 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.2693 (max= 2.1361), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:38,959 - root - INFO - Step 7340: lr=1.00E-05, loss= 1.2693 (max= 2.1361), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:06:56,962 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.2696 (max= 3.4347), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:56,963 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.2696 (max= 3.4347), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:56,963 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.2696 (max= 3.4347), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:56,963 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.2696 (max= 3.4347), tps=18204, mfu=37.93%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:56,963 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.2696 (max= 3.4347), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:56,963 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.2696 (max= 3.4347), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:56,963 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.2696 (max= 3.4347), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:06:56,963 - root - INFO - Step 7350: lr=1.00E-05, loss= 1.2696 (max= 3.4347), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:14,986 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.2565 (max= 2.2922), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:07:14,986 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.2565 (max= 2.2922), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:07:14,986 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.2565 (max= 2.2922), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:07:14,986 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.2565 (max= 2.2922), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:07:14,986 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.2565 (max= 2.2922), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:07:14,986 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.2565 (max= 2.2922), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:07:14,986 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.2565 (max= 
2.2922), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:07:14,986 - root - INFO - Step 7360: lr=1.00E-05, loss= 1.2565 (max= 2.2922), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:07:32,997 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.2614 (max= 2.6151), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:32,998 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.2614 (max= 2.6151), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:32,998 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.2614 (max= 2.6151), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:32,998 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.2614 (max= 2.6151), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:32,998 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.2614 (max= 2.6151), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:32,998 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.2614 (max= 2.6151), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:32,998 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.2614 (max= 2.6151), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:32,998 - root - INFO - Step 7370: lr=1.00E-05, loss= 1.2614 (max= 2.6151), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:51,016 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.2809 (max= 2.1434), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:51,017 - root - INFO - Step 7380: 
lr=1.00E-05, loss= 1.2809 (max= 2.1434), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:51,017 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.2809 (max= 2.1434), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:51,017 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.2809 (max= 2.1434), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:51,017 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.2809 (max= 2.1434), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:51,017 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.2809 (max= 2.1434), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:51,017 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.2809 (max= 2.1434), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:07:51,017 - root - INFO - Step 7380: lr=1.00E-05, loss= 1.2809 (max= 2.1434), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:09,018 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.2934 (max= 2.5689), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:09,018 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.2934 (max= 2.5689), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:09,018 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.2934 (max= 2.5689), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:09,018 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.2934 (max= 2.5689), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:09,018 - 
root - INFO - Step 7390: lr=1.00E-05, loss= 1.2934 (max= 2.5689), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:09,018 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.2934 (max= 2.5689), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:09,018 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.2934 (max= 2.5689), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:09,019 - root - INFO - Step 7390: lr=1.00E-05, loss= 1.2934 (max= 2.5689), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:27,040 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.2364 (max= 2.1073), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:27,040 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.2364 (max= 2.1073), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:27,040 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.2364 (max= 2.1073), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:27,040 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.2364 (max= 2.1073), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:27,040 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.2364 (max= 2.1073), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:27,040 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.2364 (max= 2.1073), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:27,041 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.2364 (max= 2.1073), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 13:08:27,041 - root - INFO - Step 7400: lr=1.00E-05, loss= 1.2364 (max= 2.1073), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:45,059 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.2509 (max= 2.6686), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:45,060 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.2509 (max= 2.6686), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:45,060 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.2509 (max= 2.6686), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:45,060 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.2509 (max= 2.6686), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:45,060 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.2509 (max= 2.6686), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:45,060 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.2509 (max= 2.6686), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:45,060 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.2509 (max= 2.6686), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:08:45,060 - root - INFO - Step 7410: lr=1.00E-05, loss= 1.2509 (max= 2.6686), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:09:03,078 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.2483 (max= 2.3900), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:03,078 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.2483 (max= 2.3900), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:03,078 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.2483 (max= 2.3900), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:03,078 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.2483 (max= 2.3900), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:03,078 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.2483 (max= 2.3900), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:03,078 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.2483 (max= 2.3900), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:03,079 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.2483 (max= 2.3900), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:03,079 - root - INFO - Step 7420: lr=1.00E-05, loss= 1.2483 (max= 2.3900), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:21,095 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.2556 (max= 2.4350), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:21,096 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.2556 (max= 2.4350), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:21,096 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.2556 (max= 2.4350), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:21,096 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.2556 (max= 2.4350), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:21,096 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.2556 (max= 2.4350), tps=18190, mfu=37.90%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:21,096 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.2556 (max= 2.4350), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:21,096 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.2556 (max= 2.4350), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:21,096 - root - INFO - Step 7430: lr=1.00E-05, loss= 1.2556 (max= 2.4350), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:39,144 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.2625 (max= 2.1704), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:39,144 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.2625 (max= 2.1704), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:39,144 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.2625 (max= 2.1704), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:39,144 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.2625 (max= 2.1704), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:39,144 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.2625 (max= 2.1704), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:39,144 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.2625 (max= 2.1704), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:39,144 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.2625 (max= 2.1704), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:39,144 - root - INFO - Step 7440: lr=1.00E-05, loss= 1.2625 (max= 
2.1704), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:57,180 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.3185 (max= 2.3676), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:57,180 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.3185 (max= 2.3676), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:57,180 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.3185 (max= 2.3676), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:57,180 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.3185 (max= 2.3676), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:57,180 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.3185 (max= 2.3676), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:57,180 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.3185 (max= 2.3676), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:57,180 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.3185 (max= 2.3676), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:09:57,180 - root - INFO - Step 7450: lr=1.00E-05, loss= 1.3185 (max= 2.3676), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:15,189 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.2674 (max= 2.3343), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:15,189 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.2674 (max= 2.3343), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:15,189 - root - INFO - Step 7460: 
lr=1.00E-05, loss= 1.2674 (max= 2.3343), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:15,189 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.2674 (max= 2.3343), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:15,189 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.2674 (max= 2.3343), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:15,189 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.2674 (max= 2.3343), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:15,189 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.2674 (max= 2.3343), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:15,189 - root - INFO - Step 7460: lr=1.00E-05, loss= 1.2674 (max= 2.3343), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:33,206 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.2623 (max= 2.2129), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:33,206 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.2623 (max= 2.2129), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:33,206 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.2623 (max= 2.2129), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:33,206 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.2623 (max= 2.2129), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:33,206 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.2623 (max= 2.2129), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:33,206 - 
root - INFO - Step 7470: lr=1.00E-05, loss= 1.2623 (max= 2.2129), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:33,207 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.2623 (max= 2.2129), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:33,207 - root - INFO - Step 7470: lr=1.00E-05, loss= 1.2623 (max= 2.2129), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:51,200 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.2904 (max= 2.2250), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:51,200 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.2904 (max= 2.2250), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:51,200 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.2904 (max= 2.2250), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:51,200 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.2904 (max= 2.2250), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:51,201 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.2904 (max= 2.2250), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:51,201 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.2904 (max= 2.2250), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:51,201 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.2904 (max= 2.2250), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:10:51,201 - root - INFO - Step 7480: lr=1.00E-05, loss= 1.2904 (max= 2.2250), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:11:09,223 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.2582 (max= 2.1902), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:11:09,223 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.2582 (max= 2.1902), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:11:09,223 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.2582 (max= 2.1902), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:11:09,223 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.2582 (max= 2.1902), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:11:09,223 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.2582 (max= 2.1902), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:11:09,223 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.2582 (max= 2.1902), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:11:09,224 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.2582 (max= 2.1902), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:11:09,224 - root - INFO - Step 7490: lr=1.00E-05, loss= 1.2582 (max= 2.1902), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:11:27,241 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.2825 (max= 2.6972), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:27,241 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.2825 (max= 2.6972), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:27,241 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.2825 (max= 2.6972), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:27,241 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.2825 (max= 2.6972), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:27,241 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.2825 (max= 2.6972), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:27,241 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.2825 (max= 2.6972), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:27,242 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.2825 (max= 2.6972), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:27,242 - root - INFO - Step 7500: lr=1.00E-05, loss= 1.2825 (max= 2.6972), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:45,239 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.2617 (max= 2.7630), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:45,239 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.2617 (max= 2.7630), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:45,239 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.2617 (max= 2.7630), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:45,239 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.2617 (max= 2.7630), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:45,239 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.2617 (max= 2.7630), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:45,239 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.2617 (max= 2.7630), tps=18210, mfu=37.94%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:45,239 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.2617 (max= 2.7630), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:11:45,240 - root - INFO - Step 7510: lr=1.00E-05, loss= 1.2617 (max= 2.7630), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:12:03,244 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.2627 (max= 2.3776), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:03,244 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.2627 (max= 2.3776), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:03,245 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.2627 (max= 2.3776), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:03,245 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.2627 (max= 2.3776), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:03,245 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.2627 (max= 2.3776), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:03,245 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.2627 (max= 2.3776), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:03,245 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.2627 (max= 2.3776), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:03,245 - root - INFO - Step 7520: lr=1.00E-05, loss= 1.2627 (max= 2.3776), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:17,405 - root - WARNING - Empty document detected at 
/work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:7336323 +2025-10-24 13:12:21,245 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.2698 (max= 2.1900), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:12:21,245 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.2698 (max= 2.1900), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:12:21,245 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.2698 (max= 2.1900), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:12:21,245 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.2698 (max= 2.1900), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:12:21,245 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.2698 (max= 2.1900), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:12:21,245 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.2698 (max= 2.1900), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:12:21,246 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.2698 (max= 2.1900), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:12:21,246 - root - INFO - Step 7530: lr=1.00E-05, loss= 1.2698 (max= 2.1900), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:12:39,259 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.2608 (max= 2.1654), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:39,260 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.2608 (max= 2.1654), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:39,260 - root - INFO - 
Step 7540: lr=1.00E-05, loss= 1.2608 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:39,260 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.2608 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:39,260 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.2608 (max= 2.1654), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:39,260 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.2608 (max= 2.1654), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:39,260 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.2608 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:39,260 - root - INFO - Step 7540: lr=1.00E-05, loss= 1.2608 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:57,259 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.2613 (max= 2.6183), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:57,259 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.2613 (max= 2.6183), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:57,259 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.2613 (max= 2.6183), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:57,259 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.2613 (max= 2.6183), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:57,259 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.2613 (max= 2.6183), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 
13:12:57,259 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.2613 (max= 2.6183), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:57,259 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.2613 (max= 2.6183), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:12:57,259 - root - INFO - Step 7550: lr=1.00E-05, loss= 1.2613 (max= 2.6183), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:15,282 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.2793 (max= 2.7418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:15,283 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.2793 (max= 2.7418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:15,283 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.2793 (max= 2.7418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:15,283 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.2793 (max= 2.7418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:15,283 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.2793 (max= 2.7418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:15,283 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.2793 (max= 2.7418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:15,283 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.2793 (max= 2.7418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:15,283 - root - INFO - Step 7560: lr=1.00E-05, loss= 1.2793 (max= 2.7418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 13:13:33,304 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.2702 (max= 2.2329), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:33,304 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.2702 (max= 2.2329), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:33,304 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.2702 (max= 2.2329), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:33,304 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.2702 (max= 2.2329), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:33,304 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.2702 (max= 2.2329), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:33,304 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.2702 (max= 2.2329), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:33,304 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.2702 (max= 2.2329), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:33,304 - root - INFO - Step 7570: lr=1.00E-05, loss= 1.2702 (max= 2.2329), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:51,325 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.2426 (max= 3.2531), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:51,325 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.2426 (max= 3.2531), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:51,325 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.2426 (max= 3.2531), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:51,325 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.2426 (max= 3.2531), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:51,325 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.2426 (max= 3.2531), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:51,325 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.2426 (max= 3.2531), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:51,325 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.2426 (max= 3.2531), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:13:51,325 - root - INFO - Step 7580: lr=1.00E-05, loss= 1.2426 (max= 3.2531), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:09,361 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.2673 (max= 2.3504), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:09,361 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.2673 (max= 2.3504), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:09,361 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.2673 (max= 2.3504), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:09,361 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.2673 (max= 2.3504), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:09,361 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.2673 (max= 2.3504), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:09,361 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.2673 (max= 2.3504), tps=18171, mfu=37.86%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:09,362 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.2673 (max= 2.3504), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:09,362 - root - INFO - Step 7590: lr=1.00E-05, loss= 1.2673 (max= 2.3504), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:27,381 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.2804 (max= 2.0127), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:27,381 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.2804 (max= 2.0127), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:27,381 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.2804 (max= 2.0127), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:27,381 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.2804 (max= 2.0127), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:27,381 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.2804 (max= 2.0127), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:27,381 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.2804 (max= 2.0127), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:27,382 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.2804 (max= 2.0127), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:27,382 - root - INFO - Step 7600: lr=1.00E-05, loss= 1.2804 (max= 2.0127), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:14:45,398 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.2843 (max= 
2.2877), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:14:45,398 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.2843 (max= 2.2877), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:14:45,398 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.2843 (max= 2.2877), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:14:45,398 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.2843 (max= 2.2877), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:14:45,398 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.2843 (max= 2.2877), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:14:45,398 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.2843 (max= 2.2877), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:14:45,399 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.2843 (max= 2.2877), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:14:45,399 - root - INFO - Step 7610: lr=1.00E-05, loss= 1.2843 (max= 2.2877), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:15:03,421 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.2440 (max= 2.0772), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:15:03,421 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.2440 (max= 2.0772), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:15:03,421 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.2440 (max= 2.0772), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:15:03,422 - root - INFO - Step 7620: 
lr=1.00E-05, loss= 1.2440 (max= 2.0772), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:15:03,422 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.2440 (max= 2.0772), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:15:03,422 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.2440 (max= 2.0772), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:15:03,422 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.2440 (max= 2.0772), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:15:03,422 - root - INFO - Step 7620: lr=1.00E-05, loss= 1.2440 (max= 2.0772), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:15:21,428 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.2762 (max= 3.4132), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:21,428 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.2762 (max= 3.4132), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:21,428 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.2762 (max= 3.4132), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:21,428 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.2762 (max= 3.4132), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:21,428 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.2762 (max= 3.4132), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:21,428 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.2762 (max= 3.4132), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:21,428 - 
root - INFO - Step 7630: lr=1.00E-05, loss= 1.2762 (max= 3.4132), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:21,428 - root - INFO - Step 7630: lr=1.00E-05, loss= 1.2762 (max= 3.4132), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:39,446 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.2438 (max= 1.9942), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:39,446 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.2438 (max= 1.9942), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:39,446 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.2438 (max= 1.9942), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:39,446 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.2438 (max= 1.9942), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:39,446 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.2438 (max= 1.9942), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:39,446 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.2438 (max= 1.9942), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:39,446 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.2438 (max= 1.9942), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:39,446 - root - INFO - Step 7640: lr=1.00E-05, loss= 1.2438 (max= 1.9942), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:57,469 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.2349 (max= 2.6567), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 13:15:57,470 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.2349 (max= 2.6567), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:57,470 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.2349 (max= 2.6567), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:57,470 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.2349 (max= 2.6567), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:57,470 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.2349 (max= 2.6567), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:57,470 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.2349 (max= 2.6567), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:57,470 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.2349 (max= 2.6567), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:15:57,470 - root - INFO - Step 7650: lr=1.00E-05, loss= 1.2349 (max= 2.6567), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:15,477 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.2867 (max= 2.2108), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:15,477 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.2867 (max= 2.2108), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:15,477 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.2867 (max= 2.2108), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:15,477 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.2867 (max= 2.2108), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:15,477 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.2867 (max= 2.2108), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:15,477 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.2867 (max= 2.2108), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:15,477 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.2867 (max= 2.2108), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:15,477 - root - INFO - Step 7660: lr=1.00E-05, loss= 1.2867 (max= 2.2108), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:33,542 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.2563 (max= 2.4173), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:33,542 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.2563 (max= 2.4173), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:33,542 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.2563 (max= 2.4173), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:33,542 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.2563 (max= 2.4173), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:33,542 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.2563 (max= 2.4173), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:33,542 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.2563 (max= 2.4173), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:33,542 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.2563 (max= 2.4173), tps=18142, mfu=37.80%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:33,542 - root - INFO - Step 7670: lr=1.00E-05, loss= 1.2563 (max= 2.4173), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:51,591 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.2482 (max= 2.5412), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:51,591 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.2482 (max= 2.5412), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:51,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.2482 (max= 2.5412), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:51,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.2482 (max= 2.5412), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:51,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.2482 (max= 2.5412), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:51,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.2482 (max= 2.5412), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:51,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.2482 (max= 2.5412), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:16:51,592 - root - INFO - Step 7680: lr=1.00E-05, loss= 1.2482 (max= 2.5412), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:09,582 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.2626 (max= 2.1992), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:09,582 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.2626 (max= 
2.1992), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:09,582 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.2626 (max= 2.1992), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:09,582 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.2626 (max= 2.1992), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:09,582 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.2626 (max= 2.1992), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:09,582 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.2626 (max= 2.1992), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:09,583 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.2626 (max= 2.1992), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:09,583 - root - INFO - Step 7690: lr=1.00E-05, loss= 1.2626 (max= 2.1992), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:27,583 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.2620 (max= 2.3084), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:27,583 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.2620 (max= 2.3084), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:27,583 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.2620 (max= 2.3084), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:27,583 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.2620 (max= 2.3084), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:27,583 - root - INFO - Step 7700: 
lr=1.00E-05, loss= 1.2620 (max= 2.3084), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:27,583 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.2620 (max= 2.3084), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:27,584 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.2620 (max= 2.3084), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:27,584 - root - INFO - Step 7700: lr=1.00E-05, loss= 1.2620 (max= 2.3084), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:45,579 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.2668 (max= 2.0875), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:45,580 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.2668 (max= 2.0875), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:45,580 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.2668 (max= 2.0875), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:45,580 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.2668 (max= 2.0875), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:45,580 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.2668 (max= 2.0875), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:45,580 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.2668 (max= 2.0875), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:45,580 - root - INFO - Step 7710: lr=1.00E-05, loss= 1.2668 (max= 2.0875), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:17:45,580 - 
root - INFO - Step 7710: lr=1.00E-05, loss= 1.2668 (max= 2.0875), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:03,594 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.2861 (max= 2.9471), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:03,594 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.2861 (max= 2.9471), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:03,595 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.2861 (max= 2.9471), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:03,595 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.2861 (max= 2.9471), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:03,595 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.2861 (max= 2.9471), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:03,595 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.2861 (max= 2.9471), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:03,595 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.2861 (max= 2.9471), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:03,595 - root - INFO - Step 7720: lr=1.00E-05, loss= 1.2861 (max= 2.9471), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:21,640 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.2931 (max= 2.3919), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:21,641 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.2931 (max= 2.3919), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:18:21,641 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.2931 (max= 2.3919), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:21,641 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.2931 (max= 2.3919), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:21,641 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.2931 (max= 2.3919), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:21,641 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.2931 (max= 2.3919), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:21,641 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.2931 (max= 2.3919), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:21,641 - root - INFO - Step 7730: lr=1.00E-05, loss= 1.2931 (max= 2.3919), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:39,662 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.2747 (max= 2.7039), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:39,662 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.2747 (max= 2.7039), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:39,663 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.2747 (max= 2.7039), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:39,663 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.2747 (max= 2.7039), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:39,663 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.2747 (max= 2.7039), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:39,663 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.2747 (max= 2.7039), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:39,663 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.2747 (max= 2.7039), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:39,663 - root - INFO - Step 7740: lr=1.00E-05, loss= 1.2747 (max= 2.7039), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:18:57,696 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.2860 (max= 2.4439), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:57,696 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.2860 (max= 2.4439), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:57,697 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.2860 (max= 2.4439), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:57,697 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.2860 (max= 2.4439), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:57,697 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.2860 (max= 2.4439), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:57,697 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.2860 (max= 2.4439), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:57,697 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.2860 (max= 2.4439), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:18:57,697 - root - INFO - Step 7750: lr=1.00E-05, loss= 1.2860 (max= 2.4439), tps=18174, mfu=37.87%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:15,720 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.2621 (max= 2.0789), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:15,720 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.2621 (max= 2.0789), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:15,720 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.2621 (max= 2.0789), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:15,720 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.2621 (max= 2.0789), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:15,720 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.2621 (max= 2.0789), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:15,720 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.2621 (max= 2.0789), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:15,720 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.2621 (max= 2.0789), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:15,721 - root - INFO - Step 7760: lr=1.00E-05, loss= 1.2621 (max= 2.0789), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:33,744 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.2732 (max= 2.3428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:19:33,744 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.2732 (max= 2.3428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:19:33,744 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.2732 (max= 
2.3428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:19:33,744 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.2732 (max= 2.3428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:19:33,744 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.2732 (max= 2.3428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:19:33,744 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.2732 (max= 2.3428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:19:33,744 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.2732 (max= 2.3428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:19:33,744 - root - INFO - Step 7770: lr=1.00E-05, loss= 1.2732 (max= 2.3428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:19:51,740 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.2550 (max= 2.0884), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:51,740 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.2550 (max= 2.0884), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:51,740 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.2550 (max= 2.0884), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:51,740 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.2550 (max= 2.0884), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:51,740 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.2550 (max= 2.0884), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:51,740 - root - INFO - Step 7780: 
lr=1.00E-05, loss= 1.2550 (max= 2.0884), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:51,740 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.2550 (max= 2.0884), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:19:51,740 - root - INFO - Step 7780: lr=1.00E-05, loss= 1.2550 (max= 2.0884), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:09,745 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.2432 (max= 2.6736), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:09,745 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.2432 (max= 2.6736), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:09,745 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.2432 (max= 2.6736), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:09,745 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.2432 (max= 2.6736), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:09,745 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.2432 (max= 2.6736), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:09,745 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.2432 (max= 2.6736), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:09,745 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.2432 (max= 2.6736), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:09,745 - root - INFO - Step 7790: lr=1.00E-05, loss= 1.2432 (max= 2.6736), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:27,765 - 
root - INFO - Step 7800: lr=1.00E-05, loss= 1.2422 (max= 2.3021), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:27,765 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.2422 (max= 2.3021), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:27,766 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.2422 (max= 2.3021), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:27,766 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.2422 (max= 2.3021), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:27,766 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.2422 (max= 2.3021), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:27,766 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.2422 (max= 2.3021), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:27,766 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.2422 (max= 2.3021), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:27,766 - root - INFO - Step 7800: lr=1.00E-05, loss= 1.2422 (max= 2.3021), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:45,791 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.2678 (max= 2.2679), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:45,791 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.2678 (max= 2.2679), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:45,791 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.2678 (max= 2.2679), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 13:20:45,791 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.2678 (max= 2.2679), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:45,791 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.2678 (max= 2.2679), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:45,791 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.2678 (max= 2.2679), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:45,791 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.2678 (max= 2.2679), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:20:45,792 - root - INFO - Step 7810: lr=1.00E-05, loss= 1.2678 (max= 2.2679), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:03,814 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.2736 (max= 2.2340), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:03,814 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.2736 (max= 2.2340), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:03,815 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.2736 (max= 2.2340), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:03,815 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.2736 (max= 2.2340), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:03,815 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.2736 (max= 2.2340), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:03,815 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.2736 (max= 2.2340), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:03,815 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.2736 (max= 2.2340), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:03,815 - root - INFO - Step 7820: lr=1.00E-05, loss= 1.2736 (max= 2.2340), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:21,842 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.2941 (max= 3.9430), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:21,842 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.2941 (max= 3.9430), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:21,842 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.2941 (max= 3.9430), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:21,842 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.2941 (max= 3.9430), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:21,842 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.2941 (max= 3.9430), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:21,843 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.2941 (max= 3.9430), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:21,843 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.2941 (max= 3.9430), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:21,843 - root - INFO - Step 7830: lr=1.00E-05, loss= 1.2941 (max= 3.9430), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:39,861 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.2812 (max= 2.2826), tps=18188, mfu=37.90%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:39,861 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.2812 (max= 2.2826), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:39,861 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.2812 (max= 2.2826), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:39,862 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.2812 (max= 2.2826), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:39,862 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.2812 (max= 2.2826), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:39,862 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.2812 (max= 2.2826), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:39,862 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.2812 (max= 2.2826), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:39,862 - root - INFO - Step 7840: lr=1.00E-05, loss= 1.2812 (max= 2.2826), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:21:57,889 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.2555 (max= 2.1231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:57,889 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.2555 (max= 2.1231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:57,889 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.2555 (max= 2.1231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:57,889 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.2555 (max= 
2.1231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:57,889 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.2555 (max= 2.1231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:57,889 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.2555 (max= 2.1231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:57,889 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.2555 (max= 2.1231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:21:57,889 - root - INFO - Step 7850: lr=1.00E-05, loss= 1.2555 (max= 2.1231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:15,876 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.2742 (max= 2.2137), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:15,876 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.2742 (max= 2.2137), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:15,877 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.2742 (max= 2.2137), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:15,877 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.2742 (max= 2.2137), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:15,877 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.2742 (max= 2.2137), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:15,877 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.2742 (max= 2.2137), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:15,877 - root - INFO - Step 7860: 
lr=1.00E-05, loss= 1.2742 (max= 2.2137), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:15,877 - root - INFO - Step 7860: lr=1.00E-05, loss= 1.2742 (max= 2.2137), tps=18221, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:33,900 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.3032 (max= 2.2370), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:22:33,900 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.3032 (max= 2.2370), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:22:33,900 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.3032 (max= 2.2370), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:22:33,900 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.3032 (max= 2.2370), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:22:33,900 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.3032 (max= 2.2370), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:22:33,900 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.3032 (max= 2.2370), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:22:33,900 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.3032 (max= 2.2370), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:22:33,900 - root - INFO - Step 7870: lr=1.00E-05, loss= 1.3032 (max= 2.2370), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:22:51,928 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.2695 (max= 2.0939), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:51,928 - 
root - INFO - Step 7880: lr=1.00E-05, loss= 1.2695 (max= 2.0939), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:51,928 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.2695 (max= 2.0939), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:51,929 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.2695 (max= 2.0939), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:51,929 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.2695 (max= 2.0939), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:51,929 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.2695 (max= 2.0939), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:51,929 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.2695 (max= 2.0939), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:22:51,929 - root - INFO - Step 7880: lr=1.00E-05, loss= 1.2695 (max= 2.0939), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:23:09,943 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.2710 (max= 2.0932), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:09,943 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.2710 (max= 2.0932), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:09,943 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.2710 (max= 2.0932), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:09,943 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.2710 (max= 2.0932), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:23:09,943 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.2710 (max= 2.0932), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:09,943 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.2710 (max= 2.0932), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:09,943 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.2710 (max= 2.0932), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:09,943 - root - INFO - Step 7890: lr=1.00E-05, loss= 1.2710 (max= 2.0932), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:27,963 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.2642 (max= 2.3639), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:23:27,963 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.2642 (max= 2.3639), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:23:27,964 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.2642 (max= 2.3639), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:23:27,964 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.2642 (max= 2.3639), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:23:27,964 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.2642 (max= 2.3639), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:23:27,964 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.2642 (max= 2.3639), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:23:27,964 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.2642 (max= 2.3639), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:23:27,964 - root - INFO - Step 7900: lr=1.00E-05, loss= 1.2642 (max= 2.3639), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:23:45,995 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.2494 (max= 2.1331), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:45,995 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.2494 (max= 2.1331), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:45,995 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.2494 (max= 2.1331), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:45,995 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.2494 (max= 2.1331), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:45,995 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.2494 (max= 2.1331), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:45,995 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.2494 (max= 2.1331), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:45,995 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.2494 (max= 2.1331), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:23:45,995 - root - INFO - Step 7910: lr=1.00E-05, loss= 1.2494 (max= 2.1331), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:04,022 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.2558 (max= 2.1830), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:04,022 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.2558 (max= 2.1830), tps=18180, mfu=37.88%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:04,022 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.2558 (max= 2.1830), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:04,022 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.2558 (max= 2.1830), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:04,022 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.2558 (max= 2.1830), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:04,022 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.2558 (max= 2.1830), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:04,022 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.2558 (max= 2.1830), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:04,023 - root - INFO - Step 7920: lr=1.00E-05, loss= 1.2558 (max= 2.1830), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:22,056 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.2603 (max= 2.7069), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:22,056 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.2603 (max= 2.7069), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:22,056 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.2603 (max= 2.7069), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:22,056 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.2603 (max= 2.7069), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:22,056 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.2603 (max= 
2.7069), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:22,056 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.2603 (max= 2.7069), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:22,056 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.2603 (max= 2.7069), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:22,056 - root - INFO - Step 7930: lr=1.00E-05, loss= 1.2603 (max= 2.7069), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:40,069 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.2421 (max= 2.1325), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:40,069 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.2421 (max= 2.1325), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:40,069 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.2421 (max= 2.1325), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:40,069 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.2421 (max= 2.1325), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:40,069 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.2421 (max= 2.1325), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:40,069 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.2421 (max= 2.1325), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:40,069 - root - INFO - Step 7940: lr=1.00E-05, loss= 1.2421 (max= 2.1325), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:40,069 - root - INFO - Step 7940: 
lr=1.00E-05, loss= 1.2421 (max= 2.1325), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:24:58,100 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.2458 (max= 2.1011), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:58,100 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.2458 (max= 2.1011), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:58,100 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.2458 (max= 2.1011), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:58,100 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.2458 (max= 2.1011), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:58,100 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.2458 (max= 2.1011), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:58,100 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.2458 (max= 2.1011), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:58,100 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.2458 (max= 2.1011), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:24:58,101 - root - INFO - Step 7950: lr=1.00E-05, loss= 1.2458 (max= 2.1011), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:16,136 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.2748 (max= 2.3823), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:16,136 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.2748 (max= 2.3823), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:16,136 - 
root - INFO - Step 7960: lr=1.00E-05, loss= 1.2748 (max= 2.3823), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:16,136 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.2748 (max= 2.3823), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:16,136 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.2748 (max= 2.3823), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:16,136 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.2748 (max= 2.3823), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:16,136 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.2748 (max= 2.3823), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:16,136 - root - INFO - Step 7960: lr=1.00E-05, loss= 1.2748 (max= 2.3823), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:34,180 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.2716 (max= 2.1953), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:34,180 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.2716 (max= 2.1953), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:34,180 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.2716 (max= 2.1953), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:34,180 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.2716 (max= 2.1953), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:34,180 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.2716 (max= 2.1953), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 13:25:34,180 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.2716 (max= 2.1953), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:34,180 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.2716 (max= 2.1953), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:34,180 - root - INFO - Step 7970: lr=1.00E-05, loss= 1.2716 (max= 2.1953), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:25:52,200 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.2756 (max= 3.6848), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:25:52,200 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.2756 (max= 3.6848), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:25:52,200 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.2756 (max= 3.6848), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:25:52,200 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.2756 (max= 3.6848), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:25:52,200 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.2756 (max= 3.6848), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:25:52,200 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.2756 (max= 3.6848), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:25:52,200 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.2756 (max= 3.6848), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:25:52,200 - root - INFO - Step 7980: lr=1.00E-05, loss= 1.2756 (max= 3.6848), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:26:10,217 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.2752 (max= 2.1501), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:26:10,217 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.2752 (max= 2.1501), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:26:10,217 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.2752 (max= 2.1501), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:26:10,217 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.2752 (max= 2.1501), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:26:10,217 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.2752 (max= 2.1501), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:26:10,217 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.2752 (max= 2.1501), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:26:10,217 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.2752 (max= 2.1501), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:26:10,218 - root - INFO - Step 7990: lr=1.00E-05, loss= 1.2752 (max= 2.1501), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-8000 +2025-10-24 13:26:28,230 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.2672 (max= 2.4030), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:26:28,230 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-24 13:26:28,230 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 
+2025-10-24 13:26:28,230 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.2672 (max= 2.4030), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:26:28,230 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.2672 (max= 2.4030), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:26:28,230 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-24 13:26:28,230 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:26:28,230 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.2672 (max= 2.4030), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:26:28,230 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-24 13:26:28,230 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.2672 (max= 2.4030), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:26:28,230 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:26:28,230 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.2672 (max= 2.4030), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:26:28,230 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-24 13:26:28,231 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:26:28,231 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-24 13:26:28,231 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:26:28,231 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-24 13:26:28,231 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 
'optimizer', 'lr_scheduler']) +2025-10-24 13:26:28,231 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.2672 (max= 2.4030), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:26:28,231 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-24 13:26:28,231 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:26:28,231 - root - INFO - Step 8000: lr=1.00E-05, loss= 1.2672 (max= 2.4030), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:26:28,231 - root - INFO - Saving a full checkpoint at step 8000 +2025-10-24 13:26:28,231 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-8000! Save time: 4.627480983734131 +2025-10-24 13:26:42,966 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 13:26:42,973 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 13:26:42,974 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 13:26:42,975 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 13:26:42,975 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 13:26:42,975 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 13:26:42,976 - root - INFO - Finished saving the checkpoint in 14.74 seconds +2025-10-24 13:26:42,976 - root - INFO - Finished saving the checkpoint in 14.75 seconds +2025-10-24 13:27:00,925 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.2510 (max= 2.6131), tps=10023, mfu=20.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 13:27:00,926 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.2510 (max= 2.6131), tps=10023, mfu=20.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 13:27:00,926 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.2510 (max= 2.6131), tps=10023, mfu=20.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 13:27:00,926 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.2510 (max= 2.6131), tps=10023, mfu=20.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 13:27:00,926 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.2510 (max= 2.6131), tps=10023, mfu=20.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 13:27:00,926 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.2510 (max= 2.6131), tps=10023, mfu=20.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 13:27:00,926 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.2510 (max= 2.6131), tps=10023, mfu=20.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 13:27:00,926 - root - INFO - Step 8010: lr=1.00E-05, loss= 1.2510 (max= 2.6131), tps=10023, mfu=20.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 13:27:18,961 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.2757 (max= 2.8591), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:18,962 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.2757 (max= 2.8591), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:18,962 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.2757 (max= 2.8591), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:18,962 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.2757 (max= 2.8591), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:18,962 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.2757 (max= 2.8591), tps=18171, mfu=37.86%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:18,962 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.2757 (max= 2.8591), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:18,962 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.2757 (max= 2.8591), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:18,962 - root - INFO - Step 8020: lr=1.00E-05, loss= 1.2757 (max= 2.8591), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:36,988 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.2588 (max= 2.9313), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:36,988 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.2588 (max= 2.9313), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:36,988 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.2588 (max= 2.9313), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:36,988 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.2588 (max= 2.9313), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:36,988 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.2588 (max= 2.9313), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:36,988 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.2588 (max= 2.9313), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:36,988 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.2588 (max= 2.9313), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:36,989 - root - INFO - Step 8030: lr=1.00E-05, loss= 1.2588 (max= 
2.9313), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:27:54,984 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.2260 (max= 2.1663), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:27:54,984 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.2260 (max= 2.1663), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:27:54,984 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.2260 (max= 2.1663), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:27:54,984 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.2260 (max= 2.1663), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:27:54,984 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.2260 (max= 2.1663), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:27:54,984 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.2260 (max= 2.1663), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:27:54,984 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.2260 (max= 2.1663), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:27:54,984 - root - INFO - Step 8040: lr=1.00E-05, loss= 1.2260 (max= 2.1663), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:12,987 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.2446 (max= 2.5987), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:12,987 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.2446 (max= 2.5987), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:12,987 - root - INFO - Step 8050: 
lr=1.00E-05, loss= 1.2446 (max= 2.5987), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:12,987 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.2446 (max= 2.5987), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:12,987 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.2446 (max= 2.5987), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:12,987 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.2446 (max= 2.5987), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:12,987 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.2446 (max= 2.5987), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:12,987 - root - INFO - Step 8050: lr=1.00E-05, loss= 1.2446 (max= 2.5987), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:31,005 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.2591 (max= 2.4957), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:31,006 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.2591 (max= 2.4957), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:31,006 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.2591 (max= 2.4957), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:31,006 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.2591 (max= 2.4957), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:31,006 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.2591 (max= 2.4957), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:31,006 - 
root - INFO - Step 8060: lr=1.00E-05, loss= 1.2591 (max= 2.4957), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:31,006 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.2591 (max= 2.4957), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:31,006 - root - INFO - Step 8060: lr=1.00E-05, loss= 1.2591 (max= 2.4957), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:49,038 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.2571 (max= 2.1847), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:49,038 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.2571 (max= 2.1847), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:49,038 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.2571 (max= 2.1847), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:49,038 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.2571 (max= 2.1847), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:49,038 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.2571 (max= 2.1847), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:49,038 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.2571 (max= 2.1847), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:49,038 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.2571 (max= 2.1847), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:28:49,039 - root - INFO - Step 8070: lr=1.00E-05, loss= 1.2571 (max= 2.1847), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:29:07,059 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.2609 (max= 2.2563), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:07,059 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.2609 (max= 2.2563), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:07,059 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.2609 (max= 2.2563), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:07,059 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.2609 (max= 2.2563), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:07,059 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.2609 (max= 2.2563), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:07,059 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.2609 (max= 2.2563), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:07,059 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.2609 (max= 2.2563), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:07,060 - root - INFO - Step 8080: lr=1.00E-05, loss= 1.2609 (max= 2.2563), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:25,065 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.2660 (max= 2.0585), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:29:25,065 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.2660 (max= 2.0585), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:29:25,065 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.2660 (max= 2.0585), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:29:25,065 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.2660 (max= 2.0585), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:29:25,065 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.2660 (max= 2.0585), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:29:25,065 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.2660 (max= 2.0585), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:29:25,065 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.2660 (max= 2.0585), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:29:25,065 - root - INFO - Step 8090: lr=1.00E-05, loss= 1.2660 (max= 2.0585), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:29:43,133 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.2538 (max= 2.1778), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:43,133 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.2538 (max= 2.1778), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:43,133 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.2538 (max= 2.1778), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:43,133 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.2538 (max= 2.1778), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:43,133 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.2538 (max= 2.1778), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:43,133 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.2538 (max= 2.1778), tps=18139, mfu=37.79%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:43,133 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.2538 (max= 2.1778), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:29:43,134 - root - INFO - Step 8100: lr=1.00E-05, loss= 1.2538 (max= 2.1778), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:01,185 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.2613 (max= 3.1174), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:01,185 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.2613 (max= 3.1174), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:01,185 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.2613 (max= 3.1174), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:01,186 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.2613 (max= 3.1174), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:01,186 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.2613 (max= 3.1174), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:01,186 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.2613 (max= 3.1174), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:01,186 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.2613 (max= 3.1174), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:01,186 - root - INFO - Step 8110: lr=1.00E-05, loss= 1.2613 (max= 3.1174), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:19,232 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.2630 (max= 
2.2426), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:19,232 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.2630 (max= 2.2426), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:19,232 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.2630 (max= 2.2426), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:19,232 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.2630 (max= 2.2426), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:19,232 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.2630 (max= 2.2426), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:19,232 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.2630 (max= 2.2426), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:19,232 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.2630 (max= 2.2426), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:19,233 - root - INFO - Step 8120: lr=1.00E-05, loss= 1.2630 (max= 2.2426), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:37,222 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.2666 (max= 2.2868), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:37,222 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.2666 (max= 2.2868), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:37,222 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.2666 (max= 2.2868), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:37,222 - root - INFO - Step 8130: 
lr=1.00E-05, loss= 1.2666 (max= 2.2868), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:37,222 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.2666 (max= 2.2868), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:37,222 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.2666 (max= 2.2868), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:37,222 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.2666 (max= 2.2868), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:37,222 - root - INFO - Step 8130: lr=1.00E-05, loss= 1.2666 (max= 2.2868), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:30:55,216 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.2344 (max= 2.4769), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:55,216 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.2344 (max= 2.4769), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:55,217 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.2344 (max= 2.4769), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:55,217 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.2344 (max= 2.4769), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:55,217 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.2344 (max= 2.4769), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:55,217 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.2344 (max= 2.4769), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:55,217 - 
root - INFO - Step 8140: lr=1.00E-05, loss= 1.2344 (max= 2.4769), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:30:55,217 - root - INFO - Step 8140: lr=1.00E-05, loss= 1.2344 (max= 2.4769), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:13,242 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.2885 (max= 2.1676), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:13,242 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.2885 (max= 2.1676), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:13,242 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.2885 (max= 2.1676), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:13,242 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.2885 (max= 2.1676), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:13,242 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.2885 (max= 2.1676), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:13,242 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.2885 (max= 2.1676), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:13,242 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.2885 (max= 2.1676), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:13,243 - root - INFO - Step 8150: lr=1.00E-05, loss= 1.2885 (max= 2.1676), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:31,259 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.2406 (max= 2.7247), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:31:31,259 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.2406 (max= 2.7247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:31,259 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.2406 (max= 2.7247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:31,259 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.2406 (max= 2.7247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:31,259 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.2406 (max= 2.7247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:31,259 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.2406 (max= 2.7247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:31,259 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.2406 (max= 2.7247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:31,260 - root - INFO - Step 8160: lr=1.00E-05, loss= 1.2406 (max= 2.7247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:49,303 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.2186 (max= 2.2545), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:49,303 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.2186 (max= 2.2545), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:49,303 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.2186 (max= 2.2545), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:49,303 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.2186 (max= 2.2545), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:49,303 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.2186 (max= 2.2545), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:49,303 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.2186 (max= 2.2545), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:49,303 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.2186 (max= 2.2545), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:31:49,304 - root - INFO - Step 8170: lr=1.00E-05, loss= 1.2186 (max= 2.2545), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:32:07,308 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.2979 (max= 2.8774), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:07,308 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.2979 (max= 2.8774), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:07,308 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.2979 (max= 2.8774), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:07,308 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.2979 (max= 2.8774), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:07,308 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.2979 (max= 2.8774), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:07,308 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.2979 (max= 2.8774), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:07,308 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.2979 (max= 2.8774), tps=18203, mfu=37.93%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:07,308 - root - INFO - Step 8180: lr=1.00E-05, loss= 1.2979 (max= 2.8774), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:25,339 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.2537 (max= 2.2340), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:32:25,339 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.2537 (max= 2.2340), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:32:25,339 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.2537 (max= 2.2340), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:32:25,339 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.2537 (max= 2.2340), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:32:25,339 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.2537 (max= 2.2340), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:32:25,339 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.2537 (max= 2.2340), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:32:25,339 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.2537 (max= 2.2340), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:32:25,339 - root - INFO - Step 8190: lr=1.00E-05, loss= 1.2537 (max= 2.2340), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:32:43,363 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.2697 (max= 2.4560), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:43,363 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.2697 (max= 
2.4560), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:43,363 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.2697 (max= 2.4560), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:43,363 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.2697 (max= 2.4560), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:43,363 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.2697 (max= 2.4560), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:43,363 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.2697 (max= 2.4560), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:43,363 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.2697 (max= 2.4560), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:32:43,363 - root - INFO - Step 8200: lr=1.00E-05, loss= 1.2697 (max= 2.4560), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:01,387 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.2499 (max= 2.2499), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:01,387 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.2499 (max= 2.2499), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:01,388 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.2499 (max= 2.2499), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:01,388 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.2499 (max= 2.2499), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:01,388 - root - INFO - Step 8210: 
lr=1.00E-05, loss= 1.2499 (max= 2.2499), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:01,388 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.2499 (max= 2.2499), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:01,388 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.2499 (max= 2.2499), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:01,388 - root - INFO - Step 8210: lr=1.00E-05, loss= 1.2499 (max= 2.2499), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:19,399 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.2444 (max= 2.3343), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:19,399 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.2444 (max= 2.3343), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:19,399 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.2444 (max= 2.3343), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:19,399 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.2444 (max= 2.3343), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:19,399 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.2444 (max= 2.3343), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:19,399 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.2444 (max= 2.3343), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:19,399 - root - INFO - Step 8220: lr=1.00E-05, loss= 1.2444 (max= 2.3343), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:19,399 - 
root - INFO - Step 8220: lr=1.00E-05, loss= 1.2444 (max= 2.3343), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:33:37,444 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.2470 (max= 2.1362), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:37,444 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.2470 (max= 2.1362), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:37,444 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.2470 (max= 2.1362), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:37,444 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.2470 (max= 2.1362), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:37,444 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.2470 (max= 2.1362), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:37,444 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.2470 (max= 2.1362), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:37,444 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.2470 (max= 2.1362), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:37,444 - root - INFO - Step 8230: lr=1.00E-05, loss= 1.2470 (max= 2.1362), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:55,472 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.2328 (max= 2.1558), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:55,473 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.2328 (max= 2.1558), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:33:55,473 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.2328 (max= 2.1558), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:55,473 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.2328 (max= 2.1558), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:55,473 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.2328 (max= 2.1558), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:55,473 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.2328 (max= 2.1558), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:55,473 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.2328 (max= 2.1558), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:33:55,473 - root - INFO - Step 8240: lr=1.00E-05, loss= 1.2328 (max= 2.1558), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:13,487 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.2391 (max= 2.0901), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:13,487 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.2391 (max= 2.0901), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:13,487 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.2391 (max= 2.0901), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:13,487 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.2391 (max= 2.0901), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:13,487 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.2391 (max= 2.0901), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:13,487 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.2391 (max= 2.0901), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:13,487 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.2391 (max= 2.0901), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:13,487 - root - INFO - Step 8250: lr=1.00E-05, loss= 1.2391 (max= 2.0901), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:31,511 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.2384 (max= 2.1428), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:34:31,511 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.2384 (max= 2.1428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:34:31,511 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.2384 (max= 2.1428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:34:31,511 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.2384 (max= 2.1428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:34:31,511 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.2384 (max= 2.1428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:34:31,511 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.2384 (max= 2.1428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:34:31,511 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.2384 (max= 2.1428), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:34:31,511 - root - INFO - Step 8260: lr=1.00E-05, loss= 1.2384 (max= 2.1428), tps=18184, mfu=37.89%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:34:49,508 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.2618 (max= 2.2369), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:49,509 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.2618 (max= 2.2369), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:49,509 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.2618 (max= 2.2369), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:49,509 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.2618 (max= 2.2369), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:49,509 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.2618 (max= 2.2369), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:49,509 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.2618 (max= 2.2369), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:49,509 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.2618 (max= 2.2369), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:34:49,509 - root - INFO - Step 8270: lr=1.00E-05, loss= 1.2618 (max= 2.2369), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:07,509 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.2437 (max= 2.1540), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:07,509 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.2437 (max= 2.1540), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:07,509 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.2437 (max= 
2.1540), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:07,509 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.2437 (max= 2.1540), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:07,509 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.2437 (max= 2.1540), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:07,509 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.2437 (max= 2.1540), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:07,509 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.2437 (max= 2.1540), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:07,509 - root - INFO - Step 8280: lr=1.00E-05, loss= 1.2437 (max= 2.1540), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:25,528 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.2436 (max= 2.2148), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:35:25,528 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.2436 (max= 2.2148), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:35:25,528 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.2436 (max= 2.2148), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:35:25,528 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.2436 (max= 2.2148), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:35:25,528 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.2436 (max= 2.2148), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:35:25,528 - root - INFO - Step 8290: 
lr=1.00E-05, loss= 1.2436 (max= 2.2148), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:35:25,528 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.2436 (max= 2.2148), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:35:25,528 - root - INFO - Step 8290: lr=1.00E-05, loss= 1.2436 (max= 2.2148), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:35:43,511 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.2669 (max= 2.6266), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:43,511 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.2669 (max= 2.6266), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:43,511 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.2669 (max= 2.6266), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:43,511 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.2669 (max= 2.6266), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:43,511 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.2669 (max= 2.6266), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:43,511 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.2669 (max= 2.6266), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:43,511 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.2669 (max= 2.6266), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:43,512 - root - INFO - Step 8300: lr=1.00E-05, loss= 1.2669 (max= 2.6266), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:35:53,168 - 
root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:1496286 +2025-10-24 13:36:01,553 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.2378 (max= 2.1462), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:01,553 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.2378 (max= 2.1462), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:01,553 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.2378 (max= 2.1462), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:01,553 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.2378 (max= 2.1462), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:01,553 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.2378 (max= 2.1462), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:01,553 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.2378 (max= 2.1462), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:01,553 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.2378 (max= 2.1462), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:01,553 - root - INFO - Step 8310: lr=1.00E-05, loss= 1.2378 (max= 2.1462), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:19,586 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.2757 (max= 2.2328), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:19,586 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.2757 (max= 2.2328), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 
0.02%) +2025-10-24 13:36:19,586 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.2757 (max= 2.2328), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:19,586 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.2757 (max= 2.2328), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:19,586 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.2757 (max= 2.2328), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:19,586 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.2757 (max= 2.2328), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:19,586 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.2757 (max= 2.2328), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:19,587 - root - INFO - Step 8320: lr=1.00E-05, loss= 1.2757 (max= 2.2328), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:37,611 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.2641 (max= 2.5164), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:36:37,611 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.2641 (max= 2.5164), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:36:37,611 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.2641 (max= 2.5164), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:36:37,611 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.2641 (max= 2.5164), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:36:37,611 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.2641 (max= 2.5164), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:36:37,611 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.2641 (max= 2.5164), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:36:37,611 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.2641 (max= 2.5164), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:36:37,612 - root - INFO - Step 8330: lr=1.00E-05, loss= 1.2641 (max= 2.5164), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:36:55,603 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.2390 (max= 2.1661), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:55,603 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.2390 (max= 2.1661), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:55,603 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.2390 (max= 2.1661), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:55,603 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.2390 (max= 2.1661), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:55,603 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.2390 (max= 2.1661), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:55,603 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.2390 (max= 2.1661), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:55,603 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.2390 (max= 2.1661), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:36:55,604 - root - INFO - Step 8340: lr=1.00E-05, loss= 1.2390 (max= 2.1661), tps=18216, mfu=37.95%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:13,642 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.2370 (max= 2.3565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:13,642 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.2370 (max= 2.3565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:13,642 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.2370 (max= 2.3565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:13,642 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.2370 (max= 2.3565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:13,642 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.2370 (max= 2.3565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:13,642 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.2370 (max= 2.3565), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:13,642 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.2370 (max= 2.3565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:13,642 - root - INFO - Step 8350: lr=1.00E-05, loss= 1.2370 (max= 2.3565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:31,640 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.2294 (max= 2.7966), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:31,640 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.2294 (max= 2.7966), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:31,640 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.2294 (max= 
2.7966), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:31,640 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.2294 (max= 2.7966), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:31,640 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.2294 (max= 2.7966), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:31,640 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.2294 (max= 2.7966), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:31,640 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.2294 (max= 2.7966), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:31,641 - root - INFO - Step 8360: lr=1.00E-05, loss= 1.2294 (max= 2.7966), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:49,628 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.2807 (max= 2.2513), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:49,629 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.2807 (max= 2.2513), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:49,629 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.2807 (max= 2.2513), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:49,629 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.2807 (max= 2.2513), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:49,629 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.2807 (max= 2.2513), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:49,629 - root - INFO - Step 8370: 
lr=1.00E-05, loss= 1.2807 (max= 2.2513), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:49,629 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.2807 (max= 2.2513), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:37:49,629 - root - INFO - Step 8370: lr=1.00E-05, loss= 1.2807 (max= 2.2513), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:07,645 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.2639 (max= 2.0913), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:07,645 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.2639 (max= 2.0913), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:07,645 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.2639 (max= 2.0913), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:07,646 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.2639 (max= 2.0913), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:07,646 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.2639 (max= 2.0913), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:07,646 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.2639 (max= 2.0913), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:07,646 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.2639 (max= 2.0913), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:07,646 - root - INFO - Step 8380: lr=1.00E-05, loss= 1.2639 (max= 2.0913), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:25,688 - 
root - INFO - Step 8390: lr=1.00E-05, loss= 1.2371 (max= 2.4878), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:25,688 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.2371 (max= 2.4878), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:25,688 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.2371 (max= 2.4878), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:25,688 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.2371 (max= 2.4878), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:25,688 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.2371 (max= 2.4878), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:25,688 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.2371 (max= 2.4878), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:25,688 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.2371 (max= 2.4878), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:25,689 - root - INFO - Step 8390: lr=1.00E-05, loss= 1.2371 (max= 2.4878), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:43,678 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.2531 (max= 2.1989), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:43,678 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.2531 (max= 2.1989), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:43,678 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.2531 (max= 2.1989), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:38:43,678 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.2531 (max= 2.1989), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:43,678 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.2531 (max= 2.1989), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:43,678 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.2531 (max= 2.1989), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:43,678 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.2531 (max= 2.1989), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:43,679 - root - INFO - Step 8400: lr=1.00E-05, loss= 1.2531 (max= 2.1989), tps=18218, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:38:43,683 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:3935636 +2025-10-24 13:39:01,705 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.2557 (max= 2.1859), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:01,706 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.2557 (max= 2.1859), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:01,706 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.2557 (max= 2.1859), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:01,706 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.2557 (max= 2.1859), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:01,706 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.2557 (max= 2.1859), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:01,706 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.2557 (max= 2.1859), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:01,706 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.2557 (max= 2.1859), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:01,706 - root - INFO - Step 8410: lr=1.00E-05, loss= 1.2557 (max= 2.1859), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:19,746 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.2679 (max= 2.5889), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:19,747 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.2679 (max= 2.5889), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:19,747 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.2679 (max= 2.5889), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:19,747 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.2679 (max= 2.5889), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:19,747 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.2679 (max= 2.5889), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:19,747 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.2679 (max= 2.5889), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:19,747 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.2679 (max= 2.5889), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:19,747 - root - INFO - Step 8420: lr=1.00E-05, loss= 1.2679 (max= 2.5889), tps=18167, mfu=37.85%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:37,776 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.2640 (max= 2.6057), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:37,777 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.2640 (max= 2.6057), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:37,777 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.2640 (max= 2.6057), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:37,777 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.2640 (max= 2.6057), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:37,777 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.2640 (max= 2.6057), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:37,777 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.2640 (max= 2.6057), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:37,777 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.2640 (max= 2.6057), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:37,777 - root - INFO - Step 8430: lr=1.00E-05, loss= 1.2640 (max= 2.6057), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:55,788 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.2646 (max= 2.2845), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:55,789 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.2646 (max= 2.2845), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:55,789 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.2646 (max= 
2.2845), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:55,789 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.2646 (max= 2.2845), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:55,789 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.2646 (max= 2.2845), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:55,789 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.2646 (max= 2.2845), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:55,789 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.2646 (max= 2.2845), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:39:55,789 - root - INFO - Step 8440: lr=1.00E-05, loss= 1.2646 (max= 2.2845), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:13,818 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.2488 (max= 2.1008), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:13,818 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.2488 (max= 2.1008), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:13,818 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.2488 (max= 2.1008), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:13,818 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.2488 (max= 2.1008), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:13,818 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.2488 (max= 2.1008), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:13,818 - root - INFO - Step 8450: 
lr=1.00E-05, loss= 1.2488 (max= 2.1008), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:13,818 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.2488 (max= 2.1008), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:13,818 - root - INFO - Step 8450: lr=1.00E-05, loss= 1.2488 (max= 2.1008), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:31,824 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.2502 (max= 2.5980), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:31,824 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.2502 (max= 2.5980), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:31,824 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.2502 (max= 2.5980), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:31,824 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.2502 (max= 2.5980), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:31,824 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.2502 (max= 2.5980), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:31,824 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.2502 (max= 2.5980), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:31,824 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.2502 (max= 2.5980), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:31,824 - root - INFO - Step 8460: lr=1.00E-05, loss= 1.2502 (max= 2.5980), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:40:49,886 - 
root - INFO - Step 8470: lr=1.00E-05, loss= 1.2479 (max= 2.1707), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:40:49,886 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.2479 (max= 2.1707), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:40:49,886 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.2479 (max= 2.1707), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:40:49,886 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.2479 (max= 2.1707), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:40:49,886 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.2479 (max= 2.1707), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:40:49,886 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.2479 (max= 2.1707), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:40:49,886 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.2479 (max= 2.1707), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:40:49,886 - root - INFO - Step 8470: lr=1.00E-05, loss= 1.2479 (max= 2.1707), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:41:07,903 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.2637 (max= 2.6735), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:07,903 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.2637 (max= 2.6735), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:07,903 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.2637 (max= 2.6735), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:41:07,903 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.2637 (max= 2.6735), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:07,903 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.2637 (max= 2.6735), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:07,903 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.2637 (max= 2.6735), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:07,903 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.2637 (max= 2.6735), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:07,903 - root - INFO - Step 8480: lr=1.00E-05, loss= 1.2637 (max= 2.6735), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:25,949 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.2565 (max= 2.6524), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:25,949 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.2565 (max= 2.6524), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:25,949 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.2565 (max= 2.6524), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:25,949 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.2565 (max= 2.6524), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:25,949 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.2565 (max= 2.6524), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:25,949 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.2565 (max= 2.6524), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:25,949 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.2565 (max= 2.6524), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:25,949 - root - INFO - Step 8490: lr=1.00E-05, loss= 1.2565 (max= 2.6524), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:43,960 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.2494 (max= 2.4540), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:43,960 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.2494 (max= 2.4540), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:43,960 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.2494 (max= 2.4540), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:43,960 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.2494 (max= 2.4540), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:43,960 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.2494 (max= 2.4540), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:43,960 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.2494 (max= 2.4540), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:43,960 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.2494 (max= 2.4540), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:41:43,960 - root - INFO - Step 8500: lr=1.00E-05, loss= 1.2494 (max= 2.4540), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:01,989 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.2747 (max= 2.5015), tps=18178, mfu=37.87%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:42:01,989 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.2747 (max= 2.5015), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:42:01,989 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.2747 (max= 2.5015), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:42:01,989 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.2747 (max= 2.5015), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:42:01,989 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.2747 (max= 2.5015), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:42:01,989 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.2747 (max= 2.5015), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:42:01,989 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.2747 (max= 2.5015), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:42:01,990 - root - INFO - Step 8510: lr=1.00E-05, loss= 1.2747 (max= 2.5015), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:42:20,052 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.2353 (max= 2.2956), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:20,052 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.2353 (max= 2.2956), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:20,052 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.2353 (max= 2.2956), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:20,052 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.2353 (max= 
2.2956), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:20,052 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.2353 (max= 2.2956), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:20,052 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.2353 (max= 2.2956), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:20,053 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.2353 (max= 2.2956), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:20,053 - root - INFO - Step 8520: lr=1.00E-05, loss= 1.2353 (max= 2.2956), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:38,085 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.2391 (max= 2.3415), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:38,085 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.2391 (max= 2.3415), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:38,085 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.2391 (max= 2.3415), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:38,085 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.2391 (max= 2.3415), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:38,085 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.2391 (max= 2.3415), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:38,085 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.2391 (max= 2.3415), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:38,085 - root - INFO - Step 8530: 
lr=1.00E-05, loss= 1.2391 (max= 2.3415), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:38,085 - root - INFO - Step 8530: lr=1.00E-05, loss= 1.2391 (max= 2.3415), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:56,104 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.2373 (max= 2.0483), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:56,104 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.2373 (max= 2.0483), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:56,104 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.2373 (max= 2.0483), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:56,104 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.2373 (max= 2.0483), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:56,104 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.2373 (max= 2.0483), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:56,104 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.2373 (max= 2.0483), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:56,104 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.2373 (max= 2.0483), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:42:56,104 - root - INFO - Step 8540: lr=1.00E-05, loss= 1.2373 (max= 2.0483), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:14,131 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.2372 (max= 2.0694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:14,131 - 
root - INFO - Step 8550: lr=1.00E-05, loss= 1.2372 (max= 2.0694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:14,131 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.2372 (max= 2.0694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:14,131 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.2372 (max= 2.0694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:14,131 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.2372 (max= 2.0694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:14,131 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.2372 (max= 2.0694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:14,131 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.2372 (max= 2.0694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:14,132 - root - INFO - Step 8550: lr=1.00E-05, loss= 1.2372 (max= 2.0694), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:32,182 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.2117 (max= 2.0323), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:32,182 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.2117 (max= 2.0323), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:32,182 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.2117 (max= 2.0323), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:32,182 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.2117 (max= 2.0323), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:43:32,182 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.2117 (max= 2.0323), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:32,182 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.2117 (max= 2.0323), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:32,182 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.2117 (max= 2.0323), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:32,183 - root - INFO - Step 8560: lr=1.00E-05, loss= 1.2117 (max= 2.0323), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:50,180 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.2201 (max= 2.2071), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:50,181 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.2201 (max= 2.2071), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:50,181 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.2201 (max= 2.2071), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:50,181 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.2201 (max= 2.2071), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:50,181 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.2201 (max= 2.2071), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:50,181 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.2201 (max= 2.2071), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:50,181 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.2201 (max= 2.2071), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:43:50,181 - root - INFO - Step 8570: lr=1.00E-05, loss= 1.2201 (max= 2.2071), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:08,204 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.2253 (max= 2.3437), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:08,204 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.2253 (max= 2.3437), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:08,205 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.2253 (max= 2.3437), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:08,205 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.2253 (max= 2.3437), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:08,205 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.2253 (max= 2.3437), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:08,205 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.2253 (max= 2.3437), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:08,205 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.2253 (max= 2.3437), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:08,205 - root - INFO - Step 8580: lr=1.00E-05, loss= 1.2253 (max= 2.3437), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:26,224 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.2446 (max= 2.4152), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:26,224 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.2446 (max= 2.4152), tps=18188, mfu=37.90%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:26,224 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.2446 (max= 2.4152), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:26,224 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.2446 (max= 2.4152), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:26,224 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.2446 (max= 2.4152), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:26,224 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.2446 (max= 2.4152), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:26,224 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.2446 (max= 2.4152), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:26,224 - root - INFO - Step 8590: lr=1.00E-05, loss= 1.2446 (max= 2.4152), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:44,277 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.2559 (max= 2.1492), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:44,278 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.2559 (max= 2.1492), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:44,278 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.2559 (max= 2.1492), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:44,278 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.2559 (max= 2.1492), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:44,278 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.2559 (max= 
2.1492), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:44,278 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.2559 (max= 2.1492), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:44,278 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.2559 (max= 2.1492), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:44:44,278 - root - INFO - Step 8600: lr=1.00E-05, loss= 1.2559 (max= 2.1492), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:02,294 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.2385 (max= 2.6309), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:02,294 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.2385 (max= 2.6309), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:02,294 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.2385 (max= 2.6309), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:02,294 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.2385 (max= 2.6309), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:02,295 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.2385 (max= 2.6309), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:02,295 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.2385 (max= 2.6309), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:02,295 - root - INFO - Step 8610: lr=1.00E-05, loss= 1.2385 (max= 2.6309), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:02,295 - root - INFO - Step 8610: 
lr=1.00E-05, loss= 1.2385 (max= 2.6309), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:20,347 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1950 (max= 2.0829), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:20,347 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1950 (max= 2.0829), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:20,347 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1950 (max= 2.0829), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:20,347 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1950 (max= 2.0829), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:20,347 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1950 (max= 2.0829), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:20,347 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1950 (max= 2.0829), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:20,347 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1950 (max= 2.0829), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:20,347 - root - INFO - Step 8620: lr=1.00E-05, loss= 1.1950 (max= 2.0829), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:38,371 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.2616 (max= 2.1403), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:38,371 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.2616 (max= 2.1403), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:38,371 - 
root - INFO - Step 8630: lr=1.00E-05, loss= 1.2616 (max= 2.1403), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:38,371 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.2616 (max= 2.1403), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:38,371 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.2616 (max= 2.1403), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:38,371 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.2616 (max= 2.1403), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:38,371 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.2616 (max= 2.1403), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:38,371 - root - INFO - Step 8630: lr=1.00E-05, loss= 1.2616 (max= 2.1403), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:56,384 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.2645 (max= 2.4591), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:56,385 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.2645 (max= 2.4591), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:56,385 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.2645 (max= 2.4591), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:56,385 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.2645 (max= 2.4591), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:56,385 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.2645 (max= 2.4591), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:45:56,385 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.2645 (max= 2.4591), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:56,385 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.2645 (max= 2.4591), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:45:56,385 - root - INFO - Step 8640: lr=1.00E-05, loss= 1.2645 (max= 2.4591), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:14,420 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.2347 (max= 2.9113), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:14,420 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.2347 (max= 2.9113), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:14,420 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.2347 (max= 2.9113), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:14,420 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.2347 (max= 2.9113), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:14,420 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.2347 (max= 2.9113), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:14,420 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.2347 (max= 2.9113), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:14,420 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.2347 (max= 2.9113), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:14,420 - root - INFO - Step 8650: lr=1.00E-05, loss= 1.2347 (max= 2.9113), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:32,436 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.2291 (max= 2.3867), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:32,436 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.2291 (max= 2.3867), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:32,436 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.2291 (max= 2.3867), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:32,436 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.2291 (max= 2.3867), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:32,436 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.2291 (max= 2.3867), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:32,436 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.2291 (max= 2.3867), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:32,436 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.2291 (max= 2.3867), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:32,436 - root - INFO - Step 8660: lr=1.00E-05, loss= 1.2291 (max= 2.3867), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:50,468 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.2624 (max= 3.6887), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:50,468 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.2624 (max= 3.6887), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:50,468 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.2624 (max= 3.6887), tps=18176, mfu=37.87%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:50,468 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.2624 (max= 3.6887), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:50,468 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.2624 (max= 3.6887), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:50,468 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.2624 (max= 3.6887), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:50,468 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.2624 (max= 3.6887), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:46:50,468 - root - INFO - Step 8670: lr=1.00E-05, loss= 1.2624 (max= 3.6887), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:05,098 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:5435190 +2025-10-24 13:47:08,496 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.2264 (max= 3.4029), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:08,496 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.2264 (max= 3.4029), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:08,496 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.2264 (max= 3.4029), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:08,496 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.2264 (max= 3.4029), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:08,496 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.2264 (max= 
3.4029), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:08,496 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.2264 (max= 3.4029), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:08,496 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.2264 (max= 3.4029), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:08,496 - root - INFO - Step 8680: lr=1.00E-05, loss= 1.2264 (max= 3.4029), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:29,771 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.2358 (max= 2.3184), tps=15404, mfu=32.10%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.45%) +2025-10-24 13:47:29,771 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.2358 (max= 2.3184), tps=15405, mfu=32.10%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.45%) +2025-10-24 13:47:29,771 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.2358 (max= 2.3184), tps=15404, mfu=32.10%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.45%) +2025-10-24 13:47:29,771 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.2358 (max= 2.3184), tps=15404, mfu=32.10%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.45%) +2025-10-24 13:47:29,771 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.2358 (max= 2.3184), tps=15404, mfu=32.10%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.45%) +2025-10-24 13:47:29,771 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.2358 (max= 2.3184), tps=15404, mfu=32.10%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.45%) +2025-10-24 13:47:29,771 - root - INFO - Step 8690: lr=1.00E-05, loss= 1.2358 (max= 2.3184), tps=15404, mfu=32.10%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.45%) +2025-10-24 13:47:29,772 - root - INFO - Step 8690: 
lr=1.00E-05, loss= 1.2358 (max= 2.3184), tps=15404, mfu=32.10%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.45%) +2025-10-24 13:47:47,823 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.2372 (max= 2.2488), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:47,823 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.2372 (max= 2.2488), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:47,824 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.2372 (max= 2.2488), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:47,824 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.2372 (max= 2.2488), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:47,824 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.2372 (max= 2.2488), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:47,824 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.2372 (max= 2.2488), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:47,824 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.2372 (max= 2.2488), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:47:47,824 - root - INFO - Step 8700: lr=1.00E-05, loss= 1.2372 (max= 2.2488), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:05,849 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.2537 (max= 2.2084), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:05,849 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.2537 (max= 2.2084), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:05,849 - 
root - INFO - Step 8710: lr=1.00E-05, loss= 1.2537 (max= 2.2084), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:05,849 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.2537 (max= 2.2084), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:05,849 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.2537 (max= 2.2084), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:05,849 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.2537 (max= 2.2084), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:05,849 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.2537 (max= 2.2084), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:05,849 - root - INFO - Step 8710: lr=1.00E-05, loss= 1.2537 (max= 2.2084), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:23,863 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.2761 (max= 2.3038), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:23,863 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.2761 (max= 2.3038), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:23,864 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.2761 (max= 2.3038), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:23,864 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.2761 (max= 2.3038), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:23,864 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.2761 (max= 2.3038), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 13:48:23,864 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.2761 (max= 2.3038), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:23,864 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.2761 (max= 2.3038), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:23,864 - root - INFO - Step 8720: lr=1.00E-05, loss= 1.2761 (max= 2.3038), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:48:41,889 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.2703 (max= 2.0945), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:41,889 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.2703 (max= 2.0945), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:41,890 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.2703 (max= 2.0945), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:41,890 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.2703 (max= 2.0945), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:41,890 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.2703 (max= 2.0945), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:41,890 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.2703 (max= 2.0945), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:41,890 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.2703 (max= 2.0945), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:41,890 - root - INFO - Step 8730: lr=1.00E-05, loss= 1.2703 (max= 2.0945), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:59,895 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.2099 (max= 2.0316), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:59,895 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.2099 (max= 2.0316), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:59,895 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.2099 (max= 2.0316), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:59,895 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.2099 (max= 2.0316), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:59,895 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.2099 (max= 2.0316), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:59,895 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.2099 (max= 2.0316), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:59,895 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.2099 (max= 2.0316), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:48:59,895 - root - INFO - Step 8740: lr=1.00E-05, loss= 1.2099 (max= 2.0316), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:17,918 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.2471 (max= 2.1161), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:49:17,918 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.2471 (max= 2.1161), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:49:17,919 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.2471 (max= 2.1161), tps=18184, mfu=37.89%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:49:17,919 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.2471 (max= 2.1161), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:49:17,919 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.2471 (max= 2.1161), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:49:17,919 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.2471 (max= 2.1161), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:49:17,919 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.2471 (max= 2.1161), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:49:17,919 - root - INFO - Step 8750: lr=1.00E-05, loss= 1.2471 (max= 2.1161), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:49:35,964 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.2384 (max= 3.2943), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:35,964 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.2384 (max= 3.2943), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:35,964 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.2384 (max= 3.2943), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:35,964 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.2384 (max= 3.2943), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:35,964 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.2384 (max= 3.2943), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:35,964 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.2384 (max= 
3.2943), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:35,964 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.2384 (max= 3.2943), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:35,964 - root - INFO - Step 8760: lr=1.00E-05, loss= 1.2384 (max= 3.2943), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:54,006 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.2671 (max= 2.5782), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:54,006 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.2671 (max= 2.5782), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:54,006 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.2671 (max= 2.5782), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:54,006 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.2671 (max= 2.5782), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:54,006 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.2671 (max= 2.5782), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:54,006 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.2671 (max= 2.5782), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:54,007 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.2671 (max= 2.5782), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:49:54,007 - root - INFO - Step 8770: lr=1.00E-05, loss= 1.2671 (max= 2.5782), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:12,025 - root - INFO - Step 8780: 
lr=1.00E-05, loss= 1.2372 (max= 3.6287), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:12,025 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.2372 (max= 3.6287), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:12,025 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.2372 (max= 3.6287), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:12,025 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.2372 (max= 3.6287), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:12,026 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.2372 (max= 3.6287), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:12,026 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.2372 (max= 3.6287), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:12,026 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.2372 (max= 3.6287), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:12,026 - root - INFO - Step 8780: lr=1.00E-05, loss= 1.2372 (max= 3.6287), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:30,059 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.2203 (max= 2.1010), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:30,060 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.2203 (max= 2.1010), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:30,060 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.2203 (max= 2.1010), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:30,060 - 
root - INFO - Step 8790: lr=1.00E-05, loss= 1.2203 (max= 2.1010), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:30,060 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.2203 (max= 2.1010), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:30,060 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.2203 (max= 2.1010), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:30,060 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.2203 (max= 2.1010), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:30,060 - root - INFO - Step 8790: lr=1.00E-05, loss= 1.2203 (max= 2.1010), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:48,105 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.2719 (max= 3.5509), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:48,105 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.2719 (max= 3.5509), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:48,105 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.2719 (max= 3.5509), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:48,105 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.2719 (max= 3.5509), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:48,105 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.2719 (max= 3.5509), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:48,105 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.2719 (max= 3.5509), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:50:48,105 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.2719 (max= 3.5509), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:50:48,106 - root - INFO - Step 8800: lr=1.00E-05, loss= 1.2719 (max= 3.5509), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:06,119 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.2371 (max= 2.4192), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:06,119 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.2371 (max= 2.4192), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:06,119 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.2371 (max= 2.4192), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:06,119 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.2371 (max= 2.4192), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:06,119 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.2371 (max= 2.4192), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:06,119 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.2371 (max= 2.4192), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:06,120 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.2371 (max= 2.4192), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:06,120 - root - INFO - Step 8810: lr=1.00E-05, loss= 1.2371 (max= 2.4192), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:24,104 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.2684 (max= 3.9370), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:24,104 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.2684 (max= 3.9370), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:24,104 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.2684 (max= 3.9370), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:24,104 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.2684 (max= 3.9370), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:24,104 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.2684 (max= 3.9370), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:24,104 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.2684 (max= 3.9370), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:24,104 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.2684 (max= 3.9370), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:24,105 - root - INFO - Step 8820: lr=1.00E-05, loss= 1.2684 (max= 3.9370), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:42,138 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.2842 (max= 3.7284), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:42,138 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.2842 (max= 3.7284), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:42,139 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.2842 (max= 3.7284), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:42,139 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.2842 (max= 3.7284), tps=18173, mfu=37.86%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:42,139 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.2842 (max= 3.7284), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:42,139 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.2842 (max= 3.7284), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:42,139 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.2842 (max= 3.7284), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:51:42,139 - root - INFO - Step 8830: lr=1.00E-05, loss= 1.2842 (max= 3.7284), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:00,139 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.2375 (max= 2.1120), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:52:00,139 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.2375 (max= 2.1120), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:52:00,139 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.2375 (max= 2.1120), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:52:00,139 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.2375 (max= 2.1120), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:52:00,139 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.2375 (max= 2.1120), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:52:00,139 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.2375 (max= 2.1120), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:52:00,140 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.2375 (max= 
2.1120), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:52:00,140 - root - INFO - Step 8840: lr=1.00E-05, loss= 1.2375 (max= 2.1120), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:52:18,166 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.2322 (max= 2.4231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:18,166 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.2322 (max= 2.4231), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:18,166 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.2322 (max= 2.4231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:18,167 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.2322 (max= 2.4231), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:18,167 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.2322 (max= 2.4231), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:18,167 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.2322 (max= 2.4231), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:18,167 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.2322 (max= 2.4231), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:18,167 - root - INFO - Step 8850: lr=1.00E-05, loss= 1.2322 (max= 2.4231), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:36,226 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.2798 (max= 2.4665), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:36,226 - root - INFO - Step 8860: 
lr=1.00E-05, loss= 1.2798 (max= 2.4665), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:36,226 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.2798 (max= 2.4665), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:36,226 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.2798 (max= 2.4665), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:36,226 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.2798 (max= 2.4665), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:36,226 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.2798 (max= 2.4665), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:36,226 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.2798 (max= 2.4665), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:36,226 - root - INFO - Step 8860: lr=1.00E-05, loss= 1.2798 (max= 2.4665), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:54,257 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.2662 (max= 2.0988), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:54,257 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.2662 (max= 2.0988), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:54,257 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.2662 (max= 2.0988), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:54,257 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.2662 (max= 2.0988), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:54,257 - 
root - INFO - Step 8870: lr=1.00E-05, loss= 1.2662 (max= 2.0988), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:54,257 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.2662 (max= 2.0988), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:54,257 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.2662 (max= 2.0988), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:52:54,257 - root - INFO - Step 8870: lr=1.00E-05, loss= 1.2662 (max= 2.0988), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:53:12,268 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.2274 (max= 2.2131), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:53:12,268 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.2274 (max= 2.2131), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:53:12,268 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.2274 (max= 2.2131), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:53:12,268 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.2274 (max= 2.2131), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:53:12,268 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.2274 (max= 2.2131), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:53:12,268 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.2274 (max= 2.2131), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:53:12,268 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.2274 (max= 2.2131), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:53:12,268 - root - INFO - Step 8880: lr=1.00E-05, loss= 1.2274 (max= 2.2131), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:53:34,822 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.2387 (max= 1.9315), tps=14531, mfu=30.28%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.12s, 21.22%) +2025-10-24 13:53:34,822 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.2387 (max= 1.9315), tps=14531, mfu=30.28%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.12s, 21.22%) +2025-10-24 13:53:34,822 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.2387 (max= 1.9315), tps=14531, mfu=30.28%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.12s, 21.22%) +2025-10-24 13:53:34,822 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.2387 (max= 1.9315), tps=14531, mfu=30.28%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.12s, 21.22%) +2025-10-24 13:53:34,822 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.2387 (max= 1.9315), tps=14531, mfu=30.28%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.12s, 21.22%) +2025-10-24 13:53:34,823 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.2387 (max= 1.9315), tps=14531, mfu=30.28%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.12s, 21.22%) +2025-10-24 13:53:34,823 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.2387 (max= 1.9315), tps=14531, mfu=30.28%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.12s, 21.22%) +2025-10-24 13:53:34,823 - root - INFO - Step 8890: lr=1.00E-05, loss= 1.2387 (max= 1.9315), tps=14531, mfu=30.28%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.12s, 21.22%) +2025-10-24 13:53:52,830 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.2629 (max= 2.3873), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:53:52,831 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.2629 (max= 2.3873), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:53:52,831 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.2629 (max= 2.3873), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:53:52,831 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.2629 (max= 2.3873), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:53:52,831 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.2629 (max= 2.3873), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:53:52,831 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.2629 (max= 2.3873), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:53:52,831 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.2629 (max= 2.3873), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:53:52,832 - root - INFO - Step 8900: lr=1.00E-05, loss= 1.2629 (max= 2.3873), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:54:10,860 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.2417 (max= 3.4334), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:54:10,860 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.2417 (max= 3.4334), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:54:10,860 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.2417 (max= 3.4334), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:54:10,860 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.2417 (max= 3.4334), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:54:10,860 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.2417 (max= 3.4334), tps=18179, mfu=37.88%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:54:10,861 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.2417 (max= 3.4334), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:54:10,861 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.2417 (max= 3.4334), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:54:10,861 - root - INFO - Step 8910: lr=1.00E-05, loss= 1.2417 (max= 3.4334), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:54:28,898 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.2335 (max= 2.1565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:28,898 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.2335 (max= 2.1565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:28,898 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.2335 (max= 2.1565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:28,899 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.2335 (max= 2.1565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:28,899 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.2335 (max= 2.1565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:28,899 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.2335 (max= 2.1565), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:28,899 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.2335 (max= 2.1565), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:28,899 - root - INFO - Step 8920: lr=1.00E-05, loss= 1.2335 (max= 
2.1565), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:46,953 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.2494 (max= 2.6165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:46,953 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.2494 (max= 2.6165), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:46,953 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.2494 (max= 2.6165), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:46,953 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.2494 (max= 2.6165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:46,953 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.2494 (max= 2.6165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:46,953 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.2494 (max= 2.6165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:46,954 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.2494 (max= 2.6165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:54:46,954 - root - INFO - Step 8930: lr=1.00E-05, loss= 1.2494 (max= 2.6165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:04,990 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.2609 (max= 2.3968), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:04,991 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.2609 (max= 2.3968), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:04,991 - root - INFO - Step 8940: 
lr=1.00E-05, loss= 1.2609 (max= 2.3968), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:04,991 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.2609 (max= 2.3968), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:04,991 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.2609 (max= 2.3968), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:04,991 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.2609 (max= 2.3968), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:04,991 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.2609 (max= 2.3968), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:04,991 - root - INFO - Step 8940: lr=1.00E-05, loss= 1.2609 (max= 2.3968), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:22,997 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.2357 (max= 2.1958), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:22,997 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.2357 (max= 2.1958), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:22,997 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.2357 (max= 2.1958), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:22,997 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.2357 (max= 2.1958), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:22,997 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.2357 (max= 2.1958), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:22,997 - 
root - INFO - Step 8950: lr=1.00E-05, loss= 1.2357 (max= 2.1958), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:22,997 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.2357 (max= 2.1958), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:22,997 - root - INFO - Step 8950: lr=1.00E-05, loss= 1.2357 (max= 2.1958), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:41,052 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.2649 (max= 2.0740), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:41,052 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.2649 (max= 2.0740), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:41,052 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.2649 (max= 2.0740), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:41,052 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.2649 (max= 2.0740), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:41,052 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.2649 (max= 2.0740), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:41,052 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.2649 (max= 2.0740), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:41,052 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.2649 (max= 2.0740), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:41,053 - root - INFO - Step 8960: lr=1.00E-05, loss= 1.2649 (max= 2.0740), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:55:59,064 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.2531 (max= 2.0074), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:59,064 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.2531 (max= 2.0074), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:59,064 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.2531 (max= 2.0074), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:59,064 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.2531 (max= 2.0074), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:59,064 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.2531 (max= 2.0074), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:59,064 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.2531 (max= 2.0074), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:59,064 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.2531 (max= 2.0074), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:55:59,064 - root - INFO - Step 8970: lr=1.00E-05, loss= 1.2531 (max= 2.0074), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:17,102 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.2840 (max= 2.6762), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:17,102 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.2840 (max= 2.6762), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:17,102 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.2840 (max= 2.6762), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:17,102 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.2840 (max= 2.6762), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:17,102 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.2840 (max= 2.6762), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:17,102 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.2840 (max= 2.6762), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:17,102 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.2840 (max= 2.6762), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:17,103 - root - INFO - Step 8980: lr=1.00E-05, loss= 1.2840 (max= 2.6762), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:35,115 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.2767 (max= 3.7328), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:35,115 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.2767 (max= 3.7328), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:35,115 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.2767 (max= 3.7328), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:35,115 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.2767 (max= 3.7328), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:35,115 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.2767 (max= 3.7328), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:35,115 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.2767 (max= 3.7328), tps=18195, mfu=37.91%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:35,115 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.2767 (max= 3.7328), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:35,116 - root - INFO - Step 8990: lr=1.00E-05, loss= 1.2767 (max= 3.7328), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-9000 +2025-10-24 13:56:53,131 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.2587 (max= 2.0904), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:53,131 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.2587 (max= 2.0904), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:53,131 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.2587 (max= 2.0904), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:53,131 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-24 13:56:53,131 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-24 13:56:53,131 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-24 13:56:53,131 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:56:53,131 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:56:53,131 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:56:53,131 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.2587 (max= 2.0904), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:53,131 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.2587 (max= 
2.0904), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:53,131 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.2587 (max= 2.0904), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:53,131 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-24 13:56:53,131 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:56:53,131 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.2587 (max= 2.0904), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:53,131 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-24 13:56:53,131 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-24 13:56:53,131 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:56:53,131 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:56:53,131 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-24 13:56:53,131 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 13:56:53,132 - root - INFO - Step 9000: lr=1.00E-05, loss= 1.2587 (max= 2.0904), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:56:53,132 - root - INFO - Saving a full checkpoint at step 9000 +2025-10-24 13:56:53,132 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-9000! 
Save time: 4.506908416748047 +2025-10-24 13:57:06,636 - root - INFO - Finished saving the checkpoint in 13.50 seconds +2025-10-24 13:57:06,642 - root - INFO - Finished saving the checkpoint in 13.51 seconds +2025-10-24 13:57:06,645 - root - INFO - Finished saving the checkpoint in 13.51 seconds +2025-10-24 13:57:06,645 - root - INFO - Finished saving the checkpoint in 13.51 seconds +2025-10-24 13:57:06,646 - root - INFO - Finished saving the checkpoint in 13.52 seconds +2025-10-24 13:57:06,646 - root - INFO - Finished saving the checkpoint in 13.52 seconds +2025-10-24 13:57:06,647 - root - INFO - Finished saving the checkpoint in 13.52 seconds +2025-10-24 13:57:06,648 - root - INFO - Finished saving the checkpoint in 13.52 seconds +2025-10-24 13:57:24,621 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.2806 (max= 2.1491), tps=10407, mfu=21.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:24,621 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.2806 (max= 2.1491), tps=10407, mfu=21.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:24,621 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.2806 (max= 2.1491), tps=10407, mfu=21.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:24,621 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.2806 (max= 2.1491), tps=10407, mfu=21.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:24,621 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.2806 (max= 2.1491), tps=10407, mfu=21.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:24,621 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.2806 (max= 2.1491), tps=10407, mfu=21.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:24,622 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.2806 (max= 2.1491), tps=10407, mfu=21.68%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:24,623 - root - INFO - Step 9010: lr=1.00E-05, loss= 1.2806 (max= 2.1491), tps=10407, mfu=21.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:42,629 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.2610 (max= 2.0759), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:42,629 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.2610 (max= 2.0759), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:42,629 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.2610 (max= 2.0759), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:42,629 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.2610 (max= 2.0759), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:42,629 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.2610 (max= 2.0759), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:42,629 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.2610 (max= 2.0759), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:42,630 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.2610 (max= 2.0759), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:57:42,630 - root - INFO - Step 9020: lr=1.00E-05, loss= 1.2610 (max= 2.0759), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:00,677 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.2541 (max= 3.5913), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:58:00,677 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.2541 (max= 3.5913), tps=18160, mfu=37.84%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:58:00,677 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.2541 (max= 3.5913), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:58:00,677 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.2541 (max= 3.5913), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:58:00,678 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.2541 (max= 3.5913), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:58:00,678 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.2541 (max= 3.5913), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:58:00,678 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.2541 (max= 3.5913), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:58:00,678 - root - INFO - Step 9030: lr=1.00E-05, loss= 1.2541 (max= 3.5913), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:58:18,672 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.2668 (max= 2.6001), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:18,672 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.2668 (max= 2.6001), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:18,672 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.2668 (max= 2.6001), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:18,672 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.2668 (max= 2.6001), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:18,672 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.2668 (max= 
2.6001), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:18,672 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.2668 (max= 2.6001), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:18,672 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.2668 (max= 2.6001), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:18,672 - root - INFO - Step 9040: lr=1.00E-05, loss= 1.2668 (max= 2.6001), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:36,715 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.2693 (max= 2.2664), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:36,715 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.2693 (max= 2.2664), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:36,715 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.2693 (max= 2.2664), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:36,716 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.2693 (max= 2.2664), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:36,715 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.2693 (max= 2.2664), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:36,716 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.2693 (max= 2.2664), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:36,716 - root - INFO - Step 9050: lr=1.00E-05, loss= 1.2693 (max= 2.2664), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:36,716 - root - INFO - Step 9050: 
lr=1.00E-05, loss= 1.2693 (max= 2.2664), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:54,707 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.2395 (max= 2.1663), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:54,707 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.2395 (max= 2.1663), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:54,708 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.2395 (max= 2.1663), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:54,708 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.2395 (max= 2.1663), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:54,708 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.2395 (max= 2.1663), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:54,708 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.2395 (max= 2.1663), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:54,708 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.2395 (max= 2.1663), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:58:54,708 - root - INFO - Step 9060: lr=1.00E-05, loss= 1.2395 (max= 2.1663), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:12,713 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.2332 (max= 2.0821), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:59:12,714 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.2332 (max= 2.0821), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:59:12,714 - 
root - INFO - Step 9070: lr=1.00E-05, loss= 1.2332 (max= 2.0821), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:59:12,714 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.2332 (max= 2.0821), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:59:12,714 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.2332 (max= 2.0821), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:59:12,714 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.2332 (max= 2.0821), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:59:12,714 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.2332 (max= 2.0821), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:59:12,714 - root - INFO - Step 9070: lr=1.00E-05, loss= 1.2332 (max= 2.0821), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 13:59:30,740 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.2396 (max= 2.1266), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:30,740 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.2396 (max= 2.1266), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:30,740 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.2396 (max= 2.1266), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:30,740 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.2396 (max= 2.1266), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:30,740 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.2396 (max= 2.1266), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 13:59:30,740 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.2396 (max= 2.1266), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:30,740 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.2396 (max= 2.1266), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:30,740 - root - INFO - Step 9080: lr=1.00E-05, loss= 1.2396 (max= 2.1266), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:48,778 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.2708 (max= 2.2423), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:48,778 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.2708 (max= 2.2423), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:48,778 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.2708 (max= 2.2423), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:48,778 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.2708 (max= 2.2423), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:48,778 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.2708 (max= 2.2423), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:48,778 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.2708 (max= 2.2423), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:48,778 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.2708 (max= 2.2423), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 13:59:48,779 - root - INFO - Step 9090: lr=1.00E-05, loss= 1.2708 (max= 2.2423), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:06,804 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.2383 (max= 2.3613), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:06,804 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.2383 (max= 2.3613), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:06,804 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.2383 (max= 2.3613), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:06,804 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.2383 (max= 2.3613), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:06,804 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.2383 (max= 2.3613), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:06,805 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.2383 (max= 2.3613), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:06,805 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.2383 (max= 2.3613), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:06,805 - root - INFO - Step 9100: lr=1.00E-05, loss= 1.2383 (max= 2.3613), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:24,832 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.2612 (max= 2.3127), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:24,832 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.2612 (max= 2.3127), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:24,832 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.2612 (max= 2.3127), tps=18180, mfu=37.88%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:24,832 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.2612 (max= 2.3127), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:24,832 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.2612 (max= 2.3127), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:24,832 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.2612 (max= 2.3127), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:24,832 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.2612 (max= 2.3127), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:24,833 - root - INFO - Step 9110: lr=1.00E-05, loss= 1.2612 (max= 2.3127), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:42,870 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.2737 (max= 2.4447), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:42,871 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.2737 (max= 2.4447), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:42,871 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.2737 (max= 2.4447), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:42,871 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.2737 (max= 2.4447), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:42,871 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.2737 (max= 2.4447), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:42,871 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.2737 (max= 
2.4447), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:42,871 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.2737 (max= 2.4447), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:00:42,871 - root - INFO - Step 9120: lr=1.00E-05, loss= 1.2737 (max= 2.4447), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:00,905 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.2604 (max= 2.2270), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:00,906 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.2604 (max= 2.2270), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:00,906 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.2604 (max= 2.2270), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:00,906 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.2604 (max= 2.2270), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:00,906 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.2604 (max= 2.2270), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:00,906 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.2604 (max= 2.2270), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:00,906 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.2604 (max= 2.2270), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:00,906 - root - INFO - Step 9130: lr=1.00E-05, loss= 1.2604 (max= 2.2270), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:18,925 - root - INFO - Step 9140: 
lr=1.00E-05, loss= 1.2938 (max= 2.1563), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:18,925 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.2938 (max= 2.1563), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:18,925 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.2938 (max= 2.1563), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:18,925 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.2938 (max= 2.1563), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:18,925 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.2938 (max= 2.1563), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:18,925 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.2938 (max= 2.1563), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:18,925 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.2938 (max= 2.1563), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:18,926 - root - INFO - Step 9140: lr=1.00E-05, loss= 1.2938 (max= 2.1563), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:36,949 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.2369 (max= 2.3067), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:36,949 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.2369 (max= 2.3067), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:36,949 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.2369 (max= 2.3067), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:36,949 - 
root - INFO - Step 9150: lr=1.00E-05, loss= 1.2369 (max= 2.3067), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:36,949 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.2369 (max= 2.3067), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:36,949 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.2369 (max= 2.3067), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:36,949 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.2369 (max= 2.3067), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:36,950 - root - INFO - Step 9150: lr=1.00E-05, loss= 1.2369 (max= 2.3067), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:54,964 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.3106 (max= 2.2311), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:54,964 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.3106 (max= 2.2311), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:54,964 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.3106 (max= 2.2311), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:54,964 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.3106 (max= 2.2311), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:54,964 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.3106 (max= 2.2311), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:54,964 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.3106 (max= 2.2311), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 14:01:54,964 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.3106 (max= 2.2311), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:01:54,965 - root - INFO - Step 9160: lr=1.00E-05, loss= 1.3106 (max= 2.2311), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:02:12,982 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.2482 (max= 3.5871), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:12,982 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.2482 (max= 3.5871), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:12,982 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.2482 (max= 3.5871), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:12,982 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.2482 (max= 3.5871), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:12,982 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.2482 (max= 3.5871), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:12,982 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.2482 (max= 3.5871), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:12,982 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.2482 (max= 3.5871), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:12,983 - root - INFO - Step 9170: lr=1.00E-05, loss= 1.2482 (max= 3.5871), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:30,976 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.2678 (max= 3.7168), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:30,977 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.2678 (max= 3.7168), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:30,977 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.2678 (max= 3.7168), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:30,977 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.2678 (max= 3.7168), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:30,977 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.2678 (max= 3.7168), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:30,977 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.2678 (max= 3.7168), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:30,977 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.2678 (max= 3.7168), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:30,977 - root - INFO - Step 9180: lr=1.00E-05, loss= 1.2678 (max= 3.7168), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:48,996 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.2934 (max= 2.3326), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:48,996 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.2934 (max= 2.3326), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:48,996 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.2934 (max= 2.3326), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:48,996 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.2934 (max= 2.3326), tps=18188, mfu=37.90%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:48,996 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.2934 (max= 2.3326), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:48,996 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.2934 (max= 2.3326), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:48,996 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.2934 (max= 2.3326), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:02:48,997 - root - INFO - Step 9190: lr=1.00E-05, loss= 1.2934 (max= 2.3326), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:07,037 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.2719 (max= 2.2637), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:07,037 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.2719 (max= 2.2637), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:07,037 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.2719 (max= 2.2637), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:07,037 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.2719 (max= 2.2637), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:07,037 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.2719 (max= 2.2637), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:07,037 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.2719 (max= 2.2637), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:07,037 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.2719 (max= 
2.2637), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:07,037 - root - INFO - Step 9200: lr=1.00E-05, loss= 1.2719 (max= 2.2637), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:25,087 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.2479 (max= 2.2908), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:25,087 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.2479 (max= 2.2908), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:25,087 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.2479 (max= 2.2908), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:25,087 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.2479 (max= 2.2908), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:25,087 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.2479 (max= 2.2908), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:25,087 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.2479 (max= 2.2908), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:25,087 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.2479 (max= 2.2908), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:25,087 - root - INFO - Step 9210: lr=1.00E-05, loss= 1.2479 (max= 2.2908), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:43,139 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.2495 (max= 2.1858), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:43,139 - root - INFO - Step 9220: 
lr=1.00E-05, loss= 1.2495 (max= 2.1858), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:43,139 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.2495 (max= 2.1858), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:43,139 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.2495 (max= 2.1858), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:43,139 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.2495 (max= 2.1858), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:43,139 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.2495 (max= 2.1858), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:43,139 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.2495 (max= 2.1858), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:03:43,139 - root - INFO - Step 9220: lr=1.00E-05, loss= 1.2495 (max= 2.1858), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:04:01,133 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.2796 (max= 2.3516), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:01,133 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.2796 (max= 2.3516), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:01,133 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.2796 (max= 2.3516), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:01,133 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.2796 (max= 2.3516), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:01,133 - 
root - INFO - Step 9230: lr=1.00E-05, loss= 1.2796 (max= 2.3516), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:01,133 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.2796 (max= 2.3516), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:01,133 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.2796 (max= 2.3516), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:01,133 - root - INFO - Step 9230: lr=1.00E-05, loss= 1.2796 (max= 2.3516), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:19,163 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.2593 (max= 2.3317), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:19,163 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.2593 (max= 2.3317), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:19,163 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.2593 (max= 2.3317), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:19,163 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.2593 (max= 2.3317), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:19,163 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.2593 (max= 2.3317), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:19,163 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.2593 (max= 2.3317), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:19,163 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.2593 (max= 2.3317), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 14:04:19,164 - root - INFO - Step 9240: lr=1.00E-05, loss= 1.2593 (max= 2.3317), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:37,199 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.2691 (max= 2.5782), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:37,199 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.2691 (max= 2.5782), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:37,199 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.2691 (max= 2.5782), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:37,199 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.2691 (max= 2.5782), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:37,199 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.2691 (max= 2.5782), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:37,200 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.2691 (max= 2.5782), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:37,200 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.2691 (max= 2.5782), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:37,200 - root - INFO - Step 9250: lr=1.00E-05, loss= 1.2691 (max= 2.5782), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:55,192 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.2593 (max= 2.3490), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:55,192 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.2593 (max= 2.3490), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:55,192 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.2593 (max= 2.3490), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:55,192 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.2593 (max= 2.3490), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:55,192 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.2593 (max= 2.3490), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:55,192 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.2593 (max= 2.3490), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:55,192 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.2593 (max= 2.3490), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:04:55,193 - root - INFO - Step 9260: lr=1.00E-05, loss= 1.2593 (max= 2.3490), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:13,221 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.2396 (max= 2.2590), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:13,221 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.2396 (max= 2.2590), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:13,221 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.2396 (max= 2.2590), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:13,221 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.2396 (max= 2.2590), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:13,221 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.2396 (max= 2.2590), tps=18179, mfu=37.88%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:13,221 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.2396 (max= 2.2590), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:13,221 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.2396 (max= 2.2590), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:13,221 - root - INFO - Step 9270: lr=1.00E-05, loss= 1.2396 (max= 2.2590), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:31,283 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.2578 (max= 2.3483), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:31,283 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.2578 (max= 2.3483), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:31,283 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.2578 (max= 2.3483), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:31,283 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.2578 (max= 2.3483), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:31,283 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.2578 (max= 2.3483), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:31,283 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.2578 (max= 2.3483), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:31,283 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.2578 (max= 2.3483), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:31,283 - root - INFO - Step 9280: lr=1.00E-05, loss= 1.2578 (max= 
2.3483), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:49,302 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.2708 (max= 2.4490), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:49,302 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.2708 (max= 2.4490), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:49,302 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.2708 (max= 2.4490), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:49,302 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.2708 (max= 2.4490), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:49,302 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.2708 (max= 2.4490), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:49,302 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.2708 (max= 2.4490), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:49,302 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.2708 (max= 2.4490), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:05:49,303 - root - INFO - Step 9290: lr=1.00E-05, loss= 1.2708 (max= 2.4490), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:07,340 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.2537 (max= 2.2667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:07,340 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.2537 (max= 2.2667), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:07,341 - root - INFO - Step 9300: 
lr=1.00E-05, loss= 1.2537 (max= 2.2667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:07,341 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.2537 (max= 2.2667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:07,341 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.2537 (max= 2.2667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:07,341 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.2537 (max= 2.2667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:07,341 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.2537 (max= 2.2667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:07,341 - root - INFO - Step 9300: lr=1.00E-05, loss= 1.2537 (max= 2.2667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:25,340 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.2968 (max= 2.1475), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:25,340 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.2968 (max= 2.1475), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:25,340 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.2968 (max= 2.1475), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:25,340 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.2968 (max= 2.1475), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:25,340 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.2968 (max= 2.1475), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:25,340 - 
root - INFO - Step 9310: lr=1.00E-05, loss= 1.2968 (max= 2.1475), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:25,340 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.2968 (max= 2.1475), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:25,341 - root - INFO - Step 9310: lr=1.00E-05, loss= 1.2968 (max= 2.1475), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:43,372 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.2731 (max= 2.2962), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:43,372 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.2731 (max= 2.2962), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:43,372 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.2731 (max= 2.2962), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:43,372 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.2731 (max= 2.2962), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:43,372 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.2731 (max= 2.2962), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:43,372 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.2731 (max= 2.2962), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:43,372 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.2731 (max= 2.2962), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:06:43,372 - root - INFO - Step 9320: lr=1.00E-05, loss= 1.2731 (max= 2.2962), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 14:07:01,424 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.2269 (max= 2.0820), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:01,424 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.2269 (max= 2.0820), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:01,424 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.2269 (max= 2.0820), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:01,424 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.2269 (max= 2.0820), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:01,424 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.2269 (max= 2.0820), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:01,424 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.2269 (max= 2.0820), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:01,424 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.2269 (max= 2.0820), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:01,424 - root - INFO - Step 9330: lr=1.00E-05, loss= 1.2269 (max= 2.0820), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:19,426 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.2314 (max= 2.5394), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:19,426 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.2314 (max= 2.5394), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:19,426 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.2314 (max= 2.5394), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:19,426 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.2314 (max= 2.5394), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:19,426 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.2314 (max= 2.5394), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:19,426 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.2314 (max= 2.5394), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:19,426 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.2314 (max= 2.5394), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:19,426 - root - INFO - Step 9340: lr=1.00E-05, loss= 1.2314 (max= 2.5394), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:37,466 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.2832 (max= 2.1354), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:37,466 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.2832 (max= 2.1354), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:37,466 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.2832 (max= 2.1354), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:37,466 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.2832 (max= 2.1354), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:37,466 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.2832 (max= 2.1354), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:37,466 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.2832 (max= 2.1354), tps=18167, mfu=37.85%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:37,466 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.2832 (max= 2.1354), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:37,466 - root - INFO - Step 9350: lr=1.00E-05, loss= 1.2832 (max= 2.1354), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:55,511 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.2776 (max= 2.2037), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:55,512 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.2776 (max= 2.2037), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:55,512 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.2776 (max= 2.2037), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:55,512 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.2776 (max= 2.2037), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:55,512 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.2776 (max= 2.2037), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:55,512 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.2776 (max= 2.2037), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:55,512 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.2776 (max= 2.2037), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:07:55,512 - root - INFO - Step 9360: lr=1.00E-05, loss= 1.2776 (max= 2.2037), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:08:13,564 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.2146 (max= 
2.1213), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:13,564 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.2146 (max= 2.1213), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:13,564 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.2146 (max= 2.1213), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:13,564 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.2146 (max= 2.1213), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:13,564 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.2146 (max= 2.1213), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:13,564 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.2146 (max= 2.1213), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:13,564 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.2146 (max= 2.1213), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:13,565 - root - INFO - Step 9370: lr=1.00E-05, loss= 1.2146 (max= 2.1213), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:40,403 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.2395 (max= 2.0447), tps=12211, mfu=25.44%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.23s, 33.82%) +2025-10-24 14:08:40,403 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.2395 (max= 2.0447), tps=12211, mfu=25.44%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.23s, 33.82%) +2025-10-24 14:08:40,403 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.2395 (max= 2.0447), tps=12211, mfu=25.44%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.23s, 33.82%) +2025-10-24 14:08:40,403 - root - INFO - Step 9380: 
lr=1.00E-05, loss= 1.2395 (max= 2.0447), tps=12211, mfu=25.44%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.23s, 33.82%) +2025-10-24 14:08:40,403 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.2395 (max= 2.0447), tps=12211, mfu=25.44%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.23s, 33.82%) +2025-10-24 14:08:40,403 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.2395 (max= 2.0447), tps=12211, mfu=25.44%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.23s, 33.82%) +2025-10-24 14:08:40,403 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.2395 (max= 2.0447), tps=12211, mfu=25.44%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.23s, 33.82%) +2025-10-24 14:08:40,404 - root - INFO - Step 9380: lr=1.00E-05, loss= 1.2395 (max= 2.0447), tps=12211, mfu=25.44%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.23s, 33.82%) +2025-10-24 14:08:58,429 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.2684 (max= 2.5979), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:58,429 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.2684 (max= 2.5979), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:58,429 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.2684 (max= 2.5979), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:58,429 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.2684 (max= 2.5979), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:58,429 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.2684 (max= 2.5979), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:58,429 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.2684 (max= 2.5979), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 
14:08:58,430 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.2684 (max= 2.5979), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:08:58,430 - root - INFO - Step 9390: lr=1.00E-05, loss= 1.2684 (max= 2.5979), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:16,473 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.2698 (max= 2.7493), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:09:16,474 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.2698 (max= 2.7493), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:09:16,474 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.2698 (max= 2.7493), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:09:16,474 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.2698 (max= 2.7493), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:09:16,474 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.2698 (max= 2.7493), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:09:16,474 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.2698 (max= 2.7493), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:09:16,474 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.2698 (max= 2.7493), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:09:16,475 - root - INFO - Step 9400: lr=1.00E-05, loss= 1.2698 (max= 2.7493), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:09:34,551 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.2501 (max= 2.0075), tps=18130, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 14:09:34,551 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.2501 (max= 2.0075), tps=18130, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:34,551 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.2501 (max= 2.0075), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:34,551 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.2501 (max= 2.0075), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:34,551 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.2501 (max= 2.0075), tps=18130, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:34,551 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.2501 (max= 2.0075), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:34,552 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.2501 (max= 2.0075), tps=18130, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:34,552 - root - INFO - Step 9410: lr=1.00E-05, loss= 1.2501 (max= 2.0075), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:52,589 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.2662 (max= 4.0196), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:52,589 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.2662 (max= 4.0196), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:52,589 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.2662 (max= 4.0196), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:52,589 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.2662 (max= 4.0196), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:52,589 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.2662 (max= 4.0196), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:52,590 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.2662 (max= 4.0196), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:52,590 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.2662 (max= 4.0196), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:09:52,590 - root - INFO - Step 9420: lr=1.00E-05, loss= 1.2662 (max= 4.0196), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:10:10,657 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.2557 (max= 2.2772), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:10,657 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.2557 (max= 2.2772), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:10,657 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.2557 (max= 2.2772), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:10,657 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.2557 (max= 2.2772), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:10,657 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.2557 (max= 2.2772), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:10,657 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.2557 (max= 2.2772), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:10,657 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.2557 (max= 2.2772), tps=18140, mfu=37.79%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:10,657 - root - INFO - Step 9430: lr=1.00E-05, loss= 1.2557 (max= 2.2772), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:28,678 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.2559 (max= 2.1999), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:28,678 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.2559 (max= 2.1999), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:28,678 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.2559 (max= 2.1999), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:28,678 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.2559 (max= 2.1999), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:28,678 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.2559 (max= 2.1999), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:28,678 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.2559 (max= 2.1999), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:28,678 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.2559 (max= 2.1999), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:28,678 - root - INFO - Step 9440: lr=1.00E-05, loss= 1.2559 (max= 2.1999), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:10:46,709 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.2406 (max= 2.1559), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:10:46,709 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.2406 (max= 
2.1559), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:10:46,709 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.2406 (max= 2.1559), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:10:46,709 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.2406 (max= 2.1559), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:10:46,709 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.2406 (max= 2.1559), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:10:46,709 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.2406 (max= 2.1559), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:10:46,709 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.2406 (max= 2.1559), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:10:46,709 - root - INFO - Step 9450: lr=1.00E-05, loss= 1.2406 (max= 2.1559), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:04,734 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.2521 (max= 2.2187), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:04,734 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.2521 (max= 2.2187), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:04,734 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.2521 (max= 2.2187), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:04,734 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.2521 (max= 2.2187), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:04,734 - root - INFO - Step 9460: 
lr=1.00E-05, loss= 1.2521 (max= 2.2187), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:04,735 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.2521 (max= 2.2187), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:04,735 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.2521 (max= 2.2187), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:04,735 - root - INFO - Step 9460: lr=1.00E-05, loss= 1.2521 (max= 2.2187), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:22,762 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.2704 (max= 2.2208), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:22,762 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.2704 (max= 2.2208), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:22,762 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.2704 (max= 2.2208), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:22,762 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.2704 (max= 2.2208), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:22,762 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.2704 (max= 2.2208), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:22,762 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.2704 (max= 2.2208), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:22,762 - root - INFO - Step 9470: lr=1.00E-05, loss= 1.2704 (max= 2.2208), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:22,762 - 
root - INFO - Step 9470: lr=1.00E-05, loss= 1.2704 (max= 2.2208), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:40,782 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.2472 (max= 3.5026), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:11:40,782 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.2472 (max= 3.5026), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:11:40,782 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.2472 (max= 3.5026), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:11:40,782 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.2472 (max= 3.5026), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:11:40,782 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.2472 (max= 3.5026), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:11:40,782 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.2472 (max= 3.5026), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:11:40,782 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.2472 (max= 3.5026), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:11:40,782 - root - INFO - Step 9480: lr=1.00E-05, loss= 1.2472 (max= 3.5026), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:11:58,864 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.2424 (max= 2.2908), tps=18125, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:58,864 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.2424 (max= 2.2908), tps=18126, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 14:11:58,864 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.2424 (max= 2.2908), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:58,864 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.2424 (max= 2.2908), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:58,864 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.2424 (max= 2.2908), tps=18125, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:58,864 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.2424 (max= 2.2908), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:58,864 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.2424 (max= 2.2908), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:11:58,864 - root - INFO - Step 9490: lr=1.00E-05, loss= 1.2424 (max= 2.2908), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:12:20,250 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.2708 (max= 2.3350), tps=15325, mfu=31.93%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.90%) +2025-10-24 14:12:20,250 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.2708 (max= 2.3350), tps=15325, mfu=31.93%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.90%) +2025-10-24 14:12:20,250 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.2708 (max= 2.3350), tps=15325, mfu=31.93%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.90%) +2025-10-24 14:12:20,250 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.2708 (max= 2.3350), tps=15325, mfu=31.93%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.90%) +2025-10-24 14:12:20,250 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.2708 (max= 2.3350), tps=15325, mfu=31.93%, memory: 78.54GiB(44.03%) 
time/data_loading=0.01s (max=0.09s, 16.90%) +2025-10-24 14:12:20,250 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.2708 (max= 2.3350), tps=15325, mfu=31.93%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.90%) +2025-10-24 14:12:20,250 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.2708 (max= 2.3350), tps=15325, mfu=31.93%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.90%) +2025-10-24 14:12:20,250 - root - INFO - Step 9500: lr=1.00E-05, loss= 1.2708 (max= 2.3350), tps=15324, mfu=31.93%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.09s, 16.90%) +2025-10-24 14:12:38,295 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.2434 (max= 2.1533), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:38,296 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.2434 (max= 2.1533), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:38,296 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.2434 (max= 2.1533), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:38,296 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.2434 (max= 2.1533), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:38,296 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.2434 (max= 2.1533), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:38,296 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.2434 (max= 2.1533), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:38,296 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.2434 (max= 2.1533), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:38,296 - root - INFO - Step 9510: lr=1.00E-05, loss= 1.2434 (max= 2.1533), tps=18162, 
mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:56,321 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.2326 (max= 2.3504), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:56,321 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.2326 (max= 2.3504), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:56,321 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.2326 (max= 2.3504), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:56,321 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.2326 (max= 2.3504), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:56,321 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.2326 (max= 2.3504), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:56,321 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.2326 (max= 2.3504), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:56,321 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.2326 (max= 2.3504), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:12:56,321 - root - INFO - Step 9520: lr=1.00E-05, loss= 1.2326 (max= 2.3504), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:14,392 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.2539 (max= 2.4334), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:13:14,392 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.2539 (max= 2.4334), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:13:14,392 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.2539 
(max= 2.4334), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:13:14,392 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.2539 (max= 2.4334), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:13:14,392 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.2539 (max= 2.4334), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:13:14,392 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.2539 (max= 2.4334), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:13:14,392 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.2539 (max= 2.4334), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:13:14,392 - root - INFO - Step 9530: lr=1.00E-05, loss= 1.2539 (max= 2.4334), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:13:32,459 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.2835 (max= 2.1207), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:32,459 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.2835 (max= 2.1207), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:32,459 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.2835 (max= 2.1207), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:32,459 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.2835 (max= 2.1207), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:32,459 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.2835 (max= 2.1207), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:32,459 - root - INFO - Step 9540: 
lr=1.00E-05, loss= 1.2835 (max= 2.1207), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:32,459 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.2835 (max= 2.1207), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:32,459 - root - INFO - Step 9540: lr=1.00E-05, loss= 1.2835 (max= 2.1207), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:50,522 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.2944 (max= 2.1952), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:50,522 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.2944 (max= 2.1952), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:50,522 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.2944 (max= 2.1952), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:50,522 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.2944 (max= 2.1952), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:50,522 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.2944 (max= 2.1952), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:50,522 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.2944 (max= 2.1952), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:50,522 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.2944 (max= 2.1952), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:13:50,522 - root - INFO - Step 9550: lr=1.00E-05, loss= 1.2944 (max= 2.1952), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:08,564 - 
root - INFO - Step 9560: lr=1.00E-05, loss= 1.2308 (max= 2.0800), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:14:08,564 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.2308 (max= 2.0800), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:14:08,564 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.2308 (max= 2.0800), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:14:08,564 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.2308 (max= 2.0800), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:14:08,564 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.2308 (max= 2.0800), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:14:08,564 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.2308 (max= 2.0800), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:14:08,564 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.2308 (max= 2.0800), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:14:08,564 - root - INFO - Step 9560: lr=1.00E-05, loss= 1.2308 (max= 2.0800), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:14:26,597 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.2679 (max= 2.0976), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:26,598 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.2679 (max= 2.0976), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:26,598 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.2679 (max= 2.0976), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 14:14:26,598 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.2679 (max= 2.0976), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:26,598 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.2679 (max= 2.0976), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:26,598 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.2679 (max= 2.0976), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:26,598 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.2679 (max= 2.0976), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:26,598 - root - INFO - Step 9570: lr=1.00E-05, loss= 1.2679 (max= 2.0976), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:44,616 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.2518 (max= 2.1099), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:44,616 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.2518 (max= 2.1099), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:44,616 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.2518 (max= 2.1099), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:44,616 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.2518 (max= 2.1099), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:44,616 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.2518 (max= 2.1099), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:44,616 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.2518 (max= 2.1099), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:44,616 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.2518 (max= 2.1099), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:14:44,616 - root - INFO - Step 9580: lr=1.00E-05, loss= 1.2518 (max= 2.1099), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:09,366 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.2484 (max= 2.1476), tps=13241, mfu=27.59%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 28.20%) +2025-10-24 14:15:09,367 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.2484 (max= 2.1476), tps=13241, mfu=27.59%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 28.20%) +2025-10-24 14:15:09,367 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.2484 (max= 2.1476), tps=13241, mfu=27.59%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 28.20%) +2025-10-24 14:15:09,367 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.2484 (max= 2.1476), tps=13241, mfu=27.59%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 28.20%) +2025-10-24 14:15:09,367 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.2484 (max= 2.1476), tps=13241, mfu=27.59%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 28.20%) +2025-10-24 14:15:09,367 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.2484 (max= 2.1476), tps=13241, mfu=27.59%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 28.20%) +2025-10-24 14:15:09,367 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.2484 (max= 2.1476), tps=13241, mfu=27.59%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 28.20%) +2025-10-24 14:15:09,367 - root - INFO - Step 9590: lr=1.00E-05, loss= 1.2484 (max= 2.1476), tps=13241, mfu=27.59%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 28.20%) +2025-10-24 14:15:27,367 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.2627 (max= 2.9403), tps=18207, 
mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:27,367 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.2627 (max= 2.9403), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:27,367 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.2627 (max= 2.9403), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:27,367 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.2627 (max= 2.9403), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:27,367 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.2627 (max= 2.9403), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:27,367 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.2627 (max= 2.9403), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:27,367 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.2627 (max= 2.9403), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:27,367 - root - INFO - Step 9600: lr=1.00E-05, loss= 1.2627 (max= 2.9403), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:38,829 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4558535 +2025-10-24 14:15:45,408 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.2411 (max= 2.3018), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:45,409 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.2411 (max= 2.3018), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:45,409 - root - INFO - Step 9610: lr=1.00E-05, loss= 
1.2411 (max= 2.3018), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:45,409 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.2411 (max= 2.3018), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:45,409 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.2411 (max= 2.3018), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:45,409 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.2411 (max= 2.3018), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:45,409 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.2411 (max= 2.3018), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:15:45,409 - root - INFO - Step 9610: lr=1.00E-05, loss= 1.2411 (max= 2.3018), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:03,447 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.2599 (max= 2.7210), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:03,447 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.2599 (max= 2.7210), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:03,447 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.2599 (max= 2.7210), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:03,447 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.2599 (max= 2.7210), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:03,447 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.2599 (max= 2.7210), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:03,447 - root - INFO - Step 
9620: lr=1.00E-05, loss= 1.2599 (max= 2.7210), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:03,447 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.2599 (max= 2.7210), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:03,447 - root - INFO - Step 9620: lr=1.00E-05, loss= 1.2599 (max= 2.7210), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:21,479 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.2528 (max= 2.3872), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:21,479 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.2528 (max= 2.3872), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:21,479 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.2528 (max= 2.3872), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:21,479 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.2528 (max= 2.3872), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:21,479 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.2528 (max= 2.3872), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:21,479 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.2528 (max= 2.3872), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:21,479 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.2528 (max= 2.3872), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:21,479 - root - INFO - Step 9630: lr=1.00E-05, loss= 1.2528 (max= 2.3872), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 
14:16:39,537 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.2886 (max= 2.1311), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:39,538 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.2886 (max= 2.1311), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:39,538 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.2886 (max= 2.1311), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:39,538 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.2886 (max= 2.1311), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:39,538 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.2886 (max= 2.1311), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:39,538 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.2886 (max= 2.1311), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:39,538 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.2886 (max= 2.1311), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:39,538 - root - INFO - Step 9640: lr=1.00E-05, loss= 1.2886 (max= 2.1311), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:57,542 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.2558 (max= 2.3783), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:57,543 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.2558 (max= 2.3783), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:57,543 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.2558 (max= 2.3783), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 14:16:57,543 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.2558 (max= 2.3783), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:57,543 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.2558 (max= 2.3783), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:57,543 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.2558 (max= 2.3783), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:57,543 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.2558 (max= 2.3783), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:16:57,543 - root - INFO - Step 9650: lr=1.00E-05, loss= 1.2558 (max= 2.3783), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:15,544 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.2506 (max= 2.4706), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:15,544 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.2506 (max= 2.4706), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:15,544 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.2506 (max= 2.4706), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:15,544 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.2506 (max= 2.4706), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:15,544 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.2506 (max= 2.4706), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:15,544 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.2506 (max= 2.4706), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:15,544 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.2506 (max= 2.4706), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:15,544 - root - INFO - Step 9660: lr=1.00E-05, loss= 1.2506 (max= 2.4706), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:33,578 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.2934 (max= 2.1984), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:33,578 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.2934 (max= 2.1984), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:33,578 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.2934 (max= 2.1984), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:33,578 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.2934 (max= 2.1984), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:33,578 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.2934 (max= 2.1984), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:33,578 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.2934 (max= 2.1984), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:33,578 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.2934 (max= 2.1984), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:33,578 - root - INFO - Step 9670: lr=1.00E-05, loss= 1.2934 (max= 2.1984), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:51,599 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.2763 (max= 3.3503), tps=18186, mfu=37.89%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:51,599 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.2763 (max= 3.3503), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:51,599 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.2763 (max= 3.3503), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:51,599 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.2763 (max= 3.3503), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:51,599 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.2763 (max= 3.3503), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:51,599 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.2763 (max= 3.3503), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:51,599 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.2763 (max= 3.3503), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:17:51,600 - root - INFO - Step 9680: lr=1.00E-05, loss= 1.2763 (max= 3.3503), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:18:09,616 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.2898 (max= 2.5675), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:09,616 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.2898 (max= 2.5675), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:09,617 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.2898 (max= 2.5675), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:09,617 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.2898 (max= 
2.5675), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:09,617 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.2898 (max= 2.5675), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:09,617 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.2898 (max= 2.5675), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:09,617 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.2898 (max= 2.5675), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:09,617 - root - INFO - Step 9690: lr=1.00E-05, loss= 1.2898 (max= 2.5675), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:27,610 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.2787 (max= 2.1112), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:18:27,610 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.2787 (max= 2.1112), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:18:27,610 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.2787 (max= 2.1112), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:18:27,610 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.2787 (max= 2.1112), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:18:27,610 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.2787 (max= 2.1112), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:18:27,610 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.2787 (max= 2.1112), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:18:27,610 - root - INFO - Step 9700: 
lr=1.00E-05, loss= 1.2787 (max= 2.1112), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:18:27,610 - root - INFO - Step 9700: lr=1.00E-05, loss= 1.2787 (max= 2.1112), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:18:45,702 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.2919 (max= 2.3635), tps=18115, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:45,702 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.2919 (max= 2.3635), tps=18115, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:45,702 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.2919 (max= 2.3635), tps=18116, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:45,702 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.2919 (max= 2.3635), tps=18116, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:45,702 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.2919 (max= 2.3635), tps=18116, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:45,702 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.2919 (max= 2.3635), tps=18116, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:45,702 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.2919 (max= 2.3635), tps=18116, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:18:45,702 - root - INFO - Step 9710: lr=1.00E-05, loss= 1.2919 (max= 2.3635), tps=18116, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:03,686 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.2850 (max= 2.6780), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:03,686 - 
root - INFO - Step 9720: lr=1.00E-05, loss= 1.2850 (max= 2.6780), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:03,686 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.2850 (max= 2.6780), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:03,686 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.2850 (max= 2.6780), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:03,686 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.2850 (max= 2.6780), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:03,686 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.2850 (max= 2.6780), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:03,686 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.2850 (max= 2.6780), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:03,686 - root - INFO - Step 9720: lr=1.00E-05, loss= 1.2850 (max= 2.6780), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:21,713 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.3185 (max= 2.5706), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:21,713 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.3185 (max= 2.5706), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:21,713 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.3185 (max= 2.5706), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:21,713 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.3185 (max= 2.5706), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 14:19:21,713 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.3185 (max= 2.5706), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:21,713 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.3185 (max= 2.5706), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:21,713 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.3185 (max= 2.5706), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:21,713 - root - INFO - Step 9730: lr=1.00E-05, loss= 1.3185 (max= 2.5706), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:39,764 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.2514 (max= 1.9919), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:39,764 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.2514 (max= 1.9919), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:39,764 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.2514 (max= 1.9919), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:39,764 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.2514 (max= 1.9919), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:39,764 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.2514 (max= 1.9919), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:39,765 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.2514 (max= 1.9919), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:39,765 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.2514 (max= 1.9919), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:39,765 - root - INFO - Step 9740: lr=1.00E-05, loss= 1.2514 (max= 1.9919), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:57,801 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.2940 (max= 2.0849), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:57,801 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.2940 (max= 2.0849), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:57,801 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.2940 (max= 2.0849), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:57,801 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.2940 (max= 2.0849), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:57,801 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.2940 (max= 2.0849), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:57,801 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.2940 (max= 2.0849), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:57,801 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.2940 (max= 2.0849), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:19:57,801 - root - INFO - Step 9750: lr=1.00E-05, loss= 1.2940 (max= 2.0849), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:15,807 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.2647 (max= 1.9812), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:15,807 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.2647 (max= 1.9812), tps=18202, mfu=37.92%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:15,807 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.2647 (max= 1.9812), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:15,807 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.2647 (max= 1.9812), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:15,807 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.2647 (max= 1.9812), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:15,807 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.2647 (max= 1.9812), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:15,807 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.2647 (max= 1.9812), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:15,807 - root - INFO - Step 9760: lr=1.00E-05, loss= 1.2647 (max= 1.9812), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:33,812 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.2625 (max= 3.7531), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:33,812 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.2625 (max= 3.7531), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:33,812 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.2625 (max= 3.7531), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:33,812 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.2625 (max= 3.7531), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:33,812 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.2625 (max= 
3.7531), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:33,813 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.2625 (max= 3.7531), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:33,813 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.2625 (max= 3.7531), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:33,813 - root - INFO - Step 9770: lr=1.00E-05, loss= 1.2625 (max= 3.7531), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:51,821 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.2704 (max= 2.0706), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:51,821 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.2704 (max= 2.0706), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:51,821 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.2704 (max= 2.0706), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:51,821 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.2704 (max= 2.0706), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:51,821 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.2704 (max= 2.0706), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:51,821 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.2704 (max= 2.0706), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:51,821 - root - INFO - Step 9780: lr=1.00E-05, loss= 1.2704 (max= 2.0706), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:20:51,821 - root - INFO - Step 9780: 
lr=1.00E-05, loss= 1.2704 (max= 2.0706), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:09,784 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.2703 (max= 2.0872), tps=18245, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:09,784 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.2703 (max= 2.0872), tps=18245, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:09,784 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.2703 (max= 2.0872), tps=18245, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:09,784 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.2703 (max= 2.0872), tps=18245, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:09,784 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.2703 (max= 2.0872), tps=18245, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:09,784 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.2703 (max= 2.0872), tps=18245, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:09,784 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.2703 (max= 2.0872), tps=18245, mfu=38.01%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:09,784 - root - INFO - Step 9790: lr=1.00E-05, loss= 1.2703 (max= 2.0872), tps=18246, mfu=38.02%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:27,841 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.2634 (max= 2.1236), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:21:27,842 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.2634 (max= 2.1236), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:21:27,842 - 
root - INFO - Step 9800: lr=1.00E-05, loss= 1.2634 (max= 2.1236), tps=18150, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:21:27,842 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.2634 (max= 2.1236), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:21:27,842 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.2634 (max= 2.1236), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:21:27,842 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.2634 (max= 2.1236), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:21:27,842 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.2634 (max= 2.1236), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:21:27,842 - root - INFO - Step 9800: lr=1.00E-05, loss= 1.2634 (max= 2.1236), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:21:45,893 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.2810 (max= 2.4258), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:45,894 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.2810 (max= 2.4258), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:45,894 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.2810 (max= 2.4258), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:45,894 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.2810 (max= 2.4258), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:45,894 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.2810 (max= 2.4258), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 14:21:45,894 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.2810 (max= 2.4258), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:45,894 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.2810 (max= 2.4258), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:21:45,894 - root - INFO - Step 9810: lr=1.00E-05, loss= 1.2810 (max= 2.4258), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:03,974 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.2611 (max= 2.1746), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:03,975 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.2611 (max= 2.1746), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:03,975 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.2611 (max= 2.1746), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:03,975 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.2611 (max= 2.1746), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:03,975 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.2611 (max= 2.1746), tps=18127, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:03,975 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.2611 (max= 2.1746), tps=18127, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:03,975 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.2611 (max= 2.1746), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:03,975 - root - INFO - Step 9820: lr=1.00E-05, loss= 1.2611 (max= 2.1746), tps=18127, mfu=37.77%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:22,002 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.2749 (max= 2.3061), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:22,002 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.2749 (max= 2.3061), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:22,002 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.2749 (max= 2.3061), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:22,002 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.2749 (max= 2.3061), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:22,002 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.2749 (max= 2.3061), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:22,002 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.2749 (max= 2.3061), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:22,002 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.2749 (max= 2.3061), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:22,002 - root - INFO - Step 9830: lr=1.00E-05, loss= 1.2749 (max= 2.3061), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:40,042 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.3074 (max= 2.2051), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:22:40,042 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.3074 (max= 2.2051), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:22:40,042 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.3074 (max= 2.2051), tps=18167, mfu=37.85%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:22:40,042 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.3074 (max= 2.2051), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:22:40,042 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.3074 (max= 2.2051), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:22:40,042 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.3074 (max= 2.2051), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:22:40,042 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.3074 (max= 2.2051), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:22:40,043 - root - INFO - Step 9840: lr=1.00E-05, loss= 1.3074 (max= 2.2051), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:22:58,039 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.2489 (max= 2.0893), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:58,039 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.2489 (max= 2.0893), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:58,039 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.2489 (max= 2.0893), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:58,039 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.2489 (max= 2.0893), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:58,039 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.2489 (max= 2.0893), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:58,039 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.2489 (max= 
2.0893), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:58,039 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.2489 (max= 2.0893), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:22:58,039 - root - INFO - Step 9850: lr=1.00E-05, loss= 1.2489 (max= 2.0893), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:16,073 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.2394 (max= 2.4759), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:16,073 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.2394 (max= 2.4759), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:16,073 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.2394 (max= 2.4759), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:16,073 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.2394 (max= 2.4759), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:16,073 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.2394 (max= 2.4759), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:16,073 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.2394 (max= 2.4759), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:16,073 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.2394 (max= 2.4759), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:16,074 - root - INFO - Step 9860: lr=1.00E-05, loss= 1.2394 (max= 2.4759), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:34,098 - root - INFO - Step 9870: 
lr=1.00E-05, loss= 1.2680 (max= 2.0832), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:34,099 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.2680 (max= 2.0832), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:34,099 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.2680 (max= 2.0832), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:34,099 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.2680 (max= 2.0832), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:34,099 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.2680 (max= 2.0832), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:34,099 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.2680 (max= 2.0832), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:34,099 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.2680 (max= 2.0832), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:34,099 - root - INFO - Step 9870: lr=1.00E-05, loss= 1.2680 (max= 2.0832), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:52,123 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.2609 (max= 3.6326), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:52,123 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.2609 (max= 3.6326), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:52,123 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.2609 (max= 3.6326), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:52,123 - 
root - INFO - Step 9880: lr=1.00E-05, loss= 1.2609 (max= 3.6326), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:52,123 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.2609 (max= 3.6326), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:52,123 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.2609 (max= 3.6326), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:52,123 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.2609 (max= 3.6326), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:23:52,124 - root - INFO - Step 9880: lr=1.00E-05, loss= 1.2609 (max= 3.6326), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:10,169 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.2630 (max= 2.4036), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:10,169 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.2630 (max= 2.4036), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:10,169 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.2630 (max= 2.4036), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:10,169 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.2630 (max= 2.4036), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:10,169 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.2630 (max= 2.4036), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:10,169 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.2630 (max= 2.4036), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 14:24:10,169 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.2630 (max= 2.4036), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:10,169 - root - INFO - Step 9890: lr=1.00E-05, loss= 1.2630 (max= 2.4036), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:28,175 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.2865 (max= 2.1206), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:28,175 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.2865 (max= 2.1206), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:28,175 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.2865 (max= 2.1206), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:28,175 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.2865 (max= 2.1206), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:28,175 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.2865 (max= 2.1206), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:28,175 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.2865 (max= 2.1206), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:28,175 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.2865 (max= 2.1206), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:28,175 - root - INFO - Step 9900: lr=1.00E-05, loss= 1.2865 (max= 2.1206), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:46,214 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.2862 (max= 2.0549), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:46,215 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.2862 (max= 2.0549), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:46,215 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.2862 (max= 2.0549), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:46,215 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.2862 (max= 2.0549), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:46,215 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.2862 (max= 2.0549), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:46,215 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.2862 (max= 2.0549), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:46,215 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.2862 (max= 2.0549), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:24:46,215 - root - INFO - Step 9910: lr=1.00E-05, loss= 1.2862 (max= 2.0549), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:04,315 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.2765 (max= 1.9744), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:04,315 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.2765 (max= 1.9744), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:04,315 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.2765 (max= 1.9744), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:04,315 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.2765 (max= 1.9744), tps=18107, mfu=37.73%, 
memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:04,315 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.2765 (max= 1.9744), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:04,315 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.2765 (max= 1.9744), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:04,315 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.2765 (max= 1.9744), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:04,315 - root - INFO - Step 9920: lr=1.00E-05, loss= 1.2765 (max= 1.9744), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:22,357 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.2619 (max= 2.3569), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:22,358 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.2619 (max= 2.3569), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:22,358 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.2619 (max= 2.3569), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:22,358 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.2619 (max= 2.3569), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:22,358 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.2619 (max= 2.3569), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:22,358 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.2619 (max= 2.3569), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:22,358 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.2619 (max= 
2.3569), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:22,358 - root - INFO - Step 9930: lr=1.00E-05, loss= 1.2619 (max= 2.3569), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:40,400 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.2288 (max= 2.3326), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:40,400 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.2288 (max= 2.3326), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:40,400 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.2288 (max= 2.3326), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:40,400 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.2288 (max= 2.3326), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:40,400 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.2288 (max= 2.3326), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:40,400 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.2288 (max= 2.3326), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:40,400 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.2288 (max= 2.3326), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:40,400 - root - INFO - Step 9940: lr=1.00E-05, loss= 1.2288 (max= 2.3326), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:58,426 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.2751 (max= 2.3015), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:58,426 - root - INFO - Step 9950: 
lr=1.00E-05, loss= 1.2751 (max= 2.3015), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:58,426 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.2751 (max= 2.3015), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:58,426 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.2751 (max= 2.3015), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:58,426 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.2751 (max= 2.3015), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:58,426 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.2751 (max= 2.3015), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:58,426 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.2751 (max= 2.3015), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:25:58,426 - root - INFO - Step 9950: lr=1.00E-05, loss= 1.2751 (max= 2.3015), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:16,458 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.2847 (max= 2.3265), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:16,458 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.2847 (max= 2.3265), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:16,458 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.2847 (max= 2.3265), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:16,458 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.2847 (max= 2.3265), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:16,458 - 
root - INFO - Step 9960: lr=1.00E-05, loss= 1.2847 (max= 2.3265), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:16,458 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.2847 (max= 2.3265), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:16,458 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.2847 (max= 2.3265), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:16,458 - root - INFO - Step 9960: lr=1.00E-05, loss= 1.2847 (max= 2.3265), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:34,463 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.2892 (max= 2.1359), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:26:34,463 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.2892 (max= 2.1359), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:26:34,463 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.2892 (max= 2.1359), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:26:34,463 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.2892 (max= 2.1359), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:26:34,463 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.2892 (max= 2.1359), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:26:34,463 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.2892 (max= 2.1359), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:26:34,463 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.2892 (max= 2.1359), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 14:26:34,463 - root - INFO - Step 9970: lr=1.00E-05, loss= 1.2892 (max= 2.1359), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:26:52,539 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.2521 (max= 2.0331), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:52,539 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.2521 (max= 2.0331), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:52,539 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.2521 (max= 2.0331), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:52,539 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.2521 (max= 2.0331), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:52,539 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.2521 (max= 2.0331), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:52,539 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.2521 (max= 2.0331), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:52,539 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.2521 (max= 2.0331), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:26:52,539 - root - INFO - Step 9980: lr=1.00E-05, loss= 1.2521 (max= 2.0331), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:27:10,600 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.2582 (max= 2.0664), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:27:10,600 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.2582 (max= 2.0664), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:27:10,600 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.2582 (max= 2.0664), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:27:10,600 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.2582 (max= 2.0664), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:27:10,600 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.2582 (max= 2.0664), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:27:10,600 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.2582 (max= 2.0664), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:27:10,600 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.2582 (max= 2.0664), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:27:10,600 - root - INFO - Step 9990: lr=1.00E-05, loss= 1.2582 (max= 2.0664), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-10000 +2025-10-24 14:27:28,644 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.2478 (max= 2.2372), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:27:28,644 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-24 14:27:28,644 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:27:28,644 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.2478 (max= 2.2372), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:27:28,644 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.2478 (max= 2.2372), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 
0.03%) +2025-10-24 14:27:28,644 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.2478 (max= 2.2372), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:27:28,644 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.2478 (max= 2.2372), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:27:28,644 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-24 14:27:28,644 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:27:28,644 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-24 14:27:28,644 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-24 14:27:28,644 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:27:28,644 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:27:28,644 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-24 14:27:28,644 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:27:28,644 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.2478 (max= 2.2372), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:27:28,644 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-24 14:27:28,644 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:27:28,645 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.2478 (max= 2.2372), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:27:28,645 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-24 14:27:28,645 - root - INFO - CheckpointManager: 
State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:27:28,645 - root - INFO - Step 10000: lr=1.00E-05, loss= 1.2478 (max= 2.2372), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:27:28,645 - root - INFO - Saving a full checkpoint at step 10000 +2025-10-24 14:27:28,645 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-10000! Save time: 4.632220983505249 +2025-10-24 14:27:48,136 - root - INFO - Finished saving the checkpoint in 19.49 seconds +2025-10-24 14:27:48,144 - root - INFO - Finished saving the checkpoint in 19.50 seconds +2025-10-24 14:27:48,145 - root - INFO - Finished saving the checkpoint in 19.50 seconds +2025-10-24 14:27:48,146 - root - INFO - Finished saving the checkpoint in 19.50 seconds +2025-10-24 14:27:48,146 - root - INFO - Finished saving the checkpoint in 19.50 seconds +2025-10-24 14:27:48,146 - root - INFO - Finished saving the checkpoint in 19.50 seconds +2025-10-24 14:27:48,147 - root - INFO - Finished saving the checkpoint in 19.50 seconds +2025-10-24 14:27:48,148 - root - INFO - Finished saving the checkpoint in 19.50 seconds +2025-10-24 14:28:06,114 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.2773 (max= 2.1551), tps=8746, mfu=18.22%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:28:06,115 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.2773 (max= 2.1551), tps=8746, mfu=18.22%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:28:06,115 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.2773 (max= 2.1551), tps=8746, mfu=18.22%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:28:06,115 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.2773 (max= 2.1551), tps=8746, mfu=18.22%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:28:06,115 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.2773 (max= 2.1551), tps=8746, mfu=18.22%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:28:06,115 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.2773 (max= 2.1551), tps=8746, mfu=18.22%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:28:06,115 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.2773 (max= 2.1551), tps=8746, mfu=18.22%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:28:06,115 - root - INFO - Step 10010: lr=1.00E-05, loss= 1.2773 (max= 2.1551), tps=8746, mfu=18.22%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:28:24,178 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.2638 (max= 1.9934), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:24,178 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.2638 (max= 1.9934), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:24,178 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.2638 (max= 1.9934), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:24,178 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.2638 (max= 1.9934), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:24,178 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.2638 (max= 1.9934), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:24,178 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.2638 (max= 1.9934), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:24,179 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.2638 (max= 1.9934), 
tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:24,179 - root - INFO - Step 10020: lr=1.00E-05, loss= 1.2638 (max= 1.9934), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:42,214 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.2844 (max= 2.6276), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:42,215 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.2844 (max= 2.6276), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:42,215 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.2844 (max= 2.6276), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:42,215 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.2844 (max= 2.6276), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:42,215 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.2844 (max= 2.6276), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:42,215 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.2844 (max= 2.6276), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:42,215 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.2844 (max= 2.6276), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:28:42,215 - root - INFO - Step 10030: lr=1.00E-05, loss= 1.2844 (max= 2.6276), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:00,236 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.2411 (max= 2.2547), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:00,236 - root - INFO - Step 10040: 
lr=1.00E-05, loss= 1.2411 (max= 2.2547), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:00,236 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.2411 (max= 2.2547), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:00,236 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.2411 (max= 2.2547), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:00,236 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.2411 (max= 2.2547), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:00,236 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.2411 (max= 2.2547), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:00,236 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.2411 (max= 2.2547), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:00,237 - root - INFO - Step 10040: lr=1.00E-05, loss= 1.2411 (max= 2.2547), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:18,318 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.2746 (max= 2.3857), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:18,318 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.2746 (max= 2.3857), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:18,318 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.2746 (max= 2.3857), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:18,318 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.2746 (max= 2.3857), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 
14:29:18,318 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.2746 (max= 2.3857), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:18,318 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.2746 (max= 2.3857), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:18,318 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.2746 (max= 2.3857), tps=18126, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:18,318 - root - INFO - Step 10050: lr=1.00E-05, loss= 1.2746 (max= 2.3857), tps=18126, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:36,350 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.2647 (max= 2.0188), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:36,350 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.2647 (max= 2.0188), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:36,350 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.2647 (max= 2.0188), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:36,350 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.2647 (max= 2.0188), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:36,350 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.2647 (max= 2.0188), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:36,350 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.2647 (max= 2.0188), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:36,350 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.2647 (max= 2.0188), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:36,351 - root - INFO - Step 10060: lr=1.00E-05, loss= 1.2647 (max= 2.0188), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:29:54,410 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.2754 (max= 2.2396), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:29:54,410 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.2754 (max= 2.2396), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:29:54,410 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.2754 (max= 2.2396), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:29:54,410 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.2754 (max= 2.2396), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:29:54,410 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.2754 (max= 2.2396), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:29:54,410 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.2754 (max= 2.2396), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:29:54,410 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.2754 (max= 2.2396), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:29:54,410 - root - INFO - Step 10070: lr=1.00E-05, loss= 1.2754 (max= 2.2396), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:30:12,439 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.2652 (max= 2.0761), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:30:12,440 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.2652 (max= 2.0761), tps=18178, 
mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:30:12,440 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.2652 (max= 2.0761), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:30:12,440 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.2652 (max= 2.0761), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:30:12,440 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.2652 (max= 2.0761), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:30:12,440 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.2652 (max= 2.0761), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:30:12,440 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.2652 (max= 2.0761), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:30:12,440 - root - INFO - Step 10080: lr=1.00E-05, loss= 1.2652 (max= 2.0761), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:30:30,528 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.2633 (max= 2.1383), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:30,528 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.2633 (max= 2.1383), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:30,528 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.2633 (max= 2.1383), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:30,528 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.2633 (max= 2.1383), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:30,528 - root - INFO - Step 10090: lr=1.00E-05, 
loss= 1.2633 (max= 2.1383), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:30,528 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.2633 (max= 2.1383), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:30,528 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.2633 (max= 2.1383), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:30,529 - root - INFO - Step 10090: lr=1.00E-05, loss= 1.2633 (max= 2.1383), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:48,563 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.2583 (max= 2.2928), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:48,563 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.2583 (max= 2.2928), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:48,563 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.2583 (max= 2.2928), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:48,563 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.2583 (max= 2.2928), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:48,563 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.2583 (max= 2.2928), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:48,563 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.2583 (max= 2.2928), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:48,563 - root - INFO - Step 10100: lr=1.00E-05, loss= 1.2583 (max= 2.2928), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:30:48,564 - 
root - INFO - Step 10100: lr=1.00E-05, loss= 1.2583 (max= 2.2928), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:31:06,577 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.2570 (max= 2.2630), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:06,577 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.2570 (max= 2.2630), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:06,577 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.2570 (max= 2.2630), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:06,577 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.2570 (max= 2.2630), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:06,577 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.2570 (max= 2.2630), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:06,577 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.2570 (max= 2.2630), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:06,577 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.2570 (max= 2.2630), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:06,578 - root - INFO - Step 10110: lr=1.00E-05, loss= 1.2570 (max= 2.2630), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:24,606 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.3226 (max= 3.1497), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:24,606 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.3226 (max= 3.1497), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 14:31:24,606 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.3226 (max= 3.1497), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:24,606 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.3226 (max= 3.1497), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:24,606 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.3226 (max= 3.1497), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:24,606 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.3226 (max= 3.1497), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:24,606 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.3226 (max= 3.1497), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:24,606 - root - INFO - Step 10120: lr=1.00E-05, loss= 1.3226 (max= 3.1497), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:31:42,608 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.2744 (max= 2.2649), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:31:42,608 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.2744 (max= 2.2649), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:31:42,608 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.2744 (max= 2.2649), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:31:42,608 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.2744 (max= 2.2649), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:31:42,608 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.2744 (max= 2.2649), tps=18206, mfu=37.93%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:31:42,608 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.2744 (max= 2.2649), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:31:42,608 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.2744 (max= 2.2649), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:31:42,608 - root - INFO - Step 10130: lr=1.00E-05, loss= 1.2744 (max= 2.2649), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:32:00,642 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.2599 (max= 2.3500), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:32:00,642 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.2599 (max= 2.3500), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:32:00,642 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.2599 (max= 2.3500), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:32:00,642 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.2599 (max= 2.3500), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:32:00,642 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.2599 (max= 2.3500), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:32:00,642 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.2599 (max= 2.3500), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:32:00,642 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.2599 (max= 2.3500), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:32:00,642 - root - INFO - Step 10140: lr=1.00E-05, loss= 1.2599 (max= 
2.3500), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:32:18,683 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.2755 (max= 2.4240), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:18,683 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.2755 (max= 2.4240), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:18,683 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.2755 (max= 2.4240), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:18,683 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.2755 (max= 2.4240), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:18,683 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.2755 (max= 2.4240), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:18,683 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.2755 (max= 2.4240), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:18,684 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.2755 (max= 2.4240), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:18,684 - root - INFO - Step 10150: lr=1.00E-05, loss= 1.2755 (max= 2.4240), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:36,699 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.2387 (max= 1.9432), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:36,699 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.2387 (max= 1.9432), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:36,699 - root - INFO - Step 
10160: lr=1.00E-05, loss= 1.2387 (max= 1.9432), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:36,699 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.2387 (max= 1.9432), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:36,699 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.2387 (max= 1.9432), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:36,699 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.2387 (max= 1.9432), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:36,699 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.2387 (max= 1.9432), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:36,699 - root - INFO - Step 10160: lr=1.00E-05, loss= 1.2387 (max= 1.9432), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:54,719 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.2335 (max= 2.4472), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:54,719 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.2335 (max= 2.4472), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:54,719 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.2335 (max= 2.4472), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:54,719 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.2335 (max= 2.4472), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:54,719 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.2335 (max= 2.4472), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 14:32:54,719 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.2335 (max= 2.4472), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:54,720 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.2335 (max= 2.4472), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:32:54,720 - root - INFO - Step 10170: lr=1.00E-05, loss= 1.2335 (max= 2.4472), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:12,763 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.2948 (max= 2.5140), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:12,763 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.2948 (max= 2.5140), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:12,763 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.2948 (max= 2.5140), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:12,764 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.2948 (max= 2.5140), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:12,764 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.2948 (max= 2.5140), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:12,764 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.2948 (max= 2.5140), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:12,764 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.2948 (max= 2.5140), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:12,764 - root - INFO - Step 10180: lr=1.00E-05, loss= 1.2948 (max= 2.5140), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:30,823 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.2688 (max= 2.1462), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:30,823 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.2688 (max= 2.1462), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:30,823 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.2688 (max= 2.1462), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:30,823 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.2688 (max= 2.1462), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:30,824 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.2688 (max= 2.1462), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:30,824 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.2688 (max= 2.1462), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:30,824 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.2688 (max= 2.1462), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:30,824 - root - INFO - Step 10190: lr=1.00E-05, loss= 1.2688 (max= 2.1462), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:48,841 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.2587 (max= 3.6730), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:48,841 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.2587 (max= 3.6730), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:48,841 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.2587 (max= 3.6730), tps=18190, 
mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:48,841 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.2587 (max= 3.6730), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:48,841 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.2587 (max= 3.6730), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:48,841 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.2587 (max= 3.6730), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:48,841 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.2587 (max= 3.6730), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:33:48,841 - root - INFO - Step 10200: lr=1.00E-05, loss= 1.2587 (max= 3.6730), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:06,916 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.2722 (max= 2.4695), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:06,916 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.2722 (max= 2.4695), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:06,916 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.2722 (max= 2.4695), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:06,916 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.2722 (max= 2.4695), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:06,916 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.2722 (max= 2.4695), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:06,916 - root - INFO - Step 10210: lr=1.00E-05, 
loss= 1.2722 (max= 2.4695), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:06,916 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.2722 (max= 2.4695), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:06,916 - root - INFO - Step 10210: lr=1.00E-05, loss= 1.2722 (max= 2.4695), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:24,935 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.2561 (max= 2.4911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:24,935 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.2561 (max= 2.4911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:24,935 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.2561 (max= 2.4911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:24,935 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.2561 (max= 2.4911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:24,935 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.2561 (max= 2.4911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:24,935 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.2561 (max= 2.4911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:24,935 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.2561 (max= 2.4911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:24,935 - root - INFO - Step 10220: lr=1.00E-05, loss= 1.2561 (max= 2.4911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:42,938 - 
root - INFO - Step 10230: lr=1.00E-05, loss= 1.2626 (max= 2.1847), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:42,938 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.2626 (max= 2.1847), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:42,938 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.2626 (max= 2.1847), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:42,938 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.2626 (max= 2.1847), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:42,938 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.2626 (max= 2.1847), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:42,938 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.2626 (max= 2.1847), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:42,938 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.2626 (max= 2.1847), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:34:42,938 - root - INFO - Step 10230: lr=1.00E-05, loss= 1.2626 (max= 2.1847), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:00,962 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.2364 (max= 2.8168), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:35:00,962 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.2364 (max= 2.8168), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:35:00,962 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.2364 (max= 2.8168), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 14:35:00,962 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.2364 (max= 2.8168), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:35:00,962 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.2364 (max= 2.8168), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:35:00,962 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.2364 (max= 2.8168), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:35:00,962 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.2364 (max= 2.8168), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:35:00,962 - root - INFO - Step 10240: lr=1.00E-05, loss= 1.2364 (max= 2.8168), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:35:18,982 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.2633 (max= 2.4325), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:18,983 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.2633 (max= 2.4325), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:18,983 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.2633 (max= 2.4325), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:18,983 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.2633 (max= 2.4325), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:18,983 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.2633 (max= 2.4325), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:18,983 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.2633 (max= 2.4325), tps=18187, mfu=37.89%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:18,983 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.2633 (max= 2.4325), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:18,983 - root - INFO - Step 10250: lr=1.00E-05, loss= 1.2633 (max= 2.4325), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:37,049 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.2546 (max= 2.5465), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:37,049 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.2546 (max= 2.5465), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:37,049 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.2546 (max= 2.5465), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:37,049 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.2546 (max= 2.5465), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:37,049 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.2546 (max= 2.5465), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:37,049 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.2546 (max= 2.5465), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:37,049 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.2546 (max= 2.5465), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:37,049 - root - INFO - Step 10260: lr=1.00E-05, loss= 1.2546 (max= 2.5465), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:55,108 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.2558 (max= 
2.4741), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:55,108 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.2558 (max= 2.4741), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:55,108 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.2558 (max= 2.4741), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:55,108 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.2558 (max= 2.4741), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:55,108 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.2558 (max= 2.4741), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:55,108 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.2558 (max= 2.4741), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:55,108 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.2558 (max= 2.4741), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:35:55,108 - root - INFO - Step 10270: lr=1.00E-05, loss= 1.2558 (max= 2.4741), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:13,141 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.2797 (max= 2.1938), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:13,141 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.2797 (max= 2.1938), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:13,141 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.2797 (max= 2.1938), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:13,141 - root - INFO - Step 
10280: lr=1.00E-05, loss= 1.2797 (max= 2.1938), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:13,141 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.2797 (max= 2.1938), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:13,141 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.2797 (max= 2.1938), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:13,141 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.2797 (max= 2.1938), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:13,141 - root - INFO - Step 10280: lr=1.00E-05, loss= 1.2797 (max= 2.1938), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:31,158 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.2472 (max= 2.2557), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:36:31,158 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.2472 (max= 2.2557), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:36:31,158 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.2472 (max= 2.2557), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:36:31,158 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.2472 (max= 2.2557), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:36:31,158 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.2472 (max= 2.2557), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:36:31,158 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.2472 (max= 2.2557), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 14:36:31,158 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.2472 (max= 2.2557), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:36:31,158 - root - INFO - Step 10290: lr=1.00E-05, loss= 1.2472 (max= 2.2557), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:36:49,176 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.2522 (max= 2.2339), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:49,176 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.2522 (max= 2.2339), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:49,176 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.2522 (max= 2.2339), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:49,176 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.2522 (max= 2.2339), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:49,176 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.2522 (max= 2.2339), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:49,176 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.2522 (max= 2.2339), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:49,176 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.2522 (max= 2.2339), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:36:49,176 - root - INFO - Step 10300: lr=1.00E-05, loss= 1.2522 (max= 2.2339), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:37:07,196 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.3139 (max= 2.4234), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:07,197 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.3139 (max= 2.4234), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:07,197 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.3139 (max= 2.4234), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:07,197 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.3139 (max= 2.4234), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:07,197 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.3139 (max= 2.4234), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:07,197 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.3139 (max= 2.4234), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:07,197 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.3139 (max= 2.4234), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:07,197 - root - INFO - Step 10310: lr=1.00E-05, loss= 1.3139 (max= 2.4234), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:25,218 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.3181 (max= 2.6226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:37:25,218 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.3181 (max= 2.6226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:37:25,218 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.3181 (max= 2.6226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:37:25,218 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.3181 (max= 2.6226), tps=18186, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:37:25,218 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.3181 (max= 2.6226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:37:25,218 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.3181 (max= 2.6226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:37:25,218 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.3181 (max= 2.6226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:37:25,218 - root - INFO - Step 10320: lr=1.00E-05, loss= 1.3181 (max= 2.6226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:37:43,269 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.2916 (max= 2.7647), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:43,269 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.2916 (max= 2.7647), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:43,269 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.2916 (max= 2.7647), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:43,269 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.2916 (max= 2.7647), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:43,269 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.2916 (max= 2.7647), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:43,269 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.2916 (max= 2.7647), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:43,269 - root - INFO - Step 10330: lr=1.00E-05, 
loss= 1.2916 (max= 2.7647), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:37:43,269 - root - INFO - Step 10330: lr=1.00E-05, loss= 1.2916 (max= 2.7647), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:01,303 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.2660 (max= 2.5852), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:01,303 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.2660 (max= 2.5852), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:01,303 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.2660 (max= 2.5852), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:01,303 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.2660 (max= 2.5852), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:01,303 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.2660 (max= 2.5852), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:01,303 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.2660 (max= 2.5852), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:01,303 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.2660 (max= 2.5852), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:01,303 - root - INFO - Step 10340: lr=1.00E-05, loss= 1.2660 (max= 2.5852), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:19,297 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.2716 (max= 3.1400), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:19,297 - 
root - INFO - Step 10350: lr=1.00E-05, loss= 1.2716 (max= 3.1400), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:19,297 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.2716 (max= 3.1400), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:19,297 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.2716 (max= 3.1400), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:19,297 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.2716 (max= 3.1400), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:19,297 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.2716 (max= 3.1400), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:19,297 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.2716 (max= 3.1400), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:19,297 - root - INFO - Step 10350: lr=1.00E-05, loss= 1.2716 (max= 3.1400), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:37,317 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.2823 (max= 2.2552), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:37,317 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.2823 (max= 2.2552), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:37,317 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.2823 (max= 2.2552), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:37,317 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.2823 (max= 2.2552), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 14:38:37,317 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.2823 (max= 2.2552), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:37,317 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.2823 (max= 2.2552), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:37,317 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.2823 (max= 2.2552), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:37,317 - root - INFO - Step 10360: lr=1.00E-05, loss= 1.2823 (max= 2.2552), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:38:55,342 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.2675 (max= 2.4146), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:55,342 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.2675 (max= 2.4146), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:55,342 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.2675 (max= 2.4146), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:55,342 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.2675 (max= 2.4146), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:55,342 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.2675 (max= 2.4146), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:55,342 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.2675 (max= 2.4146), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:55,342 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.2675 (max= 2.4146), tps=18183, mfu=37.88%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:38:55,342 - root - INFO - Step 10370: lr=1.00E-05, loss= 1.2675 (max= 2.4146), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:39:13,361 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.2555 (max= 1.9737), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:13,361 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.2555 (max= 1.9737), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:13,361 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.2555 (max= 1.9737), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:13,361 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.2555 (max= 1.9737), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:13,362 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.2555 (max= 1.9737), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:13,362 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.2555 (max= 1.9737), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:13,362 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.2555 (max= 1.9737), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:13,362 - root - INFO - Step 10380: lr=1.00E-05, loss= 1.2555 (max= 1.9737), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:31,386 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.2395 (max= 2.3285), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:31,386 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.2395 (max= 
2.3285), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:31,386 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.2395 (max= 2.3285), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:31,386 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.2395 (max= 2.3285), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:31,386 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.2395 (max= 2.3285), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:31,386 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.2395 (max= 2.3285), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:31,387 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.2395 (max= 2.3285), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:31,387 - root - INFO - Step 10390: lr=1.00E-05, loss= 1.2395 (max= 2.3285), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:49,447 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.2779 (max= 2.2426), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:49,447 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.2779 (max= 2.2426), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:49,447 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.2779 (max= 2.2426), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:49,447 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.2779 (max= 2.2426), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:49,447 - root - INFO - Step 
10400: lr=1.00E-05, loss= 1.2779 (max= 2.2426), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:49,447 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.2779 (max= 2.2426), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:49,447 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.2779 (max= 2.2426), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:39:49,449 - root - INFO - Step 10400: lr=1.00E-05, loss= 1.2779 (max= 2.2426), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:07,497 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.2661 (max= 2.1300), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:07,497 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.2661 (max= 2.1300), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:07,497 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.2661 (max= 2.1300), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:07,497 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.2661 (max= 2.1300), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:07,497 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.2661 (max= 2.1300), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:07,498 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.2661 (max= 2.1300), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:07,498 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.2661 (max= 2.1300), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 14:40:07,498 - root - INFO - Step 10410: lr=1.00E-05, loss= 1.2661 (max= 2.1300), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:28,186 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.2848 (max= 2.2032), tps=15841, mfu=33.01%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.07s, 14.04%) +2025-10-24 14:40:28,186 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.2848 (max= 2.2032), tps=15841, mfu=33.01%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.07s, 14.04%) +2025-10-24 14:40:28,186 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.2848 (max= 2.2032), tps=15841, mfu=33.01%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.07s, 14.04%) +2025-10-24 14:40:28,186 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.2848 (max= 2.2032), tps=15842, mfu=33.01%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.07s, 14.04%) +2025-10-24 14:40:28,186 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.2848 (max= 2.2032), tps=15841, mfu=33.01%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.07s, 14.04%) +2025-10-24 14:40:28,186 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.2848 (max= 2.2032), tps=15841, mfu=33.01%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.07s, 14.04%) +2025-10-24 14:40:28,187 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.2848 (max= 2.2032), tps=15842, mfu=33.01%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.07s, 14.04%) +2025-10-24 14:40:28,187 - root - INFO - Step 10420: lr=1.00E-05, loss= 1.2848 (max= 2.2032), tps=15841, mfu=33.01%, memory: 78.54GiB(44.03%) time/data_loading=0.01s (max=0.07s, 14.04%) +2025-10-24 14:40:46,179 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.2699 (max= 2.2527), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:46,179 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.2699 (max= 2.2527), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:46,179 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.2699 (max= 2.2527), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:46,179 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.2699 (max= 2.2527), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:46,180 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.2699 (max= 2.2527), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:46,179 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.2699 (max= 2.2527), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:46,180 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.2699 (max= 2.2527), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:40:46,180 - root - INFO - Step 10430: lr=1.00E-05, loss= 1.2699 (max= 2.2527), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:04,190 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.2693 (max= 2.3546), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:04,190 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.2693 (max= 2.3546), tps=18198, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:04,190 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.2693 (max= 2.3546), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:04,190 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.2693 (max= 2.3546), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:04,190 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.2693 (max= 2.3546), tps=18197, 
mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:04,190 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.2693 (max= 2.3546), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:04,190 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.2693 (max= 2.3546), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:04,191 - root - INFO - Step 10440: lr=1.00E-05, loss= 1.2693 (max= 2.3546), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:22,200 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.2593 (max= 1.9788), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:22,200 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.2593 (max= 1.9788), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:22,200 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.2593 (max= 1.9788), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:22,200 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.2593 (max= 1.9788), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:22,200 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.2593 (max= 1.9788), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:22,201 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.2593 (max= 1.9788), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:22,201 - root - INFO - Step 10450: lr=1.00E-05, loss= 1.2593 (max= 1.9788), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:22,201 - root - INFO - Step 10450: lr=1.00E-05, 
loss= 1.2593 (max= 1.9788), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:40,204 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.2840 (max= 2.5107), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:40,204 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.2840 (max= 2.5107), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:40,204 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.2840 (max= 2.5107), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:40,204 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.2840 (max= 2.5107), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:40,204 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.2840 (max= 2.5107), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:40,204 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.2840 (max= 2.5107), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:40,204 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.2840 (max= 2.5107), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:40,204 - root - INFO - Step 10460: lr=1.00E-05, loss= 1.2840 (max= 2.5107), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:44,007 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:728933 +2025-10-24 14:41:58,212 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.2750 (max= 2.5175), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:58,212 - 
root - INFO - Step 10470: lr=1.00E-05, loss= 1.2750 (max= 2.5175), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:58,213 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.2750 (max= 2.5175), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:58,213 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.2750 (max= 2.5175), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:58,213 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.2750 (max= 2.5175), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:58,213 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.2750 (max= 2.5175), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:58,213 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.2750 (max= 2.5175), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:41:58,213 - root - INFO - Step 10470: lr=1.00E-05, loss= 1.2750 (max= 2.5175), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:16,282 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.2592 (max= 2.1229), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:16,282 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.2592 (max= 2.1229), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:16,282 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.2592 (max= 2.1229), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:16,282 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.2592 (max= 2.1229), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 14:42:16,282 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.2592 (max= 2.1229), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:16,282 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.2592 (max= 2.1229), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:16,282 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.2592 (max= 2.1229), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:16,282 - root - INFO - Step 10480: lr=1.00E-05, loss= 1.2592 (max= 2.1229), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:34,304 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.2601 (max= 2.0633), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:34,304 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.2601 (max= 2.0633), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:34,304 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.2601 (max= 2.0633), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:34,304 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.2601 (max= 2.0633), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:34,304 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.2601 (max= 2.0633), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:34,304 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.2601 (max= 2.0633), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:34,305 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.2601 (max= 2.0633), tps=18185, mfu=37.89%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:34,305 - root - INFO - Step 10490: lr=1.00E-05, loss= 1.2601 (max= 2.0633), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:52,319 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.2656 (max= 2.1137), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:52,319 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.2656 (max= 2.1137), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:52,319 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.2656 (max= 2.1137), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:52,319 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.2656 (max= 2.1137), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:52,320 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.2656 (max= 2.1137), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:52,319 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.2656 (max= 2.1137), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:52,320 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.2656 (max= 2.1137), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:42:52,320 - root - INFO - Step 10500: lr=1.00E-05, loss= 1.2656 (max= 2.1137), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:43:10,338 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.2516 (max= 2.2813), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:10,338 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.2516 (max= 
2.2813), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:10,339 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.2516 (max= 2.2813), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:10,339 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.2516 (max= 2.2813), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:10,339 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.2516 (max= 2.2813), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:10,339 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.2516 (max= 2.2813), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:10,339 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.2516 (max= 2.2813), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:10,339 - root - INFO - Step 10510: lr=1.00E-05, loss= 1.2516 (max= 2.2813), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:28,339 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.2549 (max= 3.6531), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:28,340 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.2549 (max= 3.6531), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:28,340 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.2549 (max= 3.6531), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:28,340 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.2549 (max= 3.6531), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:28,340 - root - INFO - Step 
10520: lr=1.00E-05, loss= 1.2549 (max= 3.6531), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:28,340 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.2549 (max= 3.6531), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:28,340 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.2549 (max= 3.6531), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:28,340 - root - INFO - Step 10520: lr=1.00E-05, loss= 1.2549 (max= 3.6531), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:43:46,335 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.2709 (max= 2.2533), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:43:46,335 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.2709 (max= 2.2533), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:43:46,335 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.2709 (max= 2.2533), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:43:46,335 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.2709 (max= 2.2533), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:43:46,335 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.2709 (max= 2.2533), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:43:46,335 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.2709 (max= 2.2533), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:43:46,335 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.2709 (max= 2.2533), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 14:43:46,335 - root - INFO - Step 10530: lr=1.00E-05, loss= 1.2709 (max= 2.2533), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:44:04,336 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.2484 (max= 2.2719), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:04,336 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.2484 (max= 2.2719), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:04,336 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.2484 (max= 2.2719), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:04,336 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.2484 (max= 2.2719), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:04,336 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.2484 (max= 2.2719), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:04,336 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.2484 (max= 2.2719), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:04,336 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.2484 (max= 2.2719), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:04,336 - root - INFO - Step 10540: lr=1.00E-05, loss= 1.2484 (max= 2.2719), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:22,334 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.2542 (max= 2.6354), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:22,334 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.2542 (max= 2.6354), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:22,334 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.2542 (max= 2.6354), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:22,334 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.2542 (max= 2.6354), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:22,334 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.2542 (max= 2.6354), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:22,334 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.2542 (max= 2.6354), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:22,334 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.2542 (max= 2.6354), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:22,334 - root - INFO - Step 10550: lr=1.00E-05, loss= 1.2542 (max= 2.6354), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:40,378 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.2617 (max= 2.1418), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:40,378 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.2617 (max= 2.1418), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:40,378 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.2617 (max= 2.1418), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:40,378 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.2617 (max= 2.1418), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:40,378 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.2617 (max= 2.1418), tps=18163, 
mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:40,378 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.2617 (max= 2.1418), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:40,378 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.2617 (max= 2.1418), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:40,378 - root - INFO - Step 10560: lr=1.00E-05, loss= 1.2617 (max= 2.1418), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:44:58,449 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.2521 (max= 2.2425), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:44:58,449 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.2521 (max= 2.2425), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:44:58,449 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.2521 (max= 2.2425), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:44:58,449 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.2521 (max= 2.2425), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:44:58,449 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.2521 (max= 2.2425), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:44:58,449 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.2521 (max= 2.2425), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:44:58,449 - root - INFO - Step 10570: lr=1.00E-05, loss= 1.2521 (max= 2.2425), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:44:58,449 - root - INFO - Step 10570: lr=1.00E-05, 
loss= 1.2521 (max= 2.2425), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:45:16,481 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.2500 (max= 2.3116), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:16,481 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.2500 (max= 2.3116), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:16,481 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.2500 (max= 2.3116), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:16,481 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.2500 (max= 2.3116), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:16,481 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.2500 (max= 2.3116), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:16,481 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.2500 (max= 2.3116), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:16,481 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.2500 (max= 2.3116), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:16,481 - root - INFO - Step 10580: lr=1.00E-05, loss= 1.2500 (max= 2.3116), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:34,524 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.2776 (max= 2.3728), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:34,524 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.2776 (max= 2.3728), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:34,524 - 
root - INFO - Step 10590: lr=1.00E-05, loss= 1.2776 (max= 2.3728), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:34,524 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.2776 (max= 2.3728), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:34,524 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.2776 (max= 2.3728), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:34,524 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.2776 (max= 2.3728), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:34,524 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.2776 (max= 2.3728), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:34,524 - root - INFO - Step 10590: lr=1.00E-05, loss= 1.2776 (max= 2.3728), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:45:52,560 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.2829 (max= 2.0609), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:45:52,560 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.2829 (max= 2.0609), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:45:52,560 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.2829 (max= 2.0609), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:45:52,560 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.2829 (max= 2.0609), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:45:52,560 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.2829 (max= 2.0609), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 14:45:52,560 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.2829 (max= 2.0609), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:45:52,560 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.2829 (max= 2.0609), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:45:52,560 - root - INFO - Step 10600: lr=1.00E-05, loss= 1.2829 (max= 2.0609), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:46:10,578 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.2628 (max= 2.4744), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:46:10,578 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.2628 (max= 2.4744), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:46:10,579 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.2628 (max= 2.4744), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:46:10,579 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.2628 (max= 2.4744), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:46:10,579 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.2628 (max= 2.4744), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:46:10,579 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.2628 (max= 2.4744), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:46:10,579 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.2628 (max= 2.4744), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:46:10,579 - root - INFO - Step 10610: lr=1.00E-05, loss= 1.2628 (max= 2.4744), tps=18189, mfu=37.90%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:46:28,596 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.2815 (max= 2.3637), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:28,596 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.2815 (max= 2.3637), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:28,597 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.2815 (max= 2.3637), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:28,597 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.2815 (max= 2.3637), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:28,597 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.2815 (max= 2.3637), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:28,597 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.2815 (max= 2.3637), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:28,597 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.2815 (max= 2.3637), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:28,597 - root - INFO - Step 10620: lr=1.00E-05, loss= 1.2815 (max= 2.3637), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:46,622 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.3160 (max= 2.4488), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:46,622 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.3160 (max= 2.4488), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:46,622 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.3160 (max= 
2.4488), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:46,622 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.3160 (max= 2.4488), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:46,622 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.3160 (max= 2.4488), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:46,622 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.3160 (max= 2.4488), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:46,622 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.3160 (max= 2.4488), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:46:46,623 - root - INFO - Step 10630: lr=1.00E-05, loss= 1.3160 (max= 2.4488), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:04,645 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.2931 (max= 2.2692), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:04,645 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.2931 (max= 2.2692), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:04,645 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.2931 (max= 2.2692), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:04,645 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.2931 (max= 2.2692), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:04,645 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.2931 (max= 2.2692), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:04,645 - root - INFO - Step 
10640: lr=1.00E-05, loss= 1.2931 (max= 2.2692), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:04,645 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.2931 (max= 2.2692), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:04,645 - root - INFO - Step 10640: lr=1.00E-05, loss= 1.2931 (max= 2.2692), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:22,677 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.2391 (max= 2.1380), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:22,677 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.2391 (max= 2.1380), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:22,677 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.2391 (max= 2.1380), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:22,677 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.2391 (max= 2.1380), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:22,677 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.2391 (max= 2.1380), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:22,677 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.2391 (max= 2.1380), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:22,677 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.2391 (max= 2.1380), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:22,678 - root - INFO - Step 10650: lr=1.00E-05, loss= 1.2391 (max= 2.1380), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 14:47:40,692 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.2809 (max= 2.5362), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:40,692 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.2809 (max= 2.5362), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:40,692 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.2809 (max= 2.5362), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:40,692 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.2809 (max= 2.5362), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:40,692 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.2809 (max= 2.5362), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:40,692 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.2809 (max= 2.5362), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:40,692 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.2809 (max= 2.5362), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:40,692 - root - INFO - Step 10660: lr=1.00E-05, loss= 1.2809 (max= 2.5362), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:58,704 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.2496 (max= 2.0921), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:58,704 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.2496 (max= 2.0921), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:58,705 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.2496 (max= 2.0921), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:58,705 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.2496 (max= 2.0921), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:58,705 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.2496 (max= 2.0921), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:58,705 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.2496 (max= 2.0921), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:58,705 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.2496 (max= 2.0921), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:47:58,705 - root - INFO - Step 10670: lr=1.00E-05, loss= 1.2496 (max= 2.0921), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:48:26,968 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.2841 (max= 2.5824), tps=11596, mfu=24.16%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.26s, 37.21%) +2025-10-24 14:48:26,968 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.2841 (max= 2.5824), tps=11595, mfu=24.16%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.26s, 37.21%) +2025-10-24 14:48:26,968 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.2841 (max= 2.5824), tps=11596, mfu=24.16%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.26s, 37.21%) +2025-10-24 14:48:26,968 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.2841 (max= 2.5824), tps=11596, mfu=24.16%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.26s, 37.21%) +2025-10-24 14:48:26,968 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.2841 (max= 2.5824), tps=11596, mfu=24.16%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.26s, 37.21%) +2025-10-24 14:48:26,968 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.2841 (max= 2.5824), 
tps=11596, mfu=24.16%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.26s, 37.21%) +2025-10-24 14:48:26,969 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.2841 (max= 2.5824), tps=11596, mfu=24.16%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.26s, 37.21%) +2025-10-24 14:48:26,969 - root - INFO - Step 10680: lr=1.00E-05, loss= 1.2841 (max= 2.5824), tps=11596, mfu=24.16%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.26s, 37.21%) +2025-10-24 14:48:44,998 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.2635 (max= 2.5282), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:48:44,998 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.2635 (max= 2.5282), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:48:44,998 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.2635 (max= 2.5282), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:48:44,998 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.2635 (max= 2.5282), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:48:44,998 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.2635 (max= 2.5282), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:48:44,999 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.2635 (max= 2.5282), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:48:44,999 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.2635 (max= 2.5282), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:48:44,999 - root - INFO - Step 10690: lr=1.00E-05, loss= 1.2635 (max= 2.5282), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:03,020 - root - INFO - Step 10700: 
lr=1.00E-05, loss= 1.2844 (max= 2.2159), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:03,020 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.2844 (max= 2.2159), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:03,020 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.2844 (max= 2.2159), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:03,021 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.2844 (max= 2.2159), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:03,021 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.2844 (max= 2.2159), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:03,021 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.2844 (max= 2.2159), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:03,021 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.2844 (max= 2.2159), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:03,021 - root - INFO - Step 10700: lr=1.00E-05, loss= 1.2844 (max= 2.2159), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:21,045 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.2573 (max= 2.8313), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:21,045 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.2573 (max= 2.8313), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:21,046 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.2573 (max= 2.8313), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 
14:49:21,046 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.2573 (max= 2.8313), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:21,046 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.2573 (max= 2.8313), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:21,046 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.2573 (max= 2.8313), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:21,046 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.2573 (max= 2.8313), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:21,046 - root - INFO - Step 10710: lr=1.00E-05, loss= 1.2573 (max= 2.8313), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:39,104 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.2788 (max= 2.1724), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:39,104 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.2788 (max= 2.1724), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:39,104 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.2788 (max= 2.1724), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:39,104 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.2788 (max= 2.1724), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:39,105 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.2788 (max= 2.1724), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:39,105 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.2788 (max= 2.1724), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:39,105 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.2788 (max= 2.1724), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:39,105 - root - INFO - Step 10720: lr=1.00E-05, loss= 1.2788 (max= 2.1724), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:49:57,148 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.2674 (max= 2.1639), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:57,148 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.2674 (max= 2.1639), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:57,148 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.2674 (max= 2.1639), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:57,148 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.2674 (max= 2.1639), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:57,148 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.2674 (max= 2.1639), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:57,148 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.2674 (max= 2.1639), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:57,148 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.2674 (max= 2.1639), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:49:57,149 - root - INFO - Step 10730: lr=1.00E-05, loss= 1.2674 (max= 2.1639), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:50:15,203 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.2731 (max= 2.3639), tps=18151, 
mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:50:15,203 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.2731 (max= 2.3639), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:50:15,203 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.2731 (max= 2.3639), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:50:15,204 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.2731 (max= 2.3639), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:50:15,204 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.2731 (max= 2.3639), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:50:15,204 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.2731 (max= 2.3639), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:50:15,204 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.2731 (max= 2.3639), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:50:15,204 - root - INFO - Step 10740: lr=1.00E-05, loss= 1.2731 (max= 2.3639), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:50:33,202 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.2736 (max= 2.3041), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:33,202 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.2736 (max= 2.3041), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:33,202 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.2736 (max= 2.3041), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:33,202 - root - INFO - Step 10750: lr=1.00E-05, 
loss= 1.2736 (max= 2.3041), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:33,202 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.2736 (max= 2.3041), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:33,202 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.2736 (max= 2.3041), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:33,202 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.2736 (max= 2.3041), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:33,202 - root - INFO - Step 10750: lr=1.00E-05, loss= 1.2736 (max= 2.3041), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:51,213 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.2844 (max= 2.2177), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:51,213 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.2844 (max= 2.2177), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:51,213 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.2844 (max= 2.2177), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:51,214 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.2844 (max= 2.2177), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:51,214 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.2844 (max= 2.2177), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:51,214 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.2844 (max= 2.2177), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:51,214 - 
root - INFO - Step 10760: lr=1.00E-05, loss= 1.2844 (max= 2.2177), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:50:51,214 - root - INFO - Step 10760: lr=1.00E-05, loss= 1.2844 (max= 2.2177), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:51:09,224 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.2388 (max= 2.5961), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:09,224 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.2388 (max= 2.5961), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:09,224 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.2388 (max= 2.5961), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:09,224 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.2388 (max= 2.5961), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:09,224 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.2388 (max= 2.5961), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:09,224 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.2388 (max= 2.5961), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:09,224 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.2388 (max= 2.5961), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:09,224 - root - INFO - Step 10770: lr=1.00E-05, loss= 1.2388 (max= 2.5961), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:27,247 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.2671 (max= 2.4776), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 14:51:27,247 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.2671 (max= 2.4776), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:51:27,247 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.2671 (max= 2.4776), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:51:27,247 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.2671 (max= 2.4776), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:51:27,247 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.2671 (max= 2.4776), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:51:27,247 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.2671 (max= 2.4776), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:51:27,247 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.2671 (max= 2.4776), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:51:27,247 - root - INFO - Step 10780: lr=1.00E-05, loss= 1.2671 (max= 2.4776), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:51:45,218 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.2776 (max= 2.2656), tps=18236, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:45,218 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.2776 (max= 2.2656), tps=18236, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:45,218 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.2776 (max= 2.2656), tps=18236, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:45,219 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.2776 (max= 2.2656), tps=18237, mfu=38.00%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:45,219 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.2776 (max= 2.2656), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:45,219 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.2776 (max= 2.2656), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:45,219 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.2776 (max= 2.2656), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:51:45,219 - root - INFO - Step 10790: lr=1.00E-05, loss= 1.2776 (max= 2.2656), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:03,250 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.2564 (max= 2.2806), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:03,250 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.2564 (max= 2.2806), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:03,250 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.2564 (max= 2.2806), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:03,250 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.2564 (max= 2.2806), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:03,250 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.2564 (max= 2.2806), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:03,250 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.2564 (max= 2.2806), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:03,250 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.2564 (max= 
2.2806), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:03,250 - root - INFO - Step 10800: lr=1.00E-05, loss= 1.2564 (max= 2.2806), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:21,282 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.2820 (max= 2.1976), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:21,282 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.2820 (max= 2.1976), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:21,282 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.2820 (max= 2.1976), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:21,282 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.2820 (max= 2.1976), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:21,282 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.2820 (max= 2.1976), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:21,282 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.2820 (max= 2.1976), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:21,282 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.2820 (max= 2.1976), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:21,282 - root - INFO - Step 10810: lr=1.00E-05, loss= 1.2820 (max= 2.1976), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:39,279 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.2663 (max= 2.1241), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:39,279 - root - INFO - Step 
10820: lr=1.00E-05, loss= 1.2663 (max= 2.1241), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:39,279 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.2663 (max= 2.1241), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:39,279 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.2663 (max= 2.1241), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:39,279 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.2663 (max= 2.1241), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:39,280 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.2663 (max= 2.1241), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:39,280 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.2663 (max= 2.1241), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:39,280 - root - INFO - Step 10820: lr=1.00E-05, loss= 1.2663 (max= 2.1241), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:52:57,311 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.2936 (max= 2.3420), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:57,311 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.2936 (max= 2.3420), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:57,311 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.2936 (max= 2.3420), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:57,311 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.2936 (max= 2.3420), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 14:52:57,311 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.2936 (max= 2.3420), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:57,311 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.2936 (max= 2.3420), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:57,311 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.2936 (max= 2.3420), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:52:57,311 - root - INFO - Step 10830: lr=1.00E-05, loss= 1.2936 (max= 2.3420), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:15,363 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.2975 (max= 2.6382), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:15,364 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.2975 (max= 2.6382), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:15,364 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.2975 (max= 2.6382), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:15,364 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.2975 (max= 2.6382), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:15,364 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.2975 (max= 2.6382), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:15,364 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.2975 (max= 2.6382), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:15,364 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.2975 (max= 2.6382), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:15,364 - root - INFO - Step 10840: lr=1.00E-05, loss= 1.2975 (max= 2.6382), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:33,383 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.2851 (max= 2.2447), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:33,383 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.2851 (max= 2.2447), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:33,383 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.2851 (max= 2.2447), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:33,383 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.2851 (max= 2.2447), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:33,383 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.2851 (max= 2.2447), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:33,383 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.2851 (max= 2.2447), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:33,383 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.2851 (max= 2.2447), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:33,383 - root - INFO - Step 10850: lr=1.00E-05, loss= 1.2851 (max= 2.2447), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:51,408 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.2728 (max= 2.3729), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:51,408 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.2728 (max= 2.3729), tps=18182, 
mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:51,408 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.2728 (max= 2.3729), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:51,409 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.2728 (max= 2.3729), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:51,409 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.2728 (max= 2.3729), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:51,409 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.2728 (max= 2.3729), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:51,409 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.2728 (max= 2.3729), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:53:51,409 - root - INFO - Step 10860: lr=1.00E-05, loss= 1.2728 (max= 2.3729), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:54:09,420 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.2808 (max= 2.2767), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:09,420 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.2808 (max= 2.2767), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:09,420 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.2808 (max= 2.2767), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:09,420 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.2808 (max= 2.2767), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:09,420 - root - INFO - Step 10870: lr=1.00E-05, 
loss= 1.2808 (max= 2.2767), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:09,420 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.2808 (max= 2.2767), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:09,420 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.2808 (max= 2.2767), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:09,420 - root - INFO - Step 10870: lr=1.00E-05, loss= 1.2808 (max= 2.2767), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:27,443 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.2729 (max= 2.2300), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:27,443 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.2729 (max= 2.2300), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:27,443 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.2729 (max= 2.2300), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:27,443 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.2729 (max= 2.2300), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:27,443 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.2729 (max= 2.2300), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:27,443 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.2729 (max= 2.2300), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:27,443 - root - INFO - Step 10880: lr=1.00E-05, loss= 1.2729 (max= 2.2300), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:27,443 - 
root - INFO - Step 10880: lr=1.00E-05, loss= 1.2729 (max= 2.2300), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:54:45,482 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.3075 (max= 3.7192), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:54:45,482 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.3075 (max= 3.7192), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:54:45,482 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.3075 (max= 3.7192), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:54:45,482 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.3075 (max= 3.7192), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:54:45,482 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.3075 (max= 3.7192), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:54:45,482 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.3075 (max= 3.7192), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:54:45,482 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.3075 (max= 3.7192), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:54:45,482 - root - INFO - Step 10890: lr=1.00E-05, loss= 1.3075 (max= 3.7192), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:03,479 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.3220 (max= 3.7277), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:03,479 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.3220 (max= 3.7277), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 14:55:03,479 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.3220 (max= 3.7277), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:03,479 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.3220 (max= 3.7277), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:03,479 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.3220 (max= 3.7277), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:03,479 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.3220 (max= 3.7277), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:03,479 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.3220 (max= 3.7277), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:03,479 - root - INFO - Step 10900: lr=1.00E-05, loss= 1.3220 (max= 3.7277), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:21,459 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.2772 (max= 2.2645), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:21,459 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.2772 (max= 2.2645), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:21,459 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.2772 (max= 2.2645), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:21,459 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.2772 (max= 2.2645), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:21,459 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.2772 (max= 2.2645), tps=18228, mfu=37.98%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:21,459 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.2772 (max= 2.2645), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:21,460 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.2772 (max= 2.2645), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:21,460 - root - INFO - Step 10910: lr=1.00E-05, loss= 1.2772 (max= 2.2645), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:55:40,622 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.3052 (max= 2.2359), tps=17103, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.37%) +2025-10-24 14:55:40,622 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.3052 (max= 2.2359), tps=17102, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.37%) +2025-10-24 14:55:40,622 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.3052 (max= 2.2359), tps=17103, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.37%) +2025-10-24 14:55:40,623 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.3052 (max= 2.2359), tps=17103, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.37%) +2025-10-24 14:55:40,623 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.3052 (max= 2.2359), tps=17103, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.37%) +2025-10-24 14:55:40,623 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.3052 (max= 2.2359), tps=17103, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.37%) +2025-10-24 14:55:40,623 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.3052 (max= 2.2359), tps=17103, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.37%) +2025-10-24 14:55:40,626 - root - INFO - Step 10920: lr=1.00E-05, loss= 1.3052 (max= 
2.2359), tps=17103, mfu=35.63%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.37%) +2025-10-24 14:55:58,670 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.2542 (max= 2.1029), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:55:58,670 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.2542 (max= 2.1029), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:55:58,670 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.2542 (max= 2.1029), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:55:58,670 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.2542 (max= 2.1029), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:55:58,670 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.2542 (max= 2.1029), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:55:58,670 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.2542 (max= 2.1029), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:55:58,670 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.2542 (max= 2.1029), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:55:58,671 - root - INFO - Step 10930: lr=1.00E-05, loss= 1.2542 (max= 2.1029), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:56:16,710 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.2657 (max= 2.2224), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:16,710 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.2657 (max= 2.2224), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:16,711 - root - INFO - Step 
10940: lr=1.00E-05, loss= 1.2657 (max= 2.2224), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:16,711 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.2657 (max= 2.2224), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:16,711 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.2657 (max= 2.2224), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:16,711 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.2657 (max= 2.2224), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:16,711 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.2657 (max= 2.2224), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:16,712 - root - INFO - Step 10940: lr=1.00E-05, loss= 1.2657 (max= 2.2224), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:34,721 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.2616 (max= 2.3930), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:34,721 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.2616 (max= 2.3930), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:34,721 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.2616 (max= 2.3930), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:34,721 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.2616 (max= 2.3930), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:34,721 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.2616 (max= 2.3930), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 14:56:34,722 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.2616 (max= 2.3930), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:34,722 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.2616 (max= 2.3930), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:34,722 - root - INFO - Step 10950: lr=1.00E-05, loss= 1.2616 (max= 2.3930), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:56:52,745 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.2740 (max= 2.0673), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:56:52,745 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.2740 (max= 2.0673), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:56:52,745 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.2740 (max= 2.0673), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:56:52,745 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.2740 (max= 2.0673), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:56:52,745 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.2740 (max= 2.0673), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:56:52,745 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.2740 (max= 2.0673), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:56:52,745 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.2740 (max= 2.0673), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:56:52,748 - root - INFO - Step 10960: lr=1.00E-05, loss= 1.2740 (max= 2.0673), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:57:10,741 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.2711 (max= 2.0344), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:10,742 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.2711 (max= 2.0344), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:10,742 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.2711 (max= 2.0344), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:10,742 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.2711 (max= 2.0344), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:10,742 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.2711 (max= 2.0344), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:10,742 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.2711 (max= 2.0344), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:10,742 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.2711 (max= 2.0344), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:10,743 - root - INFO - Step 10970: lr=1.00E-05, loss= 1.2711 (max= 2.0344), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:28,769 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.2652 (max= 2.3744), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:28,769 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.2652 (max= 2.3744), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:28,769 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.2652 (max= 2.3744), tps=18180, 
mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:28,770 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.2652 (max= 2.3744), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:28,770 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.2652 (max= 2.3744), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:28,770 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.2652 (max= 2.3744), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:28,770 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.2652 (max= 2.3744), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:28,770 - root - INFO - Step 10980: lr=1.00E-05, loss= 1.2652 (max= 2.3744), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:57:46,808 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.2472 (max= 2.2212), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:57:46,808 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.2472 (max= 2.2212), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:57:46,808 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.2472 (max= 2.2212), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:57:46,808 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.2472 (max= 2.2212), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:57:46,808 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.2472 (max= 2.2212), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:57:46,808 - root - INFO - Step 10990: lr=1.00E-05, 
loss= 1.2472 (max= 2.2212), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:57:46,808 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.2472 (max= 2.2212), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:57:46,808 - root - INFO - Step 10990: lr=1.00E-05, loss= 1.2472 (max= 2.2212), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-11000 +2025-10-24 14:58:04,824 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.2831 (max= 2.4312), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:58:04,824 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.2831 (max= 2.4312), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:58:04,825 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-24 14:58:04,825 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-24 14:58:04,825 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:58:04,825 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:58:04,825 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.2831 (max= 2.4312), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:58:04,825 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.2831 (max= 2.4312), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:58:04,825 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.2831 (max= 2.4312), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:58:04,825 - root - INFO - Saving a full 
checkpoint at step 11000 +2025-10-24 14:58:04,825 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:58:04,825 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-24 14:58:04,825 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-24 14:58:04,825 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:58:04,825 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:58:04,825 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.2831 (max= 2.4312), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:58:04,825 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-24 14:58:04,825 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:58:04,825 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.2831 (max= 2.4312), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:58:04,825 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-24 14:58:04,825 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 14:58:04,826 - root - INFO - Step 11000: lr=1.00E-05, loss= 1.2831 (max= 2.4312), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:58:04,826 - root - INFO - Saving a full checkpoint at step 11000 +2025-10-24 14:58:04,826 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-11000! 
Save time: 4.657272577285767 +2025-10-24 14:58:19,629 - root - INFO - Finished saving the checkpoint in 14.80 seconds +2025-10-24 14:58:19,638 - root - INFO - Finished saving the checkpoint in 14.81 seconds +2025-10-24 14:58:19,640 - root - INFO - Finished saving the checkpoint in 14.82 seconds +2025-10-24 14:58:19,641 - root - INFO - Finished saving the checkpoint in 14.82 seconds +2025-10-24 14:58:19,641 - root - INFO - Finished saving the checkpoint in 14.82 seconds +2025-10-24 14:58:19,642 - root - INFO - Finished saving the checkpoint in 14.82 seconds +2025-10-24 14:58:19,642 - root - INFO - Finished saving the checkpoint in 14.82 seconds +2025-10-24 14:58:19,644 - root - INFO - Finished saving the checkpoint in 14.82 seconds +2025-10-24 14:58:37,606 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.2603 (max= 3.4334), tps=9997, mfu=20.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 14:58:37,606 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.2603 (max= 3.4334), tps=9997, mfu=20.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 14:58:37,606 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.2603 (max= 3.4334), tps=9997, mfu=20.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 14:58:37,606 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.2603 (max= 3.4334), tps=9997, mfu=20.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 14:58:37,606 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.2603 (max= 3.4334), tps=9997, mfu=20.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 14:58:37,606 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.2603 (max= 3.4334), tps=9997, mfu=20.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 14:58:37,606 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.2603 (max= 3.4334), tps=9997, mfu=20.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 14:58:37,606 - root - INFO - Step 11010: lr=1.00E-05, loss= 1.2603 (max= 3.4334), tps=9997, mfu=20.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 14:58:55,609 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.2434 (max= 2.1736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:58:55,610 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.2434 (max= 2.1736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:58:55,610 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.2434 (max= 2.1736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:58:55,610 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.2434 (max= 2.1736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:58:55,610 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.2434 (max= 2.1736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:58:55,610 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.2434 (max= 2.1736), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:58:55,610 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.2434 (max= 2.1736), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:58:55,611 - root - INFO - Step 11020: lr=1.00E-05, loss= 1.2434 (max= 2.1736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 14:59:13,630 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.2736 (max= 2.3367), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:13,630 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.2736 (max= 2.3367), tps=18187, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:13,630 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.2736 (max= 2.3367), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:13,630 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.2736 (max= 2.3367), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:13,631 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.2736 (max= 2.3367), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:13,631 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.2736 (max= 2.3367), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:13,631 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.2736 (max= 2.3367), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:13,631 - root - INFO - Step 11030: lr=1.00E-05, loss= 1.2736 (max= 2.3367), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:31,672 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.2860 (max= 2.6400), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:31,673 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.2860 (max= 2.6400), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:31,673 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.2860 (max= 2.6400), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:31,673 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.2860 (max= 2.6400), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:31,673 - root - INFO - Step 11040: lr=1.00E-05, 
loss= 1.2860 (max= 2.6400), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:31,673 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.2860 (max= 2.6400), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:31,673 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.2860 (max= 2.6400), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:31,673 - root - INFO - Step 11040: lr=1.00E-05, loss= 1.2860 (max= 2.6400), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:49,713 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.2803 (max= 2.1643), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:49,713 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.2803 (max= 2.1643), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:49,713 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.2803 (max= 2.1643), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:49,713 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.2803 (max= 2.1643), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:49,713 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.2803 (max= 2.1643), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:49,713 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.2803 (max= 2.1643), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:49,714 - root - INFO - Step 11050: lr=1.00E-05, loss= 1.2803 (max= 2.1643), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 14:59:49,714 - 
root - INFO - Step 11050: lr=1.00E-05, loss= 1.2803 (max= 2.1643), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:01,152 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6151881 +2025-10-24 15:00:07,725 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.2586 (max= 2.2199), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:07,725 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.2586 (max= 2.2199), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:07,725 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.2586 (max= 2.2199), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:07,726 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.2586 (max= 2.2199), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:07,726 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.2586 (max= 2.2199), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:07,726 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.2586 (max= 2.2199), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:07,726 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.2586 (max= 2.2199), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:07,726 - root - INFO - Step 11060: lr=1.00E-05, loss= 1.2586 (max= 2.2199), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:25,760 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.2540 (max= 2.1347), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:00:25,760 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.2540 (max= 2.1347), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:25,760 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.2540 (max= 2.1347), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:25,760 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.2540 (max= 2.1347), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:25,760 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.2540 (max= 2.1347), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:25,760 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.2540 (max= 2.1347), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:25,760 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.2540 (max= 2.1347), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:25,760 - root - INFO - Step 11070: lr=1.00E-05, loss= 1.2540 (max= 2.1347), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:43,768 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.2620 (max= 2.1530), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:43,768 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.2620 (max= 2.1530), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:43,768 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.2620 (max= 2.1530), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:43,768 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.2620 (max= 2.1530), tps=18200, mfu=37.92%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:43,769 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.2620 (max= 2.1530), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:43,769 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.2620 (max= 2.1530), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:43,769 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.2620 (max= 2.1530), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:00:43,769 - root - INFO - Step 11080: lr=1.00E-05, loss= 1.2620 (max= 2.1530), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:01,790 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.2544 (max= 2.1588), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:01,790 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.2544 (max= 2.1588), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:01,790 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.2544 (max= 2.1588), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:01,790 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.2544 (max= 2.1588), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:01,790 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.2544 (max= 2.1588), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:01,790 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.2544 (max= 2.1588), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:01,791 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.2544 (max= 
2.1588), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:01,791 - root - INFO - Step 11090: lr=1.00E-05, loss= 1.2544 (max= 2.1588), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:19,786 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.2719 (max= 3.7005), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:01:19,787 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.2719 (max= 3.7005), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:01:19,787 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.2719 (max= 3.7005), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:01:19,787 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.2719 (max= 3.7005), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:01:19,787 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.2719 (max= 3.7005), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:01:19,787 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.2719 (max= 3.7005), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:01:19,787 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.2719 (max= 3.7005), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:01:19,787 - root - INFO - Step 11100: lr=1.00E-05, loss= 1.2719 (max= 3.7005), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:01:37,812 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.2361 (max= 2.2972), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:37,812 - root - INFO - Step 
11110: lr=1.00E-05, loss= 1.2361 (max= 2.2972), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:37,812 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.2361 (max= 2.2972), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:37,812 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.2361 (max= 2.2972), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:37,812 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.2361 (max= 2.2972), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:37,812 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.2361 (max= 2.2972), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:37,813 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.2361 (max= 2.2972), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:37,813 - root - INFO - Step 11110: lr=1.00E-05, loss= 1.2361 (max= 2.2972), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:55,811 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.2309 (max= 2.3620), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:55,811 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.2309 (max= 2.3620), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:55,811 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.2309 (max= 2.3620), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:55,811 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.2309 (max= 2.3620), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 15:01:55,811 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.2309 (max= 2.3620), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:55,811 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.2309 (max= 2.3620), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:55,811 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.2309 (max= 2.3620), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:01:55,811 - root - INFO - Step 11120: lr=1.00E-05, loss= 1.2309 (max= 2.3620), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:13,842 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.2436 (max= 2.1190), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:13,842 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.2436 (max= 2.1190), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:13,842 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.2436 (max= 2.1190), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:13,842 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.2436 (max= 2.1190), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:13,842 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.2436 (max= 2.1190), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:13,842 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.2436 (max= 2.1190), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:13,842 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.2436 (max= 2.1190), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:13,843 - root - INFO - Step 11130: lr=1.00E-05, loss= 1.2436 (max= 2.1190), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:31,868 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.2816 (max= 2.6873), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:31,868 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.2816 (max= 2.6873), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:31,868 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.2816 (max= 2.6873), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:31,868 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.2816 (max= 2.6873), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:31,868 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.2816 (max= 2.6873), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:31,868 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.2816 (max= 2.6873), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:31,869 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.2816 (max= 2.6873), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:31,869 - root - INFO - Step 11140: lr=1.00E-05, loss= 1.2816 (max= 2.6873), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:49,864 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.2699 (max= 2.2689), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:49,864 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.2699 (max= 2.2689), tps=18212, 
mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:49,864 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.2699 (max= 2.2689), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:49,864 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.2699 (max= 2.2689), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:49,865 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.2699 (max= 2.2689), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:49,865 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.2699 (max= 2.2689), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:49,865 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.2699 (max= 2.2689), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:02:49,865 - root - INFO - Step 11150: lr=1.00E-05, loss= 1.2699 (max= 2.2689), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:07,881 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.2597 (max= 2.2895), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:07,881 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.2597 (max= 2.2895), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:07,881 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.2597 (max= 2.2895), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:07,881 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.2597 (max= 2.2895), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:07,881 - root - INFO - Step 11160: lr=1.00E-05, 
loss= 1.2597 (max= 2.2895), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:07,882 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.2597 (max= 2.2895), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:07,882 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.2597 (max= 2.2895), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:07,882 - root - INFO - Step 11160: lr=1.00E-05, loss= 1.2597 (max= 2.2895), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:25,904 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.2446 (max= 2.6258), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:03:25,904 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.2446 (max= 2.6258), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:03:25,904 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.2446 (max= 2.6258), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:03:25,904 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.2446 (max= 2.6258), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:03:25,904 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.2446 (max= 2.6258), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:03:25,904 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.2446 (max= 2.6258), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:03:25,904 - root - INFO - Step 11170: lr=1.00E-05, loss= 1.2446 (max= 2.6258), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:03:25,904 - 
root - INFO - Step 11170: lr=1.00E-05, loss= 1.2446 (max= 2.6258), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:03:43,915 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.2688 (max= 2.2477), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:43,915 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.2688 (max= 2.2477), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:43,915 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.2688 (max= 2.2477), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:43,915 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.2688 (max= 2.2477), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:43,915 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.2688 (max= 2.2477), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:43,916 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.2688 (max= 2.2477), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:43,916 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.2688 (max= 2.2477), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:03:43,916 - root - INFO - Step 11180: lr=1.00E-05, loss= 1.2688 (max= 2.2477), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:01,923 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.2624 (max= 2.2007), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:01,923 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.2624 (max= 2.2007), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:04:01,923 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.2624 (max= 2.2007), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:01,923 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.2624 (max= 2.2007), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:01,923 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.2624 (max= 2.2007), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:01,924 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.2624 (max= 2.2007), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:01,924 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.2624 (max= 2.2007), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:01,924 - root - INFO - Step 11190: lr=1.00E-05, loss= 1.2624 (max= 2.2007), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:19,881 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.2372 (max= 2.0767), tps=18251, mfu=38.03%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:19,881 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.2372 (max= 2.0767), tps=18251, mfu=38.03%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:19,881 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.2372 (max= 2.0767), tps=18250, mfu=38.03%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:19,881 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.2372 (max= 2.0767), tps=18251, mfu=38.03%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:19,882 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.2372 (max= 2.0767), tps=18251, mfu=38.03%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:19,882 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.2372 (max= 2.0767), tps=18251, mfu=38.03%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:19,882 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.2372 (max= 2.0767), tps=18251, mfu=38.03%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:19,882 - root - INFO - Step 11200: lr=1.00E-05, loss= 1.2372 (max= 2.0767), tps=18250, mfu=38.03%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:37,907 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.2518 (max= 2.2019), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:37,907 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.2518 (max= 2.2019), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:37,907 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.2518 (max= 2.2019), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:37,907 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.2518 (max= 2.2019), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:37,907 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.2518 (max= 2.2019), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:37,908 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.2518 (max= 2.2019), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:37,908 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.2518 (max= 2.2019), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:37,908 - root - INFO - Step 11210: lr=1.00E-05, loss= 1.2518 (max= 
2.2019), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:55,917 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.2509 (max= 2.5148), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:55,917 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.2509 (max= 2.5148), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:55,917 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.2509 (max= 2.5148), tps=18198, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:55,917 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.2509 (max= 2.5148), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:55,917 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.2509 (max= 2.5148), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:55,917 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.2509 (max= 2.5148), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:55,917 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.2509 (max= 2.5148), tps=18198, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:04:55,917 - root - INFO - Step 11220: lr=1.00E-05, loss= 1.2509 (max= 2.5148), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:05:13,927 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.2700 (max= 2.1814), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:13,927 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.2700 (max= 2.1814), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:13,928 - root - INFO - Step 
11230: lr=1.00E-05, loss= 1.2700 (max= 2.1814), tps=18198, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:13,928 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.2700 (max= 2.1814), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:13,928 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.2700 (max= 2.1814), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:13,928 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.2700 (max= 2.1814), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:13,928 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.2700 (max= 2.1814), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:13,928 - root - INFO - Step 11230: lr=1.00E-05, loss= 1.2700 (max= 2.1814), tps=18198, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:31,940 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.2516 (max= 2.0488), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:31,940 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.2516 (max= 2.0488), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:31,941 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.2516 (max= 2.0488), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:31,941 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.2516 (max= 2.0488), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:31,941 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.2516 (max= 2.0488), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 15:05:31,941 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.2516 (max= 2.0488), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:31,941 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.2516 (max= 2.0488), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:31,941 - root - INFO - Step 11240: lr=1.00E-05, loss= 1.2516 (max= 2.0488), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:49,966 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.2837 (max= 2.2722), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:49,966 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.2837 (max= 2.2722), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:49,966 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.2837 (max= 2.2722), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:49,966 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.2837 (max= 2.2722), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:49,966 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.2837 (max= 2.2722), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:49,966 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.2837 (max= 2.2722), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:49,966 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.2837 (max= 2.2722), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:05:49,966 - root - INFO - Step 11250: lr=1.00E-05, loss= 1.2837 (max= 2.2722), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:07,969 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.2501 (max= 2.0736), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:07,969 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.2501 (max= 2.0736), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:07,969 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.2501 (max= 2.0736), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:07,969 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.2501 (max= 2.0736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:07,969 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.2501 (max= 2.0736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:07,969 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.2501 (max= 2.0736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:07,969 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.2501 (max= 2.0736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:07,969 - root - INFO - Step 11260: lr=1.00E-05, loss= 1.2501 (max= 2.0736), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:25,989 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.2613 (max= 2.3011), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:25,989 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.2613 (max= 2.3011), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:25,989 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.2613 (max= 2.3011), tps=18187, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:25,989 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.2613 (max= 2.3011), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:25,989 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.2613 (max= 2.3011), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:25,989 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.2613 (max= 2.3011), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:25,989 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.2613 (max= 2.3011), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:25,989 - root - INFO - Step 11270: lr=1.00E-05, loss= 1.2613 (max= 2.3011), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:44,007 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.2412 (max= 2.1576), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:44,007 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.2412 (max= 2.1576), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:44,007 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.2412 (max= 2.1576), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:44,007 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.2412 (max= 2.1576), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:44,007 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.2412 (max= 2.1576), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:44,007 - root - INFO - Step 11280: lr=1.00E-05, 
loss= 1.2412 (max= 2.1576), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:44,007 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.2412 (max= 2.1576), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:06:44,007 - root - INFO - Step 11280: lr=1.00E-05, loss= 1.2412 (max= 2.1576), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:02,008 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.2602 (max= 2.3691), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:02,008 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.2602 (max= 2.3691), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:02,008 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.2602 (max= 2.3691), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:02,008 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.2602 (max= 2.3691), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:02,008 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.2602 (max= 2.3691), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:02,008 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.2602 (max= 2.3691), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:02,008 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.2602 (max= 2.3691), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:02,008 - root - INFO - Step 11290: lr=1.00E-05, loss= 1.2602 (max= 2.3691), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:20,028 - 
root - INFO - Step 11300: lr=1.00E-05, loss= 1.2463 (max= 2.2669), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:20,028 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.2463 (max= 2.2669), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:20,028 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.2463 (max= 2.2669), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:20,029 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.2463 (max= 2.2669), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:20,029 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.2463 (max= 2.2669), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:20,029 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.2463 (max= 2.2669), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:20,029 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.2463 (max= 2.2669), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:20,029 - root - INFO - Step 11300: lr=1.00E-05, loss= 1.2463 (max= 2.2669), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:38,010 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.2681 (max= 2.6072), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:38,010 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.2681 (max= 2.6072), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:38,010 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.2681 (max= 2.6072), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 15:07:38,010 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.2681 (max= 2.6072), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:38,010 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.2681 (max= 2.6072), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:38,010 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.2681 (max= 2.6072), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:38,010 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.2681 (max= 2.6072), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:38,010 - root - INFO - Step 11310: lr=1.00E-05, loss= 1.2681 (max= 2.6072), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:07:56,042 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.2544 (max= 3.6382), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:56,042 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.2544 (max= 3.6382), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:56,042 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.2544 (max= 3.6382), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:56,043 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.2544 (max= 3.6382), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:56,043 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.2544 (max= 3.6382), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:56,043 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.2544 (max= 3.6382), tps=18175, mfu=37.87%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:56,043 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.2544 (max= 3.6382), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:07:56,043 - root - INFO - Step 11320: lr=1.00E-05, loss= 1.2544 (max= 3.6382), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:08:14,063 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.2458 (max= 2.1366), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:08:14,063 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.2458 (max= 2.1366), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:08:14,063 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.2458 (max= 2.1366), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:08:14,063 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.2458 (max= 2.1366), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:08:14,063 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.2458 (max= 2.1366), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:08:14,063 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.2458 (max= 2.1366), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:08:14,063 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.2458 (max= 2.1366), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:08:14,063 - root - INFO - Step 11330: lr=1.00E-05, loss= 1.2458 (max= 2.1366), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:08:32,681 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.2457 (max= 
2.2130), tps=17604, mfu=36.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 4.53%) +2025-10-24 15:08:32,681 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.2457 (max= 2.2130), tps=17604, mfu=36.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 4.53%) +2025-10-24 15:08:32,681 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.2457 (max= 2.2130), tps=17604, mfu=36.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 4.53%) +2025-10-24 15:08:32,681 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.2457 (max= 2.2130), tps=17604, mfu=36.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 4.53%) +2025-10-24 15:08:32,681 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.2457 (max= 2.2130), tps=17604, mfu=36.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 4.53%) +2025-10-24 15:08:32,682 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.2457 (max= 2.2130), tps=17604, mfu=36.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 4.53%) +2025-10-24 15:08:32,682 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.2457 (max= 2.2130), tps=17603, mfu=36.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 4.53%) +2025-10-24 15:08:32,683 - root - INFO - Step 11340: lr=1.00E-05, loss= 1.2457 (max= 2.2130), tps=17603, mfu=36.68%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.02s, 4.53%) +2025-10-24 15:08:50,690 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.2601 (max= 2.3831), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:08:50,691 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.2601 (max= 2.3831), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:08:50,691 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.2601 (max= 2.3831), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:08:50,691 - root - INFO - Step 
11350: lr=1.00E-05, loss= 1.2601 (max= 2.3831), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:08:50,691 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.2601 (max= 2.3831), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:08:50,691 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.2601 (max= 2.3831), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:08:50,691 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.2601 (max= 2.3831), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:08:50,691 - root - INFO - Step 11350: lr=1.00E-05, loss= 1.2601 (max= 2.3831), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:08,715 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.2431 (max= 1.9677), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:08,715 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.2431 (max= 1.9677), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:08,715 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.2431 (max= 1.9677), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:08,715 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.2431 (max= 1.9677), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:08,715 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.2431 (max= 1.9677), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:08,715 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.2431 (max= 1.9677), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 15:09:08,715 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.2431 (max= 1.9677), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:08,716 - root - INFO - Step 11360: lr=1.00E-05, loss= 1.2431 (max= 1.9677), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:26,724 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.2464 (max= 2.2801), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:26,724 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.2464 (max= 2.2801), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:26,725 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.2464 (max= 2.2801), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:26,725 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.2464 (max= 2.2801), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:26,725 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.2464 (max= 2.2801), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:26,725 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.2464 (max= 2.2801), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:26,725 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.2464 (max= 2.2801), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:26,725 - root - INFO - Step 11370: lr=1.00E-05, loss= 1.2464 (max= 2.2801), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:09:44,747 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.2215 (max= 2.2389), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:09:44,747 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.2215 (max= 2.2389), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:09:44,747 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.2215 (max= 2.2389), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:09:44,747 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.2215 (max= 2.2389), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:09:44,747 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.2215 (max= 2.2389), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:09:44,747 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.2215 (max= 2.2389), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:09:44,747 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.2215 (max= 2.2389), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:09:44,747 - root - INFO - Step 11380: lr=1.00E-05, loss= 1.2215 (max= 2.2389), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:10:02,785 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.2513 (max= 2.4614), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:02,785 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.2513 (max= 2.4614), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:02,785 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.2513 (max= 2.4614), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:02,785 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.2513 (max= 2.4614), tps=18176, 
mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:02,785 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.2513 (max= 2.4614), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:02,785 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.2513 (max= 2.4614), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:02,786 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.2513 (max= 2.4614), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:02,789 - root - INFO - Step 11390: lr=1.00E-05, loss= 1.2513 (max= 2.4614), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:20,803 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.2574 (max= 2.3120), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:20,803 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.2574 (max= 2.3120), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:20,803 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.2574 (max= 2.3120), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:20,803 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.2574 (max= 2.3120), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:20,804 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.2574 (max= 2.3120), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:20,804 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.2574 (max= 2.3120), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:20,804 - root - INFO - Step 11400: lr=1.00E-05, 
loss= 1.2574 (max= 2.3120), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:20,804 - root - INFO - Step 11400: lr=1.00E-05, loss= 1.2574 (max= 2.3120), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:38,784 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.2417 (max= 2.1901), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:10:38,784 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.2417 (max= 2.1901), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:10:38,784 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.2417 (max= 2.1901), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:10:38,784 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.2417 (max= 2.1901), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:10:38,784 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.2417 (max= 2.1901), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:10:38,784 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.2417 (max= 2.1901), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:10:38,784 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.2417 (max= 2.1901), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:10:38,784 - root - INFO - Step 11410: lr=1.00E-05, loss= 1.2417 (max= 2.1901), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:10:56,802 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.2175 (max= 2.1246), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:56,802 - 
root - INFO - Step 11420: lr=1.00E-05, loss= 1.2175 (max= 2.1246), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:56,802 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.2175 (max= 2.1246), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:56,802 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.2175 (max= 2.1246), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:56,802 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.2175 (max= 2.1246), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:56,802 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.2175 (max= 2.1246), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:56,802 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.2175 (max= 2.1246), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:10:56,802 - root - INFO - Step 11420: lr=1.00E-05, loss= 1.2175 (max= 2.1246), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:11:14,801 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.2669 (max= 3.4843), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:14,801 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.2669 (max= 3.4843), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:14,802 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.2669 (max= 3.4843), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:14,802 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.2669 (max= 3.4843), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 15:11:14,802 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.2669 (max= 3.4843), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:14,802 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.2669 (max= 3.4843), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:14,802 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.2669 (max= 3.4843), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:14,802 - root - INFO - Step 11430: lr=1.00E-05, loss= 1.2669 (max= 3.4843), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:32,816 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.2727 (max= 2.2069), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:32,816 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.2727 (max= 2.2069), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:32,816 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.2727 (max= 2.2069), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:32,816 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.2727 (max= 2.2069), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:32,816 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.2727 (max= 2.2069), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:32,816 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.2727 (max= 2.2069), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:32,816 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.2727 (max= 2.2069), tps=18194, mfu=37.91%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:32,816 - root - INFO - Step 11440: lr=1.00E-05, loss= 1.2727 (max= 2.2069), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:50,843 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.2599 (max= 2.6104), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:50,843 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.2599 (max= 2.6104), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:50,843 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.2599 (max= 2.6104), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:50,843 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.2599 (max= 2.6104), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:50,843 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.2599 (max= 2.6104), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:50,843 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.2599 (max= 2.6104), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:50,843 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.2599 (max= 2.6104), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:11:50,843 - root - INFO - Step 11450: lr=1.00E-05, loss= 1.2599 (max= 2.6104), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:08,824 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.2530 (max= 2.3989), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:08,824 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.2530 (max= 
2.3989), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:08,824 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.2530 (max= 2.3989), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:08,824 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.2530 (max= 2.3989), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:08,824 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.2530 (max= 2.3989), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:08,824 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.2530 (max= 2.3989), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:08,824 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.2530 (max= 2.3989), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:08,824 - root - INFO - Step 11460: lr=1.00E-05, loss= 1.2530 (max= 2.3989), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:26,831 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.2802 (max= 3.5392), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:12:26,831 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.2802 (max= 3.5392), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:12:26,831 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.2802 (max= 3.5392), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:12:26,831 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.2802 (max= 3.5392), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:12:26,831 - root - INFO - Step 
11470: lr=1.00E-05, loss= 1.2802 (max= 3.5392), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:12:26,831 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.2802 (max= 3.5392), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:12:26,831 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.2802 (max= 3.5392), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:12:26,831 - root - INFO - Step 11470: lr=1.00E-05, loss= 1.2802 (max= 3.5392), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:12:44,822 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.2279 (max= 2.4495), tps=18217, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:44,822 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.2279 (max= 2.4495), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:44,822 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.2279 (max= 2.4495), tps=18217, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:44,822 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.2279 (max= 2.4495), tps=18217, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:44,822 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.2279 (max= 2.4495), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:44,822 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.2279 (max= 2.4495), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:12:44,822 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.2279 (max= 2.4495), tps=18217, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 15:12:44,822 - root - INFO - Step 11480: lr=1.00E-05, loss= 1.2279 (max= 2.4495), tps=18217, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:02,853 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.2414 (max= 2.6660), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:02,853 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.2414 (max= 2.6660), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:02,853 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.2414 (max= 2.6660), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:02,853 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.2414 (max= 2.6660), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:02,853 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.2414 (max= 2.6660), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:02,853 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.2414 (max= 2.6660), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:02,853 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.2414 (max= 2.6660), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:02,854 - root - INFO - Step 11490: lr=1.00E-05, loss= 1.2414 (max= 2.6660), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:20,823 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.2540 (max= 2.3840), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:20,823 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.2540 (max= 2.3840), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:20,823 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.2540 (max= 2.3840), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:20,823 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.2540 (max= 2.3840), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:20,823 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.2540 (max= 2.3840), tps=18239, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:20,823 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.2540 (max= 2.3840), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:20,823 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.2540 (max= 2.3840), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:20,823 - root - INFO - Step 11500: lr=1.00E-05, loss= 1.2540 (max= 2.3840), tps=18238, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:38,853 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.2616 (max= 2.5558), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:38,853 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.2616 (max= 2.5558), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:38,853 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.2616 (max= 2.5558), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:38,853 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.2616 (max= 2.5558), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:38,853 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.2616 (max= 2.5558), tps=18177, 
mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:38,853 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.2616 (max= 2.5558), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:38,853 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.2616 (max= 2.5558), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:38,854 - root - INFO - Step 11510: lr=1.00E-05, loss= 1.2616 (max= 2.5558), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:56,850 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.2580 (max= 2.3694), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:56,850 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.2580 (max= 2.3694), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:56,851 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.2580 (max= 2.3694), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:56,851 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.2580 (max= 2.3694), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:56,851 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.2580 (max= 2.3694), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:56,851 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.2580 (max= 2.3694), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:56,851 - root - INFO - Step 11520: lr=1.00E-05, loss= 1.2580 (max= 2.3694), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:13:56,851 - root - INFO - Step 11520: lr=1.00E-05, 
loss= 1.2580 (max= 2.3694), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:14,861 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.2743 (max= 2.5049), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:14,861 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.2743 (max= 2.5049), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:14,862 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.2743 (max= 2.5049), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:14,862 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.2743 (max= 2.5049), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:14,862 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.2743 (max= 2.5049), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:14,862 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.2743 (max= 2.5049), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:14,862 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.2743 (max= 2.5049), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:14,862 - root - INFO - Step 11530: lr=1.00E-05, loss= 1.2743 (max= 2.5049), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:32,862 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.2510 (max= 2.4159), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:32,862 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.2510 (max= 2.4159), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:32,863 - 
root - INFO - Step 11540: lr=1.00E-05, loss= 1.2510 (max= 2.4159), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:32,863 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.2510 (max= 2.4159), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:32,863 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.2510 (max= 2.4159), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:32,863 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.2510 (max= 2.4159), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:32,863 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.2510 (max= 2.4159), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:32,863 - root - INFO - Step 11540: lr=1.00E-05, loss= 1.2510 (max= 2.4159), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:50,863 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.2332 (max= 2.0244), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:50,863 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.2332 (max= 2.0244), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:50,863 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.2332 (max= 2.0244), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:50,863 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.2332 (max= 2.0244), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:50,863 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.2332 (max= 2.0244), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 15:14:50,863 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.2332 (max= 2.0244), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:50,863 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.2332 (max= 2.0244), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:14:50,863 - root - INFO - Step 11550: lr=1.00E-05, loss= 1.2332 (max= 2.0244), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:15:08,881 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.2201 (max= 2.1217), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:08,881 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.2201 (max= 2.1217), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:08,881 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.2201 (max= 2.1217), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:08,881 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.2201 (max= 2.1217), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:08,881 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.2201 (max= 2.1217), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:08,881 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.2201 (max= 2.1217), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:08,881 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.2201 (max= 2.1217), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:08,881 - root - INFO - Step 11560: lr=1.00E-05, loss= 1.2201 (max= 2.1217), tps=18189, mfu=37.90%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:26,882 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.2543 (max= 3.7109), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:26,882 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.2543 (max= 3.7109), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:26,883 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.2543 (max= 3.7109), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:26,883 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.2543 (max= 3.7109), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:26,883 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.2543 (max= 3.7109), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:26,883 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.2543 (max= 3.7109), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:26,883 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.2543 (max= 3.7109), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:26,883 - root - INFO - Step 11570: lr=1.00E-05, loss= 1.2543 (max= 3.7109), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:15:44,873 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.2787 (max= 2.2321), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:15:44,873 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.2787 (max= 2.2321), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:15:44,873 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.2787 (max= 
2.2321), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:15:44,873 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.2787 (max= 2.2321), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:15:44,873 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.2787 (max= 2.2321), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:15:44,873 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.2787 (max= 2.2321), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:15:44,874 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.2787 (max= 2.2321), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:15:44,874 - root - INFO - Step 11580: lr=1.00E-05, loss= 1.2787 (max= 2.2321), tps=18217, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:02,884 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.2319 (max= 2.3922), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:02,885 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.2319 (max= 2.3922), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:02,885 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.2319 (max= 2.3922), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:02,885 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.2319 (max= 2.3922), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:02,885 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.2319 (max= 2.3922), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:02,885 - root - INFO - Step 
11590: lr=1.00E-05, loss= 1.2319 (max= 2.3922), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:02,885 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.2319 (max= 2.3922), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:02,885 - root - INFO - Step 11590: lr=1.00E-05, loss= 1.2319 (max= 2.3922), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:20,908 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.2779 (max= 2.4784), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:20,908 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.2779 (max= 2.4784), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:20,908 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.2779 (max= 2.4784), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:20,908 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.2779 (max= 2.4784), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:20,908 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.2779 (max= 2.4784), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:20,908 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.2779 (max= 2.4784), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:20,908 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.2779 (max= 2.4784), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:20,908 - root - INFO - Step 11600: lr=1.00E-05, loss= 1.2779 (max= 2.4784), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 15:16:38,882 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.2527 (max= 2.2781), tps=18234, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:38,882 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.2527 (max= 2.2781), tps=18234, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:38,883 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.2527 (max= 2.2781), tps=18234, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:38,883 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.2527 (max= 2.2781), tps=18234, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:38,883 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.2527 (max= 2.2781), tps=18234, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:38,883 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.2527 (max= 2.2781), tps=18234, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:38,883 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.2527 (max= 2.2781), tps=18234, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:38,883 - root - INFO - Step 11610: lr=1.00E-05, loss= 1.2527 (max= 2.2781), tps=18234, mfu=37.99%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:56,908 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.2368 (max= 2.2362), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:56,908 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.2368 (max= 2.2362), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:56,908 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.2368 (max= 2.2362), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:56,908 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.2368 (max= 2.2362), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:56,908 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.2368 (max= 2.2362), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:56,908 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.2368 (max= 2.2362), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:56,908 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.2368 (max= 2.2362), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:16:56,908 - root - INFO - Step 11620: lr=1.00E-05, loss= 1.2368 (max= 2.2362), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:14,905 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.2254 (max= 2.2773), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:14,905 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.2254 (max= 2.2773), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:14,905 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.2254 (max= 2.2773), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:14,905 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.2254 (max= 2.2773), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:14,905 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.2254 (max= 2.2773), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:14,905 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.2254 (max= 2.2773), tps=18211, 
mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:14,905 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.2254 (max= 2.2773), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:14,905 - root - INFO - Step 11630: lr=1.00E-05, loss= 1.2254 (max= 2.2773), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:32,917 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.2504 (max= 2.1040), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:32,917 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.2504 (max= 2.1040), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:32,917 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.2504 (max= 2.1040), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:32,917 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.2504 (max= 2.1040), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:32,917 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.2504 (max= 2.1040), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:32,917 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.2504 (max= 2.1040), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:32,917 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.2504 (max= 2.1040), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:32,917 - root - INFO - Step 11640: lr=1.00E-05, loss= 1.2504 (max= 2.1040), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:50,914 - root - INFO - Step 11650: lr=1.00E-05, 
loss= 1.2563 (max= 2.1847), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:50,914 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.2563 (max= 2.1847), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:50,914 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.2563 (max= 2.1847), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:50,914 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.2563 (max= 2.1847), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:50,914 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.2563 (max= 2.1847), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:50,914 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.2563 (max= 2.1847), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:50,914 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.2563 (max= 2.1847), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:17:50,914 - root - INFO - Step 11650: lr=1.00E-05, loss= 1.2563 (max= 2.1847), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:08,929 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.2491 (max= 2.5885), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:08,929 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.2491 (max= 2.5885), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:08,930 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.2491 (max= 2.5885), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:08,930 - 
root - INFO - Step 11660: lr=1.00E-05, loss= 1.2491 (max= 2.5885), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:08,930 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.2491 (max= 2.5885), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:08,930 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.2491 (max= 2.5885), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:08,930 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.2491 (max= 2.5885), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:08,930 - root - INFO - Step 11660: lr=1.00E-05, loss= 1.2491 (max= 2.5885), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:26,947 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.2237 (max= 2.1702), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:26,947 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.2237 (max= 2.1702), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:26,947 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.2237 (max= 2.1702), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:26,947 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.2237 (max= 2.1702), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:26,947 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.2237 (max= 2.1702), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:26,947 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.2237 (max= 2.1702), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 15:18:26,947 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.2237 (max= 2.1702), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:26,947 - root - INFO - Step 11670: lr=1.00E-05, loss= 1.2237 (max= 2.1702), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:44,947 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.2410 (max= 2.1785), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:44,947 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.2410 (max= 2.1785), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:44,947 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.2410 (max= 2.1785), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:44,947 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.2410 (max= 2.1785), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:44,947 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.2410 (max= 2.1785), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:44,947 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.2410 (max= 2.1785), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:44,947 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.2410 (max= 2.1785), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:18:44,947 - root - INFO - Step 11680: lr=1.00E-05, loss= 1.2410 (max= 2.1785), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:19:02,970 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.2591 (max= 2.0747), tps=18184, mfu=37.89%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:02,971 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.2591 (max= 2.0747), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:02,971 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.2591 (max= 2.0747), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:02,971 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.2591 (max= 2.0747), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:02,971 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.2591 (max= 2.0747), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:02,971 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.2591 (max= 2.0747), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:02,971 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.2591 (max= 2.0747), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:02,971 - root - INFO - Step 11690: lr=1.00E-05, loss= 1.2591 (max= 2.0747), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:21,018 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.2834 (max= 2.1311), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:21,018 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.2834 (max= 2.1311), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:21,018 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.2834 (max= 2.1311), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:21,018 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.2834 (max= 
2.1311), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:21,018 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.2834 (max= 2.1311), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:21,018 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.2834 (max= 2.1311), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:21,018 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.2834 (max= 2.1311), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:21,019 - root - INFO - Step 11700: lr=1.00E-05, loss= 1.2834 (max= 2.1311), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:39,050 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.2550 (max= 2.1530), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:39,050 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.2550 (max= 2.1530), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:39,050 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.2550 (max= 2.1530), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:39,050 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.2550 (max= 2.1530), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:39,051 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.2550 (max= 2.1530), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:39,051 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.2550 (max= 2.1530), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:39,051 - root - INFO - Step 
11710: lr=1.00E-05, loss= 1.2550 (max= 2.1530), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:39,051 - root - INFO - Step 11710: lr=1.00E-05, loss= 1.2550 (max= 2.1530), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:57,071 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.2362 (max= 2.3508), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:57,071 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.2362 (max= 2.3508), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:57,071 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.2362 (max= 2.3508), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:57,071 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.2362 (max= 2.3508), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:57,071 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.2362 (max= 2.3508), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:57,071 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.2362 (max= 2.3508), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:57,072 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.2362 (max= 2.3508), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:19:57,072 - root - INFO - Step 11720: lr=1.00E-05, loss= 1.2362 (max= 2.3508), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:15,114 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.2513 (max= 2.3213), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 15:20:15,114 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.2513 (max= 2.3213), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:15,114 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.2513 (max= 2.3213), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:15,114 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.2513 (max= 2.3213), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:15,115 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.2513 (max= 2.3213), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:15,115 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.2513 (max= 2.3213), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:15,115 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.2513 (max= 2.3213), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:15,115 - root - INFO - Step 11730: lr=1.00E-05, loss= 1.2513 (max= 2.3213), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:33,119 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.2582 (max= 2.2462), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:20:33,119 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.2582 (max= 2.2462), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:20:33,119 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.2582 (max= 2.2462), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:20:33,120 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.2582 (max= 2.2462), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:20:33,120 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.2582 (max= 2.2462), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:20:33,120 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.2582 (max= 2.2462), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:20:33,120 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.2582 (max= 2.2462), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:20:33,120 - root - INFO - Step 11740: lr=1.00E-05, loss= 1.2582 (max= 2.2462), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:20:51,144 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.2588 (max= 2.1730), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:51,144 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.2588 (max= 2.1730), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:51,144 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.2588 (max= 2.1730), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:51,144 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.2588 (max= 2.1730), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:51,144 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.2588 (max= 2.1730), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:51,145 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.2588 (max= 2.1730), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:51,145 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.2588 (max= 2.1730), tps=18187, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:20:51,145 - root - INFO - Step 11750: lr=1.00E-05, loss= 1.2588 (max= 2.1730), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:21:09,150 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.2203 (max= 1.9026), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:09,150 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.2203 (max= 1.9026), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:09,150 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.2203 (max= 1.9026), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:09,150 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.2203 (max= 1.9026), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:09,150 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.2203 (max= 1.9026), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:09,150 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.2203 (max= 1.9026), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:09,150 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.2203 (max= 1.9026), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:09,151 - root - INFO - Step 11760: lr=1.00E-05, loss= 1.2203 (max= 1.9026), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:27,174 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.2389 (max= 2.4517), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:27,174 - root - INFO - Step 11770: lr=1.00E-05, 
loss= 1.2389 (max= 2.4517), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:27,174 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.2389 (max= 2.4517), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:27,174 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.2389 (max= 2.4517), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:27,174 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.2389 (max= 2.4517), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:27,174 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.2389 (max= 2.4517), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:27,174 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.2389 (max= 2.4517), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:27,174 - root - INFO - Step 11770: lr=1.00E-05, loss= 1.2389 (max= 2.4517), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:45,213 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.2412 (max= 2.2707), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:45,213 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.2412 (max= 2.2707), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:45,213 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.2412 (max= 2.2707), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:45,213 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.2412 (max= 2.2707), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:45,213 - 
root - INFO - Step 11780: lr=1.00E-05, loss= 1.2412 (max= 2.2707), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:45,213 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.2412 (max= 2.2707), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:45,213 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.2412 (max= 2.2707), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:21:45,213 - root - INFO - Step 11780: lr=1.00E-05, loss= 1.2412 (max= 2.2707), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:03,216 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.2507 (max= 2.5863), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:03,216 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.2507 (max= 2.5863), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:03,217 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.2507 (max= 2.5863), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:03,217 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.2507 (max= 2.5863), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:03,217 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.2507 (max= 2.5863), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:03,217 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.2507 (max= 2.5863), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:03,217 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.2507 (max= 2.5863), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 15:22:03,217 - root - INFO - Step 11790: lr=1.00E-05, loss= 1.2507 (max= 2.5863), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:21,227 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.2418 (max= 2.3014), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:21,227 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.2418 (max= 2.3014), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:21,228 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.2418 (max= 2.3014), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:21,228 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.2418 (max= 2.3014), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:21,228 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.2418 (max= 2.3014), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:21,228 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.2418 (max= 2.3014), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:21,228 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.2418 (max= 2.3014), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:21,228 - root - INFO - Step 11800: lr=1.00E-05, loss= 1.2418 (max= 2.3014), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:39,248 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.2154 (max= 1.9863), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:39,248 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.2154 (max= 1.9863), tps=18187, mfu=37.89%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:39,248 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.2154 (max= 1.9863), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:39,248 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.2154 (max= 1.9863), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:39,248 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.2154 (max= 1.9863), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:39,248 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.2154 (max= 1.9863), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:39,248 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.2154 (max= 1.9863), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:39,248 - root - INFO - Step 11810: lr=1.00E-05, loss= 1.2154 (max= 1.9863), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:57,243 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.2293 (max= 2.0392), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:57,243 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.2293 (max= 2.0392), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:57,243 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.2293 (max= 2.0392), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:57,243 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.2293 (max= 2.0392), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:57,243 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.2293 (max= 
2.0392), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:57,243 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.2293 (max= 2.0392), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:57,243 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.2293 (max= 2.0392), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:22:57,243 - root - INFO - Step 11820: lr=1.00E-05, loss= 1.2293 (max= 2.0392), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:15,269 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.2166 (max= 2.2734), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:23:15,269 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.2166 (max= 2.2734), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:23:15,269 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.2166 (max= 2.2734), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:23:15,269 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.2166 (max= 2.2734), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:23:15,269 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.2166 (max= 2.2734), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:23:15,269 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.2166 (max= 2.2734), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:23:15,269 - root - INFO - Step 11830: lr=1.00E-05, loss= 1.2166 (max= 2.2734), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:23:15,269 - root - INFO - Step 
11830: lr=1.00E-05, loss= 1.2166 (max= 2.2734), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:23:33,304 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.2643 (max= 2.1336), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:33,304 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.2643 (max= 2.1336), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:33,304 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.2643 (max= 2.1336), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:33,304 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.2643 (max= 2.1336), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:33,304 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.2643 (max= 2.1336), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:33,305 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.2643 (max= 2.1336), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:33,305 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.2643 (max= 2.1336), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:33,305 - root - INFO - Step 11840: lr=1.00E-05, loss= 1.2643 (max= 2.1336), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:51,325 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.2397 (max= 2.1861), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:51,325 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.2397 (max= 2.1861), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 15:23:51,325 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.2397 (max= 2.1861), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:51,325 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.2397 (max= 2.1861), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:51,325 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.2397 (max= 2.1861), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:51,325 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.2397 (max= 2.1861), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:51,325 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.2397 (max= 2.1861), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:23:51,325 - root - INFO - Step 11850: lr=1.00E-05, loss= 1.2397 (max= 2.1861), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:20,171 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.2441 (max= 2.6129), tps=11361, mfu=23.67%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.28s, 38.43%) +2025-10-24 15:24:20,171 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.2441 (max= 2.6129), tps=11361, mfu=23.67%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.28s, 38.43%) +2025-10-24 15:24:20,171 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.2441 (max= 2.6129), tps=11361, mfu=23.67%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.28s, 38.43%) +2025-10-24 15:24:20,171 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.2441 (max= 2.6129), tps=11361, mfu=23.67%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.28s, 38.43%) +2025-10-24 15:24:20,171 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.2441 (max= 2.6129), tps=11361, mfu=23.67%, memory: 78.54GiB(44.03%) 
time/data_loading=0.03s (max=0.28s, 38.43%) +2025-10-24 15:24:20,171 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.2441 (max= 2.6129), tps=11361, mfu=23.67%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.28s, 38.43%) +2025-10-24 15:24:20,171 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.2441 (max= 2.6129), tps=11361, mfu=23.67%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.28s, 38.43%) +2025-10-24 15:24:20,171 - root - INFO - Step 11860: lr=1.00E-05, loss= 1.2441 (max= 2.6129), tps=11361, mfu=23.67%, memory: 78.54GiB(44.03%) time/data_loading=0.03s (max=0.28s, 38.43%) +2025-10-24 15:24:38,181 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.2447 (max= 2.0700), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:38,181 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.2447 (max= 2.0700), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:38,182 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.2447 (max= 2.0700), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:38,182 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.2447 (max= 2.0700), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:38,182 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.2447 (max= 2.0700), tps=18198, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:38,182 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.2447 (max= 2.0700), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:38,182 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.2447 (max= 2.0700), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:38,182 - root - INFO - Step 11870: lr=1.00E-05, loss= 1.2447 (max= 2.0700), 
tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:56,206 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1997 (max= 2.4541), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:56,206 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1997 (max= 2.4541), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:56,206 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1997 (max= 2.4541), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:56,206 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1997 (max= 2.4541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:56,206 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1997 (max= 2.4541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:56,206 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1997 (max= 2.4541), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:56,206 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1997 (max= 2.4541), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:24:56,206 - root - INFO - Step 11880: lr=1.00E-05, loss= 1.1997 (max= 2.4541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:14,229 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.2432 (max= 3.0792), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:14,229 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.2432 (max= 3.0792), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:14,230 - root - INFO - Step 11890: 
lr=1.00E-05, loss= 1.2432 (max= 3.0792), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:14,230 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.2432 (max= 3.0792), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:14,230 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.2432 (max= 3.0792), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:14,230 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.2432 (max= 3.0792), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:14,230 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.2432 (max= 3.0792), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:14,230 - root - INFO - Step 11890: lr=1.00E-05, loss= 1.2432 (max= 3.0792), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:32,246 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.2680 (max= 2.6247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:32,246 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.2680 (max= 2.6247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:32,246 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.2680 (max= 2.6247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:32,246 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.2680 (max= 2.6247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:32,246 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.2680 (max= 2.6247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 
15:25:32,246 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.2680 (max= 2.6247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:32,246 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.2680 (max= 2.6247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:32,246 - root - INFO - Step 11900: lr=1.00E-05, loss= 1.2680 (max= 2.6247), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:25:50,272 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.2221 (max= 2.3731), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:25:50,272 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.2221 (max= 2.3731), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:25:50,272 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.2221 (max= 2.3731), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:25:50,272 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.2221 (max= 2.3731), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:25:50,272 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.2221 (max= 2.3731), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:25:50,272 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.2221 (max= 2.3731), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:25:50,273 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.2221 (max= 2.3731), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:25:50,273 - root - INFO - Step 11910: lr=1.00E-05, loss= 1.2221 (max= 2.3731), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:26:08,313 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.2252 (max= 2.1336), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:08,313 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.2252 (max= 2.1336), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:08,313 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.2252 (max= 2.1336), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:08,313 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.2252 (max= 2.1336), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:08,313 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.2252 (max= 2.1336), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:08,313 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.2252 (max= 2.1336), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:08,313 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.2252 (max= 2.1336), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:08,313 - root - INFO - Step 11920: lr=1.00E-05, loss= 1.2252 (max= 2.1336), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:26,334 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.2354 (max= 2.3626), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:26,334 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.2354 (max= 2.3626), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:26,334 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.2354 (max= 2.3626), tps=18186, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:26,334 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.2354 (max= 2.3626), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:26,334 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.2354 (max= 2.3626), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:26,334 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.2354 (max= 2.3626), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:26,334 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.2354 (max= 2.3626), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:26,334 - root - INFO - Step 11930: lr=1.00E-05, loss= 1.2354 (max= 2.3626), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:44,364 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.2476 (max= 2.0543), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:44,364 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.2476 (max= 2.0543), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:44,364 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.2476 (max= 2.0543), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:44,364 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.2476 (max= 2.0543), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:44,364 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.2476 (max= 2.0543), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:44,364 - root - INFO - Step 11940: lr=1.00E-05, 
loss= 1.2476 (max= 2.0543), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:44,364 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.2476 (max= 2.0543), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:26:44,364 - root - INFO - Step 11940: lr=1.00E-05, loss= 1.2476 (max= 2.0543), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:02,398 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.2491 (max= 2.3592), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:02,398 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.2491 (max= 2.3592), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:02,398 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.2491 (max= 2.3592), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:02,398 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.2491 (max= 2.3592), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:02,398 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.2491 (max= 2.3592), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:02,398 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.2491 (max= 2.3592), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:02,398 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.2491 (max= 2.3592), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:02,398 - root - INFO - Step 11950: lr=1.00E-05, loss= 1.2491 (max= 2.3592), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:20,423 - 
root - INFO - Step 11960: lr=1.00E-05, loss= 1.2140 (max= 2.0473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:20,423 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.2140 (max= 2.0473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:20,423 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.2140 (max= 2.0473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:20,423 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.2140 (max= 2.0473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:20,423 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.2140 (max= 2.0473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:20,423 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.2140 (max= 2.0473), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:20,423 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.2140 (max= 2.0473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:20,423 - root - INFO - Step 11960: lr=1.00E-05, loss= 1.2140 (max= 2.0473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:38,446 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.2542 (max= 3.2537), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:27:38,447 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.2542 (max= 3.2537), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:27:38,447 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.2542 (max= 3.2537), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:27:38,447 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.2542 (max= 3.2537), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:27:38,447 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.2542 (max= 3.2537), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:27:38,447 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.2542 (max= 3.2537), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:27:38,447 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.2542 (max= 3.2537), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:27:38,447 - root - INFO - Step 11970: lr=1.00E-05, loss= 1.2542 (max= 3.2537), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:27:56,471 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1862 (max= 2.1911), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:56,472 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1862 (max= 2.1911), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:56,472 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1862 (max= 2.1911), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:56,472 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1862 (max= 2.1911), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:56,472 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1862 (max= 2.1911), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:56,472 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1862 (max= 2.1911), tps=18183, mfu=37.88%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:56,472 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1862 (max= 2.1911), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:27:56,472 - root - INFO - Step 11980: lr=1.00E-05, loss= 1.1862 (max= 2.1911), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:14,486 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.2401 (max= 2.2675), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:14,487 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.2401 (max= 2.2675), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:14,487 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.2401 (max= 2.2675), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:14,487 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.2401 (max= 2.2675), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:14,487 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.2401 (max= 2.2675), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:14,487 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.2401 (max= 2.2675), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:14,487 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.2401 (max= 2.2675), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:14,487 - root - INFO - Step 11990: lr=1.00E-05, loss= 1.2401 (max= 2.2675), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-12000 
+2025-10-24 15:28:32,511 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.2519 (max= 2.1474), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:32,512 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-24 15:28:32,512 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:28:32,512 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.2519 (max= 2.1474), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:32,512 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-24 15:28:32,512 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:28:32,512 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.2519 (max= 2.1474), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:32,512 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.2519 (max= 2.1474), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:32,512 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-24 15:28:32,512 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-24 15:28:32,512 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:28:32,512 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:28:32,512 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.2519 (max= 2.1474), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:32,512 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.2519 (max= 2.1474), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 15:28:32,512 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-24 15:28:32,512 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:28:32,512 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-24 15:28:32,512 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:28:32,512 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.2519 (max= 2.1474), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:32,513 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-24 15:28:32,513 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:28:32,513 - root - INFO - Step 12000: lr=1.00E-05, loss= 1.2519 (max= 2.1474), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:28:32,513 - root - INFO - Saving a full checkpoint at step 12000 +2025-10-24 15:28:32,513 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-12000! 
Save time: 4.656987190246582 +2025-10-24 15:28:50,173 - root - INFO - Finished saving the checkpoint in 17.66 seconds +2025-10-24 15:28:50,181 - root - INFO - Finished saving the checkpoint in 17.67 seconds +2025-10-24 15:28:50,182 - root - INFO - Finished saving the checkpoint in 17.67 seconds +2025-10-24 15:28:50,182 - root - INFO - Finished saving the checkpoint in 17.67 seconds +2025-10-24 15:28:50,182 - root - INFO - Finished saving the checkpoint in 17.67 seconds +2025-10-24 15:28:50,182 - root - INFO - Finished saving the checkpoint in 17.67 seconds +2025-10-24 15:28:50,183 - root - INFO - Finished saving the checkpoint in 17.67 seconds +2025-10-24 15:28:50,183 - root - INFO - Finished saving the checkpoint in 17.67 seconds +2025-10-24 15:29:08,164 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.2326 (max= 2.3195), tps=9193, mfu=19.15%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 15:29:08,164 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.2326 (max= 2.3195), tps=9193, mfu=19.15%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 15:29:08,164 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.2326 (max= 2.3195), tps=9193, mfu=19.15%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 15:29:08,164 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.2326 (max= 2.3195), tps=9193, mfu=19.15%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 15:29:08,164 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.2326 (max= 2.3195), tps=9193, mfu=19.15%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 15:29:08,164 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.2326 (max= 2.3195), tps=9193, mfu=19.15%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 15:29:08,164 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.2326 (max= 2.3195), tps=9193, mfu=19.15%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 15:29:08,168 - root - INFO - Step 12010: lr=1.00E-05, loss= 1.2326 (max= 2.3195), tps=9192, mfu=19.15%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 15:29:26,222 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1958 (max= 2.2837), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:26,222 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1958 (max= 2.2837), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:26,222 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1958 (max= 2.2837), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:26,222 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1958 (max= 2.2837), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:26,222 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1958 (max= 2.2837), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:26,222 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1958 (max= 2.2837), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:26,222 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1958 (max= 2.2837), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:26,222 - root - INFO - Step 12020: lr=1.00E-05, loss= 1.1958 (max= 2.2837), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:44,273 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.2115 (max= 2.1080), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:44,274 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.2115 (max= 2.1080), tps=18156, 
mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:44,274 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.2115 (max= 2.1080), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:44,274 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.2115 (max= 2.1080), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:44,274 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.2115 (max= 2.1080), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:44,274 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.2115 (max= 2.1080), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:44,274 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.2115 (max= 2.1080), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:29:44,274 - root - INFO - Step 12030: lr=1.00E-05, loss= 1.2115 (max= 2.1080), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:02,323 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.2250 (max= 2.1431), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:02,323 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.2250 (max= 2.1431), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:02,323 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.2250 (max= 2.1431), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:02,324 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.2250 (max= 2.1431), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:02,324 - root - INFO - Step 12040: lr=1.00E-05, 
loss= 1.2250 (max= 2.1431), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:02,324 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.2250 (max= 2.1431), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:02,324 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.2250 (max= 2.1431), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:02,324 - root - INFO - Step 12040: lr=1.00E-05, loss= 1.2250 (max= 2.1431), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:20,312 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.2259 (max= 2.1313), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:20,312 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.2259 (max= 2.1313), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:20,312 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.2259 (max= 2.1313), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:20,312 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.2259 (max= 2.1313), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:20,312 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.2259 (max= 2.1313), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:20,312 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.2259 (max= 2.1313), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:20,313 - root - INFO - Step 12050: lr=1.00E-05, loss= 1.2259 (max= 2.1313), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:20,313 - 
root - INFO - Step 12050: lr=1.00E-05, loss= 1.2259 (max= 2.1313), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:21,405 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:5119734 +2025-10-24 15:30:38,345 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1931 (max= 2.1951), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:38,345 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1931 (max= 2.1951), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:38,345 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1931 (max= 2.1951), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:38,345 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1931 (max= 2.1951), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:38,345 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1931 (max= 2.1951), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:38,345 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1931 (max= 2.1951), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:38,345 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1931 (max= 2.1951), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:38,346 - root - INFO - Step 12060: lr=1.00E-05, loss= 1.1931 (max= 2.1951), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:56,410 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.2264 (max= 2.1764), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:30:56,410 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.2264 (max= 2.1764), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:56,410 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.2264 (max= 2.1764), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:56,410 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.2264 (max= 2.1764), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:56,410 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.2264 (max= 2.1764), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:56,410 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.2264 (max= 2.1764), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:56,410 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.2264 (max= 2.1764), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:30:56,410 - root - INFO - Step 12070: lr=1.00E-05, loss= 1.2264 (max= 2.1764), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:14,456 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.2369 (max= 2.1133), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:14,456 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.2369 (max= 2.1133), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:14,456 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.2369 (max= 2.1133), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:14,456 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.2369 (max= 2.1133), tps=18167, mfu=37.85%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:14,456 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.2369 (max= 2.1133), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:14,456 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.2369 (max= 2.1133), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:14,456 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.2369 (max= 2.1133), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:14,459 - root - INFO - Step 12080: lr=1.00E-05, loss= 1.2369 (max= 2.1133), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:32,441 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.2431 (max= 2.2958), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:32,441 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.2431 (max= 2.2958), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:32,441 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.2431 (max= 2.2958), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:32,441 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.2431 (max= 2.2958), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:32,441 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.2431 (max= 2.2958), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:32,441 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.2431 (max= 2.2958), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:32,441 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.2431 (max= 
2.2958), tps=18223, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:32,441 - root - INFO - Step 12090: lr=1.00E-05, loss= 1.2431 (max= 2.2958), tps=18226, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:50,483 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.2085 (max= 2.0226), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:50,483 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.2085 (max= 2.0226), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:50,483 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.2085 (max= 2.0226), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:50,483 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.2085 (max= 2.0226), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:50,483 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.2085 (max= 2.0226), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:50,483 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.2085 (max= 2.0226), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:50,483 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.2085 (max= 2.0226), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:31:50,483 - root - INFO - Step 12100: lr=1.00E-05, loss= 1.2085 (max= 2.0226), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:08,537 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.2009 (max= 2.0789), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:08,537 - root - INFO - Step 
12110: lr=1.00E-05, loss= 1.2009 (max= 2.0789), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:08,537 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.2009 (max= 2.0789), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:08,537 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.2009 (max= 2.0789), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:08,537 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.2009 (max= 2.0789), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:08,537 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.2009 (max= 2.0789), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:08,537 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.2009 (max= 2.0789), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:08,538 - root - INFO - Step 12110: lr=1.00E-05, loss= 1.2009 (max= 2.0789), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:26,598 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.2339 (max= 2.4315), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:26,598 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.2339 (max= 2.4315), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:26,598 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.2339 (max= 2.4315), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:26,598 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.2339 (max= 2.4315), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 15:32:26,598 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.2339 (max= 2.4315), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:26,598 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.2339 (max= 2.4315), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:26,598 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.2339 (max= 2.4315), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:26,598 - root - INFO - Step 12120: lr=1.00E-05, loss= 1.2339 (max= 2.4315), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:44,625 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.2267 (max= 2.0833), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:44,625 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.2267 (max= 2.0833), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:44,625 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.2267 (max= 2.0833), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:44,625 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.2267 (max= 2.0833), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:44,625 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.2267 (max= 2.0833), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:44,625 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.2267 (max= 2.0833), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:44,625 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.2267 (max= 2.0833), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:32:44,626 - root - INFO - Step 12130: lr=1.00E-05, loss= 1.2267 (max= 2.0833), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:33:02,644 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.2297 (max= 2.3349), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:02,644 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.2297 (max= 2.3349), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:02,644 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.2297 (max= 2.3349), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:02,644 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.2297 (max= 2.3349), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:02,644 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.2297 (max= 2.3349), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:02,644 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.2297 (max= 2.3349), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:02,644 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.2297 (max= 2.3349), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:02,645 - root - INFO - Step 12140: lr=1.00E-05, loss= 1.2297 (max= 2.3349), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:20,663 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.2311 (max= 2.4160), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:33:20,663 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.2311 (max= 2.4160), tps=18189, 
mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:33:20,663 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.2311 (max= 2.4160), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:33:20,663 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.2311 (max= 2.4160), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:33:20,663 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.2311 (max= 2.4160), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:33:20,663 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.2311 (max= 2.4160), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:33:20,663 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.2311 (max= 2.4160), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:33:20,663 - root - INFO - Step 12150: lr=1.00E-05, loss= 1.2311 (max= 2.4160), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:33:38,686 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.2559 (max= 2.8867), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:38,687 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.2559 (max= 2.8867), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:38,687 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.2559 (max= 2.8867), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:38,687 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.2559 (max= 2.8867), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:38,687 - root - INFO - Step 12160: lr=1.00E-05, 
loss= 1.2559 (max= 2.8867), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:38,687 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.2559 (max= 2.8867), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:38,687 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.2559 (max= 2.8867), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:38,687 - root - INFO - Step 12160: lr=1.00E-05, loss= 1.2559 (max= 2.8867), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:56,697 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1884 (max= 2.0959), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:56,697 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1884 (max= 2.0959), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:56,697 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1884 (max= 2.0959), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:56,697 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1884 (max= 2.0959), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:56,697 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1884 (max= 2.0959), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:56,697 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1884 (max= 2.0959), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:56,698 - root - INFO - Step 12170: lr=1.00E-05, loss= 1.1884 (max= 2.0959), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:33:56,698 - 
root - INFO - Step 12170: lr=1.00E-05, loss= 1.1884 (max= 2.0959), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:34:14,743 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.2532 (max= 2.1607), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:34:14,743 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.2532 (max= 2.1607), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:34:14,743 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.2532 (max= 2.1607), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:34:14,743 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.2532 (max= 2.1607), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:34:14,743 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.2532 (max= 2.1607), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:34:14,743 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.2532 (max= 2.1607), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:34:14,743 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.2532 (max= 2.1607), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:34:14,743 - root - INFO - Step 12180: lr=1.00E-05, loss= 1.2532 (max= 2.1607), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:34:32,747 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.2207 (max= 2.4678), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:32,748 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.2207 (max= 2.4678), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:34:32,748 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.2207 (max= 2.4678), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:32,748 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.2207 (max= 2.4678), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:32,748 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.2207 (max= 2.4678), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:32,748 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.2207 (max= 2.4678), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:32,748 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.2207 (max= 2.4678), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:32,748 - root - INFO - Step 12190: lr=1.00E-05, loss= 1.2207 (max= 2.4678), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:50,797 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.2602 (max= 2.3080), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:50,797 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.2602 (max= 2.3080), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:50,797 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.2602 (max= 2.3080), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:50,797 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.2602 (max= 2.3080), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:50,797 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.2602 (max= 2.3080), tps=18158, mfu=37.83%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:50,797 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.2602 (max= 2.3080), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:50,797 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.2602 (max= 2.3080), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:34:50,797 - root - INFO - Step 12200: lr=1.00E-05, loss= 1.2602 (max= 2.3080), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:08,816 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.2262 (max= 2.1274), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:35:08,816 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.2262 (max= 2.1274), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:35:08,816 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.2262 (max= 2.1274), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:35:08,816 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.2262 (max= 2.1274), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:35:08,816 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.2262 (max= 2.1274), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:35:08,816 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.2262 (max= 2.1274), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:35:08,816 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.2262 (max= 2.1274), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:35:08,817 - root - INFO - Step 12210: lr=1.00E-05, loss= 1.2262 (max= 
2.1274), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:35:26,854 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.1916 (max= 3.5741), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:26,854 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.1916 (max= 3.5741), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:26,854 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.1916 (max= 3.5741), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:26,854 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.1916 (max= 3.5741), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:26,854 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.1916 (max= 3.5741), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:26,854 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.1916 (max= 3.5741), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:26,854 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.1916 (max= 3.5741), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:26,854 - root - INFO - Step 12220: lr=1.00E-05, loss= 1.1916 (max= 3.5741), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:44,869 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.2210 (max= 2.0881), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:44,869 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.2210 (max= 2.0881), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:44,869 - root - INFO - Step 
12230: lr=1.00E-05, loss= 1.2210 (max= 2.0881), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:44,869 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.2210 (max= 2.0881), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:44,869 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.2210 (max= 2.0881), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:44,869 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.2210 (max= 2.0881), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:44,869 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.2210 (max= 2.0881), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:35:44,869 - root - INFO - Step 12230: lr=1.00E-05, loss= 1.2210 (max= 2.0881), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:02,930 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.2510 (max= 2.3446), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:36:02,930 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.2510 (max= 2.3446), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:36:02,930 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.2510 (max= 2.3446), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:36:02,930 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.2510 (max= 2.3446), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:36:02,930 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.2510 (max= 2.3446), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 15:36:02,930 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.2510 (max= 2.3446), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:36:02,930 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.2510 (max= 2.3446), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:36:02,930 - root - INFO - Step 12240: lr=1.00E-05, loss= 1.2510 (max= 2.3446), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:36:20,961 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.2337 (max= 2.1642), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:20,961 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.2337 (max= 2.1642), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:20,961 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.2337 (max= 2.1642), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:20,961 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.2337 (max= 2.1642), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:20,961 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.2337 (max= 2.1642), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:20,961 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.2337 (max= 2.1642), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:20,961 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.2337 (max= 2.1642), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:20,961 - root - INFO - Step 12250: lr=1.00E-05, loss= 1.2337 (max= 2.1642), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:38,991 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.2368 (max= 2.1324), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:38,991 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.2368 (max= 2.1324), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:38,991 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.2368 (max= 2.1324), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:38,992 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.2368 (max= 2.1324), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:38,992 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.2368 (max= 2.1324), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:38,992 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.2368 (max= 2.1324), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:38,992 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.2368 (max= 2.1324), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:38,992 - root - INFO - Step 12260: lr=1.00E-05, loss= 1.2368 (max= 2.1324), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:57,044 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.2498 (max= 3.3510), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:57,044 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.2498 (max= 3.3510), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:57,044 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.2498 (max= 3.3510), tps=18155, 
mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:57,044 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.2498 (max= 3.3510), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:57,044 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.2498 (max= 3.3510), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:57,044 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.2498 (max= 3.3510), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:57,044 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.2498 (max= 3.3510), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:36:57,044 - root - INFO - Step 12270: lr=1.00E-05, loss= 1.2498 (max= 3.3510), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:15,098 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.2469 (max= 2.1401), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:37:15,098 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.2469 (max= 2.1401), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:37:15,098 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.2469 (max= 2.1401), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:37:15,098 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.2469 (max= 2.1401), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:37:15,098 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.2469 (max= 2.1401), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:37:15,098 - root - INFO - Step 12280: lr=1.00E-05, 
loss= 1.2469 (max= 2.1401), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:37:15,098 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.2469 (max= 2.1401), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:37:15,098 - root - INFO - Step 12280: lr=1.00E-05, loss= 1.2469 (max= 2.1401), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:37:33,159 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.2287 (max= 3.4566), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:33,159 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.2287 (max= 3.4566), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:33,159 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.2287 (max= 3.4566), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:33,159 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.2287 (max= 3.4566), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:33,159 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.2287 (max= 3.4566), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:33,159 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.2287 (max= 3.4566), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:33,159 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.2287 (max= 3.4566), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:33,159 - root - INFO - Step 12290: lr=1.00E-05, loss= 1.2287 (max= 3.4566), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:51,221 - 
root - INFO - Step 12300: lr=1.00E-05, loss= 1.2350 (max= 2.1148), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:51,221 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.2350 (max= 2.1148), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:51,221 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.2350 (max= 2.1148), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:51,221 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.2350 (max= 2.1148), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:51,221 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.2350 (max= 2.1148), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:51,221 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.2350 (max= 2.1148), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:51,221 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.2350 (max= 2.1148), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:37:51,221 - root - INFO - Step 12300: lr=1.00E-05, loss= 1.2350 (max= 2.1148), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:09,234 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.2553 (max= 2.2368), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:09,234 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.2553 (max= 2.2368), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:09,234 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.2553 (max= 2.2368), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:38:09,234 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.2553 (max= 2.2368), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:09,234 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.2553 (max= 2.2368), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:09,234 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.2553 (max= 2.2368), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:09,234 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.2553 (max= 2.2368), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:09,234 - root - INFO - Step 12310: lr=1.00E-05, loss= 1.2553 (max= 2.2368), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:27,262 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.2290 (max= 2.3896), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:27,262 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.2290 (max= 2.3896), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:27,262 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.2290 (max= 2.3896), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:27,262 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.2290 (max= 2.3896), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:27,262 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.2290 (max= 2.3896), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:27,262 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.2290 (max= 2.3896), tps=18180, mfu=37.88%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:27,262 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.2290 (max= 2.3896), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:27,262 - root - INFO - Step 12320: lr=1.00E-05, loss= 1.2290 (max= 2.3896), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:45,282 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.2258 (max= 2.0140), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:45,282 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.2258 (max= 2.0140), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:45,282 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.2258 (max= 2.0140), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:45,282 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.2258 (max= 2.0140), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:45,282 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.2258 (max= 2.0140), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:45,282 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.2258 (max= 2.0140), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:45,282 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.2258 (max= 2.0140), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:38:45,282 - root - INFO - Step 12330: lr=1.00E-05, loss= 1.2258 (max= 2.0140), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:03,307 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.2203 (max= 
2.0799), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:03,307 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.2203 (max= 2.0799), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:03,307 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.2203 (max= 2.0799), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:03,307 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.2203 (max= 2.0799), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:03,308 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.2203 (max= 2.0799), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:03,308 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.2203 (max= 2.0799), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:03,308 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.2203 (max= 2.0799), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:03,308 - root - INFO - Step 12340: lr=1.00E-05, loss= 1.2203 (max= 2.0799), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:21,316 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.2280 (max= 2.0709), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:21,316 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.2280 (max= 2.0709), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:21,316 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.2280 (max= 2.0709), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:21,316 - root - INFO - Step 
12350: lr=1.00E-05, loss= 1.2280 (max= 2.0709), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:21,316 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.2280 (max= 2.0709), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:21,316 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.2280 (max= 2.0709), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:21,316 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.2280 (max= 2.0709), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:21,316 - root - INFO - Step 12350: lr=1.00E-05, loss= 1.2280 (max= 2.0709), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:39,341 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.2364 (max= 2.1124), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:39,341 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.2364 (max= 2.1124), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:39,341 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.2364 (max= 2.1124), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:39,341 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.2364 (max= 2.1124), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:39,341 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.2364 (max= 2.1124), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:39,341 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.2364 (max= 2.1124), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 15:39:39,341 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.2364 (max= 2.1124), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:39,342 - root - INFO - Step 12360: lr=1.00E-05, loss= 1.2364 (max= 2.1124), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:57,381 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.2248 (max= 2.3808), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:57,381 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.2248 (max= 2.3808), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:57,381 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.2248 (max= 2.3808), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:57,381 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.2248 (max= 2.3808), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:57,381 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.2248 (max= 2.3808), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:57,381 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.2248 (max= 2.3808), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:57,382 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.2248 (max= 2.3808), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:39:57,382 - root - INFO - Step 12370: lr=1.00E-05, loss= 1.2248 (max= 2.3808), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:15,437 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.2325 (max= 2.2201), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:15,437 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.2325 (max= 2.2201), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:15,437 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.2325 (max= 2.2201), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:15,437 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.2325 (max= 2.2201), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:15,437 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.2325 (max= 2.2201), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:15,437 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.2325 (max= 2.2201), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:15,437 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.2325 (max= 2.2201), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:15,437 - root - INFO - Step 12380: lr=1.00E-05, loss= 1.2325 (max= 2.2201), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:33,512 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.2421 (max= 3.1821), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:33,512 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.2421 (max= 3.1821), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:33,512 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.2421 (max= 3.1821), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:33,512 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.2421 (max= 3.1821), tps=18134, 
mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:33,512 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.2421 (max= 3.1821), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:33,513 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.2421 (max= 3.1821), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:33,513 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.2421 (max= 3.1821), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:33,513 - root - INFO - Step 12390: lr=1.00E-05, loss= 1.2421 (max= 3.1821), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:51,529 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.2294 (max= 2.2783), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:51,529 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.2294 (max= 2.2783), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:51,529 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.2294 (max= 2.2783), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:51,529 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.2294 (max= 2.2783), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:51,529 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.2294 (max= 2.2783), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:51,529 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.2294 (max= 2.2783), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:51,529 - root - INFO - Step 12400: lr=1.00E-05, 
loss= 1.2294 (max= 2.2783), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:40:51,529 - root - INFO - Step 12400: lr=1.00E-05, loss= 1.2294 (max= 2.2783), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:09,532 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.2065 (max= 3.2521), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:09,532 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.2065 (max= 3.2521), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:09,532 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.2065 (max= 3.2521), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:09,532 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.2065 (max= 3.2521), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:09,532 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.2065 (max= 3.2521), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:09,532 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.2065 (max= 3.2521), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:09,532 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.2065 (max= 3.2521), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:09,533 - root - INFO - Step 12410: lr=1.00E-05, loss= 1.2065 (max= 3.2521), tps=18205, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:27,556 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.2199 (max= 2.4418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:27,556 - 
root - INFO - Step 12420: lr=1.00E-05, loss= 1.2199 (max= 2.4418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:27,557 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.2199 (max= 2.4418), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:27,557 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.2199 (max= 2.4418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:27,557 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.2199 (max= 2.4418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:27,557 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.2199 (max= 2.4418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:27,557 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.2199 (max= 2.4418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:27,557 - root - INFO - Step 12420: lr=1.00E-05, loss= 1.2199 (max= 2.4418), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:45,592 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.2579 (max= 2.1994), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:45,592 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.2579 (max= 2.1994), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:45,592 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.2579 (max= 2.1994), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:45,592 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.2579 (max= 2.1994), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:41:45,592 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.2579 (max= 2.1994), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:45,592 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.2579 (max= 2.1994), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:45,592 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.2579 (max= 2.1994), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:41:45,595 - root - INFO - Step 12430: lr=1.00E-05, loss= 1.2579 (max= 2.1994), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:03,653 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.2191 (max= 2.2309), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:03,653 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.2191 (max= 2.2309), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:03,653 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.2191 (max= 2.2309), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:03,653 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.2191 (max= 2.2309), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:03,653 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.2191 (max= 2.2309), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:03,653 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.2191 (max= 2.2309), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:03,653 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.2191 (max= 2.2309), tps=18147, mfu=37.81%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:03,653 - root - INFO - Step 12440: lr=1.00E-05, loss= 1.2191 (max= 2.2309), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:21,720 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.2377 (max= 2.1514), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:21,720 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.2377 (max= 2.1514), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:21,721 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.2377 (max= 2.1514), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:21,721 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.2377 (max= 2.1514), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:21,721 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.2377 (max= 2.1514), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:21,721 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.2377 (max= 2.1514), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:21,721 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.2377 (max= 2.1514), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:21,721 - root - INFO - Step 12450: lr=1.00E-05, loss= 1.2377 (max= 2.1514), tps=18140, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:39,760 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.2127 (max= 2.0882), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:39,760 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.2127 (max= 
2.0882), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:39,760 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.2127 (max= 2.0882), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:39,760 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.2127 (max= 2.0882), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:39,760 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.2127 (max= 2.0882), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:39,760 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.2127 (max= 2.0882), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:39,760 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.2127 (max= 2.0882), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:39,760 - root - INFO - Step 12460: lr=1.00E-05, loss= 1.2127 (max= 2.0882), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:57,821 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.2355 (max= 2.1649), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:57,822 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.2355 (max= 2.1649), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:57,822 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.2355 (max= 2.1649), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:57,822 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.2355 (max= 2.1649), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:57,822 - root - INFO - Step 
12470: lr=1.00E-05, loss= 1.2355 (max= 2.1649), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:57,822 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.2355 (max= 2.1649), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:57,822 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.2355 (max= 2.1649), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:42:57,822 - root - INFO - Step 12470: lr=1.00E-05, loss= 1.2355 (max= 2.1649), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:15,857 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.2231 (max= 2.1342), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:15,857 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.2231 (max= 2.1342), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:15,857 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.2231 (max= 2.1342), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:15,857 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.2231 (max= 2.1342), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:15,857 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.2231 (max= 2.1342), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:15,857 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.2231 (max= 2.1342), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:15,857 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.2231 (max= 2.1342), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 15:43:15,857 - root - INFO - Step 12480: lr=1.00E-05, loss= 1.2231 (max= 2.1342), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:33,924 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.2132 (max= 1.9739), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:33,924 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.2132 (max= 1.9739), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:33,924 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.2132 (max= 1.9739), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:33,924 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.2132 (max= 1.9739), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:33,924 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.2132 (max= 1.9739), tps=18141, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:33,925 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.2132 (max= 1.9739), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:33,925 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.2132 (max= 1.9739), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:33,925 - root - INFO - Step 12490: lr=1.00E-05, loss= 1.2132 (max= 1.9739), tps=18140, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:47,604 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:3824482 +2025-10-24 15:43:51,929 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.2295 (max= 2.6264), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:51,929 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.2295 (max= 2.6264), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:51,929 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.2295 (max= 2.6264), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:51,929 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.2295 (max= 2.6264), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:51,929 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.2295 (max= 2.6264), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:51,930 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.2295 (max= 2.6264), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:51,930 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.2295 (max= 2.6264), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:43:51,930 - root - INFO - Step 12500: lr=1.00E-05, loss= 1.2295 (max= 2.6264), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:09,963 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.2114 (max= 2.1646), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:09,963 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.2114 (max= 2.1646), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:09,964 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.2114 (max= 2.1646), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:09,964 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.2114 (max= 2.1646), tps=18174, 
mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:09,964 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.2114 (max= 2.1646), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:09,964 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.2114 (max= 2.1646), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:09,964 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.2114 (max= 2.1646), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:09,964 - root - INFO - Step 12510: lr=1.00E-05, loss= 1.2114 (max= 2.1646), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:28,023 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.2642 (max= 2.2458), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:28,023 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.2642 (max= 2.2458), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:28,023 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.2642 (max= 2.2458), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:28,023 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.2642 (max= 2.2458), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:28,023 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.2642 (max= 2.2458), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:28,023 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.2642 (max= 2.2458), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:28,023 - root - INFO - Step 12520: lr=1.00E-05, 
loss= 1.2642 (max= 2.2458), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:28,023 - root - INFO - Step 12520: lr=1.00E-05, loss= 1.2642 (max= 2.2458), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:44:46,071 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.1644 (max= 2.0530), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:44:46,072 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.1644 (max= 2.0530), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:44:46,072 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.1644 (max= 2.0530), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:44:46,072 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.1644 (max= 2.0530), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:44:46,072 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.1644 (max= 2.0530), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:44:46,072 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.1644 (max= 2.0530), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:44:46,072 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.1644 (max= 2.0530), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:44:46,072 - root - INFO - Step 12530: lr=1.00E-05, loss= 1.1644 (max= 2.0530), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:45:04,107 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.2252 (max= 2.4009), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:04,107 - 
root - INFO - Step 12540: lr=1.00E-05, loss= 1.2252 (max= 2.4009), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:04,107 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.2252 (max= 2.4009), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:04,107 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.2252 (max= 2.4009), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:04,107 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.2252 (max= 2.4009), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:04,107 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.2252 (max= 2.4009), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:04,107 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.2252 (max= 2.4009), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:04,107 - root - INFO - Step 12540: lr=1.00E-05, loss= 1.2252 (max= 2.4009), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:22,123 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.2249 (max= 3.8188), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:45:22,123 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.2249 (max= 3.8188), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:45:22,123 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.2249 (max= 3.8188), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:45:22,123 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.2249 (max= 3.8188), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 15:45:22,123 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.2249 (max= 3.8188), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:45:22,123 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.2249 (max= 3.8188), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:45:22,123 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.2249 (max= 3.8188), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:45:22,123 - root - INFO - Step 12550: lr=1.00E-05, loss= 1.2249 (max= 3.8188), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:45:40,166 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.2546 (max= 2.3843), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:40,166 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.2546 (max= 2.3843), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:40,166 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.2546 (max= 2.3843), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:40,166 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.2546 (max= 2.3843), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:40,166 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.2546 (max= 2.3843), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:40,166 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.2546 (max= 2.3843), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:40,166 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.2546 (max= 2.3843), tps=18165, mfu=37.85%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:40,166 - root - INFO - Step 12560: lr=1.00E-05, loss= 1.2546 (max= 2.3843), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:58,238 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.2088 (max= 2.0448), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:58,238 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.2088 (max= 2.0448), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:58,238 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.2088 (max= 2.0448), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:58,238 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.2088 (max= 2.0448), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:58,238 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.2088 (max= 2.0448), tps=18135, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:58,238 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.2088 (max= 2.0448), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:58,238 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.2088 (max= 2.0448), tps=18135, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:45:58,238 - root - INFO - Step 12570: lr=1.00E-05, loss= 1.2088 (max= 2.0448), tps=18135, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:16,271 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.2516 (max= 2.2447), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:16,271 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.2516 (max= 
2.2447), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:16,271 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.2516 (max= 2.2447), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:16,271 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.2516 (max= 2.2447), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:16,271 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.2516 (max= 2.2447), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:16,271 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.2516 (max= 2.2447), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:16,271 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.2516 (max= 2.2447), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:16,271 - root - INFO - Step 12580: lr=1.00E-05, loss= 1.2516 (max= 2.2447), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:34,301 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.2159 (max= 3.4894), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:34,301 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.2159 (max= 3.4894), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:34,301 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.2159 (max= 3.4894), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:34,301 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.2159 (max= 3.4894), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:34,301 - root - INFO - Step 
12590: lr=1.00E-05, loss= 1.2159 (max= 3.4894), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:34,301 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.2159 (max= 3.4894), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:34,301 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.2159 (max= 3.4894), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:34,302 - root - INFO - Step 12590: lr=1.00E-05, loss= 1.2159 (max= 3.4894), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:46:52,306 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.2180 (max= 2.6378), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:46:52,306 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.2180 (max= 2.6378), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:46:52,306 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.2180 (max= 2.6378), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:46:52,306 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.2180 (max= 2.6378), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:46:52,306 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.2180 (max= 2.6378), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:46:52,306 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.2180 (max= 2.6378), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:46:52,306 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.2180 (max= 2.6378), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 15:46:52,306 - root - INFO - Step 12600: lr=1.00E-05, loss= 1.2180 (max= 2.6378), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:10,358 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.2156 (max= 2.6189), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:10,358 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.2156 (max= 2.6189), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:10,358 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.2156 (max= 2.6189), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:10,358 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.2156 (max= 2.6189), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:10,358 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.2156 (max= 2.6189), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:10,358 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.2156 (max= 2.6189), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:10,358 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.2156 (max= 2.6189), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:10,358 - root - INFO - Step 12610: lr=1.00E-05, loss= 1.2156 (max= 2.6189), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:28,408 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.2348 (max= 2.4847), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:28,408 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.2348 (max= 2.4847), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:28,408 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.2348 (max= 2.4847), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:28,408 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.2348 (max= 2.4847), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:28,408 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.2348 (max= 2.4847), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:28,408 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.2348 (max= 2.4847), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:28,409 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.2348 (max= 2.4847), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:28,409 - root - INFO - Step 12620: lr=1.00E-05, loss= 1.2348 (max= 2.4847), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:46,425 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.2489 (max= 2.2553), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:46,425 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.2489 (max= 2.2553), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:46,425 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.2489 (max= 2.2553), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:46,425 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.2489 (max= 2.2553), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:46,425 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.2489 (max= 2.2553), tps=18191, 
mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:46,425 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.2489 (max= 2.2553), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:46,425 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.2489 (max= 2.2553), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:47:46,425 - root - INFO - Step 12630: lr=1.00E-05, loss= 1.2489 (max= 2.2553), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:04,454 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.2105 (max= 2.2076), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:04,455 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.2105 (max= 2.2076), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:04,455 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.2105 (max= 2.2076), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:04,455 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.2105 (max= 2.2076), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:04,455 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.2105 (max= 2.2076), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:04,455 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.2105 (max= 2.2076), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:04,455 - root - INFO - Step 12640: lr=1.00E-05, loss= 1.2105 (max= 2.2076), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:04,455 - root - INFO - Step 12640: lr=1.00E-05, 
loss= 1.2105 (max= 2.2076), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:22,507 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1828 (max= 2.1905), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:22,507 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1828 (max= 2.1905), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:22,507 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1828 (max= 2.1905), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:22,507 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1828 (max= 2.1905), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:22,507 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1828 (max= 2.1905), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:22,507 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1828 (max= 2.1905), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:22,507 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1828 (max= 2.1905), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:22,507 - root - INFO - Step 12650: lr=1.00E-05, loss= 1.1828 (max= 2.1905), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:40,528 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.2377 (max= 2.2437), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:40,528 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.2377 (max= 2.2437), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:40,528 - 
root - INFO - Step 12660: lr=1.00E-05, loss= 1.2377 (max= 2.2437), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:40,529 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.2377 (max= 2.2437), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:40,529 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.2377 (max= 2.2437), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:40,529 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.2377 (max= 2.2437), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:40,529 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.2377 (max= 2.2437), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:40,529 - root - INFO - Step 12660: lr=1.00E-05, loss= 1.2377 (max= 2.2437), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:48:58,558 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.2280 (max= 2.1449), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:58,558 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.2280 (max= 2.1449), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:58,558 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.2280 (max= 2.1449), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:58,558 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.2280 (max= 2.1449), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:58,558 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.2280 (max= 2.1449), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:48:58,558 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.2280 (max= 2.1449), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:58,558 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.2280 (max= 2.1449), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:48:58,558 - root - INFO - Step 12670: lr=1.00E-05, loss= 1.2280 (max= 2.1449), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:49:16,575 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.2314 (max= 2.3416), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:49:16,575 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.2314 (max= 2.3416), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:49:16,575 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.2314 (max= 2.3416), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:49:16,575 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.2314 (max= 2.3416), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:49:16,575 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.2314 (max= 2.3416), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:49:16,575 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.2314 (max= 2.3416), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:49:16,575 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.2314 (max= 2.3416), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:49:16,575 - root - INFO - Step 12680: lr=1.00E-05, loss= 1.2314 (max= 2.3416), tps=18191, mfu=37.90%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:49:34,627 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.2263 (max= 2.1095), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:34,627 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.2263 (max= 2.1095), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:34,627 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.2263 (max= 2.1095), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:34,627 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.2263 (max= 2.1095), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:34,627 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.2263 (max= 2.1095), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:34,627 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.2263 (max= 2.1095), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:34,627 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.2263 (max= 2.1095), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:34,627 - root - INFO - Step 12690: lr=1.00E-05, loss= 1.2263 (max= 2.1095), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:52,664 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.2357 (max= 2.0922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:52,664 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.2357 (max= 2.0922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:52,664 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.2357 (max= 
2.0922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:52,664 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.2357 (max= 2.0922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:52,664 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.2357 (max= 2.0922), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:52,664 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.2357 (max= 2.0922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:52,664 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.2357 (max= 2.0922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:49:52,664 - root - INFO - Step 12700: lr=1.00E-05, loss= 1.2357 (max= 2.0922), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:10,659 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.2303 (max= 2.1501), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:10,659 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.2303 (max= 2.1501), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:10,659 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.2303 (max= 2.1501), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:10,659 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.2303 (max= 2.1501), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:10,659 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.2303 (max= 2.1501), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:10,659 - root - INFO - Step 
12710: lr=1.00E-05, loss= 1.2303 (max= 2.1501), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:10,659 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.2303 (max= 2.1501), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:10,660 - root - INFO - Step 12710: lr=1.00E-05, loss= 1.2303 (max= 2.1501), tps=18212, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:28,707 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.2237 (max= 3.4147), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:28,707 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.2237 (max= 3.4147), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:28,708 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.2237 (max= 3.4147), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:28,708 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.2237 (max= 3.4147), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:28,708 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.2237 (max= 3.4147), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:28,708 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.2237 (max= 3.4147), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:28,708 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.2237 (max= 3.4147), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:28,708 - root - INFO - Step 12720: lr=1.00E-05, loss= 1.2237 (max= 3.4147), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 15:50:46,730 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.2525 (max= 2.0169), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:46,730 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.2525 (max= 2.0169), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:46,730 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.2525 (max= 2.0169), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:46,730 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.2525 (max= 2.0169), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:46,730 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.2525 (max= 2.0169), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:46,730 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.2525 (max= 2.0169), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:46,730 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.2525 (max= 2.0169), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:50:46,730 - root - INFO - Step 12730: lr=1.00E-05, loss= 1.2525 (max= 2.0169), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:04,730 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.2357 (max= 2.0725), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:04,730 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.2357 (max= 2.0725), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:04,730 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.2357 (max= 2.0725), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:04,730 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.2357 (max= 2.0725), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:04,730 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.2357 (max= 2.0725), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:04,730 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.2357 (max= 2.0725), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:04,730 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.2357 (max= 2.0725), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:04,731 - root - INFO - Step 12740: lr=1.00E-05, loss= 1.2357 (max= 2.0725), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:22,756 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.2351 (max= 2.6188), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:22,757 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.2351 (max= 2.6188), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:22,757 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.2351 (max= 2.6188), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:22,757 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.2351 (max= 2.6188), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:22,757 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.2351 (max= 2.6188), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:22,757 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.2351 (max= 2.6188), tps=18181, 
mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:22,757 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.2351 (max= 2.6188), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:22,757 - root - INFO - Step 12750: lr=1.00E-05, loss= 1.2351 (max= 2.6188), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:40,774 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.2337 (max= 2.4855), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:40,774 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.2337 (max= 2.4855), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:40,774 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.2337 (max= 2.4855), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:40,775 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.2337 (max= 2.4855), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:40,775 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.2337 (max= 2.4855), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:40,775 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.2337 (max= 2.4855), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:40,775 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.2337 (max= 2.4855), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:40,775 - root - INFO - Step 12760: lr=1.00E-05, loss= 1.2337 (max= 2.4855), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:58,790 - root - INFO - Step 12770: lr=1.00E-05, 
loss= 1.2678 (max= 2.3844), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:58,790 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.2678 (max= 2.3844), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:58,791 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.2678 (max= 2.3844), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:58,791 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.2678 (max= 2.3844), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:58,791 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.2678 (max= 2.3844), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:58,791 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.2678 (max= 2.3844), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:58,791 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.2678 (max= 2.3844), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:51:58,791 - root - INFO - Step 12770: lr=1.00E-05, loss= 1.2678 (max= 2.3844), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:52:16,802 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1981 (max= 2.1619), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:52:16,802 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1981 (max= 2.1619), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:52:16,802 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1981 (max= 2.1619), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:52:16,803 - 
root - INFO - Step 12780: lr=1.00E-05, loss= 1.1981 (max= 2.1619), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:52:16,803 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1981 (max= 2.1619), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:52:16,803 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1981 (max= 2.1619), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:52:16,803 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1981 (max= 2.1619), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:52:16,803 - root - INFO - Step 12780: lr=1.00E-05, loss= 1.1981 (max= 2.1619), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:52:34,834 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1970 (max= 2.1937), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:34,835 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1970 (max= 2.1937), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:34,835 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1970 (max= 2.1937), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:34,835 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1970 (max= 2.1937), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:34,835 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1970 (max= 2.1937), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:34,835 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1970 (max= 2.1937), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:52:34,835 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1970 (max= 2.1937), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:34,835 - root - INFO - Step 12790: lr=1.00E-05, loss= 1.1970 (max= 2.1937), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:52,905 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.2063 (max= 2.4181), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:52,905 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.2063 (max= 2.4181), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:52,906 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.2063 (max= 2.4181), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:52,906 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.2063 (max= 2.4181), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:52,906 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.2063 (max= 2.4181), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:52,906 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.2063 (max= 2.4181), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:52,906 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.2063 (max= 2.4181), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:52:52,906 - root - INFO - Step 12800: lr=1.00E-05, loss= 1.2063 (max= 2.4181), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:10,925 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.2403 (max= 2.6330), tps=18188, mfu=37.90%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:53:10,925 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.2403 (max= 2.6330), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:53:10,925 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.2403 (max= 2.6330), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:53:10,925 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.2403 (max= 2.6330), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:53:10,925 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.2403 (max= 2.6330), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:53:10,925 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.2403 (max= 2.6330), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:53:10,925 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.2403 (max= 2.6330), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:53:10,926 - root - INFO - Step 12810: lr=1.00E-05, loss= 1.2403 (max= 2.6330), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:53:28,948 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.2324 (max= 2.1688), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:28,948 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.2324 (max= 2.1688), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:28,948 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.2324 (max= 2.1688), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:28,948 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.2324 (max= 
2.1688), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:28,948 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.2324 (max= 2.1688), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:28,948 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.2324 (max= 2.1688), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:28,948 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.2324 (max= 2.1688), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:28,949 - root - INFO - Step 12820: lr=1.00E-05, loss= 1.2324 (max= 2.1688), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:46,971 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.2489 (max= 3.6215), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:46,971 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.2489 (max= 3.6215), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:46,972 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.2489 (max= 3.6215), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:46,972 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.2489 (max= 3.6215), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:46,972 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.2489 (max= 3.6215), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:46,972 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.2489 (max= 3.6215), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:46,972 - root - INFO - Step 
12830: lr=1.00E-05, loss= 1.2489 (max= 3.6215), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:53:46,972 - root - INFO - Step 12830: lr=1.00E-05, loss= 1.2489 (max= 3.6215), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:04,995 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.2236 (max= 2.1378), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:54:04,996 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.2236 (max= 2.1378), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:54:04,996 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.2236 (max= 2.1378), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:54:04,996 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.2236 (max= 2.1378), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:54:04,996 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.2236 (max= 2.1378), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:54:04,996 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.2236 (max= 2.1378), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:54:04,996 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.2236 (max= 2.1378), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:54:04,996 - root - INFO - Step 12840: lr=1.00E-05, loss= 1.2236 (max= 2.1378), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:54:23,017 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.2472 (max= 2.1349), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 15:54:23,017 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.2472 (max= 2.1349), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:23,018 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.2472 (max= 2.1349), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:23,018 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.2472 (max= 2.1349), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:23,018 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.2472 (max= 2.1349), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:23,018 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.2472 (max= 2.1349), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:23,018 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.2472 (max= 2.1349), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:23,019 - root - INFO - Step 12850: lr=1.00E-05, loss= 1.2472 (max= 2.1349), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:41,026 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.2485 (max= 2.1603), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:41,026 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.2485 (max= 2.1603), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:41,027 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.2485 (max= 2.1603), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:41,027 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.2485 (max= 2.1603), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:41,027 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.2485 (max= 2.1603), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:41,027 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.2485 (max= 2.1603), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:41,027 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.2485 (max= 2.1603), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:41,027 - root - INFO - Step 12860: lr=1.00E-05, loss= 1.2485 (max= 2.1603), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:59,068 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.2107 (max= 2.2378), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:59,068 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.2107 (max= 2.2378), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:59,068 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.2107 (max= 2.2378), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:59,068 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.2107 (max= 2.2378), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:59,068 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.2107 (max= 2.2378), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:59,068 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.2107 (max= 2.2378), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:59,069 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.2107 (max= 2.2378), tps=18166, 
mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:54:59,069 - root - INFO - Step 12870: lr=1.00E-05, loss= 1.2107 (max= 2.2378), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:17,091 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.2559 (max= 2.4369), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:17,091 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.2559 (max= 2.4369), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:17,092 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.2559 (max= 2.4369), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:17,092 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.2559 (max= 2.4369), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:17,092 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.2559 (max= 2.4369), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:17,092 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.2559 (max= 2.4369), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:17,092 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.2559 (max= 2.4369), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:17,092 - root - INFO - Step 12880: lr=1.00E-05, loss= 1.2559 (max= 2.4369), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:35,127 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.2449 (max= 2.3264), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:35,127 - root - INFO - Step 12890: lr=1.00E-05, 
loss= 1.2449 (max= 2.3264), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:35,127 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.2449 (max= 2.3264), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:35,127 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.2449 (max= 2.3264), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:35,127 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.2449 (max= 2.3264), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:35,127 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.2449 (max= 2.3264), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:35,127 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.2449 (max= 2.3264), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:35,128 - root - INFO - Step 12890: lr=1.00E-05, loss= 1.2449 (max= 2.3264), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:53,178 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.2346 (max= 2.2062), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:53,178 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.2346 (max= 2.2062), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:53,178 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.2346 (max= 2.2062), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:53,178 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.2346 (max= 2.2062), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:53,178 - 
root - INFO - Step 12900: lr=1.00E-05, loss= 1.2346 (max= 2.2062), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:53,179 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.2346 (max= 2.2062), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:53,179 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.2346 (max= 2.2062), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:55:53,179 - root - INFO - Step 12900: lr=1.00E-05, loss= 1.2346 (max= 2.2062), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:11,253 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.2714 (max= 2.1365), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:11,253 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.2714 (max= 2.1365), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:11,253 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.2714 (max= 2.1365), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:11,253 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.2714 (max= 2.1365), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:11,253 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.2714 (max= 2.1365), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:11,253 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.2714 (max= 2.1365), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:11,253 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.2714 (max= 2.1365), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 15:56:11,253 - root - INFO - Step 12910: lr=1.00E-05, loss= 1.2714 (max= 2.1365), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:29,309 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.2094 (max= 2.0835), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:29,309 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.2094 (max= 2.0835), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:29,309 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.2094 (max= 2.0835), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:29,309 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.2094 (max= 2.0835), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:29,309 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.2094 (max= 2.0835), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:29,309 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.2094 (max= 2.0835), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:29,309 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.2094 (max= 2.0835), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:29,309 - root - INFO - Step 12920: lr=1.00E-05, loss= 1.2094 (max= 2.0835), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:47,315 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.2295 (max= 2.4057), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:47,315 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.2295 (max= 2.4057), tps=18202, mfu=37.92%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:47,315 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.2295 (max= 2.4057), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:47,315 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.2295 (max= 2.4057), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:47,315 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.2295 (max= 2.4057), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:47,315 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.2295 (max= 2.4057), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:47,315 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.2295 (max= 2.4057), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:56:47,315 - root - INFO - Step 12930: lr=1.00E-05, loss= 1.2295 (max= 2.4057), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:57:05,335 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.2314 (max= 2.5577), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:05,335 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.2314 (max= 2.5577), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:05,335 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.2314 (max= 2.5577), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:05,335 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.2314 (max= 2.5577), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:05,335 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.2314 (max= 
2.5577), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:05,335 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.2314 (max= 2.5577), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:05,335 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.2314 (max= 2.5577), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:05,335 - root - INFO - Step 12940: lr=1.00E-05, loss= 1.2314 (max= 2.5577), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:23,353 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.2134 (max= 2.5148), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:23,353 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.2134 (max= 2.5148), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:23,353 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.2134 (max= 2.5148), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:23,353 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.2134 (max= 2.5148), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:23,353 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.2134 (max= 2.5148), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:23,353 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.2134 (max= 2.5148), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:23,353 - root - INFO - Step 12950: lr=1.00E-05, loss= 1.2134 (max= 2.5148), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:23,353 - root - INFO - Step 
12950: lr=1.00E-05, loss= 1.2134 (max= 2.5148), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:41,382 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1993 (max= 2.2088), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:57:41,382 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1993 (max= 2.2088), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:57:41,382 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1993 (max= 2.2088), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:57:41,382 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1993 (max= 2.2088), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:57:41,382 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1993 (max= 2.2088), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:57:41,382 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1993 (max= 2.2088), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:57:41,382 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1993 (max= 2.2088), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:57:41,382 - root - INFO - Step 12960: lr=1.00E-05, loss= 1.1993 (max= 2.2088), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:57:59,411 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.2118 (max= 2.2017), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:59,411 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.2118 (max= 2.2017), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 15:57:59,411 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.2118 (max= 2.2017), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:59,411 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.2118 (max= 2.2017), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:59,411 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.2118 (max= 2.2017), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:59,411 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.2118 (max= 2.2017), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:59,411 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.2118 (max= 2.2017), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:57:59,411 - root - INFO - Step 12970: lr=1.00E-05, loss= 1.2118 (max= 2.2017), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:17,459 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.2134 (max= 3.5803), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:17,460 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.2134 (max= 3.5803), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:17,460 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.2134 (max= 3.5803), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:17,460 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.2134 (max= 3.5803), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:17,460 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.2134 (max= 3.5803), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:17,460 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.2134 (max= 3.5803), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:17,460 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.2134 (max= 3.5803), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:17,460 - root - INFO - Step 12980: lr=1.00E-05, loss= 1.2134 (max= 3.5803), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:35,474 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.2552 (max= 2.3849), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:35,475 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.2552 (max= 2.3849), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:35,475 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.2552 (max= 2.3849), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:35,475 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.2552 (max= 2.3849), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:35,475 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.2552 (max= 2.3849), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:35,475 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.2552 (max= 2.3849), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:35,475 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.2552 (max= 2.3849), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:58:35,475 - root - INFO - Step 12990: lr=1.00E-05, loss= 1.2552 (max= 2.3849), tps=18192, 
mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-13000 +2025-10-24 15:58:53,496 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.2344 (max= 2.2421), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:58:53,496 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-24 15:58:53,496 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:58:53,496 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.2344 (max= 2.2421), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:58:53,496 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.2344 (max= 2.2421), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:58:53,496 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.2344 (max= 2.2421), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:58:53,496 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-24 15:58:53,496 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:58:53,496 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-24 15:58:53,496 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:58:53,496 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-24 15:58:53,496 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:58:53,496 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.2344 (max= 2.2421), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:58:53,496 - 
root - INFO - Step 13000: lr=1.00E-05, loss= 1.2344 (max= 2.2421), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:58:53,497 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.2344 (max= 2.2421), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:58:53,497 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-24 15:58:53,497 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:58:53,497 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-24 15:58:53,497 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:58:53,497 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-24 15:58:53,497 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 15:58:53,497 - root - INFO - Step 13000: lr=1.00E-05, loss= 1.2344 (max= 2.2421), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:58:53,497 - root - INFO - Saving a full checkpoint at step 13000 +2025-10-24 15:58:53,497 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-13000! 
Save time: 4.573591709136963 +2025-10-24 15:59:06,967 - root - INFO - Finished saving the checkpoint in 13.47 seconds +2025-10-24 15:59:06,973 - root - INFO - Finished saving the checkpoint in 13.48 seconds +2025-10-24 15:59:06,973 - root - INFO - Finished saving the checkpoint in 13.48 seconds +2025-10-24 15:59:06,973 - root - INFO - Finished saving the checkpoint in 13.48 seconds +2025-10-24 15:59:06,974 - root - INFO - Finished saving the checkpoint in 13.48 seconds +2025-10-24 15:59:06,974 - root - INFO - Finished saving the checkpoint in 13.48 seconds +2025-10-24 15:59:06,974 - root - INFO - Finished saving the checkpoint in 13.48 seconds +2025-10-24 15:59:06,976 - root - INFO - Finished saving the checkpoint in 13.48 seconds +2025-10-24 15:59:24,911 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1948 (max= 2.2442), tps=10432, mfu=21.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:59:24,911 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1948 (max= 2.2442), tps=10432, mfu=21.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:59:24,911 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1948 (max= 2.2442), tps=10432, mfu=21.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:59:24,911 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1948 (max= 2.2442), tps=10432, mfu=21.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:59:24,911 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1948 (max= 2.2442), tps=10432, mfu=21.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:59:24,911 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1948 (max= 2.2442), tps=10432, mfu=21.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:59:24,911 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1948 (max= 2.2442), tps=10432, mfu=21.74%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:59:24,911 - root - INFO - Step 13010: lr=1.00E-05, loss= 1.1948 (max= 2.2442), tps=10432, mfu=21.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 15:59:42,913 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.2481 (max= 2.5234), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:59:42,913 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.2481 (max= 2.5234), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:59:42,913 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.2481 (max= 2.5234), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:59:42,913 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.2481 (max= 2.5234), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:59:42,913 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.2481 (max= 2.5234), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:59:42,913 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.2481 (max= 2.5234), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:59:42,914 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.2481 (max= 2.5234), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 15:59:42,914 - root - INFO - Step 13020: lr=1.00E-05, loss= 1.2481 (max= 2.5234), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:00,964 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.2258 (max= 2.3607), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:00,964 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.2258 (max= 2.3607), tps=18158, 
mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:00,964 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.2258 (max= 2.3607), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:00,964 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.2258 (max= 2.3607), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:00,964 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.2258 (max= 2.3607), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:00,964 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.2258 (max= 2.3607), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:00,964 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.2258 (max= 2.3607), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:00,964 - root - INFO - Step 13030: lr=1.00E-05, loss= 1.2258 (max= 2.3607), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:19,018 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.2221 (max= 3.1536), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:19,018 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.2221 (max= 3.1536), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:19,018 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.2221 (max= 3.1536), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:19,018 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.2221 (max= 3.1536), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:19,018 - root - INFO - Step 13040: lr=1.00E-05, 
loss= 1.2221 (max= 3.1536), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:19,019 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.2221 (max= 3.1536), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:19,019 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.2221 (max= 3.1536), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:19,019 - root - INFO - Step 13040: lr=1.00E-05, loss= 1.2221 (max= 3.1536), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:37,059 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.2166 (max= 2.0541), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:37,059 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.2166 (max= 2.0541), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:37,059 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.2166 (max= 2.0541), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:37,059 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.2166 (max= 2.0541), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:37,059 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.2166 (max= 2.0541), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:37,059 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.2166 (max= 2.0541), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:37,059 - root - INFO - Step 13050: lr=1.00E-05, loss= 1.2166 (max= 2.0541), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:37,059 - 
root - INFO - Step 13050: lr=1.00E-05, loss= 1.2166 (max= 2.0541), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:55,150 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.2447 (max= 2.4406), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:55,150 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.2447 (max= 2.4406), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:55,150 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.2447 (max= 2.4406), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:55,150 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.2447 (max= 2.4406), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:55,150 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.2447 (max= 2.4406), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:55,150 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.2447 (max= 2.4406), tps=18116, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:55,150 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.2447 (max= 2.4406), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:00:55,150 - root - INFO - Step 13060: lr=1.00E-05, loss= 1.2447 (max= 2.4406), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:13,164 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.2050 (max= 2.3709), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:13,164 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.2050 (max= 2.3709), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:01:13,164 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.2050 (max= 2.3709), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:13,164 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.2050 (max= 2.3709), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:13,164 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.2050 (max= 2.3709), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:13,164 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.2050 (max= 2.3709), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:13,164 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.2050 (max= 2.3709), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:13,165 - root - INFO - Step 13070: lr=1.00E-05, loss= 1.2050 (max= 2.3709), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:31,217 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.2284 (max= 2.0453), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:31,217 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.2284 (max= 2.0453), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:31,218 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.2284 (max= 2.0453), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:31,218 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.2284 (max= 2.0453), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:31,218 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.2284 (max= 2.0453), tps=18154, mfu=37.83%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:31,218 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.2284 (max= 2.0453), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:31,218 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.2284 (max= 2.0453), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:31,218 - root - INFO - Step 13080: lr=1.00E-05, loss= 1.2284 (max= 2.0453), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:49,233 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.2284 (max= 2.1882), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:49,233 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.2284 (max= 2.1882), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:49,233 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.2284 (max= 2.1882), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:49,233 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.2284 (max= 2.1882), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:49,233 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.2284 (max= 2.1882), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:49,233 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.2284 (max= 2.1882), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:49,233 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.2284 (max= 2.1882), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:01:49,233 - root - INFO - Step 13090: lr=1.00E-05, loss= 1.2284 (max= 
2.1882), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:07,244 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.2425 (max= 2.2244), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:02:07,245 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.2425 (max= 2.2244), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:02:07,245 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.2425 (max= 2.2244), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:02:07,245 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.2425 (max= 2.2244), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:02:07,245 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.2425 (max= 2.2244), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:02:07,245 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.2425 (max= 2.2244), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:02:07,245 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.2425 (max= 2.2244), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:02:07,245 - root - INFO - Step 13100: lr=1.00E-05, loss= 1.2425 (max= 2.2244), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:02:25,276 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.2535 (max= 3.3531), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:25,277 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.2535 (max= 3.3531), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:25,277 - root - INFO - Step 
13110: lr=1.00E-05, loss= 1.2535 (max= 3.3531), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:25,277 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.2535 (max= 3.3531), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:25,277 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.2535 (max= 3.3531), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:25,277 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.2535 (max= 3.3531), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:25,277 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.2535 (max= 3.3531), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:25,277 - root - INFO - Step 13110: lr=1.00E-05, loss= 1.2535 (max= 3.3531), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:43,337 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.2336 (max= 3.1654), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:43,337 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.2336 (max= 3.1654), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:43,337 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.2336 (max= 3.1654), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:43,337 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.2336 (max= 3.1654), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:43,337 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.2336 (max= 3.1654), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:02:43,337 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.2336 (max= 3.1654), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:43,337 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.2336 (max= 3.1654), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:02:43,337 - root - INFO - Step 13120: lr=1.00E-05, loss= 1.2336 (max= 3.1654), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:01,351 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.2379 (max= 2.0296), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:01,351 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.2379 (max= 2.0296), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:01,351 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.2379 (max= 2.0296), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:01,351 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.2379 (max= 2.0296), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:01,351 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.2379 (max= 2.0296), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:01,351 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.2379 (max= 2.0296), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:01,351 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.2379 (max= 2.0296), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:01,351 - root - INFO - Step 13130: lr=1.00E-05, loss= 1.2379 (max= 2.0296), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:19,375 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.2404 (max= 2.0630), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:19,375 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.2404 (max= 2.0630), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:19,375 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.2404 (max= 2.0630), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:19,375 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.2404 (max= 2.0630), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:19,375 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.2404 (max= 2.0630), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:19,375 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.2404 (max= 2.0630), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:19,375 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.2404 (max= 2.0630), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:19,375 - root - INFO - Step 13140: lr=1.00E-05, loss= 1.2404 (max= 2.0630), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:37,387 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.2214 (max= 2.2511), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:37,387 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.2214 (max= 2.2511), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:37,387 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.2214 (max= 2.2511), tps=18195, 
mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:37,387 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.2214 (max= 2.2511), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:37,387 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.2214 (max= 2.2511), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:37,387 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.2214 (max= 2.2511), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:37,387 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.2214 (max= 2.2511), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:37,387 - root - INFO - Step 13150: lr=1.00E-05, loss= 1.2214 (max= 2.2511), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:03:55,412 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.2021 (max= 2.0041), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:55,412 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.2021 (max= 2.0041), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:55,412 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.2021 (max= 2.0041), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:55,412 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.2021 (max= 2.0041), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:55,412 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.2021 (max= 2.0041), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:55,412 - root - INFO - Step 13160: lr=1.00E-05, 
loss= 1.2021 (max= 2.0041), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:55,412 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.2021 (max= 2.0041), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:03:55,412 - root - INFO - Step 13160: lr=1.00E-05, loss= 1.2021 (max= 2.0041), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:04:13,406 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.2351 (max= 2.3270), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:13,406 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.2351 (max= 2.3270), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:13,406 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.2351 (max= 2.3270), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:13,406 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.2351 (max= 2.3270), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:13,406 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.2351 (max= 2.3270), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:13,406 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.2351 (max= 2.3270), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:13,406 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.2351 (max= 2.3270), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:13,406 - root - INFO - Step 13170: lr=1.00E-05, loss= 1.2351 (max= 2.3270), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:31,434 - 
root - INFO - Step 13180: lr=1.00E-05, loss= 1.2201 (max= 2.3838), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:04:31,434 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.2201 (max= 2.3838), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:04:31,434 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.2201 (max= 2.3838), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:04:31,434 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.2201 (max= 2.3838), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:04:31,434 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.2201 (max= 2.3838), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:04:31,434 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.2201 (max= 2.3838), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:04:31,434 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.2201 (max= 2.3838), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:04:31,434 - root - INFO - Step 13180: lr=1.00E-05, loss= 1.2201 (max= 2.3838), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:04:49,467 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.2004 (max= 2.0436), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:49,467 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.2004 (max= 2.0436), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:49,467 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.2004 (max= 2.0436), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:04:49,467 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.2004 (max= 2.0436), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:49,467 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.2004 (max= 2.0436), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:49,467 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.2004 (max= 2.0436), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:49,467 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.2004 (max= 2.0436), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:04:49,467 - root - INFO - Step 13190: lr=1.00E-05, loss= 1.2004 (max= 2.0436), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:07,518 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.2340 (max= 2.3532), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:07,518 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.2340 (max= 2.3532), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:07,518 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.2340 (max= 2.3532), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:07,518 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.2340 (max= 2.3532), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:07,518 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.2340 (max= 2.3532), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:07,518 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.2340 (max= 2.3532), tps=18156, mfu=37.83%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:07,518 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.2340 (max= 2.3532), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:07,520 - root - INFO - Step 13200: lr=1.00E-05, loss= 1.2340 (max= 2.3532), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:25,559 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.2293 (max= 2.2039), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:25,559 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.2293 (max= 2.2039), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:25,559 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.2293 (max= 2.2039), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:25,560 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.2293 (max= 2.2039), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:25,560 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.2293 (max= 2.2039), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:25,560 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.2293 (max= 2.2039), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:25,560 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.2293 (max= 2.2039), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:25,560 - root - INFO - Step 13210: lr=1.00E-05, loss= 1.2293 (max= 2.2039), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:43,574 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.2552 (max= 
2.3408), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:43,574 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.2552 (max= 2.3408), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:43,574 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.2552 (max= 2.3408), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:43,574 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.2552 (max= 2.3408), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:43,574 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.2552 (max= 2.3408), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:43,574 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.2552 (max= 2.3408), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:43,574 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.2552 (max= 2.3408), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:05:43,574 - root - INFO - Step 13220: lr=1.00E-05, loss= 1.2552 (max= 2.3408), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:01,625 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.2215 (max= 2.2935), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:01,625 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.2215 (max= 2.2935), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:01,625 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.2215 (max= 2.2935), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:01,626 - root - INFO - Step 
13230: lr=1.00E-05, loss= 1.2215 (max= 2.2935), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:01,626 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.2215 (max= 2.2935), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:01,626 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.2215 (max= 2.2935), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:01,626 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.2215 (max= 2.2935), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:01,626 - root - INFO - Step 13230: lr=1.00E-05, loss= 1.2215 (max= 2.2935), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:19,675 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.2277 (max= 2.3013), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:19,675 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.2277 (max= 2.3013), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:19,675 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.2277 (max= 2.3013), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:19,675 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.2277 (max= 2.3013), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:19,675 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.2277 (max= 2.3013), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:19,675 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.2277 (max= 2.3013), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:06:19,676 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.2277 (max= 2.3013), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:19,676 - root - INFO - Step 13240: lr=1.00E-05, loss= 1.2277 (max= 2.3013), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:37,739 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.2286 (max= 2.1983), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:37,739 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.2286 (max= 2.1983), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:37,739 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.2286 (max= 2.1983), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:37,739 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.2286 (max= 2.1983), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:37,739 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.2286 (max= 2.1983), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:37,739 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.2286 (max= 2.1983), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:37,739 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.2286 (max= 2.1983), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:06:37,739 - root - INFO - Step 13250: lr=1.00E-05, loss= 1.2286 (max= 2.1983), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:02,198 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.2121 (max= 2.1546), tps=13399, mfu=27.92%, memory: 78.54GiB(44.03%) 
time/data_loading=0.02s (max=0.17s, 27.33%) +2025-10-24 16:07:02,198 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.2121 (max= 2.1546), tps=13399, mfu=27.92%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 27.33%) +2025-10-24 16:07:02,198 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.2121 (max= 2.1546), tps=13398, mfu=27.92%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 27.33%) +2025-10-24 16:07:02,198 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.2121 (max= 2.1546), tps=13399, mfu=27.92%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 27.33%) +2025-10-24 16:07:02,198 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.2121 (max= 2.1546), tps=13399, mfu=27.92%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 27.33%) +2025-10-24 16:07:02,199 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.2121 (max= 2.1546), tps=13398, mfu=27.92%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 27.33%) +2025-10-24 16:07:02,199 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.2121 (max= 2.1546), tps=13399, mfu=27.92%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 27.33%) +2025-10-24 16:07:02,199 - root - INFO - Step 13260: lr=1.00E-05, loss= 1.2121 (max= 2.1546), tps=13399, mfu=27.92%, memory: 78.54GiB(44.03%) time/data_loading=0.02s (max=0.17s, 27.33%) +2025-10-24 16:07:20,255 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.2420 (max= 2.4106), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:20,255 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.2420 (max= 2.4106), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:20,255 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.2420 (max= 2.4106), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:20,256 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.2420 (max= 2.4106), 
tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:20,256 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.2420 (max= 2.4106), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:20,256 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.2420 (max= 2.4106), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:20,256 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.2420 (max= 2.4106), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:20,256 - root - INFO - Step 13270: lr=1.00E-05, loss= 1.2420 (max= 2.4106), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:38,278 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.2554 (max= 2.3895), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:38,278 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.2554 (max= 2.3895), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:38,278 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.2554 (max= 2.3895), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:38,278 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.2554 (max= 2.3895), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:38,278 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.2554 (max= 2.3895), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:38,278 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.2554 (max= 2.3895), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:38,278 - root - INFO - Step 13280: 
lr=1.00E-05, loss= 1.2554 (max= 2.3895), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:38,278 - root - INFO - Step 13280: lr=1.00E-05, loss= 1.2554 (max= 2.3895), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:56,297 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.2514 (max= 2.3270), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:56,297 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.2514 (max= 2.3270), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:56,297 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.2514 (max= 2.3270), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:56,297 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.2514 (max= 2.3270), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:56,297 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.2514 (max= 2.3270), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:56,297 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.2514 (max= 2.3270), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:56,297 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.2514 (max= 2.3270), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:07:56,297 - root - INFO - Step 13290: lr=1.00E-05, loss= 1.2514 (max= 2.3270), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:14,319 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.2534 (max= 2.6327), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 
16:08:14,319 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.2534 (max= 2.6327), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:08:14,319 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.2534 (max= 2.6327), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:08:14,319 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.2534 (max= 2.6327), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:08:14,319 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.2534 (max= 2.6327), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:08:14,319 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.2534 (max= 2.6327), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:08:14,319 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.2534 (max= 2.6327), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:08:14,319 - root - INFO - Step 13300: lr=1.00E-05, loss= 1.2534 (max= 2.6327), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:08:32,333 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.2312 (max= 2.6959), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:32,333 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.2312 (max= 2.6959), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:32,333 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.2312 (max= 2.6959), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:32,333 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.2312 (max= 2.6959), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:32,333 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.2312 (max= 2.6959), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:32,333 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.2312 (max= 2.6959), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:32,333 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.2312 (max= 2.6959), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:32,333 - root - INFO - Step 13310: lr=1.00E-05, loss= 1.2312 (max= 2.6959), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:50,333 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.2365 (max= 2.4577), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:50,333 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.2365 (max= 2.4577), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:50,333 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.2365 (max= 2.4577), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:50,333 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.2365 (max= 2.4577), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:50,333 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.2365 (max= 2.4577), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:50,333 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.2365 (max= 2.4577), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:50,333 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.2365 (max= 2.4577), tps=18208, 
mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:08:50,333 - root - INFO - Step 13320: lr=1.00E-05, loss= 1.2365 (max= 2.4577), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:08,353 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.2033 (max= 2.3571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:08,353 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.2033 (max= 2.3571), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:08,353 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.2033 (max= 2.3571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:08,353 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.2033 (max= 2.3571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:08,353 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.2033 (max= 2.3571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:08,353 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.2033 (max= 2.3571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:08,353 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.2033 (max= 2.3571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:08,353 - root - INFO - Step 13330: lr=1.00E-05, loss= 1.2033 (max= 2.3571), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:26,379 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.2462 (max= 2.3449), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:09:26,379 - root - INFO - Step 13340: lr=1.00E-05, 
loss= 1.2462 (max= 2.3449), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:09:26,379 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.2462 (max= 2.3449), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:09:26,379 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.2462 (max= 2.3449), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:09:26,379 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.2462 (max= 2.3449), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:09:26,379 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.2462 (max= 2.3449), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:09:26,379 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.2462 (max= 2.3449), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:09:26,379 - root - INFO - Step 13340: lr=1.00E-05, loss= 1.2462 (max= 2.3449), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:09:44,399 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.2398 (max= 2.4204), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:44,399 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.2398 (max= 2.4204), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:44,399 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.2398 (max= 2.4204), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:44,399 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.2398 (max= 2.4204), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:44,399 - 
root - INFO - Step 13350: lr=1.00E-05, loss= 1.2398 (max= 2.4204), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:44,399 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.2398 (max= 2.4204), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:44,399 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.2398 (max= 2.4204), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:09:44,399 - root - INFO - Step 13350: lr=1.00E-05, loss= 1.2398 (max= 2.4204), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:02,413 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.2246 (max= 2.1066), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:02,413 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.2246 (max= 2.1066), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:02,413 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.2246 (max= 2.1066), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:02,413 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.2246 (max= 2.1066), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:02,413 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.2246 (max= 2.1066), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:02,413 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.2246 (max= 2.1066), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:02,413 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.2246 (max= 2.1066), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 16:10:02,413 - root - INFO - Step 13360: lr=1.00E-05, loss= 1.2246 (max= 2.1066), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:20,453 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.2350 (max= 2.9341), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:20,453 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.2350 (max= 2.9341), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:20,453 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.2350 (max= 2.9341), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:20,453 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.2350 (max= 2.9341), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:20,453 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.2350 (max= 2.9341), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:20,453 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.2350 (max= 2.9341), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:20,453 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.2350 (max= 2.9341), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:20,453 - root - INFO - Step 13370: lr=1.00E-05, loss= 1.2350 (max= 2.9341), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:10:38,513 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.2657 (max= 3.0160), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:38,513 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.2657 (max= 3.0160), tps=18148, mfu=37.81%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:38,513 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.2657 (max= 3.0160), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:38,513 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.2657 (max= 3.0160), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:38,513 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.2657 (max= 3.0160), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:38,513 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.2657 (max= 3.0160), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:38,513 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.2657 (max= 3.0160), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:38,513 - root - INFO - Step 13380: lr=1.00E-05, loss= 1.2657 (max= 3.0160), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:56,535 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.2480 (max= 2.5516), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:56,535 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.2480 (max= 2.5516), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:56,535 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.2480 (max= 2.5516), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:56,535 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.2480 (max= 2.5516), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:56,535 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.2480 (max= 
2.5516), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:56,535 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.2480 (max= 2.5516), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:56,535 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.2480 (max= 2.5516), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:10:56,535 - root - INFO - Step 13390: lr=1.00E-05, loss= 1.2480 (max= 2.5516), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:14,546 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.2124 (max= 3.1133), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:14,546 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.2124 (max= 3.1133), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:14,546 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.2124 (max= 3.1133), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:14,546 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.2124 (max= 3.1133), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:14,546 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.2124 (max= 3.1133), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:14,546 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.2124 (max= 3.1133), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:14,546 - root - INFO - Step 13400: lr=1.00E-05, loss= 1.2124 (max= 3.1133), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:14,546 - root - INFO - Step 
13400: lr=1.00E-05, loss= 1.2124 (max= 3.1133), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:32,569 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.2472 (max= 2.2886), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:32,569 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.2472 (max= 2.2886), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:32,569 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.2472 (max= 2.2886), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:32,569 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.2472 (max= 2.2886), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:32,569 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.2472 (max= 2.2886), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:32,569 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.2472 (max= 2.2886), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:32,569 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.2472 (max= 2.2886), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:32,569 - root - INFO - Step 13410: lr=1.00E-05, loss= 1.2472 (max= 2.2886), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:50,607 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.2307 (max= 3.2785), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:50,607 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.2307 (max= 3.2785), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:11:50,607 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.2307 (max= 3.2785), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:50,607 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.2307 (max= 3.2785), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:50,607 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.2307 (max= 3.2785), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:50,607 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.2307 (max= 3.2785), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:50,607 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.2307 (max= 3.2785), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:11:50,607 - root - INFO - Step 13420: lr=1.00E-05, loss= 1.2307 (max= 3.2785), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:08,654 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.2290 (max= 2.3065), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:08,654 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.2290 (max= 2.3065), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:08,654 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.2290 (max= 2.3065), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:08,654 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.2290 (max= 2.3065), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:08,654 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.2290 (max= 2.3065), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:08,654 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.2290 (max= 2.3065), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:08,654 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.2290 (max= 2.3065), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:08,654 - root - INFO - Step 13430: lr=1.00E-05, loss= 1.2290 (max= 2.3065), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:26,699 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.2268 (max= 3.0304), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:12:26,699 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.2268 (max= 3.0304), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:12:26,699 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.2268 (max= 3.0304), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:12:26,699 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.2268 (max= 3.0304), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:12:26,699 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.2268 (max= 3.0304), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:12:26,699 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.2268 (max= 3.0304), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:12:26,699 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.2268 (max= 3.0304), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:12:26,699 - root - INFO - Step 13440: lr=1.00E-05, loss= 1.2268 (max= 3.0304), tps=18162, 
mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:12:44,804 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.2552 (max= 3.0437), tps=18102, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:44,804 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.2552 (max= 3.0437), tps=18102, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:44,804 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.2552 (max= 3.0437), tps=18102, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:44,804 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.2552 (max= 3.0437), tps=18102, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:44,804 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.2552 (max= 3.0437), tps=18102, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:44,804 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.2552 (max= 3.0437), tps=18102, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:44,804 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.2552 (max= 3.0437), tps=18102, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:12:44,804 - root - INFO - Step 13450: lr=1.00E-05, loss= 1.2552 (max= 3.0437), tps=18102, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:02,894 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.2593 (max= 2.9344), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:02,894 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.2593 (max= 2.9344), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:02,894 - root - INFO - Step 13460: lr=1.00E-05, 
loss= 1.2593 (max= 2.9344), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:02,894 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.2593 (max= 2.9344), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:02,894 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.2593 (max= 2.9344), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:02,894 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.2593 (max= 2.9344), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:02,894 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.2593 (max= 2.9344), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:02,894 - root - INFO - Step 13460: lr=1.00E-05, loss= 1.2593 (max= 2.9344), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:20,930 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.2325 (max= 2.3210), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:20,930 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.2325 (max= 2.3210), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:20,930 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.2325 (max= 2.3210), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:20,930 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.2325 (max= 2.3210), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:20,930 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.2325 (max= 2.3210), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:20,930 - 
root - INFO - Step 13470: lr=1.00E-05, loss= 1.2325 (max= 2.3210), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:20,930 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.2325 (max= 2.3210), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:20,931 - root - INFO - Step 13470: lr=1.00E-05, loss= 1.2325 (max= 2.3210), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:39,005 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.2442 (max= 2.9481), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:39,005 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.2442 (max= 2.9481), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:39,005 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.2442 (max= 2.9481), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:39,005 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.2442 (max= 2.9481), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:39,005 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.2442 (max= 2.9481), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:39,005 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.2442 (max= 2.9481), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:39,005 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.2442 (max= 2.9481), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:39,005 - root - INFO - Step 13480: lr=1.00E-05, loss= 1.2442 (max= 2.9481), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:13:57,026 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.2774 (max= 2.9818), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:57,026 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.2774 (max= 2.9818), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:57,026 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.2774 (max= 2.9818), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:57,026 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.2774 (max= 2.9818), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:57,026 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.2774 (max= 2.9818), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:57,026 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.2774 (max= 2.9818), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:57,026 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.2774 (max= 2.9818), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:13:57,026 - root - INFO - Step 13490: lr=1.00E-05, loss= 1.2774 (max= 2.9818), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:14:15,053 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.2830 (max= 3.0195), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:15,053 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.2830 (max= 3.0195), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:15,053 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.2830 (max= 3.0195), tps=18180, mfu=37.88%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:15,053 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.2830 (max= 3.0195), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:15,053 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.2830 (max= 3.0195), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:15,053 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.2830 (max= 3.0195), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:15,053 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.2830 (max= 3.0195), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:15,053 - root - INFO - Step 13500: lr=1.00E-05, loss= 1.2830 (max= 3.0195), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:33,109 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.2516 (max= 2.9310), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:33,109 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.2516 (max= 2.9310), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:33,109 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.2516 (max= 2.9310), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:33,109 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.2516 (max= 2.9310), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:33,109 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.2516 (max= 2.9310), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:33,109 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.2516 (max= 
2.9310), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:33,109 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.2516 (max= 2.9310), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:33,109 - root - INFO - Step 13510: lr=1.00E-05, loss= 1.2516 (max= 2.9310), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:51,133 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.2545 (max= 2.7652), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:51,133 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.2545 (max= 2.7652), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:51,133 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.2545 (max= 2.7652), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:51,133 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.2545 (max= 2.7652), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:51,133 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.2545 (max= 2.7652), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:51,133 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.2545 (max= 2.7652), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:51,134 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.2545 (max= 2.7652), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:14:51,134 - root - INFO - Step 13520: lr=1.00E-05, loss= 1.2545 (max= 2.7652), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:15:09,168 - root - INFO - Step 
13530: lr=1.00E-05, loss= 1.2798 (max= 3.4069), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:09,168 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.2798 (max= 3.4069), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:09,168 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.2798 (max= 3.4069), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:09,168 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.2798 (max= 3.4069), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:09,168 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.2798 (max= 3.4069), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:09,168 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.2798 (max= 3.4069), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:09,168 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.2798 (max= 3.4069), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:09,168 - root - INFO - Step 13530: lr=1.00E-05, loss= 1.2798 (max= 3.4069), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:27,225 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.3019 (max= 2.9844), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:27,225 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.3019 (max= 2.9844), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:27,225 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.3019 (max= 2.9844), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:15:27,225 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.3019 (max= 2.9844), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:27,225 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.3019 (max= 2.9844), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:27,225 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.3019 (max= 2.9844), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:27,225 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.3019 (max= 2.9844), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:27,225 - root - INFO - Step 13540: lr=1.00E-05, loss= 1.3019 (max= 2.9844), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:45,264 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.2693 (max= 2.4637), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:45,264 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.2693 (max= 2.4637), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:45,264 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.2693 (max= 2.4637), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:45,264 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.2693 (max= 2.4637), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:45,264 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.2693 (max= 2.4637), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:45,264 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.2693 (max= 2.4637), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:45,264 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.2693 (max= 2.4637), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:15:45,264 - root - INFO - Step 13550: lr=1.00E-05, loss= 1.2693 (max= 2.4637), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:03,304 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.2231 (max= 2.9373), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:16:03,304 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.2231 (max= 2.9373), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:16:03,304 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.2231 (max= 2.9373), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:16:03,304 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.2231 (max= 2.9373), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:16:03,304 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.2231 (max= 2.9373), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:16:03,304 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.2231 (max= 2.9373), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:16:03,304 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.2231 (max= 2.9373), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:16:03,304 - root - INFO - Step 13560: lr=1.00E-05, loss= 1.2231 (max= 2.9373), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:16:21,339 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.2675 (max= 2.9332), tps=18173, 
mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:21,339 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.2675 (max= 2.9332), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:21,339 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.2675 (max= 2.9332), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:21,339 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.2675 (max= 2.9332), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:21,339 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.2675 (max= 2.9332), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:21,339 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.2675 (max= 2.9332), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:21,339 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.2675 (max= 2.9332), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:21,339 - root - INFO - Step 13570: lr=1.00E-05, loss= 1.2675 (max= 2.9332), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:39,373 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.2556 (max= 2.5364), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:39,373 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.2556 (max= 2.5364), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:39,373 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.2556 (max= 2.5364), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:39,373 - root - INFO - Step 13580: lr=1.00E-05, 
loss= 1.2556 (max= 2.5364), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:39,373 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.2556 (max= 2.5364), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:39,373 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.2556 (max= 2.5364), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:39,373 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.2556 (max= 2.5364), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:39,373 - root - INFO - Step 13580: lr=1.00E-05, loss= 1.2556 (max= 2.5364), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:57,394 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.2769 (max= 2.8202), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:57,394 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.2769 (max= 2.8202), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:57,394 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.2769 (max= 2.8202), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:57,394 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.2769 (max= 2.8202), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:57,394 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.2769 (max= 2.8202), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:57,394 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.2769 (max= 2.8202), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:57,394 - 
root - INFO - Step 13590: lr=1.00E-05, loss= 1.2769 (max= 2.8202), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:16:57,394 - root - INFO - Step 13590: lr=1.00E-05, loss= 1.2769 (max= 2.8202), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:15,423 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.2368 (max= 3.5074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:15,423 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.2368 (max= 3.5074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:15,423 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.2368 (max= 3.5074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:15,423 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.2368 (max= 3.5074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:15,423 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.2368 (max= 3.5074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:15,423 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.2368 (max= 3.5074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:15,423 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.2368 (max= 3.5074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:15,423 - root - INFO - Step 13600: lr=1.00E-05, loss= 1.2368 (max= 3.5074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:33,473 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.2601 (max= 2.9002), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:17:33,473 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.2601 (max= 2.9002), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:33,473 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.2601 (max= 2.9002), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:33,473 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.2601 (max= 2.9002), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:33,473 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.2601 (max= 2.9002), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:33,473 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.2601 (max= 2.9002), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:33,473 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.2601 (max= 2.9002), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:33,473 - root - INFO - Step 13610: lr=1.00E-05, loss= 1.2601 (max= 2.9002), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:17:51,516 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.2257 (max= 2.8619), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:17:51,516 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.2257 (max= 2.8619), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:17:51,516 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.2257 (max= 2.8619), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:17:51,516 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.2257 (max= 2.8619), tps=18165, mfu=37.85%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:17:51,516 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.2257 (max= 2.8619), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:17:51,516 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.2257 (max= 2.8619), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:17:51,516 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.2257 (max= 2.8619), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:17:51,516 - root - INFO - Step 13620: lr=1.00E-05, loss= 1.2257 (max= 2.8619), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:18:09,517 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.2407 (max= 3.7404), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:09,517 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.2407 (max= 3.7404), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:09,517 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.2407 (max= 3.7404), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:09,517 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.2407 (max= 3.7404), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:09,517 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.2407 (max= 3.7404), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:09,517 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.2407 (max= 3.7404), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:09,517 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.2407 (max= 
3.7404), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:09,517 - root - INFO - Step 13630: lr=1.00E-05, loss= 1.2407 (max= 3.7404), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:27,559 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.2544 (max= 2.8961), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:27,559 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.2544 (max= 2.8961), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:27,559 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.2544 (max= 2.8961), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:27,559 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.2544 (max= 2.8961), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:27,559 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.2544 (max= 2.8961), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:27,559 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.2544 (max= 2.8961), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:27,559 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.2544 (max= 2.8961), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:27,559 - root - INFO - Step 13640: lr=1.00E-05, loss= 1.2544 (max= 2.8961), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:45,552 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.2399 (max= 2.9132), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:45,552 - root - INFO - Step 
13650: lr=1.00E-05, loss= 1.2399 (max= 2.9132), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:45,552 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.2399 (max= 2.9132), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:45,552 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.2399 (max= 2.9132), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:45,552 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.2399 (max= 2.9132), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:45,552 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.2399 (max= 2.9132), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:45,552 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.2399 (max= 2.9132), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:18:45,553 - root - INFO - Step 13650: lr=1.00E-05, loss= 1.2399 (max= 2.9132), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:03,561 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.2550 (max= 2.9816), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:03,561 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.2550 (max= 2.9816), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:03,561 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.2550 (max= 2.9816), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:03,561 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.2550 (max= 2.9816), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:19:03,561 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.2550 (max= 2.9816), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:03,561 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.2550 (max= 2.9816), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:03,561 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.2550 (max= 2.9816), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:03,561 - root - INFO - Step 13660: lr=1.00E-05, loss= 1.2550 (max= 2.9816), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:21,614 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.2424 (max= 2.8797), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:21,614 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.2424 (max= 2.8797), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:21,614 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.2424 (max= 2.8797), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:21,614 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.2424 (max= 2.8797), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:21,614 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.2424 (max= 2.8797), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:21,614 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.2424 (max= 2.8797), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:21,614 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.2424 (max= 2.8797), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:21,614 - root - INFO - Step 13670: lr=1.00E-05, loss= 1.2424 (max= 2.8797), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:39,650 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.2712 (max= 3.6518), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:19:39,650 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.2712 (max= 3.6518), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:19:39,650 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.2712 (max= 3.6518), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:19:39,650 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.2712 (max= 3.6518), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:19:39,650 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.2712 (max= 3.6518), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:19:39,650 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.2712 (max= 3.6518), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:19:39,650 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.2712 (max= 3.6518), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:19:39,651 - root - INFO - Step 13680: lr=1.00E-05, loss= 1.2712 (max= 3.6518), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:19:57,712 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.2151 (max= 2.2994), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:57,712 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.2151 (max= 2.2994), tps=18146, 
mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:57,712 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.2151 (max= 2.2994), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:57,712 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.2151 (max= 2.2994), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:57,712 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.2151 (max= 2.2994), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:57,712 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.2151 (max= 2.2994), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:57,712 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.2151 (max= 2.2994), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:19:57,712 - root - INFO - Step 13690: lr=1.00E-05, loss= 1.2151 (max= 2.2994), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:20:15,766 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.2995 (max= 2.8745), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:15,766 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.2995 (max= 2.8745), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:15,766 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.2995 (max= 2.8745), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:15,766 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.2995 (max= 2.8745), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:15,766 - root - INFO - Step 13700: lr=1.00E-05, 
loss= 1.2995 (max= 2.8745), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:15,766 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.2995 (max= 2.8745), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:15,766 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.2995 (max= 2.8745), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:15,766 - root - INFO - Step 13700: lr=1.00E-05, loss= 1.2995 (max= 2.8745), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:33,774 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.2986 (max= 2.8921), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:33,774 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.2986 (max= 2.8921), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:33,774 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.2986 (max= 2.8921), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:33,774 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.2986 (max= 2.8921), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:33,774 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.2986 (max= 2.8921), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:33,774 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.2986 (max= 2.8921), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:33,774 - root - INFO - Step 13710: lr=1.00E-05, loss= 1.2986 (max= 2.8921), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:33,774 - 
root - INFO - Step 13710: lr=1.00E-05, loss= 1.2986 (max= 2.8921), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:51,783 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.2900 (max= 2.8880), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:51,783 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.2900 (max= 2.8880), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:51,783 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.2900 (max= 2.8880), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:51,783 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.2900 (max= 2.8880), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:51,783 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.2900 (max= 2.8880), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:51,783 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.2900 (max= 2.8880), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:51,783 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.2900 (max= 2.8880), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:20:51,783 - root - INFO - Step 13720: lr=1.00E-05, loss= 1.2900 (max= 2.8880), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:09,777 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.2832 (max= 2.8290), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:09,777 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.2832 (max= 2.8290), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 16:21:09,777 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.2832 (max= 2.8290), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:09,777 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.2832 (max= 2.8290), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:09,777 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.2832 (max= 2.8290), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:09,777 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.2832 (max= 2.8290), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:09,777 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.2832 (max= 2.8290), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:09,777 - root - INFO - Step 13730: lr=1.00E-05, loss= 1.2832 (max= 2.8290), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:27,806 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.2317 (max= 2.9074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:27,806 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.2317 (max= 2.9074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:27,806 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.2317 (max= 2.9074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:27,806 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.2317 (max= 2.9074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:27,806 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.2317 (max= 2.9074), tps=18179, mfu=37.88%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:27,806 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.2317 (max= 2.9074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:27,806 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.2317 (max= 2.9074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:27,806 - root - INFO - Step 13740: lr=1.00E-05, loss= 1.2317 (max= 2.9074), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:21:45,846 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.2666 (max= 2.9218), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:21:45,846 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.2666 (max= 2.9218), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:21:45,846 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.2666 (max= 2.9218), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:21:45,847 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.2666 (max= 2.9218), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:21:45,847 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.2666 (max= 2.9218), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:21:45,847 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.2666 (max= 2.9218), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:21:45,847 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.2666 (max= 2.9218), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:21:45,847 - root - INFO - Step 13750: lr=1.00E-05, loss= 1.2666 (max= 
2.9218), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:22:03,876 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.2609 (max= 2.8511), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:03,877 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.2609 (max= 2.8511), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:03,877 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.2609 (max= 2.8511), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:03,877 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.2609 (max= 2.8511), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:03,877 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.2609 (max= 2.8511), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:03,877 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.2609 (max= 2.8511), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:03,877 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.2609 (max= 2.8511), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:03,877 - root - INFO - Step 13760: lr=1.00E-05, loss= 1.2609 (max= 2.8511), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:21,884 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.2333 (max= 2.8554), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:21,884 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.2333 (max= 2.8554), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:21,884 - root - INFO - Step 
13770: lr=1.00E-05, loss= 1.2333 (max= 2.8554), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:21,884 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.2333 (max= 2.8554), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:21,884 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.2333 (max= 2.8554), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:21,884 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.2333 (max= 2.8554), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:21,884 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.2333 (max= 2.8554), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:21,884 - root - INFO - Step 13770: lr=1.00E-05, loss= 1.2333 (max= 2.8554), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:39,884 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.2726 (max= 2.3027), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:39,884 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.2726 (max= 2.3027), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:39,884 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.2726 (max= 2.3027), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:39,884 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.2726 (max= 2.3027), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:39,884 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.2726 (max= 2.3027), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 16:22:39,884 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.2726 (max= 2.3027), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:39,884 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.2726 (max= 2.3027), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:39,884 - root - INFO - Step 13780: lr=1.00E-05, loss= 1.2726 (max= 2.3027), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:57,935 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.2733 (max= 2.8004), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:57,935 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.2733 (max= 2.8004), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:57,935 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.2733 (max= 2.8004), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:57,935 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.2733 (max= 2.8004), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:57,935 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.2733 (max= 2.8004), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:57,935 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.2733 (max= 2.8004), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:57,935 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.2733 (max= 2.8004), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:22:57,935 - root - INFO - Step 13790: lr=1.00E-05, loss= 1.2733 (max= 2.8004), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:15,985 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.2489 (max= 3.3660), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:23:15,985 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.2489 (max= 3.3660), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:23:15,985 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.2489 (max= 3.3660), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:23:15,985 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.2489 (max= 3.3660), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:23:15,985 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.2489 (max= 3.3660), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:23:15,985 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.2489 (max= 3.3660), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:23:15,985 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.2489 (max= 3.3660), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:23:15,986 - root - INFO - Step 13800: lr=1.00E-05, loss= 1.2489 (max= 3.3660), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:23:34,056 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.2378 (max= 2.7430), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:34,056 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.2378 (max= 2.7430), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:34,056 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.2378 (max= 2.7430), tps=18136, 
mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:34,056 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.2378 (max= 2.7430), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:34,056 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.2378 (max= 2.7430), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:34,056 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.2378 (max= 2.7430), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:34,056 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.2378 (max= 2.7430), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:34,056 - root - INFO - Step 13810: lr=1.00E-05, loss= 1.2378 (max= 2.7430), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:52,118 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.2708 (max= 3.4731), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:52,118 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.2708 (max= 3.4731), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:52,118 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.2708 (max= 3.4731), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:52,118 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.2708 (max= 3.4731), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:52,118 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.2708 (max= 3.4731), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:52,118 - root - INFO - Step 13820: lr=1.00E-05, 
loss= 1.2708 (max= 3.4731), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:52,118 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.2708 (max= 3.4731), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:23:52,118 - root - INFO - Step 13820: lr=1.00E-05, loss= 1.2708 (max= 3.4731), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:10,143 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.2718 (max= 2.2114), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:10,143 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.2718 (max= 2.2114), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:10,143 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.2718 (max= 2.2114), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:10,143 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.2718 (max= 2.2114), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:10,143 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.2718 (max= 2.2114), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:10,143 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.2718 (max= 2.2114), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:10,143 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.2718 (max= 2.2114), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:10,143 - root - INFO - Step 13830: lr=1.00E-05, loss= 1.2718 (max= 2.2114), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:28,158 - 
root - INFO - Step 13840: lr=1.00E-05, loss= 1.2408 (max= 2.3643), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:28,158 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.2408 (max= 2.3643), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:28,158 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.2408 (max= 2.3643), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:28,158 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.2408 (max= 2.3643), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:28,158 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.2408 (max= 2.3643), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:28,158 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.2408 (max= 2.3643), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:28,158 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.2408 (max= 2.3643), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:28,158 - root - INFO - Step 13840: lr=1.00E-05, loss= 1.2408 (max= 2.3643), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:46,246 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.2697 (max= 2.8515), tps=18118, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:46,246 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.2697 (max= 2.8515), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:46,247 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.2697 (max= 2.8515), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 16:24:46,247 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.2697 (max= 2.8515), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:46,247 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.2697 (max= 2.8515), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:46,247 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.2697 (max= 2.8515), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:46,247 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.2697 (max= 2.8515), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:24:46,247 - root - INFO - Step 13850: lr=1.00E-05, loss= 1.2697 (max= 2.8515), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:04,277 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.2672 (max= 2.1864), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:04,277 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.2672 (max= 2.1864), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:04,277 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.2672 (max= 2.1864), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:04,277 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.2672 (max= 2.1864), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:04,277 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.2672 (max= 2.1864), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:04,277 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.2672 (max= 2.1864), tps=18177, mfu=37.87%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:04,277 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.2672 (max= 2.1864), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:04,277 - root - INFO - Step 13860: lr=1.00E-05, loss= 1.2672 (max= 2.1864), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:22,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.2539 (max= 2.9770), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:22,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.2539 (max= 2.9770), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:22,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.2539 (max= 2.9770), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:22,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.2539 (max= 2.9770), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:22,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.2539 (max= 2.9770), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:22,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.2539 (max= 2.9770), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:22,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.2539 (max= 2.9770), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:22,304 - root - INFO - Step 13870: lr=1.00E-05, loss= 1.2539 (max= 2.9770), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:40,316 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.2537 (max= 
2.2303), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:40,317 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.2537 (max= 2.2303), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:40,317 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.2537 (max= 2.2303), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:40,317 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.2537 (max= 2.2303), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:40,317 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.2537 (max= 2.2303), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:40,317 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.2537 (max= 2.2303), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:40,317 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.2537 (max= 2.2303), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:40,317 - root - INFO - Step 13880: lr=1.00E-05, loss= 1.2537 (max= 2.2303), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:58,327 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.2657 (max= 2.8234), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:58,327 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.2657 (max= 2.8234), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:58,327 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.2657 (max= 2.8234), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:58,327 - root - INFO - Step 
13890: lr=1.00E-05, loss= 1.2657 (max= 2.8234), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:58,327 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.2657 (max= 2.8234), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:58,327 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.2657 (max= 2.8234), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:58,327 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.2657 (max= 2.8234), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:25:58,327 - root - INFO - Step 13890: lr=1.00E-05, loss= 1.2657 (max= 2.8234), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:16,377 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.2294 (max= 2.8033), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:16,377 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.2294 (max= 2.8033), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:16,377 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.2294 (max= 2.8033), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:16,377 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.2294 (max= 2.8033), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:16,377 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.2294 (max= 2.8033), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:16,377 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.2294 (max= 2.8033), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 16:26:16,377 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.2294 (max= 2.8033), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:16,377 - root - INFO - Step 13900: lr=1.00E-05, loss= 1.2294 (max= 2.8033), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:34,398 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.2428 (max= 2.7962), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:34,398 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.2428 (max= 2.7962), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:34,398 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.2428 (max= 2.7962), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:34,398 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.2428 (max= 2.7962), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:34,398 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.2428 (max= 2.7962), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:34,398 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.2428 (max= 2.7962), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:34,398 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.2428 (max= 2.7962), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:34,398 - root - INFO - Step 13910: lr=1.00E-05, loss= 1.2428 (max= 2.7962), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:52,421 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.2142 (max= 2.8592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:52,422 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.2142 (max= 2.8592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:52,422 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.2142 (max= 2.8592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:52,422 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.2142 (max= 2.8592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:52,422 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.2142 (max= 2.8592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:52,422 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.2142 (max= 2.8592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:52,422 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.2142 (max= 2.8592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:26:52,422 - root - INFO - Step 13920: lr=1.00E-05, loss= 1.2142 (max= 2.8592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:10,405 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.2568 (max= 2.7934), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:10,405 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.2568 (max= 2.7934), tps=18224, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:10,405 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.2568 (max= 2.7934), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:10,405 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.2568 (max= 2.7934), tps=18224, 
mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:10,405 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.2568 (max= 2.7934), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:10,405 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.2568 (max= 2.7934), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:10,405 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.2568 (max= 2.7934), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:10,405 - root - INFO - Step 13930: lr=1.00E-05, loss= 1.2568 (max= 2.7934), tps=18225, mfu=37.97%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:28,464 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.2598 (max= 2.4542), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:28,464 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.2598 (max= 2.4542), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:28,464 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.2598 (max= 2.4542), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:28,464 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.2598 (max= 2.4542), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:28,464 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.2598 (max= 2.4542), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:28,464 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.2598 (max= 2.4542), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:28,464 - root - INFO - Step 13940: lr=1.00E-05, 
loss= 1.2598 (max= 2.4542), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:28,464 - root - INFO - Step 13940: lr=1.00E-05, loss= 1.2598 (max= 2.4542), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:27:46,488 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.3000 (max= 2.8904), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:27:46,488 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.3000 (max= 2.8904), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:27:46,488 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.3000 (max= 2.8904), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:27:46,488 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.3000 (max= 2.8904), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:27:46,488 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.3000 (max= 2.8904), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:27:46,488 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.3000 (max= 2.8904), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:27:46,488 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.3000 (max= 2.8904), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:27:46,488 - root - INFO - Step 13950: lr=1.00E-05, loss= 1.3000 (max= 2.8904), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:04,537 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.2290 (max= 2.3647), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:04,537 - 
root - INFO - Step 13960: lr=1.00E-05, loss= 1.2290 (max= 2.3647), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:04,537 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.2290 (max= 2.3647), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:04,537 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.2290 (max= 2.3647), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:04,537 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.2290 (max= 2.3647), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:04,537 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.2290 (max= 2.3647), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:04,537 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.2290 (max= 2.3647), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:04,538 - root - INFO - Step 13960: lr=1.00E-05, loss= 1.2290 (max= 2.3647), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:22,578 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.2545 (max= 2.8231), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:28:22,578 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.2545 (max= 2.8231), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:28:22,578 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.2545 (max= 2.8231), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:28:22,578 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.2545 (max= 2.8231), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 16:28:22,578 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.2545 (max= 2.8231), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:28:22,578 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.2545 (max= 2.8231), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:28:22,578 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.2545 (max= 2.8231), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:28:22,578 - root - INFO - Step 13970: lr=1.00E-05, loss= 1.2545 (max= 2.8231), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:28:40,629 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.2712 (max= 2.3477), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:40,630 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.2712 (max= 2.3477), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:40,630 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.2712 (max= 2.3477), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:40,630 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.2712 (max= 2.3477), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:40,630 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.2712 (max= 2.3477), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:40,630 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.2712 (max= 2.3477), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:40,630 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.2712 (max= 2.3477), tps=18156, mfu=37.83%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:40,630 - root - INFO - Step 13980: lr=1.00E-05, loss= 1.2712 (max= 2.3477), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:58,711 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.2585 (max= 2.3240), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:58,711 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.2585 (max= 2.3240), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:58,711 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.2585 (max= 2.3240), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:58,711 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.2585 (max= 2.3240), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:58,711 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.2585 (max= 2.3240), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:58,711 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.2585 (max= 2.3240), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:58,711 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.2585 (max= 2.3240), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:28:58,711 - root - INFO - Step 13990: lr=1.00E-05, loss= 1.2585 (max= 2.3240), tps=18126, mfu=37.77%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-14000 +2025-10-24 16:29:16,692 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.2351 (max= 2.2011), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:29:16,692 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.2351 (max= 2.2011), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:29:16,692 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-24 16:29:16,692 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:29:16,692 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-24 16:29:16,692 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:29:16,692 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.2351 (max= 2.2011), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:29:16,692 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.2351 (max= 2.2011), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:29:16,692 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.2351 (max= 2.2011), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:29:16,692 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-24 16:29:16,692 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:29:16,692 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-24 16:29:16,692 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-24 16:29:16,692 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:29:16,692 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:29:16,692 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.2351 (max= 2.2011), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:29:16,692 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.2351 (max= 2.2011), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:29:16,693 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-24 16:29:16,693 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-24 16:29:16,693 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:29:16,693 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:29:16,693 - root - INFO - Step 14000: lr=1.00E-05, loss= 1.2351 (max= 2.2011), tps=18227, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:29:16,693 - root - INFO - Saving a full checkpoint at step 14000 +2025-10-24 16:29:16,693 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-14000! 
Save time: 4.714277744293213 +2025-10-24 16:29:30,031 - root - INFO - Finished saving the checkpoint in 13.34 seconds +2025-10-24 16:29:30,038 - root - INFO - Finished saving the checkpoint in 13.35 seconds +2025-10-24 16:29:30,038 - root - INFO - Finished saving the checkpoint in 13.35 seconds +2025-10-24 16:29:30,038 - root - INFO - Finished saving the checkpoint in 13.35 seconds +2025-10-24 16:29:30,039 - root - INFO - Finished saving the checkpoint in 13.35 seconds +2025-10-24 16:29:30,040 - root - INFO - Finished saving the checkpoint in 13.35 seconds +2025-10-24 16:29:30,042 - root - INFO - Finished saving the checkpoint in 13.35 seconds +2025-10-24 16:29:30,042 - root - INFO - Finished saving the checkpoint in 13.35 seconds +2025-10-24 16:29:48,010 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.2870 (max= 2.8450), tps=10464, mfu=21.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:29:48,011 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.2870 (max= 2.8450), tps=10464, mfu=21.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:29:48,011 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.2870 (max= 2.8450), tps=10464, mfu=21.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:29:48,011 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.2870 (max= 2.8450), tps=10464, mfu=21.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:29:48,011 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.2870 (max= 2.8450), tps=10464, mfu=21.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:29:48,011 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.2870 (max= 2.8450), tps=10464, mfu=21.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:29:48,011 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.2870 (max= 2.8450), tps=10464, mfu=21.80%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:29:48,011 - root - INFO - Step 14010: lr=1.00E-05, loss= 1.2870 (max= 2.8450), tps=10464, mfu=21.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:30:06,026 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.2796 (max= 2.3057), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:06,026 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.2796 (max= 2.3057), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:06,026 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.2796 (max= 2.3057), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:06,026 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.2796 (max= 2.3057), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:06,026 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.2796 (max= 2.3057), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:06,026 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.2796 (max= 2.3057), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:06,026 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.2796 (max= 2.3057), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:06,026 - root - INFO - Step 14020: lr=1.00E-05, loss= 1.2796 (max= 2.3057), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:24,063 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.2888 (max= 2.7274), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:24,063 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.2888 (max= 2.7274), tps=18171, 
mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:24,063 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.2888 (max= 2.7274), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:24,063 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.2888 (max= 2.7274), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:24,063 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.2888 (max= 2.7274), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:24,063 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.2888 (max= 2.7274), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:24,063 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.2888 (max= 2.7274), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:24,063 - root - INFO - Step 14030: lr=1.00E-05, loss= 1.2888 (max= 2.7274), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:42,091 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.2286 (max= 2.9616), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:42,091 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.2286 (max= 2.9616), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:42,092 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.2286 (max= 2.9616), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:42,092 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.2286 (max= 2.9616), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:42,092 - root - INFO - Step 14040: lr=1.00E-05, 
loss= 1.2286 (max= 2.9616), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:42,092 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.2286 (max= 2.9616), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:42,092 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.2286 (max= 2.9616), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:30:42,092 - root - INFO - Step 14040: lr=1.00E-05, loss= 1.2286 (max= 2.9616), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:00,137 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.2420 (max= 2.3659), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:00,137 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.2420 (max= 2.3659), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:00,137 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.2420 (max= 2.3659), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:00,137 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.2420 (max= 2.3659), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:00,137 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.2420 (max= 2.3659), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:00,137 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.2420 (max= 2.3659), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:00,138 - root - INFO - Step 14050: lr=1.00E-05, loss= 1.2420 (max= 2.3659), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:00,138 - 
root - INFO - Step 14050: lr=1.00E-05, loss= 1.2420 (max= 2.3659), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:18,169 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.2789 (max= 3.6423), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:18,169 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.2789 (max= 3.6423), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:18,169 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.2789 (max= 3.6423), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:18,169 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.2789 (max= 3.6423), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:18,170 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.2789 (max= 3.6423), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:18,170 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.2789 (max= 3.6423), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:18,170 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.2789 (max= 3.6423), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:18,170 - root - INFO - Step 14060: lr=1.00E-05, loss= 1.2789 (max= 3.6423), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:36,210 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.2743 (max= 2.3373), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:36,210 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.2743 (max= 2.3373), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:31:36,210 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.2743 (max= 2.3373), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:36,210 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.2743 (max= 2.3373), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:36,211 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.2743 (max= 2.3373), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:36,211 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.2743 (max= 2.3373), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:36,211 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.2743 (max= 2.3373), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:36,211 - root - INFO - Step 14070: lr=1.00E-05, loss= 1.2743 (max= 2.3373), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:54,271 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.2403 (max= 2.4331), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:54,271 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.2403 (max= 2.4331), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:54,271 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.2403 (max= 2.4331), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:54,271 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.2403 (max= 2.4331), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:54,271 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.2403 (max= 2.4331), tps=18147, mfu=37.81%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:54,272 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.2403 (max= 2.4331), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:54,272 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.2403 (max= 2.4331), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:31:54,272 - root - INFO - Step 14080: lr=1.00E-05, loss= 1.2403 (max= 2.4331), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:12,268 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.2804 (max= 2.8119), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:12,268 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.2804 (max= 2.8119), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:12,268 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.2804 (max= 2.8119), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:12,268 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.2804 (max= 2.8119), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:12,268 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.2804 (max= 2.8119), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:12,269 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.2804 (max= 2.8119), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:12,269 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.2804 (max= 2.8119), tps=18212, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:12,269 - root - INFO - Step 14090: lr=1.00E-05, loss= 1.2804 (max= 
2.8119), tps=18211, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:30,321 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.2473 (max= 2.3860), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:30,321 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.2473 (max= 2.3860), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:30,321 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.2473 (max= 2.3860), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:30,321 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.2473 (max= 2.3860), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:30,321 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.2473 (max= 2.3860), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:30,321 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.2473 (max= 2.3860), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:30,321 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.2473 (max= 2.3860), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:30,322 - root - INFO - Step 14100: lr=1.00E-05, loss= 1.2473 (max= 2.3860), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:48,357 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.2855 (max= 2.4903), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:48,357 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.2855 (max= 2.4903), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:48,357 - root - INFO - Step 
14110: lr=1.00E-05, loss= 1.2855 (max= 2.4903), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:48,357 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.2855 (max= 2.4903), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:48,357 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.2855 (max= 2.4903), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:48,357 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.2855 (max= 2.4903), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:48,358 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.2855 (max= 2.4903), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:32:48,358 - root - INFO - Step 14110: lr=1.00E-05, loss= 1.2855 (max= 2.4903), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:06,382 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.2553 (max= 2.3716), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:06,382 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.2553 (max= 2.3716), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:06,383 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.2553 (max= 2.3716), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:06,383 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.2553 (max= 2.3716), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:06,383 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.2553 (max= 2.3716), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:33:06,383 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.2553 (max= 2.3716), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:06,383 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.2553 (max= 2.3716), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:06,383 - root - INFO - Step 14120: lr=1.00E-05, loss= 1.2553 (max= 2.3716), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:24,411 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.2397 (max= 2.9124), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:24,412 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.2397 (max= 2.9124), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:24,412 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.2397 (max= 2.9124), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:24,412 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.2397 (max= 2.9124), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:24,412 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.2397 (max= 2.9124), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:24,412 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.2397 (max= 2.9124), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:24,412 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.2397 (max= 2.9124), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:24,413 - root - INFO - Step 14130: lr=1.00E-05, loss= 1.2397 (max= 2.9124), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:33:42,420 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.2650 (max= 2.8200), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:33:42,420 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.2650 (max= 2.8200), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:33:42,420 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.2650 (max= 2.8200), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:33:42,420 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.2650 (max= 2.8200), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:33:42,421 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.2650 (max= 2.8200), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:33:42,421 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.2650 (max= 2.8200), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:33:42,421 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.2650 (max= 2.8200), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:33:42,421 - root - INFO - Step 14140: lr=1.00E-05, loss= 1.2650 (max= 2.8200), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:34:00,473 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.2643 (max= 2.8027), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:00,473 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.2643 (max= 2.8027), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:00,473 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.2643 (max= 2.8027), tps=18156, 
mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:00,473 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.2643 (max= 2.8027), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:00,473 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.2643 (max= 2.8027), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:00,473 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.2643 (max= 2.8027), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:00,473 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.2643 (max= 2.8027), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:00,473 - root - INFO - Step 14150: lr=1.00E-05, loss= 1.2643 (max= 2.8027), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:18,533 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.2472 (max= 2.7619), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:34:18,533 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.2472 (max= 2.7619), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:34:18,533 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.2472 (max= 2.7619), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:34:18,534 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.2472 (max= 2.7619), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:34:18,534 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.2472 (max= 2.7619), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:34:18,534 - root - INFO - Step 14160: lr=1.00E-05, 
loss= 1.2472 (max= 2.7619), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:34:18,534 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.2472 (max= 2.7619), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:34:18,534 - root - INFO - Step 14160: lr=1.00E-05, loss= 1.2472 (max= 2.7619), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:34:36,557 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.2589 (max= 2.8027), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:36,558 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.2589 (max= 2.8027), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:36,558 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.2589 (max= 2.8027), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:36,558 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.2589 (max= 2.8027), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:36,558 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.2589 (max= 2.8027), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:36,558 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.2589 (max= 2.8027), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:36,558 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.2589 (max= 2.8027), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:36,558 - root - INFO - Step 14170: lr=1.00E-05, loss= 1.2589 (max= 2.8027), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:54,592 - 
root - INFO - Step 14180: lr=1.00E-05, loss= 1.2700 (max= 3.4307), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:54,593 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.2700 (max= 3.4307), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:54,593 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.2700 (max= 3.4307), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:54,593 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.2700 (max= 3.4307), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:54,593 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.2700 (max= 3.4307), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:54,593 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.2700 (max= 3.4307), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:54,594 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.2700 (max= 3.4307), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:34:54,594 - root - INFO - Step 14180: lr=1.00E-05, loss= 1.2700 (max= 3.4307), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:12,626 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.2466 (max= 2.7630), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:12,626 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.2466 (max= 2.7630), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:12,626 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.2466 (max= 2.7630), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:35:12,626 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.2466 (max= 2.7630), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:12,627 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.2466 (max= 2.7630), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:12,627 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.2466 (max= 2.7630), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:12,627 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.2466 (max= 2.7630), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:12,627 - root - INFO - Step 14190: lr=1.00E-05, loss= 1.2466 (max= 2.7630), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:30,691 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.2875 (max= 3.7270), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:30,691 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.2875 (max= 3.7270), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:30,691 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.2875 (max= 3.7270), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:30,691 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.2875 (max= 3.7270), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:30,691 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.2875 (max= 3.7270), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:30,691 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.2875 (max= 3.7270), tps=18143, mfu=37.80%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:30,691 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.2875 (max= 3.7270), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:30,691 - root - INFO - Step 14200: lr=1.00E-05, loss= 1.2875 (max= 3.7270), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:48,754 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.2722 (max= 2.8536), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:48,754 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.2722 (max= 2.8536), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:48,754 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.2722 (max= 2.8536), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:48,754 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.2722 (max= 2.8536), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:48,755 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.2722 (max= 2.8536), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:48,755 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.2722 (max= 2.8536), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:48,755 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.2722 (max= 2.8536), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:35:48,755 - root - INFO - Step 14210: lr=1.00E-05, loss= 1.2722 (max= 2.8536), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:36:06,763 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.2521 (max= 
2.1827), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:06,763 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.2521 (max= 2.1827), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:06,763 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.2521 (max= 2.1827), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:06,763 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.2521 (max= 2.1827), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:06,763 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.2521 (max= 2.1827), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:06,763 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.2521 (max= 2.1827), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:06,763 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.2521 (max= 2.1827), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:06,763 - root - INFO - Step 14220: lr=1.00E-05, loss= 1.2521 (max= 2.1827), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:24,811 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.2959 (max= 2.8241), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:24,811 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.2959 (max= 2.8241), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:24,811 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.2959 (max= 2.8241), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:24,811 - root - INFO - Step 
14230: lr=1.00E-05, loss= 1.2959 (max= 2.8241), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:24,811 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.2959 (max= 2.8241), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:24,811 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.2959 (max= 2.8241), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:24,811 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.2959 (max= 2.8241), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:24,811 - root - INFO - Step 14230: lr=1.00E-05, loss= 1.2959 (max= 2.8241), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:36:42,834 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.2537 (max= 2.1731), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:36:42,834 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.2537 (max= 2.1731), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:36:42,834 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.2537 (max= 2.1731), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:36:42,835 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.2537 (max= 2.1731), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:36:42,835 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.2537 (max= 2.1731), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:36:42,835 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.2537 (max= 2.1731), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:36:42,836 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.2537 (max= 2.1731), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:36:42,836 - root - INFO - Step 14240: lr=1.00E-05, loss= 1.2537 (max= 2.1731), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:00,852 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.2458 (max= 2.4417), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:00,852 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.2458 (max= 2.4417), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:00,852 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.2458 (max= 2.4417), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:00,852 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.2458 (max= 2.4417), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:00,852 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.2458 (max= 2.4417), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:00,853 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.2458 (max= 2.4417), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:00,853 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.2458 (max= 2.4417), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:00,853 - root - INFO - Step 14250: lr=1.00E-05, loss= 1.2458 (max= 2.4417), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:18,892 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.2238 (max= 2.3043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:18,892 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.2238 (max= 2.3043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:18,892 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.2238 (max= 2.3043), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:18,892 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.2238 (max= 2.3043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:18,892 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.2238 (max= 2.3043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:18,892 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.2238 (max= 2.3043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:18,892 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.2238 (max= 2.3043), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:18,892 - root - INFO - Step 14260: lr=1.00E-05, loss= 1.2238 (max= 2.3043), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:36,930 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.2283 (max= 2.4637), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:36,931 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.2283 (max= 2.4637), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:36,931 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.2283 (max= 2.4637), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:36,931 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.2283 (max= 2.4637), tps=18170, 
mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:36,931 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.2283 (max= 2.4637), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:36,931 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.2283 (max= 2.4637), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:36,931 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.2283 (max= 2.4637), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:36,932 - root - INFO - Step 14270: lr=1.00E-05, loss= 1.2283 (max= 2.4637), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:54,996 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.2537 (max= 2.6345), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:54,997 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.2537 (max= 2.6345), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:54,997 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.2537 (max= 2.6345), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:54,997 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.2537 (max= 2.6345), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:54,997 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.2537 (max= 2.6345), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:54,997 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.2537 (max= 2.6345), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:54,997 - root - INFO - Step 14280: lr=1.00E-05, 
loss= 1.2537 (max= 2.6345), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:37:55,000 - root - INFO - Step 14280: lr=1.00E-05, loss= 1.2537 (max= 2.6345), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:13,050 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.2741 (max= 2.7912), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:13,050 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.2741 (max= 2.7912), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:13,050 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.2741 (max= 2.7912), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:13,050 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.2741 (max= 2.7912), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:13,050 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.2741 (max= 2.7912), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:13,050 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.2741 (max= 2.7912), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:13,050 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.2741 (max= 2.7912), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:13,051 - root - INFO - Step 14290: lr=1.00E-05, loss= 1.2741 (max= 2.7912), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:31,088 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.2674 (max= 2.8525), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:31,088 - 
root - INFO - Step 14300: lr=1.00E-05, loss= 1.2674 (max= 2.8525), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:31,088 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.2674 (max= 2.8525), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:31,088 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.2674 (max= 2.8525), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:31,088 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.2674 (max= 2.8525), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:31,088 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.2674 (max= 2.8525), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:31,088 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.2674 (max= 2.8525), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:31,089 - root - INFO - Step 14300: lr=1.00E-05, loss= 1.2674 (max= 2.8525), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:38:49,159 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.2543 (max= 2.1367), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:38:49,159 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.2543 (max= 2.1367), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:38:49,160 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.2543 (max= 2.1367), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:38:49,160 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.2543 (max= 2.1367), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 16:38:49,160 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.2543 (max= 2.1367), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:38:49,160 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.2543 (max= 2.1367), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:38:49,160 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.2543 (max= 2.1367), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:38:49,160 - root - INFO - Step 14310: lr=1.00E-05, loss= 1.2543 (max= 2.1367), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:39:07,166 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.2456 (max= 2.7871), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:39:07,166 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.2456 (max= 2.7871), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:39:07,166 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.2456 (max= 2.7871), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:39:07,166 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.2456 (max= 2.7871), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:39:07,166 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.2456 (max= 2.7871), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:39:07,166 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.2456 (max= 2.7871), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:39:07,166 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.2456 (max= 2.7871), tps=18201, mfu=37.92%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:39:07,166 - root - INFO - Step 14320: lr=1.00E-05, loss= 1.2456 (max= 2.7871), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:39:25,209 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.2581 (max= 2.7316), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:25,209 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.2581 (max= 2.7316), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:25,209 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.2581 (max= 2.7316), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:25,209 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.2581 (max= 2.7316), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:25,209 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.2581 (max= 2.7316), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:25,209 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.2581 (max= 2.7316), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:25,209 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.2581 (max= 2.7316), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:25,210 - root - INFO - Step 14330: lr=1.00E-05, loss= 1.2581 (max= 2.7316), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:43,247 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.2123 (max= 2.7917), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:43,247 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.2123 (max= 
2.7917), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:43,247 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.2123 (max= 2.7917), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:43,247 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.2123 (max= 2.7917), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:43,247 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.2123 (max= 2.7917), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:43,247 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.2123 (max= 2.7917), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:43,247 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.2123 (max= 2.7917), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:39:43,247 - root - INFO - Step 14340: lr=1.00E-05, loss= 1.2123 (max= 2.7917), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:01,273 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.2802 (max= 2.7473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:01,273 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.2802 (max= 2.7473), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:01,273 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.2802 (max= 2.7473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:01,273 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.2802 (max= 2.7473), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:01,273 - root - INFO - Step 
14350: lr=1.00E-05, loss= 1.2802 (max= 2.7473), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:01,273 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.2802 (max= 2.7473), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:01,273 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.2802 (max= 2.7473), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:01,273 - root - INFO - Step 14350: lr=1.00E-05, loss= 1.2802 (max= 2.7473), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:19,314 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.2139 (max= 2.1122), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:19,315 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.2139 (max= 2.1122), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:19,315 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.2139 (max= 2.1122), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:19,315 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.2139 (max= 2.1122), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:19,315 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.2139 (max= 2.1122), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:19,315 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.2139 (max= 2.1122), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:19,315 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.2139 (max= 2.1122), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:40:19,316 - root - INFO - Step 14360: lr=1.00E-05, loss= 1.2139 (max= 2.1122), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:37,366 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.2563 (max= 2.6948), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:37,366 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.2563 (max= 2.6948), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:37,366 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.2563 (max= 2.6948), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:37,366 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.2563 (max= 2.6948), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:37,366 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.2563 (max= 2.6948), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:37,366 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.2563 (max= 2.6948), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:37,366 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.2563 (max= 2.6948), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:37,366 - root - INFO - Step 14370: lr=1.00E-05, loss= 1.2563 (max= 2.6948), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:55,398 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.2579 (max= 2.3721), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:55,398 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.2579 (max= 2.3721), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:55,398 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.2579 (max= 2.3721), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:55,398 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.2579 (max= 2.3721), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:55,398 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.2579 (max= 2.3721), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:55,398 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.2579 (max= 2.3721), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:55,399 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.2579 (max= 2.3721), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:40:55,399 - root - INFO - Step 14380: lr=1.00E-05, loss= 1.2579 (max= 2.3721), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:13,408 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.2432 (max= 3.2044), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:13,408 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.2432 (max= 3.2044), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:13,408 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.2432 (max= 3.2044), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:13,408 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.2432 (max= 3.2044), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:13,408 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.2432 (max= 3.2044), tps=18198, 
mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:13,408 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.2432 (max= 3.2044), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:13,408 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.2432 (max= 3.2044), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:13,408 - root - INFO - Step 14390: lr=1.00E-05, loss= 1.2432 (max= 3.2044), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:31,435 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.2491 (max= 2.4990), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:41:31,436 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.2491 (max= 2.4990), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:41:31,436 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.2491 (max= 2.4990), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:41:31,436 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.2491 (max= 2.4990), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:41:31,436 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.2491 (max= 2.4990), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:41:31,436 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.2491 (max= 2.4990), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:41:31,436 - root - INFO - Step 14400: lr=1.00E-05, loss= 1.2491 (max= 2.4990), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:41:31,436 - root - INFO - Step 14400: lr=1.00E-05, 
loss= 1.2491 (max= 2.4990), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:41:49,469 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.2549 (max= 2.2928), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:49,469 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.2549 (max= 2.2928), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:49,469 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.2549 (max= 2.2928), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:49,469 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.2549 (max= 2.2928), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:49,469 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.2549 (max= 2.2928), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:49,469 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.2549 (max= 2.2928), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:49,469 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.2549 (max= 2.2928), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:41:49,469 - root - INFO - Step 14410: lr=1.00E-05, loss= 1.2549 (max= 2.2928), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:07,480 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.2265 (max= 3.7447), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:07,480 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.2265 (max= 3.7447), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:07,480 - 
root - INFO - Step 14420: lr=1.00E-05, loss= 1.2265 (max= 3.7447), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:07,480 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.2265 (max= 3.7447), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:07,480 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.2265 (max= 3.7447), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:07,480 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.2265 (max= 3.7447), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:07,481 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.2265 (max= 3.7447), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:07,482 - root - INFO - Step 14420: lr=1.00E-05, loss= 1.2265 (max= 3.7447), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:25,513 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.2745 (max= 2.8020), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:25,513 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.2745 (max= 2.8020), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:25,513 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.2745 (max= 2.8020), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:25,513 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.2745 (max= 2.8020), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:25,513 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.2745 (max= 2.8020), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:42:25,513 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.2745 (max= 2.8020), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:25,514 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.2745 (max= 2.8020), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:25,514 - root - INFO - Step 14430: lr=1.00E-05, loss= 1.2745 (max= 2.8020), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:43,529 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.2309 (max= 2.3334), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:43,529 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.2309 (max= 2.3334), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:43,529 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.2309 (max= 2.3334), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:43,529 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.2309 (max= 2.3334), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:43,529 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.2309 (max= 2.3334), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:43,529 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.2309 (max= 2.3334), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:43,530 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.2309 (max= 2.3334), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:42:43,530 - root - INFO - Step 14440: lr=1.00E-05, loss= 1.2309 (max= 2.3334), tps=18192, mfu=37.90%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:01,568 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.2505 (max= 3.6726), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:01,568 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.2505 (max= 3.6726), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:01,568 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.2505 (max= 3.6726), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:01,568 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.2505 (max= 3.6726), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:01,568 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.2505 (max= 3.6726), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:01,568 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.2505 (max= 3.6726), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:01,568 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.2505 (max= 3.6726), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:01,568 - root - INFO - Step 14450: lr=1.00E-05, loss= 1.2505 (max= 3.6726), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:19,608 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.2689 (max= 2.8058), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:19,608 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.2689 (max= 2.8058), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:19,608 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.2689 (max= 
2.8058), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:19,608 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.2689 (max= 2.8058), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:19,608 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.2689 (max= 2.8058), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:19,608 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.2689 (max= 2.8058), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:19,608 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.2689 (max= 2.8058), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:19,608 - root - INFO - Step 14460: lr=1.00E-05, loss= 1.2689 (max= 2.8058), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:37,628 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.2485 (max= 2.2246), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:37,628 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.2485 (max= 2.2246), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:37,628 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.2485 (max= 2.2246), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:37,628 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.2485 (max= 2.2246), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:37,628 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.2485 (max= 2.2246), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:37,628 - root - INFO - Step 
14470: lr=1.00E-05, loss= 1.2485 (max= 2.2246), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:37,628 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.2485 (max= 2.2246), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:37,628 - root - INFO - Step 14470: lr=1.00E-05, loss= 1.2485 (max= 2.2246), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:55,655 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1948 (max= 2.8631), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:55,655 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1948 (max= 2.8631), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:55,655 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1948 (max= 2.8631), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:55,655 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1948 (max= 2.8631), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:55,655 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1948 (max= 2.8631), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:55,655 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1948 (max= 2.8631), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:55,656 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1948 (max= 2.8631), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:43:55,656 - root - INFO - Step 14480: lr=1.00E-05, loss= 1.1948 (max= 2.8631), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:44:13,710 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.2592 (max= 2.8403), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:13,710 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.2592 (max= 2.8403), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:13,710 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.2592 (max= 2.8403), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:13,710 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.2592 (max= 2.8403), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:13,711 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.2592 (max= 2.8403), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:13,711 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.2592 (max= 2.8403), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:13,711 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.2592 (max= 2.8403), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:13,711 - root - INFO - Step 14490: lr=1.00E-05, loss= 1.2592 (max= 2.8403), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:31,813 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.2468 (max= 2.8373), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:31,813 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.2468 (max= 2.8373), tps=18106, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:31,813 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.2468 (max= 2.8373), tps=18106, mfu=37.72%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:31,813 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.2468 (max= 2.8373), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:31,813 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.2468 (max= 2.8373), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:31,813 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.2468 (max= 2.8373), tps=18107, mfu=37.73%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:31,813 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.2468 (max= 2.8373), tps=18106, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:31,814 - root - INFO - Step 14500: lr=1.00E-05, loss= 1.2468 (max= 2.8373), tps=18106, mfu=37.72%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:49,808 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.2670 (max= 2.3925), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:49,808 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.2670 (max= 2.3925), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:49,808 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.2670 (max= 2.3925), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:49,808 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.2670 (max= 2.3925), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:49,808 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.2670 (max= 2.3925), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:49,808 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.2670 (max= 2.3925), tps=18214, 
mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:49,808 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.2670 (max= 2.3925), tps=18213, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:44:49,808 - root - INFO - Step 14510: lr=1.00E-05, loss= 1.2670 (max= 2.3925), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:07,812 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.2196 (max= 2.7709), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:07,812 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.2196 (max= 2.7709), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:07,812 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.2196 (max= 2.7709), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:07,813 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.2196 (max= 2.7709), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:07,813 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.2196 (max= 2.7709), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:07,813 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.2196 (max= 2.7709), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:07,813 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.2196 (max= 2.7709), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:07,813 - root - INFO - Step 14520: lr=1.00E-05, loss= 1.2196 (max= 2.7709), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:25,814 - root - INFO - Step 14530: lr=1.00E-05, 
loss= 1.2569 (max= 3.4280), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:25,814 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.2569 (max= 3.4280), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:25,814 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.2569 (max= 3.4280), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:25,814 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.2569 (max= 3.4280), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:25,814 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.2569 (max= 3.4280), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:25,814 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.2569 (max= 3.4280), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:25,814 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.2569 (max= 3.4280), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:25,815 - root - INFO - Step 14530: lr=1.00E-05, loss= 1.2569 (max= 3.4280), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:45:43,845 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.2654 (max= 2.5762), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:45:43,845 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.2654 (max= 2.5762), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:45:43,846 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.2654 (max= 2.5762), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:45:43,846 - 
root - INFO - Step 14540: lr=1.00E-05, loss= 1.2654 (max= 2.5762), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:45:43,846 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.2654 (max= 2.5762), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:45:43,846 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.2654 (max= 2.5762), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:45:43,846 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.2654 (max= 2.5762), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:45:43,846 - root - INFO - Step 14540: lr=1.00E-05, loss= 1.2654 (max= 2.5762), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:46:01,869 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.2170 (max= 2.1544), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:01,870 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.2170 (max= 2.1544), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:01,870 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.2170 (max= 2.1544), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:01,870 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.2170 (max= 2.1544), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:01,870 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.2170 (max= 2.1544), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:01,870 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.2170 (max= 2.1544), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:46:01,870 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.2170 (max= 2.1544), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:01,870 - root - INFO - Step 14550: lr=1.00E-05, loss= 1.2170 (max= 2.1544), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:19,879 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.2725 (max= 2.6844), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:19,879 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.2725 (max= 2.6844), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:19,879 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.2725 (max= 2.6844), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:19,879 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.2725 (max= 2.6844), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:19,879 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.2725 (max= 2.6844), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:19,879 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.2725 (max= 2.6844), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:19,879 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.2725 (max= 2.6844), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:19,883 - root - INFO - Step 14560: lr=1.00E-05, loss= 1.2725 (max= 2.6844), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:37,916 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.2335 (max= 2.2639), tps=18170, mfu=37.86%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:37,917 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.2335 (max= 2.2639), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:37,917 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.2335 (max= 2.2639), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:37,917 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.2335 (max= 2.2639), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:37,917 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.2335 (max= 2.2639), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:37,917 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.2335 (max= 2.2639), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:37,917 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.2335 (max= 2.2639), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:37,917 - root - INFO - Step 14570: lr=1.00E-05, loss= 1.2335 (max= 2.2639), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:55,933 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.2333 (max= 2.3699), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:55,933 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.2333 (max= 2.3699), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:55,933 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.2333 (max= 2.3699), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:55,933 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.2333 (max= 
2.3699), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:55,933 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.2333 (max= 2.3699), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:55,933 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.2333 (max= 2.3699), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:55,933 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.2333 (max= 2.3699), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:46:55,934 - root - INFO - Step 14580: lr=1.00E-05, loss= 1.2333 (max= 2.3699), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:13,983 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.2425 (max= 2.4288), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:13,983 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.2425 (max= 2.4288), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:13,983 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.2425 (max= 2.4288), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:13,983 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.2425 (max= 2.4288), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:13,983 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.2425 (max= 2.4288), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:13,983 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.2425 (max= 2.4288), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:13,983 - root - INFO - Step 
14590: lr=1.00E-05, loss= 1.2425 (max= 2.4288), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:13,983 - root - INFO - Step 14590: lr=1.00E-05, loss= 1.2425 (max= 2.4288), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:32,029 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.2557 (max= 2.2997), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:32,029 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.2557 (max= 2.2997), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:32,029 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.2557 (max= 2.2997), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:32,029 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.2557 (max= 2.2997), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:32,029 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.2557 (max= 2.2997), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:32,029 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.2557 (max= 2.2997), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:32,029 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.2557 (max= 2.2997), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:32,029 - root - INFO - Step 14600: lr=1.00E-05, loss= 1.2557 (max= 2.2997), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:50,059 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.2616 (max= 2.7108), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:47:50,059 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.2616 (max= 2.7108), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:50,059 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.2616 (max= 2.7108), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:50,059 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.2616 (max= 2.7108), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:50,060 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.2616 (max= 2.7108), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:50,060 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.2616 (max= 2.7108), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:50,060 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.2616 (max= 2.7108), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:47:50,061 - root - INFO - Step 14610: lr=1.00E-05, loss= 1.2616 (max= 2.7108), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:08,083 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.2040 (max= 2.1982), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:08,083 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.2040 (max= 2.1982), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:08,083 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.2040 (max= 2.1982), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:08,083 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.2040 (max= 2.1982), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:08,083 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.2040 (max= 2.1982), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:08,083 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.2040 (max= 2.1982), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:08,083 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.2040 (max= 2.1982), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:08,083 - root - INFO - Step 14620: lr=1.00E-05, loss= 1.2040 (max= 2.1982), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:26,107 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.2235 (max= 2.7567), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:26,107 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.2235 (max= 2.7567), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:26,108 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.2235 (max= 2.7567), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:26,108 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.2235 (max= 2.7567), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:26,108 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.2235 (max= 2.7567), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:26,108 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.2235 (max= 2.7567), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:26,108 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.2235 (max= 2.7567), tps=18183, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:26,108 - root - INFO - Step 14630: lr=1.00E-05, loss= 1.2235 (max= 2.7567), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:44,140 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.2459 (max= 2.1719), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:44,140 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.2459 (max= 2.1719), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:44,140 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.2459 (max= 2.1719), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:44,140 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.2459 (max= 2.1719), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:44,140 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.2459 (max= 2.1719), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:44,140 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.2459 (max= 2.1719), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:44,141 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.2459 (max= 2.1719), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:48:44,142 - root - INFO - Step 14640: lr=1.00E-05, loss= 1.2459 (max= 2.1719), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:02,157 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.2234 (max= 2.4022), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:02,157 - root - INFO - Step 14650: lr=1.00E-05, 
loss= 1.2234 (max= 2.4022), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:02,157 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.2234 (max= 2.4022), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:02,157 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.2234 (max= 2.4022), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:02,157 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.2234 (max= 2.4022), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:02,157 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.2234 (max= 2.4022), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:02,157 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.2234 (max= 2.4022), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:02,157 - root - INFO - Step 14650: lr=1.00E-05, loss= 1.2234 (max= 2.4022), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:20,198 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.2412 (max= 2.2881), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:20,198 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.2412 (max= 2.2881), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:20,198 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.2412 (max= 2.2881), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:20,198 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.2412 (max= 2.2881), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:20,199 - 
root - INFO - Step 14660: lr=1.00E-05, loss= 1.2412 (max= 2.2881), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:20,199 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.2412 (max= 2.2881), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:20,199 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.2412 (max= 2.2881), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:20,199 - root - INFO - Step 14660: lr=1.00E-05, loss= 1.2412 (max= 2.2881), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:38,233 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.2105 (max= 2.3787), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:38,233 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.2105 (max= 2.3787), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:38,233 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.2105 (max= 2.3787), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:38,233 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.2105 (max= 2.3787), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:38,233 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.2105 (max= 2.3787), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:38,233 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.2105 (max= 2.3787), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:38,234 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.2105 (max= 2.3787), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:49:38,234 - root - INFO - Step 14670: lr=1.00E-05, loss= 1.2105 (max= 2.3787), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:56,249 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.2436 (max= 2.2857), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:56,250 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.2436 (max= 2.2857), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:56,250 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.2436 (max= 2.2857), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:56,250 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.2436 (max= 2.2857), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:56,250 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.2436 (max= 2.2857), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:56,250 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.2436 (max= 2.2857), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:56,250 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.2436 (max= 2.2857), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:49:56,250 - root - INFO - Step 14680: lr=1.00E-05, loss= 1.2436 (max= 2.2857), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:14,288 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.2622 (max= 2.6978), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:14,288 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.2622 (max= 2.6978), tps=18169, mfu=37.86%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:14,289 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.2622 (max= 2.6978), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:14,289 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.2622 (max= 2.6978), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:14,289 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.2622 (max= 2.6978), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:14,289 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.2622 (max= 2.6978), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:14,289 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.2622 (max= 2.6978), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:14,289 - root - INFO - Step 14690: lr=1.00E-05, loss= 1.2622 (max= 2.6978), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:32,289 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.2478 (max= 2.8303), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:32,289 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.2478 (max= 2.8303), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:32,289 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.2478 (max= 2.8303), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:32,289 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.2478 (max= 2.8303), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:32,289 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.2478 (max= 
2.8303), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:32,289 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.2478 (max= 2.8303), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:32,289 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.2478 (max= 2.8303), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:32,289 - root - INFO - Step 14700: lr=1.00E-05, loss= 1.2478 (max= 2.8303), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:50,305 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.2655 (max= 2.7953), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:50,305 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.2655 (max= 2.7953), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:50,305 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.2655 (max= 2.7953), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:50,305 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.2655 (max= 2.7953), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:50,305 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.2655 (max= 2.7953), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:50,305 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.2655 (max= 2.7953), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:50,305 - root - INFO - Step 14710: lr=1.00E-05, loss= 1.2655 (max= 2.7953), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:50:50,305 - root - INFO - Step 
14710: lr=1.00E-05, loss= 1.2655 (max= 2.7953), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:08,334 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.2138 (max= 2.8072), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:08,335 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.2138 (max= 2.8072), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:08,335 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.2138 (max= 2.8072), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:08,335 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.2138 (max= 2.8072), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:08,335 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.2138 (max= 2.8072), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:08,335 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.2138 (max= 2.8072), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:08,335 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.2138 (max= 2.8072), tps=18178, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:08,335 - root - INFO - Step 14720: lr=1.00E-05, loss= 1.2138 (max= 2.8072), tps=18178, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:26,369 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.2303 (max= 2.1711), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:26,370 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.2303 (max= 2.1711), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:51:26,370 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.2303 (max= 2.1711), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:26,370 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.2303 (max= 2.1711), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:26,370 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.2303 (max= 2.1711), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:26,370 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.2303 (max= 2.1711), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:26,370 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.2303 (max= 2.1711), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:26,370 - root - INFO - Step 14730: lr=1.00E-05, loss= 1.2303 (max= 2.1711), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:44,396 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.2340 (max= 2.7276), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:44,397 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.2340 (max= 2.7276), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:44,397 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.2340 (max= 2.7276), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:44,397 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.2340 (max= 2.7276), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:44,397 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.2340 (max= 2.7276), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:44,397 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.2340 (max= 2.7276), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:44,397 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.2340 (max= 2.7276), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:51:44,397 - root - INFO - Step 14740: lr=1.00E-05, loss= 1.2340 (max= 2.7276), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:02,421 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.2397 (max= 2.2081), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:02,422 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.2397 (max= 2.2081), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:02,422 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.2397 (max= 2.2081), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:02,422 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.2397 (max= 2.2081), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:02,422 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.2397 (max= 2.2081), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:02,422 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.2397 (max= 2.2081), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:02,422 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.2397 (max= 2.2081), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:02,422 - root - INFO - Step 14750: lr=1.00E-05, loss= 1.2397 (max= 2.2081), tps=18185, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:20,449 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.2135 (max= 2.7113), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:20,450 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.2135 (max= 2.7113), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:20,450 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.2135 (max= 2.7113), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:20,450 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.2135 (max= 2.7113), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:20,450 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.2135 (max= 2.7113), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:20,450 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.2135 (max= 2.7113), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:20,450 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.2135 (max= 2.7113), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:20,450 - root - INFO - Step 14760: lr=1.00E-05, loss= 1.2135 (max= 2.7113), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:38,473 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.2613 (max= 2.8128), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:38,473 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.2613 (max= 2.8128), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:38,473 - root - INFO - Step 14770: lr=1.00E-05, 
loss= 1.2613 (max= 2.8128), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:38,473 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.2613 (max= 2.8128), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:38,473 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.2613 (max= 2.8128), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:38,473 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.2613 (max= 2.8128), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:38,474 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.2613 (max= 2.8128), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:38,474 - root - INFO - Step 14770: lr=1.00E-05, loss= 1.2613 (max= 2.8128), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:56,490 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.2160 (max= 2.2803), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:56,490 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.2160 (max= 2.2803), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:56,490 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.2160 (max= 2.2803), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:56,490 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.2160 (max= 2.2803), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:56,490 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.2160 (max= 2.2803), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:56,490 - 
root - INFO - Step 14780: lr=1.00E-05, loss= 1.2160 (max= 2.2803), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:56,490 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.2160 (max= 2.2803), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:52:56,491 - root - INFO - Step 14780: lr=1.00E-05, loss= 1.2160 (max= 2.2803), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:14,579 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.2434 (max= 2.7355), tps=18123, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:14,579 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.2434 (max= 2.7355), tps=18123, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:14,579 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.2434 (max= 2.7355), tps=18123, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:14,579 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.2434 (max= 2.7355), tps=18123, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:14,579 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.2434 (max= 2.7355), tps=18123, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:14,579 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.2434 (max= 2.7355), tps=18123, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:14,580 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.2434 (max= 2.7355), tps=18124, mfu=37.76%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:14,583 - root - INFO - Step 14790: lr=1.00E-05, loss= 1.2434 (max= 2.7355), tps=18118, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:53:32,622 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.2471 (max= 2.7291), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:32,622 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.2471 (max= 2.7291), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:32,622 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.2471 (max= 2.7291), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:32,622 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.2471 (max= 2.7291), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:32,622 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.2471 (max= 2.7291), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:32,622 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.2471 (max= 2.7291), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:32,622 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.2471 (max= 2.7291), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:32,622 - root - INFO - Step 14800: lr=1.00E-05, loss= 1.2471 (max= 2.7291), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:50,635 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.2319 (max= 2.8273), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:50,635 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.2319 (max= 2.8273), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:50,635 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.2319 (max= 2.8273), tps=18195, mfu=37.91%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:50,635 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.2319 (max= 2.8273), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:50,635 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.2319 (max= 2.8273), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:50,635 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.2319 (max= 2.8273), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:50,636 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.2319 (max= 2.8273), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:53:50,636 - root - INFO - Step 14810: lr=1.00E-05, loss= 1.2319 (max= 2.8273), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:08,690 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.2164 (max= 2.3689), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:08,690 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.2164 (max= 2.3689), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:08,690 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.2164 (max= 2.3689), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:08,690 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.2164 (max= 2.3689), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:08,690 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.2164 (max= 2.3689), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:08,690 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.2164 (max= 
2.3689), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:08,690 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.2164 (max= 2.3689), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:08,691 - root - INFO - Step 14820: lr=1.00E-05, loss= 1.2164 (max= 2.3689), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:26,717 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.2106 (max= 3.3435), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 16:54:26,718 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.2106 (max= 3.3435), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 16:54:26,718 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.2106 (max= 3.3435), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 16:54:26,718 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.2106 (max= 3.3435), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 16:54:26,718 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.2106 (max= 3.3435), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 16:54:26,718 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.2106 (max= 3.3435), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 16:54:26,718 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.2106 (max= 3.3435), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 16:54:26,719 - root - INFO - Step 14830: lr=1.00E-05, loss= 1.2106 (max= 3.3435), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.04%) +2025-10-24 16:54:44,755 - root - INFO - Step 
14840: lr=1.00E-05, loss= 1.2546 (max= 2.2369), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:44,755 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.2546 (max= 2.2369), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:44,755 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.2546 (max= 2.2369), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:44,755 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.2546 (max= 2.2369), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:44,755 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.2546 (max= 2.2369), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:44,755 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.2546 (max= 2.2369), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:44,755 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.2546 (max= 2.2369), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:54:44,755 - root - INFO - Step 14840: lr=1.00E-05, loss= 1.2546 (max= 2.2369), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:02,767 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.2171 (max= 2.1123), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:02,768 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.2171 (max= 2.1123), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:02,768 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.2171 (max= 2.1123), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:55:02,768 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.2171 (max= 2.1123), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:02,768 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.2171 (max= 2.1123), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:02,768 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.2171 (max= 2.1123), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:02,768 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.2171 (max= 2.1123), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:02,768 - root - INFO - Step 14850: lr=1.00E-05, loss= 1.2171 (max= 2.1123), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:20,810 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.2125 (max= 2.3867), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:20,810 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.2125 (max= 2.3867), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:20,810 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.2125 (max= 2.3867), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:20,810 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.2125 (max= 2.3867), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:20,810 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.2125 (max= 2.3867), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:20,810 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.2125 (max= 2.3867), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:20,810 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.2125 (max= 2.3867), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:20,810 - root - INFO - Step 14860: lr=1.00E-05, loss= 1.2125 (max= 2.3867), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:38,831 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.2226 (max= 2.5116), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:38,831 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.2226 (max= 2.5116), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:38,831 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.2226 (max= 2.5116), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:38,831 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.2226 (max= 2.5116), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:38,831 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.2226 (max= 2.5116), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:38,831 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.2226 (max= 2.5116), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:38,832 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.2226 (max= 2.5116), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:38,832 - root - INFO - Step 14870: lr=1.00E-05, loss= 1.2226 (max= 2.5116), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:56,880 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.2153 (max= 2.2321), tps=18158, 
mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:56,880 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.2153 (max= 2.2321), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:56,881 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.2153 (max= 2.2321), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:56,881 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.2153 (max= 2.2321), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:56,881 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.2153 (max= 2.2321), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:56,881 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.2153 (max= 2.2321), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:56,881 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.2153 (max= 2.2321), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:55:56,881 - root - INFO - Step 14880: lr=1.00E-05, loss= 1.2153 (max= 2.2321), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:14,921 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.2456 (max= 2.6936), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:14,922 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.2456 (max= 2.6936), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:14,922 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.2456 (max= 2.6936), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:14,922 - root - INFO - Step 14890: lr=1.00E-05, 
loss= 1.2456 (max= 2.6936), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:14,922 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.2456 (max= 2.6936), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:14,922 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.2456 (max= 2.6936), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:14,922 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.2456 (max= 2.6936), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:14,922 - root - INFO - Step 14890: lr=1.00E-05, loss= 1.2456 (max= 2.6936), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:20,079 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:909234 +2025-10-24 16:56:32,931 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.2168 (max= 2.3808), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:32,931 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.2168 (max= 2.3808), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:32,931 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.2168 (max= 2.3808), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:32,931 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.2168 (max= 2.3808), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:32,931 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.2168 (max= 2.3808), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:32,931 - 
root - INFO - Step 14900: lr=1.00E-05, loss= 1.2168 (max= 2.3808), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:32,931 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.2168 (max= 2.3808), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:32,931 - root - INFO - Step 14900: lr=1.00E-05, loss= 1.2168 (max= 2.3808), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:50,946 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1968 (max= 2.5941), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:50,946 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1968 (max= 2.5941), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:50,946 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1968 (max= 2.5941), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:50,946 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1968 (max= 2.5941), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:50,946 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1968 (max= 2.5941), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:50,946 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1968 (max= 2.5941), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:50,946 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1968 (max= 2.5941), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:56:50,946 - root - INFO - Step 14910: lr=1.00E-05, loss= 1.1968 (max= 2.5941), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 16:57:08,986 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.2080 (max= 2.4047), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:57:08,986 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.2080 (max= 2.4047), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:57:08,986 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.2080 (max= 2.4047), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:57:08,986 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.2080 (max= 2.4047), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:57:08,986 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.2080 (max= 2.4047), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:57:08,986 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.2080 (max= 2.4047), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:57:08,986 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.2080 (max= 2.4047), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:57:08,987 - root - INFO - Step 14920: lr=1.00E-05, loss= 1.2080 (max= 2.4047), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 16:57:26,993 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.2358 (max= 2.5365), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:26,993 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.2358 (max= 2.5365), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:26,993 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.2358 (max= 2.5365), tps=18201, mfu=37.92%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:26,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.2358 (max= 2.5365), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:26,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.2358 (max= 2.5365), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:26,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.2358 (max= 2.5365), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:26,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.2358 (max= 2.5365), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:26,994 - root - INFO - Step 14930: lr=1.00E-05, loss= 1.2358 (max= 2.5365), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:45,058 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.2305 (max= 2.8119), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:45,058 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.2305 (max= 2.8119), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:45,058 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.2305 (max= 2.8119), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:45,058 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.2305 (max= 2.8119), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:45,058 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.2305 (max= 2.8119), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:45,058 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.2305 (max= 
2.8119), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:45,058 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.2305 (max= 2.8119), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:57:45,058 - root - INFO - Step 14940: lr=1.00E-05, loss= 1.2305 (max= 2.8119), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:03,117 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.2145 (max= 2.2992), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:03,117 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.2145 (max= 2.2992), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:03,117 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.2145 (max= 2.2992), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:03,118 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.2145 (max= 2.2992), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:03,118 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.2145 (max= 2.2992), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:03,118 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.2145 (max= 2.2992), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:03,118 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.2145 (max= 2.2992), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:03,118 - root - INFO - Step 14950: lr=1.00E-05, loss= 1.2145 (max= 2.2992), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:21,150 - root - INFO - Step 
14960: lr=1.00E-05, loss= 1.2339 (max= 2.2638), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:21,150 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.2339 (max= 2.2638), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:21,150 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.2339 (max= 2.2638), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:21,150 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.2339 (max= 2.2638), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:21,150 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.2339 (max= 2.2638), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:21,150 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.2339 (max= 2.2638), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:21,150 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.2339 (max= 2.2638), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:21,151 - root - INFO - Step 14960: lr=1.00E-05, loss= 1.2339 (max= 2.2638), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:39,172 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.2136 (max= 2.8401), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:39,172 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.2136 (max= 2.8401), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:39,172 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.2136 (max= 2.8401), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 16:58:39,172 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.2136 (max= 2.8401), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:39,172 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.2136 (max= 2.8401), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:39,172 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.2136 (max= 2.8401), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:39,172 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.2136 (max= 2.8401), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:39,172 - root - INFO - Step 14970: lr=1.00E-05, loss= 1.2136 (max= 2.8401), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:57,193 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.1899 (max= 2.2451), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:57,193 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.1899 (max= 2.2451), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:57,193 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.1899 (max= 2.2451), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:57,193 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.1899 (max= 2.2451), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:57,193 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.1899 (max= 2.2451), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:57,193 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.1899 (max= 2.2451), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:57,193 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.1899 (max= 2.2451), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:58:57,194 - root - INFO - Step 14980: lr=1.00E-05, loss= 1.1899 (max= 2.2451), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:15,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.2364 (max= 2.7911), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:15,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.2364 (max= 2.7911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:15,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.2364 (max= 2.7911), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:15,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.2364 (max= 2.7911), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:15,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.2364 (max= 2.7911), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:15,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.2364 (max= 2.7911), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:15,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.2364 (max= 2.7911), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:15,212 - root - INFO - Step 14990: lr=1.00E-05, loss= 1.2364 (max= 2.7911), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-15000 +2025-10-24 16:59:33,223 - 
root - INFO - Step 15000: lr=1.00E-05, loss= 1.2128 (max= 2.3755), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:33,223 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-24 16:59:33,223 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.2128 (max= 2.3755), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:33,223 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:59:33,223 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-24 16:59:33,223 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:59:33,223 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.2128 (max= 2.3755), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:33,223 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.2128 (max= 2.3755), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:33,223 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-24 16:59:33,223 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-24 16:59:33,223 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:59:33,223 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:59:33,223 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.2128 (max= 2.3755), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:33,223 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.2128 (max= 2.3755), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:33,223 - 
root - INFO - Step 15000: lr=1.00E-05, loss= 1.2128 (max= 2.3755), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:33,223 - root - INFO - Step 15000: lr=1.00E-05, loss= 1.2128 (max= 2.3755), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 16:59:33,223 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-24 16:59:33,223 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-24 16:59:33,223 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:59:33,223 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:59:33,223 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-24 16:59:33,223 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 16:59:33,223 - root - INFO - Saving a full checkpoint at step 15000 +2025-10-24 16:59:33,223 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-15000! 
Save time: 4.570747375488281 +2025-10-24 16:59:47,755 - root - INFO - Finished saving the checkpoint in 14.53 seconds +2025-10-24 16:59:47,760 - root - INFO - Finished saving the checkpoint in 14.54 seconds +2025-10-24 16:59:47,761 - root - INFO - Finished saving the checkpoint in 14.54 seconds +2025-10-24 16:59:47,761 - root - INFO - Finished saving the checkpoint in 14.54 seconds +2025-10-24 16:59:47,761 - root - INFO - Finished saving the checkpoint in 14.54 seconds +2025-10-24 16:59:47,761 - root - INFO - Finished saving the checkpoint in 14.54 seconds +2025-10-24 16:59:47,762 - root - INFO - Finished saving the checkpoint in 14.54 seconds +2025-10-24 16:59:47,762 - root - INFO - Finished saving the checkpoint in 14.54 seconds +2025-10-24 17:00:05,730 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.2299 (max= 2.1990), tps=10081, mfu=21.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:00:05,730 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.2299 (max= 2.1990), tps=10081, mfu=21.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:00:05,730 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.2299 (max= 2.1990), tps=10081, mfu=21.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:00:05,730 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.2299 (max= 2.1990), tps=10081, mfu=21.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:00:05,730 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.2299 (max= 2.1990), tps=10081, mfu=21.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:00:05,730 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.2299 (max= 2.1990), tps=10081, mfu=21.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:00:05,730 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.2299 (max= 2.1990), tps=10081, mfu=21.00%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:00:05,730 - root - INFO - Step 15010: lr=1.00E-05, loss= 1.2299 (max= 2.1990), tps=10081, mfu=21.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:00:23,748 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1836 (max= 2.2243), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:23,748 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1836 (max= 2.2243), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:23,748 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1836 (max= 2.2243), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:23,748 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1836 (max= 2.2243), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:23,748 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1836 (max= 2.2243), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:23,748 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1836 (max= 2.2243), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:23,749 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1836 (max= 2.2243), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:23,749 - root - INFO - Step 15020: lr=1.00E-05, loss= 1.1836 (max= 2.2243), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:41,764 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.2094 (max= 2.5446), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:00:41,764 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.2094 (max= 2.5446), tps=18191, 
mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:00:41,765 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.2094 (max= 2.5446), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:00:41,765 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.2094 (max= 2.5446), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:00:41,765 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.2094 (max= 2.5446), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:00:41,765 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.2094 (max= 2.5446), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:00:41,765 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.2094 (max= 2.5446), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:00:41,765 - root - INFO - Step 15030: lr=1.00E-05, loss= 1.2094 (max= 2.5446), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:00:59,791 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.2238 (max= 2.3365), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:59,791 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.2238 (max= 2.3365), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:59,791 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.2238 (max= 2.3365), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:59,791 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.2238 (max= 2.3365), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:59,791 - root - INFO - Step 15040: lr=1.00E-05, 
loss= 1.2238 (max= 2.3365), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:59,791 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.2238 (max= 2.3365), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:59,791 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.2238 (max= 2.3365), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:00:59,791 - root - INFO - Step 15040: lr=1.00E-05, loss= 1.2238 (max= 2.3365), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:01:17,848 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.2181 (max= 2.6414), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:17,848 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.2181 (max= 2.6414), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:17,848 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.2181 (max= 2.6414), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:17,848 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.2181 (max= 2.6414), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:17,848 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.2181 (max= 2.6414), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:17,849 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.2181 (max= 2.6414), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:17,849 - root - INFO - Step 15050: lr=1.00E-05, loss= 1.2181 (max= 2.6414), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:17,849 - 
root - INFO - Step 15050: lr=1.00E-05, loss= 1.2181 (max= 2.6414), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:35,877 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.2151 (max= 2.1651), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:35,877 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.2151 (max= 2.1651), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:35,877 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.2151 (max= 2.1651), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:35,877 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.2151 (max= 2.1651), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:35,877 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.2151 (max= 2.1651), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:35,877 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.2151 (max= 2.1651), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:35,877 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.2151 (max= 2.1651), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:35,877 - root - INFO - Step 15060: lr=1.00E-05, loss= 1.2151 (max= 2.1651), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:53,936 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.2340 (max= 2.2105), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:53,936 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.2340 (max= 2.2105), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 17:01:53,936 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.2340 (max= 2.2105), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:53,937 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.2340 (max= 2.2105), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:53,937 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.2340 (max= 2.2105), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:53,937 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.2340 (max= 2.2105), tps=18150, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:53,937 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.2340 (max= 2.2105), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:01:53,938 - root - INFO - Step 15070: lr=1.00E-05, loss= 1.2340 (max= 2.2105), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:11,953 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.2213 (max= 2.0941), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:11,954 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.2213 (max= 2.0941), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:11,954 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.2213 (max= 2.0941), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:11,954 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.2213 (max= 2.0941), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:11,954 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.2213 (max= 2.0941), tps=18192, mfu=37.90%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:11,954 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.2213 (max= 2.0941), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:11,954 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.2213 (max= 2.0941), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:11,954 - root - INFO - Step 15080: lr=1.00E-05, loss= 1.2213 (max= 2.0941), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:29,975 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.2124 (max= 2.0226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:29,975 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.2124 (max= 2.0226), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:29,975 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.2124 (max= 2.0226), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:29,975 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.2124 (max= 2.0226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:29,975 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.2124 (max= 2.0226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:29,975 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.2124 (max= 2.0226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:29,975 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.2124 (max= 2.0226), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:29,976 - root - INFO - Step 15090: lr=1.00E-05, loss= 1.2124 (max= 
2.0226), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:48,015 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.2264 (max= 2.8254), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:48,015 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.2264 (max= 2.8254), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:48,015 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.2264 (max= 2.8254), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:48,015 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.2264 (max= 2.8254), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:48,015 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.2264 (max= 2.8254), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:48,015 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.2264 (max= 2.8254), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:48,016 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.2264 (max= 2.8254), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:02:48,016 - root - INFO - Step 15100: lr=1.00E-05, loss= 1.2264 (max= 2.8254), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:03:06,043 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.2113 (max= 2.1340), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:03:06,043 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.2113 (max= 2.1340), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:03:06,043 - root - INFO - Step 
15110: lr=1.00E-05, loss= 1.2113 (max= 2.1340), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:03:06,043 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.2113 (max= 2.1340), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:03:06,043 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.2113 (max= 2.1340), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:03:06,043 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.2113 (max= 2.1340), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:03:06,043 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.2113 (max= 2.1340), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:03:06,043 - root - INFO - Step 15110: lr=1.00E-05, loss= 1.2113 (max= 2.1340), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:03:24,036 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.2060 (max= 2.1850), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:24,036 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.2060 (max= 2.1850), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:24,037 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.2060 (max= 2.1850), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:24,037 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.2060 (max= 2.1850), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:24,037 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.2060 (max= 2.1850), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 17:03:24,037 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.2060 (max= 2.1850), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:24,037 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.2060 (max= 2.1850), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:24,037 - root - INFO - Step 15120: lr=1.00E-05, loss= 1.2060 (max= 2.1850), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:42,085 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1933 (max= 2.2967), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:42,085 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1933 (max= 2.2967), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:42,085 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1933 (max= 2.2967), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:42,086 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1933 (max= 2.2967), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:42,086 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1933 (max= 2.2967), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:42,086 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1933 (max= 2.2967), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:42,086 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1933 (max= 2.2967), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:03:42,086 - root - INFO - Step 15130: lr=1.00E-05, loss= 1.1933 (max= 2.2967), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:00,106 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.2284 (max= 2.2091), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:00,106 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.2284 (max= 2.2091), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:00,106 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.2284 (max= 2.2091), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:00,106 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.2284 (max= 2.2091), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:00,106 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.2284 (max= 2.2091), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:00,106 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.2284 (max= 2.2091), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:00,106 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.2284 (max= 2.2091), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:00,106 - root - INFO - Step 15140: lr=1.00E-05, loss= 1.2284 (max= 2.2091), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:18,129 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.2275 (max= 2.4667), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:18,129 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.2275 (max= 2.4667), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:18,130 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.2275 (max= 2.4667), tps=18184, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:18,130 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.2275 (max= 2.4667), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:18,130 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.2275 (max= 2.4667), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:18,130 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.2275 (max= 2.4667), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:18,130 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.2275 (max= 2.4667), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:18,130 - root - INFO - Step 15150: lr=1.00E-05, loss= 1.2275 (max= 2.4667), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:36,149 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.2000 (max= 2.0606), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:04:36,149 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.2000 (max= 2.0606), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:04:36,149 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.2000 (max= 2.0606), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:04:36,149 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.2000 (max= 2.0606), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:04:36,149 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.2000 (max= 2.0606), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:04:36,149 - root - INFO - Step 15160: lr=1.00E-05, 
loss= 1.2000 (max= 2.0606), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:04:36,149 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.2000 (max= 2.0606), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:04:36,150 - root - INFO - Step 15160: lr=1.00E-05, loss= 1.2000 (max= 2.0606), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:04:54,204 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1824 (max= 2.2120), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:54,204 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1824 (max= 2.2120), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:54,204 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1824 (max= 2.2120), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:54,204 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1824 (max= 2.2120), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:54,204 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1824 (max= 2.2120), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:54,204 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1824 (max= 2.2120), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:54,204 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1824 (max= 2.2120), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:04:54,204 - root - INFO - Step 15170: lr=1.00E-05, loss= 1.1824 (max= 2.2120), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:05:12,229 - 
root - INFO - Step 15180: lr=1.00E-05, loss= 1.2334 (max= 2.8029), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:05:12,229 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.2334 (max= 2.8029), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:05:12,230 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.2334 (max= 2.8029), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:05:12,230 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.2334 (max= 2.8029), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:05:12,230 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.2334 (max= 2.8029), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:05:12,230 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.2334 (max= 2.8029), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:05:12,230 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.2334 (max= 2.8029), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:05:12,230 - root - INFO - Step 15180: lr=1.00E-05, loss= 1.2334 (max= 2.8029), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:05:30,254 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.2432 (max= 2.7753), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:30,254 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.2432 (max= 2.7753), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:30,255 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.2432 (max= 2.7753), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 17:05:30,255 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.2432 (max= 2.7753), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:30,255 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.2432 (max= 2.7753), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:30,255 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.2432 (max= 2.7753), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:30,255 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.2432 (max= 2.7753), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:30,255 - root - INFO - Step 15190: lr=1.00E-05, loss= 1.2432 (max= 2.7753), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:48,248 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.2521 (max= 2.3306), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:48,248 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.2521 (max= 2.3306), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:48,248 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.2521 (max= 2.3306), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:48,248 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.2521 (max= 2.3306), tps=18214, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:48,248 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.2521 (max= 2.3306), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:48,248 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.2521 (max= 2.3306), tps=18215, mfu=37.95%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:48,248 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.2521 (max= 2.3306), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:05:48,248 - root - INFO - Step 15200: lr=1.00E-05, loss= 1.2521 (max= 2.3306), tps=18215, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:06,271 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.2158 (max= 2.5398), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:06:06,271 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.2158 (max= 2.5398), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:06:06,271 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.2158 (max= 2.5398), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:06:06,271 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.2158 (max= 2.5398), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:06:06,271 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.2158 (max= 2.5398), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:06:06,271 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.2158 (max= 2.5398), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:06:06,271 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.2158 (max= 2.5398), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:06:06,271 - root - INFO - Step 15210: lr=1.00E-05, loss= 1.2158 (max= 2.5398), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:06:24,283 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.2006 (max= 
2.2014), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:24,283 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.2006 (max= 2.2014), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:24,283 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.2006 (max= 2.2014), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:24,283 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.2006 (max= 2.2014), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:24,283 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.2006 (max= 2.2014), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:24,283 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.2006 (max= 2.2014), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:24,283 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.2006 (max= 2.2014), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:24,283 - root - INFO - Step 15220: lr=1.00E-05, loss= 1.2006 (max= 2.2014), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:42,313 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.2039 (max= 2.3872), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:42,313 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.2039 (max= 2.3872), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:42,313 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.2039 (max= 2.3872), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:42,313 - root - INFO - Step 
15230: lr=1.00E-05, loss= 1.2039 (max= 2.3872), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:42,313 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.2039 (max= 2.3872), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:42,313 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.2039 (max= 2.3872), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:42,313 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.2039 (max= 2.3872), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:06:42,313 - root - INFO - Step 15230: lr=1.00E-05, loss= 1.2039 (max= 2.3872), tps=18177, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:00,331 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.2236 (max= 2.4162), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:00,331 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.2236 (max= 2.4162), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:00,331 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.2236 (max= 2.4162), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:00,331 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.2236 (max= 2.4162), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:00,331 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.2236 (max= 2.4162), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:00,331 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.2236 (max= 2.4162), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 17:07:00,332 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.2236 (max= 2.4162), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:00,332 - root - INFO - Step 15240: lr=1.00E-05, loss= 1.2236 (max= 2.4162), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:18,421 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.2052 (max= 2.3468), tps=18118, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:18,421 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.2052 (max= 2.3468), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:18,422 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.2052 (max= 2.3468), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:18,422 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.2052 (max= 2.3468), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:18,422 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.2052 (max= 2.3468), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:18,422 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.2052 (max= 2.3468), tps=18119, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:18,422 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.2052 (max= 2.3468), tps=18118, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:18,423 - root - INFO - Step 15250: lr=1.00E-05, loss= 1.2052 (max= 2.3468), tps=18117, mfu=37.75%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:36,476 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.2440 (max= 2.2794), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:36,476 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.2440 (max= 2.2794), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:36,476 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.2440 (max= 2.2794), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:36,476 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.2440 (max= 2.2794), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:36,476 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.2440 (max= 2.2794), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:36,476 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.2440 (max= 2.2794), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:36,476 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.2440 (max= 2.2794), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:36,477 - root - INFO - Step 15260: lr=1.00E-05, loss= 1.2440 (max= 2.2794), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:54,520 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.2005 (max= 2.3432), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:54,520 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.2005 (max= 2.3432), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:54,520 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.2005 (max= 2.3432), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:54,520 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.2005 (max= 2.3432), tps=18164, 
mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:54,520 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.2005 (max= 2.3432), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:54,520 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.2005 (max= 2.3432), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:54,520 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.2005 (max= 2.3432), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:07:54,521 - root - INFO - Step 15270: lr=1.00E-05, loss= 1.2005 (max= 2.3432), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:12,534 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.2065 (max= 2.7754), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:12,534 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.2065 (max= 2.7754), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:12,534 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.2065 (max= 2.7754), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:12,534 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.2065 (max= 2.7754), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:12,534 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.2065 (max= 2.7754), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:12,534 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.2065 (max= 2.7754), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:12,534 - root - INFO - Step 15280: lr=1.00E-05, 
loss= 1.2065 (max= 2.7754), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:12,536 - root - INFO - Step 15280: lr=1.00E-05, loss= 1.2065 (max= 2.7754), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:30,562 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.2336 (max= 2.4996), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:08:30,562 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.2336 (max= 2.4996), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:08:30,562 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.2336 (max= 2.4996), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:08:30,562 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.2336 (max= 2.4996), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:08:30,562 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.2336 (max= 2.4996), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:08:30,562 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.2336 (max= 2.4996), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:08:30,562 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.2336 (max= 2.4996), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:08:30,562 - root - INFO - Step 15290: lr=1.00E-05, loss= 1.2336 (max= 2.4996), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:08:48,619 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1907 (max= 2.3739), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:48,619 - 
root - INFO - Step 15300: lr=1.00E-05, loss= 1.1907 (max= 2.3739), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:48,619 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1907 (max= 2.3739), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:48,619 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1907 (max= 2.3739), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:48,619 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1907 (max= 2.3739), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:48,619 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1907 (max= 2.3739), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:48,620 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1907 (max= 2.3739), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:08:48,620 - root - INFO - Step 15300: lr=1.00E-05, loss= 1.1907 (max= 2.3739), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:09:06,637 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1720 (max= 2.5943), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:06,637 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1720 (max= 2.5943), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:06,637 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1720 (max= 2.5943), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:06,637 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1720 (max= 2.5943), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 17:09:06,637 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1720 (max= 2.5943), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:06,637 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1720 (max= 2.5943), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:06,637 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1720 (max= 2.5943), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:06,637 - root - INFO - Step 15310: lr=1.00E-05, loss= 1.1720 (max= 2.5943), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:24,696 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.2155 (max= 2.3843), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:24,697 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.2155 (max= 2.3843), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:24,697 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.2155 (max= 2.3843), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:24,697 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.2155 (max= 2.3843), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:24,697 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.2155 (max= 2.3843), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:24,697 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.2155 (max= 2.3843), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:24,697 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.2155 (max= 2.3843), tps=18147, mfu=37.81%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:24,697 - root - INFO - Step 15320: lr=1.00E-05, loss= 1.2155 (max= 2.3843), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:42,735 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.2142 (max= 2.1440), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:42,735 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.2142 (max= 2.1440), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:42,735 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.2142 (max= 2.1440), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:42,735 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.2142 (max= 2.1440), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:42,735 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.2142 (max= 2.1440), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:42,735 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.2142 (max= 2.1440), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:42,735 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.2142 (max= 2.1440), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:09:42,735 - root - INFO - Step 15330: lr=1.00E-05, loss= 1.2142 (max= 2.1440), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:00,737 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1966 (max= 2.3927), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:00,737 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1966 (max= 
2.3927), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:00,737 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1966 (max= 2.3927), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:00,737 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1966 (max= 2.3927), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:00,737 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1966 (max= 2.3927), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:00,737 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1966 (max= 2.3927), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:00,737 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1966 (max= 2.3927), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:00,737 - root - INFO - Step 15340: lr=1.00E-05, loss= 1.1966 (max= 2.3927), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:18,800 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.2350 (max= 3.5213), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:18,800 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.2350 (max= 3.5213), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:18,800 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.2350 (max= 3.5213), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:18,800 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.2350 (max= 3.5213), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:18,800 - root - INFO - Step 
15350: lr=1.00E-05, loss= 1.2350 (max= 3.5213), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:18,800 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.2350 (max= 3.5213), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:18,800 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.2350 (max= 3.5213), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:18,800 - root - INFO - Step 15350: lr=1.00E-05, loss= 1.2350 (max= 3.5213), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:36,821 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.2022 (max= 2.5076), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:36,822 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.2022 (max= 2.5076), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:36,822 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.2022 (max= 2.5076), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:36,822 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.2022 (max= 2.5076), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:36,822 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.2022 (max= 2.5076), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:36,822 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.2022 (max= 2.5076), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:36,822 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.2022 (max= 2.5076), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 17:10:36,822 - root - INFO - Step 15360: lr=1.00E-05, loss= 1.2022 (max= 2.5076), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:10:54,830 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1989 (max= 2.3859), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:54,831 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1989 (max= 2.3859), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:54,831 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1989 (max= 2.3859), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:54,831 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1989 (max= 2.3859), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:54,831 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1989 (max= 2.3859), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:54,831 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1989 (max= 2.3859), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:54,831 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1989 (max= 2.3859), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:10:54,831 - root - INFO - Step 15370: lr=1.00E-05, loss= 1.1989 (max= 2.3859), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:11:12,874 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.2409 (max= 2.5234), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:12,874 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.2409 (max= 2.5234), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:12,874 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.2409 (max= 2.5234), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:12,874 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.2409 (max= 2.5234), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:12,874 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.2409 (max= 2.5234), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:12,874 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.2409 (max= 2.5234), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:12,874 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.2409 (max= 2.5234), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:12,874 - root - INFO - Step 15380: lr=1.00E-05, loss= 1.2409 (max= 2.5234), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:30,873 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.2200 (max= 2.3987), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:30,874 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.2200 (max= 2.3987), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:30,874 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.2200 (max= 2.3987), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:30,874 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.2200 (max= 2.3987), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:30,874 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.2200 (max= 2.3987), tps=18208, 
mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:30,874 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.2200 (max= 2.3987), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:30,874 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.2200 (max= 2.3987), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:30,874 - root - INFO - Step 15390: lr=1.00E-05, loss= 1.2200 (max= 2.3987), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:48,927 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.2208 (max= 2.3579), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:48,927 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.2208 (max= 2.3579), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:48,927 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.2208 (max= 2.3579), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:48,927 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.2208 (max= 2.3579), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:48,927 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.2208 (max= 2.3579), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:48,927 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.2208 (max= 2.3579), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:48,927 - root - INFO - Step 15400: lr=1.00E-05, loss= 1.2208 (max= 2.3579), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:11:48,927 - root - INFO - Step 15400: lr=1.00E-05, 
loss= 1.2208 (max= 2.3579), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:12:06,965 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.2331 (max= 2.3534), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:06,966 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.2331 (max= 2.3534), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:06,966 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.2331 (max= 2.3534), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:06,966 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.2331 (max= 2.3534), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:06,966 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.2331 (max= 2.3534), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:06,966 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.2331 (max= 2.3534), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:06,966 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.2331 (max= 2.3534), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:06,966 - root - INFO - Step 15410: lr=1.00E-05, loss= 1.2331 (max= 2.3534), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:24,966 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.2502 (max= 2.2067), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:24,966 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.2502 (max= 2.2067), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:24,966 - 
root - INFO - Step 15420: lr=1.00E-05, loss= 1.2502 (max= 2.2067), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:24,966 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.2502 (max= 2.2067), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:24,966 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.2502 (max= 2.2067), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:24,966 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.2502 (max= 2.2067), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:24,966 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.2502 (max= 2.2067), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:24,966 - root - INFO - Step 15420: lr=1.00E-05, loss= 1.2502 (max= 2.2067), tps=18208, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:42,993 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.1913 (max= 2.5592), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:42,994 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.1913 (max= 2.5592), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:42,994 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.1913 (max= 2.5592), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:42,994 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.1913 (max= 2.5592), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:42,994 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.1913 (max= 2.5592), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 17:12:42,994 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.1913 (max= 2.5592), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:42,994 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.1913 (max= 2.5592), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:12:42,994 - root - INFO - Step 15430: lr=1.00E-05, loss= 1.1913 (max= 2.5592), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:01,015 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.1778 (max= 1.9958), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:01,016 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.1778 (max= 1.9958), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:01,016 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.1778 (max= 1.9958), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:01,016 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.1778 (max= 1.9958), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:01,016 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.1778 (max= 1.9958), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:01,016 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.1778 (max= 1.9958), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:01,016 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.1778 (max= 1.9958), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:01,016 - root - INFO - Step 15440: lr=1.00E-05, loss= 1.1778 (max= 1.9958), tps=18186, mfu=37.89%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:30,726 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.2115 (max= 2.5390), tps=11030, mfu=22.98%, memory: 78.54GiB(44.03%) time/data_loading=0.04s (max=0.30s, 40.23%) +2025-10-24 17:13:30,726 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.2115 (max= 2.5390), tps=11030, mfu=22.98%, memory: 78.54GiB(44.03%) time/data_loading=0.04s (max=0.30s, 40.23%) +2025-10-24 17:13:30,726 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.2115 (max= 2.5390), tps=11030, mfu=22.98%, memory: 78.54GiB(44.03%) time/data_loading=0.04s (max=0.30s, 40.23%) +2025-10-24 17:13:30,726 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.2115 (max= 2.5390), tps=11030, mfu=22.98%, memory: 78.54GiB(44.03%) time/data_loading=0.04s (max=0.30s, 40.23%) +2025-10-24 17:13:30,726 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.2115 (max= 2.5390), tps=11030, mfu=22.98%, memory: 78.54GiB(44.03%) time/data_loading=0.04s (max=0.30s, 40.23%) +2025-10-24 17:13:30,726 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.2115 (max= 2.5390), tps=11030, mfu=22.98%, memory: 78.54GiB(44.03%) time/data_loading=0.04s (max=0.30s, 40.23%) +2025-10-24 17:13:30,726 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.2115 (max= 2.5390), tps=11030, mfu=22.98%, memory: 78.54GiB(44.03%) time/data_loading=0.04s (max=0.30s, 40.23%) +2025-10-24 17:13:30,726 - root - INFO - Step 15450: lr=1.00E-05, loss= 1.2115 (max= 2.5390), tps=11030, mfu=22.98%, memory: 78.54GiB(44.03%) time/data_loading=0.04s (max=0.30s, 40.23%) +2025-10-24 17:13:48,697 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.2084 (max= 2.4492), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:48,697 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.2084 (max= 2.4492), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:48,697 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.2084 
(max= 2.4492), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:48,697 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.2084 (max= 2.4492), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:48,697 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.2084 (max= 2.4492), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:48,697 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.2084 (max= 2.4492), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:48,697 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.2084 (max= 2.4492), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:13:48,697 - root - INFO - Step 15460: lr=1.00E-05, loss= 1.2084 (max= 2.4492), tps=18237, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:14:06,744 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1794 (max= 2.5325), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:06,745 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1794 (max= 2.5325), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:06,745 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1794 (max= 2.5325), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:06,745 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1794 (max= 2.5325), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:06,745 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1794 (max= 2.5325), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:06,745 - root - INFO - 
Step 15470: lr=1.00E-05, loss= 1.1794 (max= 2.5325), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:06,745 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1794 (max= 2.5325), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:06,745 - root - INFO - Step 15470: lr=1.00E-05, loss= 1.1794 (max= 2.5325), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:24,787 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.2291 (max= 2.4599), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:24,787 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.2291 (max= 2.4599), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:24,787 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.2291 (max= 2.4599), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:24,787 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.2291 (max= 2.4599), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:24,787 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.2291 (max= 2.4599), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:24,787 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.2291 (max= 2.4599), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:24,787 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.2291 (max= 2.4599), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:14:24,787 - root - INFO - Step 15480: lr=1.00E-05, loss= 1.2291 (max= 2.4599), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 17:14:42,810 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.2004 (max= 2.5260), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:14:42,810 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.2004 (max= 2.5260), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:14:42,810 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.2004 (max= 2.5260), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:14:42,810 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.2004 (max= 2.5260), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:14:42,810 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.2004 (max= 2.5260), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:14:42,810 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.2004 (max= 2.5260), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:14:42,810 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.2004 (max= 2.5260), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:14:42,810 - root - INFO - Step 15490: lr=1.00E-05, loss= 1.2004 (max= 2.5260), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:15:00,850 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.2151 (max= 2.3795), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:00,851 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.2151 (max= 2.3795), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:00,851 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.2151 (max= 2.3795), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:00,851 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.2151 (max= 2.3795), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:00,851 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.2151 (max= 2.3795), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:00,851 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.2151 (max= 2.3795), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:00,851 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.2151 (max= 2.3795), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:00,851 - root - INFO - Step 15500: lr=1.00E-05, loss= 1.2151 (max= 2.3795), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:18,877 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.2180 (max= 2.5201), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:18,877 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.2180 (max= 2.5201), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:18,877 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.2180 (max= 2.5201), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:18,877 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.2180 (max= 2.5201), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:18,877 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.2180 (max= 2.5201), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:18,877 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.2180 (max= 2.5201), tps=18181, 
mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:18,877 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.2180 (max= 2.5201), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:18,877 - root - INFO - Step 15510: lr=1.00E-05, loss= 1.2180 (max= 2.5201), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:36,939 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.2039 (max= 2.4414), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:36,939 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.2039 (max= 2.4414), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:36,939 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.2039 (max= 2.4414), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:36,939 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.2039 (max= 2.4414), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:36,939 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.2039 (max= 2.4414), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:36,939 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.2039 (max= 2.4414), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:36,939 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.2039 (max= 2.4414), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:36,939 - root - INFO - Step 15520: lr=1.00E-05, loss= 1.2039 (max= 2.4414), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:54,986 - root - INFO - Step 15530: lr=1.00E-05, 
loss= 1.2040 (max= 2.6988), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:54,986 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.2040 (max= 2.6988), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:54,986 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.2040 (max= 2.6988), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:54,986 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.2040 (max= 2.6988), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:54,986 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.2040 (max= 2.6988), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:54,986 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.2040 (max= 2.6988), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:54,987 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.2040 (max= 2.6988), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:15:54,987 - root - INFO - Step 15530: lr=1.00E-05, loss= 1.2040 (max= 2.6988), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:13,025 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.2229 (max= 2.8106), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:13,025 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.2229 (max= 2.8106), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:13,025 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.2229 (max= 2.8106), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:13,026 - 
root - INFO - Step 15540: lr=1.00E-05, loss= 1.2229 (max= 2.8106), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:13,026 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.2229 (max= 2.8106), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:13,026 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.2229 (max= 2.8106), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:13,026 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.2229 (max= 2.8106), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:13,026 - root - INFO - Step 15540: lr=1.00E-05, loss= 1.2229 (max= 2.8106), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:31,050 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.2036 (max= 2.6541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:31,050 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.2036 (max= 2.6541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:31,050 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.2036 (max= 2.6541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:31,050 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.2036 (max= 2.6541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:31,050 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.2036 (max= 2.6541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:31,050 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.2036 (max= 2.6541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 17:16:31,050 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.2036 (max= 2.6541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:31,050 - root - INFO - Step 15550: lr=1.00E-05, loss= 1.2036 (max= 2.6541), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:49,069 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1856 (max= 2.7800), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:49,069 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1856 (max= 2.7800), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:49,069 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1856 (max= 2.7800), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:49,069 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1856 (max= 2.7800), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:49,069 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1856 (max= 2.7800), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:49,069 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1856 (max= 2.7800), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:49,069 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1856 (max= 2.7800), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:16:49,069 - root - INFO - Step 15560: lr=1.00E-05, loss= 1.1856 (max= 2.7800), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:07,118 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.2110 (max= 2.5376), tps=18159, mfu=37.83%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:07,118 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.2110 (max= 2.5376), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:07,118 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.2110 (max= 2.5376), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:07,118 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.2110 (max= 2.5376), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:07,118 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.2110 (max= 2.5376), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:07,118 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.2110 (max= 2.5376), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:07,118 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.2110 (max= 2.5376), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:07,118 - root - INFO - Step 15570: lr=1.00E-05, loss= 1.2110 (max= 2.5376), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:25,133 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.2252 (max= 2.1129), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:25,133 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.2252 (max= 2.1129), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:25,133 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.2252 (max= 2.1129), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:25,133 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.2252 (max= 
2.1129), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:25,133 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.2252 (max= 2.1129), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:25,133 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.2252 (max= 2.1129), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:25,133 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.2252 (max= 2.1129), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:25,133 - root - INFO - Step 15580: lr=1.00E-05, loss= 1.2252 (max= 2.1129), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:43,157 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.2094 (max= 2.2592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:43,157 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.2094 (max= 2.2592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:43,157 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.2094 (max= 2.2592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:43,157 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.2094 (max= 2.2592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:43,157 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.2094 (max= 2.2592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:43,157 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.2094 (max= 2.2592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:43,157 - root - INFO - Step 
15590: lr=1.00E-05, loss= 1.2094 (max= 2.2592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:17:43,157 - root - INFO - Step 15590: lr=1.00E-05, loss= 1.2094 (max= 2.2592), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:01,196 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1975 (max= 2.1977), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:01,196 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1975 (max= 2.1977), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:01,196 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1975 (max= 2.1977), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:01,196 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1975 (max= 2.1977), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:01,196 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1975 (max= 2.1977), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:01,196 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1975 (max= 2.1977), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:01,196 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1975 (max= 2.1977), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:01,196 - root - INFO - Step 15600: lr=1.00E-05, loss= 1.1975 (max= 2.1977), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:19,208 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.2185 (max= 2.1588), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 17:18:19,208 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.2185 (max= 2.1588), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:18:19,208 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.2185 (max= 2.1588), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:18:19,208 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.2185 (max= 2.1588), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:18:19,208 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.2185 (max= 2.1588), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:18:19,208 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.2185 (max= 2.1588), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:18:19,208 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.2185 (max= 2.1588), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:18:19,208 - root - INFO - Step 15610: lr=1.00E-05, loss= 1.2185 (max= 2.1588), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:18:37,229 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.2329 (max= 2.8291), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:37,229 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.2329 (max= 2.8291), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:37,230 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.2329 (max= 2.8291), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:37,230 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.2329 (max= 2.8291), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:37,230 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.2329 (max= 2.8291), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:37,230 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.2329 (max= 2.8291), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:37,230 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.2329 (max= 2.8291), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:37,230 - root - INFO - Step 15620: lr=1.00E-05, loss= 1.2329 (max= 2.8291), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:55,249 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1886 (max= 2.4009), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:55,249 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1886 (max= 2.4009), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:55,249 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1886 (max= 2.4009), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:55,249 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1886 (max= 2.4009), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:55,249 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1886 (max= 2.4009), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:55,249 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1886 (max= 2.4009), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:55,249 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1886 (max= 2.4009), tps=18188, 
mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:18:55,249 - root - INFO - Step 15630: lr=1.00E-05, loss= 1.1886 (max= 2.4009), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:13,291 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.2556 (max= 2.3173), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:13,291 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.2556 (max= 2.3173), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:13,291 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.2556 (max= 2.3173), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:13,291 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.2556 (max= 2.3173), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:13,291 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.2556 (max= 2.3173), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:13,291 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.2556 (max= 2.3173), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:13,291 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.2556 (max= 2.3173), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:13,291 - root - INFO - Step 15640: lr=1.00E-05, loss= 1.2556 (max= 2.3173), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:31,308 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1914 (max= 2.5930), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:31,308 - root - INFO - Step 15650: lr=1.00E-05, 
loss= 1.1914 (max= 2.5930), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:31,308 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1914 (max= 2.5930), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:31,308 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1914 (max= 2.5930), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:31,308 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1914 (max= 2.5930), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:31,308 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1914 (max= 2.5930), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:31,308 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1914 (max= 2.5930), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:31,308 - root - INFO - Step 15650: lr=1.00E-05, loss= 1.1914 (max= 2.5930), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:49,348 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.2121 (max= 2.3127), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:49,348 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.2121 (max= 2.3127), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:49,348 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.2121 (max= 2.3127), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:49,348 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.2121 (max= 2.3127), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:49,348 - 
root - INFO - Step 15660: lr=1.00E-05, loss= 1.2121 (max= 2.3127), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:49,348 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.2121 (max= 2.3127), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:49,348 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.2121 (max= 2.3127), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:19:49,348 - root - INFO - Step 15660: lr=1.00E-05, loss= 1.2121 (max= 2.3127), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:07,371 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.2346 (max= 2.2492), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:07,371 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.2346 (max= 2.2492), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:07,372 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.2346 (max= 2.2492), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:07,372 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.2346 (max= 2.2492), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:07,372 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.2346 (max= 2.2492), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:07,372 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.2346 (max= 2.2492), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:07,372 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.2346 (max= 2.2492), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 17:20:07,372 - root - INFO - Step 15670: lr=1.00E-05, loss= 1.2346 (max= 2.2492), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:25,392 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.2151 (max= 2.8280), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:20:25,392 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.2151 (max= 2.8280), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:20:25,393 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.2151 (max= 2.8280), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:20:25,393 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.2151 (max= 2.8280), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:20:25,393 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.2151 (max= 2.8280), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:20:25,393 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.2151 (max= 2.8280), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:20:25,393 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.2151 (max= 2.8280), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:20:25,393 - root - INFO - Step 15680: lr=1.00E-05, loss= 1.2151 (max= 2.8280), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:20:43,437 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.2356 (max= 2.0521), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:43,437 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.2356 (max= 2.0521), tps=18163, mfu=37.84%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:43,437 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.2356 (max= 2.0521), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:43,437 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.2356 (max= 2.0521), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:43,437 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.2356 (max= 2.0521), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:43,437 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.2356 (max= 2.0521), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:43,437 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.2356 (max= 2.0521), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:20:43,437 - root - INFO - Step 15690: lr=1.00E-05, loss= 1.2356 (max= 2.0521), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:01,479 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.2245 (max= 2.3057), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:01,480 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.2245 (max= 2.3057), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:01,480 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.2245 (max= 2.3057), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:01,480 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.2245 (max= 2.3057), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:01,480 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.2245 (max= 
2.3057), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:01,480 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.2245 (max= 2.3057), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:01,480 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.2245 (max= 2.3057), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:01,480 - root - INFO - Step 15700: lr=1.00E-05, loss= 1.2245 (max= 2.3057), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:19,550 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.1909 (max= 2.1846), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:19,550 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.1909 (max= 2.1846), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:19,550 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.1909 (max= 2.1846), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:19,550 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.1909 (max= 2.1846), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:19,550 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.1909 (max= 2.1846), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:19,550 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.1909 (max= 2.1846), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:19,550 - root - INFO - Step 15710: lr=1.00E-05, loss= 1.1909 (max= 2.1846), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:19,550 - root - INFO - Step 
15710: lr=1.00E-05, loss= 1.1909 (max= 2.1846), tps=18137, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:37,590 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.1830 (max= 2.0794), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:37,590 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.1830 (max= 2.0794), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:37,590 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.1830 (max= 2.0794), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:37,590 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.1830 (max= 2.0794), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:37,590 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.1830 (max= 2.0794), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:37,590 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.1830 (max= 2.0794), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:37,590 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.1830 (max= 2.0794), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:37,590 - root - INFO - Step 15720: lr=1.00E-05, loss= 1.1830 (max= 2.0794), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:55,617 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.2455 (max= 2.6288), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:55,617 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.2455 (max= 2.6288), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 17:21:55,617 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.2455 (max= 2.6288), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:55,617 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.2455 (max= 2.6288), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:55,617 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.2455 (max= 2.6288), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:55,617 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.2455 (max= 2.6288), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:55,617 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.2455 (max= 2.6288), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:21:55,617 - root - INFO - Step 15730: lr=1.00E-05, loss= 1.2455 (max= 2.6288), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:13,660 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.2040 (max= 2.3261), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:13,660 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.2040 (max= 2.3261), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:13,660 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.2040 (max= 2.3261), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:13,660 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.2040 (max= 2.3261), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:13,660 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.2040 (max= 2.3261), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:13,660 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.2040 (max= 2.3261), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:13,660 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.2040 (max= 2.3261), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:13,660 - root - INFO - Step 15740: lr=1.00E-05, loss= 1.2040 (max= 2.3261), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:31,714 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.2009 (max= 2.3857), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:31,714 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.2009 (max= 2.3857), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:31,714 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.2009 (max= 2.3857), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:31,714 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.2009 (max= 2.3857), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:31,714 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.2009 (max= 2.3857), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:31,714 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.2009 (max= 2.3857), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:31,714 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.2009 (max= 2.3857), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:31,714 - root - INFO - Step 15750: lr=1.00E-05, loss= 1.2009 (max= 2.3857), tps=18154, 
mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:22:49,730 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.1906 (max= 2.1847), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:22:49,730 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.1906 (max= 2.1847), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:22:49,730 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.1906 (max= 2.1847), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:22:49,730 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.1906 (max= 2.1847), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:22:49,730 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.1906 (max= 2.1847), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:22:49,730 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.1906 (max= 2.1847), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:22:49,730 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.1906 (max= 2.1847), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:22:49,730 - root - INFO - Step 15760: lr=1.00E-05, loss= 1.1906 (max= 2.1847), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:23:07,736 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.1881 (max= 2.6058), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:07,736 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.1881 (max= 2.6058), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:07,736 - root - INFO - Step 15770: lr=1.00E-05, 
loss= 1.1881 (max= 2.6058), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:07,736 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.1881 (max= 2.6058), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:07,736 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.1881 (max= 2.6058), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:07,736 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.1881 (max= 2.6058), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:07,736 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.1881 (max= 2.6058), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:07,736 - root - INFO - Step 15770: lr=1.00E-05, loss= 1.1881 (max= 2.6058), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:25,753 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.2263 (max= 2.0705), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:25,753 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.2263 (max= 2.0705), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:25,753 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.2263 (max= 2.0705), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:25,753 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.2263 (max= 2.0705), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:25,753 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.2263 (max= 2.0705), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:25,753 - 
root - INFO - Step 15780: lr=1.00E-05, loss= 1.2263 (max= 2.0705), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:25,753 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.2263 (max= 2.0705), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:25,753 - root - INFO - Step 15780: lr=1.00E-05, loss= 1.2263 (max= 2.0705), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:43,829 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.1871 (max= 2.1674), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:43,829 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.1871 (max= 2.1674), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:43,829 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.1871 (max= 2.1674), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:43,829 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.1871 (max= 2.1674), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:43,829 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.1871 (max= 2.1674), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:43,829 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.1871 (max= 2.1674), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:43,829 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.1871 (max= 2.1674), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:23:43,829 - root - INFO - Step 15790: lr=1.00E-05, loss= 1.1871 (max= 2.1674), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 17:24:01,821 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.2148 (max= 2.3505), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:01,821 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.2148 (max= 2.3505), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:01,821 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.2148 (max= 2.3505), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:01,821 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.2148 (max= 2.3505), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:01,822 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.2148 (max= 2.3505), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:01,822 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.2148 (max= 2.3505), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:01,822 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.2148 (max= 2.3505), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:01,822 - root - INFO - Step 15800: lr=1.00E-05, loss= 1.2148 (max= 2.3505), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:19,879 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.2290 (max= 2.1348), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:19,879 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.2290 (max= 2.1348), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:19,879 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.2290 (max= 2.1348), tps=18150, mfu=37.82%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:19,879 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.2290 (max= 2.1348), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:19,879 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.2290 (max= 2.1348), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:19,879 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.2290 (max= 2.1348), tps=18150, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:19,879 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.2290 (max= 2.1348), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:19,879 - root - INFO - Step 15810: lr=1.00E-05, loss= 1.2290 (max= 2.1348), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:37,925 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.2176 (max= 2.2094), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:37,925 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.2176 (max= 2.2094), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:37,925 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.2176 (max= 2.2094), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:37,925 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.2176 (max= 2.2094), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:37,925 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.2176 (max= 2.2094), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:37,925 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.2176 (max= 
2.2094), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:37,925 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.2176 (max= 2.2094), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:37,925 - root - INFO - Step 15820: lr=1.00E-05, loss= 1.2176 (max= 2.2094), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:55,945 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.2271 (max= 2.2785), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:55,946 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.2271 (max= 2.2785), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:55,946 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.2271 (max= 2.2785), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:55,946 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.2271 (max= 2.2785), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:55,946 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.2271 (max= 2.2785), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:55,946 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.2271 (max= 2.2785), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:55,946 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.2271 (max= 2.2785), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:24:55,946 - root - INFO - Step 15830: lr=1.00E-05, loss= 1.2271 (max= 2.2785), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:25:13,970 - root - INFO - Step 
15840: lr=1.00E-05, loss= 1.2414 (max= 2.8833), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:25:13,970 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.2414 (max= 2.8833), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:25:13,970 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.2414 (max= 2.8833), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:25:13,970 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.2414 (max= 2.8833), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:25:13,970 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.2414 (max= 2.8833), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:25:13,970 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.2414 (max= 2.8833), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:25:13,970 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.2414 (max= 2.8833), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:25:13,970 - root - INFO - Step 15840: lr=1.00E-05, loss= 1.2414 (max= 2.8833), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:25:31,993 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.2381 (max= 2.3170), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:31,993 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.2381 (max= 2.3170), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:31,993 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.2381 (max= 2.3170), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 17:25:31,993 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.2381 (max= 2.3170), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:31,993 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.2381 (max= 2.3170), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:31,993 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.2381 (max= 2.3170), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:31,993 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.2381 (max= 2.3170), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:31,993 - root - INFO - Step 15850: lr=1.00E-05, loss= 1.2381 (max= 2.3170), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:50,010 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.2132 (max= 2.3073), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:50,010 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.2132 (max= 2.3073), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:50,010 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.2132 (max= 2.3073), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:50,010 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.2132 (max= 2.3073), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:50,010 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.2132 (max= 2.3073), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:50,011 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.2132 (max= 2.3073), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:50,011 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.2132 (max= 2.3073), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:25:50,011 - root - INFO - Step 15860: lr=1.00E-05, loss= 1.2132 (max= 2.3073), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:26:08,044 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1999 (max= 2.3691), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:08,044 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1999 (max= 2.3691), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:08,044 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1999 (max= 2.3691), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:08,044 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1999 (max= 2.3691), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:08,044 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1999 (max= 2.3691), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:08,044 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1999 (max= 2.3691), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:08,044 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1999 (max= 2.3691), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:08,044 - root - INFO - Step 15870: lr=1.00E-05, loss= 1.1999 (max= 2.3691), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:26,089 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.2003 (max= 2.8689), tps=18162, 
mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:26,089 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.2003 (max= 2.8689), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:26,089 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.2003 (max= 2.8689), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:26,089 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.2003 (max= 2.8689), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:26,089 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.2003 (max= 2.8689), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:26,089 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.2003 (max= 2.8689), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:26,089 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.2003 (max= 2.8689), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:26,089 - root - INFO - Step 15880: lr=1.00E-05, loss= 1.2003 (max= 2.8689), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:44,113 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.2409 (max= 2.8036), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:44,113 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.2409 (max= 2.8036), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:44,114 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.2409 (max= 2.8036), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:44,114 - root - INFO - Step 15890: lr=1.00E-05, 
loss= 1.2409 (max= 2.8036), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:44,114 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.2409 (max= 2.8036), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:44,114 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.2409 (max= 2.8036), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:44,114 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.2409 (max= 2.8036), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:26:44,114 - root - INFO - Step 15890: lr=1.00E-05, loss= 1.2409 (max= 2.8036), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:02,105 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.2220 (max= 2.7977), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:02,105 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.2220 (max= 2.7977), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:02,105 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.2220 (max= 2.7977), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:02,105 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.2220 (max= 2.7977), tps=18217, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:02,105 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.2220 (max= 2.7977), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:02,105 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.2220 (max= 2.7977), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:02,105 - 
root - INFO - Step 15900: lr=1.00E-05, loss= 1.2220 (max= 2.7977), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:02,105 - root - INFO - Step 15900: lr=1.00E-05, loss= 1.2220 (max= 2.7977), tps=18216, mfu=37.95%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:20,113 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.2332 (max= 2.2308), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:20,113 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.2332 (max= 2.2308), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:20,113 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.2332 (max= 2.2308), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:20,113 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.2332 (max= 2.2308), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:20,113 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.2332 (max= 2.2308), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:20,113 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.2332 (max= 2.2308), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:20,114 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.2332 (max= 2.2308), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:20,114 - root - INFO - Step 15910: lr=1.00E-05, loss= 1.2332 (max= 2.2308), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:27:38,150 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.2254 (max= 2.3444), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 17:27:38,150 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.2254 (max= 2.3444), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:38,150 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.2254 (max= 2.3444), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:38,150 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.2254 (max= 2.3444), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:38,150 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.2254 (max= 2.3444), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:38,150 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.2254 (max= 2.3444), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:38,150 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.2254 (max= 2.3444), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:38,150 - root - INFO - Step 15920: lr=1.00E-05, loss= 1.2254 (max= 2.3444), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:56,213 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.2465 (max= 2.3238), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:56,213 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.2465 (max= 2.3238), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:56,213 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.2465 (max= 2.3238), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:56,213 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.2465 (max= 2.3238), tps=18145, mfu=37.80%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:56,213 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.2465 (max= 2.3238), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:56,213 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.2465 (max= 2.3238), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:56,213 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.2465 (max= 2.3238), tps=18145, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:27:56,213 - root - INFO - Step 15930: lr=1.00E-05, loss= 1.2465 (max= 2.3238), tps=18144, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:14,225 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.1897 (max= 2.3544), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:14,225 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.1897 (max= 2.3544), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:14,225 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.1897 (max= 2.3544), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:14,225 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.1897 (max= 2.3544), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:14,225 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.1897 (max= 2.3544), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:14,225 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.1897 (max= 2.3544), tps=18195, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:14,225 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.1897 (max= 
2.3544), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:14,225 - root - INFO - Step 15940: lr=1.00E-05, loss= 1.1897 (max= 2.3544), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:32,260 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.2005 (max= 2.1021), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:32,260 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.2005 (max= 2.1021), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:32,260 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.2005 (max= 2.1021), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:32,261 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.2005 (max= 2.1021), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:32,261 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.2005 (max= 2.1021), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:32,261 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.2005 (max= 2.1021), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:32,261 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.2005 (max= 2.1021), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:32,261 - root - INFO - Step 15950: lr=1.00E-05, loss= 1.2005 (max= 2.1021), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:50,316 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.2201 (max= 2.2208), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:50,316 - root - INFO - Step 
15960: lr=1.00E-05, loss= 1.2201 (max= 2.2208), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:50,316 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.2201 (max= 2.2208), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:50,316 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.2201 (max= 2.2208), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:50,316 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.2201 (max= 2.2208), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:50,316 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.2201 (max= 2.2208), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:50,316 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.2201 (max= 2.2208), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:28:50,316 - root - INFO - Step 15960: lr=1.00E-05, loss= 1.2201 (max= 2.2208), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:08,348 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.2291 (max= 3.6505), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:08,348 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.2291 (max= 3.6505), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:08,348 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.2291 (max= 3.6505), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:08,349 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.2291 (max= 3.6505), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 17:29:08,349 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.2291 (max= 3.6505), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:08,349 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.2291 (max= 3.6505), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:08,349 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.2291 (max= 3.6505), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:08,349 - root - INFO - Step 15970: lr=1.00E-05, loss= 1.2291 (max= 3.6505), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:26,367 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.2241 (max= 2.6168), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:26,367 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.2241 (max= 2.6168), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:26,367 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.2241 (max= 2.6168), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:26,367 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.2241 (max= 2.6168), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:26,367 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.2241 (max= 2.6168), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:26,367 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.2241 (max= 2.6168), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:26,367 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.2241 (max= 2.6168), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:26,367 - root - INFO - Step 15980: lr=1.00E-05, loss= 1.2241 (max= 2.6168), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:44,393 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.1966 (max= 2.2278), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:44,393 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.1966 (max= 2.2278), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:44,393 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.1966 (max= 2.2278), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:44,394 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.1966 (max= 2.2278), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:44,394 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.1966 (max= 2.2278), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:44,394 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.1966 (max= 2.2278), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:44,394 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.1966 (max= 2.2278), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:29:44,394 - root - INFO - Step 15990: lr=1.00E-05, loss= 1.1966 (max= 2.2278), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-16000 +2025-10-24 17:30:02,421 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.2087 (max= 2.2496), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:02,421 - 
root - INFO - Saving a full checkpoint at step 16000 +2025-10-24 17:30:02,421 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.2087 (max= 2.2496), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:02,421 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.2087 (max= 2.2496), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:02,421 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 17:30:02,421 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-24 17:30:02,421 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-24 17:30:02,421 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 17:30:02,421 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 17:30:02,421 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.2087 (max= 2.2496), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:02,421 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.2087 (max= 2.2496), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:02,421 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.2087 (max= 2.2496), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:02,421 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.2087 (max= 2.2496), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:02,421 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-24 17:30:02,421 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-24 17:30:02,421 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 17:30:02,421 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 17:30:02,421 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-24 17:30:02,421 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 17:30:02,421 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-24 17:30:02,421 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 17:30:02,421 - root - INFO - Step 16000: lr=1.00E-05, loss= 1.2087 (max= 2.2496), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:02,422 - root - INFO - Saving a full checkpoint at step 16000 +2025-10-24 17:30:02,422 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-16000! 
Save time: 4.768320322036743 +2025-10-24 17:30:16,775 - root - INFO - Finished saving the checkpoint in 14.35 seconds +2025-10-24 17:30:16,782 - root - INFO - Finished saving the checkpoint in 14.36 seconds +2025-10-24 17:30:16,783 - root - INFO - Finished saving the checkpoint in 14.36 seconds +2025-10-24 17:30:16,784 - root - INFO - Finished saving the checkpoint in 14.36 seconds +2025-10-24 17:30:16,784 - root - INFO - Finished saving the checkpoint in 14.36 seconds +2025-10-24 17:30:16,784 - root - INFO - Finished saving the checkpoint in 14.36 seconds +2025-10-24 17:30:16,785 - root - INFO - Finished saving the checkpoint in 14.36 seconds +2025-10-24 17:30:16,785 - root - INFO - Finished saving the checkpoint in 14.36 seconds +2025-10-24 17:30:34,791 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.2198 (max= 2.8469), tps=10124, mfu=21.09%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:30:34,791 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.2198 (max= 2.8469), tps=10124, mfu=21.09%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:30:34,791 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.2198 (max= 2.8469), tps=10124, mfu=21.09%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:30:34,791 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.2198 (max= 2.8469), tps=10124, mfu=21.09%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:30:34,791 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.2198 (max= 2.8469), tps=10124, mfu=21.09%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:30:34,791 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.2198 (max= 2.8469), tps=10124, mfu=21.09%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:30:34,791 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.2198 (max= 2.8469), tps=10124, mfu=21.09%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:30:34,791 - root - INFO - Step 16010: lr=1.00E-05, loss= 1.2198 (max= 2.8469), tps=10124, mfu=21.09%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 17:30:52,837 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1664 (max= 2.1168), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:52,837 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1664 (max= 2.1168), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:52,837 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1664 (max= 2.1168), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:52,837 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1664 (max= 2.1168), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:52,837 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1664 (max= 2.1168), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:52,838 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1664 (max= 2.1168), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:52,838 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1664 (max= 2.1168), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:30:52,838 - root - INFO - Step 16020: lr=1.00E-05, loss= 1.1664 (max= 2.1168), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:31:10,893 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.2018 (max= 2.0446), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:10,893 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.2018 (max= 2.0446), tps=18153, 
mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:10,893 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.2018 (max= 2.0446), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:10,893 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.2018 (max= 2.0446), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:10,893 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.2018 (max= 2.0446), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:10,893 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.2018 (max= 2.0446), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:10,893 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.2018 (max= 2.0446), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:10,893 - root - INFO - Step 16030: lr=1.00E-05, loss= 1.2018 (max= 2.0446), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:28,945 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.2183 (max= 2.5601), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:28,945 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.2183 (max= 2.5601), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:28,945 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.2183 (max= 2.5601), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:28,945 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.2183 (max= 2.5601), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:28,945 - root - INFO - Step 16040: lr=1.00E-05, 
loss= 1.2183 (max= 2.5601), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:28,945 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.2183 (max= 2.5601), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:28,946 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.2183 (max= 2.5601), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:28,946 - root - INFO - Step 16040: lr=1.00E-05, loss= 1.2183 (max= 2.5601), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:46,979 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.2369 (max= 2.2698), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:46,979 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.2369 (max= 2.2698), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:46,980 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.2369 (max= 2.2698), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:46,980 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.2369 (max= 2.2698), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:46,980 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.2369 (max= 2.2698), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:46,980 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.2369 (max= 2.2698), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:46,980 - root - INFO - Step 16050: lr=1.00E-05, loss= 1.2369 (max= 2.2698), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:31:46,980 - 
root - INFO - Step 16050: lr=1.00E-05, loss= 1.2369 (max= 2.2698), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:32:05,003 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.2289 (max= 2.4815), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:05,003 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.2289 (max= 2.4815), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:05,003 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.2289 (max= 2.4815), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:05,003 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.2289 (max= 2.4815), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:05,003 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.2289 (max= 2.4815), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:05,003 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.2289 (max= 2.4815), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:05,004 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.2289 (max= 2.4815), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:05,004 - root - INFO - Step 16060: lr=1.00E-05, loss= 1.2289 (max= 2.4815), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:23,097 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.1835 (max= 2.3259), tps=18113, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:23,097 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.1835 (max= 2.3259), tps=18113, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 17:32:23,098 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.1835 (max= 2.3259), tps=18113, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:23,098 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.1835 (max= 2.3259), tps=18114, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:23,098 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.1835 (max= 2.3259), tps=18113, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:23,098 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.1835 (max= 2.3259), tps=18113, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:23,098 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.1835 (max= 2.3259), tps=18113, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:23,098 - root - INFO - Step 16070: lr=1.00E-05, loss= 1.1835 (max= 2.3259), tps=18113, mfu=37.74%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:41,137 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.2378 (max= 2.5514), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:32:41,137 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.2378 (max= 2.5514), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:32:41,137 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.2378 (max= 2.5514), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:32:41,137 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.2378 (max= 2.5514), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:32:41,137 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.2378 (max= 2.5514), tps=18168, mfu=37.85%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:32:41,137 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.2378 (max= 2.5514), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:32:41,137 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.2378 (max= 2.5514), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:32:41,137 - root - INFO - Step 16080: lr=1.00E-05, loss= 1.2378 (max= 2.5514), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:32:59,145 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.2126 (max= 2.2053), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:59,145 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.2126 (max= 2.2053), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:59,145 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.2126 (max= 2.2053), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:59,145 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.2126 (max= 2.2053), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:59,145 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.2126 (max= 2.2053), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:59,145 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.2126 (max= 2.2053), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:59,145 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.2126 (max= 2.2053), tps=18200, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:32:59,145 - root - INFO - Step 16090: lr=1.00E-05, loss= 1.2126 (max= 
2.2053), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:17,199 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.2442 (max= 2.2165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:17,199 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.2442 (max= 2.2165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:17,200 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.2442 (max= 2.2165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:17,200 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.2442 (max= 2.2165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:17,200 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.2442 (max= 2.2165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:17,200 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.2442 (max= 2.2165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:17,200 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.2442 (max= 2.2165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:17,200 - root - INFO - Step 16100: lr=1.00E-05, loss= 1.2442 (max= 2.2165), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:35,248 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.2056 (max= 2.1361), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:33:35,248 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.2056 (max= 2.1361), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:33:35,248 - root - INFO - Step 
16110: lr=1.00E-05, loss= 1.2056 (max= 2.1361), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:33:35,248 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.2056 (max= 2.1361), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:33:35,248 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.2056 (max= 2.1361), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:33:35,248 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.2056 (max= 2.1361), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:33:35,248 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.2056 (max= 2.1361), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:33:35,249 - root - INFO - Step 16110: lr=1.00E-05, loss= 1.2056 (max= 2.1361), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:33:53,284 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.2127 (max= 2.2710), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:53,284 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.2127 (max= 2.2710), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:53,284 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.2127 (max= 2.2710), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:53,284 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.2127 (max= 2.2710), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:53,284 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.2127 (max= 2.2710), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 17:33:53,284 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.2127 (max= 2.2710), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:53,284 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.2127 (max= 2.2710), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:33:53,284 - root - INFO - Step 16120: lr=1.00E-05, loss= 1.2127 (max= 2.2710), tps=18171, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:11,336 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.1984 (max= 2.3452), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:11,337 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.1984 (max= 2.3452), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:11,337 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.1984 (max= 2.3452), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:11,337 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.1984 (max= 2.3452), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:11,337 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.1984 (max= 2.3452), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:11,337 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.1984 (max= 2.3452), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:11,337 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.1984 (max= 2.3452), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:11,337 - root - INFO - Step 16130: lr=1.00E-05, loss= 1.1984 (max= 2.3452), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:29,355 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.2086 (max= 2.3866), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:29,356 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.2086 (max= 2.3866), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:29,356 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.2086 (max= 2.3866), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:29,356 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.2086 (max= 2.3866), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:29,356 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.2086 (max= 2.3866), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:29,356 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.2086 (max= 2.3866), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:29,356 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.2086 (max= 2.3866), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:29,356 - root - INFO - Step 16140: lr=1.00E-05, loss= 1.2086 (max= 2.3866), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:47,416 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.2569 (max= 2.4510), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:47,416 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.2569 (max= 2.4510), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:47,416 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.2569 (max= 2.4510), tps=18147, 
mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:47,417 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.2569 (max= 2.4510), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:47,417 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.2569 (max= 2.4510), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:47,417 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.2569 (max= 2.4510), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:47,417 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.2569 (max= 2.4510), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:34:47,417 - root - INFO - Step 16150: lr=1.00E-05, loss= 1.2569 (max= 2.4510), tps=18147, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:35:05,438 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.2257 (max= 2.0531), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:05,438 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.2257 (max= 2.0531), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:05,438 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.2257 (max= 2.0531), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:05,438 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.2257 (max= 2.0531), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:05,438 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.2257 (max= 2.0531), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:05,438 - root - INFO - Step 16160: lr=1.00E-05, 
loss= 1.2257 (max= 2.0531), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:05,439 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.2257 (max= 2.0531), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:05,439 - root - INFO - Step 16160: lr=1.00E-05, loss= 1.2257 (max= 2.0531), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:23,492 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.2144 (max= 2.2657), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:35:23,492 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.2144 (max= 2.2657), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:35:23,492 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.2144 (max= 2.2657), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:35:23,492 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.2144 (max= 2.2657), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:35:23,492 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.2144 (max= 2.2657), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:35:23,492 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.2144 (max= 2.2657), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:35:23,492 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.2144 (max= 2.2657), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:35:23,492 - root - INFO - Step 16170: lr=1.00E-05, loss= 1.2144 (max= 2.2657), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:35:31,339 - 
root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:5954820 +2025-10-24 17:35:41,509 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.2233 (max= 2.2570), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:41,509 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.2233 (max= 2.2570), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:41,509 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.2233 (max= 2.2570), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:41,509 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.2233 (max= 2.2570), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:41,510 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.2233 (max= 2.2570), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:41,510 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.2233 (max= 2.2570), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:41,510 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.2233 (max= 2.2570), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:41,510 - root - INFO - Step 16180: lr=1.00E-05, loss= 1.2233 (max= 2.2570), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:59,477 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.2144 (max= 2.3106), tps=18240, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:59,477 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.2144 (max= 2.3106), tps=18240, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 17:35:59,477 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.2144 (max= 2.3106), tps=18240, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:59,477 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.2144 (max= 2.3106), tps=18240, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:59,477 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.2144 (max= 2.3106), tps=18240, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:59,477 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.2144 (max= 2.3106), tps=18241, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:59,477 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.2144 (max= 2.3106), tps=18241, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:35:59,477 - root - INFO - Step 16190: lr=1.00E-05, loss= 1.2144 (max= 2.3106), tps=18240, mfu=38.00%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:36:17,512 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.1941 (max= 2.3195), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:17,513 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.1941 (max= 2.3195), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:17,513 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.1941 (max= 2.3195), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:17,513 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.1941 (max= 2.3195), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:17,513 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.1941 (max= 2.3195), tps=18172, mfu=37.86%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:17,513 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.1941 (max= 2.3195), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:17,513 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.1941 (max= 2.3195), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:17,513 - root - INFO - Step 16200: lr=1.00E-05, loss= 1.1941 (max= 2.3195), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:35,520 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.2163 (max= 2.5469), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:35,520 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.2163 (max= 2.5469), tps=18202, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:35,520 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.2163 (max= 2.5469), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:35,520 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.2163 (max= 2.5469), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:35,520 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.2163 (max= 2.5469), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:35,520 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.2163 (max= 2.5469), tps=18202, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:35,520 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.2163 (max= 2.5469), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:35,521 - root - INFO - Step 16210: lr=1.00E-05, loss= 1.2163 (max= 
2.5469), tps=18201, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:53,594 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.2185 (max= 2.0265), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:53,594 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.2185 (max= 2.0265), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:53,594 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.2185 (max= 2.0265), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:53,594 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.2185 (max= 2.0265), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:53,594 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.2185 (max= 2.0265), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:53,594 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.2185 (max= 2.0265), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:53,594 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.2185 (max= 2.0265), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:36:53,594 - root - INFO - Step 16220: lr=1.00E-05, loss= 1.2185 (max= 2.0265), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:37:11,667 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.2156 (max= 2.2270), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:11,667 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.2156 (max= 2.2270), tps=18135, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:11,667 - root - INFO - Step 
16230: lr=1.00E-05, loss= 1.2156 (max= 2.2270), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:11,667 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.2156 (max= 2.2270), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:11,667 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.2156 (max= 2.2270), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:11,667 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.2156 (max= 2.2270), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:11,667 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.2156 (max= 2.2270), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:11,670 - root - INFO - Step 16230: lr=1.00E-05, loss= 1.2156 (max= 2.2270), tps=18135, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:29,739 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1954 (max= 2.2088), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:29,739 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1954 (max= 2.2088), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:29,739 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1954 (max= 2.2088), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:29,739 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1954 (max= 2.2088), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:29,739 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1954 (max= 2.2088), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 17:37:29,739 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1954 (max= 2.2088), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:29,739 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1954 (max= 2.2088), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:29,739 - root - INFO - Step 16240: lr=1.00E-05, loss= 1.1954 (max= 2.2088), tps=18136, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:47,751 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.2191 (max= 2.5176), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:47,751 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.2191 (max= 2.5176), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:47,751 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.2191 (max= 2.5176), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:47,751 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.2191 (max= 2.5176), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:47,751 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.2191 (max= 2.5176), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:47,751 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.2191 (max= 2.5176), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:47,751 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.2191 (max= 2.5176), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:37:47,751 - root - INFO - Step 16250: lr=1.00E-05, loss= 1.2191 (max= 2.5176), tps=18196, mfu=37.91%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:05,750 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1922 (max= 2.2223), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:05,750 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1922 (max= 2.2223), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:05,750 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1922 (max= 2.2223), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:05,750 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1922 (max= 2.2223), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:05,750 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1922 (max= 2.2223), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:05,750 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1922 (max= 2.2223), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:05,750 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1922 (max= 2.2223), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:05,750 - root - INFO - Step 16260: lr=1.00E-05, loss= 1.1922 (max= 2.2223), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:23,803 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.2118 (max= 2.0946), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:23,803 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.2118 (max= 2.0946), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:23,803 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.2118 (max= 2.0946), tps=18155, 
mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:23,803 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.2118 (max= 2.0946), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:23,803 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.2118 (max= 2.0946), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:23,803 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.2118 (max= 2.0946), tps=18154, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:23,803 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.2118 (max= 2.0946), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:23,804 - root - INFO - Step 16270: lr=1.00E-05, loss= 1.2118 (max= 2.0946), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:38:43,055 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.2202 (max= 2.8289), tps=17025, mfu=35.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.49%) +2025-10-24 17:38:43,055 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.2202 (max= 2.8289), tps=17025, mfu=35.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.49%) +2025-10-24 17:38:43,055 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.2202 (max= 2.8289), tps=17025, mfu=35.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.49%) +2025-10-24 17:38:43,055 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.2202 (max= 2.8289), tps=17024, mfu=35.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.49%) +2025-10-24 17:38:43,055 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.2202 (max= 2.8289), tps=17024, mfu=35.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.49%) +2025-10-24 17:38:43,055 - root - INFO - Step 16280: lr=1.00E-05, 
loss= 1.2202 (max= 2.8289), tps=17025, mfu=35.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.49%) +2025-10-24 17:38:43,055 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.2202 (max= 2.8289), tps=17025, mfu=35.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.49%) +2025-10-24 17:38:43,055 - root - INFO - Step 16280: lr=1.00E-05, loss= 1.2202 (max= 2.8289), tps=17025, mfu=35.47%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.04s, 7.49%) +2025-10-24 17:39:01,124 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.2161 (max= 2.2600), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:01,124 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.2161 (max= 2.2600), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:01,124 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.2161 (max= 2.2600), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:01,124 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.2161 (max= 2.2600), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:01,124 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.2161 (max= 2.2600), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:01,124 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.2161 (max= 2.2600), tps=18139, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:01,124 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.2161 (max= 2.2600), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:01,124 - root - INFO - Step 16290: lr=1.00E-05, loss= 1.2161 (max= 2.2600), tps=18138, mfu=37.79%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:19,159 - 
root - INFO - Step 16300: lr=1.00E-05, loss= 1.2004 (max= 2.1283), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:19,159 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.2004 (max= 2.1283), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:19,159 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.2004 (max= 2.1283), tps=18174, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:19,159 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.2004 (max= 2.1283), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:19,159 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.2004 (max= 2.1283), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:19,159 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.2004 (max= 2.1283), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:19,159 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.2004 (max= 2.1283), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:19,160 - root - INFO - Step 16300: lr=1.00E-05, loss= 1.2004 (max= 2.1283), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:37,197 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.2363 (max= 2.4314), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:37,197 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.2363 (max= 2.4314), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:37,197 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.2363 (max= 2.4314), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 17:39:37,197 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.2363 (max= 2.4314), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:37,197 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.2363 (max= 2.4314), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:37,197 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.2363 (max= 2.4314), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:37,197 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.2363 (max= 2.4314), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:37,197 - root - INFO - Step 16310: lr=1.00E-05, loss= 1.2363 (max= 2.4314), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:55,255 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.2043 (max= 2.3855), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:55,255 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.2043 (max= 2.3855), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:55,255 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.2043 (max= 2.3855), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:55,255 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.2043 (max= 2.3855), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:55,255 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.2043 (max= 2.3855), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:55,255 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.2043 (max= 2.3855), tps=18150, mfu=37.82%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:55,255 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.2043 (max= 2.3855), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:39:55,255 - root - INFO - Step 16320: lr=1.00E-05, loss= 1.2043 (max= 2.3855), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:40:13,259 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.2120 (max= 2.4637), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:40:13,260 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.2120 (max= 2.4637), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:40:13,260 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.2120 (max= 2.4637), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:40:13,260 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.2120 (max= 2.4637), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:40:13,260 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.2120 (max= 2.4637), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:40:13,260 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.2120 (max= 2.4637), tps=18203, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:40:13,260 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.2120 (max= 2.4637), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:40:13,260 - root - INFO - Step 16330: lr=1.00E-05, loss= 1.2120 (max= 2.4637), tps=18204, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:40:31,292 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.2450 (max= 
2.2756), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:31,292 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.2450 (max= 2.2756), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:31,293 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.2450 (max= 2.2756), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:31,293 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.2450 (max= 2.2756), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:31,293 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.2450 (max= 2.2756), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:31,293 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.2450 (max= 2.2756), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:31,293 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.2450 (max= 2.2756), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:31,293 - root - INFO - Step 16340: lr=1.00E-05, loss= 1.2450 (max= 2.2756), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:49,321 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.2354 (max= 2.6431), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:49,321 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.2354 (max= 2.6431), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:49,321 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.2354 (max= 2.6431), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:49,321 - root - INFO - Step 
16350: lr=1.00E-05, loss= 1.2354 (max= 2.6431), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:49,321 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.2354 (max= 2.6431), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:49,321 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.2354 (max= 2.6431), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:49,321 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.2354 (max= 2.6431), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:40:49,322 - root - INFO - Step 16350: lr=1.00E-05, loss= 1.2354 (max= 2.6431), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:07,386 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.2568 (max= 2.3844), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:07,386 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.2568 (max= 2.3844), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:07,387 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.2568 (max= 2.3844), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:07,387 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.2568 (max= 2.3844), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:07,387 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.2568 (max= 2.3844), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:07,387 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.2568 (max= 2.3844), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 17:41:07,387 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.2568 (max= 2.3844), tps=18143, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:07,387 - root - INFO - Step 16360: lr=1.00E-05, loss= 1.2568 (max= 2.3844), tps=18142, mfu=37.80%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:25,432 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1950 (max= 2.4487), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:25,432 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1950 (max= 2.4487), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:25,432 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1950 (max= 2.4487), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:25,432 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1950 (max= 2.4487), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:25,433 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1950 (max= 2.4487), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:25,433 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1950 (max= 2.4487), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:25,433 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1950 (max= 2.4487), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:25,433 - root - INFO - Step 16370: lr=1.00E-05, loss= 1.1950 (max= 2.4487), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:41:43,484 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.2194 (max= 2.3359), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:41:43,485 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.2194 (max= 2.3359), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:41:43,485 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.2194 (max= 2.3359), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:41:43,485 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.2194 (max= 2.3359), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:41:43,485 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.2194 (max= 2.3359), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:41:43,485 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.2194 (max= 2.3359), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:41:43,485 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.2194 (max= 2.3359), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:41:43,485 - root - INFO - Step 16380: lr=1.00E-05, loss= 1.2194 (max= 2.3359), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:01,517 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.2293 (max= 2.1131), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:01,517 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.2293 (max= 2.1131), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:01,517 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.2293 (max= 2.1131), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:01,517 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.2293 (max= 2.1131), tps=18176, 
mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:01,517 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.2293 (max= 2.1131), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:01,517 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.2293 (max= 2.1131), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:01,517 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.2293 (max= 2.1131), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:01,517 - root - INFO - Step 16390: lr=1.00E-05, loss= 1.2293 (max= 2.1131), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:19,561 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1961 (max= 2.6788), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:19,561 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1961 (max= 2.6788), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:19,561 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1961 (max= 2.6788), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:19,562 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1961 (max= 2.6788), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:19,562 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1961 (max= 2.6788), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:19,562 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1961 (max= 2.6788), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:19,562 - root - INFO - Step 16400: lr=1.00E-05, 
loss= 1.1961 (max= 2.6788), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:19,562 - root - INFO - Step 16400: lr=1.00E-05, loss= 1.1961 (max= 2.6788), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:37,601 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.1964 (max= 2.4031), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:37,601 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.1964 (max= 2.4031), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:37,601 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.1964 (max= 2.4031), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:37,601 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.1964 (max= 2.4031), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:37,601 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.1964 (max= 2.4031), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:37,601 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.1964 (max= 2.4031), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:37,601 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.1964 (max= 2.4031), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:37,601 - root - INFO - Step 16410: lr=1.00E-05, loss= 1.1964 (max= 2.4031), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:42:55,629 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.2205 (max= 2.9792), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:55,629 - 
root - INFO - Step 16420: lr=1.00E-05, loss= 1.2205 (max= 2.9792), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:55,629 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.2205 (max= 2.9792), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:55,629 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.2205 (max= 2.9792), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:55,629 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.2205 (max= 2.9792), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:55,629 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.2205 (max= 2.9792), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:55,629 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.2205 (max= 2.9792), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:42:55,630 - root - INFO - Step 16420: lr=1.00E-05, loss= 1.2205 (max= 2.9792), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:13,657 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.2306 (max= 3.3509), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:13,657 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.2306 (max= 3.3509), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:13,657 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.2306 (max= 3.3509), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:13,657 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.2306 (max= 3.3509), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 17:43:13,657 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.2306 (max= 3.3509), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:13,657 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.2306 (max= 3.3509), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:13,657 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.2306 (max= 3.3509), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:13,661 - root - INFO - Step 16430: lr=1.00E-05, loss= 1.2306 (max= 3.3509), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:31,720 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.2174 (max= 2.4064), tps=18145, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:31,720 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.2174 (max= 2.4064), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:31,720 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.2174 (max= 2.4064), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:31,720 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.2174 (max= 2.4064), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:31,720 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.2174 (max= 2.4064), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:31,720 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.2174 (max= 2.4064), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:31,721 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.2174 (max= 2.4064), tps=18144, mfu=37.80%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:31,721 - root - INFO - Step 16440: lr=1.00E-05, loss= 1.2174 (max= 2.4064), tps=18146, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:49,753 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1771 (max= 2.1510), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:49,753 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1771 (max= 2.1510), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:49,753 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1771 (max= 2.1510), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:49,754 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1771 (max= 2.1510), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:49,754 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1771 (max= 2.1510), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:49,754 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1771 (max= 2.1510), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:49,754 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1771 (max= 2.1510), tps=18175, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:43:49,754 - root - INFO - Step 16450: lr=1.00E-05, loss= 1.1771 (max= 2.1510), tps=18176, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:07,809 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1800 (max= 2.0077), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:07,809 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1800 (max= 
2.0077), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:07,810 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1800 (max= 2.0077), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:07,810 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1800 (max= 2.0077), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:07,810 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1800 (max= 2.0077), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:07,810 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1800 (max= 2.0077), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:07,810 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1800 (max= 2.0077), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:07,810 - root - INFO - Step 16460: lr=1.00E-05, loss= 1.1800 (max= 2.0077), tps=18152, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:25,835 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.2049 (max= 2.3715), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:25,835 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.2049 (max= 2.3715), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:25,835 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.2049 (max= 2.3715), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:25,835 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.2049 (max= 2.3715), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:25,836 - root - INFO - Step 
16470: lr=1.00E-05, loss= 1.2049 (max= 2.3715), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:25,836 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.2049 (max= 2.3715), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:25,836 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.2049 (max= 2.3715), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:25,836 - root - INFO - Step 16470: lr=1.00E-05, loss= 1.2049 (max= 2.3715), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:43,877 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.2277 (max= 2.3038), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:43,877 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.2277 (max= 2.3038), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:43,877 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.2277 (max= 2.3038), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:43,878 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.2277 (max= 2.3038), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:43,878 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.2277 (max= 2.3038), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:43,878 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.2277 (max= 2.3038), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:44:43,878 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.2277 (max= 2.3038), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 17:44:43,878 - root - INFO - Step 16480: lr=1.00E-05, loss= 1.2277 (max= 2.3038), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:45:01,886 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1888 (max= 2.0347), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:01,887 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1888 (max= 2.0347), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:01,887 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1888 (max= 2.0347), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:01,887 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1888 (max= 2.0347), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:01,887 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1888 (max= 2.0347), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:01,887 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1888 (max= 2.0347), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:01,887 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1888 (max= 2.0347), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:01,887 - root - INFO - Step 16490: lr=1.00E-05, loss= 1.1888 (max= 2.0347), tps=18199, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:19,939 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.2333 (max= 2.3308), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:19,939 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.2333 (max= 2.3308), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:19,939 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.2333 (max= 2.3308), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:19,939 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.2333 (max= 2.3308), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:19,939 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.2333 (max= 2.3308), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:19,939 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.2333 (max= 2.3308), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:19,939 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.2333 (max= 2.3308), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:19,939 - root - INFO - Step 16500: lr=1.00E-05, loss= 1.2333 (max= 2.3308), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:37,954 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.2063 (max= 2.3461), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:45:37,954 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.2063 (max= 2.3461), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:45:37,954 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.2063 (max= 2.3461), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:45:37,954 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.2063 (max= 2.3461), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:45:37,954 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.2063 (max= 2.3461), tps=18193, 
mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:45:37,954 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.2063 (max= 2.3461), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:45:37,954 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.2063 (max= 2.3461), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:45:37,955 - root - INFO - Step 16510: lr=1.00E-05, loss= 1.2063 (max= 2.3461), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:45:55,979 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1891 (max= 2.0848), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:55,979 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1891 (max= 2.0848), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:55,980 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1891 (max= 2.0848), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:55,980 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1891 (max= 2.0848), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:55,980 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1891 (max= 2.0848), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:55,980 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1891 (max= 2.0848), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:55,980 - root - INFO - Step 16520: lr=1.00E-05, loss= 1.1891 (max= 2.0848), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:45:55,980 - root - INFO - Step 16520: lr=1.00E-05, 
loss= 1.1891 (max= 2.0848), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:46:14,024 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.2345 (max= 2.2905), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:14,024 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.2345 (max= 2.2905), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:14,024 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.2345 (max= 2.2905), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:14,024 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.2345 (max= 2.2905), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:14,024 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.2345 (max= 2.2905), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:14,024 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.2345 (max= 2.2905), tps=18163, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:14,024 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.2345 (max= 2.2905), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:14,025 - root - INFO - Step 16530: lr=1.00E-05, loss= 1.2345 (max= 2.2905), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:32,063 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1856 (max= 1.9900), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:46:32,063 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1856 (max= 1.9900), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:46:32,063 - 
root - INFO - Step 16540: lr=1.00E-05, loss= 1.1856 (max= 1.9900), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:46:32,063 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1856 (max= 1.9900), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:46:32,063 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1856 (max= 1.9900), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:46:32,063 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1856 (max= 1.9900), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:46:32,063 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1856 (max= 1.9900), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:46:32,063 - root - INFO - Step 16540: lr=1.00E-05, loss= 1.1856 (max= 1.9900), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:46:50,103 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.2185 (max= 2.6180), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:50,104 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.2185 (max= 2.6180), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:50,104 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.2185 (max= 2.6180), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:50,104 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.2185 (max= 2.6180), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:50,104 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.2185 (max= 2.6180), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 17:46:50,104 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.2185 (max= 2.6180), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:50,104 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.2185 (max= 2.6180), tps=18166, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:46:50,104 - root - INFO - Step 16550: lr=1.00E-05, loss= 1.2185 (max= 2.6180), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:08,118 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1639 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:08,118 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1639 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:08,118 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1639 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:08,118 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1639 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:08,118 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1639 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:08,118 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1639 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:08,118 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1639 (max= 2.1654), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:08,118 - root - INFO - Step 16560: lr=1.00E-05, loss= 1.1639 (max= 2.1654), tps=18194, mfu=37.91%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:26,142 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.2280 (max= 2.3241), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:26,142 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.2280 (max= 2.3241), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:26,142 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.2280 (max= 2.3241), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:26,142 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.2280 (max= 2.3241), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:26,142 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.2280 (max= 2.3241), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:26,142 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.2280 (max= 2.3241), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:26,142 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.2280 (max= 2.3241), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:26,142 - root - INFO - Step 16570: lr=1.00E-05, loss= 1.2280 (max= 2.3241), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:47:44,161 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.2097 (max= 2.3240), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:47:44,162 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.2097 (max= 2.3240), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:47:44,162 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.2097 (max= 
2.3240), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:47:44,162 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.2097 (max= 2.3240), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:47:44,162 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.2097 (max= 2.3240), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:47:44,162 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.2097 (max= 2.3240), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:47:44,162 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.2097 (max= 2.3240), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:47:44,162 - root - INFO - Step 16580: lr=1.00E-05, loss= 1.2097 (max= 2.3240), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:02,237 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1938 (max= 2.4082), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:02,238 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1938 (max= 2.4082), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:02,238 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1938 (max= 2.4082), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:02,238 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1938 (max= 2.4082), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:02,238 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1938 (max= 2.4082), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:02,238 - root - INFO - Step 
16590: lr=1.00E-05, loss= 1.1938 (max= 2.4082), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:02,238 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1938 (max= 2.4082), tps=18131, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:02,238 - root - INFO - Step 16590: lr=1.00E-05, loss= 1.1938 (max= 2.4082), tps=18132, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:20,258 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.2128 (max= 2.4336), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:20,258 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.2128 (max= 2.4336), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:20,258 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.2128 (max= 2.4336), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:20,258 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.2128 (max= 2.4336), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:20,258 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.2128 (max= 2.4336), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:20,258 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.2128 (max= 2.4336), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:20,259 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.2128 (max= 2.4336), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:48:20,259 - root - INFO - Step 16600: lr=1.00E-05, loss= 1.2128 (max= 2.4336), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 17:48:38,310 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.2138 (max= 2.2300), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:38,310 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.2138 (max= 2.2300), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:38,310 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.2138 (max= 2.2300), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:38,310 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.2138 (max= 2.2300), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:38,310 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.2138 (max= 2.2300), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:38,310 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.2138 (max= 2.2300), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:38,310 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.2138 (max= 2.2300), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:38,310 - root - INFO - Step 16610: lr=1.00E-05, loss= 1.2138 (max= 2.2300), tps=18155, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:56,354 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.2054 (max= 2.0443), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:56,355 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.2054 (max= 2.0443), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:56,355 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.2054 (max= 2.0443), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:56,355 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.2054 (max= 2.0443), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:56,355 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.2054 (max= 2.0443), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:56,355 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.2054 (max= 2.0443), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:56,355 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.2054 (max= 2.0443), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:56,358 - root - INFO - Step 16620: lr=1.00E-05, loss= 1.2054 (max= 2.0443), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:48:56,366 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:1822030 +2025-10-24 17:49:14,356 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.2238 (max= 2.3405), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:14,356 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.2238 (max= 2.3405), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:14,356 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.2238 (max= 2.3405), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:14,356 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.2238 (max= 2.3405), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:14,356 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.2238 (max= 2.3405), 
tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:14,357 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.2238 (max= 2.3405), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:14,357 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.2238 (max= 2.3405), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:14,357 - root - INFO - Step 16630: lr=1.00E-05, loss= 1.2238 (max= 2.3405), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:32,412 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.2452 (max= 2.3227), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:49:32,413 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.2452 (max= 2.3227), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:49:32,413 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.2452 (max= 2.3227), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:49:32,413 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.2452 (max= 2.3227), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:49:32,413 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.2452 (max= 2.3227), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:49:32,413 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.2452 (max= 2.3227), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:49:32,413 - root - INFO - Step 16640: lr=1.00E-05, loss= 1.2452 (max= 2.3227), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:49:32,413 - root - INFO - Step 16640: 
lr=1.00E-05, loss= 1.2452 (max= 2.3227), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:49:50,402 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1854 (max= 2.3228), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:50,402 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1854 (max= 2.3228), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:50,402 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1854 (max= 2.3228), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:50,402 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1854 (max= 2.3228), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:50,402 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1854 (max= 2.3228), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:50,402 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1854 (max= 2.3228), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:50,402 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1854 (max= 2.3228), tps=18219, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:49:50,402 - root - INFO - Step 16650: lr=1.00E-05, loss= 1.1854 (max= 2.3228), tps=18220, mfu=37.96%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:08,417 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1999 (max= 2.0348), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:08,417 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1999 (max= 2.0348), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 
17:50:08,417 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1999 (max= 2.0348), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:08,417 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1999 (max= 2.0348), tps=18193, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:08,418 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1999 (max= 2.0348), tps=18193, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:08,418 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1999 (max= 2.0348), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:08,418 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1999 (max= 2.0348), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:08,418 - root - INFO - Step 16660: lr=1.00E-05, loss= 1.1999 (max= 2.0348), tps=18192, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:26,476 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1942 (max= 2.0581), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:26,476 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1942 (max= 2.0581), tps=18148, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:26,476 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1942 (max= 2.0581), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:26,476 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1942 (max= 2.0581), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:26,476 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1942 (max= 2.0581), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:26,477 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1942 (max= 2.0581), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:26,477 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1942 (max= 2.0581), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:26,477 - root - INFO - Step 16670: lr=1.00E-05, loss= 1.1942 (max= 2.0581), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:44,497 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1992 (max= 2.0294), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:44,497 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1992 (max= 2.0294), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:44,497 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1992 (max= 2.0294), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:44,497 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1992 (max= 2.0294), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:44,498 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1992 (max= 2.0294), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:44,498 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1992 (max= 2.0294), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:44,498 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1992 (max= 2.0294), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:50:44,498 - root - INFO - Step 16680: lr=1.00E-05, loss= 1.1992 (max= 2.0294), tps=18187, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:51:02,537 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.2430 (max= 2.7486), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:02,538 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.2430 (max= 2.7486), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:02,538 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.2430 (max= 2.7486), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:02,538 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.2430 (max= 2.7486), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:02,538 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.2430 (max= 2.7486), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:02,538 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.2430 (max= 2.7486), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:02,538 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.2430 (max= 2.7486), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:02,538 - root - INFO - Step 16690: lr=1.00E-05, loss= 1.2430 (max= 2.7486), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:20,586 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.2013 (max= 2.3262), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:20,586 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.2013 (max= 2.3262), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:20,586 - root - INFO - Step 16700: lr=1.00E-05, 
loss= 1.2013 (max= 2.3262), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:20,586 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.2013 (max= 2.3262), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:20,586 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.2013 (max= 2.3262), tps=18160, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:20,586 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.2013 (max= 2.3262), tps=18159, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:20,586 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.2013 (max= 2.3262), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:20,587 - root - INFO - Step 16700: lr=1.00E-05, loss= 1.2013 (max= 2.3262), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:38,603 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.2296 (max= 2.3271), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:51:38,603 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.2296 (max= 2.3271), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:51:38,603 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.2296 (max= 2.3271), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:51:38,604 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.2296 (max= 2.3271), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:51:38,604 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.2296 (max= 2.3271), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:51:38,604 - 
root - INFO - Step 16710: lr=1.00E-05, loss= 1.2296 (max= 2.3271), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:51:38,604 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.2296 (max= 2.3271), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:51:38,604 - root - INFO - Step 16710: lr=1.00E-05, loss= 1.2296 (max= 2.3271), tps=18191, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:51:56,624 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.2198 (max= 2.1481), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:56,624 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.2198 (max= 2.1481), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:56,624 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.2198 (max= 2.1481), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:56,624 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.2198 (max= 2.1481), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:56,624 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.2198 (max= 2.1481), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:56,624 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.2198 (max= 2.1481), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:56,624 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.2198 (max= 2.1481), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:51:56,625 - root - INFO - Step 16720: lr=1.00E-05, loss= 1.2198 (max= 2.1481), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.02%) +2025-10-24 17:52:14,676 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.2459 (max= 2.4156), tps=18157, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:14,676 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.2459 (max= 2.4156), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:14,676 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.2459 (max= 2.4156), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:14,676 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.2459 (max= 2.4156), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:14,677 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.2459 (max= 2.4156), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:14,677 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.2459 (max= 2.4156), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:14,677 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.2459 (max= 2.4156), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:14,678 - root - INFO - Step 16730: lr=1.00E-05, loss= 1.2459 (max= 2.4156), tps=18156, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:32,700 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1795 (max= 2.5781), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:32,700 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1795 (max= 2.5781), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:32,700 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1795 (max= 2.5781), tps=18185, mfu=37.89%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:32,700 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1795 (max= 2.5781), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:32,700 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1795 (max= 2.5781), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:32,700 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1795 (max= 2.5781), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:32,700 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1795 (max= 2.5781), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:32,700 - root - INFO - Step 16740: lr=1.00E-05, loss= 1.1795 (max= 2.5781), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:52:50,740 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.1917 (max= 2.1624), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:52:50,740 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.1917 (max= 2.1624), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:52:50,740 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.1917 (max= 2.1624), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:52:50,740 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.1917 (max= 2.1624), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:52:50,740 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.1917 (max= 2.1624), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:52:50,740 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.1917 (max= 
2.1624), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:52:50,740 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.1917 (max= 2.1624), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:52:50,740 - root - INFO - Step 16750: lr=1.00E-05, loss= 1.1917 (max= 2.1624), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:53:08,765 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1990 (max= 2.5729), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:08,765 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1990 (max= 2.5729), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:08,765 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1990 (max= 2.5729), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:08,765 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1990 (max= 2.5729), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:08,765 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1990 (max= 2.5729), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:08,765 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1990 (max= 2.5729), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:08,765 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1990 (max= 2.5729), tps=18183, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:08,766 - root - INFO - Step 16760: lr=1.00E-05, loss= 1.1990 (max= 2.5729), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:26,803 - root - INFO - Step 
16770: lr=1.00E-05, loss= 1.1681 (max= 2.4409), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:53:26,803 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1681 (max= 2.4409), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:53:26,803 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1681 (max= 2.4409), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:53:26,803 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1681 (max= 2.4409), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:53:26,803 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1681 (max= 2.4409), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:53:26,803 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1681 (max= 2.4409), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:53:26,803 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1681 (max= 2.4409), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:53:26,804 - root - INFO - Step 16770: lr=1.00E-05, loss= 1.1681 (max= 2.4409), tps=18170, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:53:44,822 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.2223 (max= 2.1343), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:44,822 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.2223 (max= 2.1343), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:44,822 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.2223 (max= 2.1343), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) 
+2025-10-24 17:53:44,822 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.2223 (max= 2.1343), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:44,822 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.2223 (max= 2.1343), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:44,822 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.2223 (max= 2.1343), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:44,822 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.2223 (max= 2.1343), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:53:44,822 - root - INFO - Step 16780: lr=1.00E-05, loss= 1.2223 (max= 2.1343), tps=18190, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:02,835 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.2307 (max= 2.2391), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:02,835 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.2307 (max= 2.2391), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:02,836 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.2307 (max= 2.2391), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:02,836 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.2307 (max= 2.2391), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:02,836 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.2307 (max= 2.2391), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:02,836 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.2307 (max= 2.2391), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:02,836 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.2307 (max= 2.2391), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:02,836 - root - INFO - Step 16790: lr=1.00E-05, loss= 1.2307 (max= 2.2391), tps=18194, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:20,875 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.2182 (max= 2.2743), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:20,875 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.2182 (max= 2.2743), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:20,875 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.2182 (max= 2.2743), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:20,875 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.2182 (max= 2.2743), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:20,875 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.2182 (max= 2.2743), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:20,875 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.2182 (max= 2.2743), tps=18167, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:20,875 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.2182 (max= 2.2743), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:20,876 - root - INFO - Step 16800: lr=1.00E-05, loss= 1.2182 (max= 2.2743), tps=18168, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:38,899 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.2259 (max= 2.1797), tps=18184, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:54:38,899 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.2259 (max= 2.1797), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:54:38,900 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.2259 (max= 2.1797), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:54:38,900 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.2259 (max= 2.1797), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:54:38,900 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.2259 (max= 2.1797), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:54:38,900 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.2259 (max= 2.1797), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:54:38,900 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.2259 (max= 2.1797), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:54:38,900 - root - INFO - Step 16810: lr=1.00E-05, loss= 1.2259 (max= 2.1797), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:54:56,922 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.2191 (max= 2.4195), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:56,922 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.2191 (max= 2.4195), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:56,922 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.2191 (max= 2.4195), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:56,922 - root - INFO - Step 16820: lr=1.00E-05, 
loss= 1.2191 (max= 2.4195), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:56,922 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.2191 (max= 2.4195), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:56,922 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.2191 (max= 2.4195), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:56,922 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.2191 (max= 2.4195), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:54:56,923 - root - INFO - Step 16820: lr=1.00E-05, loss= 1.2191 (max= 2.4195), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:55:14,946 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.2140 (max= 2.3467), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:14,946 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.2140 (max= 2.3467), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:14,946 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.2140 (max= 2.3467), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:14,947 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.2140 (max= 2.3467), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:14,947 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.2140 (max= 2.3467), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:14,947 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.2140 (max= 2.3467), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:14,947 - 
root - INFO - Step 16830: lr=1.00E-05, loss= 1.2140 (max= 2.3467), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:14,947 - root - INFO - Step 16830: lr=1.00E-05, loss= 1.2140 (max= 2.3467), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:33,001 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.2126 (max= 2.3760), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:33,001 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.2126 (max= 2.3760), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:33,001 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.2126 (max= 2.3760), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:33,001 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.2126 (max= 2.3760), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:33,001 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.2126 (max= 2.3760), tps=18154, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:33,001 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.2126 (max= 2.3760), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:33,001 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.2126 (max= 2.3760), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:33,001 - root - INFO - Step 16840: lr=1.00E-05, loss= 1.2126 (max= 2.3760), tps=18153, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:51,021 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.2331 (max= 2.3620), tps=18189, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 17:55:51,021 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.2331 (max= 2.3620), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:51,021 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.2331 (max= 2.3620), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:51,021 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.2331 (max= 2.3620), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:51,021 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.2331 (max= 2.3620), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:51,021 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.2331 (max= 2.3620), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:51,021 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.2331 (max= 2.3620), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:55:51,022 - root - INFO - Step 16850: lr=1.00E-05, loss= 1.2331 (max= 2.3620), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:09,060 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.2060 (max= 2.4667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:09,060 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.2060 (max= 2.4667), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:09,060 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.2060 (max= 2.4667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:09,060 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.2060 (max= 2.4667), tps=18169, mfu=37.86%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:09,060 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.2060 (max= 2.4667), tps=18169, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:09,060 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.2060 (max= 2.4667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:09,060 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.2060 (max= 2.4667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:09,061 - root - INFO - Step 16860: lr=1.00E-05, loss= 1.2060 (max= 2.4667), tps=18169, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:27,106 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.2327 (max= 2.3076), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:27,106 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.2327 (max= 2.3076), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:27,106 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.2327 (max= 2.3076), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:27,106 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.2327 (max= 2.3076), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:27,106 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.2327 (max= 2.3076), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:27,106 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.2327 (max= 2.3076), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:27,106 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.2327 (max= 
2.3076), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:27,106 - root - INFO - Step 16870: lr=1.00E-05, loss= 1.2327 (max= 2.3076), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:56:45,140 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.2301 (max= 2.3981), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:56:45,140 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.2301 (max= 2.3981), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:56:45,140 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.2301 (max= 2.3981), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:56:45,140 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.2301 (max= 2.3981), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:56:45,140 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.2301 (max= 2.3981), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:56:45,140 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.2301 (max= 2.3981), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:56:45,140 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.2301 (max= 2.3981), tps=18174, mfu=37.87%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:56:45,140 - root - INFO - Step 16880: lr=1.00E-05, loss= 1.2301 (max= 2.3981), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:57:03,141 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.2412 (max= 2.3931), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:03,141 - root - INFO - Step 
16890: lr=1.00E-05, loss= 1.2412 (max= 2.3931), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:03,141 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.2412 (max= 2.3931), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:03,142 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.2412 (max= 2.3931), tps=18207, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:03,142 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.2412 (max= 2.3931), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:03,142 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.2412 (max= 2.3931), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:03,142 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.2412 (max= 2.3931), tps=18207, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:03,143 - root - INFO - Step 16890: lr=1.00E-05, loss= 1.2412 (max= 2.3931), tps=18206, mfu=37.93%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:21,162 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.2006 (max= 2.0814), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:21,163 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.2006 (max= 2.0814), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:21,163 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.2006 (max= 2.0814), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:21,163 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.2006 (max= 2.0814), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) 
+2025-10-24 17:57:21,163 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.2006 (max= 2.0814), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:21,163 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.2006 (max= 2.0814), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:21,163 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.2006 (max= 2.0814), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:21,163 - root - INFO - Step 16900: lr=1.00E-05, loss= 1.2006 (max= 2.0814), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:39,184 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1918 (max= 2.0555), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:39,184 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1918 (max= 2.0555), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:39,184 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1918 (max= 2.0555), tps=18188, mfu=37.90%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:39,184 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1918 (max= 2.0555), tps=18188, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:39,184 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1918 (max= 2.0555), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:39,184 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1918 (max= 2.0555), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:39,185 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1918 (max= 2.0555), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:39,185 - root - INFO - Step 16910: lr=1.00E-05, loss= 1.1918 (max= 2.0555), tps=18187, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:57,242 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.2008 (max= 1.9694), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:57,242 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.2008 (max= 1.9694), tps=18151, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:57,242 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.2008 (max= 1.9694), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:57,242 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.2008 (max= 1.9694), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:57,242 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.2008 (max= 1.9694), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:57,242 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.2008 (max= 1.9694), tps=18150, mfu=37.82%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:57,243 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.2008 (max= 1.9694), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:57:57,243 - root - INFO - Step 16920: lr=1.00E-05, loss= 1.2008 (max= 1.9694), tps=18149, mfu=37.81%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:58:15,222 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.2180 (max= 3.0611), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:58:15,223 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.2180 (max= 3.0611), tps=18228, 
mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:58:15,223 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.2180 (max= 3.0611), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:58:15,223 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.2180 (max= 3.0611), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:58:15,223 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.2180 (max= 3.0611), tps=18229, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:58:15,223 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.2180 (max= 3.0611), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:58:15,223 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.2180 (max= 3.0611), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:58:15,223 - root - INFO - Step 16930: lr=1.00E-05, loss= 1.2180 (max= 3.0611), tps=18228, mfu=37.98%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:58:33,269 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.2081 (max= 2.2324), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:33,269 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.2081 (max= 2.2324), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:33,269 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.2081 (max= 2.2324), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:33,269 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.2081 (max= 2.2324), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:33,269 - root - INFO - Step 16940: lr=1.00E-05, 
loss= 1.2081 (max= 2.2324), tps=18162, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:33,269 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.2081 (max= 2.2324), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:33,269 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.2081 (max= 2.2324), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:33,269 - root - INFO - Step 16940: lr=1.00E-05, loss= 1.2081 (max= 2.2324), tps=18161, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:51,293 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.2294 (max= 2.1929), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:51,293 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.2294 (max= 2.1929), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:51,293 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.2294 (max= 2.1929), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:51,293 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.2294 (max= 2.1929), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:51,293 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.2294 (max= 2.1929), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:51,293 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.2294 (max= 2.1929), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:51,293 - root - INFO - Step 16950: lr=1.00E-05, loss= 1.2294 (max= 2.1929), tps=18183, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:58:51,293 - 
root - INFO - Step 16950: lr=1.00E-05, loss= 1.2294 (max= 2.1929), tps=18184, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:59:09,328 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.2205 (max= 2.2008), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:59:09,328 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.2205 (max= 2.2008), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:59:09,328 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.2205 (max= 2.2008), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:59:09,328 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.2205 (max= 2.2008), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:59:09,328 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.2205 (max= 2.2008), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:59:09,328 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.2205 (max= 2.2008), tps=18173, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:59:09,328 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.2205 (max= 2.2008), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:59:09,328 - root - INFO - Step 16960: lr=1.00E-05, loss= 1.2205 (max= 2.2008), tps=18172, mfu=37.86%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.02%) +2025-10-24 17:59:27,338 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.2250 (max= 2.2779), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:27,339 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.2250 (max= 2.2779), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s 
(max=0.00s, 0.03%) +2025-10-24 17:59:27,339 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.2250 (max= 2.2779), tps=18198, mfu=37.92%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:27,339 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.2250 (max= 2.2779), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:27,339 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.2250 (max= 2.2779), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:27,339 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.2250 (max= 2.2779), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:27,339 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.2250 (max= 2.2779), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:27,339 - root - INFO - Step 16970: lr=1.00E-05, loss= 1.2250 (max= 2.2779), tps=18197, mfu=37.91%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:45,365 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.2321 (max= 2.3167), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:45,365 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.2321 (max= 2.3167), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:45,365 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.2321 (max= 2.3167), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:45,365 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.2321 (max= 2.3167), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:45,365 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.2321 (max= 2.3167), tps=18181, mfu=37.88%, memory: 
78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:45,366 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.2321 (max= 2.3167), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:45,366 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.2321 (max= 2.3167), tps=18182, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 17:59:45,366 - root - INFO - Step 16980: lr=1.00E-05, loss= 1.2321 (max= 2.3167), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:03,415 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1941 (max= 2.2031), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:03,415 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1941 (max= 2.2031), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:03,415 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1941 (max= 2.2031), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:03,415 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1941 (max= 2.2031), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:03,415 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1941 (max= 2.2031), tps=18159, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:03,415 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1941 (max= 2.2031), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:03,415 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1941 (max= 2.2031), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:03,415 - root - INFO - Step 16990: lr=1.00E-05, loss= 1.1941 (max= 
2.2031), tps=18158, mfu=37.83%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-17000 +2025-10-24 18:00:21,443 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.2146 (max= 2.1267), tps=18181, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:21,443 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.2146 (max= 2.1267), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:21,443 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-24 18:00:21,443 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.2146 (max= 2.1267), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:21,443 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:00:21,443 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-24 18:00:21,443 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.2146 (max= 2.1267), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:21,443 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:00:21,443 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-24 18:00:21,443 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:00:21,443 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.2146 (max= 2.1267), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:21,443 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-24 18:00:21,443 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 
+2025-10-24 18:00:21,444 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-24 18:00:21,444 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:00:21,444 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.2146 (max= 2.1267), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:21,444 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.2146 (max= 2.1267), tps=18180, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:21,444 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-24 18:00:21,444 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-24 18:00:21,444 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:00:21,444 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:00:21,444 - root - INFO - Step 17000: lr=1.00E-05, loss= 1.2146 (max= 2.1267), tps=18179, mfu=37.88%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:00:21,444 - root - INFO - Saving a full checkpoint at step 17000 +2025-10-24 18:00:21,444 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-17000! 
Save time: 4.629223346710205 +2025-10-24 18:00:36,220 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-24 18:00:36,227 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-24 18:00:36,227 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-24 18:00:36,227 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-24 18:00:36,228 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-24 18:00:36,228 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-24 18:00:36,228 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-24 18:00:36,229 - root - INFO - Finished saving the checkpoint in 14.79 seconds +2025-10-24 18:00:54,213 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 2.0294), tps=10001, mfu=20.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:00:54,213 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 2.0294), tps=10001, mfu=20.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:00:54,213 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 2.0294), tps=10001, mfu=20.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:00:54,213 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 2.0294), tps=10001, mfu=20.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:00:54,213 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 2.0294), tps=10001, mfu=20.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:00:54,213 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 2.0294), tps=10000, mfu=20.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:00:54,213 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 2.0294), tps=10001, mfu=20.84%, memory: 78.54GiB(44.03%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:00:54,213 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 2.0294), tps=10001, mfu=20.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:01:12,211 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 2.1351), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:12,211 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 2.1351), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:12,212 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 2.1351), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:12,212 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 2.1351), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:12,212 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 2.1351), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:12,212 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 2.1351), tps=18210, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:12,212 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 2.1351), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:12,212 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 2.1351), tps=18209, mfu=37.94%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:30,234 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 2.1500), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:30,234 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 2.1500), tps=18185, 
mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:30,234 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 2.1500), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:30,234 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 2.1500), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:30,234 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 2.1500), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:30,234 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 2.1500), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:30,234 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 2.1500), tps=18185, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:30,234 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 2.1500), tps=18186, mfu=37.89%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:48,308 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 2.8802), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:48,308 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 2.8802), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:48,308 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 2.8802), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:48,308 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 2.8802), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:48,308 - root - INFO - Step 17040: lr=1.00E-05, 
loss= 1.2048 (max= 2.8802), tps=18134, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:48,308 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 2.8802), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:48,308 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 2.8802), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:01:48,309 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 2.8802), tps=18133, mfu=37.78%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:02:06,351 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2180 (max= 2.1162), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:02:06,352 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2180 (max= 2.1162), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:02:06,352 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2180 (max= 2.1162), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:02:06,352 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2180 (max= 2.1162), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:02:06,352 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2180 (max= 2.1162), tps=18164, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:02:06,352 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2180 (max= 2.1162), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:02:06,352 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2180 (max= 2.1162), tps=18164, mfu=37.84%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:02:06,352 - 
root - INFO - Step 17050: lr=1.00E-05, loss= 1.2180 (max= 2.1162), tps=18165, mfu=37.85%, memory: 78.54GiB(44.03%) time/data_loading=0.00s (max=0.00s, 0.03%) +2025-10-24 18:02:53,963 - root - INFO - Starting training. +2025-10-24 18:02:53,963 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 18:02:54,119 - root - INFO - Starting training. +2025-10-24 18:02:54,120 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 18:02:54,225 - root - INFO - Starting training. +2025-10-24 18:02:54,225 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 18:02:54,338 - root - INFO - Starting training. +2025-10-24 18:02:54,338 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 18:02:54,448 - root - INFO - Starting training. +2025-10-24 18:02:54,449 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 18:02:54,545 - root - INFO - Starting training. +2025-10-24 18:02:54,545 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 18:02:54,571 - root - INFO - Starting training. +2025-10-24 18:02:54,572 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 18:02:54,642 - root - INFO - Starting training. 
+2025-10-24 18:02:54,642 - root - INFO - Loading config from jobs/munin-7b-open-stage1/config.json +2025-10-24 18:02:54,871 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 18:02:54,872 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 18:02:54,872 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 18:02:55,309 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 18:02:55,360 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 18:02:55,360 - root - INFO - GPU capacity: NVIDIA B200 (0) with 178.36GiB memory +2025-10-24 18:02:55,364 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 18:02:55,402 - root - INFO - Applied FSDP to the model +2025-10-24 18:02:55,403 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() 
+ (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): 
TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, 
out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, 
out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, 
out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, 
out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): 
Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): 
Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): 
RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 18:02:55,795 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 18:02:55,796 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 18:02:55,797 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 18:02:55,955 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 18:02:55,957 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 18:02:55,958 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 18:02:56,126 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 18:02:56,128 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 18:02:56,128 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 18:02:56,322 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 18:02:56,334 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 18:02:56,373 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 
total parameters (6,739,464,192 without embeddings) +2025-10-24 18:02:56,373 - root - INFO - GPU capacity: NVIDIA B200 (4) with 178.36GiB memory +2025-10-24 18:02:56,377 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 18:02:56,383 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 18:02:56,384 - root - INFO - GPU capacity: NVIDIA B200 (6) with 178.36GiB memory +2025-10-24 18:02:56,387 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 18:02:56,412 - root - INFO - Applied FSDP to the model +2025-10-24 18:02:56,413 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + 
(wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 18:02:56,420 - root - INFO - Applied FSDP to the model +2025-10-24 18:02:56,421 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, 
bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, 
out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, 
out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( 
+ (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + 
(wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 18:02:56,483 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 18:02:56,484 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 18:02:56,485 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 18:02:56,495 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 18:02:56,518 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 18:02:56,520 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 18:02:56,520 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 18:02:56,524 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 18:02:56,526 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 18:02:56,527 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 18:02:56,545 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 18:02:56,545 - root - INFO - GPU capacity: NVIDIA B200 (7) with 178.36GiB memory +2025-10-24 18:02:56,548 - root - INFO - Compiling each 
TransformerBlock with torch.compile +2025-10-24 18:02:56,551 - root - WARNING - ENV[TORCH_NCCL_ASYNC_ERROR_HANDLING] = 1 will be overridden to 3 based on job config +2025-10-24 18:02:56,552 - root - INFO - Building 1-D device mesh with ['dp'], [8] +2025-10-24 18:02:56,553 - root - INFO - world mesh: DeviceMesh('cuda', [0, 1, 2, 3, 4, 5, 6, 7], mesh_dim_names=('dp',)) +2025-10-24 18:02:56,582 - root - INFO - Applied FSDP to the model +2025-10-24 18:02:56,582 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, 
out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, 
bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + 
(wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + 
(feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, 
out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, 
out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( 
+ (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 18:02:56,848 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 18:02:56,887 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', 
enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 18:02:56,887 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 18:02:56,898 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 18:02:56,899 - root - INFO - GPU capacity: NVIDIA B200 (1) with 178.36GiB memory +2025-10-24 18:02:56,902 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 18:02:56,908 - root - INFO - Building llama3 Comma7B with ModelArgs(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=64256, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=100000.0, init_std=0.02, tied_embeddings=False, max_batch_size=32, max_seq_len=4096, norm_type='compiled_rmsnorm', enable_mup=False, mup_input_alpha=1.0, mup_output_alpha=1.0, mup_width_mul=1.0) +2025-10-24 18:02:56,935 - root - INFO - Applied FSDP to the model +2025-10-24 18:02:56,936 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + 
(attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + 
(wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): 
RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 18:02:56,937 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 18:02:56,937 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 18:02:56,938 - root - INFO - GPU capacity: NVIDIA B200 (3) with 178.36GiB memory +2025-10-24 18:02:56,938 - root - INFO - GPU capacity: NVIDIA B200 (2) with 178.36GiB memory +2025-10-24 18:02:56,941 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 18:02:56,941 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 18:02:56,958 - root - INFO - Model llama3 Comma7B size: 7,002,656,768 total parameters (6,739,464,192 without embeddings) +2025-10-24 18:02:56,959 - root - INFO - GPU capacity: NVIDIA B200 (5) with 178.36GiB memory +2025-10-24 18:02:56,962 - root - INFO - Compiling each TransformerBlock with torch.compile +2025-10-24 18:02:56,974 - root - INFO - Applied FSDP to the model +2025-10-24 18:02:56,975 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + 
(wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): 
Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): 
FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( 
+ (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 18:02:56,975 - root - INFO - Applied FSDP to the model 
+2025-10-24 18:02:56,976 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, 
bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + 
(6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): 
Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + 
(wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): 
Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + 
(w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): 
Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + 
(attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 18:02:56,995 - root - INFO - Applied FSDP to the model +2025-10-24 18:02:56,996 - root - INFO - Model after parallelization model=FSDPTransformer( + (tok_embeddings): Embedding(64256, 4096) + (layers): ModuleDict( + (0): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (1): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, 
bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (2): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (3): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (4): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, 
out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (5): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (6): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (7): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, 
out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (8): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (9): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (10): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, 
out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (11): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (12): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (13): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): 
RMSNorm() + ) + ) + (14): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (15): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (16): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (17): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( 
+ (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (18): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (19): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (20): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, 
bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (21): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (22): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (23): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + 
(wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (24): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (25): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (26): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): 
Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (27): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (28): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (29): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): 
FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (30): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + (31): FSDPOptimizedModule( + (_orig_mod): TransformerBlock( + (attention): Attention( + (wq): Linear(in_features=4096, out_features=4096, bias=False) + (wk): Linear(in_features=4096, out_features=4096, bias=False) + (wv): Linear(in_features=4096, out_features=4096, bias=False) + (wo): Linear(in_features=4096, out_features=4096, bias=False) + ) + (feed_forward): FeedForward( + (w1): Linear(in_features=4096, out_features=11008, bias=False) + (w2): Linear(in_features=11008, out_features=4096, bias=False) + (w3): Linear(in_features=4096, out_features=11008, bias=False) + ) + (attention_norm): RMSNorm() + (ffn_norm): RMSNorm() + ) + ) + ) + (norm): RMSNorm() + (output): Linear(in_features=4096, out_features=64256, bias=False) +) + +2025-10-24 18:03:21,745 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 18:03:21,745 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 18:03:21,745 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 18:03:21,745 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 
18:03:21,745 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 18:03:21,745 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 18:03:21,746 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +2025-10-24 18:03:21,746 - root - INFO - GPU memory usage for model: 3.28GiB(1.84%) +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
+ warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +/home/ucloud/miniconda3/envs/maester/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. + warnings.warn( # warn only once +2025-10-24 18:03:22,371 - root - INFO - Loaded cached document counts in 0.0003216266632080078 seconds +2025-10-24 18:03:22,371 - root - INFO - Loaded cached document counts in 0.00017786026000976562 seconds +2025-10-24 18:03:22,371 - root - INFO - Loaded cached document counts in 0.00011658668518066406 seconds +2025-10-24 18:03:22,372 - root - INFO - Loaded cached document counts in 7.486343383789062e-05 seconds +2025-10-24 18:03:22,372 - root - INFO - Loaded cached document counts in 6.127357482910156e-05 seconds +2025-10-24 18:03:22,372 - root - INFO - Loaded cached document counts in 5.316734313964844e-05 seconds +2025-10-24 18:03:22,372 - root - INFO - Loaded cached document counts in 8.177757263183594e-05 seconds +2025-10-24 18:03:22,372 - root - INFO - Loaded cached document counts in 0.0001800060272216797 seconds +2025-10-24 18:03:22,373 - root - INFO - Worker 0 responsible for docs: [('/work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet', 0, 945398)] +2025-10-24 18:03:22,373 - root - INFO - Total docs: 945399 +2025-10-24 18:03:22,373 - root - INFO - Worker 0 assembled subdataset iterator for /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/, 1 of 1 +Dataset checkpoint detected at jobs/munin-7b-open-stage1/checkpoints/dataloader/step-17000 +Dataset checkpoint loaded! 
Load time: 15.952952861785889 +2025-10-24 18:03:38,329 - root - INFO - Nodecay weight: tok_embeddings.weight +2025-10-24 18:03:38,329 - root - INFO - Decay weight: layers.0._orig_mod.attention.wq.weight +2025-10-24 18:03:38,329 - root - INFO - Decay weight: layers.0._orig_mod.attention.wk.weight +2025-10-24 18:03:38,329 - root - INFO - Decay weight: layers.0._orig_mod.attention.wv.weight +2025-10-24 18:03:38,329 - root - INFO - Decay weight: layers.0._orig_mod.attention.wo.weight +2025-10-24 18:03:38,329 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,329 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,329 - root - INFO - Decay weight: layers.0._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,329 - root - INFO - Nodecay weight: layers.0._orig_mod.attention_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.0._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.1._orig_mod.attention.wq.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.1._orig_mod.attention.wk.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.1._orig_mod.attention.wv.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.1._orig_mod.attention.wo.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.1._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.1._orig_mod.attention_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.1._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.2._orig_mod.attention.wq.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: 
layers.2._orig_mod.attention.wk.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.2._orig_mod.attention.wv.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.2._orig_mod.attention.wo.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.2._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.2._orig_mod.attention_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.2._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.3._orig_mod.attention.wq.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.3._orig_mod.attention.wk.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.3._orig_mod.attention.wv.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.3._orig_mod.attention.wo.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.3._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.3._orig_mod.attention_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.3._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.4._orig_mod.attention.wq.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.4._orig_mod.attention.wk.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.4._orig_mod.attention.wv.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.4._orig_mod.attention.wo.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: 
layers.4._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.4._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.4._orig_mod.attention_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.4._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.5._orig_mod.attention.wq.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.5._orig_mod.attention.wk.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.5._orig_mod.attention.wv.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.5._orig_mod.attention.wo.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.5._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.5._orig_mod.attention_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.5._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.6._orig_mod.attention.wq.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.6._orig_mod.attention.wk.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.6._orig_mod.attention.wv.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.6._orig_mod.attention.wo.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.6._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay 
weight: layers.6._orig_mod.attention_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Nodecay weight: layers.6._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.7._orig_mod.attention.wq.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.7._orig_mod.attention.wk.weight +2025-10-24 18:03:38,330 - root - INFO - Decay weight: layers.7._orig_mod.attention.wv.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.7._orig_mod.attention.wo.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.7._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.7._orig_mod.attention_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.7._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.8._orig_mod.attention.wq.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.8._orig_mod.attention.wk.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.8._orig_mod.attention.wv.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.8._orig_mod.attention.wo.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.8._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.8._orig_mod.attention_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.8._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.9._orig_mod.attention.wq.weight +2025-10-24 18:03:38,331 - root - INFO - Decay 
weight: layers.9._orig_mod.attention.wk.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.9._orig_mod.attention.wv.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.9._orig_mod.attention.wo.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.9._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.9._orig_mod.attention_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.9._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.10._orig_mod.attention.wq.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.10._orig_mod.attention.wk.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.10._orig_mod.attention.wv.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.10._orig_mod.attention.wo.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.10._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.10._orig_mod.attention_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.10._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.11._orig_mod.attention.wq.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.11._orig_mod.attention.wk.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.11._orig_mod.attention.wv.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.11._orig_mod.attention.wo.weight +2025-10-24 18:03:38,331 - root - 
INFO - Decay weight: layers.11._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.11._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.11._orig_mod.attention_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.11._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.12._orig_mod.attention.wq.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.12._orig_mod.attention.wk.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.12._orig_mod.attention.wv.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.12._orig_mod.attention.wo.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.12._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.12._orig_mod.attention_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.12._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.13._orig_mod.attention.wq.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.13._orig_mod.attention.wk.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.13._orig_mod.attention.wv.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.13._orig_mod.attention.wo.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.13._orig_mod.feed_forward.w3.weight 
+2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.13._orig_mod.attention_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Nodecay weight: layers.13._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.14._orig_mod.attention.wq.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.14._orig_mod.attention.wk.weight +2025-10-24 18:03:38,331 - root - INFO - Decay weight: layers.14._orig_mod.attention.wv.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.14._orig_mod.attention.wo.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.14._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.14._orig_mod.attention_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.14._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.15._orig_mod.attention.wq.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.15._orig_mod.attention.wk.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.15._orig_mod.attention.wv.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.15._orig_mod.attention.wo.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.15._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.15._orig_mod.attention_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.15._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: 
layers.16._orig_mod.attention.wq.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.16._orig_mod.attention.wk.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.16._orig_mod.attention.wv.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.16._orig_mod.attention.wo.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.16._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.16._orig_mod.attention_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.16._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.17._orig_mod.attention.wq.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.17._orig_mod.attention.wk.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.17._orig_mod.attention.wv.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.17._orig_mod.attention.wo.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.17._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.17._orig_mod.attention_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.17._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.18._orig_mod.attention.wq.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.18._orig_mod.attention.wk.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.18._orig_mod.attention.wv.weight +2025-10-24 18:03:38,332 - root - 
INFO - Decay weight: layers.18._orig_mod.attention.wo.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.18._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.18._orig_mod.attention_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.18._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.19._orig_mod.attention.wq.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.19._orig_mod.attention.wk.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.19._orig_mod.attention.wv.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.19._orig_mod.attention.wo.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.19._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.19._orig_mod.attention_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.19._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.20._orig_mod.attention.wq.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.20._orig_mod.attention.wk.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.20._orig_mod.attention.wv.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.20._orig_mod.attention.wo.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w2.weight 
+2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.20._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.20._orig_mod.attention_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Nodecay weight: layers.20._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.21._orig_mod.attention.wq.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.21._orig_mod.attention.wk.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.21._orig_mod.attention.wv.weight +2025-10-24 18:03:38,332 - root - INFO - Decay weight: layers.21._orig_mod.attention.wo.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.21._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,333 - root - INFO - Nodecay weight: layers.21._orig_mod.attention_norm.weight +2025-10-24 18:03:38,333 - root - INFO - Nodecay weight: layers.21._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.22._orig_mod.attention.wq.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.22._orig_mod.attention.wk.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.22._orig_mod.attention.wv.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.22._orig_mod.attention.wo.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.22._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,333 - root - INFO - Nodecay weight: layers.22._orig_mod.attention_norm.weight +2025-10-24 18:03:38,333 - root - INFO - Nodecay weight: 
layers.22._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.23._orig_mod.attention.wq.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.23._orig_mod.attention.wk.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.23._orig_mod.attention.wv.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.23._orig_mod.attention.wo.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.23._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,333 - root - INFO - Nodecay weight: layers.23._orig_mod.attention_norm.weight +2025-10-24 18:03:38,333 - root - INFO - Nodecay weight: layers.23._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.24._orig_mod.attention.wq.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.24._orig_mod.attention.wk.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.24._orig_mod.attention.wv.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.24._orig_mod.attention.wo.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.24._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,333 - root - INFO - Nodecay weight: layers.24._orig_mod.attention_norm.weight +2025-10-24 18:03:38,333 - root - INFO - Nodecay weight: layers.24._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,333 - root - INFO - Decay weight: layers.25._orig_mod.attention.wq.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.25._orig_mod.attention.wk.weight +2025-10-24 18:03:38,334 - root - INFO - 
Decay weight: layers.25._orig_mod.attention.wv.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.25._orig_mod.attention.wo.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.25._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.25._orig_mod.attention_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.25._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.26._orig_mod.attention.wq.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.26._orig_mod.attention.wk.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.26._orig_mod.attention.wv.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.26._orig_mod.attention.wo.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.26._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.26._orig_mod.attention_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.26._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.27._orig_mod.attention.wq.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.27._orig_mod.attention.wk.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.27._orig_mod.attention.wv.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.27._orig_mod.attention.wo.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w1.weight +2025-10-24 
18:03:38,334 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.27._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.27._orig_mod.attention_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.27._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.28._orig_mod.attention.wq.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.28._orig_mod.attention.wk.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.28._orig_mod.attention.wv.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.28._orig_mod.attention.wo.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.28._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.28._orig_mod.attention_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.28._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.29._orig_mod.attention.wq.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.29._orig_mod.attention.wk.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.29._orig_mod.attention.wv.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.29._orig_mod.attention.wo.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.29._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: 
layers.29._orig_mod.attention_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.29._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.30._orig_mod.attention.wq.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.30._orig_mod.attention.wk.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.30._orig_mod.attention.wv.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.30._orig_mod.attention.wo.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.30._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.30._orig_mod.attention_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.30._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.31._orig_mod.attention.wq.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.31._orig_mod.attention.wk.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.31._orig_mod.attention.wv.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.31._orig_mod.attention.wo.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w1.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w2.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: layers.31._orig_mod.feed_forward.w3.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.31._orig_mod.attention_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: layers.31._orig_mod.ffn_norm.weight +2025-10-24 18:03:38,334 - root - INFO - Nodecay weight: norm.weight +2025-10-24 18:03:38,334 - root - INFO - Decay weight: 
output.weight +2025-10-24 18:03:38,961 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 18:03:38,982 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 18:03:38,991 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 18:03:39,023 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 18:03:39,024 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 18:03:39,176 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 18:03:39,469 - root - INFO - Checkpointing active. Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 18:03:40,100 - root - INFO - Checkpointing active. 
Checkpoints will be loaded from and saved to jobs/munin-7b-open-stage1/checkpoints +2025-10-24 18:03:40,145 - root - INFO - Loading the checkpoint at step 17000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:03:40,145 - root - INFO - Loading the checkpoint at step 17000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:03:40,145 - root - INFO - Loading the checkpoint at step 17000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:03:40,145 - root - INFO - Loading the checkpoint at step 17000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:03:40,145 - root - INFO - Loading the checkpoint at step 17000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:03:40,145 - root - INFO - Loading the checkpoint at step 17000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:03:40,145 - root - INFO - Loading the checkpoint at step 17000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:03:40,145 - root - INFO - Loading the checkpoint at step 17000, containing keys dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:03:47,902 - root - INFO - Finished loading the checkpoint in 7.76 seconds +2025-10-24 18:03:47,903 - root - INFO - Finished loading the checkpoint in 7.76 seconds +2025-10-24 18:03:47,903 - root - INFO - Finished loading the checkpoint in 7.76 seconds +2025-10-24 18:03:47,904 - root - INFO - Finished loading the checkpoint in 7.76 seconds +2025-10-24 18:03:47,904 - root - INFO - Finished loading the checkpoint in 7.76 seconds +2025-10-24 18:03:47,904 - root - INFO - Finished loading the checkpoint in 7.76 seconds +2025-10-24 18:03:47,905 - root - INFO - Finished loading the checkpoint in 7.76 seconds +2025-10-24 
18:03:47,905 - root - INFO - Finished loading the checkpoint in 7.76 seconds +2025-10-24 18:03:47,943 - root - INFO - Training starts at step 17000 +2025-10-24 18:03:47,943 - root - INFO - Training starts at step 17000 +2025-10-24 18:03:47,943 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 18:03:47,943 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 18:03:47,944 - root - INFO - Training starts at step 17000 +2025-10-24 18:03:47,944 - root - INFO - Training starts at step 17000 +2025-10-24 18:03:47,944 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 18:03:47,944 - root - INFO - Training starts at step 17000 +2025-10-24 18:03:47,944 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 18:03:47,944 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 18:03:47,947 - root - INFO - Training starts at step 17000 +2025-10-24 18:03:47,947 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 18:03:47,951 - root - INFO - Training starts at step 17000 +2025-10-24 18:03:47,951 - root - INFO - Profiling active. Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 18:03:47,953 - root - INFO - Training starts at step 17000 +2025-10-24 18:03:47,953 - root - INFO - Worker 0 opening new file /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet +2025-10-24 18:03:47,953 - root - INFO - Profiling active. 
Traces will be saved at jobs/munin-7b-open-stage1/traces +2025-10-24 18:04:22,380 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 1.5360), tps=9518, mfu=19.83%, memory: 152.90GiB(85.73%) time/data_loading=0.13s (max=0.50s, 14.51%) +2025-10-24 18:04:22,380 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 1.5360), tps=9517, mfu=19.83%, memory: 152.90GiB(85.73%) time/data_loading=0.13s (max=0.50s, 14.51%) +2025-10-24 18:04:22,380 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 1.5360), tps=9517, mfu=19.83%, memory: 152.90GiB(85.73%) time/data_loading=0.13s (max=0.50s, 14.51%) +2025-10-24 18:04:22,380 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 1.5360), tps=9517, mfu=19.83%, memory: 152.90GiB(85.73%) time/data_loading=0.13s (max=0.50s, 14.51%) +2025-10-24 18:04:22,380 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 1.5360), tps=9519, mfu=19.83%, memory: 152.90GiB(85.73%) time/data_loading=0.13s (max=0.50s, 14.51%) +2025-10-24 18:04:22,380 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 1.5360), tps=9517, mfu=19.83%, memory: 152.90GiB(85.73%) time/data_loading=0.13s (max=0.50s, 14.51%) +2025-10-24 18:04:22,380 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 1.5360), tps=9517, mfu=19.83%, memory: 152.90GiB(85.73%) time/data_loading=0.13s (max=0.50s, 14.51%) +2025-10-24 18:04:22,380 - root - INFO - Step 17010: lr=1.00E-05, loss= 1.2102 (max= 1.5360), tps=9520, mfu=19.83%, memory: 152.90GiB(85.73%) time/data_loading=0.13s (max=0.50s, 14.51%) +2025-10-24 18:04:22,631 - root - INFO - Dumping traces at step 10 +2025-10-24 18:04:22,631 - root - INFO - Dumping traces at step 10 +2025-10-24 18:04:22,637 - root - INFO - Dumping traces at step 10 +2025-10-24 18:04:22,637 - root - INFO - Dumping traces at step 10 +2025-10-24 18:04:22,639 - root - INFO - Dumping traces at step 10 +2025-10-24 18:04:22,642 - root - INFO - Dumping traces at step 10 +2025-10-24 18:04:22,644 - root - INFO - Dumping 
traces at step 10 +2025-10-24 18:04:22,646 - root - INFO - Dumping traces at step 10 +2025-10-24 18:04:22,692 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 18:04:22,693 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 18:04:22,705 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:22,713 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:22,713 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:22,713 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:22,714 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:22,714 - root - INFO - Finished dumping traces in 0.08 seconds +2025-10-24 18:04:38,686 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 1.6591), tps=20100, mfu=41.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:38,686 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 1.6591), tps=20099, mfu=41.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:38,686 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 1.6591), tps=20100, mfu=41.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:38,686 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 1.6591), tps=20099, mfu=41.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:38,686 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 1.6591), tps=20099, mfu=41.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:38,686 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 1.6591), tps=20099, mfu=41.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:38,686 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 1.6591), tps=20099, mfu=41.88%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:38,686 - root - INFO - Step 17020: lr=1.00E-05, loss= 1.1946 (max= 1.6591), tps=20099, mfu=41.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:38,738 - root - INFO - Dumping traces at step 20 +2025-10-24 18:04:38,738 - root - INFO - Dumping traces at step 20 +2025-10-24 18:04:38,738 - root - INFO - Dumping traces at step 20 +2025-10-24 18:04:38,738 - root - INFO - Dumping traces at step 20 +2025-10-24 18:04:38,738 - root - INFO - Dumping traces at step 20 +2025-10-24 18:04:38,739 - root - INFO - Dumping traces at step 20 +2025-10-24 18:04:38,739 - root - INFO - Dumping traces at step 20 +2025-10-24 18:04:38,741 - root - INFO - Dumping traces at step 20 +2025-10-24 18:04:38,789 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 18:04:38,798 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 18:04:38,801 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 18:04:38,801 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 18:04:38,801 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 18:04:38,802 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 18:04:38,802 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 18:04:38,802 - root - INFO - Finished dumping traces in 0.06 seconds +2025-10-24 18:04:54,766 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 1.5523), tps=20382, mfu=42.47%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:54,766 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 1.5523), tps=20382, mfu=42.47%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:54,766 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 1.5523), tps=20383, mfu=42.47%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:54,766 - 
root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 1.5523), tps=20383, mfu=42.47%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:54,766 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 1.5523), tps=20383, mfu=42.47%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:54,766 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 1.5523), tps=20383, mfu=42.47%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:54,766 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 1.5523), tps=20382, mfu=42.47%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:54,766 - root - INFO - Step 17030: lr=1.00E-05, loss= 1.2150 (max= 1.5523), tps=20382, mfu=42.47%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:04:54,818 - root - INFO - Dumping traces at step 30 +2025-10-24 18:04:54,818 - root - INFO - Dumping traces at step 30 +2025-10-24 18:04:54,819 - root - INFO - Dumping traces at step 30 +2025-10-24 18:04:54,819 - root - INFO - Dumping traces at step 30 +2025-10-24 18:04:54,821 - root - INFO - Dumping traces at step 30 +2025-10-24 18:04:54,822 - root - INFO - Dumping traces at step 30 +2025-10-24 18:04:54,823 - root - INFO - Dumping traces at step 30 +2025-10-24 18:04:54,824 - root - INFO - Dumping traces at step 30 +2025-10-24 18:04:54,873 - root - INFO - Finished dumping traces in 0.05 seconds +2025-10-24 18:04:54,890 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:54,890 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:54,891 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:54,891 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:54,891 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:04:54,892 - root - INFO - Finished dumping traces in 0.07 
seconds +2025-10-24 18:04:54,892 - root - INFO - Finished dumping traces in 0.07 seconds +2025-10-24 18:05:10,805 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 1.7669), tps=20435, mfu=42.58%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:10,805 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 1.7669), tps=20434, mfu=42.58%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:10,805 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 1.7669), tps=20434, mfu=42.57%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:10,805 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 1.7669), tps=20434, mfu=42.58%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:10,805 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 1.7669), tps=20435, mfu=42.58%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:10,806 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 1.7669), tps=20434, mfu=42.58%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:10,806 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 1.7669), tps=20435, mfu=42.58%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:10,806 - root - INFO - Step 17040: lr=1.00E-05, loss= 1.2048 (max= 1.7669), tps=20434, mfu=42.57%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:26,721 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2179 (max= 1.4847), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:26,721 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2179 (max= 1.4847), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:26,721 - root - INFO - Step 
17050: lr=1.00E-05, loss= 1.2179 (max= 1.4847), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:26,721 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2179 (max= 1.4847), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:26,721 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2179 (max= 1.4847), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:26,721 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2179 (max= 1.4847), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:26,721 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2179 (max= 1.4847), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:26,721 - root - INFO - Step 17050: lr=1.00E-05, loss= 1.2179 (max= 1.4847), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:42,619 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.2224 (max= 1.6292), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:42,620 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.2224 (max= 1.6292), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:42,620 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.2224 (max= 1.6292), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:42,620 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.2224 (max= 1.6292), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:42,620 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.2224 (max= 1.6292), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 18:05:42,620 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.2224 (max= 1.6292), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:42,620 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.2224 (max= 1.6292), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:42,620 - root - INFO - Step 17060: lr=1.00E-05, loss= 1.2224 (max= 1.6292), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:58,595 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.2356 (max= 1.6571), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:58,595 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.2356 (max= 1.6571), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:58,595 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.2356 (max= 1.6571), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:58,595 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.2356 (max= 1.6571), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:58,595 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.2356 (max= 1.6571), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:58,595 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.2356 (max= 1.6571), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:58,595 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.2356 (max= 1.6571), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:05:58,595 - root - INFO - Step 17070: lr=1.00E-05, loss= 1.2356 (max= 1.6571), tps=20516, mfu=42.75%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:14,479 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.2049 (max= 1.9196), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:14,479 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.2049 (max= 1.9196), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:14,479 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.2049 (max= 1.9196), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:14,480 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.2049 (max= 1.9196), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:14,480 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.2049 (max= 1.9196), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:14,480 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.2049 (max= 1.9196), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:14,480 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.2049 (max= 1.9196), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:14,480 - root - INFO - Step 17080: lr=1.00E-05, loss= 1.2049 (max= 1.9196), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:30,442 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.2077 (max= 1.6638), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:06:30,442 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.2077 (max= 1.6638), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:06:30,442 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.2077 
(max= 1.6638), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:06:30,442 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.2077 (max= 1.6638), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:06:30,442 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.2077 (max= 1.6638), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:06:30,442 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.2077 (max= 1.6638), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:06:30,442 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.2077 (max= 1.6638), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:06:30,442 - root - INFO - Step 17090: lr=1.00E-05, loss= 1.2077 (max= 1.6638), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:06:46,380 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.2189 (max= 1.7543), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:46,380 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.2189 (max= 1.7543), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:46,380 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.2189 (max= 1.7543), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:46,380 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.2189 (max= 1.7543), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:46,380 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.2189 (max= 1.7543), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:46,380 - root 
- INFO - Step 17100: lr=1.00E-05, loss= 1.2189 (max= 1.7543), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:46,380 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.2189 (max= 1.7543), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:06:46,380 - root - INFO - Step 17100: lr=1.00E-05, loss= 1.2189 (max= 1.7543), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:07:02,310 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.2262 (max= 1.6277), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:02,310 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.2262 (max= 1.6277), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:02,310 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.2262 (max= 1.6277), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:02,310 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.2262 (max= 1.6277), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:02,310 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.2262 (max= 1.6277), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:02,310 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.2262 (max= 1.6277), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:02,310 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.2262 (max= 1.6277), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:02,310 - root - INFO - Step 17110: lr=1.00E-05, loss= 1.2262 (max= 1.6277), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 18:07:18,241 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.2359 (max= 1.6150), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:18,241 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.2359 (max= 1.6150), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:18,241 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.2359 (max= 1.6150), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:18,241 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.2359 (max= 1.6150), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:18,241 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.2359 (max= 1.6150), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:18,241 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.2359 (max= 1.6150), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:18,241 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.2359 (max= 1.6150), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:18,241 - root - INFO - Step 17120: lr=1.00E-05, loss= 1.2359 (max= 1.6150), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:34,230 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1832 (max= 1.5473), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:07:34,230 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1832 (max= 1.5473), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:07:34,230 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1832 (max= 1.5473), tps=20498, mfu=42.71%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:07:34,230 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1832 (max= 1.5473), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:07:34,230 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1832 (max= 1.5473), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:07:34,230 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1832 (max= 1.5473), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:07:34,230 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1832 (max= 1.5473), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:07:34,230 - root - INFO - Step 17130: lr=1.00E-05, loss= 1.1832 (max= 1.5473), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:07:50,163 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.2210 (max= 1.7487), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:50,163 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.2210 (max= 1.7487), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:50,163 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.2210 (max= 1.7487), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:50,163 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.2210 (max= 1.7487), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:50,163 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.2210 (max= 1.7487), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:50,163 - root - INFO - Step 17140: lr=1.00E-05, 
loss= 1.2210 (max= 1.7487), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:50,163 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.2210 (max= 1.7487), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:07:50,163 - root - INFO - Step 17140: lr=1.00E-05, loss= 1.2210 (max= 1.7487), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:06,122 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.2204 (max= 1.6784), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:06,122 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.2204 (max= 1.6784), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:06,122 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.2204 (max= 1.6784), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:06,122 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.2204 (max= 1.6784), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:06,122 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.2204 (max= 1.6784), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:06,122 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.2204 (max= 1.6784), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:06,122 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.2204 (max= 1.6784), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:06,122 - root - INFO - Step 17150: lr=1.00E-05, loss= 1.2204 (max= 1.6784), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
18:08:22,017 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.1852 (max= 1.6169), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:22,017 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.1852 (max= 1.6169), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:22,017 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.1852 (max= 1.6169), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:22,017 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.1852 (max= 1.6169), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:22,017 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.1852 (max= 1.6169), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:22,017 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.1852 (max= 1.6169), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:22,017 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.1852 (max= 1.6169), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:22,017 - root - INFO - Step 17160: lr=1.00E-05, loss= 1.1852 (max= 1.6169), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:37,912 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.2319 (max= 1.7485), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:08:37,913 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.2319 (max= 1.7485), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:08:37,913 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.2319 (max= 1.7485), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:08:37,913 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.2319 (max= 1.7485), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:08:37,913 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.2319 (max= 1.7485), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:08:37,913 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.2319 (max= 1.7485), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:08:37,913 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.2319 (max= 1.7485), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:08:37,913 - root - INFO - Step 17170: lr=1.00E-05, loss= 1.2319 (max= 1.7485), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:08:53,825 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.1793 (max= 1.6127), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:53,825 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.1793 (max= 1.6127), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:53,825 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.1793 (max= 1.6127), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:53,825 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.1793 (max= 1.6127), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:53,825 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.1793 (max= 1.6127), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:53,825 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.1793 (max= 1.6127), 
tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:53,825 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.1793 (max= 1.6127), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:08:53,825 - root - INFO - Step 17180: lr=1.00E-05, loss= 1.1793 (max= 1.6127), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:09,771 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.2040 (max= 1.6238), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:09,772 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.2040 (max= 1.6238), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:09,772 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.2040 (max= 1.6238), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:09,772 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.2040 (max= 1.6238), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:09,772 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.2040 (max= 1.6238), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:09,772 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.2040 (max= 1.6238), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:09,772 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.2040 (max= 1.6238), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:09,772 - root - INFO - Step 17190: lr=1.00E-05, loss= 1.2040 (max= 1.6238), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:25,728 - root - INFO - Step 
17200: lr=1.00E-05, loss= 1.2173 (max= 1.6018), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:25,729 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.2173 (max= 1.6018), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:25,729 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.2173 (max= 1.6018), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:25,729 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.2173 (max= 1.6018), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:25,729 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.2173 (max= 1.6018), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:25,729 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.2173 (max= 1.6018), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:25,729 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.2173 (max= 1.6018), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:25,729 - root - INFO - Step 17200: lr=1.00E-05, loss= 1.2173 (max= 1.6018), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:41,650 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.2189 (max= 1.5943), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:09:41,650 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.2189 (max= 1.5943), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:09:41,650 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.2189 (max= 1.5943), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 18:09:41,650 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.2189 (max= 1.5943), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:09:41,650 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.2189 (max= 1.5943), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:09:41,650 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.2189 (max= 1.5943), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:09:41,650 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.2189 (max= 1.5943), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:09:41,650 - root - INFO - Step 17210: lr=1.00E-05, loss= 1.2189 (max= 1.5943), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:09:57,570 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.2397 (max= 1.7218), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:57,570 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.2397 (max= 1.7218), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:57,570 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.2397 (max= 1.7218), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:57,570 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.2397 (max= 1.7218), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:57,570 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.2397 (max= 1.7218), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:57,570 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.2397 (max= 1.7218), tps=20588, mfu=42.90%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:57,570 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.2397 (max= 1.7218), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:09:57,570 - root - INFO - Step 17220: lr=1.00E-05, loss= 1.2397 (max= 1.7218), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:10:13,565 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.2083 (max= 1.5309), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:10:13,565 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.2083 (max= 1.5309), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:10:13,565 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.2083 (max= 1.5309), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:10:13,565 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.2083 (max= 1.5309), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:10:13,565 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.2083 (max= 1.5309), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:10:13,565 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.2083 (max= 1.5309), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:10:13,565 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.2083 (max= 1.5309), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:10:13,565 - root - INFO - Step 17230: lr=1.00E-05, loss= 1.2083 (max= 1.5309), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:10:29,546 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.2284 
(max= 1.6450), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:29,546 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.2284 (max= 1.6450), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:29,546 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.2284 (max= 1.6450), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:29,546 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.2284 (max= 1.6450), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:29,546 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.2284 (max= 1.6450), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:29,546 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.2284 (max= 1.6450), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:29,546 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.2284 (max= 1.6450), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:29,546 - root - INFO - Step 17240: lr=1.00E-05, loss= 1.2284 (max= 1.6450), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:45,465 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.2141 (max= 1.6343), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:45,465 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.2141 (max= 1.6343), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:45,465 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.2141 (max= 1.6343), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:45,465 - root 
- INFO - Step 17250: lr=1.00E-05, loss= 1.2141 (max= 1.6343), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:45,465 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.2141 (max= 1.6343), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:45,465 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.2141 (max= 1.6343), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:45,465 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.2141 (max= 1.6343), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:10:45,465 - root - INFO - Step 17250: lr=1.00E-05, loss= 1.2141 (max= 1.6343), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:11:01,373 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.2010 (max= 1.6534), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:01,373 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.2010 (max= 1.6534), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:01,373 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.2010 (max= 1.6534), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:01,373 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.2010 (max= 1.6534), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:01,373 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.2010 (max= 1.6534), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:01,373 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.2010 (max= 1.6534), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 18:11:01,373 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.2010 (max= 1.6534), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:01,373 - root - INFO - Step 17260: lr=1.00E-05, loss= 1.2010 (max= 1.6534), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:17,285 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1994 (max= 1.6048), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:17,286 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1994 (max= 1.6048), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:17,286 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1994 (max= 1.6048), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:17,286 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1994 (max= 1.6048), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:17,286 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1994 (max= 1.6048), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:17,286 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1994 (max= 1.6048), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:17,286 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1994 (max= 1.6048), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:17,286 - root - INFO - Step 17270: lr=1.00E-05, loss= 1.1994 (max= 1.6048), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:33,221 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.1862 (max= 1.5594), tps=20567, mfu=42.85%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:11:33,221 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.1862 (max= 1.5594), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:11:33,221 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.1862 (max= 1.5594), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:11:33,221 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.1862 (max= 1.5594), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:11:33,221 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.1862 (max= 1.5594), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:11:33,221 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.1862 (max= 1.5594), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:11:33,221 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.1862 (max= 1.5594), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:11:33,221 - root - INFO - Step 17280: lr=1.00E-05, loss= 1.1862 (max= 1.5594), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:11:49,150 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.2297 (max= 1.6351), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:49,151 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.2297 (max= 1.6351), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:49,151 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.2297 (max= 1.6351), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:49,151 - root - INFO - Step 17290: lr=1.00E-05, 
loss= 1.2297 (max= 1.6351), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:49,151 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.2297 (max= 1.6351), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:49,151 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.2297 (max= 1.6351), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:49,151 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.2297 (max= 1.6351), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:11:49,151 - root - INFO - Step 17290: lr=1.00E-05, loss= 1.2297 (max= 1.6351), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:05,093 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.1989 (max= 1.5823), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:05,093 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.1989 (max= 1.5823), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:05,093 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.1989 (max= 1.5823), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:05,093 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.1989 (max= 1.5823), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:05,093 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.1989 (max= 1.5823), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:05,093 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.1989 (max= 1.5823), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
18:12:05,093 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.1989 (max= 1.5823), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:05,093 - root - INFO - Step 17300: lr=1.00E-05, loss= 1.1989 (max= 1.5823), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:21,030 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.2277 (max= 1.6908), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:21,030 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.2277 (max= 1.6908), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:21,030 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.2277 (max= 1.6908), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:21,030 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.2277 (max= 1.6908), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:21,031 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.2277 (max= 1.6908), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:21,031 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.2277 (max= 1.6908), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:21,031 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.2277 (max= 1.6908), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:21,031 - root - INFO - Step 17310: lr=1.00E-05, loss= 1.2277 (max= 1.6908), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:36,982 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.1815 (max= 1.7665), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:36,982 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.1815 (max= 1.7665), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:36,982 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.1815 (max= 1.7665), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:36,982 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.1815 (max= 1.7665), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:36,982 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.1815 (max= 1.7665), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:36,982 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.1815 (max= 1.7665), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:36,982 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.1815 (max= 1.7665), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:36,982 - root - INFO - Step 17320: lr=1.00E-05, loss= 1.1815 (max= 1.7665), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:12:52,912 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1596 (max= 1.5463), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:52,912 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1596 (max= 1.5463), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:52,912 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1596 (max= 1.5463), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:52,912 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1596 (max= 1.5463), 
tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:52,912 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1596 (max= 1.5463), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:52,912 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1596 (max= 1.5463), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:52,912 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1596 (max= 1.5463), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:12:52,912 - root - INFO - Step 17330: lr=1.00E-05, loss= 1.1596 (max= 1.5463), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:08,866 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.2192 (max= 1.7011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:08,866 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.2192 (max= 1.7011), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:08,866 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.2192 (max= 1.7011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:08,866 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.2192 (max= 1.7011), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:08,866 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.2192 (max= 1.7011), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:08,866 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.2192 (max= 1.7011), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:08,866 - root - INFO - Step 
17340: lr=1.00E-05, loss= 1.2192 (max= 1.7011), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:08,866 - root - INFO - Step 17340: lr=1.00E-05, loss= 1.2192 (max= 1.7011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:24,787 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.2487 (max= 1.6076), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:24,787 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.2487 (max= 1.6076), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:24,787 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.2487 (max= 1.6076), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:24,787 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.2487 (max= 1.6076), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:24,787 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.2487 (max= 1.6076), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:24,787 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.2487 (max= 1.6076), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:24,787 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.2487 (max= 1.6076), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:24,787 - root - INFO - Step 17350: lr=1.00E-05, loss= 1.2487 (max= 1.6076), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:40,726 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.2261 (max= 1.6807), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 18:13:40,726 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.2261 (max= 1.6807), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:40,726 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.2261 (max= 1.6807), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:40,726 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.2261 (max= 1.6807), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:40,726 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.2261 (max= 1.6807), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:40,726 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.2261 (max= 1.6807), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:40,726 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.2261 (max= 1.6807), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:40,726 - root - INFO - Step 17360: lr=1.00E-05, loss= 1.2261 (max= 1.6807), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:13:56,671 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.2034 (max= 1.6311), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:56,671 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.2034 (max= 1.6311), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:56,671 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.2034 (max= 1.6311), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:56,671 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.2034 (max= 1.6311), tps=20555, mfu=42.83%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:56,671 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.2034 (max= 1.6311), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:56,671 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.2034 (max= 1.6311), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:56,671 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.2034 (max= 1.6311), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:13:56,672 - root - INFO - Step 17370: lr=1.00E-05, loss= 1.2034 (max= 1.6311), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:14:12,571 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.2113 (max= 1.6280), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:12,571 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.2113 (max= 1.6280), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:12,571 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.2113 (max= 1.6280), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:12,571 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.2113 (max= 1.6280), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:12,571 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.2113 (max= 1.6280), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:12,571 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.2113 (max= 1.6280), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:12,571 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.2113 
(max= 1.6280), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:12,571 - root - INFO - Step 17380: lr=1.00E-05, loss= 1.2113 (max= 1.6280), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:28,533 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.1903 (max= 1.6699), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:28,533 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.1903 (max= 1.6699), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:28,533 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.1903 (max= 1.6699), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:28,533 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.1903 (max= 1.6699), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:28,534 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.1903 (max= 1.6699), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:28,534 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.1903 (max= 1.6699), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:28,534 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.1903 (max= 1.6699), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:28,534 - root - INFO - Step 17390: lr=1.00E-05, loss= 1.1903 (max= 1.6699), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:44,430 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.2184 (max= 1.5151), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:44,431 - root 
- INFO - Step 17400: lr=1.00E-05, loss= 1.2184 (max= 1.5151), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:44,431 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.2184 (max= 1.5151), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:44,431 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.2184 (max= 1.5151), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:44,431 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.2184 (max= 1.5151), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:44,431 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.2184 (max= 1.5151), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:44,431 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.2184 (max= 1.5151), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:14:44,431 - root - INFO - Step 17400: lr=1.00E-05, loss= 1.2184 (max= 1.5151), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:00,410 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.2367 (max= 1.7106), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:00,410 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.2367 (max= 1.7106), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:00,411 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.2367 (max= 1.7106), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:00,411 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.2367 (max= 1.7106), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 18:15:00,411 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.2367 (max= 1.7106), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:00,411 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.2367 (max= 1.7106), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:00,411 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.2367 (max= 1.7106), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:00,411 - root - INFO - Step 17410: lr=1.00E-05, loss= 1.2367 (max= 1.7106), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:16,366 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.1893 (max= 1.5160), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:16,367 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.1893 (max= 1.5160), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:16,367 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.1893 (max= 1.5160), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:16,367 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.1893 (max= 1.5160), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:16,367 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.1893 (max= 1.5160), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:16,367 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.1893 (max= 1.5160), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:16,367 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.1893 (max= 1.5160), tps=20541, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:16,367 - root - INFO - Step 17420: lr=1.00E-05, loss= 1.1893 (max= 1.5160), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:15:32,320 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.2083 (max= 1.5939), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:32,320 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.2083 (max= 1.5939), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:32,320 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.2083 (max= 1.5939), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:32,320 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.2083 (max= 1.5939), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:32,320 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.2083 (max= 1.5939), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:32,320 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.2083 (max= 1.5939), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:32,320 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.2083 (max= 1.5939), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:32,320 - root - INFO - Step 17430: lr=1.00E-05, loss= 1.2083 (max= 1.5939), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:48,274 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.2100 (max= 1.6287), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:48,274 - root - INFO - Step 17440: lr=1.00E-05, 
loss= 1.2100 (max= 1.6287), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:48,274 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.2100 (max= 1.6287), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:48,274 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.2100 (max= 1.6287), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:48,274 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.2100 (max= 1.6287), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:48,274 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.2100 (max= 1.6287), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:48,274 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.2100 (max= 1.6287), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:15:48,274 - root - INFO - Step 17440: lr=1.00E-05, loss= 1.2100 (max= 1.6287), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:04,205 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.1827 (max= 1.5455), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:04,205 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.1827 (max= 1.5455), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:04,205 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.1827 (max= 1.5455), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:04,205 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.1827 (max= 1.5455), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
18:16:04,205 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.1827 (max= 1.5455), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:04,205 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.1827 (max= 1.5455), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:04,205 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.1827 (max= 1.5455), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:04,205 - root - INFO - Step 17450: lr=1.00E-05, loss= 1.1827 (max= 1.5455), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:20,138 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.1754 (max= 1.5833), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:20,138 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.1754 (max= 1.5833), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:20,138 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.1754 (max= 1.5833), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:20,138 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.1754 (max= 1.5833), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:20,138 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.1754 (max= 1.5833), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:20,138 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.1754 (max= 1.5833), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:20,138 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.1754 (max= 1.5833), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:20,138 - root - INFO - Step 17460: lr=1.00E-05, loss= 1.1754 (max= 1.5833), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:36,077 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.2099 (max= 1.5656), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:36,077 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.2099 (max= 1.5656), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:36,077 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.2099 (max= 1.5656), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:36,077 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.2099 (max= 1.5656), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:36,077 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.2099 (max= 1.5656), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:36,077 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.2099 (max= 1.5656), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:36,077 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.2099 (max= 1.5656), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:36,077 - root - INFO - Step 17470: lr=1.00E-05, loss= 1.2099 (max= 1.5656), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:52,051 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.1937 (max= 1.6658), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:52,051 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.1937 (max= 1.6658), 
tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:52,051 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.1937 (max= 1.6658), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:52,051 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.1937 (max= 1.6658), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:52,051 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.1937 (max= 1.6658), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:52,051 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.1937 (max= 1.6658), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:52,051 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.1937 (max= 1.6658), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:16:52,051 - root - INFO - Step 17480: lr=1.00E-05, loss= 1.1937 (max= 1.6658), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:07,986 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1910 (max= 1.5644), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:07,986 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1910 (max= 1.5644), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:07,987 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1910 (max= 1.5644), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:07,987 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1910 (max= 1.5644), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:07,987 - root - INFO - Step 
17490: lr=1.00E-05, loss= 1.1910 (max= 1.5644), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:07,987 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1910 (max= 1.5644), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:07,987 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1910 (max= 1.5644), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:07,987 - root - INFO - Step 17490: lr=1.00E-05, loss= 1.1910 (max= 1.5644), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:23,960 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.2324 (max= 1.6481), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:23,960 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.2324 (max= 1.6481), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:23,961 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.2324 (max= 1.6481), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:23,961 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.2324 (max= 1.6481), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:23,961 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.2324 (max= 1.6481), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:23,961 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.2324 (max= 1.6481), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:23,961 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.2324 (max= 1.6481), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 18:17:23,961 - root - INFO - Step 17500: lr=1.00E-05, loss= 1.2324 (max= 1.6481), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:39,920 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.1929 (max= 1.6505), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:39,920 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.1929 (max= 1.6505), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:39,921 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.1929 (max= 1.6505), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:39,921 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.1929 (max= 1.6505), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:39,921 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.1929 (max= 1.6505), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:39,921 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.1929 (max= 1.6505), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:39,921 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.1929 (max= 1.6505), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:39,921 - root - INFO - Step 17510: lr=1.00E-05, loss= 1.1929 (max= 1.6505), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:55,848 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.2344 (max= 1.6876), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:55,848 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.2344 (max= 1.6876), tps=20577, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:55,848 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.2344 (max= 1.6876), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:55,848 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.2344 (max= 1.6876), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:55,848 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.2344 (max= 1.6876), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:55,848 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.2344 (max= 1.6876), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:55,848 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.2344 (max= 1.6876), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:17:55,848 - root - INFO - Step 17520: lr=1.00E-05, loss= 1.2344 (max= 1.6876), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:11,755 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.2098 (max= 1.6331), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:11,755 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.2098 (max= 1.6331), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:11,755 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.2098 (max= 1.6331), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:11,755 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.2098 (max= 1.6331), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:11,755 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.2098 
(max= 1.6331), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:11,755 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.2098 (max= 1.6331), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:11,755 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.2098 (max= 1.6331), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:11,755 - root - INFO - Step 17530: lr=1.00E-05, loss= 1.2098 (max= 1.6331), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:27,725 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.2212 (max= 1.6373), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:27,725 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.2212 (max= 1.6373), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:27,725 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.2212 (max= 1.6373), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:27,725 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.2212 (max= 1.6373), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:27,725 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.2212 (max= 1.6373), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:27,725 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.2212 (max= 1.6373), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:27,725 - root - INFO - Step 17540: lr=1.00E-05, loss= 1.2212 (max= 1.6373), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:27,725 - root 
- INFO - Step 17540: lr=1.00E-05, loss= 1.2212 (max= 1.6373), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:43,678 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.2036 (max= 1.6766), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:43,678 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.2036 (max= 1.6766), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:43,678 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.2036 (max= 1.6766), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:43,678 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.2036 (max= 1.6766), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:43,679 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.2036 (max= 1.6766), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:43,679 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.2036 (max= 1.6766), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:43,679 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.2036 (max= 1.6766), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:43,679 - root - INFO - Step 17550: lr=1.00E-05, loss= 1.2036 (max= 1.6766), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:18:59,664 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1830 (max= 1.6134), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:18:59,664 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1830 (max= 1.6134), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 18:18:59,664 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1830 (max= 1.6134), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:18:59,664 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1830 (max= 1.6134), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:18:59,664 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1830 (max= 1.6134), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:18:59,664 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1830 (max= 1.6134), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:18:59,664 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1830 (max= 1.6134), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:18:59,664 - root - INFO - Step 17560: lr=1.00E-05, loss= 1.1830 (max= 1.6134), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:15,632 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.2073 (max= 1.5583), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:15,633 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.2073 (max= 1.5583), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:15,633 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.2073 (max= 1.5583), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:15,633 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.2073 (max= 1.5583), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:15,633 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.2073 (max= 1.5583), tps=20525, mfu=42.76%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:15,633 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.2073 (max= 1.5583), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:15,633 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.2073 (max= 1.5583), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:15,633 - root - INFO - Step 17570: lr=1.00E-05, loss= 1.2073 (max= 1.5583), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:31,530 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.2096 (max= 1.5880), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:31,530 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.2096 (max= 1.5880), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:31,530 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.2096 (max= 1.5880), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:31,531 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.2096 (max= 1.5880), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:31,531 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.2096 (max= 1.5880), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:31,531 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.2096 (max= 1.5880), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:31,531 - root - INFO - Step 17580: lr=1.00E-05, loss= 1.2096 (max= 1.5880), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:31,531 - root - INFO - Step 17580: lr=1.00E-05, 
loss= 1.2096 (max= 1.5880), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:47,490 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.2200 (max= 1.5829), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:47,490 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.2200 (max= 1.5829), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:47,490 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.2200 (max= 1.5829), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:47,490 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.2200 (max= 1.5829), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:47,490 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.2200 (max= 1.5829), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:47,490 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.2200 (max= 1.5829), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:47,490 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.2200 (max= 1.5829), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:19:47,491 - root - INFO - Step 17590: lr=1.00E-05, loss= 1.2200 (max= 1.5829), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:03,420 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.1806 (max= 1.4981), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:03,420 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.1806 (max= 1.4981), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
18:20:03,420 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.1806 (max= 1.4981), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:03,420 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.1806 (max= 1.4981), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:03,420 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.1806 (max= 1.4981), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:03,420 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.1806 (max= 1.4981), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:03,420 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.1806 (max= 1.4981), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:03,420 - root - INFO - Step 17600: lr=1.00E-05, loss= 1.1806 (max= 1.4981), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:19,310 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.2251 (max= 1.5729), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:19,310 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.2251 (max= 1.5729), tps=20626, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:19,310 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.2251 (max= 1.5729), tps=20626, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:19,310 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.2251 (max= 1.5729), tps=20626, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:19,310 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.2251 (max= 1.5729), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:19,310 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.2251 (max= 1.5729), tps=20626, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:19,310 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.2251 (max= 1.5729), tps=20626, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:19,310 - root - INFO - Step 17610: lr=1.00E-05, loss= 1.2251 (max= 1.5729), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:35,262 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.2028 (max= 1.7013), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:35,262 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.2028 (max= 1.7013), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:35,262 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.2028 (max= 1.7013), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:35,262 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.2028 (max= 1.7013), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:35,262 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.2028 (max= 1.7013), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:35,262 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.2028 (max= 1.7013), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:35,262 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.2028 (max= 1.7013), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:35,262 - root - INFO - Step 17620: lr=1.00E-05, loss= 1.2028 (max= 1.7013), 
tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:20:51,203 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.2167 (max= 1.6207), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:20:51,203 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.2167 (max= 1.6207), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:20:51,203 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.2167 (max= 1.6207), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:20:51,203 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.2167 (max= 1.6207), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:20:51,203 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.2167 (max= 1.6207), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:20:51,203 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.2167 (max= 1.6207), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:20:51,203 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.2167 (max= 1.6207), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:20:51,203 - root - INFO - Step 17630: lr=1.00E-05, loss= 1.2167 (max= 1.6207), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:07,116 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.2352 (max= 1.7113), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:07,116 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.2352 (max= 1.7113), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:07,116 - root - INFO - Step 
17640: lr=1.00E-05, loss= 1.2352 (max= 1.7113), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:07,116 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.2352 (max= 1.7113), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:07,116 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.2352 (max= 1.7113), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:07,116 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.2352 (max= 1.7113), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:07,116 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.2352 (max= 1.7113), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:07,116 - root - INFO - Step 17640: lr=1.00E-05, loss= 1.2352 (max= 1.7113), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:23,056 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.2078 (max= 1.6516), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:23,056 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.2078 (max= 1.6516), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:23,056 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.2078 (max= 1.6516), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:23,056 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.2078 (max= 1.6516), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:23,056 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.2078 (max= 1.6516), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 18:21:23,056 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.2078 (max= 1.6516), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:23,056 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.2078 (max= 1.6516), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:23,056 - root - INFO - Step 17650: lr=1.00E-05, loss= 1.2078 (max= 1.6516), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:39,035 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.2138 (max= 1.5923), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:39,035 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.2138 (max= 1.5923), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:39,035 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.2138 (max= 1.5923), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:39,035 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.2138 (max= 1.5923), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:39,035 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.2138 (max= 1.5923), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:39,035 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.2138 (max= 1.5923), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:39,035 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.2138 (max= 1.5923), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:39,035 - root - INFO - Step 17660: lr=1.00E-05, loss= 1.2138 (max= 1.5923), tps=20511, mfu=42.73%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:21:54,993 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.1812 (max= 1.5381), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:54,993 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.1812 (max= 1.5381), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:54,993 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.1812 (max= 1.5381), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:54,993 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.1812 (max= 1.5381), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:54,993 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.1812 (max= 1.5381), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:54,993 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.1812 (max= 1.5381), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:54,993 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.1812 (max= 1.5381), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:21:54,993 - root - INFO - Step 17670: lr=1.00E-05, loss= 1.1812 (max= 1.5381), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:10,956 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.2151 (max= 1.5819), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:10,956 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.2151 (max= 1.5819), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:10,956 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.2151 
(max= 1.5819), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:10,956 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.2151 (max= 1.5819), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:10,956 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.2151 (max= 1.5819), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:10,956 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.2151 (max= 1.5819), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:10,956 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.2151 (max= 1.5819), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:10,956 - root - INFO - Step 17680: lr=1.00E-05, loss= 1.2151 (max= 1.5819), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:26,914 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1993 (max= 1.6185), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:26,914 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1993 (max= 1.6185), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:26,914 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1993 (max= 1.6185), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:26,914 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1993 (max= 1.6185), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:26,914 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1993 (max= 1.6185), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:26,914 - root 
- INFO - Step 17690: lr=1.00E-05, loss= 1.1993 (max= 1.6185), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:26,914 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1993 (max= 1.6185), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:26,914 - root - INFO - Step 17690: lr=1.00E-05, loss= 1.1993 (max= 1.6185), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:22:42,828 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.1786 (max= 1.5386), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:42,828 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.1786 (max= 1.5386), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:42,828 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.1786 (max= 1.5386), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:42,828 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.1786 (max= 1.5386), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:42,828 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.1786 (max= 1.5386), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:42,828 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.1786 (max= 1.5386), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:42,828 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.1786 (max= 1.5386), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:42,828 - root - INFO - Step 17700: lr=1.00E-05, loss= 1.1786 (max= 1.5386), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 18:22:58,790 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.2166 (max= 1.6402), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:58,790 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.2166 (max= 1.6402), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:58,790 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.2166 (max= 1.6402), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:58,790 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.2166 (max= 1.6402), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:58,790 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.2166 (max= 1.6402), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:58,790 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.2166 (max= 1.6402), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:58,790 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.2166 (max= 1.6402), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:22:58,790 - root - INFO - Step 17710: lr=1.00E-05, loss= 1.2166 (max= 1.6402), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:23:14,729 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.2117 (max= 1.6662), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:14,729 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.2117 (max= 1.6662), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:14,729 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.2117 (max= 1.6662), tps=20563, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:14,729 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.2117 (max= 1.6662), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:14,729 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.2117 (max= 1.6662), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:14,729 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.2117 (max= 1.6662), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:14,729 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.2117 (max= 1.6662), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:14,729 - root - INFO - Step 17720: lr=1.00E-05, loss= 1.2117 (max= 1.6662), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:30,676 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.2019 (max= 1.5776), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:30,677 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.2019 (max= 1.5776), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:30,677 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.2019 (max= 1.5776), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:30,677 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.2019 (max= 1.5776), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:30,677 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.2019 (max= 1.5776), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:30,677 - root - INFO - Step 17730: lr=1.00E-05, 
loss= 1.2019 (max= 1.5776), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:30,677 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.2019 (max= 1.5776), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:30,677 - root - INFO - Step 17730: lr=1.00E-05, loss= 1.2019 (max= 1.5776), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:46,581 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.2232 (max= 1.5955), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:46,582 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.2232 (max= 1.5955), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:46,582 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.2232 (max= 1.5955), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:46,582 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.2232 (max= 1.5955), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:46,582 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.2232 (max= 1.5955), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:46,582 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.2232 (max= 1.5955), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:46,582 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.2232 (max= 1.5955), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:23:46,582 - root - INFO - Step 17740: lr=1.00E-05, loss= 1.2232 (max= 1.5955), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
18:24:02,516 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.2252 (max= 1.9290), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:02,516 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.2252 (max= 1.9290), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:02,516 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.2252 (max= 1.9290), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:02,516 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.2252 (max= 1.9290), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:02,516 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.2252 (max= 1.9290), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:02,516 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.2252 (max= 1.9290), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:02,516 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.2252 (max= 1.9290), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:02,516 - root - INFO - Step 17750: lr=1.00E-05, loss= 1.2252 (max= 1.9290), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:18,471 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.2189 (max= 1.6175), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:18,471 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.2189 (max= 1.6175), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:18,471 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.2189 (max= 1.6175), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:18,471 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.2189 (max= 1.6175), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:18,471 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.2189 (max= 1.6175), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:18,471 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.2189 (max= 1.6175), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:18,472 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.2189 (max= 1.6175), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:18,472 - root - INFO - Step 17760: lr=1.00E-05, loss= 1.2189 (max= 1.6175), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:34,424 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.2110 (max= 1.5119), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:34,425 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.2110 (max= 1.5119), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:34,425 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.2110 (max= 1.5119), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:34,425 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.2110 (max= 1.5119), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:34,425 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.2110 (max= 1.5119), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:34,425 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.2110 (max= 1.5119), 
tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:34,425 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.2110 (max= 1.5119), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:34,425 - root - INFO - Step 17770: lr=1.00E-05, loss= 1.2110 (max= 1.5119), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:50,410 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.1913 (max= 1.6299), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:50,411 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.1913 (max= 1.6299), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:50,411 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.1913 (max= 1.6299), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:50,411 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.1913 (max= 1.6299), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:50,411 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.1913 (max= 1.6299), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:50,411 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.1913 (max= 1.6299), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:50,411 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.1913 (max= 1.6299), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:24:50,411 - root - INFO - Step 17780: lr=1.00E-05, loss= 1.1913 (max= 1.6299), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:06,350 - root - INFO - Step 
17790: lr=1.00E-05, loss= 1.2292 (max= 1.8355), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:06,350 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.2292 (max= 1.8355), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:06,350 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.2292 (max= 1.8355), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:06,350 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.2292 (max= 1.8355), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:06,350 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.2292 (max= 1.8355), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:06,350 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.2292 (max= 1.8355), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:06,350 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.2292 (max= 1.8355), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:06,350 - root - INFO - Step 17790: lr=1.00E-05, loss= 1.2292 (max= 1.8355), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:22,294 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.2041 (max= 1.6201), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:22,295 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.2041 (max= 1.6201), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:22,295 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.2041 (max= 1.6201), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 18:25:22,295 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.2041 (max= 1.6201), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:22,295 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.2041 (max= 1.6201), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:22,295 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.2041 (max= 1.6201), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:22,295 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.2041 (max= 1.6201), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:22,295 - root - INFO - Step 17800: lr=1.00E-05, loss= 1.2041 (max= 1.6201), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:38,230 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.2061 (max= 1.5845), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:38,230 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.2061 (max= 1.5845), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:38,230 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.2061 (max= 1.5845), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:38,230 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.2061 (max= 1.5845), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:38,230 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.2061 (max= 1.5845), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:38,230 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.2061 (max= 1.5845), tps=20567, mfu=42.85%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:38,230 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.2061 (max= 1.5845), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:38,230 - root - INFO - Step 17810: lr=1.00E-05, loss= 1.2061 (max= 1.5845), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:54,183 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.2100 (max= 1.6230), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:54,183 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.2100 (max= 1.6230), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:54,183 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.2100 (max= 1.6230), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:54,183 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.2100 (max= 1.6230), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:54,183 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.2100 (max= 1.6230), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:54,183 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.2100 (max= 1.6230), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:54,183 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.2100 (max= 1.6230), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:25:54,183 - root - INFO - Step 17820: lr=1.00E-05, loss= 1.2100 (max= 1.6230), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:10,164 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.2259 
(max= 1.6011), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:10,165 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.2259 (max= 1.6011), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:10,165 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.2259 (max= 1.6011), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:10,165 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.2259 (max= 1.6011), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:10,165 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.2259 (max= 1.6011), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:10,165 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.2259 (max= 1.6011), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:10,165 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.2259 (max= 1.6011), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:10,165 - root - INFO - Step 17830: lr=1.00E-05, loss= 1.2259 (max= 1.6011), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:26,130 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.2242 (max= 1.5539), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:26,130 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.2242 (max= 1.5539), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:26,130 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.2242 (max= 1.5539), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:26,130 - root 
- INFO - Step 17840: lr=1.00E-05, loss= 1.2242 (max= 1.5539), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:26,130 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.2242 (max= 1.5539), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:26,130 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.2242 (max= 1.5539), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:26,130 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.2242 (max= 1.5539), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:26,130 - root - INFO - Step 17840: lr=1.00E-05, loss= 1.2242 (max= 1.5539), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:42,095 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.2032 (max= 1.6155), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:42,096 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.2032 (max= 1.6155), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:42,096 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.2032 (max= 1.6155), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:42,096 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.2032 (max= 1.6155), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:42,096 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.2032 (max= 1.6155), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:42,096 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.2032 (max= 1.6155), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 18:26:42,096 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.2032 (max= 1.6155), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:42,096 - root - INFO - Step 17850: lr=1.00E-05, loss= 1.2032 (max= 1.6155), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:58,058 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.2308 (max= 1.6117), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:58,058 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.2308 (max= 1.6117), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:58,058 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.2308 (max= 1.6117), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:58,058 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.2308 (max= 1.6117), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:58,058 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.2308 (max= 1.6117), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:58,058 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.2308 (max= 1.6117), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:58,058 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.2308 (max= 1.6117), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:26:58,058 - root - INFO - Step 17860: lr=1.00E-05, loss= 1.2308 (max= 1.6117), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:14,012 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1674 (max= 1.4934), tps=20543, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:14,012 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1674 (max= 1.4934), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:14,012 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1674 (max= 1.4934), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:14,012 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1674 (max= 1.4934), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:14,013 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1674 (max= 1.4934), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:14,013 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1674 (max= 1.4934), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:14,013 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1674 (max= 1.4934), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:14,013 - root - INFO - Step 17870: lr=1.00E-05, loss= 1.1674 (max= 1.4934), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:29,963 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.2071 (max= 1.6069), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:29,963 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.2071 (max= 1.6069), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:29,963 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.2071 (max= 1.6069), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:29,963 - root - INFO - Step 17880: lr=1.00E-05, 
loss= 1.2071 (max= 1.6069), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:29,963 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.2071 (max= 1.6069), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:29,963 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.2071 (max= 1.6069), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:29,963 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.2071 (max= 1.6069), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:29,963 - root - INFO - Step 17880: lr=1.00E-05, loss= 1.2071 (max= 1.6069), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:27:45,914 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.2466 (max= 1.6774), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:27:45,915 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.2466 (max= 1.6774), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:27:45,914 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.2466 (max= 1.6774), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:27:45,915 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.2466 (max= 1.6774), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:27:45,915 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.2466 (max= 1.6774), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:27:45,915 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.2466 (max= 1.6774), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
18:27:45,915 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.2466 (max= 1.6774), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:27:45,915 - root - INFO - Step 17890: lr=1.00E-05, loss= 1.2466 (max= 1.6774), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:28:01,909 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.2360 (max= 1.7486), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:01,909 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.2360 (max= 1.7486), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:01,909 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.2360 (max= 1.7486), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:01,909 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.2360 (max= 1.7486), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:01,909 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.2360 (max= 1.7486), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:01,909 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.2360 (max= 1.7486), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:01,909 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.2360 (max= 1.7486), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:01,909 - root - INFO - Step 17900: lr=1.00E-05, loss= 1.2360 (max= 1.7486), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:17,841 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.2255 (max= 1.5672), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:28:17,842 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.2255 (max= 1.5672), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:28:17,842 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.2255 (max= 1.5672), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:28:17,842 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.2255 (max= 1.5672), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:28:17,842 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.2255 (max= 1.5672), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:28:17,842 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.2255 (max= 1.5672), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:28:17,842 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.2255 (max= 1.5672), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:28:17,842 - root - INFO - Step 17910: lr=1.00E-05, loss= 1.2255 (max= 1.5672), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:28:33,781 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.2253 (max= 1.5932), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:33,781 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.2253 (max= 1.5932), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:33,781 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.2253 (max= 1.5932), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:33,781 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.2253 (max= 1.5932), 
tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:33,781 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.2253 (max= 1.5932), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:33,781 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.2253 (max= 1.5932), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:33,781 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.2253 (max= 1.5932), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:33,781 - root - INFO - Step 17920: lr=1.00E-05, loss= 1.2253 (max= 1.5932), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:49,700 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.1938 (max= 1.5770), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:49,700 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.1938 (max= 1.5770), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:49,700 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.1938 (max= 1.5770), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:49,700 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.1938 (max= 1.5770), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:49,700 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.1938 (max= 1.5770), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:49,700 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.1938 (max= 1.5770), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:49,700 - root - INFO - Step 
17930: lr=1.00E-05, loss= 1.1938 (max= 1.5770), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:28:49,700 - root - INFO - Step 17930: lr=1.00E-05, loss= 1.1938 (max= 1.5770), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:05,679 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.2450 (max= 1.5601), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:29:05,679 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.2450 (max= 1.5601), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:29:05,679 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.2450 (max= 1.5601), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:29:05,680 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.2450 (max= 1.5601), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:29:05,680 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.2450 (max= 1.5601), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:29:05,680 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.2450 (max= 1.5601), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:29:05,680 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.2450 (max= 1.5601), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:29:05,680 - root - INFO - Step 17940: lr=1.00E-05, loss= 1.2450 (max= 1.5601), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:29:21,625 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.2197 (max= 1.6492), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 18:29:21,625 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.2197 (max= 1.6492), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:21,625 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.2197 (max= 1.6492), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:21,625 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.2197 (max= 1.6492), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:21,625 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.2197 (max= 1.6492), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:21,625 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.2197 (max= 1.6492), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:21,625 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.2197 (max= 1.6492), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:21,625 - root - INFO - Step 17950: lr=1.00E-05, loss= 1.2197 (max= 1.6492), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:37,603 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.2338 (max= 1.5804), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:37,603 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.2338 (max= 1.5804), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:37,603 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.2338 (max= 1.5804), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:37,603 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.2338 (max= 1.5804), tps=20513, mfu=42.74%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:37,603 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.2338 (max= 1.5804), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:37,603 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.2338 (max= 1.5804), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:37,603 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.2338 (max= 1.5804), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:37,603 - root - INFO - Step 17960: lr=1.00E-05, loss= 1.2338 (max= 1.5804), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:53,532 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.2000 (max= 1.6311), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:53,532 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.2000 (max= 1.6311), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:53,532 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.2000 (max= 1.6311), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:53,532 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.2000 (max= 1.6311), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:53,532 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.2000 (max= 1.6311), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:53,532 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.2000 (max= 1.6311), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:53,532 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.2000 
(max= 1.6311), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:29:53,532 - root - INFO - Step 17970: lr=1.00E-05, loss= 1.2000 (max= 1.6311), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:09,462 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.2470 (max= 1.6538), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:09,462 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.2470 (max= 1.6538), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:09,462 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.2470 (max= 1.6538), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:09,462 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.2470 (max= 1.6538), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:09,462 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.2470 (max= 1.6538), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:09,462 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.2470 (max= 1.6538), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:09,462 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.2470 (max= 1.6538), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:09,463 - root - INFO - Step 17980: lr=1.00E-05, loss= 1.2470 (max= 1.6538), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:25,433 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.2169 (max= 1.7849), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:25,433 - root 
- INFO - Step 17990: lr=1.00E-05, loss= 1.2169 (max= 1.7849), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:25,433 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.2169 (max= 1.7849), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:25,433 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.2169 (max= 1.7849), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:25,433 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.2169 (max= 1.7849), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:25,433 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.2169 (max= 1.7849), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:25,433 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.2169 (max= 1.7849), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:25,433 - root - INFO - Step 17990: lr=1.00E-05, loss= 1.2169 (max= 1.7849), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-18000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-18000! 
Save time: 4.502336502075195 +2025-10-24 18:30:41,316 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.1958 (max= 1.6408), tps=20636, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:41,316 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.1958 (max= 1.6408), tps=20636, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:41,316 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-24 18:30:41,316 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:30:41,316 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-24 18:30:41,316 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:30:41,316 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.1958 (max= 1.6408), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:41,316 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.1958 (max= 1.6408), tps=20636, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:41,316 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-24 18:30:41,316 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-24 18:30:41,316 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.1958 (max= 1.6408), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:41,316 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:30:41,316 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:30:41,316 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.1958 (max= 1.6408), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:41,316 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.1958 (max= 1.6408), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:41,316 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-24 18:30:41,316 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:30:41,316 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-24 18:30:41,316 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-24 18:30:41,316 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:30:41,316 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:30:41,317 - root - INFO - Step 18000: lr=1.00E-05, loss= 1.1958 (max= 1.6408), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:30:41,317 - root - INFO - Saving a full checkpoint at step 18000 +2025-10-24 18:30:41,317 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:31:08,431 - root - INFO - Finished saving the checkpoint in 27.11 seconds +2025-10-24 18:31:08,440 - root - INFO - Finished saving the checkpoint in 27.12 seconds +2025-10-24 18:31:08,440 - root - INFO - Finished saving the checkpoint in 27.12 seconds +2025-10-24 18:31:08,440 - root - INFO - Finished saving the checkpoint in 27.12 seconds +2025-10-24 18:31:08,441 - root - INFO - Finished saving the checkpoint in 27.12 seconds +2025-10-24 18:31:08,441 - root - INFO - Finished saving the checkpoint in 27.12 seconds +2025-10-24 18:31:08,441 - root - INFO - Finished saving the checkpoint in 27.13 seconds +2025-10-24 18:31:08,441 - root - INFO - Finished saving the checkpoint 
in 27.12 seconds +2025-10-24 18:31:24,336 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.2285 (max= 1.7301), tps=7617, mfu=15.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:24,336 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.2285 (max= 1.7301), tps=7618, mfu=15.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:24,336 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.2285 (max= 1.7301), tps=7618, mfu=15.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:24,336 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.2285 (max= 1.7301), tps=7617, mfu=15.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:24,336 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.2285 (max= 1.7301), tps=7618, mfu=15.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:24,336 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.2285 (max= 1.7301), tps=7617, mfu=15.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:24,337 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.2285 (max= 1.7301), tps=7618, mfu=15.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:24,337 - root - INFO - Step 18010: lr=1.00E-05, loss= 1.2285 (max= 1.7301), tps=7618, mfu=15.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:40,246 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.2308 (max= 1.7392), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:31:40,246 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.2308 (max= 1.7392), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:31:40,246 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.2308 (max= 1.7392), tps=20601, mfu=42.92%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:31:40,246 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.2308 (max= 1.7392), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:31:40,246 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.2308 (max= 1.7392), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:31:40,246 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.2308 (max= 1.7392), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:31:40,246 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.2308 (max= 1.7392), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:31:40,246 - root - INFO - Step 18020: lr=1.00E-05, loss= 1.2308 (max= 1.7392), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:31:56,164 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.2285 (max= 1.8698), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:56,164 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.2285 (max= 1.8698), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:56,164 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.2285 (max= 1.8698), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:56,164 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.2285 (max= 1.8698), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:56,164 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.2285 (max= 1.8698), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:56,164 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.2285 
(max= 1.8698), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:56,164 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.2285 (max= 1.8698), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:31:56,165 - root - INFO - Step 18030: lr=1.00E-05, loss= 1.2285 (max= 1.8698), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:12,098 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.2376 (max= 1.5565), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:12,098 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.2376 (max= 1.5565), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:12,098 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.2376 (max= 1.5565), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:12,098 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.2376 (max= 1.5565), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:12,098 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.2376 (max= 1.5565), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:12,099 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.2376 (max= 1.5565), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:12,099 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.2376 (max= 1.5565), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:12,099 - root - INFO - Step 18040: lr=1.00E-05, loss= 1.2376 (max= 1.5565), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:28,042 - root 
- INFO - Step 18050: lr=1.00E-05, loss= 1.2162 (max= 1.5067), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:28,042 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.2162 (max= 1.5067), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:28,042 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.2162 (max= 1.5067), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:28,042 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.2162 (max= 1.5067), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:28,042 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.2162 (max= 1.5067), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:28,042 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.2162 (max= 1.5067), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:28,042 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.2162 (max= 1.5067), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:28,042 - root - INFO - Step 18050: lr=1.00E-05, loss= 1.2162 (max= 1.5067), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:43,947 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.2267 (max= 1.8489), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:32:43,948 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.2267 (max= 1.8489), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:32:43,948 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.2267 (max= 1.8489), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 18:32:43,948 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.2267 (max= 1.8489), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:32:43,948 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.2267 (max= 1.8489), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:32:43,948 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.2267 (max= 1.8489), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:32:43,948 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.2267 (max= 1.8489), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:32:43,948 - root - INFO - Step 18060: lr=1.00E-05, loss= 1.2267 (max= 1.8489), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:32:59,903 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.2097 (max= 1.5489), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:59,903 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.2097 (max= 1.5489), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:59,903 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.2097 (max= 1.5489), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:59,903 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.2097 (max= 1.5489), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:59,904 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.2097 (max= 1.5489), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:59,904 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.2097 (max= 1.5489), tps=20542, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:59,904 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.2097 (max= 1.5489), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:32:59,904 - root - INFO - Step 18070: lr=1.00E-05, loss= 1.2097 (max= 1.5489), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:33:15,818 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.2675 (max= 1.5851), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:15,818 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.2675 (max= 1.5851), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:15,818 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.2675 (max= 1.5851), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:15,818 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.2675 (max= 1.5851), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:15,818 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.2675 (max= 1.5851), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:15,818 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.2675 (max= 1.5851), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:15,818 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.2675 (max= 1.5851), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:15,818 - root - INFO - Step 18080: lr=1.00E-05, loss= 1.2675 (max= 1.5851), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:31,736 - root - INFO - Step 18090: lr=1.00E-05, 
loss= 1.2136 (max= 1.9160), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:33:31,736 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.2136 (max= 1.9160), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:33:31,736 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.2136 (max= 1.9160), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:33:31,736 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.2136 (max= 1.9160), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:33:31,736 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.2136 (max= 1.9160), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:33:31,736 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.2136 (max= 1.9160), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:33:31,736 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.2136 (max= 1.9160), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:33:31,736 - root - INFO - Step 18090: lr=1.00E-05, loss= 1.2136 (max= 1.9160), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:33:47,661 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.2426 (max= 1.7352), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:47,661 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.2426 (max= 1.7352), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:47,661 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.2426 (max= 1.7352), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
18:33:47,661 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.2426 (max= 1.7352), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:47,661 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.2426 (max= 1.7352), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:47,661 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.2426 (max= 1.7352), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:47,661 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.2426 (max= 1.7352), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:33:47,661 - root - INFO - Step 18100: lr=1.00E-05, loss= 1.2426 (max= 1.7352), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:34:03,614 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.2014 (max= 1.6386), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:03,614 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.2014 (max= 1.6386), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:03,614 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.2014 (max= 1.6386), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:03,614 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.2014 (max= 1.6386), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:03,614 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.2014 (max= 1.6386), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:03,614 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.2014 (max= 1.6386), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:03,615 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.2014 (max= 1.6386), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:03,615 - root - INFO - Step 18110: lr=1.00E-05, loss= 1.2014 (max= 1.6386), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:19,576 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.2352 (max= 1.8946), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:19,576 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.2352 (max= 1.8946), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:19,576 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.2352 (max= 1.8946), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:19,576 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.2352 (max= 1.8946), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:19,576 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.2352 (max= 1.8946), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:19,576 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.2352 (max= 1.8946), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:19,576 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.2352 (max= 1.8946), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:19,577 - root - INFO - Step 18120: lr=1.00E-05, loss= 1.2352 (max= 1.8946), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:35,444 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.2346 (max= 1.6244), 
tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:35,444 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.2346 (max= 1.6244), tps=20655, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:35,444 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.2346 (max= 1.6244), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:35,444 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.2346 (max= 1.6244), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:35,444 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.2346 (max= 1.6244), tps=20655, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:35,444 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.2346 (max= 1.6244), tps=20655, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:35,444 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.2346 (max= 1.6244), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:35,445 - root - INFO - Step 18130: lr=1.00E-05, loss= 1.2346 (max= 1.6244), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:34:51,340 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.2256 (max= 1.5303), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:34:51,340 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.2256 (max= 1.5303), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:34:51,340 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.2256 (max= 1.5303), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:34:51,340 - root - INFO - Step 
18140: lr=1.00E-05, loss= 1.2256 (max= 1.5303), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:34:51,340 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.2256 (max= 1.5303), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:34:51,340 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.2256 (max= 1.5303), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:34:51,340 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.2256 (max= 1.5303), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:34:51,340 - root - INFO - Step 18140: lr=1.00E-05, loss= 1.2256 (max= 1.5303), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:35:07,281 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.2104 (max= 1.6226), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:07,281 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.2104 (max= 1.6226), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:07,281 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.2104 (max= 1.6226), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:07,281 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.2104 (max= 1.6226), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:07,281 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.2104 (max= 1.6226), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:07,281 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.2104 (max= 1.6226), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 18:35:07,281 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.2104 (max= 1.6226), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:07,282 - root - INFO - Step 18150: lr=1.00E-05, loss= 1.2104 (max= 1.6226), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:23,216 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.2205 (max= 1.5205), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:23,216 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.2205 (max= 1.5205), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:23,216 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.2205 (max= 1.5205), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:23,216 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.2205 (max= 1.5205), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:23,216 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.2205 (max= 1.5205), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:23,216 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.2205 (max= 1.5205), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:23,216 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.2205 (max= 1.5205), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:23,216 - root - INFO - Step 18160: lr=1.00E-05, loss= 1.2205 (max= 1.5205), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:39,149 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.2451 (max= 1.9464), tps=20570, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:39,149 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.2451 (max= 1.9464), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:39,149 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.2451 (max= 1.9464), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:39,149 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.2451 (max= 1.9464), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:39,149 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.2451 (max= 1.9464), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:39,149 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.2451 (max= 1.9464), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:39,149 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.2451 (max= 1.9464), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:39,149 - root - INFO - Step 18170: lr=1.00E-05, loss= 1.2451 (max= 1.9464), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:55,104 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.2455 (max= 1.7485), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:55,104 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.2455 (max= 1.7485), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:55,104 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.2455 (max= 1.7485), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:55,104 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.2455 
(max= 1.7485), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:55,104 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.2455 (max= 1.7485), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:55,104 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.2455 (max= 1.7485), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:55,104 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.2455 (max= 1.7485), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:35:55,104 - root - INFO - Step 18180: lr=1.00E-05, loss= 1.2455 (max= 1.7485), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:36:11,006 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.2004 (max= 1.8943), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:11,006 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.2004 (max= 1.8943), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:11,006 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.2004 (max= 1.8943), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:11,006 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.2004 (max= 1.8943), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:11,006 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.2004 (max= 1.8943), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:11,006 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.2004 (max= 1.8943), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:11,006 - root 
- INFO - Step 18190: lr=1.00E-05, loss= 1.2004 (max= 1.8943), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:11,006 - root - INFO - Step 18190: lr=1.00E-05, loss= 1.2004 (max= 1.8943), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:13,366 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2613758 +2025-10-24 18:36:26,990 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.2493 (max= 1.6889), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:36:26,990 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.2493 (max= 1.6889), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:36:26,990 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.2493 (max= 1.6889), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:36:26,990 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.2493 (max= 1.6889), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:36:26,990 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.2493 (max= 1.6889), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:36:26,990 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.2493 (max= 1.6889), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:36:26,990 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.2493 (max= 1.6889), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:36:26,990 - root - INFO - Step 18200: lr=1.00E-05, loss= 1.2493 (max= 1.6889), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 18:36:42,935 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.2416 (max= 1.6345), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:42,935 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.2416 (max= 1.6345), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:42,935 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.2416 (max= 1.6345), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:42,935 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.2416 (max= 1.6345), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:42,935 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.2416 (max= 1.6345), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:42,935 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.2416 (max= 1.6345), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:42,935 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.2416 (max= 1.6345), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:42,935 - root - INFO - Step 18210: lr=1.00E-05, loss= 1.2416 (max= 1.6345), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:58,878 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.2201 (max= 1.7002), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:58,878 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.2201 (max= 1.7002), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:58,878 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.2201 (max= 1.7002), tps=20558, mfu=42.83%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:58,878 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.2201 (max= 1.7002), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:58,878 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.2201 (max= 1.7002), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:58,878 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.2201 (max= 1.7002), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:58,878 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.2201 (max= 1.7002), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:36:58,878 - root - INFO - Step 18220: lr=1.00E-05, loss= 1.2201 (max= 1.7002), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:14,835 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.2048 (max= 1.6975), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:14,835 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.2048 (max= 1.6975), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:14,835 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.2048 (max= 1.6975), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:14,835 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.2048 (max= 1.6975), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:14,835 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.2048 (max= 1.6975), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:14,835 - root - INFO - Step 18230: lr=1.00E-05, 
loss= 1.2048 (max= 1.6975), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:14,835 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.2048 (max= 1.6975), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:14,835 - root - INFO - Step 18230: lr=1.00E-05, loss= 1.2048 (max= 1.6975), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:30,727 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.2474 (max= 1.6254), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:30,727 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.2474 (max= 1.6254), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:30,727 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.2474 (max= 1.6254), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:30,727 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.2474 (max= 1.6254), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:30,727 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.2474 (max= 1.6254), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:30,727 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.2474 (max= 1.6254), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:30,727 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.2474 (max= 1.6254), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:30,727 - root - INFO - Step 18240: lr=1.00E-05, loss= 1.2474 (max= 1.6254), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
18:37:46,655 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.2281 (max= 1.6016), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:46,655 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.2281 (max= 1.6016), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:46,655 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.2281 (max= 1.6016), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:46,656 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.2281 (max= 1.6016), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:46,656 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.2281 (max= 1.6016), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:46,656 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.2281 (max= 1.6016), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:46,656 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.2281 (max= 1.6016), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:37:46,656 - root - INFO - Step 18250: lr=1.00E-05, loss= 1.2281 (max= 1.6016), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:02,617 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.2183 (max= 1.5534), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:02,617 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.2183 (max= 1.5534), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:02,617 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.2183 (max= 1.5534), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:02,617 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.2183 (max= 1.5534), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:02,617 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.2183 (max= 1.5534), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:02,617 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.2183 (max= 1.5534), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:02,617 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.2183 (max= 1.5534), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:02,617 - root - INFO - Step 18260: lr=1.00E-05, loss= 1.2183 (max= 1.5534), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:18,571 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.2070 (max= 1.6944), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:18,571 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.2070 (max= 1.6944), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:18,571 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.2070 (max= 1.6944), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:18,571 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.2070 (max= 1.6944), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:18,571 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.2070 (max= 1.6944), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:18,571 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.2070 (max= 1.6944), 
tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:18,571 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.2070 (max= 1.6944), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:18,571 - root - INFO - Step 18270: lr=1.00E-05, loss= 1.2070 (max= 1.6944), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:38:34,503 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.2492 (max= 1.5997), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:34,503 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.2492 (max= 1.5997), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:34,503 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.2492 (max= 1.5997), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:34,503 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.2492 (max= 1.5997), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:34,503 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.2492 (max= 1.5997), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:34,503 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.2492 (max= 1.5997), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:34,503 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.2492 (max= 1.5997), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:34,503 - root - INFO - Step 18280: lr=1.00E-05, loss= 1.2492 (max= 1.5997), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:50,422 - root - INFO - Step 
18290: lr=1.00E-05, loss= 1.2230 (max= 1.6126), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:50,422 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.2230 (max= 1.6126), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:50,422 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.2230 (max= 1.6126), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:50,422 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.2230 (max= 1.6126), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:50,422 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.2230 (max= 1.6126), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:50,422 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.2230 (max= 1.6126), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:50,422 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.2230 (max= 1.6126), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:38:50,422 - root - INFO - Step 18290: lr=1.00E-05, loss= 1.2230 (max= 1.6126), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:39:06,416 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.2304 (max= 1.8277), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:39:06,416 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.2304 (max= 1.8277), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:39:06,416 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.2304 (max= 1.8277), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 18:39:06,416 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.2304 (max= 1.8277), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:39:06,416 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.2304 (max= 1.8277), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:39:06,416 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.2304 (max= 1.8277), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:39:06,416 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.2304 (max= 1.8277), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:39:06,416 - root - INFO - Step 18300: lr=1.00E-05, loss= 1.2304 (max= 1.8277), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:39:22,368 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.2579 (max= 1.6773), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:22,368 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.2579 (max= 1.6773), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:22,368 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.2579 (max= 1.6773), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:22,368 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.2579 (max= 1.6773), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:22,369 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.2579 (max= 1.6773), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:22,369 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.2579 (max= 1.6773), tps=20545, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:22,369 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.2579 (max= 1.6773), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:22,369 - root - INFO - Step 18310: lr=1.00E-05, loss= 1.2579 (max= 1.6773), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:38,268 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.2232 (max= 1.6424), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:38,268 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.2232 (max= 1.6424), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:38,268 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.2232 (max= 1.6424), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:38,268 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.2232 (max= 1.6424), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:38,268 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.2232 (max= 1.6424), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:38,268 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.2232 (max= 1.6424), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:38,268 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.2232 (max= 1.6424), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:38,268 - root - INFO - Step 18320: lr=1.00E-05, loss= 1.2232 (max= 1.6424), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:54,180 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.2280 
(max= 1.7349), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:54,180 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.2280 (max= 1.7349), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:54,181 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.2280 (max= 1.7349), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:54,181 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.2280 (max= 1.7349), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:54,181 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.2280 (max= 1.7349), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:54,181 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.2280 (max= 1.7349), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:54,181 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.2280 (max= 1.7349), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:39:54,181 - root - INFO - Step 18330: lr=1.00E-05, loss= 1.2280 (max= 1.7349), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:10,163 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.2670 (max= 1.5773), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:10,163 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.2670 (max= 1.5773), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:10,163 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.2670 (max= 1.5773), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:10,163 - root 
- INFO - Step 18340: lr=1.00E-05, loss= 1.2670 (max= 1.5773), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:10,163 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.2670 (max= 1.5773), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:10,163 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.2670 (max= 1.5773), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:10,163 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.2670 (max= 1.5773), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:10,163 - root - INFO - Step 18340: lr=1.00E-05, loss= 1.2670 (max= 1.5773), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:26,050 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.2497 (max= 1.6819), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:26,050 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.2497 (max= 1.6819), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:26,050 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.2497 (max= 1.6819), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:26,050 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.2497 (max= 1.6819), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:26,050 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.2497 (max= 1.6819), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:26,050 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.2497 (max= 1.6819), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 18:40:26,050 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.2497 (max= 1.6819), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:26,050 - root - INFO - Step 18350: lr=1.00E-05, loss= 1.2497 (max= 1.6819), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:40:42,001 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.2271 (max= 1.7347), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:42,001 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.2271 (max= 1.7347), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:42,001 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.2271 (max= 1.7347), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:42,001 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.2271 (max= 1.7347), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:42,001 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.2271 (max= 1.7347), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:42,001 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.2271 (max= 1.7347), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:42,002 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.2271 (max= 1.7347), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:42,002 - root - INFO - Step 18360: lr=1.00E-05, loss= 1.2271 (max= 1.7347), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:57,966 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.1820 (max= 1.5026), tps=20529, mfu=42.77%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:57,966 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.1820 (max= 1.5026), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:57,966 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.1820 (max= 1.5026), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:57,966 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.1820 (max= 1.5026), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:57,967 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.1820 (max= 1.5026), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:57,967 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.1820 (max= 1.5026), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:57,967 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.1820 (max= 1.5026), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:40:57,967 - root - INFO - Step 18370: lr=1.00E-05, loss= 1.1820 (max= 1.5026), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:41:13,875 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.2335 (max= 2.3173), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:13,876 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.2335 (max= 2.3173), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:13,876 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.2335 (max= 2.3173), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:13,876 - root - INFO - Step 18380: lr=1.00E-05, 
loss= 1.2335 (max= 2.3173), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:13,876 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.2335 (max= 2.3173), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:13,876 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.2335 (max= 2.3173), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:13,876 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.2335 (max= 2.3173), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:13,876 - root - INFO - Step 18380: lr=1.00E-05, loss= 1.2335 (max= 2.3173), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:29,834 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.2144 (max= 1.5852), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:29,835 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.2144 (max= 1.5852), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:29,835 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.2144 (max= 1.5852), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:29,835 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.2144 (max= 1.5852), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:29,835 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.2144 (max= 1.5852), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:29,835 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.2144 (max= 1.5852), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
18:41:29,835 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.2144 (max= 1.5852), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:29,835 - root - INFO - Step 18390: lr=1.00E-05, loss= 1.2144 (max= 1.5852), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:45,722 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.2452 (max= 1.6307), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:45,722 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.2452 (max= 1.6307), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:45,722 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.2452 (max= 1.6307), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:45,722 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.2452 (max= 1.6307), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:45,723 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.2452 (max= 1.6307), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:45,723 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.2452 (max= 1.6307), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:45,723 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.2452 (max= 1.6307), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:41:45,723 - root - INFO - Step 18400: lr=1.00E-05, loss= 1.2452 (max= 1.6307), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:01,644 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.2446 (max= 1.6736), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:01,644 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.2446 (max= 1.6736), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:01,644 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.2446 (max= 1.6736), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:01,645 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.2446 (max= 1.6736), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:01,645 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.2446 (max= 1.6736), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:01,645 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.2446 (max= 1.6736), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:01,645 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.2446 (max= 1.6736), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:01,645 - root - INFO - Step 18410: lr=1.00E-05, loss= 1.2446 (max= 1.6736), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:17,591 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.2371 (max= 1.7421), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:17,591 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.2371 (max= 1.7421), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:17,591 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.2371 (max= 1.7421), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:17,591 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.2371 (max= 1.7421), 
tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:17,591 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.2371 (max= 1.7421), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:17,592 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.2371 (max= 1.7421), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:17,592 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.2371 (max= 1.7421), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:17,592 - root - INFO - Step 18420: lr=1.00E-05, loss= 1.2371 (max= 1.7421), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:33,544 - root - INFO - Step 18430: lr=1.00E-05, loss= 1.2518 (max= 1.7240), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:33,544 - root - INFO - Step 18430: lr=1.00E-05, loss= 1.2518 (max= 1.7240), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:33,544 - root - INFO - Step 18430: lr=1.00E-05, loss= 1.2518 (max= 1.7240), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:33,545 - root - INFO - Step 18430: lr=1.00E-05, loss= 1.2518 (max= 1.7240), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:33,545 - root - INFO - Step 18430: lr=1.00E-05, loss= 1.2518 (max= 1.7240), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:33,545 - root - INFO - Step 18430: lr=1.00E-05, loss= 1.2518 (max= 1.7240), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:33,545 - root - INFO - Step 
18430: lr=1.00E-05, loss= 1.2518 (max= 1.7240), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:33,545 - root - INFO - Step 18430: lr=1.00E-05, loss= 1.2518 (max= 1.7240), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:49,459 - root - INFO - Step 18440: lr=1.00E-05, loss= 1.2258 (max= 1.5909), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:49,459 - root - INFO - Step 18440: lr=1.00E-05, loss= 1.2258 (max= 1.5909), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:49,459 - root - INFO - Step 18440: lr=1.00E-05, loss= 1.2258 (max= 1.5909), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:49,460 - root - INFO - Step 18440: lr=1.00E-05, loss= 1.2258 (max= 1.5909), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:49,460 - root - INFO - Step 18440: lr=1.00E-05, loss= 1.2258 (max= 1.5909), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:49,460 - root - INFO - Step 18440: lr=1.00E-05, loss= 1.2258 (max= 1.5909), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:49,460 - root - INFO - Step 18440: lr=1.00E-05, loss= 1.2258 (max= 1.5909), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:42:49,460 - root - INFO - Step 18440: lr=1.00E-05, loss= 1.2258 (max= 1.5909), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:05,415 - root - INFO - Step 18450: lr=1.00E-05, loss= 1.2221 (max= 1.7733), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 18:43:05,416 - root - INFO - Step 18450: lr=1.00E-05, loss= 1.2221 (max= 1.7733), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:05,416 - root - INFO - Step 18450: lr=1.00E-05, loss= 1.2221 (max= 1.7733), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:05,416 - root - INFO - Step 18450: lr=1.00E-05, loss= 1.2221 (max= 1.7733), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:05,416 - root - INFO - Step 18450: lr=1.00E-05, loss= 1.2221 (max= 1.7733), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:05,416 - root - INFO - Step 18450: lr=1.00E-05, loss= 1.2221 (max= 1.7733), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:05,416 - root - INFO - Step 18450: lr=1.00E-05, loss= 1.2221 (max= 1.7733), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:05,416 - root - INFO - Step 18450: lr=1.00E-05, loss= 1.2221 (max= 1.7733), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:21,345 - root - INFO - Step 18460: lr=1.00E-05, loss= 1.2495 (max= 1.5367), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:21,345 - root - INFO - Step 18460: lr=1.00E-05, loss= 1.2495 (max= 1.5367), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:21,346 - root - INFO - Step 18460: lr=1.00E-05, loss= 1.2495 (max= 1.5367), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:21,346 - root - INFO - Step 18460: lr=1.00E-05, loss= 1.2495 (max= 1.5367), tps=20574, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:21,346 - root - INFO - Step 18460: lr=1.00E-05, loss= 1.2495 (max= 1.5367), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:21,346 - root - INFO - Step 18460: lr=1.00E-05, loss= 1.2495 (max= 1.5367), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:21,346 - root - INFO - Step 18460: lr=1.00E-05, loss= 1.2495 (max= 1.5367), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:21,346 - root - INFO - Step 18460: lr=1.00E-05, loss= 1.2495 (max= 1.5367), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:37,282 - root - INFO - Step 18470: lr=1.00E-05, loss= 1.2430 (max= 1.5810), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:37,282 - root - INFO - Step 18470: lr=1.00E-05, loss= 1.2430 (max= 1.5810), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:37,282 - root - INFO - Step 18470: lr=1.00E-05, loss= 1.2430 (max= 1.5810), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:37,282 - root - INFO - Step 18470: lr=1.00E-05, loss= 1.2430 (max= 1.5810), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:37,282 - root - INFO - Step 18470: lr=1.00E-05, loss= 1.2430 (max= 1.5810), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:37,282 - root - INFO - Step 18470: lr=1.00E-05, loss= 1.2430 (max= 1.5810), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:37,282 - root - INFO - Step 18470: lr=1.00E-05, loss= 1.2430 
(max= 1.5810), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:37,282 - root - INFO - Step 18470: lr=1.00E-05, loss= 1.2430 (max= 1.5810), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:53,210 - root - INFO - Step 18480: lr=1.00E-05, loss= 1.2220 (max= 1.7327), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:53,210 - root - INFO - Step 18480: lr=1.00E-05, loss= 1.2220 (max= 1.7327), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:53,211 - root - INFO - Step 18480: lr=1.00E-05, loss= 1.2220 (max= 1.7327), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:53,211 - root - INFO - Step 18480: lr=1.00E-05, loss= 1.2220 (max= 1.7327), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:53,211 - root - INFO - Step 18480: lr=1.00E-05, loss= 1.2220 (max= 1.7327), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:53,211 - root - INFO - Step 18480: lr=1.00E-05, loss= 1.2220 (max= 1.7327), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:53,211 - root - INFO - Step 18480: lr=1.00E-05, loss= 1.2220 (max= 1.7327), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:43:53,211 - root - INFO - Step 18480: lr=1.00E-05, loss= 1.2220 (max= 1.7327), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:09,164 - root - INFO - Step 18490: lr=1.00E-05, loss= 1.2360 (max= 1.6015), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:09,165 - root 
- INFO - Step 18490: lr=1.00E-05, loss= 1.2360 (max= 1.6015), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:09,165 - root - INFO - Step 18490: lr=1.00E-05, loss= 1.2360 (max= 1.6015), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:09,165 - root - INFO - Step 18490: lr=1.00E-05, loss= 1.2360 (max= 1.6015), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:09,165 - root - INFO - Step 18490: lr=1.00E-05, loss= 1.2360 (max= 1.6015), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:09,165 - root - INFO - Step 18490: lr=1.00E-05, loss= 1.2360 (max= 1.6015), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:09,165 - root - INFO - Step 18490: lr=1.00E-05, loss= 1.2360 (max= 1.6015), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:09,165 - root - INFO - Step 18490: lr=1.00E-05, loss= 1.2360 (max= 1.6015), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:25,094 - root - INFO - Step 18500: lr=1.00E-05, loss= 1.2280 (max= 1.8434), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:25,095 - root - INFO - Step 18500: lr=1.00E-05, loss= 1.2280 (max= 1.8434), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:25,095 - root - INFO - Step 18500: lr=1.00E-05, loss= 1.2280 (max= 1.8434), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:25,095 - root - INFO - Step 18500: lr=1.00E-05, loss= 1.2280 (max= 1.8434), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 18:44:25,095 - root - INFO - Step 18500: lr=1.00E-05, loss= 1.2280 (max= 1.8434), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:25,095 - root - INFO - Step 18500: lr=1.00E-05, loss= 1.2280 (max= 1.8434), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:25,095 - root - INFO - Step 18500: lr=1.00E-05, loss= 1.2280 (max= 1.8434), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:25,095 - root - INFO - Step 18500: lr=1.00E-05, loss= 1.2280 (max= 1.8434), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:41,009 - root - INFO - Step 18510: lr=1.00E-05, loss= 1.2388 (max= 1.6163), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:41,009 - root - INFO - Step 18510: lr=1.00E-05, loss= 1.2388 (max= 1.6163), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:41,009 - root - INFO - Step 18510: lr=1.00E-05, loss= 1.2388 (max= 1.6163), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:41,009 - root - INFO - Step 18510: lr=1.00E-05, loss= 1.2388 (max= 1.6163), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:41,009 - root - INFO - Step 18510: lr=1.00E-05, loss= 1.2388 (max= 1.6163), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:41,009 - root - INFO - Step 18510: lr=1.00E-05, loss= 1.2388 (max= 1.6163), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:41,009 - root - INFO - Step 18510: lr=1.00E-05, loss= 1.2388 (max= 1.6163), tps=20595, mfu=42.91%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:41,009 - root - INFO - Step 18510: lr=1.00E-05, loss= 1.2388 (max= 1.6163), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:56,970 - root - INFO - Step 18520: lr=1.00E-05, loss= 1.2603 (max= 1.6187), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:56,970 - root - INFO - Step 18520: lr=1.00E-05, loss= 1.2603 (max= 1.6187), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:56,970 - root - INFO - Step 18520: lr=1.00E-05, loss= 1.2603 (max= 1.6187), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:56,971 - root - INFO - Step 18520: lr=1.00E-05, loss= 1.2603 (max= 1.6187), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:56,971 - root - INFO - Step 18520: lr=1.00E-05, loss= 1.2603 (max= 1.6187), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:56,971 - root - INFO - Step 18520: lr=1.00E-05, loss= 1.2603 (max= 1.6187), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:56,971 - root - INFO - Step 18520: lr=1.00E-05, loss= 1.2603 (max= 1.6187), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:44:56,971 - root - INFO - Step 18520: lr=1.00E-05, loss= 1.2603 (max= 1.6187), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:45:12,900 - root - INFO - Step 18530: lr=1.00E-05, loss= 1.2315 (max= 1.7293), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:12,900 - root - INFO - Step 18530: lr=1.00E-05, 
loss= 1.2315 (max= 1.7293), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:12,900 - root - INFO - Step 18530: lr=1.00E-05, loss= 1.2315 (max= 1.7293), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:12,900 - root - INFO - Step 18530: lr=1.00E-05, loss= 1.2315 (max= 1.7293), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:12,900 - root - INFO - Step 18530: lr=1.00E-05, loss= 1.2315 (max= 1.7293), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:12,900 - root - INFO - Step 18530: lr=1.00E-05, loss= 1.2315 (max= 1.7293), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:12,900 - root - INFO - Step 18530: lr=1.00E-05, loss= 1.2315 (max= 1.7293), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:12,900 - root - INFO - Step 18530: lr=1.00E-05, loss= 1.2315 (max= 1.7293), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:28,804 - root - INFO - Step 18540: lr=1.00E-05, loss= 1.2419 (max= 1.6682), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:28,804 - root - INFO - Step 18540: lr=1.00E-05, loss= 1.2419 (max= 1.6682), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:28,804 - root - INFO - Step 18540: lr=1.00E-05, loss= 1.2419 (max= 1.6682), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:28,804 - root - INFO - Step 18540: lr=1.00E-05, loss= 1.2419 (max= 1.6682), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
18:45:28,804 - root - INFO - Step 18540: lr=1.00E-05, loss= 1.2419 (max= 1.6682), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:28,804 - root - INFO - Step 18540: lr=1.00E-05, loss= 1.2419 (max= 1.6682), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:28,804 - root - INFO - Step 18540: lr=1.00E-05, loss= 1.2419 (max= 1.6682), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:28,805 - root - INFO - Step 18540: lr=1.00E-05, loss= 1.2419 (max= 1.6682), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:45:44,678 - root - INFO - Step 18550: lr=1.00E-05, loss= 1.2303 (max= 1.6192), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:45:44,678 - root - INFO - Step 18550: lr=1.00E-05, loss= 1.2303 (max= 1.6192), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:45:44,678 - root - INFO - Step 18550: lr=1.00E-05, loss= 1.2303 (max= 1.6192), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:45:44,678 - root - INFO - Step 18550: lr=1.00E-05, loss= 1.2303 (max= 1.6192), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:45:44,678 - root - INFO - Step 18550: lr=1.00E-05, loss= 1.2303 (max= 1.6192), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:45:44,678 - root - INFO - Step 18550: lr=1.00E-05, loss= 1.2303 (max= 1.6192), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:45:44,678 - root - INFO - Step 18550: lr=1.00E-05, loss= 1.2303 (max= 1.6192), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:45:44,678 - root - INFO - Step 18550: lr=1.00E-05, loss= 1.2303 (max= 1.6192), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:00,628 - root - INFO - Step 18560: lr=1.00E-05, loss= 1.2297 (max= 1.5635), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:00,629 - root - INFO - Step 18560: lr=1.00E-05, loss= 1.2297 (max= 1.5635), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:00,629 - root - INFO - Step 18560: lr=1.00E-05, loss= 1.2297 (max= 1.5635), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:00,629 - root - INFO - Step 18560: lr=1.00E-05, loss= 1.2297 (max= 1.5635), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:00,629 - root - INFO - Step 18560: lr=1.00E-05, loss= 1.2297 (max= 1.5635), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:00,629 - root - INFO - Step 18560: lr=1.00E-05, loss= 1.2297 (max= 1.5635), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:00,629 - root - INFO - Step 18560: lr=1.00E-05, loss= 1.2297 (max= 1.5635), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:00,629 - root - INFO - Step 18560: lr=1.00E-05, loss= 1.2297 (max= 1.5635), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:16,564 - root - INFO - Step 18570: lr=1.00E-05, loss= 1.2450 (max= 1.6093), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:16,565 - root - INFO - Step 18570: lr=1.00E-05, loss= 1.2450 (max= 1.6093), 
tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:16,565 - root - INFO - Step 18570: lr=1.00E-05, loss= 1.2450 (max= 1.6093), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:16,565 - root - INFO - Step 18570: lr=1.00E-05, loss= 1.2450 (max= 1.6093), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:16,565 - root - INFO - Step 18570: lr=1.00E-05, loss= 1.2450 (max= 1.6093), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:16,565 - root - INFO - Step 18570: lr=1.00E-05, loss= 1.2450 (max= 1.6093), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:16,565 - root - INFO - Step 18570: lr=1.00E-05, loss= 1.2450 (max= 1.6093), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:16,565 - root - INFO - Step 18570: lr=1.00E-05, loss= 1.2450 (max= 1.6093), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:46:32,490 - root - INFO - Step 18580: lr=1.00E-05, loss= 1.2101 (max= 1.8072), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:32,490 - root - INFO - Step 18580: lr=1.00E-05, loss= 1.2101 (max= 1.8072), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:32,490 - root - INFO - Step 18580: lr=1.00E-05, loss= 1.2101 (max= 1.8072), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:32,490 - root - INFO - Step 18580: lr=1.00E-05, loss= 1.2101 (max= 1.8072), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:32,490 - root - INFO - Step 
18580: lr=1.00E-05, loss= 1.2101 (max= 1.8072), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:32,490 - root - INFO - Step 18580: lr=1.00E-05, loss= 1.2101 (max= 1.8072), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:32,490 - root - INFO - Step 18580: lr=1.00E-05, loss= 1.2101 (max= 1.8072), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:32,490 - root - INFO - Step 18580: lr=1.00E-05, loss= 1.2101 (max= 1.8072), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:48,443 - root - INFO - Step 18590: lr=1.00E-05, loss= 1.2361 (max= 1.7839), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:48,443 - root - INFO - Step 18590: lr=1.00E-05, loss= 1.2361 (max= 1.7839), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:48,443 - root - INFO - Step 18590: lr=1.00E-05, loss= 1.2361 (max= 1.7839), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:48,443 - root - INFO - Step 18590: lr=1.00E-05, loss= 1.2361 (max= 1.7839), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:48,443 - root - INFO - Step 18590: lr=1.00E-05, loss= 1.2361 (max= 1.7839), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:48,443 - root - INFO - Step 18590: lr=1.00E-05, loss= 1.2361 (max= 1.7839), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:46:48,443 - root - INFO - Step 18590: lr=1.00E-05, loss= 1.2361 (max= 1.7839), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 18:46:48,443 - root - INFO - Step 18590: lr=1.00E-05, loss= 1.2361 (max= 1.7839), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:04,429 - root - INFO - Step 18600: lr=1.00E-05, loss= 1.2509 (max= 1.5848), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:04,429 - root - INFO - Step 18600: lr=1.00E-05, loss= 1.2509 (max= 1.5848), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:04,429 - root - INFO - Step 18600: lr=1.00E-05, loss= 1.2509 (max= 1.5848), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:04,429 - root - INFO - Step 18600: lr=1.00E-05, loss= 1.2509 (max= 1.5848), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:04,429 - root - INFO - Step 18600: lr=1.00E-05, loss= 1.2509 (max= 1.5848), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:04,429 - root - INFO - Step 18600: lr=1.00E-05, loss= 1.2509 (max= 1.5848), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:04,429 - root - INFO - Step 18600: lr=1.00E-05, loss= 1.2509 (max= 1.5848), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:04,429 - root - INFO - Step 18600: lr=1.00E-05, loss= 1.2509 (max= 1.5848), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:20,337 - root - INFO - Step 18610: lr=1.00E-05, loss= 1.2376 (max= 1.5295), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:20,337 - root - INFO - Step 18610: lr=1.00E-05, loss= 1.2376 (max= 1.5295), tps=20603, mfu=42.93%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:20,337 - root - INFO - Step 18610: lr=1.00E-05, loss= 1.2376 (max= 1.5295), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:20,337 - root - INFO - Step 18610: lr=1.00E-05, loss= 1.2376 (max= 1.5295), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:20,337 - root - INFO - Step 18610: lr=1.00E-05, loss= 1.2376 (max= 1.5295), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:20,337 - root - INFO - Step 18610: lr=1.00E-05, loss= 1.2376 (max= 1.5295), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:20,337 - root - INFO - Step 18610: lr=1.00E-05, loss= 1.2376 (max= 1.5295), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:20,337 - root - INFO - Step 18610: lr=1.00E-05, loss= 1.2376 (max= 1.5295), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:36,304 - root - INFO - Step 18620: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:36,304 - root - INFO - Step 18620: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:36,304 - root - INFO - Step 18620: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:36,304 - root - INFO - Step 18620: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:36,304 - root - INFO - Step 18620: lr=1.00E-05, loss= 1.2307 
(max= 1.5593), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:36,305 - root - INFO - Step 18620: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:36,305 - root - INFO - Step 18620: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:36,305 - root - INFO - Step 18620: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:47:52,249 - root - INFO - Step 18630: lr=1.00E-05, loss= 1.2519 (max= 1.7982), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:47:52,249 - root - INFO - Step 18630: lr=1.00E-05, loss= 1.2519 (max= 1.7982), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:47:52,249 - root - INFO - Step 18630: lr=1.00E-05, loss= 1.2519 (max= 1.7982), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:47:52,249 - root - INFO - Step 18630: lr=1.00E-05, loss= 1.2519 (max= 1.7982), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:47:52,249 - root - INFO - Step 18630: lr=1.00E-05, loss= 1.2519 (max= 1.7982), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:47:52,249 - root - INFO - Step 18630: lr=1.00E-05, loss= 1.2519 (max= 1.7982), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:47:52,249 - root - INFO - Step 18630: lr=1.00E-05, loss= 1.2519 (max= 1.7982), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:47:52,249 - root 
- INFO - Step 18630: lr=1.00E-05, loss= 1.2519 (max= 1.7982), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:08,184 - root - INFO - Step 18640: lr=1.00E-05, loss= 1.2282 (max= 1.5209), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:08,184 - root - INFO - Step 18640: lr=1.00E-05, loss= 1.2282 (max= 1.5209), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:08,184 - root - INFO - Step 18640: lr=1.00E-05, loss= 1.2282 (max= 1.5209), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:08,184 - root - INFO - Step 18640: lr=1.00E-05, loss= 1.2282 (max= 1.5209), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:08,184 - root - INFO - Step 18640: lr=1.00E-05, loss= 1.2282 (max= 1.5209), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:08,184 - root - INFO - Step 18640: lr=1.00E-05, loss= 1.2282 (max= 1.5209), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:08,184 - root - INFO - Step 18640: lr=1.00E-05, loss= 1.2282 (max= 1.5209), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:08,184 - root - INFO - Step 18640: lr=1.00E-05, loss= 1.2282 (max= 1.5209), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:24,081 - root - INFO - Step 18650: lr=1.00E-05, loss= 1.2274 (max= 1.5731), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:24,081 - root - INFO - Step 18650: lr=1.00E-05, loss= 1.2274 (max= 1.5731), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 18:48:24,081 - root - INFO - Step 18650: lr=1.00E-05, loss= 1.2274 (max= 1.5731), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:24,081 - root - INFO - Step 18650: lr=1.00E-05, loss= 1.2274 (max= 1.5731), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:24,081 - root - INFO - Step 18650: lr=1.00E-05, loss= 1.2274 (max= 1.5731), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:24,081 - root - INFO - Step 18650: lr=1.00E-05, loss= 1.2274 (max= 1.5731), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:24,081 - root - INFO - Step 18650: lr=1.00E-05, loss= 1.2274 (max= 1.5731), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:24,081 - root - INFO - Step 18650: lr=1.00E-05, loss= 1.2274 (max= 1.5731), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:40,087 - root - INFO - Step 18660: lr=1.00E-05, loss= 1.2338 (max= 1.6459), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:40,087 - root - INFO - Step 18660: lr=1.00E-05, loss= 1.2338 (max= 1.6459), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:40,087 - root - INFO - Step 18660: lr=1.00E-05, loss= 1.2338 (max= 1.6459), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:40,087 - root - INFO - Step 18660: lr=1.00E-05, loss= 1.2338 (max= 1.6459), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:40,087 - root - INFO - Step 18660: lr=1.00E-05, loss= 1.2338 (max= 1.6459), tps=20477, mfu=42.66%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:40,087 - root - INFO - Step 18660: lr=1.00E-05, loss= 1.2338 (max= 1.6459), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:40,087 - root - INFO - Step 18660: lr=1.00E-05, loss= 1.2338 (max= 1.6459), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:40,088 - root - INFO - Step 18660: lr=1.00E-05, loss= 1.2338 (max= 1.6459), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:48:56,054 - root - INFO - Step 18670: lr=1.00E-05, loss= 1.2513 (max= 2.1066), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:56,054 - root - INFO - Step 18670: lr=1.00E-05, loss= 1.2513 (max= 2.1066), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:56,054 - root - INFO - Step 18670: lr=1.00E-05, loss= 1.2513 (max= 2.1066), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:56,054 - root - INFO - Step 18670: lr=1.00E-05, loss= 1.2513 (max= 2.1066), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:56,054 - root - INFO - Step 18670: lr=1.00E-05, loss= 1.2513 (max= 2.1066), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:56,054 - root - INFO - Step 18670: lr=1.00E-05, loss= 1.2513 (max= 2.1066), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:56,054 - root - INFO - Step 18670: lr=1.00E-05, loss= 1.2513 (max= 2.1066), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:48:56,054 - root - INFO - Step 18670: lr=1.00E-05, 
loss= 1.2513 (max= 2.1066), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:11,932 - root - INFO - Step 18680: lr=1.00E-05, loss= 1.2086 (max= 1.6880), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:11,932 - root - INFO - Step 18680: lr=1.00E-05, loss= 1.2086 (max= 1.6880), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:11,932 - root - INFO - Step 18680: lr=1.00E-05, loss= 1.2086 (max= 1.6880), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:11,932 - root - INFO - Step 18680: lr=1.00E-05, loss= 1.2086 (max= 1.6880), tps=20643, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:11,932 - root - INFO - Step 18680: lr=1.00E-05, loss= 1.2086 (max= 1.6880), tps=20643, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:11,932 - root - INFO - Step 18680: lr=1.00E-05, loss= 1.2086 (max= 1.6880), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:11,932 - root - INFO - Step 18680: lr=1.00E-05, loss= 1.2086 (max= 1.6880), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:11,932 - root - INFO - Step 18680: lr=1.00E-05, loss= 1.2086 (max= 1.6880), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:27,825 - root - INFO - Step 18690: lr=1.00E-05, loss= 1.2448 (max= 1.6510), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:27,825 - root - INFO - Step 18690: lr=1.00E-05, loss= 1.2448 (max= 1.6510), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
18:49:27,825 - root - INFO - Step 18690: lr=1.00E-05, loss= 1.2448 (max= 1.6510), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:27,825 - root - INFO - Step 18690: lr=1.00E-05, loss= 1.2448 (max= 1.6510), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:27,825 - root - INFO - Step 18690: lr=1.00E-05, loss= 1.2448 (max= 1.6510), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:27,825 - root - INFO - Step 18690: lr=1.00E-05, loss= 1.2448 (max= 1.6510), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:27,825 - root - INFO - Step 18690: lr=1.00E-05, loss= 1.2448 (max= 1.6510), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:27,825 - root - INFO - Step 18690: lr=1.00E-05, loss= 1.2448 (max= 1.6510), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:43,789 - root - INFO - Step 18700: lr=1.00E-05, loss= 1.2470 (max= 1.6895), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:43,789 - root - INFO - Step 18700: lr=1.00E-05, loss= 1.2470 (max= 1.6895), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:43,789 - root - INFO - Step 18700: lr=1.00E-05, loss= 1.2470 (max= 1.6895), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:43,789 - root - INFO - Step 18700: lr=1.00E-05, loss= 1.2470 (max= 1.6895), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:43,790 - root - INFO - Step 18700: lr=1.00E-05, loss= 1.2470 (max= 1.6895), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:43,790 - root - INFO - Step 18700: lr=1.00E-05, loss= 1.2470 (max= 1.6895), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:43,790 - root - INFO - Step 18700: lr=1.00E-05, loss= 1.2470 (max= 1.6895), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:43,790 - root - INFO - Step 18700: lr=1.00E-05, loss= 1.2470 (max= 1.6895), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:49:59,710 - root - INFO - Step 18710: lr=1.00E-05, loss= 1.2864 (max= 1.6573), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:59,710 - root - INFO - Step 18710: lr=1.00E-05, loss= 1.2864 (max= 1.6573), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:59,710 - root - INFO - Step 18710: lr=1.00E-05, loss= 1.2864 (max= 1.6573), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:59,710 - root - INFO - Step 18710: lr=1.00E-05, loss= 1.2864 (max= 1.6573), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:59,710 - root - INFO - Step 18710: lr=1.00E-05, loss= 1.2864 (max= 1.6573), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:59,710 - root - INFO - Step 18710: lr=1.00E-05, loss= 1.2864 (max= 1.6573), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:59,710 - root - INFO - Step 18710: lr=1.00E-05, loss= 1.2864 (max= 1.6573), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:49:59,710 - root - INFO - Step 18710: lr=1.00E-05, loss= 1.2864 (max= 1.6573), 
tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:15,627 - root - INFO - Step 18720: lr=1.00E-05, loss= 1.2184 (max= 1.6121), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:15,628 - root - INFO - Step 18720: lr=1.00E-05, loss= 1.2184 (max= 1.6121), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:15,628 - root - INFO - Step 18720: lr=1.00E-05, loss= 1.2184 (max= 1.6121), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:15,628 - root - INFO - Step 18720: lr=1.00E-05, loss= 1.2184 (max= 1.6121), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:15,628 - root - INFO - Step 18720: lr=1.00E-05, loss= 1.2184 (max= 1.6121), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:15,628 - root - INFO - Step 18720: lr=1.00E-05, loss= 1.2184 (max= 1.6121), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:15,628 - root - INFO - Step 18720: lr=1.00E-05, loss= 1.2184 (max= 1.6121), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:15,628 - root - INFO - Step 18720: lr=1.00E-05, loss= 1.2184 (max= 1.6121), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:31,593 - root - INFO - Step 18730: lr=1.00E-05, loss= 1.2393 (max= 1.7134), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:31,593 - root - INFO - Step 18730: lr=1.00E-05, loss= 1.2393 (max= 1.7134), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:31,593 - root - INFO - Step 
18730: lr=1.00E-05, loss= 1.2393 (max= 1.7134), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:31,593 - root - INFO - Step 18730: lr=1.00E-05, loss= 1.2393 (max= 1.7134), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:31,593 - root - INFO - Step 18730: lr=1.00E-05, loss= 1.2393 (max= 1.7134), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:31,593 - root - INFO - Step 18730: lr=1.00E-05, loss= 1.2393 (max= 1.7134), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:31,593 - root - INFO - Step 18730: lr=1.00E-05, loss= 1.2393 (max= 1.7134), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:31,593 - root - INFO - Step 18730: lr=1.00E-05, loss= 1.2393 (max= 1.7134), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:50:47,554 - root - INFO - Step 18740: lr=1.00E-05, loss= 1.2528 (max= 1.6671), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:50:47,554 - root - INFO - Step 18740: lr=1.00E-05, loss= 1.2528 (max= 1.6671), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:50:47,554 - root - INFO - Step 18740: lr=1.00E-05, loss= 1.2528 (max= 1.6671), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:50:47,554 - root - INFO - Step 18740: lr=1.00E-05, loss= 1.2528 (max= 1.6671), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:50:47,554 - root - INFO - Step 18740: lr=1.00E-05, loss= 1.2528 (max= 1.6671), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 18:50:47,554 - root - INFO - Step 18740: lr=1.00E-05, loss= 1.2528 (max= 1.6671), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:50:47,554 - root - INFO - Step 18740: lr=1.00E-05, loss= 1.2528 (max= 1.6671), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:50:47,554 - root - INFO - Step 18740: lr=1.00E-05, loss= 1.2528 (max= 1.6671), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:03,526 - root - INFO - Step 18750: lr=1.00E-05, loss= 1.2363 (max= 1.6519), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:03,526 - root - INFO - Step 18750: lr=1.00E-05, loss= 1.2363 (max= 1.6519), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:03,526 - root - INFO - Step 18750: lr=1.00E-05, loss= 1.2363 (max= 1.6519), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:03,526 - root - INFO - Step 18750: lr=1.00E-05, loss= 1.2363 (max= 1.6519), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:03,526 - root - INFO - Step 18750: lr=1.00E-05, loss= 1.2363 (max= 1.6519), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:03,526 - root - INFO - Step 18750: lr=1.00E-05, loss= 1.2363 (max= 1.6519), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:03,526 - root - INFO - Step 18750: lr=1.00E-05, loss= 1.2363 (max= 1.6519), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:03,526 - root - INFO - Step 18750: lr=1.00E-05, loss= 1.2363 (max= 1.6519), tps=20520, mfu=42.75%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:19,475 - root - INFO - Step 18760: lr=1.00E-05, loss= 1.2125 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:19,475 - root - INFO - Step 18760: lr=1.00E-05, loss= 1.2125 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:19,475 - root - INFO - Step 18760: lr=1.00E-05, loss= 1.2125 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:19,475 - root - INFO - Step 18760: lr=1.00E-05, loss= 1.2125 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:19,475 - root - INFO - Step 18760: lr=1.00E-05, loss= 1.2125 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:19,475 - root - INFO - Step 18760: lr=1.00E-05, loss= 1.2125 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:19,475 - root - INFO - Step 18760: lr=1.00E-05, loss= 1.2125 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:19,475 - root - INFO - Step 18760: lr=1.00E-05, loss= 1.2125 (max= 1.5483), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:35,395 - root - INFO - Step 18770: lr=1.00E-05, loss= 1.2205 (max= 1.8167), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:35,395 - root - INFO - Step 18770: lr=1.00E-05, loss= 1.2205 (max= 1.8167), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:35,395 - root - INFO - Step 18770: lr=1.00E-05, loss= 1.2205 
(max= 1.8167), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:35,395 - root - INFO - Step 18770: lr=1.00E-05, loss= 1.2205 (max= 1.8167), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:35,395 - root - INFO - Step 18770: lr=1.00E-05, loss= 1.2205 (max= 1.8167), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:35,395 - root - INFO - Step 18770: lr=1.00E-05, loss= 1.2205 (max= 1.8167), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:35,395 - root - INFO - Step 18770: lr=1.00E-05, loss= 1.2205 (max= 1.8167), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:35,395 - root - INFO - Step 18770: lr=1.00E-05, loss= 1.2205 (max= 1.8167), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:51:51,371 - root - INFO - Step 18780: lr=1.00E-05, loss= 1.2208 (max= 1.6552), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:51,371 - root - INFO - Step 18780: lr=1.00E-05, loss= 1.2208 (max= 1.6552), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:51,371 - root - INFO - Step 18780: lr=1.00E-05, loss= 1.2208 (max= 1.6552), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:51,371 - root - INFO - Step 18780: lr=1.00E-05, loss= 1.2208 (max= 1.6552), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:51,371 - root - INFO - Step 18780: lr=1.00E-05, loss= 1.2208 (max= 1.6552), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:51,371 - root 
- INFO - Step 18780: lr=1.00E-05, loss= 1.2208 (max= 1.6552), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:51,371 - root - INFO - Step 18780: lr=1.00E-05, loss= 1.2208 (max= 1.6552), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:51:51,371 - root - INFO - Step 18780: lr=1.00E-05, loss= 1.2208 (max= 1.6552), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:52:07,310 - root - INFO - Step 18790: lr=1.00E-05, loss= 1.2346 (max= 1.5182), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:07,310 - root - INFO - Step 18790: lr=1.00E-05, loss= 1.2346 (max= 1.5182), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:07,310 - root - INFO - Step 18790: lr=1.00E-05, loss= 1.2346 (max= 1.5182), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:07,310 - root - INFO - Step 18790: lr=1.00E-05, loss= 1.2346 (max= 1.5182), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:07,310 - root - INFO - Step 18790: lr=1.00E-05, loss= 1.2346 (max= 1.5182), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:07,310 - root - INFO - Step 18790: lr=1.00E-05, loss= 1.2346 (max= 1.5182), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:07,310 - root - INFO - Step 18790: lr=1.00E-05, loss= 1.2346 (max= 1.5182), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:07,310 - root - INFO - Step 18790: lr=1.00E-05, loss= 1.2346 (max= 1.5182), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 18:52:23,325 - root - INFO - Step 18800: lr=1.00E-05, loss= 1.2137 (max= 1.6482), tps=20464, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:23,325 - root - INFO - Step 18800: lr=1.00E-05, loss= 1.2137 (max= 1.6482), tps=20464, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:23,325 - root - INFO - Step 18800: lr=1.00E-05, loss= 1.2137 (max= 1.6482), tps=20464, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:23,325 - root - INFO - Step 18800: lr=1.00E-05, loss= 1.2137 (max= 1.6482), tps=20464, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:23,326 - root - INFO - Step 18800: lr=1.00E-05, loss= 1.2137 (max= 1.6482), tps=20464, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:23,326 - root - INFO - Step 18800: lr=1.00E-05, loss= 1.2137 (max= 1.6482), tps=20464, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:23,326 - root - INFO - Step 18800: lr=1.00E-05, loss= 1.2137 (max= 1.6482), tps=20464, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:23,326 - root - INFO - Step 18800: lr=1.00E-05, loss= 1.2137 (max= 1.6482), tps=20464, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:39,253 - root - INFO - Step 18810: lr=1.00E-05, loss= 1.2048 (max= 1.6073), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:39,253 - root - INFO - Step 18810: lr=1.00E-05, loss= 1.2048 (max= 1.6073), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:39,253 - root - INFO - Step 18810: lr=1.00E-05, loss= 1.2048 (max= 1.6073), tps=20577, mfu=42.87%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:39,253 - root - INFO - Step 18810: lr=1.00E-05, loss= 1.2048 (max= 1.6073), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:39,253 - root - INFO - Step 18810: lr=1.00E-05, loss= 1.2048 (max= 1.6073), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:39,253 - root - INFO - Step 18810: lr=1.00E-05, loss= 1.2048 (max= 1.6073), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:39,253 - root - INFO - Step 18810: lr=1.00E-05, loss= 1.2048 (max= 1.6073), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:39,253 - root - INFO - Step 18810: lr=1.00E-05, loss= 1.2048 (max= 1.6073), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:52:55,214 - root - INFO - Step 18820: lr=1.00E-05, loss= 1.2260 (max= 1.5978), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:52:55,214 - root - INFO - Step 18820: lr=1.00E-05, loss= 1.2260 (max= 1.5978), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:52:55,214 - root - INFO - Step 18820: lr=1.00E-05, loss= 1.2260 (max= 1.5978), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:52:55,214 - root - INFO - Step 18820: lr=1.00E-05, loss= 1.2260 (max= 1.5978), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:52:55,214 - root - INFO - Step 18820: lr=1.00E-05, loss= 1.2260 (max= 1.5978), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:52:55,214 - root - INFO - Step 18820: lr=1.00E-05, 
loss= 1.2260 (max= 1.5978), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:52:55,214 - root - INFO - Step 18820: lr=1.00E-05, loss= 1.2260 (max= 1.5978), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:52:55,214 - root - INFO - Step 18820: lr=1.00E-05, loss= 1.2260 (max= 1.5978), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:11,165 - root - INFO - Step 18830: lr=1.00E-05, loss= 1.2413 (max= 1.6052), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:11,165 - root - INFO - Step 18830: lr=1.00E-05, loss= 1.2413 (max= 1.6052), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:11,166 - root - INFO - Step 18830: lr=1.00E-05, loss= 1.2413 (max= 1.6052), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:11,166 - root - INFO - Step 18830: lr=1.00E-05, loss= 1.2413 (max= 1.6052), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:11,166 - root - INFO - Step 18830: lr=1.00E-05, loss= 1.2413 (max= 1.6052), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:11,166 - root - INFO - Step 18830: lr=1.00E-05, loss= 1.2413 (max= 1.6052), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:11,166 - root - INFO - Step 18830: lr=1.00E-05, loss= 1.2413 (max= 1.6052), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:11,166 - root - INFO - Step 18830: lr=1.00E-05, loss= 1.2413 (max= 1.6052), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
18:53:27,173 - root - INFO - Step 18840: lr=1.00E-05, loss= 1.2553 (max= 1.5997), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:27,173 - root - INFO - Step 18840: lr=1.00E-05, loss= 1.2553 (max= 1.5997), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:27,173 - root - INFO - Step 18840: lr=1.00E-05, loss= 1.2553 (max= 1.5997), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:27,173 - root - INFO - Step 18840: lr=1.00E-05, loss= 1.2553 (max= 1.5997), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:27,173 - root - INFO - Step 18840: lr=1.00E-05, loss= 1.2553 (max= 1.5997), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:27,173 - root - INFO - Step 18840: lr=1.00E-05, loss= 1.2553 (max= 1.5997), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:27,173 - root - INFO - Step 18840: lr=1.00E-05, loss= 1.2553 (max= 1.5997), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:27,173 - root - INFO - Step 18840: lr=1.00E-05, loss= 1.2553 (max= 1.5997), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:53:43,089 - root - INFO - Step 18850: lr=1.00E-05, loss= 1.2499 (max= 1.7650), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:43,089 - root - INFO - Step 18850: lr=1.00E-05, loss= 1.2499 (max= 1.7650), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:43,089 - root - INFO - Step 18850: lr=1.00E-05, loss= 1.2499 (max= 1.7650), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:43,089 - root - INFO - Step 18850: lr=1.00E-05, loss= 1.2499 (max= 1.7650), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:43,090 - root - INFO - Step 18850: lr=1.00E-05, loss= 1.2499 (max= 1.7650), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:43,089 - root - INFO - Step 18850: lr=1.00E-05, loss= 1.2499 (max= 1.7650), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:43,090 - root - INFO - Step 18850: lr=1.00E-05, loss= 1.2499 (max= 1.7650), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:43,090 - root - INFO - Step 18850: lr=1.00E-05, loss= 1.2499 (max= 1.7650), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:59,005 - root - INFO - Step 18860: lr=1.00E-05, loss= 1.2129 (max= 1.7667), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:59,005 - root - INFO - Step 18860: lr=1.00E-05, loss= 1.2129 (max= 1.7667), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:59,005 - root - INFO - Step 18860: lr=1.00E-05, loss= 1.2129 (max= 1.7667), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:59,005 - root - INFO - Step 18860: lr=1.00E-05, loss= 1.2129 (max= 1.7667), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:59,005 - root - INFO - Step 18860: lr=1.00E-05, loss= 1.2129 (max= 1.7667), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:59,005 - root - INFO - Step 18860: lr=1.00E-05, loss= 1.2129 (max= 1.7667), 
tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:59,005 - root - INFO - Step 18860: lr=1.00E-05, loss= 1.2129 (max= 1.7667), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:53:59,006 - root - INFO - Step 18860: lr=1.00E-05, loss= 1.2129 (max= 1.7667), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:54:14,958 - root - INFO - Step 18870: lr=1.00E-05, loss= 1.2079 (max= 1.5500), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:14,958 - root - INFO - Step 18870: lr=1.00E-05, loss= 1.2079 (max= 1.5500), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:14,958 - root - INFO - Step 18870: lr=1.00E-05, loss= 1.2079 (max= 1.5500), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:14,958 - root - INFO - Step 18870: lr=1.00E-05, loss= 1.2079 (max= 1.5500), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:14,958 - root - INFO - Step 18870: lr=1.00E-05, loss= 1.2079 (max= 1.5500), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:14,958 - root - INFO - Step 18870: lr=1.00E-05, loss= 1.2079 (max= 1.5500), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:14,958 - root - INFO - Step 18870: lr=1.00E-05, loss= 1.2079 (max= 1.5500), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:14,958 - root - INFO - Step 18870: lr=1.00E-05, loss= 1.2079 (max= 1.5500), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:30,878 - root - INFO - Step 
18880: lr=1.00E-05, loss= 1.2526 (max= 1.6560), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:54:30,878 - root - INFO - Step 18880: lr=1.00E-05, loss= 1.2526 (max= 1.6560), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:54:30,878 - root - INFO - Step 18880: lr=1.00E-05, loss= 1.2526 (max= 1.6560), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:54:30,878 - root - INFO - Step 18880: lr=1.00E-05, loss= 1.2526 (max= 1.6560), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:54:30,878 - root - INFO - Step 18880: lr=1.00E-05, loss= 1.2526 (max= 1.6560), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:54:30,878 - root - INFO - Step 18880: lr=1.00E-05, loss= 1.2526 (max= 1.6560), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:54:30,878 - root - INFO - Step 18880: lr=1.00E-05, loss= 1.2526 (max= 1.6560), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:54:30,879 - root - INFO - Step 18880: lr=1.00E-05, loss= 1.2526 (max= 1.6560), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:54:46,840 - root - INFO - Step 18890: lr=1.00E-05, loss= 1.2465 (max= 1.7189), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:46,840 - root - INFO - Step 18890: lr=1.00E-05, loss= 1.2465 (max= 1.7189), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:46,840 - root - INFO - Step 18890: lr=1.00E-05, loss= 1.2465 (max= 1.7189), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 18:54:46,840 - root - INFO - Step 18890: lr=1.00E-05, loss= 1.2465 (max= 1.7189), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:46,840 - root - INFO - Step 18890: lr=1.00E-05, loss= 1.2465 (max= 1.7189), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:46,840 - root - INFO - Step 18890: lr=1.00E-05, loss= 1.2465 (max= 1.7189), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:46,840 - root - INFO - Step 18890: lr=1.00E-05, loss= 1.2465 (max= 1.7189), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:54:46,840 - root - INFO - Step 18890: lr=1.00E-05, loss= 1.2465 (max= 1.7189), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:02,746 - root - INFO - Step 18900: lr=1.00E-05, loss= 1.2151 (max= 1.5594), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:02,746 - root - INFO - Step 18900: lr=1.00E-05, loss= 1.2151 (max= 1.5594), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:02,747 - root - INFO - Step 18900: lr=1.00E-05, loss= 1.2151 (max= 1.5594), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:02,747 - root - INFO - Step 18900: lr=1.00E-05, loss= 1.2151 (max= 1.5594), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:02,747 - root - INFO - Step 18900: lr=1.00E-05, loss= 1.2151 (max= 1.5594), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:02,747 - root - INFO - Step 18900: lr=1.00E-05, loss= 1.2151 (max= 1.5594), tps=20605, mfu=42.93%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:02,747 - root - INFO - Step 18900: lr=1.00E-05, loss= 1.2151 (max= 1.5594), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:02,747 - root - INFO - Step 18900: lr=1.00E-05, loss= 1.2151 (max= 1.5594), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:18,676 - root - INFO - Step 18910: lr=1.00E-05, loss= 1.2295 (max= 1.6511), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:18,676 - root - INFO - Step 18910: lr=1.00E-05, loss= 1.2295 (max= 1.6511), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:18,676 - root - INFO - Step 18910: lr=1.00E-05, loss= 1.2295 (max= 1.6511), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:18,676 - root - INFO - Step 18910: lr=1.00E-05, loss= 1.2295 (max= 1.6511), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:18,676 - root - INFO - Step 18910: lr=1.00E-05, loss= 1.2295 (max= 1.6511), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:18,676 - root - INFO - Step 18910: lr=1.00E-05, loss= 1.2295 (max= 1.6511), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:18,676 - root - INFO - Step 18910: lr=1.00E-05, loss= 1.2295 (max= 1.6511), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:18,676 - root - INFO - Step 18910: lr=1.00E-05, loss= 1.2295 (max= 1.6511), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:34,633 - root - INFO - Step 18920: lr=1.00E-05, loss= 1.2230 
(max= 1.6650), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:34,633 - root - INFO - Step 18920: lr=1.00E-05, loss= 1.2230 (max= 1.6650), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:34,633 - root - INFO - Step 18920: lr=1.00E-05, loss= 1.2230 (max= 1.6650), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:34,633 - root - INFO - Step 18920: lr=1.00E-05, loss= 1.2230 (max= 1.6650), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:34,633 - root - INFO - Step 18920: lr=1.00E-05, loss= 1.2230 (max= 1.6650), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:34,633 - root - INFO - Step 18920: lr=1.00E-05, loss= 1.2230 (max= 1.6650), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:34,633 - root - INFO - Step 18920: lr=1.00E-05, loss= 1.2230 (max= 1.6650), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:34,633 - root - INFO - Step 18920: lr=1.00E-05, loss= 1.2230 (max= 1.6650), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:55:50,548 - root - INFO - Step 18930: lr=1.00E-05, loss= 1.2464 (max= 1.7725), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:55:50,548 - root - INFO - Step 18930: lr=1.00E-05, loss= 1.2464 (max= 1.7725), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:55:50,548 - root - INFO - Step 18930: lr=1.00E-05, loss= 1.2464 (max= 1.7725), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:55:50,548 - root 
- INFO - Step 18930: lr=1.00E-05, loss= 1.2464 (max= 1.7725), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:55:50,548 - root - INFO - Step 18930: lr=1.00E-05, loss= 1.2464 (max= 1.7725), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:55:50,548 - root - INFO - Step 18930: lr=1.00E-05, loss= 1.2464 (max= 1.7725), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:55:50,548 - root - INFO - Step 18930: lr=1.00E-05, loss= 1.2464 (max= 1.7725), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:55:50,549 - root - INFO - Step 18930: lr=1.00E-05, loss= 1.2464 (max= 1.7725), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:56:06,493 - root - INFO - Step 18940: lr=1.00E-05, loss= 1.2555 (max= 1.5221), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:06,493 - root - INFO - Step 18940: lr=1.00E-05, loss= 1.2555 (max= 1.5221), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:06,493 - root - INFO - Step 18940: lr=1.00E-05, loss= 1.2555 (max= 1.5221), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:06,493 - root - INFO - Step 18940: lr=1.00E-05, loss= 1.2555 (max= 1.5221), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:06,493 - root - INFO - Step 18940: lr=1.00E-05, loss= 1.2555 (max= 1.5221), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:06,493 - root - INFO - Step 18940: lr=1.00E-05, loss= 1.2555 (max= 1.5221), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 18:56:06,493 - root - INFO - Step 18940: lr=1.00E-05, loss= 1.2555 (max= 1.5221), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:06,493 - root - INFO - Step 18940: lr=1.00E-05, loss= 1.2555 (max= 1.5221), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:22,454 - root - INFO - Step 18950: lr=1.00E-05, loss= 1.2349 (max= 1.6878), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:22,455 - root - INFO - Step 18950: lr=1.00E-05, loss= 1.2349 (max= 1.6878), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:22,455 - root - INFO - Step 18950: lr=1.00E-05, loss= 1.2349 (max= 1.6878), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:22,455 - root - INFO - Step 18950: lr=1.00E-05, loss= 1.2349 (max= 1.6878), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:22,455 - root - INFO - Step 18950: lr=1.00E-05, loss= 1.2349 (max= 1.6878), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:22,455 - root - INFO - Step 18950: lr=1.00E-05, loss= 1.2349 (max= 1.6878), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:22,455 - root - INFO - Step 18950: lr=1.00E-05, loss= 1.2349 (max= 1.6878), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:22,455 - root - INFO - Step 18950: lr=1.00E-05, loss= 1.2349 (max= 1.6878), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:38,347 - root - INFO - Step 18960: lr=1.00E-05, loss= 1.2485 (max= 1.6273), tps=20623, mfu=42.97%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:38,347 - root - INFO - Step 18960: lr=1.00E-05, loss= 1.2485 (max= 1.6273), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:38,347 - root - INFO - Step 18960: lr=1.00E-05, loss= 1.2485 (max= 1.6273), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:38,347 - root - INFO - Step 18960: lr=1.00E-05, loss= 1.2485 (max= 1.6273), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:38,347 - root - INFO - Step 18960: lr=1.00E-05, loss= 1.2485 (max= 1.6273), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:38,347 - root - INFO - Step 18960: lr=1.00E-05, loss= 1.2485 (max= 1.6273), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:38,347 - root - INFO - Step 18960: lr=1.00E-05, loss= 1.2485 (max= 1.6273), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:38,348 - root - INFO - Step 18960: lr=1.00E-05, loss= 1.2485 (max= 1.6273), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:54,241 - root - INFO - Step 18970: lr=1.00E-05, loss= 1.2457 (max= 1.6115), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:54,241 - root - INFO - Step 18970: lr=1.00E-05, loss= 1.2457 (max= 1.6115), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:54,241 - root - INFO - Step 18970: lr=1.00E-05, loss= 1.2457 (max= 1.6115), tps=20621, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:54,241 - root - INFO - Step 18970: lr=1.00E-05, 
loss= 1.2457 (max= 1.6115), tps=20621, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:54,241 - root - INFO - Step 18970: lr=1.00E-05, loss= 1.2457 (max= 1.6115), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:54,241 - root - INFO - Step 18970: lr=1.00E-05, loss= 1.2457 (max= 1.6115), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:54,241 - root - INFO - Step 18970: lr=1.00E-05, loss= 1.2457 (max= 1.6115), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:56:54,241 - root - INFO - Step 18970: lr=1.00E-05, loss= 1.2457 (max= 1.6115), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:10,196 - root - INFO - Step 18980: lr=1.00E-05, loss= 1.2469 (max= 1.6727), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:10,196 - root - INFO - Step 18980: lr=1.00E-05, loss= 1.2469 (max= 1.6727), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:10,196 - root - INFO - Step 18980: lr=1.00E-05, loss= 1.2469 (max= 1.6727), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:10,196 - root - INFO - Step 18980: lr=1.00E-05, loss= 1.2469 (max= 1.6727), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:10,196 - root - INFO - Step 18980: lr=1.00E-05, loss= 1.2469 (max= 1.6727), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:10,196 - root - INFO - Step 18980: lr=1.00E-05, loss= 1.2469 (max= 1.6727), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
18:57:10,196 - root - INFO - Step 18980: lr=1.00E-05, loss= 1.2469 (max= 1.6727), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:10,196 - root - INFO - Step 18980: lr=1.00E-05, loss= 1.2469 (max= 1.6727), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:26,131 - root - INFO - Step 18990: lr=1.00E-05, loss= 1.2383 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:57:26,131 - root - INFO - Step 18990: lr=1.00E-05, loss= 1.2383 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:57:26,131 - root - INFO - Step 18990: lr=1.00E-05, loss= 1.2383 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:57:26,131 - root - INFO - Step 18990: lr=1.00E-05, loss= 1.2383 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:57:26,131 - root - INFO - Step 18990: lr=1.00E-05, loss= 1.2383 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:57:26,131 - root - INFO - Step 18990: lr=1.00E-05, loss= 1.2383 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:57:26,131 - root - INFO - Step 18990: lr=1.00E-05, loss= 1.2383 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:57:26,131 - root - INFO - Step 18990: lr=1.00E-05, loss= 1.2383 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-19000 +Dataset successfully saved to 
jobs/munin-7b-open-stage1/checkpoints/dataloader/step-19000! Save time: 4.3753955364227295 +2025-10-24 18:57:42,069 - root - INFO - Step 19000: lr=1.00E-05, loss= 1.2803 (max= 1.6395), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:42,069 - root - INFO - Saving a full checkpoint at step 19000 +2025-10-24 18:57:42,069 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:57:42,069 - root - INFO - Step 19000: lr=1.00E-05, loss= 1.2803 (max= 1.6395), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:42,069 - root - INFO - Saving a full checkpoint at step 19000 +2025-10-24 18:57:42,069 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:57:42,069 - root - INFO - Step 19000: lr=1.00E-05, loss= 1.2803 (max= 1.6395), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:42,069 - root - INFO - Step 19000: lr=1.00E-05, loss= 1.2803 (max= 1.6395), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:42,069 - root - INFO - Saving a full checkpoint at step 19000 +2025-10-24 18:57:42,069 - root - INFO - Step 19000: lr=1.00E-05, loss= 1.2803 (max= 1.6395), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:42,069 - root - INFO - Saving a full checkpoint at step 19000 +2025-10-24 18:57:42,069 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:57:42,069 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:57:42,069 - root - INFO - Step 19000: lr=1.00E-05, loss= 1.2803 (max= 
1.6395), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:42,069 - root - INFO - Step 19000: lr=1.00E-05, loss= 1.2803 (max= 1.6395), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:42,069 - root - INFO - Saving a full checkpoint at step 19000 +2025-10-24 18:57:42,069 - root - INFO - Step 19000: lr=1.00E-05, loss= 1.2803 (max= 1.6395), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:57:42,069 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:57:42,069 - root - INFO - Saving a full checkpoint at step 19000 +2025-10-24 18:57:42,069 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:57:42,069 - root - INFO - Saving a full checkpoint at step 19000 +2025-10-24 18:57:42,069 - root - INFO - Saving a full checkpoint at step 19000 +2025-10-24 18:57:42,069 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:57:42,069 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 18:57:56,966 - root - INFO - Finished saving the checkpoint in 14.90 seconds +2025-10-24 18:57:56,972 - root - INFO - Finished saving the checkpoint in 14.90 seconds +2025-10-24 18:57:56,972 - root - INFO - Finished saving the checkpoint in 14.90 seconds +2025-10-24 18:57:56,972 - root - INFO - Finished saving the checkpoint in 14.90 seconds +2025-10-24 18:57:56,973 - root - INFO - Finished saving the checkpoint in 14.90 seconds +2025-10-24 18:57:56,973 - root - INFO - Finished saving the checkpoint in 14.90 seconds +2025-10-24 18:57:56,973 - root - INFO - Finished saving the checkpoint in 14.90 seconds +2025-10-24 
18:57:56,974 - root - INFO - Finished saving the checkpoint in 14.90 seconds +2025-10-24 18:58:12,850 - root - INFO - Step 19010: lr=1.00E-05, loss= 1.2448 (max= 1.6730), tps=10647, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:58:12,850 - root - INFO - Step 19010: lr=1.00E-05, loss= 1.2448 (max= 1.6730), tps=10647, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:58:12,850 - root - INFO - Step 19010: lr=1.00E-05, loss= 1.2448 (max= 1.6730), tps=10647, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:58:12,850 - root - INFO - Step 19010: lr=1.00E-05, loss= 1.2448 (max= 1.6730), tps=10647, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:58:12,850 - root - INFO - Step 19010: lr=1.00E-05, loss= 1.2448 (max= 1.6730), tps=10647, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:58:12,850 - root - INFO - Step 19010: lr=1.00E-05, loss= 1.2448 (max= 1.6730), tps=10647, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:58:12,850 - root - INFO - Step 19010: lr=1.00E-05, loss= 1.2448 (max= 1.6730), tps=10647, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:58:12,850 - root - INFO - Step 19010: lr=1.00E-05, loss= 1.2448 (max= 1.6730), tps=10647, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 18:58:28,806 - root - INFO - Step 19020: lr=1.00E-05, loss= 1.2668 (max= 1.5562), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:28,806 - root - INFO - Step 19020: lr=1.00E-05, loss= 1.2668 (max= 1.5562), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:28,806 - root - INFO - Step 19020: 
lr=1.00E-05, loss= 1.2668 (max= 1.5562), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:28,806 - root - INFO - Step 19020: lr=1.00E-05, loss= 1.2668 (max= 1.5562), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:28,806 - root - INFO - Step 19020: lr=1.00E-05, loss= 1.2668 (max= 1.5562), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:28,806 - root - INFO - Step 19020: lr=1.00E-05, loss= 1.2668 (max= 1.5562), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:28,806 - root - INFO - Step 19020: lr=1.00E-05, loss= 1.2668 (max= 1.5562), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:28,806 - root - INFO - Step 19020: lr=1.00E-05, loss= 1.2668 (max= 1.5562), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:44,750 - root - INFO - Step 19030: lr=1.00E-05, loss= 1.2299 (max= 1.6868), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:44,750 - root - INFO - Step 19030: lr=1.00E-05, loss= 1.2299 (max= 1.6868), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:44,750 - root - INFO - Step 19030: lr=1.00E-05, loss= 1.2299 (max= 1.6868), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:44,750 - root - INFO - Step 19030: lr=1.00E-05, loss= 1.2299 (max= 1.6868), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:44,750 - root - INFO - Step 19030: lr=1.00E-05, loss= 1.2299 (max= 1.6868), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 18:58:44,750 - root - INFO - Step 19030: lr=1.00E-05, loss= 1.2299 (max= 1.6868), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:44,750 - root - INFO - Step 19030: lr=1.00E-05, loss= 1.2299 (max= 1.6868), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:58:44,750 - root - INFO - Step 19030: lr=1.00E-05, loss= 1.2299 (max= 1.6868), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:00,679 - root - INFO - Step 19040: lr=1.00E-05, loss= 1.2194 (max= 1.6073), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:00,679 - root - INFO - Step 19040: lr=1.00E-05, loss= 1.2194 (max= 1.6073), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:00,679 - root - INFO - Step 19040: lr=1.00E-05, loss= 1.2194 (max= 1.6073), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:00,679 - root - INFO - Step 19040: lr=1.00E-05, loss= 1.2194 (max= 1.6073), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:00,679 - root - INFO - Step 19040: lr=1.00E-05, loss= 1.2194 (max= 1.6073), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:00,679 - root - INFO - Step 19040: lr=1.00E-05, loss= 1.2194 (max= 1.6073), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:00,679 - root - INFO - Step 19040: lr=1.00E-05, loss= 1.2194 (max= 1.6073), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:00,679 - root - INFO - Step 19040: lr=1.00E-05, loss= 1.2194 (max= 1.6073), tps=20576, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:16,583 - root - INFO - Step 19050: lr=1.00E-05, loss= 1.2223 (max= 1.8398), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:16,583 - root - INFO - Step 19050: lr=1.00E-05, loss= 1.2223 (max= 1.8398), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:16,583 - root - INFO - Step 19050: lr=1.00E-05, loss= 1.2223 (max= 1.8398), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:16,583 - root - INFO - Step 19050: lr=1.00E-05, loss= 1.2223 (max= 1.8398), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:16,584 - root - INFO - Step 19050: lr=1.00E-05, loss= 1.2223 (max= 1.8398), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:16,584 - root - INFO - Step 19050: lr=1.00E-05, loss= 1.2223 (max= 1.8398), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:16,584 - root - INFO - Step 19050: lr=1.00E-05, loss= 1.2223 (max= 1.8398), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:16,584 - root - INFO - Step 19050: lr=1.00E-05, loss= 1.2223 (max= 1.8398), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:32,546 - root - INFO - Step 19060: lr=1.00E-05, loss= 1.2504 (max= 1.5591), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:32,546 - root - INFO - Step 19060: lr=1.00E-05, loss= 1.2504 (max= 1.5591), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:32,546 - root - INFO - Step 19060: lr=1.00E-05, loss= 1.2504 
(max= 1.5591), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:32,546 - root - INFO - Step 19060: lr=1.00E-05, loss= 1.2504 (max= 1.5591), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:32,546 - root - INFO - Step 19060: lr=1.00E-05, loss= 1.2504 (max= 1.5591), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:32,546 - root - INFO - Step 19060: lr=1.00E-05, loss= 1.2504 (max= 1.5591), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:32,546 - root - INFO - Step 19060: lr=1.00E-05, loss= 1.2504 (max= 1.5591), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:32,546 - root - INFO - Step 19060: lr=1.00E-05, loss= 1.2504 (max= 1.5591), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:48,508 - root - INFO - Step 19070: lr=1.00E-05, loss= 1.2232 (max= 1.5987), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:48,508 - root - INFO - Step 19070: lr=1.00E-05, loss= 1.2232 (max= 1.5987), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:48,509 - root - INFO - Step 19070: lr=1.00E-05, loss= 1.2232 (max= 1.5987), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:48,509 - root - INFO - Step 19070: lr=1.00E-05, loss= 1.2232 (max= 1.5987), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:48,509 - root - INFO - Step 19070: lr=1.00E-05, loss= 1.2232 (max= 1.5987), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:48,509 - root 
- INFO - Step 19070: lr=1.00E-05, loss= 1.2232 (max= 1.5987), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:48,509 - root - INFO - Step 19070: lr=1.00E-05, loss= 1.2232 (max= 1.5987), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 18:59:48,509 - root - INFO - Step 19070: lr=1.00E-05, loss= 1.2232 (max= 1.5987), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:04,411 - root - INFO - Step 19080: lr=1.00E-05, loss= 1.1934 (max= 1.5412), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:04,411 - root - INFO - Step 19080: lr=1.00E-05, loss= 1.1934 (max= 1.5412), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:04,411 - root - INFO - Step 19080: lr=1.00E-05, loss= 1.1934 (max= 1.5412), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:04,411 - root - INFO - Step 19080: lr=1.00E-05, loss= 1.1934 (max= 1.5412), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:04,411 - root - INFO - Step 19080: lr=1.00E-05, loss= 1.1934 (max= 1.5412), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:04,411 - root - INFO - Step 19080: lr=1.00E-05, loss= 1.1934 (max= 1.5412), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:04,411 - root - INFO - Step 19080: lr=1.00E-05, loss= 1.1934 (max= 1.5412), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:04,412 - root - INFO - Step 19080: lr=1.00E-05, loss= 1.1934 (max= 1.5412), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 19:00:20,335 - root - INFO - Step 19090: lr=1.00E-05, loss= 1.2235 (max= 1.5551), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:20,335 - root - INFO - Step 19090: lr=1.00E-05, loss= 1.2235 (max= 1.5551), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:20,335 - root - INFO - Step 19090: lr=1.00E-05, loss= 1.2235 (max= 1.5551), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:20,335 - root - INFO - Step 19090: lr=1.00E-05, loss= 1.2235 (max= 1.5551), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:20,335 - root - INFO - Step 19090: lr=1.00E-05, loss= 1.2235 (max= 1.5551), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:20,335 - root - INFO - Step 19090: lr=1.00E-05, loss= 1.2235 (max= 1.5551), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:20,335 - root - INFO - Step 19090: lr=1.00E-05, loss= 1.2235 (max= 1.5551), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:20,336 - root - INFO - Step 19090: lr=1.00E-05, loss= 1.2235 (max= 1.5551), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:36,257 - root - INFO - Step 19100: lr=1.00E-05, loss= 1.2098 (max= 1.7057), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:36,257 - root - INFO - Step 19100: lr=1.00E-05, loss= 1.2098 (max= 1.7057), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:36,257 - root - INFO - Step 19100: lr=1.00E-05, loss= 1.2098 (max= 1.7057), tps=20585, mfu=42.89%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:36,258 - root - INFO - Step 19100: lr=1.00E-05, loss= 1.2098 (max= 1.7057), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:36,258 - root - INFO - Step 19100: lr=1.00E-05, loss= 1.2098 (max= 1.7057), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:36,258 - root - INFO - Step 19100: lr=1.00E-05, loss= 1.2098 (max= 1.7057), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:36,258 - root - INFO - Step 19100: lr=1.00E-05, loss= 1.2098 (max= 1.7057), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:36,258 - root - INFO - Step 19100: lr=1.00E-05, loss= 1.2098 (max= 1.7057), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:00:52,215 - root - INFO - Step 19110: lr=1.00E-05, loss= 1.2241 (max= 1.6208), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:00:52,215 - root - INFO - Step 19110: lr=1.00E-05, loss= 1.2241 (max= 1.6208), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:00:52,215 - root - INFO - Step 19110: lr=1.00E-05, loss= 1.2241 (max= 1.6208), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:00:52,215 - root - INFO - Step 19110: lr=1.00E-05, loss= 1.2241 (max= 1.6208), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:00:52,215 - root - INFO - Step 19110: lr=1.00E-05, loss= 1.2241 (max= 1.6208), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:00:52,215 - root - INFO - Step 19110: lr=1.00E-05, 
loss= 1.2241 (max= 1.6208), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:00:52,215 - root - INFO - Step 19110: lr=1.00E-05, loss= 1.2241 (max= 1.6208), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:00:52,215 - root - INFO - Step 19110: lr=1.00E-05, loss= 1.2241 (max= 1.6208), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:01:08,122 - root - INFO - Step 19120: lr=1.00E-05, loss= 1.2154 (max= 1.5864), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:01:08,122 - root - INFO - Step 19120: lr=1.00E-05, loss= 1.2154 (max= 1.5864), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:01:08,122 - root - INFO - Step 19120: lr=1.00E-05, loss= 1.2154 (max= 1.5864), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:01:08,122 - root - INFO - Step 19120: lr=1.00E-05, loss= 1.2154 (max= 1.5864), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:01:08,122 - root - INFO - Step 19120: lr=1.00E-05, loss= 1.2154 (max= 1.5864), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:01:08,122 - root - INFO - Step 19120: lr=1.00E-05, loss= 1.2154 (max= 1.5864), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:01:08,122 - root - INFO - Step 19120: lr=1.00E-05, loss= 1.2154 (max= 1.5864), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:01:08,122 - root - INFO - Step 19120: lr=1.00E-05, loss= 1.2154 (max= 1.5864), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
19:01:24,115 - root - INFO - Step 19130: lr=1.00E-05, loss= 1.2190 (max= 1.5816), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:24,116 - root - INFO - Step 19130: lr=1.00E-05, loss= 1.2190 (max= 1.5816), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:24,116 - root - INFO - Step 19130: lr=1.00E-05, loss= 1.2190 (max= 1.5816), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:24,116 - root - INFO - Step 19130: lr=1.00E-05, loss= 1.2190 (max= 1.5816), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:24,116 - root - INFO - Step 19130: lr=1.00E-05, loss= 1.2190 (max= 1.5816), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:24,116 - root - INFO - Step 19130: lr=1.00E-05, loss= 1.2190 (max= 1.5816), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:24,116 - root - INFO - Step 19130: lr=1.00E-05, loss= 1.2190 (max= 1.5816), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:24,116 - root - INFO - Step 19130: lr=1.00E-05, loss= 1.2190 (max= 1.5816), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:40,062 - root - INFO - Step 19140: lr=1.00E-05, loss= 1.2404 (max= 1.8324), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:40,062 - root - INFO - Step 19140: lr=1.00E-05, loss= 1.2404 (max= 1.8324), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:40,062 - root - INFO - Step 19140: lr=1.00E-05, loss= 1.2404 (max= 1.8324), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:40,062 - root - INFO - Step 19140: lr=1.00E-05, loss= 1.2404 (max= 1.8324), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:40,063 - root - INFO - Step 19140: lr=1.00E-05, loss= 1.2404 (max= 1.8324), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:40,063 - root - INFO - Step 19140: lr=1.00E-05, loss= 1.2404 (max= 1.8324), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:40,063 - root - INFO - Step 19140: lr=1.00E-05, loss= 1.2404 (max= 1.8324), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:40,063 - root - INFO - Step 19140: lr=1.00E-05, loss= 1.2404 (max= 1.8324), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:55,996 - root - INFO - Step 19150: lr=1.00E-05, loss= 1.2105 (max= 1.5916), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:55,997 - root - INFO - Step 19150: lr=1.00E-05, loss= 1.2105 (max= 1.5916), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:55,997 - root - INFO - Step 19150: lr=1.00E-05, loss= 1.2105 (max= 1.5916), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:55,997 - root - INFO - Step 19150: lr=1.00E-05, loss= 1.2105 (max= 1.5916), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:55,997 - root - INFO - Step 19150: lr=1.00E-05, loss= 1.2105 (max= 1.5916), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:55,997 - root - INFO - Step 19150: lr=1.00E-05, loss= 1.2105 (max= 1.5916), 
tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:55,997 - root - INFO - Step 19150: lr=1.00E-05, loss= 1.2105 (max= 1.5916), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:01:55,997 - root - INFO - Step 19150: lr=1.00E-05, loss= 1.2105 (max= 1.5916), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:11,932 - root - INFO - Step 19160: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:11,932 - root - INFO - Step 19160: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:11,932 - root - INFO - Step 19160: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:11,932 - root - INFO - Step 19160: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:11,932 - root - INFO - Step 19160: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:11,932 - root - INFO - Step 19160: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:11,932 - root - INFO - Step 19160: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:11,932 - root - INFO - Step 19160: lr=1.00E-05, loss= 1.2307 (max= 1.5593), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:27,854 - root - INFO - Step 
19170: lr=1.00E-05, loss= 1.2298 (max= 1.8603), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:27,854 - root - INFO - Step 19170: lr=1.00E-05, loss= 1.2298 (max= 1.8603), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:27,854 - root - INFO - Step 19170: lr=1.00E-05, loss= 1.2298 (max= 1.8603), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:27,854 - root - INFO - Step 19170: lr=1.00E-05, loss= 1.2298 (max= 1.8603), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:27,854 - root - INFO - Step 19170: lr=1.00E-05, loss= 1.2298 (max= 1.8603), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:27,854 - root - INFO - Step 19170: lr=1.00E-05, loss= 1.2298 (max= 1.8603), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:27,854 - root - INFO - Step 19170: lr=1.00E-05, loss= 1.2298 (max= 1.8603), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:27,854 - root - INFO - Step 19170: lr=1.00E-05, loss= 1.2298 (max= 1.8603), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:43,764 - root - INFO - Step 19180: lr=1.00E-05, loss= 1.2450 (max= 1.5970), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:43,764 - root - INFO - Step 19180: lr=1.00E-05, loss= 1.2450 (max= 1.5970), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:43,764 - root - INFO - Step 19180: lr=1.00E-05, loss= 1.2450 (max= 1.5970), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 19:02:43,764 - root - INFO - Step 19180: lr=1.00E-05, loss= 1.2450 (max= 1.5970), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:43,764 - root - INFO - Step 19180: lr=1.00E-05, loss= 1.2450 (max= 1.5970), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:43,764 - root - INFO - Step 19180: lr=1.00E-05, loss= 1.2450 (max= 1.5970), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:43,764 - root - INFO - Step 19180: lr=1.00E-05, loss= 1.2450 (max= 1.5970), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:43,764 - root - INFO - Step 19180: lr=1.00E-05, loss= 1.2450 (max= 1.5970), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:59,696 - root - INFO - Step 19190: lr=1.00E-05, loss= 1.2177 (max= 1.4471), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:59,696 - root - INFO - Step 19190: lr=1.00E-05, loss= 1.2177 (max= 1.4471), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:59,696 - root - INFO - Step 19190: lr=1.00E-05, loss= 1.2177 (max= 1.4471), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:59,696 - root - INFO - Step 19190: lr=1.00E-05, loss= 1.2177 (max= 1.4471), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:59,696 - root - INFO - Step 19190: lr=1.00E-05, loss= 1.2177 (max= 1.4471), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:59,696 - root - INFO - Step 19190: lr=1.00E-05, loss= 1.2177 (max= 1.4471), tps=20572, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:59,696 - root - INFO - Step 19190: lr=1.00E-05, loss= 1.2177 (max= 1.4471), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:02:59,696 - root - INFO - Step 19190: lr=1.00E-05, loss= 1.2177 (max= 1.4471), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:15,630 - root - INFO - Step 19200: lr=1.00E-05, loss= 1.2138 (max= 1.5541), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:15,630 - root - INFO - Step 19200: lr=1.00E-05, loss= 1.2138 (max= 1.5541), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:15,630 - root - INFO - Step 19200: lr=1.00E-05, loss= 1.2138 (max= 1.5541), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:15,630 - root - INFO - Step 19200: lr=1.00E-05, loss= 1.2138 (max= 1.5541), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:15,630 - root - INFO - Step 19200: lr=1.00E-05, loss= 1.2138 (max= 1.5541), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:15,630 - root - INFO - Step 19200: lr=1.00E-05, loss= 1.2138 (max= 1.5541), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:15,630 - root - INFO - Step 19200: lr=1.00E-05, loss= 1.2138 (max= 1.5541), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:15,631 - root - INFO - Step 19200: lr=1.00E-05, loss= 1.2138 (max= 1.5541), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:31,544 - root - INFO - Step 19210: lr=1.00E-05, loss= 1.2380 
(max= 1.6143), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:31,544 - root - INFO - Step 19210: lr=1.00E-05, loss= 1.2380 (max= 1.6143), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:31,544 - root - INFO - Step 19210: lr=1.00E-05, loss= 1.2380 (max= 1.6143), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:31,544 - root - INFO - Step 19210: lr=1.00E-05, loss= 1.2380 (max= 1.6143), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:31,544 - root - INFO - Step 19210: lr=1.00E-05, loss= 1.2380 (max= 1.6143), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:31,544 - root - INFO - Step 19210: lr=1.00E-05, loss= 1.2380 (max= 1.6143), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:31,544 - root - INFO - Step 19210: lr=1.00E-05, loss= 1.2380 (max= 1.6143), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:31,544 - root - INFO - Step 19210: lr=1.00E-05, loss= 1.2380 (max= 1.6143), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:47,426 - root - INFO - Step 19220: lr=1.00E-05, loss= 1.2484 (max= 1.5824), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:47,427 - root - INFO - Step 19220: lr=1.00E-05, loss= 1.2484 (max= 1.5824), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:47,427 - root - INFO - Step 19220: lr=1.00E-05, loss= 1.2484 (max= 1.5824), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:47,427 - root 
- INFO - Step 19220: lr=1.00E-05, loss= 1.2484 (max= 1.5824), tps=20636, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:47,427 - root - INFO - Step 19220: lr=1.00E-05, loss= 1.2484 (max= 1.5824), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:47,427 - root - INFO - Step 19220: lr=1.00E-05, loss= 1.2484 (max= 1.5824), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:47,427 - root - INFO - Step 19220: lr=1.00E-05, loss= 1.2484 (max= 1.5824), tps=20636, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:03:47,427 - root - INFO - Step 19220: lr=1.00E-05, loss= 1.2484 (max= 1.5824), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:03,418 - root - INFO - Step 19230: lr=1.00E-05, loss= 1.1932 (max= 1.6147), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:03,418 - root - INFO - Step 19230: lr=1.00E-05, loss= 1.1932 (max= 1.6147), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:03,419 - root - INFO - Step 19230: lr=1.00E-05, loss= 1.1932 (max= 1.6147), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:03,419 - root - INFO - Step 19230: lr=1.00E-05, loss= 1.1932 (max= 1.6147), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:03,419 - root - INFO - Step 19230: lr=1.00E-05, loss= 1.1932 (max= 1.6147), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:03,419 - root - INFO - Step 19230: lr=1.00E-05, loss= 1.1932 (max= 1.6147), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 19:04:03,419 - root - INFO - Step 19230: lr=1.00E-05, loss= 1.1932 (max= 1.6147), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:03,419 - root - INFO - Step 19230: lr=1.00E-05, loss= 1.1932 (max= 1.6147), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:19,353 - root - INFO - Step 19240: lr=1.00E-05, loss= 1.2103 (max= 1.5225), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:19,353 - root - INFO - Step 19240: lr=1.00E-05, loss= 1.2103 (max= 1.5225), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:19,353 - root - INFO - Step 19240: lr=1.00E-05, loss= 1.2103 (max= 1.5225), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:19,353 - root - INFO - Step 19240: lr=1.00E-05, loss= 1.2103 (max= 1.5225), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:19,353 - root - INFO - Step 19240: lr=1.00E-05, loss= 1.2103 (max= 1.5225), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:19,353 - root - INFO - Step 19240: lr=1.00E-05, loss= 1.2103 (max= 1.5225), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:19,353 - root - INFO - Step 19240: lr=1.00E-05, loss= 1.2103 (max= 1.5225), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:19,354 - root - INFO - Step 19240: lr=1.00E-05, loss= 1.2103 (max= 1.5225), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:35,320 - root - INFO - Step 19250: lr=1.00E-05, loss= 1.2478 (max= 1.6338), tps=20527, mfu=42.77%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:35,320 - root - INFO - Step 19250: lr=1.00E-05, loss= 1.2478 (max= 1.6338), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:35,320 - root - INFO - Step 19250: lr=1.00E-05, loss= 1.2478 (max= 1.6338), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:35,320 - root - INFO - Step 19250: lr=1.00E-05, loss= 1.2478 (max= 1.6338), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:35,320 - root - INFO - Step 19250: lr=1.00E-05, loss= 1.2478 (max= 1.6338), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:35,320 - root - INFO - Step 19250: lr=1.00E-05, loss= 1.2478 (max= 1.6338), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:35,320 - root - INFO - Step 19250: lr=1.00E-05, loss= 1.2478 (max= 1.6338), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:35,320 - root - INFO - Step 19250: lr=1.00E-05, loss= 1.2478 (max= 1.6338), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:51,268 - root - INFO - Step 19260: lr=1.00E-05, loss= 1.2436 (max= 1.6959), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:51,268 - root - INFO - Step 19260: lr=1.00E-05, loss= 1.2436 (max= 1.6959), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:51,268 - root - INFO - Step 19260: lr=1.00E-05, loss= 1.2436 (max= 1.6959), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:51,268 - root - INFO - Step 19260: lr=1.00E-05, 
loss= 1.2436 (max= 1.6959), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:51,268 - root - INFO - Step 19260: lr=1.00E-05, loss= 1.2436 (max= 1.6959), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:51,268 - root - INFO - Step 19260: lr=1.00E-05, loss= 1.2436 (max= 1.6959), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:51,268 - root - INFO - Step 19260: lr=1.00E-05, loss= 1.2436 (max= 1.6959), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:04:51,269 - root - INFO - Step 19260: lr=1.00E-05, loss= 1.2436 (max= 1.6959), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:07,182 - root - INFO - Step 19270: lr=1.00E-05, loss= 1.2105 (max= 1.5874), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:07,182 - root - INFO - Step 19270: lr=1.00E-05, loss= 1.2105 (max= 1.5874), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:07,182 - root - INFO - Step 19270: lr=1.00E-05, loss= 1.2105 (max= 1.5874), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:07,182 - root - INFO - Step 19270: lr=1.00E-05, loss= 1.2105 (max= 1.5874), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:07,182 - root - INFO - Step 19270: lr=1.00E-05, loss= 1.2105 (max= 1.5874), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:07,182 - root - INFO - Step 19270: lr=1.00E-05, loss= 1.2105 (max= 1.5874), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
19:05:07,182 - root - INFO - Step 19270: lr=1.00E-05, loss= 1.2105 (max= 1.5874), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:07,183 - root - INFO - Step 19270: lr=1.00E-05, loss= 1.2105 (max= 1.5874), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:23,149 - root - INFO - Step 19280: lr=1.00E-05, loss= 1.2223 (max= 1.5146), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:23,149 - root - INFO - Step 19280: lr=1.00E-05, loss= 1.2223 (max= 1.5146), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:23,149 - root - INFO - Step 19280: lr=1.00E-05, loss= 1.2223 (max= 1.5146), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:23,149 - root - INFO - Step 19280: lr=1.00E-05, loss= 1.2223 (max= 1.5146), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:23,149 - root - INFO - Step 19280: lr=1.00E-05, loss= 1.2223 (max= 1.5146), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:23,149 - root - INFO - Step 19280: lr=1.00E-05, loss= 1.2223 (max= 1.5146), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:23,150 - root - INFO - Step 19280: lr=1.00E-05, loss= 1.2223 (max= 1.5146), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:23,150 - root - INFO - Step 19280: lr=1.00E-05, loss= 1.2223 (max= 1.5146), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:39,085 - root - INFO - Step 19290: lr=1.00E-05, loss= 1.2253 (max= 1.6484), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:39,085 - root - INFO - Step 19290: lr=1.00E-05, loss= 1.2253 (max= 1.6484), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:39,085 - root - INFO - Step 19290: lr=1.00E-05, loss= 1.2253 (max= 1.6484), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:39,085 - root - INFO - Step 19290: lr=1.00E-05, loss= 1.2253 (max= 1.6484), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:39,085 - root - INFO - Step 19290: lr=1.00E-05, loss= 1.2253 (max= 1.6484), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:39,085 - root - INFO - Step 19290: lr=1.00E-05, loss= 1.2253 (max= 1.6484), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:39,085 - root - INFO - Step 19290: lr=1.00E-05, loss= 1.2253 (max= 1.6484), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:39,085 - root - INFO - Step 19290: lr=1.00E-05, loss= 1.2253 (max= 1.6484), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:55,019 - root - INFO - Step 19300: lr=1.00E-05, loss= 1.2211 (max= 1.5714), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:55,019 - root - INFO - Step 19300: lr=1.00E-05, loss= 1.2211 (max= 1.5714), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:55,019 - root - INFO - Step 19300: lr=1.00E-05, loss= 1.2211 (max= 1.5714), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:55,019 - root - INFO - Step 19300: lr=1.00E-05, loss= 1.2211 (max= 1.5714), 
tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:55,019 - root - INFO - Step 19300: lr=1.00E-05, loss= 1.2211 (max= 1.5714), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:55,019 - root - INFO - Step 19300: lr=1.00E-05, loss= 1.2211 (max= 1.5714), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:55,019 - root - INFO - Step 19300: lr=1.00E-05, loss= 1.2211 (max= 1.5714), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:05:55,019 - root - INFO - Step 19300: lr=1.00E-05, loss= 1.2211 (max= 1.5714), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:10,999 - root - INFO - Step 19310: lr=1.00E-05, loss= 1.1869 (max= 1.6099), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:10,999 - root - INFO - Step 19310: lr=1.00E-05, loss= 1.1869 (max= 1.6099), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:10,999 - root - INFO - Step 19310: lr=1.00E-05, loss= 1.1869 (max= 1.6099), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:10,999 - root - INFO - Step 19310: lr=1.00E-05, loss= 1.1869 (max= 1.6099), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:10,999 - root - INFO - Step 19310: lr=1.00E-05, loss= 1.1869 (max= 1.6099), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:10,999 - root - INFO - Step 19310: lr=1.00E-05, loss= 1.1869 (max= 1.6099), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:10,999 - root - INFO - Step 
19310: lr=1.00E-05, loss= 1.1869 (max= 1.6099), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:11,000 - root - INFO - Step 19310: lr=1.00E-05, loss= 1.1869 (max= 1.6099), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:26,954 - root - INFO - Step 19320: lr=1.00E-05, loss= 1.2524 (max= 1.6536), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:26,954 - root - INFO - Step 19320: lr=1.00E-05, loss= 1.2524 (max= 1.6536), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:26,954 - root - INFO - Step 19320: lr=1.00E-05, loss= 1.2524 (max= 1.6536), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:26,954 - root - INFO - Step 19320: lr=1.00E-05, loss= 1.2524 (max= 1.6536), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:26,954 - root - INFO - Step 19320: lr=1.00E-05, loss= 1.2524 (max= 1.6536), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:26,954 - root - INFO - Step 19320: lr=1.00E-05, loss= 1.2524 (max= 1.6536), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:26,954 - root - INFO - Step 19320: lr=1.00E-05, loss= 1.2524 (max= 1.6536), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:26,955 - root - INFO - Step 19320: lr=1.00E-05, loss= 1.2524 (max= 1.6536), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:42,894 - root - INFO - Step 19330: lr=1.00E-05, loss= 1.2235 (max= 1.5729), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 19:06:42,894 - root - INFO - Step 19330: lr=1.00E-05, loss= 1.2235 (max= 1.5729), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:42,894 - root - INFO - Step 19330: lr=1.00E-05, loss= 1.2235 (max= 1.5729), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:42,894 - root - INFO - Step 19330: lr=1.00E-05, loss= 1.2235 (max= 1.5729), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:42,894 - root - INFO - Step 19330: lr=1.00E-05, loss= 1.2235 (max= 1.5729), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:42,894 - root - INFO - Step 19330: lr=1.00E-05, loss= 1.2235 (max= 1.5729), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:42,894 - root - INFO - Step 19330: lr=1.00E-05, loss= 1.2235 (max= 1.5729), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:42,894 - root - INFO - Step 19330: lr=1.00E-05, loss= 1.2235 (max= 1.5729), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:58,828 - root - INFO - Step 19340: lr=1.00E-05, loss= 1.2291 (max= 1.5283), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:58,828 - root - INFO - Step 19340: lr=1.00E-05, loss= 1.2291 (max= 1.5283), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:58,828 - root - INFO - Step 19340: lr=1.00E-05, loss= 1.2291 (max= 1.5283), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:58,828 - root - INFO - Step 19340: lr=1.00E-05, loss= 1.2291 (max= 1.5283), tps=20569, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:58,828 - root - INFO - Step 19340: lr=1.00E-05, loss= 1.2291 (max= 1.5283), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:58,828 - root - INFO - Step 19340: lr=1.00E-05, loss= 1.2291 (max= 1.5283), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:58,828 - root - INFO - Step 19340: lr=1.00E-05, loss= 1.2291 (max= 1.5283), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:06:58,828 - root - INFO - Step 19340: lr=1.00E-05, loss= 1.2291 (max= 1.5283), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:14,781 - root - INFO - Step 19350: lr=1.00E-05, loss= 1.2136 (max= 1.6254), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:07:14,781 - root - INFO - Step 19350: lr=1.00E-05, loss= 1.2136 (max= 1.6254), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:07:14,781 - root - INFO - Step 19350: lr=1.00E-05, loss= 1.2136 (max= 1.6254), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:07:14,781 - root - INFO - Step 19350: lr=1.00E-05, loss= 1.2136 (max= 1.6254), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:07:14,781 - root - INFO - Step 19350: lr=1.00E-05, loss= 1.2136 (max= 1.6254), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:07:14,781 - root - INFO - Step 19350: lr=1.00E-05, loss= 1.2136 (max= 1.6254), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:07:14,781 - root - INFO - Step 19350: lr=1.00E-05, loss= 1.2136 
(max= 1.6254), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:07:14,781 - root - INFO - Step 19350: lr=1.00E-05, loss= 1.2136 (max= 1.6254), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:07:30,656 - root - INFO - Step 19360: lr=1.00E-05, loss= 1.2202 (max= 1.5926), tps=20644, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:30,657 - root - INFO - Step 19360: lr=1.00E-05, loss= 1.2202 (max= 1.5926), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:30,657 - root - INFO - Step 19360: lr=1.00E-05, loss= 1.2202 (max= 1.5926), tps=20644, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:30,657 - root - INFO - Step 19360: lr=1.00E-05, loss= 1.2202 (max= 1.5926), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:30,657 - root - INFO - Step 19360: lr=1.00E-05, loss= 1.2202 (max= 1.5926), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:30,657 - root - INFO - Step 19360: lr=1.00E-05, loss= 1.2202 (max= 1.5926), tps=20644, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:30,657 - root - INFO - Step 19360: lr=1.00E-05, loss= 1.2202 (max= 1.5926), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:30,657 - root - INFO - Step 19360: lr=1.00E-05, loss= 1.2202 (max= 1.5926), tps=20644, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:46,562 - root - INFO - Step 19370: lr=1.00E-05, loss= 1.2423 (max= 1.6510), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:46,562 - root 
- INFO - Step 19370: lr=1.00E-05, loss= 1.2423 (max= 1.6510), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:46,562 - root - INFO - Step 19370: lr=1.00E-05, loss= 1.2423 (max= 1.6510), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:46,563 - root - INFO - Step 19370: lr=1.00E-05, loss= 1.2423 (max= 1.6510), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:46,563 - root - INFO - Step 19370: lr=1.00E-05, loss= 1.2423 (max= 1.6510), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:46,563 - root - INFO - Step 19370: lr=1.00E-05, loss= 1.2423 (max= 1.6510), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:46,563 - root - INFO - Step 19370: lr=1.00E-05, loss= 1.2423 (max= 1.6510), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:07:46,563 - root - INFO - Step 19370: lr=1.00E-05, loss= 1.2423 (max= 1.6510), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:02,547 - root - INFO - Step 19380: lr=1.00E-05, loss= 1.2468 (max= 1.7786), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:02,547 - root - INFO - Step 19380: lr=1.00E-05, loss= 1.2468 (max= 1.7786), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:02,547 - root - INFO - Step 19380: lr=1.00E-05, loss= 1.2468 (max= 1.7786), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:02,547 - root - INFO - Step 19380: lr=1.00E-05, loss= 1.2468 (max= 1.7786), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 19:08:02,547 - root - INFO - Step 19380: lr=1.00E-05, loss= 1.2468 (max= 1.7786), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:02,547 - root - INFO - Step 19380: lr=1.00E-05, loss= 1.2468 (max= 1.7786), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:02,548 - root - INFO - Step 19380: lr=1.00E-05, loss= 1.2468 (max= 1.7786), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:02,548 - root - INFO - Step 19380: lr=1.00E-05, loss= 1.2468 (max= 1.7786), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:18,454 - root - INFO - Step 19390: lr=1.00E-05, loss= 1.2450 (max= 1.6521), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:18,455 - root - INFO - Step 19390: lr=1.00E-05, loss= 1.2450 (max= 1.6521), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:18,455 - root - INFO - Step 19390: lr=1.00E-05, loss= 1.2450 (max= 1.6521), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:18,455 - root - INFO - Step 19390: lr=1.00E-05, loss= 1.2450 (max= 1.6521), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:18,455 - root - INFO - Step 19390: lr=1.00E-05, loss= 1.2450 (max= 1.6521), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:18,455 - root - INFO - Step 19390: lr=1.00E-05, loss= 1.2450 (max= 1.6521), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:18,455 - root - INFO - Step 19390: lr=1.00E-05, loss= 1.2450 (max= 1.6521), tps=20604, mfu=42.93%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:18,455 - root - INFO - Step 19390: lr=1.00E-05, loss= 1.2450 (max= 1.6521), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:34,451 - root - INFO - Step 19400: lr=1.00E-05, loss= 1.2276 (max= 1.7194), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:34,451 - root - INFO - Step 19400: lr=1.00E-05, loss= 1.2276 (max= 1.7194), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:34,451 - root - INFO - Step 19400: lr=1.00E-05, loss= 1.2276 (max= 1.7194), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:34,451 - root - INFO - Step 19400: lr=1.00E-05, loss= 1.2276 (max= 1.7194), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:34,451 - root - INFO - Step 19400: lr=1.00E-05, loss= 1.2276 (max= 1.7194), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:34,451 - root - INFO - Step 19400: lr=1.00E-05, loss= 1.2276 (max= 1.7194), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:34,451 - root - INFO - Step 19400: lr=1.00E-05, loss= 1.2276 (max= 1.7194), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:34,451 - root - INFO - Step 19400: lr=1.00E-05, loss= 1.2276 (max= 1.7194), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:08:50,422 - root - INFO - Step 19410: lr=1.00E-05, loss= 1.2048 (max= 1.5835), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:50,422 - root - INFO - Step 19410: lr=1.00E-05, 
loss= 1.2048 (max= 1.5835), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:50,422 - root - INFO - Step 19410: lr=1.00E-05, loss= 1.2048 (max= 1.5835), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:50,422 - root - INFO - Step 19410: lr=1.00E-05, loss= 1.2048 (max= 1.5835), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:50,423 - root - INFO - Step 19410: lr=1.00E-05, loss= 1.2048 (max= 1.5835), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:50,423 - root - INFO - Step 19410: lr=1.00E-05, loss= 1.2048 (max= 1.5835), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:50,423 - root - INFO - Step 19410: lr=1.00E-05, loss= 1.2048 (max= 1.5835), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:08:50,423 - root - INFO - Step 19410: lr=1.00E-05, loss= 1.2048 (max= 1.5835), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:09:06,376 - root - INFO - Step 19420: lr=1.00E-05, loss= 1.2226 (max= 1.5159), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:06,377 - root - INFO - Step 19420: lr=1.00E-05, loss= 1.2226 (max= 1.5159), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:06,377 - root - INFO - Step 19420: lr=1.00E-05, loss= 1.2226 (max= 1.5159), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:06,377 - root - INFO - Step 19420: lr=1.00E-05, loss= 1.2226 (max= 1.5159), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
19:09:06,377 - root - INFO - Step 19420: lr=1.00E-05, loss= 1.2226 (max= 1.5159), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:06,377 - root - INFO - Step 19420: lr=1.00E-05, loss= 1.2226 (max= 1.5159), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:06,377 - root - INFO - Step 19420: lr=1.00E-05, loss= 1.2226 (max= 1.5159), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:06,377 - root - INFO - Step 19420: lr=1.00E-05, loss= 1.2226 (max= 1.5159), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:22,325 - root - INFO - Step 19430: lr=1.00E-05, loss= 1.2574 (max= 1.9152), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:22,325 - root - INFO - Step 19430: lr=1.00E-05, loss= 1.2574 (max= 1.9152), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:22,325 - root - INFO - Step 19430: lr=1.00E-05, loss= 1.2574 (max= 1.9152), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:22,325 - root - INFO - Step 19430: lr=1.00E-05, loss= 1.2574 (max= 1.9152), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:22,325 - root - INFO - Step 19430: lr=1.00E-05, loss= 1.2574 (max= 1.9152), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:22,325 - root - INFO - Step 19430: lr=1.00E-05, loss= 1.2574 (max= 1.9152), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:22,325 - root - INFO - Step 19430: lr=1.00E-05, loss= 1.2574 (max= 1.9152), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:22,326 - root - INFO - Step 19430: lr=1.00E-05, loss= 1.2574 (max= 1.9152), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:38,281 - root - INFO - Step 19440: lr=1.00E-05, loss= 1.2376 (max= 1.6311), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:38,281 - root - INFO - Step 19440: lr=1.00E-05, loss= 1.2376 (max= 1.6311), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:38,281 - root - INFO - Step 19440: lr=1.00E-05, loss= 1.2376 (max= 1.6311), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:38,281 - root - INFO - Step 19440: lr=1.00E-05, loss= 1.2376 (max= 1.6311), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:38,281 - root - INFO - Step 19440: lr=1.00E-05, loss= 1.2376 (max= 1.6311), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:38,281 - root - INFO - Step 19440: lr=1.00E-05, loss= 1.2376 (max= 1.6311), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:38,281 - root - INFO - Step 19440: lr=1.00E-05, loss= 1.2376 (max= 1.6311), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:38,282 - root - INFO - Step 19440: lr=1.00E-05, loss= 1.2376 (max= 1.6311), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:09:54,245 - root - INFO - Step 19450: lr=1.00E-05, loss= 1.2275 (max= 1.5510), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:09:54,245 - root - INFO - Step 19450: lr=1.00E-05, loss= 1.2275 (max= 1.5510), 
tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:09:54,245 - root - INFO - Step 19450: lr=1.00E-05, loss= 1.2275 (max= 1.5510), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:09:54,245 - root - INFO - Step 19450: lr=1.00E-05, loss= 1.2275 (max= 1.5510), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:09:54,245 - root - INFO - Step 19450: lr=1.00E-05, loss= 1.2275 (max= 1.5510), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:09:54,245 - root - INFO - Step 19450: lr=1.00E-05, loss= 1.2275 (max= 1.5510), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:09:54,245 - root - INFO - Step 19450: lr=1.00E-05, loss= 1.2275 (max= 1.5510), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:09:54,245 - root - INFO - Step 19450: lr=1.00E-05, loss= 1.2275 (max= 1.5510), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:10:10,179 - root - INFO - Step 19460: lr=1.00E-05, loss= 1.2607 (max= 1.6460), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:10,179 - root - INFO - Step 19460: lr=1.00E-05, loss= 1.2607 (max= 1.6460), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:10,179 - root - INFO - Step 19460: lr=1.00E-05, loss= 1.2607 (max= 1.6460), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:10,179 - root - INFO - Step 19460: lr=1.00E-05, loss= 1.2607 (max= 1.6460), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:10,179 - root - INFO - Step 
19460: lr=1.00E-05, loss= 1.2607 (max= 1.6460), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:10,179 - root - INFO - Step 19460: lr=1.00E-05, loss= 1.2607 (max= 1.6460), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:10,179 - root - INFO - Step 19460: lr=1.00E-05, loss= 1.2607 (max= 1.6460), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:10,180 - root - INFO - Step 19460: lr=1.00E-05, loss= 1.2607 (max= 1.6460), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:26,119 - root - INFO - Step 19470: lr=1.00E-05, loss= 1.2118 (max= 1.4737), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:26,119 - root - INFO - Step 19470: lr=1.00E-05, loss= 1.2118 (max= 1.4737), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:26,119 - root - INFO - Step 19470: lr=1.00E-05, loss= 1.2118 (max= 1.4737), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:26,119 - root - INFO - Step 19470: lr=1.00E-05, loss= 1.2118 (max= 1.4737), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:26,119 - root - INFO - Step 19470: lr=1.00E-05, loss= 1.2118 (max= 1.4737), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:26,119 - root - INFO - Step 19470: lr=1.00E-05, loss= 1.2118 (max= 1.4737), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:26,119 - root - INFO - Step 19470: lr=1.00E-05, loss= 1.2118 (max= 1.4737), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 19:10:26,119 - root - INFO - Step 19470: lr=1.00E-05, loss= 1.2118 (max= 1.4737), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:42,083 - root - INFO - Step 19480: lr=1.00E-05, loss= 1.2286 (max= 1.5482), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:42,083 - root - INFO - Step 19480: lr=1.00E-05, loss= 1.2286 (max= 1.5482), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:42,083 - root - INFO - Step 19480: lr=1.00E-05, loss= 1.2286 (max= 1.5482), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:42,083 - root - INFO - Step 19480: lr=1.00E-05, loss= 1.2286 (max= 1.5482), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:42,083 - root - INFO - Step 19480: lr=1.00E-05, loss= 1.2286 (max= 1.5482), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:42,083 - root - INFO - Step 19480: lr=1.00E-05, loss= 1.2286 (max= 1.5482), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:42,083 - root - INFO - Step 19480: lr=1.00E-05, loss= 1.2286 (max= 1.5482), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:42,084 - root - INFO - Step 19480: lr=1.00E-05, loss= 1.2286 (max= 1.5482), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:57,987 - root - INFO - Step 19490: lr=1.00E-05, loss= 1.1870 (max= 1.5770), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:57,987 - root - INFO - Step 19490: lr=1.00E-05, loss= 1.1870 (max= 1.5770), tps=20609, mfu=42.94%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:57,987 - root - INFO - Step 19490: lr=1.00E-05, loss= 1.1870 (max= 1.5770), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:57,987 - root - INFO - Step 19490: lr=1.00E-05, loss= 1.1870 (max= 1.5770), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:57,987 - root - INFO - Step 19490: lr=1.00E-05, loss= 1.1870 (max= 1.5770), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:57,987 - root - INFO - Step 19490: lr=1.00E-05, loss= 1.1870 (max= 1.5770), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:57,987 - root - INFO - Step 19490: lr=1.00E-05, loss= 1.1870 (max= 1.5770), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:10:57,987 - root - INFO - Step 19490: lr=1.00E-05, loss= 1.1870 (max= 1.5770), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:11:13,884 - root - INFO - Step 19500: lr=1.00E-05, loss= 1.2113 (max= 1.6232), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:13,885 - root - INFO - Step 19500: lr=1.00E-05, loss= 1.2113 (max= 1.6232), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:13,885 - root - INFO - Step 19500: lr=1.00E-05, loss= 1.2113 (max= 1.6232), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:13,885 - root - INFO - Step 19500: lr=1.00E-05, loss= 1.2113 (max= 1.6232), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:13,885 - root - INFO - Step 19500: lr=1.00E-05, loss= 1.2113 
(max= 1.6232), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:13,885 - root - INFO - Step 19500: lr=1.00E-05, loss= 1.2113 (max= 1.6232), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:13,885 - root - INFO - Step 19500: lr=1.00E-05, loss= 1.2113 (max= 1.6232), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:13,885 - root - INFO - Step 19500: lr=1.00E-05, loss= 1.2113 (max= 1.6232), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:29,848 - root - INFO - Step 19510: lr=1.00E-05, loss= 1.2180 (max= 1.7368), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:29,848 - root - INFO - Step 19510: lr=1.00E-05, loss= 1.2180 (max= 1.7368), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:29,848 - root - INFO - Step 19510: lr=1.00E-05, loss= 1.2180 (max= 1.7368), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:29,848 - root - INFO - Step 19510: lr=1.00E-05, loss= 1.2180 (max= 1.7368), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:29,848 - root - INFO - Step 19510: lr=1.00E-05, loss= 1.2180 (max= 1.7368), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:29,848 - root - INFO - Step 19510: lr=1.00E-05, loss= 1.2180 (max= 1.7368), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:29,848 - root - INFO - Step 19510: lr=1.00E-05, loss= 1.2180 (max= 1.7368), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:29,849 - root 
- INFO - Step 19510: lr=1.00E-05, loss= 1.2180 (max= 1.7368), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:11:45,799 - root - INFO - Step 19520: lr=1.00E-05, loss= 1.2248 (max= 1.5826), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:11:45,799 - root - INFO - Step 19520: lr=1.00E-05, loss= 1.2248 (max= 1.5826), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:11:45,799 - root - INFO - Step 19520: lr=1.00E-05, loss= 1.2248 (max= 1.5826), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:11:45,799 - root - INFO - Step 19520: lr=1.00E-05, loss= 1.2248 (max= 1.5826), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:11:45,799 - root - INFO - Step 19520: lr=1.00E-05, loss= 1.2248 (max= 1.5826), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:11:45,799 - root - INFO - Step 19520: lr=1.00E-05, loss= 1.2248 (max= 1.5826), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:11:45,799 - root - INFO - Step 19520: lr=1.00E-05, loss= 1.2248 (max= 1.5826), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:11:45,799 - root - INFO - Step 19520: lr=1.00E-05, loss= 1.2248 (max= 1.5826), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:12:01,751 - root - INFO - Step 19530: lr=1.00E-05, loss= 1.2097 (max= 1.5291), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:12:01,751 - root - INFO - Step 19530: lr=1.00E-05, loss= 1.2097 (max= 1.5291), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 19:12:01,751 - root - INFO - Step 19530: lr=1.00E-05, loss= 1.2097 (max= 1.5291), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:12:01,751 - root - INFO - Step 19530: lr=1.00E-05, loss= 1.2097 (max= 1.5291), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:12:01,751 - root - INFO - Step 19530: lr=1.00E-05, loss= 1.2097 (max= 1.5291), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:12:01,751 - root - INFO - Step 19530: lr=1.00E-05, loss= 1.2097 (max= 1.5291), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:12:01,751 - root - INFO - Step 19530: lr=1.00E-05, loss= 1.2097 (max= 1.5291), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:12:01,751 - root - INFO - Step 19530: lr=1.00E-05, loss= 1.2097 (max= 1.5291), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:12:17,703 - root - INFO - Step 19540: lr=1.00E-05, loss= 1.2066 (max= 1.5365), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:17,703 - root - INFO - Step 19540: lr=1.00E-05, loss= 1.2066 (max= 1.5365), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:17,703 - root - INFO - Step 19540: lr=1.00E-05, loss= 1.2066 (max= 1.5365), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:17,703 - root - INFO - Step 19540: lr=1.00E-05, loss= 1.2066 (max= 1.5365), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:17,703 - root - INFO - Step 19540: lr=1.00E-05, loss= 1.2066 (max= 1.5365), tps=20546, mfu=42.81%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:17,703 - root - INFO - Step 19540: lr=1.00E-05, loss= 1.2066 (max= 1.5365), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:17,703 - root - INFO - Step 19540: lr=1.00E-05, loss= 1.2066 (max= 1.5365), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:17,703 - root - INFO - Step 19540: lr=1.00E-05, loss= 1.2066 (max= 1.5365), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:33,697 - root - INFO - Step 19550: lr=1.00E-05, loss= 1.2104 (max= 1.6055), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:33,697 - root - INFO - Step 19550: lr=1.00E-05, loss= 1.2104 (max= 1.6055), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:33,697 - root - INFO - Step 19550: lr=1.00E-05, loss= 1.2104 (max= 1.6055), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:33,697 - root - INFO - Step 19550: lr=1.00E-05, loss= 1.2104 (max= 1.6055), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:33,697 - root - INFO - Step 19550: lr=1.00E-05, loss= 1.2104 (max= 1.6055), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:33,697 - root - INFO - Step 19550: lr=1.00E-05, loss= 1.2104 (max= 1.6055), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:33,697 - root - INFO - Step 19550: lr=1.00E-05, loss= 1.2104 (max= 1.6055), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:33,697 - root - INFO - Step 19550: lr=1.00E-05, 
loss= 1.2104 (max= 1.6055), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:49,652 - root - INFO - Step 19560: lr=1.00E-05, loss= 1.2532 (max= 1.9230), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:49,652 - root - INFO - Step 19560: lr=1.00E-05, loss= 1.2532 (max= 1.9230), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:49,652 - root - INFO - Step 19560: lr=1.00E-05, loss= 1.2532 (max= 1.9230), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:49,652 - root - INFO - Step 19560: lr=1.00E-05, loss= 1.2532 (max= 1.9230), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:49,652 - root - INFO - Step 19560: lr=1.00E-05, loss= 1.2532 (max= 1.9230), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:49,653 - root - INFO - Step 19560: lr=1.00E-05, loss= 1.2532 (max= 1.9230), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:49,653 - root - INFO - Step 19560: lr=1.00E-05, loss= 1.2532 (max= 1.9230), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:12:49,653 - root - INFO - Step 19560: lr=1.00E-05, loss= 1.2532 (max= 1.9230), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:13:05,614 - root - INFO - Step 19570: lr=1.00E-05, loss= 1.2174 (max= 1.5217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:05,614 - root - INFO - Step 19570: lr=1.00E-05, loss= 1.2174 (max= 1.5217), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
19:13:05,614 - root - INFO - Step 19570: lr=1.00E-05, loss= 1.2174 (max= 1.5217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:05,614 - root - INFO - Step 19570: lr=1.00E-05, loss= 1.2174 (max= 1.5217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:05,614 - root - INFO - Step 19570: lr=1.00E-05, loss= 1.2174 (max= 1.5217), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:05,614 - root - INFO - Step 19570: lr=1.00E-05, loss= 1.2174 (max= 1.5217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:05,614 - root - INFO - Step 19570: lr=1.00E-05, loss= 1.2174 (max= 1.5217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:05,614 - root - INFO - Step 19570: lr=1.00E-05, loss= 1.2174 (max= 1.5217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:21,565 - root - INFO - Step 19580: lr=1.00E-05, loss= 1.2335 (max= 1.6529), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:21,565 - root - INFO - Step 19580: lr=1.00E-05, loss= 1.2335 (max= 1.6529), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:21,565 - root - INFO - Step 19580: lr=1.00E-05, loss= 1.2335 (max= 1.6529), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:21,565 - root - INFO - Step 19580: lr=1.00E-05, loss= 1.2335 (max= 1.6529), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:21,565 - root - INFO - Step 19580: lr=1.00E-05, loss= 1.2335 (max= 1.6529), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:21,565 - root - INFO - Step 19580: lr=1.00E-05, loss= 1.2335 (max= 1.6529), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:21,565 - root - INFO - Step 19580: lr=1.00E-05, loss= 1.2335 (max= 1.6529), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:21,565 - root - INFO - Step 19580: lr=1.00E-05, loss= 1.2335 (max= 1.6529), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:37,554 - root - INFO - Step 19590: lr=1.00E-05, loss= 1.2505 (max= 1.5860), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:37,554 - root - INFO - Step 19590: lr=1.00E-05, loss= 1.2505 (max= 1.5860), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:37,554 - root - INFO - Step 19590: lr=1.00E-05, loss= 1.2505 (max= 1.5860), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:37,554 - root - INFO - Step 19590: lr=1.00E-05, loss= 1.2505 (max= 1.5860), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:37,554 - root - INFO - Step 19590: lr=1.00E-05, loss= 1.2505 (max= 1.5860), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:37,554 - root - INFO - Step 19590: lr=1.00E-05, loss= 1.2505 (max= 1.5860), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:37,554 - root - INFO - Step 19590: lr=1.00E-05, loss= 1.2505 (max= 1.5860), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:37,554 - root - INFO - Step 19590: lr=1.00E-05, loss= 1.2505 (max= 1.5860), 
tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:53,479 - root - INFO - Step 19600: lr=1.00E-05, loss= 1.2434 (max= 1.8250), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:53,479 - root - INFO - Step 19600: lr=1.00E-05, loss= 1.2434 (max= 1.8250), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:53,479 - root - INFO - Step 19600: lr=1.00E-05, loss= 1.2434 (max= 1.8250), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:53,480 - root - INFO - Step 19600: lr=1.00E-05, loss= 1.2434 (max= 1.8250), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:53,480 - root - INFO - Step 19600: lr=1.00E-05, loss= 1.2434 (max= 1.8250), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:53,480 - root - INFO - Step 19600: lr=1.00E-05, loss= 1.2434 (max= 1.8250), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:53,480 - root - INFO - Step 19600: lr=1.00E-05, loss= 1.2434 (max= 1.8250), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:13:53,480 - root - INFO - Step 19600: lr=1.00E-05, loss= 1.2434 (max= 1.8250), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:09,379 - root - INFO - Step 19610: lr=1.00E-05, loss= 1.2268 (max= 1.5756), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:09,379 - root - INFO - Step 19610: lr=1.00E-05, loss= 1.2268 (max= 1.5756), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:09,379 - root - INFO - Step 
19610: lr=1.00E-05, loss= 1.2268 (max= 1.5756), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:09,379 - root - INFO - Step 19610: lr=1.00E-05, loss= 1.2268 (max= 1.5756), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:09,379 - root - INFO - Step 19610: lr=1.00E-05, loss= 1.2268 (max= 1.5756), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:09,379 - root - INFO - Step 19610: lr=1.00E-05, loss= 1.2268 (max= 1.5756), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:09,379 - root - INFO - Step 19610: lr=1.00E-05, loss= 1.2268 (max= 1.5756), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:09,380 - root - INFO - Step 19610: lr=1.00E-05, loss= 1.2268 (max= 1.5756), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:25,362 - root - INFO - Step 19620: lr=1.00E-05, loss= 1.2183 (max= 1.5460), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:25,362 - root - INFO - Step 19620: lr=1.00E-05, loss= 1.2183 (max= 1.5460), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:25,362 - root - INFO - Step 19620: lr=1.00E-05, loss= 1.2183 (max= 1.5460), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:25,362 - root - INFO - Step 19620: lr=1.00E-05, loss= 1.2183 (max= 1.5460), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:25,362 - root - INFO - Step 19620: lr=1.00E-05, loss= 1.2183 (max= 1.5460), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 19:14:25,362 - root - INFO - Step 19620: lr=1.00E-05, loss= 1.2183 (max= 1.5460), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:25,362 - root - INFO - Step 19620: lr=1.00E-05, loss= 1.2183 (max= 1.5460), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:25,362 - root - INFO - Step 19620: lr=1.00E-05, loss= 1.2183 (max= 1.5460), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:41,295 - root - INFO - Step 19630: lr=1.00E-05, loss= 1.2076 (max= 1.6704), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:41,295 - root - INFO - Step 19630: lr=1.00E-05, loss= 1.2076 (max= 1.6704), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:41,295 - root - INFO - Step 19630: lr=1.00E-05, loss= 1.2076 (max= 1.6704), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:41,295 - root - INFO - Step 19630: lr=1.00E-05, loss= 1.2076 (max= 1.6704), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:41,295 - root - INFO - Step 19630: lr=1.00E-05, loss= 1.2076 (max= 1.6704), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:41,295 - root - INFO - Step 19630: lr=1.00E-05, loss= 1.2076 (max= 1.6704), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:41,295 - root - INFO - Step 19630: lr=1.00E-05, loss= 1.2076 (max= 1.6704), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:41,295 - root - INFO - Step 19630: lr=1.00E-05, loss= 1.2076 (max= 1.6704), tps=20570, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:57,245 - root - INFO - Step 19640: lr=1.00E-05, loss= 1.2016 (max= 1.7708), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:57,246 - root - INFO - Step 19640: lr=1.00E-05, loss= 1.2016 (max= 1.7708), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:57,246 - root - INFO - Step 19640: lr=1.00E-05, loss= 1.2016 (max= 1.7708), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:57,246 - root - INFO - Step 19640: lr=1.00E-05, loss= 1.2016 (max= 1.7708), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:57,246 - root - INFO - Step 19640: lr=1.00E-05, loss= 1.2016 (max= 1.7708), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:57,246 - root - INFO - Step 19640: lr=1.00E-05, loss= 1.2016 (max= 1.7708), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:57,246 - root - INFO - Step 19640: lr=1.00E-05, loss= 1.2016 (max= 1.7708), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:14:57,246 - root - INFO - Step 19640: lr=1.00E-05, loss= 1.2016 (max= 1.7708), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:13,166 - root - INFO - Step 19650: lr=1.00E-05, loss= 1.2483 (max= 1.6794), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:13,166 - root - INFO - Step 19650: lr=1.00E-05, loss= 1.2483 (max= 1.6794), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:13,166 - root - INFO - Step 19650: lr=1.00E-05, loss= 1.2483 
(max= 1.6794), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:13,166 - root - INFO - Step 19650: lr=1.00E-05, loss= 1.2483 (max= 1.6794), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:13,166 - root - INFO - Step 19650: lr=1.00E-05, loss= 1.2483 (max= 1.6794), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:13,166 - root - INFO - Step 19650: lr=1.00E-05, loss= 1.2483 (max= 1.6794), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:13,166 - root - INFO - Step 19650: lr=1.00E-05, loss= 1.2483 (max= 1.6794), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:13,166 - root - INFO - Step 19650: lr=1.00E-05, loss= 1.2483 (max= 1.6794), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:29,042 - root - INFO - Step 19660: lr=1.00E-05, loss= 1.2240 (max= 1.7146), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:29,042 - root - INFO - Step 19660: lr=1.00E-05, loss= 1.2240 (max= 1.7146), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:29,042 - root - INFO - Step 19660: lr=1.00E-05, loss= 1.2240 (max= 1.7146), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:29,042 - root - INFO - Step 19660: lr=1.00E-05, loss= 1.2240 (max= 1.7146), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:29,042 - root - INFO - Step 19660: lr=1.00E-05, loss= 1.2240 (max= 1.7146), tps=20645, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:29,042 - root 
- INFO - Step 19660: lr=1.00E-05, loss= 1.2240 (max= 1.7146), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:29,042 - root - INFO - Step 19660: lr=1.00E-05, loss= 1.2240 (max= 1.7146), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:29,042 - root - INFO - Step 19660: lr=1.00E-05, loss= 1.2240 (max= 1.7146), tps=20645, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:44,957 - root - INFO - Step 19670: lr=1.00E-05, loss= 1.2059 (max= 1.6314), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:44,957 - root - INFO - Step 19670: lr=1.00E-05, loss= 1.2059 (max= 1.6314), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:44,957 - root - INFO - Step 19670: lr=1.00E-05, loss= 1.2059 (max= 1.6314), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:44,957 - root - INFO - Step 19670: lr=1.00E-05, loss= 1.2059 (max= 1.6314), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:44,957 - root - INFO - Step 19670: lr=1.00E-05, loss= 1.2059 (max= 1.6314), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:44,958 - root - INFO - Step 19670: lr=1.00E-05, loss= 1.2059 (max= 1.6314), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:44,958 - root - INFO - Step 19670: lr=1.00E-05, loss= 1.2059 (max= 1.6314), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:15:44,958 - root - INFO - Step 19670: lr=1.00E-05, loss= 1.2059 (max= 1.6314), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 19:16:00,889 - root - INFO - Step 19680: lr=1.00E-05, loss= 1.2349 (max= 1.6310), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:00,889 - root - INFO - Step 19680: lr=1.00E-05, loss= 1.2349 (max= 1.6310), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:00,889 - root - INFO - Step 19680: lr=1.00E-05, loss= 1.2349 (max= 1.6310), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:00,889 - root - INFO - Step 19680: lr=1.00E-05, loss= 1.2349 (max= 1.6310), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:00,889 - root - INFO - Step 19680: lr=1.00E-05, loss= 1.2349 (max= 1.6310), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:00,889 - root - INFO - Step 19680: lr=1.00E-05, loss= 1.2349 (max= 1.6310), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:00,889 - root - INFO - Step 19680: lr=1.00E-05, loss= 1.2349 (max= 1.6310), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:00,889 - root - INFO - Step 19680: lr=1.00E-05, loss= 1.2349 (max= 1.6310), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:16,875 - root - INFO - Step 19690: lr=1.00E-05, loss= 1.2275 (max= 1.7031), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:16,875 - root - INFO - Step 19690: lr=1.00E-05, loss= 1.2275 (max= 1.7031), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:16,875 - root - INFO - Step 19690: lr=1.00E-05, loss= 1.2275 (max= 1.7031), tps=20502, mfu=42.72%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:16,875 - root - INFO - Step 19690: lr=1.00E-05, loss= 1.2275 (max= 1.7031), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:16,875 - root - INFO - Step 19690: lr=1.00E-05, loss= 1.2275 (max= 1.7031), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:16,875 - root - INFO - Step 19690: lr=1.00E-05, loss= 1.2275 (max= 1.7031), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:16,876 - root - INFO - Step 19690: lr=1.00E-05, loss= 1.2275 (max= 1.7031), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:16,876 - root - INFO - Step 19690: lr=1.00E-05, loss= 1.2275 (max= 1.7031), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:32,803 - root - INFO - Step 19700: lr=1.00E-05, loss= 1.2430 (max= 1.6370), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:16:32,803 - root - INFO - Step 19700: lr=1.00E-05, loss= 1.2430 (max= 1.6370), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:16:32,803 - root - INFO - Step 19700: lr=1.00E-05, loss= 1.2430 (max= 1.6370), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:16:32,803 - root - INFO - Step 19700: lr=1.00E-05, loss= 1.2430 (max= 1.6370), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:16:32,803 - root - INFO - Step 19700: lr=1.00E-05, loss= 1.2430 (max= 1.6370), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:16:32,803 - root - INFO - Step 19700: lr=1.00E-05, 
loss= 1.2430 (max= 1.6370), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:16:32,803 - root - INFO - Step 19700: lr=1.00E-05, loss= 1.2430 (max= 1.6370), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:16:32,803 - root - INFO - Step 19700: lr=1.00E-05, loss= 1.2430 (max= 1.6370), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:16:48,758 - root - INFO - Step 19710: lr=1.00E-05, loss= 1.2065 (max= 1.6374), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:48,758 - root - INFO - Step 19710: lr=1.00E-05, loss= 1.2065 (max= 1.6374), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:48,758 - root - INFO - Step 19710: lr=1.00E-05, loss= 1.2065 (max= 1.6374), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:48,758 - root - INFO - Step 19710: lr=1.00E-05, loss= 1.2065 (max= 1.6374), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:48,759 - root - INFO - Step 19710: lr=1.00E-05, loss= 1.2065 (max= 1.6374), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:48,759 - root - INFO - Step 19710: lr=1.00E-05, loss= 1.2065 (max= 1.6374), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:48,759 - root - INFO - Step 19710: lr=1.00E-05, loss= 1.2065 (max= 1.6374), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:16:48,759 - root - INFO - Step 19710: lr=1.00E-05, loss= 1.2065 (max= 1.6374), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
19:17:04,718 - root - INFO - Step 19720: lr=1.00E-05, loss= 1.2231 (max= 1.5554), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:04,719 - root - INFO - Step 19720: lr=1.00E-05, loss= 1.2231 (max= 1.5554), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:04,719 - root - INFO - Step 19720: lr=1.00E-05, loss= 1.2231 (max= 1.5554), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:04,719 - root - INFO - Step 19720: lr=1.00E-05, loss= 1.2231 (max= 1.5554), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:04,719 - root - INFO - Step 19720: lr=1.00E-05, loss= 1.2231 (max= 1.5554), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:04,719 - root - INFO - Step 19720: lr=1.00E-05, loss= 1.2231 (max= 1.5554), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:04,719 - root - INFO - Step 19720: lr=1.00E-05, loss= 1.2231 (max= 1.5554), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:04,719 - root - INFO - Step 19720: lr=1.00E-05, loss= 1.2231 (max= 1.5554), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:20,653 - root - INFO - Step 19730: lr=1.00E-05, loss= 1.2189 (max= 1.7148), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:20,653 - root - INFO - Step 19730: lr=1.00E-05, loss= 1.2189 (max= 1.7148), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:20,653 - root - INFO - Step 19730: lr=1.00E-05, loss= 1.2189 (max= 1.7148), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:20,653 - root - INFO - Step 19730: lr=1.00E-05, loss= 1.2189 (max= 1.7148), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:20,653 - root - INFO - Step 19730: lr=1.00E-05, loss= 1.2189 (max= 1.7148), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:20,653 - root - INFO - Step 19730: lr=1.00E-05, loss= 1.2189 (max= 1.7148), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:20,653 - root - INFO - Step 19730: lr=1.00E-05, loss= 1.2189 (max= 1.7148), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:20,654 - root - INFO - Step 19730: lr=1.00E-05, loss= 1.2189 (max= 1.7148), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:36,613 - root - INFO - Step 19740: lr=1.00E-05, loss= 1.2506 (max= 1.6533), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:36,613 - root - INFO - Step 19740: lr=1.00E-05, loss= 1.2506 (max= 1.6533), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:36,613 - root - INFO - Step 19740: lr=1.00E-05, loss= 1.2506 (max= 1.6533), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:36,613 - root - INFO - Step 19740: lr=1.00E-05, loss= 1.2506 (max= 1.6533), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:36,613 - root - INFO - Step 19740: lr=1.00E-05, loss= 1.2506 (max= 1.6533), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:36,613 - root - INFO - Step 19740: lr=1.00E-05, loss= 1.2506 (max= 1.6533), 
tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:36,613 - root - INFO - Step 19740: lr=1.00E-05, loss= 1.2506 (max= 1.6533), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:36,613 - root - INFO - Step 19740: lr=1.00E-05, loss= 1.2506 (max= 1.6533), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:52,534 - root - INFO - Step 19750: lr=1.00E-05, loss= 1.2237 (max= 1.5654), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:52,534 - root - INFO - Step 19750: lr=1.00E-05, loss= 1.2237 (max= 1.5654), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:52,534 - root - INFO - Step 19750: lr=1.00E-05, loss= 1.2237 (max= 1.5654), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:52,534 - root - INFO - Step 19750: lr=1.00E-05, loss= 1.2237 (max= 1.5654), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:52,535 - root - INFO - Step 19750: lr=1.00E-05, loss= 1.2237 (max= 1.5654), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:52,535 - root - INFO - Step 19750: lr=1.00E-05, loss= 1.2237 (max= 1.5654), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:52,535 - root - INFO - Step 19750: lr=1.00E-05, loss= 1.2237 (max= 1.5654), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:17:52,535 - root - INFO - Step 19750: lr=1.00E-05, loss= 1.2237 (max= 1.5654), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:08,458 - root - INFO - Step 
19760: lr=1.00E-05, loss= 1.2459 (max= 1.6245), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:08,458 - root - INFO - Step 19760: lr=1.00E-05, loss= 1.2459 (max= 1.6245), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:08,458 - root - INFO - Step 19760: lr=1.00E-05, loss= 1.2459 (max= 1.6245), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:08,458 - root - INFO - Step 19760: lr=1.00E-05, loss= 1.2459 (max= 1.6245), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:08,458 - root - INFO - Step 19760: lr=1.00E-05, loss= 1.2459 (max= 1.6245), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:08,458 - root - INFO - Step 19760: lr=1.00E-05, loss= 1.2459 (max= 1.6245), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:08,458 - root - INFO - Step 19760: lr=1.00E-05, loss= 1.2459 (max= 1.6245), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:08,458 - root - INFO - Step 19760: lr=1.00E-05, loss= 1.2459 (max= 1.6245), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:24,419 - root - INFO - Step 19770: lr=1.00E-05, loss= 1.2146 (max= 1.6172), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:24,419 - root - INFO - Step 19770: lr=1.00E-05, loss= 1.2146 (max= 1.6172), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:24,419 - root - INFO - Step 19770: lr=1.00E-05, loss= 1.2146 (max= 1.6172), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 19:18:24,420 - root - INFO - Step 19770: lr=1.00E-05, loss= 1.2146 (max= 1.6172), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:24,420 - root - INFO - Step 19770: lr=1.00E-05, loss= 1.2146 (max= 1.6172), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:24,420 - root - INFO - Step 19770: lr=1.00E-05, loss= 1.2146 (max= 1.6172), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:24,420 - root - INFO - Step 19770: lr=1.00E-05, loss= 1.2146 (max= 1.6172), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:24,420 - root - INFO - Step 19770: lr=1.00E-05, loss= 1.2146 (max= 1.6172), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:18:40,369 - root - INFO - Step 19780: lr=1.00E-05, loss= 1.2146 (max= 1.5732), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:40,370 - root - INFO - Step 19780: lr=1.00E-05, loss= 1.2146 (max= 1.5732), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:40,370 - root - INFO - Step 19780: lr=1.00E-05, loss= 1.2146 (max= 1.5732), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:40,370 - root - INFO - Step 19780: lr=1.00E-05, loss= 1.2146 (max= 1.5732), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:40,370 - root - INFO - Step 19780: lr=1.00E-05, loss= 1.2146 (max= 1.5732), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:40,370 - root - INFO - Step 19780: lr=1.00E-05, loss= 1.2146 (max= 1.5732), tps=20549, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:40,370 - root - INFO - Step 19780: lr=1.00E-05, loss= 1.2146 (max= 1.5732), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:40,370 - root - INFO - Step 19780: lr=1.00E-05, loss= 1.2146 (max= 1.5732), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:56,294 - root - INFO - Step 19790: lr=1.00E-05, loss= 1.2185 (max= 1.5743), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:56,294 - root - INFO - Step 19790: lr=1.00E-05, loss= 1.2185 (max= 1.5743), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:56,294 - root - INFO - Step 19790: lr=1.00E-05, loss= 1.2185 (max= 1.5743), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:56,294 - root - INFO - Step 19790: lr=1.00E-05, loss= 1.2185 (max= 1.5743), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:56,294 - root - INFO - Step 19790: lr=1.00E-05, loss= 1.2185 (max= 1.5743), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:56,294 - root - INFO - Step 19790: lr=1.00E-05, loss= 1.2185 (max= 1.5743), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:56,294 - root - INFO - Step 19790: lr=1.00E-05, loss= 1.2185 (max= 1.5743), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:18:56,294 - root - INFO - Step 19790: lr=1.00E-05, loss= 1.2185 (max= 1.5743), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:19:12,219 - root - INFO - Step 19800: lr=1.00E-05, loss= 1.2273 
(max= 1.7135), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:12,219 - root - INFO - Step 19800: lr=1.00E-05, loss= 1.2273 (max= 1.7135), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:12,219 - root - INFO - Step 19800: lr=1.00E-05, loss= 1.2273 (max= 1.7135), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:12,219 - root - INFO - Step 19800: lr=1.00E-05, loss= 1.2273 (max= 1.7135), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:12,220 - root - INFO - Step 19800: lr=1.00E-05, loss= 1.2273 (max= 1.7135), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:12,220 - root - INFO - Step 19800: lr=1.00E-05, loss= 1.2273 (max= 1.7135), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:12,220 - root - INFO - Step 19800: lr=1.00E-05, loss= 1.2273 (max= 1.7135), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:12,220 - root - INFO - Step 19800: lr=1.00E-05, loss= 1.2273 (max= 1.7135), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:28,151 - root - INFO - Step 19810: lr=1.00E-05, loss= 1.2160 (max= 1.5484), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:19:28,152 - root - INFO - Step 19810: lr=1.00E-05, loss= 1.2160 (max= 1.5484), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:19:28,152 - root - INFO - Step 19810: lr=1.00E-05, loss= 1.2160 (max= 1.5484), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:19:28,152 - root 
- INFO - Step 19810: lr=1.00E-05, loss= 1.2160 (max= 1.5484), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:19:28,152 - root - INFO - Step 19810: lr=1.00E-05, loss= 1.2160 (max= 1.5484), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:19:28,152 - root - INFO - Step 19810: lr=1.00E-05, loss= 1.2160 (max= 1.5484), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:19:28,152 - root - INFO - Step 19810: lr=1.00E-05, loss= 1.2160 (max= 1.5484), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:19:28,152 - root - INFO - Step 19810: lr=1.00E-05, loss= 1.2160 (max= 1.5484), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:19:44,055 - root - INFO - Step 19820: lr=1.00E-05, loss= 1.2305 (max= 1.7331), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:44,055 - root - INFO - Step 19820: lr=1.00E-05, loss= 1.2305 (max= 1.7331), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:44,055 - root - INFO - Step 19820: lr=1.00E-05, loss= 1.2305 (max= 1.7331), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:44,056 - root - INFO - Step 19820: lr=1.00E-05, loss= 1.2305 (max= 1.7331), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:44,056 - root - INFO - Step 19820: lr=1.00E-05, loss= 1.2305 (max= 1.7331), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:44,056 - root - INFO - Step 19820: lr=1.00E-05, loss= 1.2305 (max= 1.7331), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 19:19:44,056 - root - INFO - Step 19820: lr=1.00E-05, loss= 1.2305 (max= 1.7331), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:19:44,056 - root - INFO - Step 19820: lr=1.00E-05, loss= 1.2305 (max= 1.7331), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:00,033 - root - INFO - Step 19830: lr=1.00E-05, loss= 1.2413 (max= 1.6668), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:20:00,033 - root - INFO - Step 19830: lr=1.00E-05, loss= 1.2413 (max= 1.6668), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:20:00,033 - root - INFO - Step 19830: lr=1.00E-05, loss= 1.2413 (max= 1.6668), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:20:00,033 - root - INFO - Step 19830: lr=1.00E-05, loss= 1.2413 (max= 1.6668), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:20:00,033 - root - INFO - Step 19830: lr=1.00E-05, loss= 1.2413 (max= 1.6668), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:20:00,033 - root - INFO - Step 19830: lr=1.00E-05, loss= 1.2413 (max= 1.6668), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:20:00,034 - root - INFO - Step 19830: lr=1.00E-05, loss= 1.2413 (max= 1.6668), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:20:00,034 - root - INFO - Step 19830: lr=1.00E-05, loss= 1.2413 (max= 1.6668), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:20:16,012 - root - INFO - Step 19840: lr=1.00E-05, loss= 1.2065 (max= 1.6016), tps=20511, mfu=42.73%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:16,012 - root - INFO - Step 19840: lr=1.00E-05, loss= 1.2065 (max= 1.6016), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:16,013 - root - INFO - Step 19840: lr=1.00E-05, loss= 1.2065 (max= 1.6016), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:16,013 - root - INFO - Step 19840: lr=1.00E-05, loss= 1.2065 (max= 1.6016), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:16,013 - root - INFO - Step 19840: lr=1.00E-05, loss= 1.2065 (max= 1.6016), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:16,013 - root - INFO - Step 19840: lr=1.00E-05, loss= 1.2065 (max= 1.6016), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:16,013 - root - INFO - Step 19840: lr=1.00E-05, loss= 1.2065 (max= 1.6016), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:16,013 - root - INFO - Step 19840: lr=1.00E-05, loss= 1.2065 (max= 1.6016), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:31,916 - root - INFO - Step 19850: lr=1.00E-05, loss= 1.1879 (max= 1.7153), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:31,916 - root - INFO - Step 19850: lr=1.00E-05, loss= 1.1879 (max= 1.7153), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:31,916 - root - INFO - Step 19850: lr=1.00E-05, loss= 1.1879 (max= 1.7153), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:31,916 - root - INFO - Step 19850: lr=1.00E-05, 
loss= 1.1879 (max= 1.7153), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:31,916 - root - INFO - Step 19850: lr=1.00E-05, loss= 1.1879 (max= 1.7153), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:31,916 - root - INFO - Step 19850: lr=1.00E-05, loss= 1.1879 (max= 1.7153), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:31,916 - root - INFO - Step 19850: lr=1.00E-05, loss= 1.1879 (max= 1.7153), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:31,917 - root - INFO - Step 19850: lr=1.00E-05, loss= 1.1879 (max= 1.7153), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:47,858 - root - INFO - Step 19860: lr=1.00E-05, loss= 1.2308 (max= 1.6922), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:47,859 - root - INFO - Step 19860: lr=1.00E-05, loss= 1.2308 (max= 1.6922), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:47,859 - root - INFO - Step 19860: lr=1.00E-05, loss= 1.2308 (max= 1.6922), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:47,859 - root - INFO - Step 19860: lr=1.00E-05, loss= 1.2308 (max= 1.6922), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:47,859 - root - INFO - Step 19860: lr=1.00E-05, loss= 1.2308 (max= 1.6922), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:47,859 - root - INFO - Step 19860: lr=1.00E-05, loss= 1.2308 (max= 1.6922), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
19:20:47,859 - root - INFO - Step 19860: lr=1.00E-05, loss= 1.2308 (max= 1.6922), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:20:47,859 - root - INFO - Step 19860: lr=1.00E-05, loss= 1.2308 (max= 1.6922), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:03,797 - root - INFO - Step 19870: lr=1.00E-05, loss= 1.1989 (max= 1.5201), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:03,797 - root - INFO - Step 19870: lr=1.00E-05, loss= 1.1989 (max= 1.5201), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:03,797 - root - INFO - Step 19870: lr=1.00E-05, loss= 1.1989 (max= 1.5201), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:03,797 - root - INFO - Step 19870: lr=1.00E-05, loss= 1.1989 (max= 1.5201), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:03,797 - root - INFO - Step 19870: lr=1.00E-05, loss= 1.1989 (max= 1.5201), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:03,798 - root - INFO - Step 19870: lr=1.00E-05, loss= 1.1989 (max= 1.5201), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:03,798 - root - INFO - Step 19870: lr=1.00E-05, loss= 1.1989 (max= 1.5201), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:03,798 - root - INFO - Step 19870: lr=1.00E-05, loss= 1.1989 (max= 1.5201), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:19,718 - root - INFO - Step 19880: lr=1.00E-05, loss= 1.2564 (max= 1.5703), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:19,718 - root - INFO - Step 19880: lr=1.00E-05, loss= 1.2564 (max= 1.5703), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:19,718 - root - INFO - Step 19880: lr=1.00E-05, loss= 1.2564 (max= 1.5703), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:19,718 - root - INFO - Step 19880: lr=1.00E-05, loss= 1.2564 (max= 1.5703), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:19,718 - root - INFO - Step 19880: lr=1.00E-05, loss= 1.2564 (max= 1.5703), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:19,718 - root - INFO - Step 19880: lr=1.00E-05, loss= 1.2564 (max= 1.5703), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:19,718 - root - INFO - Step 19880: lr=1.00E-05, loss= 1.2564 (max= 1.5703), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:19,718 - root - INFO - Step 19880: lr=1.00E-05, loss= 1.2564 (max= 1.5703), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:21:35,678 - root - INFO - Step 19890: lr=1.00E-05, loss= 1.2273 (max= 1.7731), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:35,678 - root - INFO - Step 19890: lr=1.00E-05, loss= 1.2273 (max= 1.7731), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:35,678 - root - INFO - Step 19890: lr=1.00E-05, loss= 1.2273 (max= 1.7731), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:35,678 - root - INFO - Step 19890: lr=1.00E-05, loss= 1.2273 (max= 1.7731), 
tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:35,678 - root - INFO - Step 19890: lr=1.00E-05, loss= 1.2273 (max= 1.7731), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:35,678 - root - INFO - Step 19890: lr=1.00E-05, loss= 1.2273 (max= 1.7731), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:35,678 - root - INFO - Step 19890: lr=1.00E-05, loss= 1.2273 (max= 1.7731), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:35,678 - root - INFO - Step 19890: lr=1.00E-05, loss= 1.2273 (max= 1.7731), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:51,582 - root - INFO - Step 19900: lr=1.00E-05, loss= 1.1932 (max= 1.5913), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:51,582 - root - INFO - Step 19900: lr=1.00E-05, loss= 1.1932 (max= 1.5913), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:51,582 - root - INFO - Step 19900: lr=1.00E-05, loss= 1.1932 (max= 1.5913), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:51,583 - root - INFO - Step 19900: lr=1.00E-05, loss= 1.1932 (max= 1.5913), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:51,583 - root - INFO - Step 19900: lr=1.00E-05, loss= 1.1932 (max= 1.5913), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:51,583 - root - INFO - Step 19900: lr=1.00E-05, loss= 1.1932 (max= 1.5913), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:51,583 - root - INFO - Step 
19900: lr=1.00E-05, loss= 1.1932 (max= 1.5913), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:21:51,583 - root - INFO - Step 19900: lr=1.00E-05, loss= 1.1932 (max= 1.5913), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:07,490 - root - INFO - Step 19910: lr=1.00E-05, loss= 1.2475 (max= 1.6167), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:22:07,490 - root - INFO - Step 19910: lr=1.00E-05, loss= 1.2475 (max= 1.6167), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:22:07,490 - root - INFO - Step 19910: lr=1.00E-05, loss= 1.2475 (max= 1.6167), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:22:07,490 - root - INFO - Step 19910: lr=1.00E-05, loss= 1.2475 (max= 1.6167), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:22:07,490 - root - INFO - Step 19910: lr=1.00E-05, loss= 1.2475 (max= 1.6167), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:22:07,490 - root - INFO - Step 19910: lr=1.00E-05, loss= 1.2475 (max= 1.6167), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:22:07,490 - root - INFO - Step 19910: lr=1.00E-05, loss= 1.2475 (max= 1.6167), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:22:07,490 - root - INFO - Step 19910: lr=1.00E-05, loss= 1.2475 (max= 1.6167), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:22:23,434 - root - INFO - Step 19920: lr=1.00E-05, loss= 1.1947 (max= 1.6405), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 19:22:23,435 - root - INFO - Step 19920: lr=1.00E-05, loss= 1.1947 (max= 1.6405), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:23,435 - root - INFO - Step 19920: lr=1.00E-05, loss= 1.1947 (max= 1.6405), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:23,435 - root - INFO - Step 19920: lr=1.00E-05, loss= 1.1947 (max= 1.6405), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:23,435 - root - INFO - Step 19920: lr=1.00E-05, loss= 1.1947 (max= 1.6405), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:23,435 - root - INFO - Step 19920: lr=1.00E-05, loss= 1.1947 (max= 1.6405), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:23,435 - root - INFO - Step 19920: lr=1.00E-05, loss= 1.1947 (max= 1.6405), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:23,435 - root - INFO - Step 19920: lr=1.00E-05, loss= 1.1947 (max= 1.6405), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:39,350 - root - INFO - Step 19930: lr=1.00E-05, loss= 1.2383 (max= 1.7471), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:39,351 - root - INFO - Step 19930: lr=1.00E-05, loss= 1.2383 (max= 1.7471), tps=20593, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:39,351 - root - INFO - Step 19930: lr=1.00E-05, loss= 1.2383 (max= 1.7471), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:39,351 - root - INFO - Step 19930: lr=1.00E-05, loss= 1.2383 (max= 1.7471), tps=20593, mfu=42.91%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:39,351 - root - INFO - Step 19930: lr=1.00E-05, loss= 1.2383 (max= 1.7471), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:39,351 - root - INFO - Step 19930: lr=1.00E-05, loss= 1.2383 (max= 1.7471), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:39,351 - root - INFO - Step 19930: lr=1.00E-05, loss= 1.2383 (max= 1.7471), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:39,351 - root - INFO - Step 19930: lr=1.00E-05, loss= 1.2383 (max= 1.7471), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:55,301 - root - INFO - Step 19940: lr=1.00E-05, loss= 1.2190 (max= 1.6870), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:55,301 - root - INFO - Step 19940: lr=1.00E-05, loss= 1.2190 (max= 1.6870), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:55,301 - root - INFO - Step 19940: lr=1.00E-05, loss= 1.2190 (max= 1.6870), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:55,302 - root - INFO - Step 19940: lr=1.00E-05, loss= 1.2190 (max= 1.6870), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:55,302 - root - INFO - Step 19940: lr=1.00E-05, loss= 1.2190 (max= 1.6870), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:55,302 - root - INFO - Step 19940: lr=1.00E-05, loss= 1.2190 (max= 1.6870), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:55,302 - root - INFO - Step 19940: lr=1.00E-05, loss= 1.2190 
(max= 1.6870), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:22:55,302 - root - INFO - Step 19940: lr=1.00E-05, loss= 1.2190 (max= 1.6870), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:11,298 - root - INFO - Step 19950: lr=1.00E-05, loss= 1.2028 (max= 1.5689), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:11,298 - root - INFO - Step 19950: lr=1.00E-05, loss= 1.2028 (max= 1.5689), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:11,298 - root - INFO - Step 19950: lr=1.00E-05, loss= 1.2028 (max= 1.5689), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:11,298 - root - INFO - Step 19950: lr=1.00E-05, loss= 1.2028 (max= 1.5689), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:11,298 - root - INFO - Step 19950: lr=1.00E-05, loss= 1.2028 (max= 1.5689), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:11,298 - root - INFO - Step 19950: lr=1.00E-05, loss= 1.2028 (max= 1.5689), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:11,298 - root - INFO - Step 19950: lr=1.00E-05, loss= 1.2028 (max= 1.5689), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:11,298 - root - INFO - Step 19950: lr=1.00E-05, loss= 1.2028 (max= 1.5689), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:27,266 - root - INFO - Step 19960: lr=1.00E-05, loss= 1.2078 (max= 1.5784), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:23:27,266 - root 
- INFO - Step 19960: lr=1.00E-05, loss= 1.2078 (max= 1.5784), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:23:27,266 - root - INFO - Step 19960: lr=1.00E-05, loss= 1.2078 (max= 1.5784), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:23:27,266 - root - INFO - Step 19960: lr=1.00E-05, loss= 1.2078 (max= 1.5784), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:23:27,266 - root - INFO - Step 19960: lr=1.00E-05, loss= 1.2078 (max= 1.5784), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:23:27,266 - root - INFO - Step 19960: lr=1.00E-05, loss= 1.2078 (max= 1.5784), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:23:27,266 - root - INFO - Step 19960: lr=1.00E-05, loss= 1.2078 (max= 1.5784), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:23:27,266 - root - INFO - Step 19960: lr=1.00E-05, loss= 1.2078 (max= 1.5784), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:23:43,189 - root - INFO - Step 19970: lr=1.00E-05, loss= 1.1926 (max= 1.5579), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:43,189 - root - INFO - Step 19970: lr=1.00E-05, loss= 1.1926 (max= 1.5579), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:43,189 - root - INFO - Step 19970: lr=1.00E-05, loss= 1.1926 (max= 1.5579), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:43,189 - root - INFO - Step 19970: lr=1.00E-05, loss= 1.1926 (max= 1.5579), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 19:23:43,189 - root - INFO - Step 19970: lr=1.00E-05, loss= 1.1926 (max= 1.5579), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:43,189 - root - INFO - Step 19970: lr=1.00E-05, loss= 1.1926 (max= 1.5579), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:43,190 - root - INFO - Step 19970: lr=1.00E-05, loss= 1.1926 (max= 1.5579), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:43,190 - root - INFO - Step 19970: lr=1.00E-05, loss= 1.1926 (max= 1.5579), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:59,128 - root - INFO - Step 19980: lr=1.00E-05, loss= 1.1940 (max= 1.5889), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:59,128 - root - INFO - Step 19980: lr=1.00E-05, loss= 1.1940 (max= 1.5889), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:59,128 - root - INFO - Step 19980: lr=1.00E-05, loss= 1.1940 (max= 1.5889), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:59,129 - root - INFO - Step 19980: lr=1.00E-05, loss= 1.1940 (max= 1.5889), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:59,129 - root - INFO - Step 19980: lr=1.00E-05, loss= 1.1940 (max= 1.5889), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:59,129 - root - INFO - Step 19980: lr=1.00E-05, loss= 1.1940 (max= 1.5889), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:59,129 - root - INFO - Step 19980: lr=1.00E-05, loss= 1.1940 (max= 1.5889), tps=20563, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:23:59,129 - root - INFO - Step 19980: lr=1.00E-05, loss= 1.1940 (max= 1.5889), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:24:15,032 - root - INFO - Step 19990: lr=1.00E-05, loss= 1.2132 (max= 1.6720), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:24:15,032 - root - INFO - Step 19990: lr=1.00E-05, loss= 1.2132 (max= 1.6720), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:24:15,032 - root - INFO - Step 19990: lr=1.00E-05, loss= 1.2132 (max= 1.6720), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:24:15,032 - root - INFO - Step 19990: lr=1.00E-05, loss= 1.2132 (max= 1.6720), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:24:15,032 - root - INFO - Step 19990: lr=1.00E-05, loss= 1.2132 (max= 1.6720), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:24:15,032 - root - INFO - Step 19990: lr=1.00E-05, loss= 1.2132 (max= 1.6720), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:24:15,032 - root - INFO - Step 19990: lr=1.00E-05, loss= 1.2132 (max= 1.6720), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:24:15,032 - root - INFO - Step 19990: lr=1.00E-05, loss= 1.2132 (max= 1.6720), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-20000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-20000! 
Save time: 4.454166650772095 +2025-10-24 19:24:30,956 - root - INFO - Step 20000: lr=1.00E-05, loss= 1.2079 (max= 1.7874), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:24:30,956 - root - INFO - Step 20000: lr=1.00E-05, loss= 1.2079 (max= 1.7874), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:24:30,956 - root - INFO - Saving a full checkpoint at step 20000 +2025-10-24 19:24:30,956 - root - INFO - Step 20000: lr=1.00E-05, loss= 1.2079 (max= 1.7874), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:24:30,956 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:24:30,956 - root - INFO - Saving a full checkpoint at step 20000 +2025-10-24 19:24:30,956 - root - INFO - Saving a full checkpoint at step 20000 +2025-10-24 19:24:30,956 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:24:30,956 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:24:30,956 - root - INFO - Step 20000: lr=1.00E-05, loss= 1.2079 (max= 1.7874), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:24:30,956 - root - INFO - Step 20000: lr=1.00E-05, loss= 1.2079 (max= 1.7874), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:24:30,957 - root - INFO - Saving a full checkpoint at step 20000 +2025-10-24 19:24:30,957 - root - INFO - Saving a full checkpoint at step 20000 +2025-10-24 19:24:30,957 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:24:30,957 - root - INFO - CheckpointManager: State dict 
keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:24:30,956 - root - INFO - Step 20000: lr=1.00E-05, loss= 1.2079 (max= 1.7874), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:24:30,957 - root - INFO - Step 20000: lr=1.00E-05, loss= 1.2079 (max= 1.7874), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:24:30,957 - root - INFO - Saving a full checkpoint at step 20000 +2025-10-24 19:24:30,957 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:24:30,957 - root - INFO - Saving a full checkpoint at step 20000 +2025-10-24 19:24:30,957 - root - INFO - Step 20000: lr=1.00E-05, loss= 1.2079 (max= 1.7874), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:24:30,957 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:24:30,957 - root - INFO - Saving a full checkpoint at step 20000 +2025-10-24 19:24:30,957 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:24:46,824 - root - INFO - Finished saving the checkpoint in 15.87 seconds +2025-10-24 19:24:46,830 - root - INFO - Finished saving the checkpoint in 15.87 seconds +2025-10-24 19:24:46,831 - root - INFO - Finished saving the checkpoint in 15.87 seconds +2025-10-24 19:24:46,831 - root - INFO - Finished saving the checkpoint in 15.87 seconds +2025-10-24 19:24:46,831 - root - INFO - Finished saving the checkpoint in 15.87 seconds +2025-10-24 19:24:46,831 - root - INFO - Finished saving the checkpoint in 15.87 seconds +2025-10-24 19:24:46,831 - root - INFO - Finished saving the checkpoint in 15.87 seconds +2025-10-24 19:24:46,832 - root - INFO - Finished saving the checkpoint in 15.87 
seconds +2025-10-24 19:25:02,682 - root - INFO - Step 20010: lr=1.00E-05, loss= 1.2120 (max= 1.5348), tps=10330, mfu=21.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:02,682 - root - INFO - Step 20010: lr=1.00E-05, loss= 1.2120 (max= 1.5348), tps=10330, mfu=21.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:02,682 - root - INFO - Step 20010: lr=1.00E-05, loss= 1.2120 (max= 1.5348), tps=10330, mfu=21.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:02,682 - root - INFO - Step 20010: lr=1.00E-05, loss= 1.2120 (max= 1.5348), tps=10330, mfu=21.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:02,682 - root - INFO - Step 20010: lr=1.00E-05, loss= 1.2120 (max= 1.5348), tps=10330, mfu=21.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:02,683 - root - INFO - Step 20010: lr=1.00E-05, loss= 1.2120 (max= 1.5348), tps=10330, mfu=21.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:02,683 - root - INFO - Step 20010: lr=1.00E-05, loss= 1.2120 (max= 1.5348), tps=10330, mfu=21.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:02,683 - root - INFO - Step 20010: lr=1.00E-05, loss= 1.2120 (max= 1.5348), tps=10330, mfu=21.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:18,644 - root - INFO - Step 20020: lr=1.00E-05, loss= 1.1886 (max= 1.5667), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:18,644 - root - INFO - Step 20020: lr=1.00E-05, loss= 1.1886 (max= 1.5667), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:18,644 - root - INFO - Step 20020: lr=1.00E-05, loss= 1.1886 (max= 1.5667), tps=20533, mfu=42.78%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:18,644 - root - INFO - Step 20020: lr=1.00E-05, loss= 1.1886 (max= 1.5667), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:18,645 - root - INFO - Step 20020: lr=1.00E-05, loss= 1.1886 (max= 1.5667), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:18,645 - root - INFO - Step 20020: lr=1.00E-05, loss= 1.1886 (max= 1.5667), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:18,645 - root - INFO - Step 20020: lr=1.00E-05, loss= 1.1886 (max= 1.5667), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:18,645 - root - INFO - Step 20020: lr=1.00E-05, loss= 1.1886 (max= 1.5667), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:34,573 - root - INFO - Step 20030: lr=1.00E-05, loss= 1.2385 (max= 1.6608), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:34,573 - root - INFO - Step 20030: lr=1.00E-05, loss= 1.2385 (max= 1.6608), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:34,573 - root - INFO - Step 20030: lr=1.00E-05, loss= 1.2385 (max= 1.6608), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:34,573 - root - INFO - Step 20030: lr=1.00E-05, loss= 1.2385 (max= 1.6608), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:34,573 - root - INFO - Step 20030: lr=1.00E-05, loss= 1.2385 (max= 1.6608), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:34,573 - root - INFO - Step 20030: lr=1.00E-05, loss= 1.2385 
(max= 1.6608), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:34,573 - root - INFO - Step 20030: lr=1.00E-05, loss= 1.2385 (max= 1.6608), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:34,573 - root - INFO - Step 20030: lr=1.00E-05, loss= 1.2385 (max= 1.6608), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:50,523 - root - INFO - Step 20040: lr=1.00E-05, loss= 1.1920 (max= 1.5979), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:50,523 - root - INFO - Step 20040: lr=1.00E-05, loss= 1.1920 (max= 1.5979), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:50,523 - root - INFO - Step 20040: lr=1.00E-05, loss= 1.1920 (max= 1.5979), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:50,523 - root - INFO - Step 20040: lr=1.00E-05, loss= 1.1920 (max= 1.5979), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:50,523 - root - INFO - Step 20040: lr=1.00E-05, loss= 1.1920 (max= 1.5979), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:50,523 - root - INFO - Step 20040: lr=1.00E-05, loss= 1.1920 (max= 1.5979), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:50,523 - root - INFO - Step 20040: lr=1.00E-05, loss= 1.1920 (max= 1.5979), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:25:50,523 - root - INFO - Step 20040: lr=1.00E-05, loss= 1.1920 (max= 1.5979), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:26:06,447 - root 
- INFO - Step 20050: lr=1.00E-05, loss= 1.2226 (max= 1.6856), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:06,447 - root - INFO - Step 20050: lr=1.00E-05, loss= 1.2226 (max= 1.6856), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:06,447 - root - INFO - Step 20050: lr=1.00E-05, loss= 1.2226 (max= 1.6856), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:06,447 - root - INFO - Step 20050: lr=1.00E-05, loss= 1.2226 (max= 1.6856), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:06,447 - root - INFO - Step 20050: lr=1.00E-05, loss= 1.2226 (max= 1.6856), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:06,447 - root - INFO - Step 20050: lr=1.00E-05, loss= 1.2226 (max= 1.6856), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:06,447 - root - INFO - Step 20050: lr=1.00E-05, loss= 1.2226 (max= 1.6856), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:06,447 - root - INFO - Step 20050: lr=1.00E-05, loss= 1.2226 (max= 1.6856), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:22,406 - root - INFO - Step 20060: lr=1.00E-05, loss= 1.2318 (max= 1.5815), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:22,407 - root - INFO - Step 20060: lr=1.00E-05, loss= 1.2318 (max= 1.5815), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:22,407 - root - INFO - Step 20060: lr=1.00E-05, loss= 1.2318 (max= 1.5815), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 19:26:22,407 - root - INFO - Step 20060: lr=1.00E-05, loss= 1.2318 (max= 1.5815), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:22,407 - root - INFO - Step 20060: lr=1.00E-05, loss= 1.2318 (max= 1.5815), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:22,407 - root - INFO - Step 20060: lr=1.00E-05, loss= 1.2318 (max= 1.5815), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:22,407 - root - INFO - Step 20060: lr=1.00E-05, loss= 1.2318 (max= 1.5815), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:22,407 - root - INFO - Step 20060: lr=1.00E-05, loss= 1.2318 (max= 1.5815), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:38,364 - root - INFO - Step 20070: lr=1.00E-05, loss= 1.2078 (max= 1.6352), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:38,364 - root - INFO - Step 20070: lr=1.00E-05, loss= 1.2078 (max= 1.6352), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:38,364 - root - INFO - Step 20070: lr=1.00E-05, loss= 1.2078 (max= 1.6352), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:38,364 - root - INFO - Step 20070: lr=1.00E-05, loss= 1.2078 (max= 1.6352), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:38,364 - root - INFO - Step 20070: lr=1.00E-05, loss= 1.2078 (max= 1.6352), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:38,364 - root - INFO - Step 20070: lr=1.00E-05, loss= 1.2078 (max= 1.6352), tps=20539, mfu=42.79%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:38,364 - root - INFO - Step 20070: lr=1.00E-05, loss= 1.2078 (max= 1.6352), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:38,364 - root - INFO - Step 20070: lr=1.00E-05, loss= 1.2078 (max= 1.6352), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:47,074 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:3667390 +2025-10-24 19:26:54,307 - root - INFO - Step 20080: lr=1.00E-05, loss= 1.2192 (max= 1.5900), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:54,307 - root - INFO - Step 20080: lr=1.00E-05, loss= 1.2192 (max= 1.5900), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:54,307 - root - INFO - Step 20080: lr=1.00E-05, loss= 1.2192 (max= 1.5900), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:54,307 - root - INFO - Step 20080: lr=1.00E-05, loss= 1.2192 (max= 1.5900), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:54,307 - root - INFO - Step 20080: lr=1.00E-05, loss= 1.2192 (max= 1.5900), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:54,307 - root - INFO - Step 20080: lr=1.00E-05, loss= 1.2192 (max= 1.5900), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:54,307 - root - INFO - Step 20080: lr=1.00E-05, loss= 1.2192 (max= 1.5900), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:26:54,307 - root - INFO - Step 20080: lr=1.00E-05, 
loss= 1.2192 (max= 1.5900), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:10,266 - root - INFO - Step 20090: lr=1.00E-05, loss= 1.2145 (max= 1.6025), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:10,266 - root - INFO - Step 20090: lr=1.00E-05, loss= 1.2145 (max= 1.6025), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:10,266 - root - INFO - Step 20090: lr=1.00E-05, loss= 1.2145 (max= 1.6025), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:10,266 - root - INFO - Step 20090: lr=1.00E-05, loss= 1.2145 (max= 1.6025), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:10,266 - root - INFO - Step 20090: lr=1.00E-05, loss= 1.2145 (max= 1.6025), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:10,266 - root - INFO - Step 20090: lr=1.00E-05, loss= 1.2145 (max= 1.6025), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:10,266 - root - INFO - Step 20090: lr=1.00E-05, loss= 1.2145 (max= 1.6025), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:10,266 - root - INFO - Step 20090: lr=1.00E-05, loss= 1.2145 (max= 1.6025), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:26,207 - root - INFO - Step 20100: lr=1.00E-05, loss= 1.2356 (max= 1.5481), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:26,207 - root - INFO - Step 20100: lr=1.00E-05, loss= 1.2356 (max= 1.5481), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
19:27:26,207 - root - INFO - Step 20100: lr=1.00E-05, loss= 1.2356 (max= 1.5481), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:26,207 - root - INFO - Step 20100: lr=1.00E-05, loss= 1.2356 (max= 1.5481), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:26,207 - root - INFO - Step 20100: lr=1.00E-05, loss= 1.2356 (max= 1.5481), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:26,207 - root - INFO - Step 20100: lr=1.00E-05, loss= 1.2356 (max= 1.5481), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:26,207 - root - INFO - Step 20100: lr=1.00E-05, loss= 1.2356 (max= 1.5481), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:26,207 - root - INFO - Step 20100: lr=1.00E-05, loss= 1.2356 (max= 1.5481), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:27:42,116 - root - INFO - Step 20110: lr=1.00E-05, loss= 1.2229 (max= 1.6584), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:42,116 - root - INFO - Step 20110: lr=1.00E-05, loss= 1.2229 (max= 1.6584), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:42,116 - root - INFO - Step 20110: lr=1.00E-05, loss= 1.2229 (max= 1.6584), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:42,116 - root - INFO - Step 20110: lr=1.00E-05, loss= 1.2229 (max= 1.6584), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:42,117 - root - INFO - Step 20110: lr=1.00E-05, loss= 1.2229 (max= 1.6584), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:42,117 - root - INFO - Step 20110: lr=1.00E-05, loss= 1.2229 (max= 1.6584), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:42,117 - root - INFO - Step 20110: lr=1.00E-05, loss= 1.2229 (max= 1.6584), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:42,117 - root - INFO - Step 20110: lr=1.00E-05, loss= 1.2229 (max= 1.6584), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:58,047 - root - INFO - Step 20120: lr=1.00E-05, loss= 1.2423 (max= 1.7777), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:58,047 - root - INFO - Step 20120: lr=1.00E-05, loss= 1.2423 (max= 1.7777), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:58,047 - root - INFO - Step 20120: lr=1.00E-05, loss= 1.2423 (max= 1.7777), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:58,047 - root - INFO - Step 20120: lr=1.00E-05, loss= 1.2423 (max= 1.7777), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:58,047 - root - INFO - Step 20120: lr=1.00E-05, loss= 1.2423 (max= 1.7777), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:58,047 - root - INFO - Step 20120: lr=1.00E-05, loss= 1.2423 (max= 1.7777), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:58,047 - root - INFO - Step 20120: lr=1.00E-05, loss= 1.2423 (max= 1.7777), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:27:58,047 - root - INFO - Step 20120: lr=1.00E-05, loss= 1.2423 (max= 1.7777), 
tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:14,020 - root - INFO - Step 20130: lr=1.00E-05, loss= 1.2500 (max= 1.6617), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:14,020 - root - INFO - Step 20130: lr=1.00E-05, loss= 1.2500 (max= 1.6617), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:14,020 - root - INFO - Step 20130: lr=1.00E-05, loss= 1.2500 (max= 1.6617), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:14,020 - root - INFO - Step 20130: lr=1.00E-05, loss= 1.2500 (max= 1.6617), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:14,020 - root - INFO - Step 20130: lr=1.00E-05, loss= 1.2500 (max= 1.6617), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:14,020 - root - INFO - Step 20130: lr=1.00E-05, loss= 1.2500 (max= 1.6617), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:14,020 - root - INFO - Step 20130: lr=1.00E-05, loss= 1.2500 (max= 1.6617), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:14,020 - root - INFO - Step 20130: lr=1.00E-05, loss= 1.2500 (max= 1.6617), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:29,962 - root - INFO - Step 20140: lr=1.00E-05, loss= 1.2297 (max= 1.5803), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:29,962 - root - INFO - Step 20140: lr=1.00E-05, loss= 1.2297 (max= 1.5803), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:29,962 - root - INFO - Step 
20140: lr=1.00E-05, loss= 1.2297 (max= 1.5803), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:29,962 - root - INFO - Step 20140: lr=1.00E-05, loss= 1.2297 (max= 1.5803), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:29,962 - root - INFO - Step 20140: lr=1.00E-05, loss= 1.2297 (max= 1.5803), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:29,962 - root - INFO - Step 20140: lr=1.00E-05, loss= 1.2297 (max= 1.5803), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:29,962 - root - INFO - Step 20140: lr=1.00E-05, loss= 1.2297 (max= 1.5803), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:29,963 - root - INFO - Step 20140: lr=1.00E-05, loss= 1.2297 (max= 1.5803), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:45,871 - root - INFO - Step 20150: lr=1.00E-05, loss= 1.2669 (max= 1.6547), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:45,871 - root - INFO - Step 20150: lr=1.00E-05, loss= 1.2669 (max= 1.6547), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:45,871 - root - INFO - Step 20150: lr=1.00E-05, loss= 1.2669 (max= 1.6547), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:45,871 - root - INFO - Step 20150: lr=1.00E-05, loss= 1.2669 (max= 1.6547), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:45,871 - root - INFO - Step 20150: lr=1.00E-05, loss= 1.2669 (max= 1.6547), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 19:28:45,871 - root - INFO - Step 20150: lr=1.00E-05, loss= 1.2669 (max= 1.6547), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:45,871 - root - INFO - Step 20150: lr=1.00E-05, loss= 1.2669 (max= 1.6547), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:28:45,871 - root - INFO - Step 20150: lr=1.00E-05, loss= 1.2669 (max= 1.6547), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:01,829 - root - INFO - Step 20160: lr=1.00E-05, loss= 1.2053 (max= 1.7402), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:29:01,829 - root - INFO - Step 20160: lr=1.00E-05, loss= 1.2053 (max= 1.7402), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:29:01,829 - root - INFO - Step 20160: lr=1.00E-05, loss= 1.2053 (max= 1.7402), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:29:01,829 - root - INFO - Step 20160: lr=1.00E-05, loss= 1.2053 (max= 1.7402), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:29:01,829 - root - INFO - Step 20160: lr=1.00E-05, loss= 1.2053 (max= 1.7402), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:29:01,829 - root - INFO - Step 20160: lr=1.00E-05, loss= 1.2053 (max= 1.7402), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:29:01,829 - root - INFO - Step 20160: lr=1.00E-05, loss= 1.2053 (max= 1.7402), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:29:01,829 - root - INFO - Step 20160: lr=1.00E-05, loss= 1.2053 (max= 1.7402), tps=20538, mfu=42.79%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:29:17,770 - root - INFO - Step 20170: lr=1.00E-05, loss= 1.2575 (max= 1.6467), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:17,770 - root - INFO - Step 20170: lr=1.00E-05, loss= 1.2575 (max= 1.6467), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:17,770 - root - INFO - Step 20170: lr=1.00E-05, loss= 1.2575 (max= 1.6467), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:17,770 - root - INFO - Step 20170: lr=1.00E-05, loss= 1.2575 (max= 1.6467), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:17,770 - root - INFO - Step 20170: lr=1.00E-05, loss= 1.2575 (max= 1.6467), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:17,770 - root - INFO - Step 20170: lr=1.00E-05, loss= 1.2575 (max= 1.6467), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:17,770 - root - INFO - Step 20170: lr=1.00E-05, loss= 1.2575 (max= 1.6467), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:17,770 - root - INFO - Step 20170: lr=1.00E-05, loss= 1.2575 (max= 1.6467), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:33,696 - root - INFO - Step 20180: lr=1.00E-05, loss= 1.2275 (max= 1.5073), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:33,696 - root - INFO - Step 20180: lr=1.00E-05, loss= 1.2275 (max= 1.5073), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:33,696 - root - INFO - Step 20180: lr=1.00E-05, loss= 1.2275 
(max= 1.5073), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:33,696 - root - INFO - Step 20180: lr=1.00E-05, loss= 1.2275 (max= 1.5073), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:33,696 - root - INFO - Step 20180: lr=1.00E-05, loss= 1.2275 (max= 1.5073), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:33,696 - root - INFO - Step 20180: lr=1.00E-05, loss= 1.2275 (max= 1.5073), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:33,696 - root - INFO - Step 20180: lr=1.00E-05, loss= 1.2275 (max= 1.5073), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:33,696 - root - INFO - Step 20180: lr=1.00E-05, loss= 1.2275 (max= 1.5073), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:49,619 - root - INFO - Step 20190: lr=1.00E-05, loss= 1.2463 (max= 1.6133), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:49,619 - root - INFO - Step 20190: lr=1.00E-05, loss= 1.2463 (max= 1.6133), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:49,619 - root - INFO - Step 20190: lr=1.00E-05, loss= 1.2463 (max= 1.6133), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:49,619 - root - INFO - Step 20190: lr=1.00E-05, loss= 1.2463 (max= 1.6133), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:49,619 - root - INFO - Step 20190: lr=1.00E-05, loss= 1.2463 (max= 1.6133), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:49,619 - root 
- INFO - Step 20190: lr=1.00E-05, loss= 1.2463 (max= 1.6133), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:49,619 - root - INFO - Step 20190: lr=1.00E-05, loss= 1.2463 (max= 1.6133), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:29:49,619 - root - INFO - Step 20190: lr=1.00E-05, loss= 1.2463 (max= 1.6133), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:05,590 - root - INFO - Step 20200: lr=1.00E-05, loss= 1.2190 (max= 1.7148), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:30:05,590 - root - INFO - Step 20200: lr=1.00E-05, loss= 1.2190 (max= 1.7148), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:30:05,590 - root - INFO - Step 20200: lr=1.00E-05, loss= 1.2190 (max= 1.7148), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:30:05,590 - root - INFO - Step 20200: lr=1.00E-05, loss= 1.2190 (max= 1.7148), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:30:05,590 - root - INFO - Step 20200: lr=1.00E-05, loss= 1.2190 (max= 1.7148), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:30:05,590 - root - INFO - Step 20200: lr=1.00E-05, loss= 1.2190 (max= 1.7148), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:30:05,590 - root - INFO - Step 20200: lr=1.00E-05, loss= 1.2190 (max= 1.7148), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:30:05,590 - root - INFO - Step 20200: lr=1.00E-05, loss= 1.2190 (max= 1.7148), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 19:30:21,514 - root - INFO - Step 20210: lr=1.00E-05, loss= 1.2084 (max= 1.6223), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:21,514 - root - INFO - Step 20210: lr=1.00E-05, loss= 1.2084 (max= 1.6223), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:21,514 - root - INFO - Step 20210: lr=1.00E-05, loss= 1.2084 (max= 1.6223), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:21,514 - root - INFO - Step 20210: lr=1.00E-05, loss= 1.2084 (max= 1.6223), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:21,514 - root - INFO - Step 20210: lr=1.00E-05, loss= 1.2084 (max= 1.6223), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:21,514 - root - INFO - Step 20210: lr=1.00E-05, loss= 1.2084 (max= 1.6223), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:21,514 - root - INFO - Step 20210: lr=1.00E-05, loss= 1.2084 (max= 1.6223), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:21,514 - root - INFO - Step 20210: lr=1.00E-05, loss= 1.2084 (max= 1.6223), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:37,493 - root - INFO - Step 20220: lr=1.00E-05, loss= 1.1916 (max= 1.5726), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:37,494 - root - INFO - Step 20220: lr=1.00E-05, loss= 1.1916 (max= 1.5726), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:37,494 - root - INFO - Step 20220: lr=1.00E-05, loss= 1.1916 (max= 1.5726), tps=20510, mfu=42.73%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:37,494 - root - INFO - Step 20220: lr=1.00E-05, loss= 1.1916 (max= 1.5726), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:37,494 - root - INFO - Step 20220: lr=1.00E-05, loss= 1.1916 (max= 1.5726), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:37,494 - root - INFO - Step 20220: lr=1.00E-05, loss= 1.1916 (max= 1.5726), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:37,494 - root - INFO - Step 20220: lr=1.00E-05, loss= 1.1916 (max= 1.5726), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:37,494 - root - INFO - Step 20220: lr=1.00E-05, loss= 1.1916 (max= 1.5726), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:53,414 - root - INFO - Step 20230: lr=1.00E-05, loss= 1.2448 (max= 1.6103), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:53,414 - root - INFO - Step 20230: lr=1.00E-05, loss= 1.2448 (max= 1.6103), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:53,414 - root - INFO - Step 20230: lr=1.00E-05, loss= 1.2448 (max= 1.6103), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:53,414 - root - INFO - Step 20230: lr=1.00E-05, loss= 1.2448 (max= 1.6103), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:53,414 - root - INFO - Step 20230: lr=1.00E-05, loss= 1.2448 (max= 1.6103), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:53,414 - root - INFO - Step 20230: lr=1.00E-05, 
loss= 1.2448 (max= 1.6103), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:53,414 - root - INFO - Step 20230: lr=1.00E-05, loss= 1.2448 (max= 1.6103), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:30:53,414 - root - INFO - Step 20230: lr=1.00E-05, loss= 1.2448 (max= 1.6103), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:31:09,333 - root - INFO - Step 20240: lr=1.00E-05, loss= 1.2217 (max= 1.6236), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:09,333 - root - INFO - Step 20240: lr=1.00E-05, loss= 1.2217 (max= 1.6236), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:09,334 - root - INFO - Step 20240: lr=1.00E-05, loss= 1.2217 (max= 1.6236), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:09,334 - root - INFO - Step 20240: lr=1.00E-05, loss= 1.2217 (max= 1.6236), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:09,334 - root - INFO - Step 20240: lr=1.00E-05, loss= 1.2217 (max= 1.6236), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:09,334 - root - INFO - Step 20240: lr=1.00E-05, loss= 1.2217 (max= 1.6236), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:09,334 - root - INFO - Step 20240: lr=1.00E-05, loss= 1.2217 (max= 1.6236), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:09,334 - root - INFO - Step 20240: lr=1.00E-05, loss= 1.2217 (max= 1.6236), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
19:31:25,291 - root - INFO - Step 20250: lr=1.00E-05, loss= 1.2134 (max= 1.5875), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:31:25,292 - root - INFO - Step 20250: lr=1.00E-05, loss= 1.2134 (max= 1.5875), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:31:25,292 - root - INFO - Step 20250: lr=1.00E-05, loss= 1.2134 (max= 1.5875), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:31:25,292 - root - INFO - Step 20250: lr=1.00E-05, loss= 1.2134 (max= 1.5875), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:31:25,292 - root - INFO - Step 20250: lr=1.00E-05, loss= 1.2134 (max= 1.5875), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:31:25,292 - root - INFO - Step 20250: lr=1.00E-05, loss= 1.2134 (max= 1.5875), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:31:25,292 - root - INFO - Step 20250: lr=1.00E-05, loss= 1.2134 (max= 1.5875), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:31:25,292 - root - INFO - Step 20250: lr=1.00E-05, loss= 1.2134 (max= 1.5875), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:31:41,243 - root - INFO - Step 20260: lr=1.00E-05, loss= 1.1899 (max= 1.4530), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:41,243 - root - INFO - Step 20260: lr=1.00E-05, loss= 1.1899 (max= 1.4530), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:41,243 - root - INFO - Step 20260: lr=1.00E-05, loss= 1.1899 (max= 1.4530), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:41,243 - root - INFO - Step 20260: lr=1.00E-05, loss= 1.1899 (max= 1.4530), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:41,243 - root - INFO - Step 20260: lr=1.00E-05, loss= 1.1899 (max= 1.4530), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:41,243 - root - INFO - Step 20260: lr=1.00E-05, loss= 1.1899 (max= 1.4530), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:41,243 - root - INFO - Step 20260: lr=1.00E-05, loss= 1.1899 (max= 1.4530), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:41,243 - root - INFO - Step 20260: lr=1.00E-05, loss= 1.1899 (max= 1.4530), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:57,184 - root - INFO - Step 20270: lr=1.00E-05, loss= 1.2301 (max= 1.7482), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:57,184 - root - INFO - Step 20270: lr=1.00E-05, loss= 1.2301 (max= 1.7482), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:57,184 - root - INFO - Step 20270: lr=1.00E-05, loss= 1.2301 (max= 1.7482), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:57,184 - root - INFO - Step 20270: lr=1.00E-05, loss= 1.2301 (max= 1.7482), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:57,184 - root - INFO - Step 20270: lr=1.00E-05, loss= 1.2301 (max= 1.7482), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:57,184 - root - INFO - Step 20270: lr=1.00E-05, loss= 1.2301 (max= 1.7482), 
tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:57,184 - root - INFO - Step 20270: lr=1.00E-05, loss= 1.2301 (max= 1.7482), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:31:57,184 - root - INFO - Step 20270: lr=1.00E-05, loss= 1.2301 (max= 1.7482), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:13,133 - root - INFO - Step 20280: lr=1.00E-05, loss= 1.2123 (max= 1.6239), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:32:13,133 - root - INFO - Step 20280: lr=1.00E-05, loss= 1.2123 (max= 1.6239), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:32:13,133 - root - INFO - Step 20280: lr=1.00E-05, loss= 1.2123 (max= 1.6239), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:32:13,133 - root - INFO - Step 20280: lr=1.00E-05, loss= 1.2123 (max= 1.6239), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:32:13,133 - root - INFO - Step 20280: lr=1.00E-05, loss= 1.2123 (max= 1.6239), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:32:13,133 - root - INFO - Step 20280: lr=1.00E-05, loss= 1.2123 (max= 1.6239), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:32:13,133 - root - INFO - Step 20280: lr=1.00E-05, loss= 1.2123 (max= 1.6239), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:32:13,133 - root - INFO - Step 20280: lr=1.00E-05, loss= 1.2123 (max= 1.6239), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:32:29,127 - root - INFO - Step 
20290: lr=1.00E-05, loss= 1.1994 (max= 1.5828), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:29,127 - root - INFO - Step 20290: lr=1.00E-05, loss= 1.1994 (max= 1.5828), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:29,127 - root - INFO - Step 20290: lr=1.00E-05, loss= 1.1994 (max= 1.5828), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:29,127 - root - INFO - Step 20290: lr=1.00E-05, loss= 1.1994 (max= 1.5828), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:29,127 - root - INFO - Step 20290: lr=1.00E-05, loss= 1.1994 (max= 1.5828), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:29,128 - root - INFO - Step 20290: lr=1.00E-05, loss= 1.1994 (max= 1.5828), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:29,128 - root - INFO - Step 20290: lr=1.00E-05, loss= 1.1994 (max= 1.5828), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:29,128 - root - INFO - Step 20290: lr=1.00E-05, loss= 1.1994 (max= 1.5828), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:45,059 - root - INFO - Step 20300: lr=1.00E-05, loss= 1.2375 (max= 1.5594), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:45,059 - root - INFO - Step 20300: lr=1.00E-05, loss= 1.2375 (max= 1.5594), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:45,059 - root - INFO - Step 20300: lr=1.00E-05, loss= 1.2375 (max= 1.5594), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 19:32:45,059 - root - INFO - Step 20300: lr=1.00E-05, loss= 1.2375 (max= 1.5594), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:45,059 - root - INFO - Step 20300: lr=1.00E-05, loss= 1.2375 (max= 1.5594), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:45,059 - root - INFO - Step 20300: lr=1.00E-05, loss= 1.2375 (max= 1.5594), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:45,059 - root - INFO - Step 20300: lr=1.00E-05, loss= 1.2375 (max= 1.5594), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:32:45,059 - root - INFO - Step 20300: lr=1.00E-05, loss= 1.2375 (max= 1.5594), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:00,942 - root - INFO - Step 20310: lr=1.00E-05, loss= 1.2246 (max= 1.6547), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:00,943 - root - INFO - Step 20310: lr=1.00E-05, loss= 1.2246 (max= 1.6547), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:00,943 - root - INFO - Step 20310: lr=1.00E-05, loss= 1.2246 (max= 1.6547), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:00,943 - root - INFO - Step 20310: lr=1.00E-05, loss= 1.2246 (max= 1.6547), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:00,943 - root - INFO - Step 20310: lr=1.00E-05, loss= 1.2246 (max= 1.6547), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:00,943 - root - INFO - Step 20310: lr=1.00E-05, loss= 1.2246 (max= 1.6547), tps=20635, mfu=42.99%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:00,943 - root - INFO - Step 20310: lr=1.00E-05, loss= 1.2246 (max= 1.6547), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:00,943 - root - INFO - Step 20310: lr=1.00E-05, loss= 1.2246 (max= 1.6547), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:16,875 - root - INFO - Step 20320: lr=1.00E-05, loss= 1.1788 (max= 1.5466), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:16,875 - root - INFO - Step 20320: lr=1.00E-05, loss= 1.1788 (max= 1.5466), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:16,875 - root - INFO - Step 20320: lr=1.00E-05, loss= 1.1788 (max= 1.5466), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:16,875 - root - INFO - Step 20320: lr=1.00E-05, loss= 1.1788 (max= 1.5466), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:16,875 - root - INFO - Step 20320: lr=1.00E-05, loss= 1.1788 (max= 1.5466), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:16,875 - root - INFO - Step 20320: lr=1.00E-05, loss= 1.1788 (max= 1.5466), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:16,875 - root - INFO - Step 20320: lr=1.00E-05, loss= 1.1788 (max= 1.5466), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:16,875 - root - INFO - Step 20320: lr=1.00E-05, loss= 1.1788 (max= 1.5466), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:32,842 - root - INFO - Step 20330: lr=1.00E-05, loss= 1.2029 
(max= 1.6246), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:32,842 - root - INFO - Step 20330: lr=1.00E-05, loss= 1.2029 (max= 1.6246), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:32,842 - root - INFO - Step 20330: lr=1.00E-05, loss= 1.2029 (max= 1.6246), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:32,842 - root - INFO - Step 20330: lr=1.00E-05, loss= 1.2029 (max= 1.6246), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:32,842 - root - INFO - Step 20330: lr=1.00E-05, loss= 1.2029 (max= 1.6246), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:32,842 - root - INFO - Step 20330: lr=1.00E-05, loss= 1.2029 (max= 1.6246), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:32,842 - root - INFO - Step 20330: lr=1.00E-05, loss= 1.2029 (max= 1.6246), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:32,842 - root - INFO - Step 20330: lr=1.00E-05, loss= 1.2029 (max= 1.6246), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:33:48,804 - root - INFO - Step 20340: lr=1.00E-05, loss= 1.2492 (max= 1.7747), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:48,804 - root - INFO - Step 20340: lr=1.00E-05, loss= 1.2492 (max= 1.7747), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:48,804 - root - INFO - Step 20340: lr=1.00E-05, loss= 1.2492 (max= 1.7747), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:48,804 - root 
- INFO - Step 20340: lr=1.00E-05, loss= 1.2492 (max= 1.7747), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:48,804 - root - INFO - Step 20340: lr=1.00E-05, loss= 1.2492 (max= 1.7747), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:48,804 - root - INFO - Step 20340: lr=1.00E-05, loss= 1.2492 (max= 1.7747), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:48,804 - root - INFO - Step 20340: lr=1.00E-05, loss= 1.2492 (max= 1.7747), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:33:48,804 - root - INFO - Step 20340: lr=1.00E-05, loss= 1.2492 (max= 1.7747), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:04,718 - root - INFO - Step 20350: lr=1.00E-05, loss= 1.2456 (max= 1.6389), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:04,718 - root - INFO - Step 20350: lr=1.00E-05, loss= 1.2456 (max= 1.6389), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:04,718 - root - INFO - Step 20350: lr=1.00E-05, loss= 1.2456 (max= 1.6389), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:04,718 - root - INFO - Step 20350: lr=1.00E-05, loss= 1.2456 (max= 1.6389), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:04,718 - root - INFO - Step 20350: lr=1.00E-05, loss= 1.2456 (max= 1.6389), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:04,718 - root - INFO - Step 20350: lr=1.00E-05, loss= 1.2456 (max= 1.6389), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 19:34:04,719 - root - INFO - Step 20350: lr=1.00E-05, loss= 1.2456 (max= 1.6389), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:04,719 - root - INFO - Step 20350: lr=1.00E-05, loss= 1.2456 (max= 1.6389), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:20,707 - root - INFO - Step 20360: lr=1.00E-05, loss= 1.2198 (max= 1.5700), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:20,707 - root - INFO - Step 20360: lr=1.00E-05, loss= 1.2198 (max= 1.5700), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:20,707 - root - INFO - Step 20360: lr=1.00E-05, loss= 1.2198 (max= 1.5700), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:20,707 - root - INFO - Step 20360: lr=1.00E-05, loss= 1.2198 (max= 1.5700), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:20,707 - root - INFO - Step 20360: lr=1.00E-05, loss= 1.2198 (max= 1.5700), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:20,707 - root - INFO - Step 20360: lr=1.00E-05, loss= 1.2198 (max= 1.5700), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:20,707 - root - INFO - Step 20360: lr=1.00E-05, loss= 1.2198 (max= 1.5700), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:20,707 - root - INFO - Step 20360: lr=1.00E-05, loss= 1.2198 (max= 1.5700), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:36,683 - root - INFO - Step 20370: lr=1.00E-05, loss= 1.2086 (max= 1.6807), tps=20516, mfu=42.75%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:36,683 - root - INFO - Step 20370: lr=1.00E-05, loss= 1.2086 (max= 1.6807), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:36,683 - root - INFO - Step 20370: lr=1.00E-05, loss= 1.2086 (max= 1.6807), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:36,683 - root - INFO - Step 20370: lr=1.00E-05, loss= 1.2086 (max= 1.6807), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:36,683 - root - INFO - Step 20370: lr=1.00E-05, loss= 1.2086 (max= 1.6807), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:36,683 - root - INFO - Step 20370: lr=1.00E-05, loss= 1.2086 (max= 1.6807), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:36,683 - root - INFO - Step 20370: lr=1.00E-05, loss= 1.2086 (max= 1.6807), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:36,683 - root - INFO - Step 20370: lr=1.00E-05, loss= 1.2086 (max= 1.6807), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:34:52,679 - root - INFO - Step 20380: lr=1.00E-05, loss= 1.2300 (max= 1.5421), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:52,679 - root - INFO - Step 20380: lr=1.00E-05, loss= 1.2300 (max= 1.5421), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:52,679 - root - INFO - Step 20380: lr=1.00E-05, loss= 1.2300 (max= 1.5421), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:52,679 - root - INFO - Step 20380: lr=1.00E-05, 
loss= 1.2300 (max= 1.5421), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:52,679 - root - INFO - Step 20380: lr=1.00E-05, loss= 1.2300 (max= 1.5421), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:52,679 - root - INFO - Step 20380: lr=1.00E-05, loss= 1.2300 (max= 1.5421), tps=20491, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:52,679 - root - INFO - Step 20380: lr=1.00E-05, loss= 1.2300 (max= 1.5421), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:34:52,679 - root - INFO - Step 20380: lr=1.00E-05, loss= 1.2300 (max= 1.5421), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:08,593 - root - INFO - Step 20390: lr=1.00E-05, loss= 1.2109 (max= 1.5643), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:08,593 - root - INFO - Step 20390: lr=1.00E-05, loss= 1.2109 (max= 1.5643), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:08,593 - root - INFO - Step 20390: lr=1.00E-05, loss= 1.2109 (max= 1.5643), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:08,593 - root - INFO - Step 20390: lr=1.00E-05, loss= 1.2109 (max= 1.5643), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:08,593 - root - INFO - Step 20390: lr=1.00E-05, loss= 1.2109 (max= 1.5643), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:08,593 - root - INFO - Step 20390: lr=1.00E-05, loss= 1.2109 (max= 1.5643), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
19:35:08,593 - root - INFO - Step 20390: lr=1.00E-05, loss= 1.2109 (max= 1.5643), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:08,593 - root - INFO - Step 20390: lr=1.00E-05, loss= 1.2109 (max= 1.5643), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:24,552 - root - INFO - Step 20400: lr=1.00E-05, loss= 1.2001 (max= 1.5338), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:24,552 - root - INFO - Step 20400: lr=1.00E-05, loss= 1.2001 (max= 1.5338), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:24,552 - root - INFO - Step 20400: lr=1.00E-05, loss= 1.2001 (max= 1.5338), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:24,552 - root - INFO - Step 20400: lr=1.00E-05, loss= 1.2001 (max= 1.5338), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:24,553 - root - INFO - Step 20400: lr=1.00E-05, loss= 1.2001 (max= 1.5338), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:24,553 - root - INFO - Step 20400: lr=1.00E-05, loss= 1.2001 (max= 1.5338), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:24,553 - root - INFO - Step 20400: lr=1.00E-05, loss= 1.2001 (max= 1.5338), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:24,553 - root - INFO - Step 20400: lr=1.00E-05, loss= 1.2001 (max= 1.5338), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:40,477 - root - INFO - Step 20410: lr=1.00E-05, loss= 1.1755 (max= 1.5966), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:40,477 - root - INFO - Step 20410: lr=1.00E-05, loss= 1.1755 (max= 1.5966), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:40,478 - root - INFO - Step 20410: lr=1.00E-05, loss= 1.1755 (max= 1.5966), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:40,478 - root - INFO - Step 20410: lr=1.00E-05, loss= 1.1755 (max= 1.5966), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:40,478 - root - INFO - Step 20410: lr=1.00E-05, loss= 1.1755 (max= 1.5966), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:40,478 - root - INFO - Step 20410: lr=1.00E-05, loss= 1.1755 (max= 1.5966), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:40,478 - root - INFO - Step 20410: lr=1.00E-05, loss= 1.1755 (max= 1.5966), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:40,478 - root - INFO - Step 20410: lr=1.00E-05, loss= 1.1755 (max= 1.5966), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:35:56,464 - root - INFO - Step 20420: lr=1.00E-05, loss= 1.2200 (max= 1.6397), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:56,464 - root - INFO - Step 20420: lr=1.00E-05, loss= 1.2200 (max= 1.6397), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:56,464 - root - INFO - Step 20420: lr=1.00E-05, loss= 1.2200 (max= 1.6397), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:56,464 - root - INFO - Step 20420: lr=1.00E-05, loss= 1.2200 (max= 1.6397), 
tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:56,464 - root - INFO - Step 20420: lr=1.00E-05, loss= 1.2200 (max= 1.6397), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:56,464 - root - INFO - Step 20420: lr=1.00E-05, loss= 1.2200 (max= 1.6397), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:56,464 - root - INFO - Step 20420: lr=1.00E-05, loss= 1.2200 (max= 1.6397), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:35:56,464 - root - INFO - Step 20420: lr=1.00E-05, loss= 1.2200 (max= 1.6397), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:12,415 - root - INFO - Step 20430: lr=1.00E-05, loss= 1.2529 (max= 1.7663), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:12,415 - root - INFO - Step 20430: lr=1.00E-05, loss= 1.2529 (max= 1.7663), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:12,415 - root - INFO - Step 20430: lr=1.00E-05, loss= 1.2529 (max= 1.7663), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:12,415 - root - INFO - Step 20430: lr=1.00E-05, loss= 1.2529 (max= 1.7663), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:12,415 - root - INFO - Step 20430: lr=1.00E-05, loss= 1.2529 (max= 1.7663), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:12,415 - root - INFO - Step 20430: lr=1.00E-05, loss= 1.2529 (max= 1.7663), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:12,415 - root - INFO - Step 
20430: lr=1.00E-05, loss= 1.2529 (max= 1.7663), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:12,415 - root - INFO - Step 20430: lr=1.00E-05, loss= 1.2529 (max= 1.7663), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:28,433 - root - INFO - Step 20440: lr=1.00E-05, loss= 1.2403 (max= 1.6430), tps=20461, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:28,433 - root - INFO - Step 20440: lr=1.00E-05, loss= 1.2403 (max= 1.6430), tps=20461, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:28,433 - root - INFO - Step 20440: lr=1.00E-05, loss= 1.2403 (max= 1.6430), tps=20461, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:28,433 - root - INFO - Step 20440: lr=1.00E-05, loss= 1.2403 (max= 1.6430), tps=20461, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:28,433 - root - INFO - Step 20440: lr=1.00E-05, loss= 1.2403 (max= 1.6430), tps=20461, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:28,433 - root - INFO - Step 20440: lr=1.00E-05, loss= 1.2403 (max= 1.6430), tps=20462, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:28,433 - root - INFO - Step 20440: lr=1.00E-05, loss= 1.2403 (max= 1.6430), tps=20461, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:28,433 - root - INFO - Step 20440: lr=1.00E-05, loss= 1.2403 (max= 1.6430), tps=20462, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:36:44,401 - root - INFO - Step 20450: lr=1.00E-05, loss= 1.2231 (max= 1.7169), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 19:36:44,401 - root - INFO - Step 20450: lr=1.00E-05, loss= 1.2231 (max= 1.7169), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:36:44,401 - root - INFO - Step 20450: lr=1.00E-05, loss= 1.2231 (max= 1.7169), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:36:44,401 - root - INFO - Step 20450: lr=1.00E-05, loss= 1.2231 (max= 1.7169), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:36:44,401 - root - INFO - Step 20450: lr=1.00E-05, loss= 1.2231 (max= 1.7169), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:36:44,401 - root - INFO - Step 20450: lr=1.00E-05, loss= 1.2231 (max= 1.7169), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:36:44,401 - root - INFO - Step 20450: lr=1.00E-05, loss= 1.2231 (max= 1.7169), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:36:44,401 - root - INFO - Step 20450: lr=1.00E-05, loss= 1.2231 (max= 1.7169), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:00,292 - root - INFO - Step 20460: lr=1.00E-05, loss= 1.2015 (max= 1.5491), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:00,292 - root - INFO - Step 20460: lr=1.00E-05, loss= 1.2015 (max= 1.5491), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:00,292 - root - INFO - Step 20460: lr=1.00E-05, loss= 1.2015 (max= 1.5491), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:00,292 - root - INFO - Step 20460: lr=1.00E-05, loss= 1.2015 (max= 1.5491), tps=20625, mfu=42.97%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:00,292 - root - INFO - Step 20460: lr=1.00E-05, loss= 1.2015 (max= 1.5491), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:00,292 - root - INFO - Step 20460: lr=1.00E-05, loss= 1.2015 (max= 1.5491), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:00,292 - root - INFO - Step 20460: lr=1.00E-05, loss= 1.2015 (max= 1.5491), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:00,292 - root - INFO - Step 20460: lr=1.00E-05, loss= 1.2015 (max= 1.5491), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:16,273 - root - INFO - Step 20470: lr=1.00E-05, loss= 1.2147 (max= 1.6815), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:16,273 - root - INFO - Step 20470: lr=1.00E-05, loss= 1.2147 (max= 1.6815), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:16,273 - root - INFO - Step 20470: lr=1.00E-05, loss= 1.2147 (max= 1.6815), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:16,273 - root - INFO - Step 20470: lr=1.00E-05, loss= 1.2147 (max= 1.6815), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:16,273 - root - INFO - Step 20470: lr=1.00E-05, loss= 1.2147 (max= 1.6815), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:16,273 - root - INFO - Step 20470: lr=1.00E-05, loss= 1.2147 (max= 1.6815), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:16,273 - root - INFO - Step 20470: lr=1.00E-05, loss= 1.2147 
(max= 1.6815), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:16,273 - root - INFO - Step 20470: lr=1.00E-05, loss= 1.2147 (max= 1.6815), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:32,226 - root - INFO - Step 20480: lr=1.00E-05, loss= 1.2166 (max= 1.6947), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:32,226 - root - INFO - Step 20480: lr=1.00E-05, loss= 1.2166 (max= 1.6947), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:32,226 - root - INFO - Step 20480: lr=1.00E-05, loss= 1.2166 (max= 1.6947), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:32,226 - root - INFO - Step 20480: lr=1.00E-05, loss= 1.2166 (max= 1.6947), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:32,226 - root - INFO - Step 20480: lr=1.00E-05, loss= 1.2166 (max= 1.6947), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:32,226 - root - INFO - Step 20480: lr=1.00E-05, loss= 1.2166 (max= 1.6947), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:32,226 - root - INFO - Step 20480: lr=1.00E-05, loss= 1.2166 (max= 1.6947), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:32,226 - root - INFO - Step 20480: lr=1.00E-05, loss= 1.2166 (max= 1.6947), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:37:48,156 - root - INFO - Step 20490: lr=1.00E-05, loss= 1.2054 (max= 1.6387), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:48,156 - root 
- INFO - Step 20490: lr=1.00E-05, loss= 1.2054 (max= 1.6387), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:48,156 - root - INFO - Step 20490: lr=1.00E-05, loss= 1.2054 (max= 1.6387), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:48,156 - root - INFO - Step 20490: lr=1.00E-05, loss= 1.2054 (max= 1.6387), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:48,156 - root - INFO - Step 20490: lr=1.00E-05, loss= 1.2054 (max= 1.6387), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:48,156 - root - INFO - Step 20490: lr=1.00E-05, loss= 1.2054 (max= 1.6387), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:48,156 - root - INFO - Step 20490: lr=1.00E-05, loss= 1.2054 (max= 1.6387), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:37:48,156 - root - INFO - Step 20490: lr=1.00E-05, loss= 1.2054 (max= 1.6387), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:38:04,106 - root - INFO - Step 20500: lr=1.00E-05, loss= 1.1966 (max= 1.5442), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:04,106 - root - INFO - Step 20500: lr=1.00E-05, loss= 1.1966 (max= 1.5442), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:04,107 - root - INFO - Step 20500: lr=1.00E-05, loss= 1.1966 (max= 1.5442), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:04,107 - root - INFO - Step 20500: lr=1.00E-05, loss= 1.1966 (max= 1.5442), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 19:38:04,107 - root - INFO - Step 20500: lr=1.00E-05, loss= 1.1966 (max= 1.5442), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:04,107 - root - INFO - Step 20500: lr=1.00E-05, loss= 1.1966 (max= 1.5442), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:04,107 - root - INFO - Step 20500: lr=1.00E-05, loss= 1.1966 (max= 1.5442), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:04,107 - root - INFO - Step 20500: lr=1.00E-05, loss= 1.1966 (max= 1.5442), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:20,053 - root - INFO - Step 20510: lr=1.00E-05, loss= 1.2138 (max= 1.7447), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:20,053 - root - INFO - Step 20510: lr=1.00E-05, loss= 1.2138 (max= 1.7447), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:20,053 - root - INFO - Step 20510: lr=1.00E-05, loss= 1.2138 (max= 1.7447), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:20,053 - root - INFO - Step 20510: lr=1.00E-05, loss= 1.2138 (max= 1.7447), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:20,053 - root - INFO - Step 20510: lr=1.00E-05, loss= 1.2138 (max= 1.7447), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:20,053 - root - INFO - Step 20510: lr=1.00E-05, loss= 1.2138 (max= 1.7447), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:20,053 - root - INFO - Step 20510: lr=1.00E-05, loss= 1.2138 (max= 1.7447), tps=20553, mfu=42.82%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:20,053 - root - INFO - Step 20510: lr=1.00E-05, loss= 1.2138 (max= 1.7447), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:35,980 - root - INFO - Step 20520: lr=1.00E-05, loss= 1.2225 (max= 1.7438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:38:35,980 - root - INFO - Step 20520: lr=1.00E-05, loss= 1.2225 (max= 1.7438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:38:35,981 - root - INFO - Step 20520: lr=1.00E-05, loss= 1.2225 (max= 1.7438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:38:35,981 - root - INFO - Step 20520: lr=1.00E-05, loss= 1.2225 (max= 1.7438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:38:35,981 - root - INFO - Step 20520: lr=1.00E-05, loss= 1.2225 (max= 1.7438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:38:35,981 - root - INFO - Step 20520: lr=1.00E-05, loss= 1.2225 (max= 1.7438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:38:35,981 - root - INFO - Step 20520: lr=1.00E-05, loss= 1.2225 (max= 1.7438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:38:35,981 - root - INFO - Step 20520: lr=1.00E-05, loss= 1.2225 (max= 1.7438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:38:51,894 - root - INFO - Step 20530: lr=1.00E-05, loss= 1.2053 (max= 1.6131), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:51,894 - root - INFO - Step 20530: lr=1.00E-05, 
loss= 1.2053 (max= 1.6131), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:51,894 - root - INFO - Step 20530: lr=1.00E-05, loss= 1.2053 (max= 1.6131), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:51,894 - root - INFO - Step 20530: lr=1.00E-05, loss= 1.2053 (max= 1.6131), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:51,895 - root - INFO - Step 20530: lr=1.00E-05, loss= 1.2053 (max= 1.6131), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:51,895 - root - INFO - Step 20530: lr=1.00E-05, loss= 1.2053 (max= 1.6131), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:51,895 - root - INFO - Step 20530: lr=1.00E-05, loss= 1.2053 (max= 1.6131), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:38:51,895 - root - INFO - Step 20530: lr=1.00E-05, loss= 1.2053 (max= 1.6131), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:07,816 - root - INFO - Step 20540: lr=1.00E-05, loss= 1.1838 (max= 1.5431), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:07,816 - root - INFO - Step 20540: lr=1.00E-05, loss= 1.1838 (max= 1.5431), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:07,816 - root - INFO - Step 20540: lr=1.00E-05, loss= 1.1838 (max= 1.5431), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:07,816 - root - INFO - Step 20540: lr=1.00E-05, loss= 1.1838 (max= 1.5431), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
19:39:07,816 - root - INFO - Step 20540: lr=1.00E-05, loss= 1.1838 (max= 1.5431), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:07,816 - root - INFO - Step 20540: lr=1.00E-05, loss= 1.1838 (max= 1.5431), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:07,817 - root - INFO - Step 20540: lr=1.00E-05, loss= 1.1838 (max= 1.5431), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:07,817 - root - INFO - Step 20540: lr=1.00E-05, loss= 1.1838 (max= 1.5431), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:23,745 - root - INFO - Step 20550: lr=1.00E-05, loss= 1.1989 (max= 1.7825), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:23,745 - root - INFO - Step 20550: lr=1.00E-05, loss= 1.1989 (max= 1.7825), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:23,745 - root - INFO - Step 20550: lr=1.00E-05, loss= 1.1989 (max= 1.7825), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:23,745 - root - INFO - Step 20550: lr=1.00E-05, loss= 1.1989 (max= 1.7825), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:23,745 - root - INFO - Step 20550: lr=1.00E-05, loss= 1.1989 (max= 1.7825), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:23,746 - root - INFO - Step 20550: lr=1.00E-05, loss= 1.1989 (max= 1.7825), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:23,746 - root - INFO - Step 20550: lr=1.00E-05, loss= 1.1989 (max= 1.7825), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:23,746 - root - INFO - Step 20550: lr=1.00E-05, loss= 1.1989 (max= 1.7825), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:39,747 - root - INFO - Step 20560: lr=1.00E-05, loss= 1.1757 (max= 1.6399), tps=20482, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:39,748 - root - INFO - Step 20560: lr=1.00E-05, loss= 1.1757 (max= 1.6399), tps=20482, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:39,748 - root - INFO - Step 20560: lr=1.00E-05, loss= 1.1757 (max= 1.6399), tps=20482, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:39,748 - root - INFO - Step 20560: lr=1.00E-05, loss= 1.1757 (max= 1.6399), tps=20482, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:39,748 - root - INFO - Step 20560: lr=1.00E-05, loss= 1.1757 (max= 1.6399), tps=20482, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:39,748 - root - INFO - Step 20560: lr=1.00E-05, loss= 1.1757 (max= 1.6399), tps=20482, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:39,748 - root - INFO - Step 20560: lr=1.00E-05, loss= 1.1757 (max= 1.6399), tps=20482, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:39,748 - root - INFO - Step 20560: lr=1.00E-05, loss= 1.1757 (max= 1.6399), tps=20482, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:55,718 - root - INFO - Step 20570: lr=1.00E-05, loss= 1.2167 (max= 1.6504), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:55,718 - root - INFO - Step 20570: lr=1.00E-05, loss= 1.2167 (max= 1.6504), 
tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:55,718 - root - INFO - Step 20570: lr=1.00E-05, loss= 1.2167 (max= 1.6504), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:55,718 - root - INFO - Step 20570: lr=1.00E-05, loss= 1.2167 (max= 1.6504), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:55,718 - root - INFO - Step 20570: lr=1.00E-05, loss= 1.2167 (max= 1.6504), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:55,718 - root - INFO - Step 20570: lr=1.00E-05, loss= 1.2167 (max= 1.6504), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:55,718 - root - INFO - Step 20570: lr=1.00E-05, loss= 1.2167 (max= 1.6504), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:39:55,718 - root - INFO - Step 20570: lr=1.00E-05, loss= 1.2167 (max= 1.6504), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:11,630 - root - INFO - Step 20580: lr=1.00E-05, loss= 1.2035 (max= 1.5361), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:11,630 - root - INFO - Step 20580: lr=1.00E-05, loss= 1.2035 (max= 1.5361), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:11,630 - root - INFO - Step 20580: lr=1.00E-05, loss= 1.2035 (max= 1.5361), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:11,630 - root - INFO - Step 20580: lr=1.00E-05, loss= 1.2035 (max= 1.5361), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:11,630 - root - INFO - Step 
20580: lr=1.00E-05, loss= 1.2035 (max= 1.5361), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:11,630 - root - INFO - Step 20580: lr=1.00E-05, loss= 1.2035 (max= 1.5361), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:11,630 - root - INFO - Step 20580: lr=1.00E-05, loss= 1.2035 (max= 1.5361), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:11,630 - root - INFO - Step 20580: lr=1.00E-05, loss= 1.2035 (max= 1.5361), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:27,566 - root - INFO - Step 20590: lr=1.00E-05, loss= 1.1974 (max= 1.6133), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:27,566 - root - INFO - Step 20590: lr=1.00E-05, loss= 1.1974 (max= 1.6133), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:27,566 - root - INFO - Step 20590: lr=1.00E-05, loss= 1.1974 (max= 1.6133), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:27,566 - root - INFO - Step 20590: lr=1.00E-05, loss= 1.1974 (max= 1.6133), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:27,566 - root - INFO - Step 20590: lr=1.00E-05, loss= 1.1974 (max= 1.6133), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:27,566 - root - INFO - Step 20590: lr=1.00E-05, loss= 1.1974 (max= 1.6133), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:27,566 - root - INFO - Step 20590: lr=1.00E-05, loss= 1.1974 (max= 1.6133), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 19:40:27,566 - root - INFO - Step 20590: lr=1.00E-05, loss= 1.1974 (max= 1.6133), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:43,547 - root - INFO - Step 20600: lr=1.00E-05, loss= 1.2362 (max= 1.7424), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:43,547 - root - INFO - Step 20600: lr=1.00E-05, loss= 1.2362 (max= 1.7424), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:43,547 - root - INFO - Step 20600: lr=1.00E-05, loss= 1.2362 (max= 1.7424), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:43,547 - root - INFO - Step 20600: lr=1.00E-05, loss= 1.2362 (max= 1.7424), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:43,547 - root - INFO - Step 20600: lr=1.00E-05, loss= 1.2362 (max= 1.7424), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:43,547 - root - INFO - Step 20600: lr=1.00E-05, loss= 1.2362 (max= 1.7424), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:43,547 - root - INFO - Step 20600: lr=1.00E-05, loss= 1.2362 (max= 1.7424), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:43,547 - root - INFO - Step 20600: lr=1.00E-05, loss= 1.2362 (max= 1.7424), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:40:59,458 - root - INFO - Step 20610: lr=1.00E-05, loss= 1.2005 (max= 1.5660), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:59,458 - root - INFO - Step 20610: lr=1.00E-05, loss= 1.2005 (max= 1.5660), tps=20599, mfu=42.92%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:59,459 - root - INFO - Step 20610: lr=1.00E-05, loss= 1.2005 (max= 1.5660), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:59,459 - root - INFO - Step 20610: lr=1.00E-05, loss= 1.2005 (max= 1.5660), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:59,459 - root - INFO - Step 20610: lr=1.00E-05, loss= 1.2005 (max= 1.5660), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:59,459 - root - INFO - Step 20610: lr=1.00E-05, loss= 1.2005 (max= 1.5660), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:59,459 - root - INFO - Step 20610: lr=1.00E-05, loss= 1.2005 (max= 1.5660), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:40:59,459 - root - INFO - Step 20610: lr=1.00E-05, loss= 1.2005 (max= 1.5660), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:41:15,436 - root - INFO - Step 20620: lr=1.00E-05, loss= 1.2282 (max= 1.6210), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:41:15,436 - root - INFO - Step 20620: lr=1.00E-05, loss= 1.2282 (max= 1.6210), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:41:15,436 - root - INFO - Step 20620: lr=1.00E-05, loss= 1.2282 (max= 1.6210), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:41:15,436 - root - INFO - Step 20620: lr=1.00E-05, loss= 1.2282 (max= 1.6210), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:41:15,436 - root - INFO - Step 20620: lr=1.00E-05, loss= 1.2282 
(max= 1.6210), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:41:15,436 - root - INFO - Step 20620: lr=1.00E-05, loss= 1.2282 (max= 1.6210), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:41:15,436 - root - INFO - Step 20620: lr=1.00E-05, loss= 1.2282 (max= 1.6210), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:41:15,436 - root - INFO - Step 20620: lr=1.00E-05, loss= 1.2282 (max= 1.6210), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:41:31,457 - root - INFO - Step 20630: lr=1.00E-05, loss= 1.1974 (max= 1.7510), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:31,457 - root - INFO - Step 20630: lr=1.00E-05, loss= 1.1974 (max= 1.7510), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:31,457 - root - INFO - Step 20630: lr=1.00E-05, loss= 1.1974 (max= 1.7510), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:31,457 - root - INFO - Step 20630: lr=1.00E-05, loss= 1.1974 (max= 1.7510), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:31,457 - root - INFO - Step 20630: lr=1.00E-05, loss= 1.1974 (max= 1.7510), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:31,457 - root - INFO - Step 20630: lr=1.00E-05, loss= 1.1974 (max= 1.7510), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:31,457 - root - INFO - Step 20630: lr=1.00E-05, loss= 1.1974 (max= 1.7510), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:31,457 - root 
- INFO - Step 20630: lr=1.00E-05, loss= 1.1974 (max= 1.7510), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:47,366 - root - INFO - Step 20640: lr=1.00E-05, loss= 1.2079 (max= 1.5382), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:47,366 - root - INFO - Step 20640: lr=1.00E-05, loss= 1.2079 (max= 1.5382), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:47,366 - root - INFO - Step 20640: lr=1.00E-05, loss= 1.2079 (max= 1.5382), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:47,366 - root - INFO - Step 20640: lr=1.00E-05, loss= 1.2079 (max= 1.5382), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:47,366 - root - INFO - Step 20640: lr=1.00E-05, loss= 1.2079 (max= 1.5382), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:47,366 - root - INFO - Step 20640: lr=1.00E-05, loss= 1.2079 (max= 1.5382), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:47,366 - root - INFO - Step 20640: lr=1.00E-05, loss= 1.2079 (max= 1.5382), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:41:47,366 - root - INFO - Step 20640: lr=1.00E-05, loss= 1.2079 (max= 1.5382), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:03,305 - root - INFO - Step 20650: lr=1.00E-05, loss= 1.2000 (max= 1.6283), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:03,305 - root - INFO - Step 20650: lr=1.00E-05, loss= 1.2000 (max= 1.6283), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 19:42:03,305 - root - INFO - Step 20650: lr=1.00E-05, loss= 1.2000 (max= 1.6283), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:03,305 - root - INFO - Step 20650: lr=1.00E-05, loss= 1.2000 (max= 1.6283), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:03,305 - root - INFO - Step 20650: lr=1.00E-05, loss= 1.2000 (max= 1.6283), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:03,305 - root - INFO - Step 20650: lr=1.00E-05, loss= 1.2000 (max= 1.6283), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:03,305 - root - INFO - Step 20650: lr=1.00E-05, loss= 1.2000 (max= 1.6283), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:03,306 - root - INFO - Step 20650: lr=1.00E-05, loss= 1.2000 (max= 1.6283), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:19,252 - root - INFO - Step 20660: lr=1.00E-05, loss= 1.1640 (max= 1.4981), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:42:19,252 - root - INFO - Step 20660: lr=1.00E-05, loss= 1.1640 (max= 1.4981), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:42:19,252 - root - INFO - Step 20660: lr=1.00E-05, loss= 1.1640 (max= 1.4981), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:42:19,252 - root - INFO - Step 20660: lr=1.00E-05, loss= 1.1640 (max= 1.4981), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:42:19,252 - root - INFO - Step 20660: lr=1.00E-05, loss= 1.1640 (max= 1.4981), tps=20553, mfu=42.82%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:42:19,252 - root - INFO - Step 20660: lr=1.00E-05, loss= 1.1640 (max= 1.4981), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:42:19,252 - root - INFO - Step 20660: lr=1.00E-05, loss= 1.1640 (max= 1.4981), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:42:19,252 - root - INFO - Step 20660: lr=1.00E-05, loss= 1.1640 (max= 1.4981), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:42:35,170 - root - INFO - Step 20670: lr=1.00E-05, loss= 1.2083 (max= 1.7039), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:35,170 - root - INFO - Step 20670: lr=1.00E-05, loss= 1.2083 (max= 1.7039), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:35,170 - root - INFO - Step 20670: lr=1.00E-05, loss= 1.2083 (max= 1.7039), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:35,170 - root - INFO - Step 20670: lr=1.00E-05, loss= 1.2083 (max= 1.7039), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:35,170 - root - INFO - Step 20670: lr=1.00E-05, loss= 1.2083 (max= 1.7039), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:35,170 - root - INFO - Step 20670: lr=1.00E-05, loss= 1.2083 (max= 1.7039), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:35,170 - root - INFO - Step 20670: lr=1.00E-05, loss= 1.2083 (max= 1.7039), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:35,170 - root - INFO - Step 20670: lr=1.00E-05, 
loss= 1.2083 (max= 1.7039), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:51,099 - root - INFO - Step 20680: lr=1.00E-05, loss= 1.2305 (max= 1.5877), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:51,099 - root - INFO - Step 20680: lr=1.00E-05, loss= 1.2305 (max= 1.5877), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:51,099 - root - INFO - Step 20680: lr=1.00E-05, loss= 1.2305 (max= 1.5877), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:51,099 - root - INFO - Step 20680: lr=1.00E-05, loss= 1.2305 (max= 1.5877), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:51,099 - root - INFO - Step 20680: lr=1.00E-05, loss= 1.2305 (max= 1.5877), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:51,099 - root - INFO - Step 20680: lr=1.00E-05, loss= 1.2305 (max= 1.5877), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:51,099 - root - INFO - Step 20680: lr=1.00E-05, loss= 1.2305 (max= 1.5877), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:42:51,099 - root - INFO - Step 20680: lr=1.00E-05, loss= 1.2305 (max= 1.5877), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:06,977 - root - INFO - Step 20690: lr=1.00E-05, loss= 1.2111 (max= 1.5759), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:06,977 - root - INFO - Step 20690: lr=1.00E-05, loss= 1.2111 (max= 1.5759), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
19:43:06,977 - root - INFO - Step 20690: lr=1.00E-05, loss= 1.2111 (max= 1.5759), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:06,977 - root - INFO - Step 20690: lr=1.00E-05, loss= 1.2111 (max= 1.5759), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:06,977 - root - INFO - Step 20690: lr=1.00E-05, loss= 1.2111 (max= 1.5759), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:06,977 - root - INFO - Step 20690: lr=1.00E-05, loss= 1.2111 (max= 1.5759), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:06,977 - root - INFO - Step 20690: lr=1.00E-05, loss= 1.2111 (max= 1.5759), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:06,977 - root - INFO - Step 20690: lr=1.00E-05, loss= 1.2111 (max= 1.5759), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:22,901 - root - INFO - Step 20700: lr=1.00E-05, loss= 1.1992 (max= 1.6418), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:22,901 - root - INFO - Step 20700: lr=1.00E-05, loss= 1.1992 (max= 1.6418), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:22,901 - root - INFO - Step 20700: lr=1.00E-05, loss= 1.1992 (max= 1.6418), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:22,901 - root - INFO - Step 20700: lr=1.00E-05, loss= 1.1992 (max= 1.6418), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:22,901 - root - INFO - Step 20700: lr=1.00E-05, loss= 1.1992 (max= 1.6418), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:22,901 - root - INFO - Step 20700: lr=1.00E-05, loss= 1.1992 (max= 1.6418), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:22,901 - root - INFO - Step 20700: lr=1.00E-05, loss= 1.1992 (max= 1.6418), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:22,902 - root - INFO - Step 20700: lr=1.00E-05, loss= 1.1992 (max= 1.6418), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:43:38,835 - root - INFO - Step 20710: lr=1.00E-05, loss= 1.2117 (max= 1.6311), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:38,835 - root - INFO - Step 20710: lr=1.00E-05, loss= 1.2117 (max= 1.6311), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:38,835 - root - INFO - Step 20710: lr=1.00E-05, loss= 1.2117 (max= 1.6311), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:38,835 - root - INFO - Step 20710: lr=1.00E-05, loss= 1.2117 (max= 1.6311), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:38,835 - root - INFO - Step 20710: lr=1.00E-05, loss= 1.2117 (max= 1.6311), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:38,835 - root - INFO - Step 20710: lr=1.00E-05, loss= 1.2117 (max= 1.6311), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:38,835 - root - INFO - Step 20710: lr=1.00E-05, loss= 1.2117 (max= 1.6311), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:38,835 - root - INFO - Step 20710: lr=1.00E-05, loss= 1.2117 (max= 1.6311), 
tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:54,722 - root - INFO - Step 20720: lr=1.00E-05, loss= 1.2019 (max= 1.5834), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:54,722 - root - INFO - Step 20720: lr=1.00E-05, loss= 1.2019 (max= 1.5834), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:54,722 - root - INFO - Step 20720: lr=1.00E-05, loss= 1.2019 (max= 1.5834), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:54,722 - root - INFO - Step 20720: lr=1.00E-05, loss= 1.2019 (max= 1.5834), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:54,722 - root - INFO - Step 20720: lr=1.00E-05, loss= 1.2019 (max= 1.5834), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:54,722 - root - INFO - Step 20720: lr=1.00E-05, loss= 1.2019 (max= 1.5834), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:54,722 - root - INFO - Step 20720: lr=1.00E-05, loss= 1.2019 (max= 1.5834), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:43:54,722 - root - INFO - Step 20720: lr=1.00E-05, loss= 1.2019 (max= 1.5834), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:10,650 - root - INFO - Step 20730: lr=1.00E-05, loss= 1.2144 (max= 1.5649), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:10,650 - root - INFO - Step 20730: lr=1.00E-05, loss= 1.2144 (max= 1.5649), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:10,650 - root - INFO - Step 
20730: lr=1.00E-05, loss= 1.2144 (max= 1.5649), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:10,650 - root - INFO - Step 20730: lr=1.00E-05, loss= 1.2144 (max= 1.5649), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:10,650 - root - INFO - Step 20730: lr=1.00E-05, loss= 1.2144 (max= 1.5649), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:10,650 - root - INFO - Step 20730: lr=1.00E-05, loss= 1.2144 (max= 1.5649), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:10,650 - root - INFO - Step 20730: lr=1.00E-05, loss= 1.2144 (max= 1.5649), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:10,650 - root - INFO - Step 20730: lr=1.00E-05, loss= 1.2144 (max= 1.5649), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:26,566 - root - INFO - Step 20740: lr=1.00E-05, loss= 1.2108 (max= 1.4933), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:26,566 - root - INFO - Step 20740: lr=1.00E-05, loss= 1.2108 (max= 1.4933), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:26,566 - root - INFO - Step 20740: lr=1.00E-05, loss= 1.2108 (max= 1.4933), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:26,566 - root - INFO - Step 20740: lr=1.00E-05, loss= 1.2108 (max= 1.4933), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:26,566 - root - INFO - Step 20740: lr=1.00E-05, loss= 1.2108 (max= 1.4933), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 19:44:26,566 - root - INFO - Step 20740: lr=1.00E-05, loss= 1.2108 (max= 1.4933), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:26,567 - root - INFO - Step 20740: lr=1.00E-05, loss= 1.2108 (max= 1.4933), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:26,567 - root - INFO - Step 20740: lr=1.00E-05, loss= 1.2108 (max= 1.4933), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:42,510 - root - INFO - Step 20750: lr=1.00E-05, loss= 1.2002 (max= 1.4971), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:42,510 - root - INFO - Step 20750: lr=1.00E-05, loss= 1.2002 (max= 1.4971), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:42,510 - root - INFO - Step 20750: lr=1.00E-05, loss= 1.2002 (max= 1.4971), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:42,510 - root - INFO - Step 20750: lr=1.00E-05, loss= 1.2002 (max= 1.4971), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:42,510 - root - INFO - Step 20750: lr=1.00E-05, loss= 1.2002 (max= 1.4971), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:42,510 - root - INFO - Step 20750: lr=1.00E-05, loss= 1.2002 (max= 1.4971), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:42,510 - root - INFO - Step 20750: lr=1.00E-05, loss= 1.2002 (max= 1.4971), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:42,510 - root - INFO - Step 20750: lr=1.00E-05, loss= 1.2002 (max= 1.4971), tps=20557, mfu=42.83%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:58,435 - root - INFO - Step 20760: lr=1.00E-05, loss= 1.1873 (max= 1.6113), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:58,435 - root - INFO - Step 20760: lr=1.00E-05, loss= 1.1873 (max= 1.6113), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:58,435 - root - INFO - Step 20760: lr=1.00E-05, loss= 1.1873 (max= 1.6113), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:58,435 - root - INFO - Step 20760: lr=1.00E-05, loss= 1.1873 (max= 1.6113), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:58,435 - root - INFO - Step 20760: lr=1.00E-05, loss= 1.1873 (max= 1.6113), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:58,436 - root - INFO - Step 20760: lr=1.00E-05, loss= 1.1873 (max= 1.6113), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:58,436 - root - INFO - Step 20760: lr=1.00E-05, loss= 1.1873 (max= 1.6113), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:44:58,436 - root - INFO - Step 20760: lr=1.00E-05, loss= 1.1873 (max= 1.6113), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:14,400 - root - INFO - Step 20770: lr=1.00E-05, loss= 1.2114 (max= 1.5603), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:14,401 - root - INFO - Step 20770: lr=1.00E-05, loss= 1.2114 (max= 1.5603), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:14,401 - root - INFO - Step 20770: lr=1.00E-05, loss= 1.2114 
(max= 1.5603), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:14,401 - root - INFO - Step 20770: lr=1.00E-05, loss= 1.2114 (max= 1.5603), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:14,401 - root - INFO - Step 20770: lr=1.00E-05, loss= 1.2114 (max= 1.5603), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:14,401 - root - INFO - Step 20770: lr=1.00E-05, loss= 1.2114 (max= 1.5603), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:14,401 - root - INFO - Step 20770: lr=1.00E-05, loss= 1.2114 (max= 1.5603), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:14,401 - root - INFO - Step 20770: lr=1.00E-05, loss= 1.2114 (max= 1.5603), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:30,334 - root - INFO - Step 20780: lr=1.00E-05, loss= 1.2311 (max= 1.7231), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:45:30,334 - root - INFO - Step 20780: lr=1.00E-05, loss= 1.2311 (max= 1.7231), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:45:30,334 - root - INFO - Step 20780: lr=1.00E-05, loss= 1.2311 (max= 1.7231), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:45:30,334 - root - INFO - Step 20780: lr=1.00E-05, loss= 1.2311 (max= 1.7231), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:45:30,334 - root - INFO - Step 20780: lr=1.00E-05, loss= 1.2311 (max= 1.7231), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:45:30,334 - root 
- INFO - Step 20780: lr=1.00E-05, loss= 1.2311 (max= 1.7231), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:45:30,335 - root - INFO - Step 20780: lr=1.00E-05, loss= 1.2311 (max= 1.7231), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:45:30,335 - root - INFO - Step 20780: lr=1.00E-05, loss= 1.2311 (max= 1.7231), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:45:46,259 - root - INFO - Step 20790: lr=1.00E-05, loss= 1.2176 (max= 1.5966), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:46,259 - root - INFO - Step 20790: lr=1.00E-05, loss= 1.2176 (max= 1.5966), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:46,259 - root - INFO - Step 20790: lr=1.00E-05, loss= 1.2176 (max= 1.5966), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:46,259 - root - INFO - Step 20790: lr=1.00E-05, loss= 1.2176 (max= 1.5966), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:46,259 - root - INFO - Step 20790: lr=1.00E-05, loss= 1.2176 (max= 1.5966), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:46,259 - root - INFO - Step 20790: lr=1.00E-05, loss= 1.2176 (max= 1.5966), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:46,259 - root - INFO - Step 20790: lr=1.00E-05, loss= 1.2176 (max= 1.5966), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:45:46,260 - root - INFO - Step 20790: lr=1.00E-05, loss= 1.2176 (max= 1.5966), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 19:46:02,195 - root - INFO - Step 20800: lr=1.00E-05, loss= 1.1998 (max= 1.4988), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:02,196 - root - INFO - Step 20800: lr=1.00E-05, loss= 1.1998 (max= 1.4988), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:02,196 - root - INFO - Step 20800: lr=1.00E-05, loss= 1.1998 (max= 1.4988), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:02,196 - root - INFO - Step 20800: lr=1.00E-05, loss= 1.1998 (max= 1.4988), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:02,196 - root - INFO - Step 20800: lr=1.00E-05, loss= 1.1998 (max= 1.4988), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:02,196 - root - INFO - Step 20800: lr=1.00E-05, loss= 1.1998 (max= 1.4988), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:02,196 - root - INFO - Step 20800: lr=1.00E-05, loss= 1.1998 (max= 1.4988), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:02,196 - root - INFO - Step 20800: lr=1.00E-05, loss= 1.1998 (max= 1.4988), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:18,112 - root - INFO - Step 20810: lr=1.00E-05, loss= 1.1874 (max= 1.6707), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:18,112 - root - INFO - Step 20810: lr=1.00E-05, loss= 1.1874 (max= 1.6707), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:18,113 - root - INFO - Step 20810: lr=1.00E-05, loss= 1.1874 (max= 1.6707), tps=20592, mfu=42.90%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:18,113 - root - INFO - Step 20810: lr=1.00E-05, loss= 1.1874 (max= 1.6707), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:18,113 - root - INFO - Step 20810: lr=1.00E-05, loss= 1.1874 (max= 1.6707), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:18,113 - root - INFO - Step 20810: lr=1.00E-05, loss= 1.1874 (max= 1.6707), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:18,113 - root - INFO - Step 20810: lr=1.00E-05, loss= 1.1874 (max= 1.6707), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:18,113 - root - INFO - Step 20810: lr=1.00E-05, loss= 1.1874 (max= 1.6707), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:34,015 - root - INFO - Step 20820: lr=1.00E-05, loss= 1.1977 (max= 1.6786), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:34,015 - root - INFO - Step 20820: lr=1.00E-05, loss= 1.1977 (max= 1.6786), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:34,015 - root - INFO - Step 20820: lr=1.00E-05, loss= 1.1977 (max= 1.6786), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:34,015 - root - INFO - Step 20820: lr=1.00E-05, loss= 1.1977 (max= 1.6786), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:34,015 - root - INFO - Step 20820: lr=1.00E-05, loss= 1.1977 (max= 1.6786), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:34,015 - root - INFO - Step 20820: lr=1.00E-05, 
loss= 1.1977 (max= 1.6786), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:34,015 - root - INFO - Step 20820: lr=1.00E-05, loss= 1.1977 (max= 1.6786), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:34,015 - root - INFO - Step 20820: lr=1.00E-05, loss= 1.1977 (max= 1.6786), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:49,960 - root - INFO - Step 20830: lr=1.00E-05, loss= 1.2128 (max= 1.6529), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:49,960 - root - INFO - Step 20830: lr=1.00E-05, loss= 1.2128 (max= 1.6529), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:49,960 - root - INFO - Step 20830: lr=1.00E-05, loss= 1.2128 (max= 1.6529), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:49,960 - root - INFO - Step 20830: lr=1.00E-05, loss= 1.2128 (max= 1.6529), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:49,960 - root - INFO - Step 20830: lr=1.00E-05, loss= 1.2128 (max= 1.6529), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:49,960 - root - INFO - Step 20830: lr=1.00E-05, loss= 1.2128 (max= 1.6529), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:49,961 - root - INFO - Step 20830: lr=1.00E-05, loss= 1.2128 (max= 1.6529), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:46:49,961 - root - INFO - Step 20830: lr=1.00E-05, loss= 1.2128 (max= 1.6529), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
19:46:57,076 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:7111869 +2025-10-24 19:47:05,928 - root - INFO - Step 20840: lr=1.00E-05, loss= 1.1844 (max= 1.6532), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:05,928 - root - INFO - Step 20840: lr=1.00E-05, loss= 1.1844 (max= 1.6532), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:05,928 - root - INFO - Step 20840: lr=1.00E-05, loss= 1.1844 (max= 1.6532), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:05,928 - root - INFO - Step 20840: lr=1.00E-05, loss= 1.1844 (max= 1.6532), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:05,928 - root - INFO - Step 20840: lr=1.00E-05, loss= 1.1844 (max= 1.6532), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:05,928 - root - INFO - Step 20840: lr=1.00E-05, loss= 1.1844 (max= 1.6532), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:05,928 - root - INFO - Step 20840: lr=1.00E-05, loss= 1.1844 (max= 1.6532), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:05,928 - root - INFO - Step 20840: lr=1.00E-05, loss= 1.1844 (max= 1.6532), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:21,896 - root - INFO - Step 20850: lr=1.00E-05, loss= 1.2244 (max= 1.8325), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:21,896 - root - INFO - Step 20850: lr=1.00E-05, loss= 1.2244 (max= 1.8325), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:21,896 - root - INFO - Step 20850: lr=1.00E-05, loss= 1.2244 (max= 1.8325), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:21,896 - root - INFO - Step 20850: lr=1.00E-05, loss= 1.2244 (max= 1.8325), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:21,896 - root - INFO - Step 20850: lr=1.00E-05, loss= 1.2244 (max= 1.8325), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:21,896 - root - INFO - Step 20850: lr=1.00E-05, loss= 1.2244 (max= 1.8325), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:21,896 - root - INFO - Step 20850: lr=1.00E-05, loss= 1.2244 (max= 1.8325), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:21,896 - root - INFO - Step 20850: lr=1.00E-05, loss= 1.2244 (max= 1.8325), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:37,847 - root - INFO - Step 20860: lr=1.00E-05, loss= 1.2027 (max= 1.7246), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:37,847 - root - INFO - Step 20860: lr=1.00E-05, loss= 1.2027 (max= 1.7246), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:37,847 - root - INFO - Step 20860: lr=1.00E-05, loss= 1.2027 (max= 1.7246), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:37,847 - root - INFO - Step 20860: lr=1.00E-05, loss= 1.2027 (max= 1.7246), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:37,848 - root - INFO - Step 20860: lr=1.00E-05, loss= 1.2027 (max= 1.7246), 
tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:37,848 - root - INFO - Step 20860: lr=1.00E-05, loss= 1.2027 (max= 1.7246), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:37,848 - root - INFO - Step 20860: lr=1.00E-05, loss= 1.2027 (max= 1.7246), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:37,848 - root - INFO - Step 20860: lr=1.00E-05, loss= 1.2027 (max= 1.7246), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:53,794 - root - INFO - Step 20870: lr=1.00E-05, loss= 1.1815 (max= 1.5025), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:53,795 - root - INFO - Step 20870: lr=1.00E-05, loss= 1.1815 (max= 1.5025), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:53,795 - root - INFO - Step 20870: lr=1.00E-05, loss= 1.1815 (max= 1.5025), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:53,795 - root - INFO - Step 20870: lr=1.00E-05, loss= 1.1815 (max= 1.5025), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:53,795 - root - INFO - Step 20870: lr=1.00E-05, loss= 1.1815 (max= 1.5025), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:53,795 - root - INFO - Step 20870: lr=1.00E-05, loss= 1.1815 (max= 1.5025), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:53,795 - root - INFO - Step 20870: lr=1.00E-05, loss= 1.1815 (max= 1.5025), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:47:53,795 - root - INFO - Step 
20870: lr=1.00E-05, loss= 1.1815 (max= 1.5025), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:48:09,689 - root - INFO - Step 20880: lr=1.00E-05, loss= 1.1878 (max= 1.6455), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:09,689 - root - INFO - Step 20880: lr=1.00E-05, loss= 1.1878 (max= 1.6455), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:09,689 - root - INFO - Step 20880: lr=1.00E-05, loss= 1.1878 (max= 1.6455), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:09,689 - root - INFO - Step 20880: lr=1.00E-05, loss= 1.1878 (max= 1.6455), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:09,689 - root - INFO - Step 20880: lr=1.00E-05, loss= 1.1878 (max= 1.6455), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:09,689 - root - INFO - Step 20880: lr=1.00E-05, loss= 1.1878 (max= 1.6455), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:09,689 - root - INFO - Step 20880: lr=1.00E-05, loss= 1.1878 (max= 1.6455), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:09,689 - root - INFO - Step 20880: lr=1.00E-05, loss= 1.1878 (max= 1.6455), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:25,633 - root - INFO - Step 20890: lr=1.00E-05, loss= 1.1950 (max= 1.6637), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:25,633 - root - INFO - Step 20890: lr=1.00E-05, loss= 1.1950 (max= 1.6637), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 19:48:25,633 - root - INFO - Step 20890: lr=1.00E-05, loss= 1.1950 (max= 1.6637), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:25,633 - root - INFO - Step 20890: lr=1.00E-05, loss= 1.1950 (max= 1.6637), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:25,633 - root - INFO - Step 20890: lr=1.00E-05, loss= 1.1950 (max= 1.6637), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:25,633 - root - INFO - Step 20890: lr=1.00E-05, loss= 1.1950 (max= 1.6637), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:25,633 - root - INFO - Step 20890: lr=1.00E-05, loss= 1.1950 (max= 1.6637), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:25,633 - root - INFO - Step 20890: lr=1.00E-05, loss= 1.1950 (max= 1.6637), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:41,610 - root - INFO - Step 20900: lr=1.00E-05, loss= 1.1934 (max= 1.5517), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:41,610 - root - INFO - Step 20900: lr=1.00E-05, loss= 1.1934 (max= 1.5517), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:41,610 - root - INFO - Step 20900: lr=1.00E-05, loss= 1.1934 (max= 1.5517), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:41,610 - root - INFO - Step 20900: lr=1.00E-05, loss= 1.1934 (max= 1.5517), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:41,610 - root - INFO - Step 20900: lr=1.00E-05, loss= 1.1934 (max= 1.5517), tps=20514, mfu=42.74%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:41,610 - root - INFO - Step 20900: lr=1.00E-05, loss= 1.1934 (max= 1.5517), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:41,610 - root - INFO - Step 20900: lr=1.00E-05, loss= 1.1934 (max= 1.5517), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:41,610 - root - INFO - Step 20900: lr=1.00E-05, loss= 1.1934 (max= 1.5517), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:48:57,606 - root - INFO - Step 20910: lr=1.00E-05, loss= 1.2315 (max= 1.7539), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:48:57,606 - root - INFO - Step 20910: lr=1.00E-05, loss= 1.2315 (max= 1.7539), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:48:57,606 - root - INFO - Step 20910: lr=1.00E-05, loss= 1.2315 (max= 1.7539), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:48:57,606 - root - INFO - Step 20910: lr=1.00E-05, loss= 1.2315 (max= 1.7539), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:48:57,606 - root - INFO - Step 20910: lr=1.00E-05, loss= 1.2315 (max= 1.7539), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:48:57,606 - root - INFO - Step 20910: lr=1.00E-05, loss= 1.2315 (max= 1.7539), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:48:57,607 - root - INFO - Step 20910: lr=1.00E-05, loss= 1.2315 (max= 1.7539), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:48:57,607 - root - INFO - Step 20910: lr=1.00E-05, loss= 1.2315 
(max= 1.7539), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:13,574 - root - INFO - Step 20920: lr=1.00E-05, loss= 1.1952 (max= 1.6029), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:13,574 - root - INFO - Step 20920: lr=1.00E-05, loss= 1.1952 (max= 1.6029), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:13,575 - root - INFO - Step 20920: lr=1.00E-05, loss= 1.1952 (max= 1.6029), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:13,575 - root - INFO - Step 20920: lr=1.00E-05, loss= 1.1952 (max= 1.6029), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:13,575 - root - INFO - Step 20920: lr=1.00E-05, loss= 1.1952 (max= 1.6029), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:13,575 - root - INFO - Step 20920: lr=1.00E-05, loss= 1.1952 (max= 1.6029), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:13,575 - root - INFO - Step 20920: lr=1.00E-05, loss= 1.1952 (max= 1.6029), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:13,575 - root - INFO - Step 20920: lr=1.00E-05, loss= 1.1952 (max= 1.6029), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:29,511 - root - INFO - Step 20930: lr=1.00E-05, loss= 1.2066 (max= 1.5126), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:29,512 - root - INFO - Step 20930: lr=1.00E-05, loss= 1.2066 (max= 1.5126), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:29,512 - root 
- INFO - Step 20930: lr=1.00E-05, loss= 1.2066 (max= 1.5126), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:29,512 - root - INFO - Step 20930: lr=1.00E-05, loss= 1.2066 (max= 1.5126), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:29,512 - root - INFO - Step 20930: lr=1.00E-05, loss= 1.2066 (max= 1.5126), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:29,512 - root - INFO - Step 20930: lr=1.00E-05, loss= 1.2066 (max= 1.5126), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:29,512 - root - INFO - Step 20930: lr=1.00E-05, loss= 1.2066 (max= 1.5126), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:29,512 - root - INFO - Step 20930: lr=1.00E-05, loss= 1.2066 (max= 1.5126), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:45,461 - root - INFO - Step 20940: lr=1.00E-05, loss= 1.1719 (max= 1.5257), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:45,462 - root - INFO - Step 20940: lr=1.00E-05, loss= 1.1719 (max= 1.5257), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:45,462 - root - INFO - Step 20940: lr=1.00E-05, loss= 1.1719 (max= 1.5257), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:45,462 - root - INFO - Step 20940: lr=1.00E-05, loss= 1.1719 (max= 1.5257), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:45,462 - root - INFO - Step 20940: lr=1.00E-05, loss= 1.1719 (max= 1.5257), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 19:49:45,462 - root - INFO - Step 20940: lr=1.00E-05, loss= 1.1719 (max= 1.5257), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:45,462 - root - INFO - Step 20940: lr=1.00E-05, loss= 1.1719 (max= 1.5257), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:49:45,462 - root - INFO - Step 20940: lr=1.00E-05, loss= 1.1719 (max= 1.5257), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:01,397 - root - INFO - Step 20950: lr=1.00E-05, loss= 1.2066 (max= 1.5867), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:01,398 - root - INFO - Step 20950: lr=1.00E-05, loss= 1.2066 (max= 1.5867), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:01,398 - root - INFO - Step 20950: lr=1.00E-05, loss= 1.2066 (max= 1.5867), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:01,398 - root - INFO - Step 20950: lr=1.00E-05, loss= 1.2066 (max= 1.5867), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:01,398 - root - INFO - Step 20950: lr=1.00E-05, loss= 1.2066 (max= 1.5867), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:01,398 - root - INFO - Step 20950: lr=1.00E-05, loss= 1.2066 (max= 1.5867), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:01,398 - root - INFO - Step 20950: lr=1.00E-05, loss= 1.2066 (max= 1.5867), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:01,398 - root - INFO - Step 20950: lr=1.00E-05, loss= 1.2066 (max= 1.5867), tps=20567, mfu=42.85%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:17,340 - root - INFO - Step 20960: lr=1.00E-05, loss= 1.1910 (max= 1.6540), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:17,340 - root - INFO - Step 20960: lr=1.00E-05, loss= 1.1910 (max= 1.6540), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:17,341 - root - INFO - Step 20960: lr=1.00E-05, loss= 1.1910 (max= 1.6540), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:17,341 - root - INFO - Step 20960: lr=1.00E-05, loss= 1.1910 (max= 1.6540), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:17,341 - root - INFO - Step 20960: lr=1.00E-05, loss= 1.1910 (max= 1.6540), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:17,341 - root - INFO - Step 20960: lr=1.00E-05, loss= 1.1910 (max= 1.6540), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:17,341 - root - INFO - Step 20960: lr=1.00E-05, loss= 1.1910 (max= 1.6540), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:17,341 - root - INFO - Step 20960: lr=1.00E-05, loss= 1.1910 (max= 1.6540), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:33,270 - root - INFO - Step 20970: lr=1.00E-05, loss= 1.1914 (max= 1.5450), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:33,271 - root - INFO - Step 20970: lr=1.00E-05, loss= 1.1914 (max= 1.5450), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:33,271 - root - INFO - Step 20970: lr=1.00E-05, 
loss= 1.1914 (max= 1.5450), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:33,271 - root - INFO - Step 20970: lr=1.00E-05, loss= 1.1914 (max= 1.5450), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:33,271 - root - INFO - Step 20970: lr=1.00E-05, loss= 1.1914 (max= 1.5450), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:33,271 - root - INFO - Step 20970: lr=1.00E-05, loss= 1.1914 (max= 1.5450), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:33,271 - root - INFO - Step 20970: lr=1.00E-05, loss= 1.1914 (max= 1.5450), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:33,271 - root - INFO - Step 20970: lr=1.00E-05, loss= 1.1914 (max= 1.5450), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:49,183 - root - INFO - Step 20980: lr=1.00E-05, loss= 1.2231 (max= 1.6298), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:49,183 - root - INFO - Step 20980: lr=1.00E-05, loss= 1.2231 (max= 1.6298), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:49,183 - root - INFO - Step 20980: lr=1.00E-05, loss= 1.2231 (max= 1.6298), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:49,183 - root - INFO - Step 20980: lr=1.00E-05, loss= 1.2231 (max= 1.6298), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:49,183 - root - INFO - Step 20980: lr=1.00E-05, loss= 1.2231 (max= 1.6298), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
19:50:49,183 - root - INFO - Step 20980: lr=1.00E-05, loss= 1.2231 (max= 1.6298), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:49,183 - root - INFO - Step 20980: lr=1.00E-05, loss= 1.2231 (max= 1.6298), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:50:49,183 - root - INFO - Step 20980: lr=1.00E-05, loss= 1.2231 (max= 1.6298), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:05,135 - root - INFO - Step 20990: lr=1.00E-05, loss= 1.2400 (max= 1.5805), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:05,135 - root - INFO - Step 20990: lr=1.00E-05, loss= 1.2400 (max= 1.5805), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:05,135 - root - INFO - Step 20990: lr=1.00E-05, loss= 1.2400 (max= 1.5805), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:05,135 - root - INFO - Step 20990: lr=1.00E-05, loss= 1.2400 (max= 1.5805), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:05,135 - root - INFO - Step 20990: lr=1.00E-05, loss= 1.2400 (max= 1.5805), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:05,135 - root - INFO - Step 20990: lr=1.00E-05, loss= 1.2400 (max= 1.5805), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:05,135 - root - INFO - Step 20990: lr=1.00E-05, loss= 1.2400 (max= 1.5805), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:05,135 - root - INFO - Step 20990: lr=1.00E-05, loss= 1.2400 (max= 1.5805), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-21000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-21000! Save time: 4.236515045166016 +2025-10-24 19:51:21,077 - root - INFO - Step 21000: lr=1.00E-05, loss= 1.2264 (max= 1.7915), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:21,077 - root - INFO - Saving a full checkpoint at step 21000 +2025-10-24 19:51:21,077 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:51:21,078 - root - INFO - Step 21000: lr=1.00E-05, loss= 1.2264 (max= 1.7915), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:21,078 - root - INFO - Step 21000: lr=1.00E-05, loss= 1.2264 (max= 1.7915), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:21,078 - root - INFO - Saving a full checkpoint at step 21000 +2025-10-24 19:51:21,078 - root - INFO - Saving a full checkpoint at step 21000 +2025-10-24 19:51:21,078 - root - INFO - Step 21000: lr=1.00E-05, loss= 1.2264 (max= 1.7915), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:21,078 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:51:21,078 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:51:21,078 - root - INFO - Saving a full checkpoint at step 21000 +2025-10-24 19:51:21,078 - root - INFO - Step 21000: lr=1.00E-05, loss= 1.2264 (max= 1.7915), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:21,078 - root - INFO - Step 21000: lr=1.00E-05, loss= 1.2264 
(max= 1.7915), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:21,078 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:51:21,078 - root - INFO - Step 21000: lr=1.00E-05, loss= 1.2264 (max= 1.7915), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:21,078 - root - INFO - Saving a full checkpoint at step 21000 +2025-10-24 19:51:21,078 - root - INFO - Saving a full checkpoint at step 21000 +2025-10-24 19:51:21,078 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:51:21,078 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:51:21,078 - root - INFO - Saving a full checkpoint at step 21000 +2025-10-24 19:51:21,078 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:51:21,078 - root - INFO - Step 21000: lr=1.00E-05, loss= 1.2264 (max= 1.7915), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:21,078 - root - INFO - Saving a full checkpoint at step 21000 +2025-10-24 19:51:21,078 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 19:51:37,836 - root - INFO - Finished saving the checkpoint in 16.76 seconds +2025-10-24 19:51:37,842 - root - INFO - Finished saving the checkpoint in 16.76 seconds +2025-10-24 19:51:37,842 - root - INFO - Finished saving the checkpoint in 16.76 seconds +2025-10-24 19:51:37,843 - root - INFO - Finished saving the checkpoint in 16.76 seconds +2025-10-24 19:51:37,843 - root - INFO - Finished saving the checkpoint in 16.76 seconds +2025-10-24 19:51:37,843 - root - INFO - 
Finished saving the checkpoint in 16.77 seconds +2025-10-24 19:51:37,843 - root - INFO - Finished saving the checkpoint in 16.77 seconds +2025-10-24 19:51:37,844 - root - INFO - Finished saving the checkpoint in 16.77 seconds +2025-10-24 19:51:53,696 - root - INFO - Step 21010: lr=1.00E-05, loss= 1.1734 (max= 1.6012), tps=10047, mfu=20.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:53,696 - root - INFO - Step 21010: lr=1.00E-05, loss= 1.1734 (max= 1.6012), tps=10047, mfu=20.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:53,696 - root - INFO - Step 21010: lr=1.00E-05, loss= 1.1734 (max= 1.6012), tps=10047, mfu=20.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:53,696 - root - INFO - Step 21010: lr=1.00E-05, loss= 1.1734 (max= 1.6012), tps=10047, mfu=20.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:53,696 - root - INFO - Step 21010: lr=1.00E-05, loss= 1.1734 (max= 1.6012), tps=10047, mfu=20.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:53,696 - root - INFO - Step 21010: lr=1.00E-05, loss= 1.1734 (max= 1.6012), tps=10047, mfu=20.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:53,696 - root - INFO - Step 21010: lr=1.00E-05, loss= 1.1734 (max= 1.6012), tps=10047, mfu=20.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:51:53,696 - root - INFO - Step 21010: lr=1.00E-05, loss= 1.1734 (max= 1.6012), tps=10047, mfu=20.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:09,667 - root - INFO - Step 21020: lr=1.00E-05, loss= 1.1966 (max= 1.6570), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:52:09,668 - root - INFO - Step 21020: lr=1.00E-05, loss= 1.1966 (max= 1.6570), 
tps=20521, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:52:09,668 - root - INFO - Step 21020: lr=1.00E-05, loss= 1.1966 (max= 1.6570), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:52:09,668 - root - INFO - Step 21020: lr=1.00E-05, loss= 1.1966 (max= 1.6570), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:52:09,668 - root - INFO - Step 21020: lr=1.00E-05, loss= 1.1966 (max= 1.6570), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:52:09,668 - root - INFO - Step 21020: lr=1.00E-05, loss= 1.1966 (max= 1.6570), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:52:09,668 - root - INFO - Step 21020: lr=1.00E-05, loss= 1.1966 (max= 1.6570), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:52:09,668 - root - INFO - Step 21020: lr=1.00E-05, loss= 1.1966 (max= 1.6570), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:52:25,614 - root - INFO - Step 21030: lr=1.00E-05, loss= 1.2143 (max= 1.6087), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:25,614 - root - INFO - Step 21030: lr=1.00E-05, loss= 1.2143 (max= 1.6087), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:25,614 - root - INFO - Step 21030: lr=1.00E-05, loss= 1.2143 (max= 1.6087), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:25,614 - root - INFO - Step 21030: lr=1.00E-05, loss= 1.2143 (max= 1.6087), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:25,614 - root - INFO - Step 
21030: lr=1.00E-05, loss= 1.2143 (max= 1.6087), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:25,614 - root - INFO - Step 21030: lr=1.00E-05, loss= 1.2143 (max= 1.6087), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:25,614 - root - INFO - Step 21030: lr=1.00E-05, loss= 1.2143 (max= 1.6087), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:25,614 - root - INFO - Step 21030: lr=1.00E-05, loss= 1.2143 (max= 1.6087), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:41,517 - root - INFO - Step 21040: lr=1.00E-05, loss= 1.2015 (max= 1.5822), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:41,517 - root - INFO - Step 21040: lr=1.00E-05, loss= 1.2015 (max= 1.5822), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:41,517 - root - INFO - Step 21040: lr=1.00E-05, loss= 1.2015 (max= 1.5822), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:41,517 - root - INFO - Step 21040: lr=1.00E-05, loss= 1.2015 (max= 1.5822), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:41,517 - root - INFO - Step 21040: lr=1.00E-05, loss= 1.2015 (max= 1.5822), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:41,517 - root - INFO - Step 21040: lr=1.00E-05, loss= 1.2015 (max= 1.5822), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:41,517 - root - INFO - Step 21040: lr=1.00E-05, loss= 1.2015 (max= 1.5822), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 19:52:41,517 - root - INFO - Step 21040: lr=1.00E-05, loss= 1.2015 (max= 1.5822), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:57,487 - root - INFO - Step 21050: lr=1.00E-05, loss= 1.2217 (max= 1.6333), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:57,488 - root - INFO - Step 21050: lr=1.00E-05, loss= 1.2217 (max= 1.6333), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:57,488 - root - INFO - Step 21050: lr=1.00E-05, loss= 1.2217 (max= 1.6333), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:57,488 - root - INFO - Step 21050: lr=1.00E-05, loss= 1.2217 (max= 1.6333), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:57,488 - root - INFO - Step 21050: lr=1.00E-05, loss= 1.2217 (max= 1.6333), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:57,488 - root - INFO - Step 21050: lr=1.00E-05, loss= 1.2217 (max= 1.6333), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:57,488 - root - INFO - Step 21050: lr=1.00E-05, loss= 1.2217 (max= 1.6333), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:52:57,488 - root - INFO - Step 21050: lr=1.00E-05, loss= 1.2217 (max= 1.6333), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:13,386 - root - INFO - Step 21060: lr=1.00E-05, loss= 1.2306 (max= 1.7606), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:13,387 - root - INFO - Step 21060: lr=1.00E-05, loss= 1.2306 (max= 1.7606), tps=20615, mfu=42.95%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:13,387 - root - INFO - Step 21060: lr=1.00E-05, loss= 1.2306 (max= 1.7606), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:13,387 - root - INFO - Step 21060: lr=1.00E-05, loss= 1.2306 (max= 1.7606), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:13,387 - root - INFO - Step 21060: lr=1.00E-05, loss= 1.2306 (max= 1.7606), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:13,387 - root - INFO - Step 21060: lr=1.00E-05, loss= 1.2306 (max= 1.7606), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:13,387 - root - INFO - Step 21060: lr=1.00E-05, loss= 1.2306 (max= 1.7606), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:13,387 - root - INFO - Step 21060: lr=1.00E-05, loss= 1.2306 (max= 1.7606), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:29,394 - root - INFO - Step 21070: lr=1.00E-05, loss= 1.1858 (max= 1.6251), tps=20474, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:29,394 - root - INFO - Step 21070: lr=1.00E-05, loss= 1.1858 (max= 1.6251), tps=20474, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:29,394 - root - INFO - Step 21070: lr=1.00E-05, loss= 1.1858 (max= 1.6251), tps=20474, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:29,395 - root - INFO - Step 21070: lr=1.00E-05, loss= 1.1858 (max= 1.6251), tps=20474, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:29,395 - root - INFO - Step 21070: lr=1.00E-05, loss= 1.1858 
(max= 1.6251), tps=20474, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:29,395 - root - INFO - Step 21070: lr=1.00E-05, loss= 1.1858 (max= 1.6251), tps=20474, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:29,395 - root - INFO - Step 21070: lr=1.00E-05, loss= 1.1858 (max= 1.6251), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:29,395 - root - INFO - Step 21070: lr=1.00E-05, loss= 1.1858 (max= 1.6251), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:45,303 - root - INFO - Step 21080: lr=1.00E-05, loss= 1.2365 (max= 1.8430), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:45,303 - root - INFO - Step 21080: lr=1.00E-05, loss= 1.2365 (max= 1.8430), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:45,303 - root - INFO - Step 21080: lr=1.00E-05, loss= 1.2365 (max= 1.8430), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:45,303 - root - INFO - Step 21080: lr=1.00E-05, loss= 1.2365 (max= 1.8430), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:45,303 - root - INFO - Step 21080: lr=1.00E-05, loss= 1.2365 (max= 1.8430), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:45,303 - root - INFO - Step 21080: lr=1.00E-05, loss= 1.2365 (max= 1.8430), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:45,303 - root - INFO - Step 21080: lr=1.00E-05, loss= 1.2365 (max= 1.8430), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:53:45,303 - root 
- INFO - Step 21080: lr=1.00E-05, loss= 1.2365 (max= 1.8430), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:01,279 - root - INFO - Step 21090: lr=1.00E-05, loss= 1.2174 (max= 1.5976), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:01,279 - root - INFO - Step 21090: lr=1.00E-05, loss= 1.2174 (max= 1.5976), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:01,279 - root - INFO - Step 21090: lr=1.00E-05, loss= 1.2174 (max= 1.5976), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:01,279 - root - INFO - Step 21090: lr=1.00E-05, loss= 1.2174 (max= 1.5976), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:01,279 - root - INFO - Step 21090: lr=1.00E-05, loss= 1.2174 (max= 1.5976), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:01,279 - root - INFO - Step 21090: lr=1.00E-05, loss= 1.2174 (max= 1.5976), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:01,279 - root - INFO - Step 21090: lr=1.00E-05, loss= 1.2174 (max= 1.5976), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:01,279 - root - INFO - Step 21090: lr=1.00E-05, loss= 1.2174 (max= 1.5976), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:17,246 - root - INFO - Step 21100: lr=1.00E-05, loss= 1.1702 (max= 1.5973), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:17,246 - root - INFO - Step 21100: lr=1.00E-05, loss= 1.1702 (max= 1.5973), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 19:54:17,246 - root - INFO - Step 21100: lr=1.00E-05, loss= 1.1702 (max= 1.5973), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:17,246 - root - INFO - Step 21100: lr=1.00E-05, loss= 1.1702 (max= 1.5973), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:17,246 - root - INFO - Step 21100: lr=1.00E-05, loss= 1.1702 (max= 1.5973), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:17,246 - root - INFO - Step 21100: lr=1.00E-05, loss= 1.1702 (max= 1.5973), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:17,246 - root - INFO - Step 21100: lr=1.00E-05, loss= 1.1702 (max= 1.5973), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:17,246 - root - INFO - Step 21100: lr=1.00E-05, loss= 1.1702 (max= 1.5973), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:33,227 - root - INFO - Step 21110: lr=1.00E-05, loss= 1.2048 (max= 1.5460), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:33,227 - root - INFO - Step 21110: lr=1.00E-05, loss= 1.2048 (max= 1.5460), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:33,227 - root - INFO - Step 21110: lr=1.00E-05, loss= 1.2048 (max= 1.5460), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:33,227 - root - INFO - Step 21110: lr=1.00E-05, loss= 1.2048 (max= 1.5460), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:33,227 - root - INFO - Step 21110: lr=1.00E-05, loss= 1.2048 (max= 1.5460), tps=20509, mfu=42.73%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:33,227 - root - INFO - Step 21110: lr=1.00E-05, loss= 1.2048 (max= 1.5460), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:33,227 - root - INFO - Step 21110: lr=1.00E-05, loss= 1.2048 (max= 1.5460), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:33,227 - root - INFO - Step 21110: lr=1.00E-05, loss= 1.2048 (max= 1.5460), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:49,155 - root - INFO - Step 21120: lr=1.00E-05, loss= 1.1878 (max= 1.5947), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:49,155 - root - INFO - Step 21120: lr=1.00E-05, loss= 1.1878 (max= 1.5947), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:49,155 - root - INFO - Step 21120: lr=1.00E-05, loss= 1.1878 (max= 1.5947), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:49,155 - root - INFO - Step 21120: lr=1.00E-05, loss= 1.1878 (max= 1.5947), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:49,155 - root - INFO - Step 21120: lr=1.00E-05, loss= 1.1878 (max= 1.5947), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:49,156 - root - INFO - Step 21120: lr=1.00E-05, loss= 1.1878 (max= 1.5947), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:49,156 - root - INFO - Step 21120: lr=1.00E-05, loss= 1.1878 (max= 1.5947), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:54:49,156 - root - INFO - Step 21120: lr=1.00E-05, 
loss= 1.1878 (max= 1.5947), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:05,110 - root - INFO - Step 21130: lr=1.00E-05, loss= 1.2025 (max= 1.6091), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:05,110 - root - INFO - Step 21130: lr=1.00E-05, loss= 1.2025 (max= 1.6091), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:05,110 - root - INFO - Step 21130: lr=1.00E-05, loss= 1.2025 (max= 1.6091), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:05,110 - root - INFO - Step 21130: lr=1.00E-05, loss= 1.2025 (max= 1.6091), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:05,110 - root - INFO - Step 21130: lr=1.00E-05, loss= 1.2025 (max= 1.6091), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:05,110 - root - INFO - Step 21130: lr=1.00E-05, loss= 1.2025 (max= 1.6091), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:05,110 - root - INFO - Step 21130: lr=1.00E-05, loss= 1.2025 (max= 1.6091), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:05,110 - root - INFO - Step 21130: lr=1.00E-05, loss= 1.2025 (max= 1.6091), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:21,038 - root - INFO - Step 21140: lr=1.00E-05, loss= 1.1735 (max= 1.5514), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:55:21,038 - root - INFO - Step 21140: lr=1.00E-05, loss= 1.1735 (max= 1.5514), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
19:55:21,038 - root - INFO - Step 21140: lr=1.00E-05, loss= 1.1735 (max= 1.5514), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:55:21,038 - root - INFO - Step 21140: lr=1.00E-05, loss= 1.1735 (max= 1.5514), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:55:21,038 - root - INFO - Step 21140: lr=1.00E-05, loss= 1.1735 (max= 1.5514), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:55:21,038 - root - INFO - Step 21140: lr=1.00E-05, loss= 1.1735 (max= 1.5514), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:55:21,038 - root - INFO - Step 21140: lr=1.00E-05, loss= 1.1735 (max= 1.5514), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:55:21,038 - root - INFO - Step 21140: lr=1.00E-05, loss= 1.1735 (max= 1.5514), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 19:55:36,982 - root - INFO - Step 21150: lr=1.00E-05, loss= 1.1566 (max= 1.5263), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:36,982 - root - INFO - Step 21150: lr=1.00E-05, loss= 1.1566 (max= 1.5263), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:36,982 - root - INFO - Step 21150: lr=1.00E-05, loss= 1.1566 (max= 1.5263), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:36,982 - root - INFO - Step 21150: lr=1.00E-05, loss= 1.1566 (max= 1.5263), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:36,983 - root - INFO - Step 21150: lr=1.00E-05, loss= 1.1566 (max= 1.5263), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:36,983 - root - INFO - Step 21150: lr=1.00E-05, loss= 1.1566 (max= 1.5263), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:36,983 - root - INFO - Step 21150: lr=1.00E-05, loss= 1.1566 (max= 1.5263), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:36,983 - root - INFO - Step 21150: lr=1.00E-05, loss= 1.1566 (max= 1.5263), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:52,916 - root - INFO - Step 21160: lr=1.00E-05, loss= 1.1963 (max= 1.6271), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:52,916 - root - INFO - Step 21160: lr=1.00E-05, loss= 1.1963 (max= 1.6271), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:52,916 - root - INFO - Step 21160: lr=1.00E-05, loss= 1.1963 (max= 1.6271), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:52,916 - root - INFO - Step 21160: lr=1.00E-05, loss= 1.1963 (max= 1.6271), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:52,916 - root - INFO - Step 21160: lr=1.00E-05, loss= 1.1963 (max= 1.6271), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:52,916 - root - INFO - Step 21160: lr=1.00E-05, loss= 1.1963 (max= 1.6271), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:52,916 - root - INFO - Step 21160: lr=1.00E-05, loss= 1.1963 (max= 1.6271), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:55:52,917 - root - INFO - Step 21160: lr=1.00E-05, loss= 1.1963 (max= 1.6271), 
tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:08,852 - root - INFO - Step 21170: lr=1.00E-05, loss= 1.1789 (max= 1.4959), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:08,852 - root - INFO - Step 21170: lr=1.00E-05, loss= 1.1789 (max= 1.4959), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:08,852 - root - INFO - Step 21170: lr=1.00E-05, loss= 1.1789 (max= 1.4959), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:08,852 - root - INFO - Step 21170: lr=1.00E-05, loss= 1.1789 (max= 1.4959), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:08,852 - root - INFO - Step 21170: lr=1.00E-05, loss= 1.1789 (max= 1.4959), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:08,852 - root - INFO - Step 21170: lr=1.00E-05, loss= 1.1789 (max= 1.4959), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:08,852 - root - INFO - Step 21170: lr=1.00E-05, loss= 1.1789 (max= 1.4959), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:08,853 - root - INFO - Step 21170: lr=1.00E-05, loss= 1.1789 (max= 1.4959), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:24,786 - root - INFO - Step 21180: lr=1.00E-05, loss= 1.2037 (max= 1.6310), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:24,786 - root - INFO - Step 21180: lr=1.00E-05, loss= 1.2037 (max= 1.6310), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:24,786 - root - INFO - Step 
21180: lr=1.00E-05, loss= 1.2037 (max= 1.6310), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:24,786 - root - INFO - Step 21180: lr=1.00E-05, loss= 1.2037 (max= 1.6310), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:24,786 - root - INFO - Step 21180: lr=1.00E-05, loss= 1.2037 (max= 1.6310), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:24,786 - root - INFO - Step 21180: lr=1.00E-05, loss= 1.2037 (max= 1.6310), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:24,786 - root - INFO - Step 21180: lr=1.00E-05, loss= 1.2037 (max= 1.6310), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:24,786 - root - INFO - Step 21180: lr=1.00E-05, loss= 1.2037 (max= 1.6310), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:40,703 - root - INFO - Step 21190: lr=1.00E-05, loss= 1.2133 (max= 1.6452), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:40,703 - root - INFO - Step 21190: lr=1.00E-05, loss= 1.2133 (max= 1.6452), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:40,703 - root - INFO - Step 21190: lr=1.00E-05, loss= 1.2133 (max= 1.6452), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:40,703 - root - INFO - Step 21190: lr=1.00E-05, loss= 1.2133 (max= 1.6452), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:40,703 - root - INFO - Step 21190: lr=1.00E-05, loss= 1.2133 (max= 1.6452), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 19:56:40,703 - root - INFO - Step 21190: lr=1.00E-05, loss= 1.2133 (max= 1.6452), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:40,703 - root - INFO - Step 21190: lr=1.00E-05, loss= 1.2133 (max= 1.6452), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:40,703 - root - INFO - Step 21190: lr=1.00E-05, loss= 1.2133 (max= 1.6452), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:56,630 - root - INFO - Step 21200: lr=1.00E-05, loss= 1.1846 (max= 1.6376), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:56,630 - root - INFO - Step 21200: lr=1.00E-05, loss= 1.1846 (max= 1.6376), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:56,630 - root - INFO - Step 21200: lr=1.00E-05, loss= 1.1846 (max= 1.6376), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:56,630 - root - INFO - Step 21200: lr=1.00E-05, loss= 1.1846 (max= 1.6376), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:56,630 - root - INFO - Step 21200: lr=1.00E-05, loss= 1.1846 (max= 1.6376), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:56,630 - root - INFO - Step 21200: lr=1.00E-05, loss= 1.1846 (max= 1.6376), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:56,630 - root - INFO - Step 21200: lr=1.00E-05, loss= 1.1846 (max= 1.6376), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:56:56,630 - root - INFO - Step 21200: lr=1.00E-05, loss= 1.1846 (max= 1.6376), tps=20578, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:12,588 - root - INFO - Step 21210: lr=1.00E-05, loss= 1.2035 (max= 1.7472), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:12,588 - root - INFO - Step 21210: lr=1.00E-05, loss= 1.2035 (max= 1.7472), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:12,588 - root - INFO - Step 21210: lr=1.00E-05, loss= 1.2035 (max= 1.7472), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:12,588 - root - INFO - Step 21210: lr=1.00E-05, loss= 1.2035 (max= 1.7472), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:12,588 - root - INFO - Step 21210: lr=1.00E-05, loss= 1.2035 (max= 1.7472), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:12,588 - root - INFO - Step 21210: lr=1.00E-05, loss= 1.2035 (max= 1.7472), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:12,588 - root - INFO - Step 21210: lr=1.00E-05, loss= 1.2035 (max= 1.7472), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:12,588 - root - INFO - Step 21210: lr=1.00E-05, loss= 1.2035 (max= 1.7472), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:28,590 - root - INFO - Step 21220: lr=1.00E-05, loss= 1.2014 (max= 1.6812), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:28,590 - root - INFO - Step 21220: lr=1.00E-05, loss= 1.2014 (max= 1.6812), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:28,590 - root - INFO - Step 21220: lr=1.00E-05, loss= 1.2014 
(max= 1.6812), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:28,591 - root - INFO - Step 21220: lr=1.00E-05, loss= 1.2014 (max= 1.6812), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:28,591 - root - INFO - Step 21220: lr=1.00E-05, loss= 1.2014 (max= 1.6812), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:28,591 - root - INFO - Step 21220: lr=1.00E-05, loss= 1.2014 (max= 1.6812), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:28,591 - root - INFO - Step 21220: lr=1.00E-05, loss= 1.2014 (max= 1.6812), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:28,591 - root - INFO - Step 21220: lr=1.00E-05, loss= 1.2014 (max= 1.6812), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:44,534 - root - INFO - Step 21230: lr=1.00E-05, loss= 1.2130 (max= 1.6847), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:44,534 - root - INFO - Step 21230: lr=1.00E-05, loss= 1.2130 (max= 1.6847), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:44,534 - root - INFO - Step 21230: lr=1.00E-05, loss= 1.2130 (max= 1.6847), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:44,534 - root - INFO - Step 21230: lr=1.00E-05, loss= 1.2130 (max= 1.6847), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:44,534 - root - INFO - Step 21230: lr=1.00E-05, loss= 1.2130 (max= 1.6847), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:44,534 - root 
- INFO - Step 21230: lr=1.00E-05, loss= 1.2130 (max= 1.6847), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:44,534 - root - INFO - Step 21230: lr=1.00E-05, loss= 1.2130 (max= 1.6847), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:57:44,534 - root - INFO - Step 21230: lr=1.00E-05, loss= 1.2130 (max= 1.6847), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:00,496 - root - INFO - Step 21240: lr=1.00E-05, loss= 1.2069 (max= 1.5231), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:00,496 - root - INFO - Step 21240: lr=1.00E-05, loss= 1.2069 (max= 1.5231), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:00,496 - root - INFO - Step 21240: lr=1.00E-05, loss= 1.2069 (max= 1.5231), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:00,496 - root - INFO - Step 21240: lr=1.00E-05, loss= 1.2069 (max= 1.5231), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:00,496 - root - INFO - Step 21240: lr=1.00E-05, loss= 1.2069 (max= 1.5231), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:00,496 - root - INFO - Step 21240: lr=1.00E-05, loss= 1.2069 (max= 1.5231), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:00,496 - root - INFO - Step 21240: lr=1.00E-05, loss= 1.2069 (max= 1.5231), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:00,496 - root - INFO - Step 21240: lr=1.00E-05, loss= 1.2069 (max= 1.5231), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 19:58:16,497 - root - INFO - Step 21250: lr=1.00E-05, loss= 1.1968 (max= 1.6086), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:16,497 - root - INFO - Step 21250: lr=1.00E-05, loss= 1.1968 (max= 1.6086), tps=20483, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:16,497 - root - INFO - Step 21250: lr=1.00E-05, loss= 1.1968 (max= 1.6086), tps=20483, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:16,497 - root - INFO - Step 21250: lr=1.00E-05, loss= 1.1968 (max= 1.6086), tps=20483, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:16,497 - root - INFO - Step 21250: lr=1.00E-05, loss= 1.1968 (max= 1.6086), tps=20483, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:16,497 - root - INFO - Step 21250: lr=1.00E-05, loss= 1.1968 (max= 1.6086), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:16,497 - root - INFO - Step 21250: lr=1.00E-05, loss= 1.1968 (max= 1.6086), tps=20483, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:16,497 - root - INFO - Step 21250: lr=1.00E-05, loss= 1.1968 (max= 1.6086), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:32,465 - root - INFO - Step 21260: lr=1.00E-05, loss= 1.2050 (max= 1.6080), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:32,465 - root - INFO - Step 21260: lr=1.00E-05, loss= 1.2050 (max= 1.6080), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:32,465 - root - INFO - Step 21260: lr=1.00E-05, loss= 1.2050 (max= 1.6080), tps=20524, mfu=42.76%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:32,466 - root - INFO - Step 21260: lr=1.00E-05, loss= 1.2050 (max= 1.6080), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:32,466 - root - INFO - Step 21260: lr=1.00E-05, loss= 1.2050 (max= 1.6080), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:32,466 - root - INFO - Step 21260: lr=1.00E-05, loss= 1.2050 (max= 1.6080), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:32,466 - root - INFO - Step 21260: lr=1.00E-05, loss= 1.2050 (max= 1.6080), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:32,466 - root - INFO - Step 21260: lr=1.00E-05, loss= 1.2050 (max= 1.6080), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:48,425 - root - INFO - Step 21270: lr=1.00E-05, loss= 1.2188 (max= 1.5956), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:48,425 - root - INFO - Step 21270: lr=1.00E-05, loss= 1.2188 (max= 1.5956), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:48,425 - root - INFO - Step 21270: lr=1.00E-05, loss= 1.2188 (max= 1.5956), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:48,426 - root - INFO - Step 21270: lr=1.00E-05, loss= 1.2188 (max= 1.5956), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:48,426 - root - INFO - Step 21270: lr=1.00E-05, loss= 1.2188 (max= 1.5956), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:48,426 - root - INFO - Step 21270: lr=1.00E-05, 
loss= 1.2188 (max= 1.5956), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:48,426 - root - INFO - Step 21270: lr=1.00E-05, loss= 1.2188 (max= 1.5956), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:58:48,426 - root - INFO - Step 21270: lr=1.00E-05, loss= 1.2188 (max= 1.5956), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:04,369 - root - INFO - Step 21280: lr=1.00E-05, loss= 1.1724 (max= 1.5756), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:04,369 - root - INFO - Step 21280: lr=1.00E-05, loss= 1.1724 (max= 1.5756), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:04,369 - root - INFO - Step 21280: lr=1.00E-05, loss= 1.1724 (max= 1.5756), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:04,369 - root - INFO - Step 21280: lr=1.00E-05, loss= 1.1724 (max= 1.5756), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:04,369 - root - INFO - Step 21280: lr=1.00E-05, loss= 1.1724 (max= 1.5756), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:04,369 - root - INFO - Step 21280: lr=1.00E-05, loss= 1.1724 (max= 1.5756), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:04,369 - root - INFO - Step 21280: lr=1.00E-05, loss= 1.1724 (max= 1.5756), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:04,369 - root - INFO - Step 21280: lr=1.00E-05, loss= 1.1724 (max= 1.5756), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
19:59:20,335 - root - INFO - Step 21290: lr=1.00E-05, loss= 1.1836 (max= 1.6806), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:20,335 - root - INFO - Step 21290: lr=1.00E-05, loss= 1.1836 (max= 1.6806), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:20,335 - root - INFO - Step 21290: lr=1.00E-05, loss= 1.1836 (max= 1.6806), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:20,335 - root - INFO - Step 21290: lr=1.00E-05, loss= 1.1836 (max= 1.6806), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:20,335 - root - INFO - Step 21290: lr=1.00E-05, loss= 1.1836 (max= 1.6806), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:20,335 - root - INFO - Step 21290: lr=1.00E-05, loss= 1.1836 (max= 1.6806), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:20,335 - root - INFO - Step 21290: lr=1.00E-05, loss= 1.1836 (max= 1.6806), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:20,335 - root - INFO - Step 21290: lr=1.00E-05, loss= 1.1836 (max= 1.6806), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:36,242 - root - INFO - Step 21300: lr=1.00E-05, loss= 1.1901 (max= 1.5820), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:36,242 - root - INFO - Step 21300: lr=1.00E-05, loss= 1.1901 (max= 1.5820), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:36,242 - root - INFO - Step 21300: lr=1.00E-05, loss= 1.1901 (max= 1.5820), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:36,242 - root - INFO - Step 21300: lr=1.00E-05, loss= 1.1901 (max= 1.5820), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:36,242 - root - INFO - Step 21300: lr=1.00E-05, loss= 1.1901 (max= 1.5820), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:36,242 - root - INFO - Step 21300: lr=1.00E-05, loss= 1.1901 (max= 1.5820), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:36,242 - root - INFO - Step 21300: lr=1.00E-05, loss= 1.1901 (max= 1.5820), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:36,243 - root - INFO - Step 21300: lr=1.00E-05, loss= 1.1901 (max= 1.5820), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:52,132 - root - INFO - Step 21310: lr=1.00E-05, loss= 1.1982 (max= 1.5079), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:52,132 - root - INFO - Step 21310: lr=1.00E-05, loss= 1.1982 (max= 1.5079), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:52,132 - root - INFO - Step 21310: lr=1.00E-05, loss= 1.1982 (max= 1.5079), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:52,133 - root - INFO - Step 21310: lr=1.00E-05, loss= 1.1982 (max= 1.5079), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:52,133 - root - INFO - Step 21310: lr=1.00E-05, loss= 1.1982 (max= 1.5079), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:52,133 - root - INFO - Step 21310: lr=1.00E-05, loss= 1.1982 (max= 1.5079), 
tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:52,133 - root - INFO - Step 21310: lr=1.00E-05, loss= 1.1982 (max= 1.5079), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 19:59:52,133 - root - INFO - Step 21310: lr=1.00E-05, loss= 1.1982 (max= 1.5079), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:08,067 - root - INFO - Step 21320: lr=1.00E-05, loss= 1.1716 (max= 1.6521), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:00:08,067 - root - INFO - Step 21320: lr=1.00E-05, loss= 1.1716 (max= 1.6521), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:00:08,067 - root - INFO - Step 21320: lr=1.00E-05, loss= 1.1716 (max= 1.6521), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:00:08,067 - root - INFO - Step 21320: lr=1.00E-05, loss= 1.1716 (max= 1.6521), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:00:08,067 - root - INFO - Step 21320: lr=1.00E-05, loss= 1.1716 (max= 1.6521), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:00:08,067 - root - INFO - Step 21320: lr=1.00E-05, loss= 1.1716 (max= 1.6521), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:00:08,068 - root - INFO - Step 21320: lr=1.00E-05, loss= 1.1716 (max= 1.6521), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:00:08,068 - root - INFO - Step 21320: lr=1.00E-05, loss= 1.1716 (max= 1.6521), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:00:23,996 - root - INFO - Step 
21330: lr=1.00E-05, loss= 1.1954 (max= 1.5081), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:23,996 - root - INFO - Step 21330: lr=1.00E-05, loss= 1.1954 (max= 1.5081), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:23,996 - root - INFO - Step 21330: lr=1.00E-05, loss= 1.1954 (max= 1.5081), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:23,996 - root - INFO - Step 21330: lr=1.00E-05, loss= 1.1954 (max= 1.5081), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:23,996 - root - INFO - Step 21330: lr=1.00E-05, loss= 1.1954 (max= 1.5081), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:23,996 - root - INFO - Step 21330: lr=1.00E-05, loss= 1.1954 (max= 1.5081), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:23,996 - root - INFO - Step 21330: lr=1.00E-05, loss= 1.1954 (max= 1.5081), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:23,996 - root - INFO - Step 21330: lr=1.00E-05, loss= 1.1954 (max= 1.5081), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:39,897 - root - INFO - Step 21340: lr=1.00E-05, loss= 1.2124 (max= 1.6350), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:39,897 - root - INFO - Step 21340: lr=1.00E-05, loss= 1.2124 (max= 1.6350), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:39,897 - root - INFO - Step 21340: lr=1.00E-05, loss= 1.2124 (max= 1.6350), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 20:00:39,897 - root - INFO - Step 21340: lr=1.00E-05, loss= 1.2124 (max= 1.6350), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:39,897 - root - INFO - Step 21340: lr=1.00E-05, loss= 1.2124 (max= 1.6350), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:39,897 - root - INFO - Step 21340: lr=1.00E-05, loss= 1.2124 (max= 1.6350), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:39,897 - root - INFO - Step 21340: lr=1.00E-05, loss= 1.2124 (max= 1.6350), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:39,898 - root - INFO - Step 21340: lr=1.00E-05, loss= 1.2124 (max= 1.6350), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:55,831 - root - INFO - Step 21350: lr=1.00E-05, loss= 1.2015 (max= 1.7239), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:55,831 - root - INFO - Step 21350: lr=1.00E-05, loss= 1.2015 (max= 1.7239), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:55,831 - root - INFO - Step 21350: lr=1.00E-05, loss= 1.2015 (max= 1.7239), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:55,831 - root - INFO - Step 21350: lr=1.00E-05, loss= 1.2015 (max= 1.7239), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:55,831 - root - INFO - Step 21350: lr=1.00E-05, loss= 1.2015 (max= 1.7239), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:55,831 - root - INFO - Step 21350: lr=1.00E-05, loss= 1.2015 (max= 1.7239), tps=20570, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:55,831 - root - INFO - Step 21350: lr=1.00E-05, loss= 1.2015 (max= 1.7239), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:00:55,831 - root - INFO - Step 21350: lr=1.00E-05, loss= 1.2015 (max= 1.7239), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:11,789 - root - INFO - Step 21360: lr=1.00E-05, loss= 1.1944 (max= 1.6411), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:11,789 - root - INFO - Step 21360: lr=1.00E-05, loss= 1.1944 (max= 1.6411), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:11,789 - root - INFO - Step 21360: lr=1.00E-05, loss= 1.1944 (max= 1.6411), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:11,789 - root - INFO - Step 21360: lr=1.00E-05, loss= 1.1944 (max= 1.6411), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:11,789 - root - INFO - Step 21360: lr=1.00E-05, loss= 1.1944 (max= 1.6411), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:11,789 - root - INFO - Step 21360: lr=1.00E-05, loss= 1.1944 (max= 1.6411), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:11,789 - root - INFO - Step 21360: lr=1.00E-05, loss= 1.1944 (max= 1.6411), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:11,789 - root - INFO - Step 21360: lr=1.00E-05, loss= 1.1944 (max= 1.6411), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:27,734 - root - INFO - Step 21370: lr=1.00E-05, loss= 1.1900 
(max= 1.7691), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:27,734 - root - INFO - Step 21370: lr=1.00E-05, loss= 1.1900 (max= 1.7691), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:27,734 - root - INFO - Step 21370: lr=1.00E-05, loss= 1.1900 (max= 1.7691), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:27,735 - root - INFO - Step 21370: lr=1.00E-05, loss= 1.1900 (max= 1.7691), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:27,735 - root - INFO - Step 21370: lr=1.00E-05, loss= 1.1900 (max= 1.7691), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:27,735 - root - INFO - Step 21370: lr=1.00E-05, loss= 1.1900 (max= 1.7691), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:27,735 - root - INFO - Step 21370: lr=1.00E-05, loss= 1.1900 (max= 1.7691), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:27,735 - root - INFO - Step 21370: lr=1.00E-05, loss= 1.1900 (max= 1.7691), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:43,676 - root - INFO - Step 21380: lr=1.00E-05, loss= 1.2005 (max= 1.6864), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:01:43,676 - root - INFO - Step 21380: lr=1.00E-05, loss= 1.2005 (max= 1.6864), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:01:43,676 - root - INFO - Step 21380: lr=1.00E-05, loss= 1.2005 (max= 1.6864), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:01:43,677 - root 
- INFO - Step 21380: lr=1.00E-05, loss= 1.2005 (max= 1.6864), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:01:43,677 - root - INFO - Step 21380: lr=1.00E-05, loss= 1.2005 (max= 1.6864), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:01:43,677 - root - INFO - Step 21380: lr=1.00E-05, loss= 1.2005 (max= 1.6864), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:01:43,677 - root - INFO - Step 21380: lr=1.00E-05, loss= 1.2005 (max= 1.6864), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:01:43,677 - root - INFO - Step 21380: lr=1.00E-05, loss= 1.2005 (max= 1.6864), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:01:59,592 - root - INFO - Step 21390: lr=1.00E-05, loss= 1.2000 (max= 1.6330), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:59,592 - root - INFO - Step 21390: lr=1.00E-05, loss= 1.2000 (max= 1.6330), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:59,592 - root - INFO - Step 21390: lr=1.00E-05, loss= 1.2000 (max= 1.6330), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:59,592 - root - INFO - Step 21390: lr=1.00E-05, loss= 1.2000 (max= 1.6330), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:59,592 - root - INFO - Step 21390: lr=1.00E-05, loss= 1.2000 (max= 1.6330), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:59,592 - root - INFO - Step 21390: lr=1.00E-05, loss= 1.2000 (max= 1.6330), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 20:01:59,592 - root - INFO - Step 21390: lr=1.00E-05, loss= 1.2000 (max= 1.6330), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:01:59,592 - root - INFO - Step 21390: lr=1.00E-05, loss= 1.2000 (max= 1.6330), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:02:15,510 - root - INFO - Step 21400: lr=1.00E-05, loss= 1.2034 (max= 1.6030), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:15,510 - root - INFO - Step 21400: lr=1.00E-05, loss= 1.2034 (max= 1.6030), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:15,510 - root - INFO - Step 21400: lr=1.00E-05, loss= 1.2034 (max= 1.6030), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:15,510 - root - INFO - Step 21400: lr=1.00E-05, loss= 1.2034 (max= 1.6030), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:15,511 - root - INFO - Step 21400: lr=1.00E-05, loss= 1.2034 (max= 1.6030), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:15,511 - root - INFO - Step 21400: lr=1.00E-05, loss= 1.2034 (max= 1.6030), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:15,511 - root - INFO - Step 21400: lr=1.00E-05, loss= 1.2034 (max= 1.6030), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:15,511 - root - INFO - Step 21400: lr=1.00E-05, loss= 1.2034 (max= 1.6030), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:31,468 - root - INFO - Step 21410: lr=1.00E-05, loss= 1.2074 (max= 1.5114), tps=20539, mfu=42.79%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:31,468 - root - INFO - Step 21410: lr=1.00E-05, loss= 1.2074 (max= 1.5114), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:31,468 - root - INFO - Step 21410: lr=1.00E-05, loss= 1.2074 (max= 1.5114), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:31,468 - root - INFO - Step 21410: lr=1.00E-05, loss= 1.2074 (max= 1.5114), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:31,468 - root - INFO - Step 21410: lr=1.00E-05, loss= 1.2074 (max= 1.5114), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:31,468 - root - INFO - Step 21410: lr=1.00E-05, loss= 1.2074 (max= 1.5114), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:31,468 - root - INFO - Step 21410: lr=1.00E-05, loss= 1.2074 (max= 1.5114), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:31,468 - root - INFO - Step 21410: lr=1.00E-05, loss= 1.2074 (max= 1.5114), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:02:47,396 - root - INFO - Step 21420: lr=1.00E-05, loss= 1.1908 (max= 1.5178), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:02:47,396 - root - INFO - Step 21420: lr=1.00E-05, loss= 1.1908 (max= 1.5178), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:02:47,396 - root - INFO - Step 21420: lr=1.00E-05, loss= 1.1908 (max= 1.5178), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:02:47,396 - root - INFO - Step 21420: lr=1.00E-05, 
loss= 1.1908 (max= 1.5178), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:02:47,396 - root - INFO - Step 21420: lr=1.00E-05, loss= 1.1908 (max= 1.5178), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:02:47,396 - root - INFO - Step 21420: lr=1.00E-05, loss= 1.1908 (max= 1.5178), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:02:47,396 - root - INFO - Step 21420: lr=1.00E-05, loss= 1.1908 (max= 1.5178), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:02:47,396 - root - INFO - Step 21420: lr=1.00E-05, loss= 1.1908 (max= 1.5178), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:03,355 - root - INFO - Step 21430: lr=1.00E-05, loss= 1.2150 (max= 1.5962), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:03,355 - root - INFO - Step 21430: lr=1.00E-05, loss= 1.2150 (max= 1.5962), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:03,355 - root - INFO - Step 21430: lr=1.00E-05, loss= 1.2150 (max= 1.5962), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:03,355 - root - INFO - Step 21430: lr=1.00E-05, loss= 1.2150 (max= 1.5962), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:03,355 - root - INFO - Step 21430: lr=1.00E-05, loss= 1.2150 (max= 1.5962), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:03,355 - root - INFO - Step 21430: lr=1.00E-05, loss= 1.2150 (max= 1.5962), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
20:03:03,355 - root - INFO - Step 21430: lr=1.00E-05, loss= 1.2150 (max= 1.5962), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:03,355 - root - INFO - Step 21430: lr=1.00E-05, loss= 1.2150 (max= 1.5962), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:05,719 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6432973 +2025-10-24 20:03:19,284 - root - INFO - Step 21440: lr=1.00E-05, loss= 1.2137 (max= 1.5200), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:19,284 - root - INFO - Step 21440: lr=1.00E-05, loss= 1.2137 (max= 1.5200), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:19,284 - root - INFO - Step 21440: lr=1.00E-05, loss= 1.2137 (max= 1.5200), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:19,284 - root - INFO - Step 21440: lr=1.00E-05, loss= 1.2137 (max= 1.5200), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:19,284 - root - INFO - Step 21440: lr=1.00E-05, loss= 1.2137 (max= 1.5200), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:19,284 - root - INFO - Step 21440: lr=1.00E-05, loss= 1.2137 (max= 1.5200), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:19,284 - root - INFO - Step 21440: lr=1.00E-05, loss= 1.2137 (max= 1.5200), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:19,284 - root - INFO - Step 21440: lr=1.00E-05, loss= 1.2137 (max= 1.5200), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:35,240 - root - INFO - Step 21450: lr=1.00E-05, loss= 1.1943 (max= 1.5362), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:35,240 - root - INFO - Step 21450: lr=1.00E-05, loss= 1.1943 (max= 1.5362), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:35,240 - root - INFO - Step 21450: lr=1.00E-05, loss= 1.1943 (max= 1.5362), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:35,240 - root - INFO - Step 21450: lr=1.00E-05, loss= 1.1943 (max= 1.5362), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:35,240 - root - INFO - Step 21450: lr=1.00E-05, loss= 1.1943 (max= 1.5362), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:35,240 - root - INFO - Step 21450: lr=1.00E-05, loss= 1.1943 (max= 1.5362), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:35,240 - root - INFO - Step 21450: lr=1.00E-05, loss= 1.1943 (max= 1.5362), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:35,240 - root - INFO - Step 21450: lr=1.00E-05, loss= 1.1943 (max= 1.5362), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:51,152 - root - INFO - Step 21460: lr=1.00E-05, loss= 1.2100 (max= 1.5466), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:51,152 - root - INFO - Step 21460: lr=1.00E-05, loss= 1.2100 (max= 1.5466), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:51,152 - root - INFO - Step 21460: lr=1.00E-05, loss= 1.2100 (max= 1.5466), 
tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:51,152 - root - INFO - Step 21460: lr=1.00E-05, loss= 1.2100 (max= 1.5466), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:51,152 - root - INFO - Step 21460: lr=1.00E-05, loss= 1.2100 (max= 1.5466), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:51,152 - root - INFO - Step 21460: lr=1.00E-05, loss= 1.2100 (max= 1.5466), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:51,152 - root - INFO - Step 21460: lr=1.00E-05, loss= 1.2100 (max= 1.5466), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:03:51,152 - root - INFO - Step 21460: lr=1.00E-05, loss= 1.2100 (max= 1.5466), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:04:07,081 - root - INFO - Step 21470: lr=1.00E-05, loss= 1.2008 (max= 1.5380), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:07,081 - root - INFO - Step 21470: lr=1.00E-05, loss= 1.2008 (max= 1.5380), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:07,081 - root - INFO - Step 21470: lr=1.00E-05, loss= 1.2008 (max= 1.5380), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:07,081 - root - INFO - Step 21470: lr=1.00E-05, loss= 1.2008 (max= 1.5380), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:07,081 - root - INFO - Step 21470: lr=1.00E-05, loss= 1.2008 (max= 1.5380), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:07,081 - root - INFO - Step 
21470: lr=1.00E-05, loss= 1.2008 (max= 1.5380), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:07,081 - root - INFO - Step 21470: lr=1.00E-05, loss= 1.2008 (max= 1.5380), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:07,081 - root - INFO - Step 21470: lr=1.00E-05, loss= 1.2008 (max= 1.5380), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:23,095 - root - INFO - Step 21480: lr=1.00E-05, loss= 1.2039 (max= 1.6959), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:04:23,095 - root - INFO - Step 21480: lr=1.00E-05, loss= 1.2039 (max= 1.6959), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:04:23,095 - root - INFO - Step 21480: lr=1.00E-05, loss= 1.2039 (max= 1.6959), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:04:23,095 - root - INFO - Step 21480: lr=1.00E-05, loss= 1.2039 (max= 1.6959), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:04:23,095 - root - INFO - Step 21480: lr=1.00E-05, loss= 1.2039 (max= 1.6959), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:04:23,095 - root - INFO - Step 21480: lr=1.00E-05, loss= 1.2039 (max= 1.6959), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:04:23,095 - root - INFO - Step 21480: lr=1.00E-05, loss= 1.2039 (max= 1.6959), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:04:23,095 - root - INFO - Step 21480: lr=1.00E-05, loss= 1.2039 (max= 1.6959), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 20:04:39,048 - root - INFO - Step 21490: lr=1.00E-05, loss= 1.1760 (max= 1.4798), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:39,048 - root - INFO - Step 21490: lr=1.00E-05, loss= 1.1760 (max= 1.4798), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:39,048 - root - INFO - Step 21490: lr=1.00E-05, loss= 1.1760 (max= 1.4798), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:39,048 - root - INFO - Step 21490: lr=1.00E-05, loss= 1.1760 (max= 1.4798), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:39,048 - root - INFO - Step 21490: lr=1.00E-05, loss= 1.1760 (max= 1.4798), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:39,048 - root - INFO - Step 21490: lr=1.00E-05, loss= 1.1760 (max= 1.4798), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:39,048 - root - INFO - Step 21490: lr=1.00E-05, loss= 1.1760 (max= 1.4798), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:39,048 - root - INFO - Step 21490: lr=1.00E-05, loss= 1.1760 (max= 1.4798), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:54,998 - root - INFO - Step 21500: lr=1.00E-05, loss= 1.1816 (max= 1.5302), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:54,998 - root - INFO - Step 21500: lr=1.00E-05, loss= 1.1816 (max= 1.5302), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:54,998 - root - INFO - Step 21500: lr=1.00E-05, loss= 1.1816 (max= 1.5302), tps=20550, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:54,998 - root - INFO - Step 21500: lr=1.00E-05, loss= 1.1816 (max= 1.5302), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:54,998 - root - INFO - Step 21500: lr=1.00E-05, loss= 1.1816 (max= 1.5302), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:54,998 - root - INFO - Step 21500: lr=1.00E-05, loss= 1.1816 (max= 1.5302), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:54,998 - root - INFO - Step 21500: lr=1.00E-05, loss= 1.1816 (max= 1.5302), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:04:54,998 - root - INFO - Step 21500: lr=1.00E-05, loss= 1.1816 (max= 1.5302), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:05:10,958 - root - INFO - Step 21510: lr=1.00E-05, loss= 1.2003 (max= 1.6344), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:10,958 - root - INFO - Step 21510: lr=1.00E-05, loss= 1.2003 (max= 1.6344), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:10,959 - root - INFO - Step 21510: lr=1.00E-05, loss= 1.2003 (max= 1.6344), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:10,959 - root - INFO - Step 21510: lr=1.00E-05, loss= 1.2003 (max= 1.6344), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:10,959 - root - INFO - Step 21510: lr=1.00E-05, loss= 1.2003 (max= 1.6344), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:10,959 - root - INFO - Step 21510: lr=1.00E-05, loss= 1.2003 
(max= 1.6344), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:10,959 - root - INFO - Step 21510: lr=1.00E-05, loss= 1.2003 (max= 1.6344), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:10,959 - root - INFO - Step 21510: lr=1.00E-05, loss= 1.2003 (max= 1.6344), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:26,868 - root - INFO - Step 21520: lr=1.00E-05, loss= 1.1824 (max= 1.6509), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:26,868 - root - INFO - Step 21520: lr=1.00E-05, loss= 1.1824 (max= 1.6509), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:26,868 - root - INFO - Step 21520: lr=1.00E-05, loss= 1.1824 (max= 1.6509), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:26,868 - root - INFO - Step 21520: lr=1.00E-05, loss= 1.1824 (max= 1.6509), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:26,868 - root - INFO - Step 21520: lr=1.00E-05, loss= 1.1824 (max= 1.6509), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:26,868 - root - INFO - Step 21520: lr=1.00E-05, loss= 1.1824 (max= 1.6509), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:26,869 - root - INFO - Step 21520: lr=1.00E-05, loss= 1.1824 (max= 1.6509), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:26,869 - root - INFO - Step 21520: lr=1.00E-05, loss= 1.1824 (max= 1.6509), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:42,827 - root 
- INFO - Step 21530: lr=1.00E-05, loss= 1.1982 (max= 1.6141), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:42,827 - root - INFO - Step 21530: lr=1.00E-05, loss= 1.1982 (max= 1.6141), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:42,828 - root - INFO - Step 21530: lr=1.00E-05, loss= 1.1982 (max= 1.6141), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:42,828 - root - INFO - Step 21530: lr=1.00E-05, loss= 1.1982 (max= 1.6141), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:42,828 - root - INFO - Step 21530: lr=1.00E-05, loss= 1.1982 (max= 1.6141), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:42,828 - root - INFO - Step 21530: lr=1.00E-05, loss= 1.1982 (max= 1.6141), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:42,828 - root - INFO - Step 21530: lr=1.00E-05, loss= 1.1982 (max= 1.6141), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:42,828 - root - INFO - Step 21530: lr=1.00E-05, loss= 1.1982 (max= 1.6141), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:58,783 - root - INFO - Step 21540: lr=1.00E-05, loss= 1.1743 (max= 1.7040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:58,783 - root - INFO - Step 21540: lr=1.00E-05, loss= 1.1743 (max= 1.7040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:58,783 - root - INFO - Step 21540: lr=1.00E-05, loss= 1.1743 (max= 1.7040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 20:05:58,783 - root - INFO - Step 21540: lr=1.00E-05, loss= 1.1743 (max= 1.7040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:58,783 - root - INFO - Step 21540: lr=1.00E-05, loss= 1.1743 (max= 1.7040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:58,783 - root - INFO - Step 21540: lr=1.00E-05, loss= 1.1743 (max= 1.7040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:58,783 - root - INFO - Step 21540: lr=1.00E-05, loss= 1.1743 (max= 1.7040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:05:58,783 - root - INFO - Step 21540: lr=1.00E-05, loss= 1.1743 (max= 1.7040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:14,750 - root - INFO - Step 21550: lr=1.00E-05, loss= 1.1696 (max= 1.8313), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:14,750 - root - INFO - Step 21550: lr=1.00E-05, loss= 1.1696 (max= 1.8313), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:14,750 - root - INFO - Step 21550: lr=1.00E-05, loss= 1.1696 (max= 1.8313), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:14,750 - root - INFO - Step 21550: lr=1.00E-05, loss= 1.1696 (max= 1.8313), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:14,750 - root - INFO - Step 21550: lr=1.00E-05, loss= 1.1696 (max= 1.8313), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:14,750 - root - INFO - Step 21550: lr=1.00E-05, loss= 1.1696 (max= 1.8313), tps=20526, mfu=42.77%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:14,750 - root - INFO - Step 21550: lr=1.00E-05, loss= 1.1696 (max= 1.8313), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:14,750 - root - INFO - Step 21550: lr=1.00E-05, loss= 1.1696 (max= 1.8313), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:30,682 - root - INFO - Step 21560: lr=1.00E-05, loss= 1.2122 (max= 1.5519), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:30,682 - root - INFO - Step 21560: lr=1.00E-05, loss= 1.2122 (max= 1.5519), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:30,682 - root - INFO - Step 21560: lr=1.00E-05, loss= 1.2122 (max= 1.5519), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:30,682 - root - INFO - Step 21560: lr=1.00E-05, loss= 1.2122 (max= 1.5519), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:30,682 - root - INFO - Step 21560: lr=1.00E-05, loss= 1.2122 (max= 1.5519), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:30,682 - root - INFO - Step 21560: lr=1.00E-05, loss= 1.2122 (max= 1.5519), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:30,682 - root - INFO - Step 21560: lr=1.00E-05, loss= 1.2122 (max= 1.5519), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:30,682 - root - INFO - Step 21560: lr=1.00E-05, loss= 1.2122 (max= 1.5519), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:46,631 - root - INFO - Step 21570: lr=1.00E-05, 
loss= 1.2036 (max= 1.5981), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:46,631 - root - INFO - Step 21570: lr=1.00E-05, loss= 1.2036 (max= 1.5981), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:46,631 - root - INFO - Step 21570: lr=1.00E-05, loss= 1.2036 (max= 1.5981), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:46,631 - root - INFO - Step 21570: lr=1.00E-05, loss= 1.2036 (max= 1.5981), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:46,631 - root - INFO - Step 21570: lr=1.00E-05, loss= 1.2036 (max= 1.5981), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:46,632 - root - INFO - Step 21570: lr=1.00E-05, loss= 1.2036 (max= 1.5981), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:46,632 - root - INFO - Step 21570: lr=1.00E-05, loss= 1.2036 (max= 1.5981), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:06:46,632 - root - INFO - Step 21570: lr=1.00E-05, loss= 1.2036 (max= 1.5981), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:02,576 - root - INFO - Step 21580: lr=1.00E-05, loss= 1.1818 (max= 1.5925), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:02,576 - root - INFO - Step 21580: lr=1.00E-05, loss= 1.1818 (max= 1.5925), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:02,576 - root - INFO - Step 21580: lr=1.00E-05, loss= 1.1818 (max= 1.5925), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
20:07:02,576 - root - INFO - Step 21580: lr=1.00E-05, loss= 1.1818 (max= 1.5925), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:02,576 - root - INFO - Step 21580: lr=1.00E-05, loss= 1.1818 (max= 1.5925), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:02,576 - root - INFO - Step 21580: lr=1.00E-05, loss= 1.1818 (max= 1.5925), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:02,576 - root - INFO - Step 21580: lr=1.00E-05, loss= 1.1818 (max= 1.5925), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:02,576 - root - INFO - Step 21580: lr=1.00E-05, loss= 1.1818 (max= 1.5925), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:18,503 - root - INFO - Step 21590: lr=1.00E-05, loss= 1.2000 (max= 1.5932), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:18,503 - root - INFO - Step 21590: lr=1.00E-05, loss= 1.2000 (max= 1.5932), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:18,503 - root - INFO - Step 21590: lr=1.00E-05, loss= 1.2000 (max= 1.5932), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:18,503 - root - INFO - Step 21590: lr=1.00E-05, loss= 1.2000 (max= 1.5932), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:18,503 - root - INFO - Step 21590: lr=1.00E-05, loss= 1.2000 (max= 1.5932), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:18,503 - root - INFO - Step 21590: lr=1.00E-05, loss= 1.2000 (max= 1.5932), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:18,503 - root - INFO - Step 21590: lr=1.00E-05, loss= 1.2000 (max= 1.5932), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:18,503 - root - INFO - Step 21590: lr=1.00E-05, loss= 1.2000 (max= 1.5932), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:34,422 - root - INFO - Step 21600: lr=1.00E-05, loss= 1.1602 (max= 1.5310), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:07:34,422 - root - INFO - Step 21600: lr=1.00E-05, loss= 1.1602 (max= 1.5310), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:07:34,422 - root - INFO - Step 21600: lr=1.00E-05, loss= 1.1602 (max= 1.5310), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:07:34,423 - root - INFO - Step 21600: lr=1.00E-05, loss= 1.1602 (max= 1.5310), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:07:34,423 - root - INFO - Step 21600: lr=1.00E-05, loss= 1.1602 (max= 1.5310), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:07:34,423 - root - INFO - Step 21600: lr=1.00E-05, loss= 1.1602 (max= 1.5310), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:07:34,423 - root - INFO - Step 21600: lr=1.00E-05, loss= 1.1602 (max= 1.5310), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:07:34,423 - root - INFO - Step 21600: lr=1.00E-05, loss= 1.1602 (max= 1.5310), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:07:50,380 - root - INFO - Step 21610: lr=1.00E-05, loss= 1.1840 (max= 1.7138), 
tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:50,380 - root - INFO - Step 21610: lr=1.00E-05, loss= 1.1840 (max= 1.7138), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:50,380 - root - INFO - Step 21610: lr=1.00E-05, loss= 1.1840 (max= 1.7138), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:50,380 - root - INFO - Step 21610: lr=1.00E-05, loss= 1.1840 (max= 1.7138), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:50,380 - root - INFO - Step 21610: lr=1.00E-05, loss= 1.1840 (max= 1.7138), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:50,380 - root - INFO - Step 21610: lr=1.00E-05, loss= 1.1840 (max= 1.7138), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:50,380 - root - INFO - Step 21610: lr=1.00E-05, loss= 1.1840 (max= 1.7138), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:07:50,380 - root - INFO - Step 21610: lr=1.00E-05, loss= 1.1840 (max= 1.7138), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:08:06,272 - root - INFO - Step 21620: lr=1.00E-05, loss= 1.1973 (max= 1.5173), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:06,272 - root - INFO - Step 21620: lr=1.00E-05, loss= 1.1973 (max= 1.5173), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:06,272 - root - INFO - Step 21620: lr=1.00E-05, loss= 1.1973 (max= 1.5173), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:06,272 - root - INFO - Step 
21620: lr=1.00E-05, loss= 1.1973 (max= 1.5173), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:06,272 - root - INFO - Step 21620: lr=1.00E-05, loss= 1.1973 (max= 1.5173), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:06,272 - root - INFO - Step 21620: lr=1.00E-05, loss= 1.1973 (max= 1.5173), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:06,272 - root - INFO - Step 21620: lr=1.00E-05, loss= 1.1973 (max= 1.5173), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:06,272 - root - INFO - Step 21620: lr=1.00E-05, loss= 1.1973 (max= 1.5173), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:22,230 - root - INFO - Step 21630: lr=1.00E-05, loss= 1.1994 (max= 1.6060), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:22,230 - root - INFO - Step 21630: lr=1.00E-05, loss= 1.1994 (max= 1.6060), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:22,230 - root - INFO - Step 21630: lr=1.00E-05, loss= 1.1994 (max= 1.6060), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:22,230 - root - INFO - Step 21630: lr=1.00E-05, loss= 1.1994 (max= 1.6060), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:22,230 - root - INFO - Step 21630: lr=1.00E-05, loss= 1.1994 (max= 1.6060), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:22,230 - root - INFO - Step 21630: lr=1.00E-05, loss= 1.1994 (max= 1.6060), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 20:08:22,230 - root - INFO - Step 21630: lr=1.00E-05, loss= 1.1994 (max= 1.6060), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:22,230 - root - INFO - Step 21630: lr=1.00E-05, loss= 1.1994 (max= 1.6060), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:38,167 - root - INFO - Step 21640: lr=1.00E-05, loss= 1.2302 (max= 1.6885), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:38,167 - root - INFO - Step 21640: lr=1.00E-05, loss= 1.2302 (max= 1.6885), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:38,167 - root - INFO - Step 21640: lr=1.00E-05, loss= 1.2302 (max= 1.6885), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:38,167 - root - INFO - Step 21640: lr=1.00E-05, loss= 1.2302 (max= 1.6885), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:38,167 - root - INFO - Step 21640: lr=1.00E-05, loss= 1.2302 (max= 1.6885), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:38,167 - root - INFO - Step 21640: lr=1.00E-05, loss= 1.2302 (max= 1.6885), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:38,167 - root - INFO - Step 21640: lr=1.00E-05, loss= 1.2302 (max= 1.6885), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:38,167 - root - INFO - Step 21640: lr=1.00E-05, loss= 1.2302 (max= 1.6885), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:54,134 - root - INFO - Step 21650: lr=1.00E-05, loss= 1.1815 (max= 1.5026), tps=20526, mfu=42.77%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:54,134 - root - INFO - Step 21650: lr=1.00E-05, loss= 1.1815 (max= 1.5026), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:54,134 - root - INFO - Step 21650: lr=1.00E-05, loss= 1.1815 (max= 1.5026), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:54,134 - root - INFO - Step 21650: lr=1.00E-05, loss= 1.1815 (max= 1.5026), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:54,134 - root - INFO - Step 21650: lr=1.00E-05, loss= 1.1815 (max= 1.5026), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:54,134 - root - INFO - Step 21650: lr=1.00E-05, loss= 1.1815 (max= 1.5026), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:54,134 - root - INFO - Step 21650: lr=1.00E-05, loss= 1.1815 (max= 1.5026), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:08:54,135 - root - INFO - Step 21650: lr=1.00E-05, loss= 1.1815 (max= 1.5026), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:10,005 - root - INFO - Step 21660: lr=1.00E-05, loss= 1.1776 (max= 1.7064), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:10,005 - root - INFO - Step 21660: lr=1.00E-05, loss= 1.1776 (max= 1.7064), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:10,005 - root - INFO - Step 21660: lr=1.00E-05, loss= 1.1776 (max= 1.7064), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:10,005 - root - INFO - Step 21660: lr=1.00E-05, loss= 1.1776 
(max= 1.7064), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:10,006 - root - INFO - Step 21660: lr=1.00E-05, loss= 1.1776 (max= 1.7064), tps=20650, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:10,006 - root - INFO - Step 21660: lr=1.00E-05, loss= 1.1776 (max= 1.7064), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:10,006 - root - INFO - Step 21660: lr=1.00E-05, loss= 1.1776 (max= 1.7064), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:10,006 - root - INFO - Step 21660: lr=1.00E-05, loss= 1.1776 (max= 1.7064), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:25,903 - root - INFO - Step 21670: lr=1.00E-05, loss= 1.2115 (max= 1.5382), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:25,903 - root - INFO - Step 21670: lr=1.00E-05, loss= 1.2115 (max= 1.5382), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:25,903 - root - INFO - Step 21670: lr=1.00E-05, loss= 1.2115 (max= 1.5382), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:25,903 - root - INFO - Step 21670: lr=1.00E-05, loss= 1.2115 (max= 1.5382), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:25,903 - root - INFO - Step 21670: lr=1.00E-05, loss= 1.2115 (max= 1.5382), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:25,903 - root - INFO - Step 21670: lr=1.00E-05, loss= 1.2115 (max= 1.5382), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:25,903 - root 
- INFO - Step 21670: lr=1.00E-05, loss= 1.2115 (max= 1.5382), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:25,903 - root - INFO - Step 21670: lr=1.00E-05, loss= 1.2115 (max= 1.5382), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:09:41,858 - root - INFO - Step 21680: lr=1.00E-05, loss= 1.2127 (max= 1.5348), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:41,858 - root - INFO - Step 21680: lr=1.00E-05, loss= 1.2127 (max= 1.5348), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:41,858 - root - INFO - Step 21680: lr=1.00E-05, loss= 1.2127 (max= 1.5348), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:41,858 - root - INFO - Step 21680: lr=1.00E-05, loss= 1.2127 (max= 1.5348), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:41,858 - root - INFO - Step 21680: lr=1.00E-05, loss= 1.2127 (max= 1.5348), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:41,858 - root - INFO - Step 21680: lr=1.00E-05, loss= 1.2127 (max= 1.5348), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:41,858 - root - INFO - Step 21680: lr=1.00E-05, loss= 1.2127 (max= 1.5348), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:41,858 - root - INFO - Step 21680: lr=1.00E-05, loss= 1.2127 (max= 1.5348), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:57,805 - root - INFO - Step 21690: lr=1.00E-05, loss= 1.1610 (max= 1.5973), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 20:09:57,805 - root - INFO - Step 21690: lr=1.00E-05, loss= 1.1610 (max= 1.5973), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:57,805 - root - INFO - Step 21690: lr=1.00E-05, loss= 1.1610 (max= 1.5973), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:57,805 - root - INFO - Step 21690: lr=1.00E-05, loss= 1.1610 (max= 1.5973), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:57,805 - root - INFO - Step 21690: lr=1.00E-05, loss= 1.1610 (max= 1.5973), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:57,805 - root - INFO - Step 21690: lr=1.00E-05, loss= 1.1610 (max= 1.5973), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:57,805 - root - INFO - Step 21690: lr=1.00E-05, loss= 1.1610 (max= 1.5973), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:09:57,805 - root - INFO - Step 21690: lr=1.00E-05, loss= 1.1610 (max= 1.5973), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:13,731 - root - INFO - Step 21700: lr=1.00E-05, loss= 1.1914 (max= 1.7036), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:13,731 - root - INFO - Step 21700: lr=1.00E-05, loss= 1.1914 (max= 1.7036), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:13,731 - root - INFO - Step 21700: lr=1.00E-05, loss= 1.1914 (max= 1.7036), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:13,731 - root - INFO - Step 21700: lr=1.00E-05, loss= 1.1914 (max= 1.7036), tps=20579, mfu=42.88%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:13,731 - root - INFO - Step 21700: lr=1.00E-05, loss= 1.1914 (max= 1.7036), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:13,731 - root - INFO - Step 21700: lr=1.00E-05, loss= 1.1914 (max= 1.7036), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:13,731 - root - INFO - Step 21700: lr=1.00E-05, loss= 1.1914 (max= 1.7036), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:13,732 - root - INFO - Step 21700: lr=1.00E-05, loss= 1.1914 (max= 1.7036), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:29,685 - root - INFO - Step 21710: lr=1.00E-05, loss= 1.1797 (max= 1.5567), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:29,685 - root - INFO - Step 21710: lr=1.00E-05, loss= 1.1797 (max= 1.5567), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:29,685 - root - INFO - Step 21710: lr=1.00E-05, loss= 1.1797 (max= 1.5567), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:29,685 - root - INFO - Step 21710: lr=1.00E-05, loss= 1.1797 (max= 1.5567), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:29,685 - root - INFO - Step 21710: lr=1.00E-05, loss= 1.1797 (max= 1.5567), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:29,685 - root - INFO - Step 21710: lr=1.00E-05, loss= 1.1797 (max= 1.5567), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:29,686 - root - INFO - Step 21710: lr=1.00E-05, 
loss= 1.1797 (max= 1.5567), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:29,686 - root - INFO - Step 21710: lr=1.00E-05, loss= 1.1797 (max= 1.5567), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:45,629 - root - INFO - Step 21720: lr=1.00E-05, loss= 1.1715 (max= 1.8628), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:45,629 - root - INFO - Step 21720: lr=1.00E-05, loss= 1.1715 (max= 1.8628), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:45,629 - root - INFO - Step 21720: lr=1.00E-05, loss= 1.1715 (max= 1.8628), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:45,629 - root - INFO - Step 21720: lr=1.00E-05, loss= 1.1715 (max= 1.8628), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:45,629 - root - INFO - Step 21720: lr=1.00E-05, loss= 1.1715 (max= 1.8628), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:45,629 - root - INFO - Step 21720: lr=1.00E-05, loss= 1.1715 (max= 1.8628), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:45,629 - root - INFO - Step 21720: lr=1.00E-05, loss= 1.1715 (max= 1.8628), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:10:45,630 - root - INFO - Step 21720: lr=1.00E-05, loss= 1.1715 (max= 1.8628), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:01,587 - root - INFO - Step 21730: lr=1.00E-05, loss= 1.2033 (max= 1.6509), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
20:11:01,587 - root - INFO - Step 21730: lr=1.00E-05, loss= 1.2033 (max= 1.6509), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:01,587 - root - INFO - Step 21730: lr=1.00E-05, loss= 1.2033 (max= 1.6509), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:01,587 - root - INFO - Step 21730: lr=1.00E-05, loss= 1.2033 (max= 1.6509), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:01,587 - root - INFO - Step 21730: lr=1.00E-05, loss= 1.2033 (max= 1.6509), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:01,587 - root - INFO - Step 21730: lr=1.00E-05, loss= 1.2033 (max= 1.6509), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:01,587 - root - INFO - Step 21730: lr=1.00E-05, loss= 1.2033 (max= 1.6509), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:01,588 - root - INFO - Step 21730: lr=1.00E-05, loss= 1.2033 (max= 1.6509), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:17,570 - root - INFO - Step 21740: lr=1.00E-05, loss= 1.2035 (max= 1.6453), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:17,570 - root - INFO - Step 21740: lr=1.00E-05, loss= 1.2035 (max= 1.6453), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:17,570 - root - INFO - Step 21740: lr=1.00E-05, loss= 1.2035 (max= 1.6453), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:17,570 - root - INFO - Step 21740: lr=1.00E-05, loss= 1.2035 (max= 1.6453), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:17,570 - root - INFO - Step 21740: lr=1.00E-05, loss= 1.2035 (max= 1.6453), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:17,570 - root - INFO - Step 21740: lr=1.00E-05, loss= 1.2035 (max= 1.6453), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:17,570 - root - INFO - Step 21740: lr=1.00E-05, loss= 1.2035 (max= 1.6453), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:17,571 - root - INFO - Step 21740: lr=1.00E-05, loss= 1.2035 (max= 1.6453), tps=20506, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:33,519 - root - INFO - Step 21750: lr=1.00E-05, loss= 1.2007 (max= 1.6055), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:11:33,519 - root - INFO - Step 21750: lr=1.00E-05, loss= 1.2007 (max= 1.6055), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:11:33,519 - root - INFO - Step 21750: lr=1.00E-05, loss= 1.2007 (max= 1.6055), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:11:33,519 - root - INFO - Step 21750: lr=1.00E-05, loss= 1.2007 (max= 1.6055), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:11:33,519 - root - INFO - Step 21750: lr=1.00E-05, loss= 1.2007 (max= 1.6055), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:11:33,519 - root - INFO - Step 21750: lr=1.00E-05, loss= 1.2007 (max= 1.6055), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:11:33,519 - root - INFO - Step 21750: lr=1.00E-05, loss= 1.2007 (max= 1.6055), 
tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:11:33,519 - root - INFO - Step 21750: lr=1.00E-05, loss= 1.2007 (max= 1.6055), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:11:49,469 - root - INFO - Step 21760: lr=1.00E-05, loss= 1.2196 (max= 1.6365), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:49,469 - root - INFO - Step 21760: lr=1.00E-05, loss= 1.2196 (max= 1.6365), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:49,469 - root - INFO - Step 21760: lr=1.00E-05, loss= 1.2196 (max= 1.6365), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:49,469 - root - INFO - Step 21760: lr=1.00E-05, loss= 1.2196 (max= 1.6365), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:49,469 - root - INFO - Step 21760: lr=1.00E-05, loss= 1.2196 (max= 1.6365), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:49,469 - root - INFO - Step 21760: lr=1.00E-05, loss= 1.2196 (max= 1.6365), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:49,469 - root - INFO - Step 21760: lr=1.00E-05, loss= 1.2196 (max= 1.6365), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:11:49,469 - root - INFO - Step 21760: lr=1.00E-05, loss= 1.2196 (max= 1.6365), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:05,410 - root - INFO - Step 21770: lr=1.00E-05, loss= 1.1903 (max= 1.5025), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:05,410 - root - INFO - Step 
21770: lr=1.00E-05, loss= 1.1903 (max= 1.5025), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:05,410 - root - INFO - Step 21770: lr=1.00E-05, loss= 1.1903 (max= 1.5025), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:05,410 - root - INFO - Step 21770: lr=1.00E-05, loss= 1.1903 (max= 1.5025), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:05,410 - root - INFO - Step 21770: lr=1.00E-05, loss= 1.1903 (max= 1.5025), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:05,410 - root - INFO - Step 21770: lr=1.00E-05, loss= 1.1903 (max= 1.5025), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:05,410 - root - INFO - Step 21770: lr=1.00E-05, loss= 1.1903 (max= 1.5025), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:05,410 - root - INFO - Step 21770: lr=1.00E-05, loss= 1.1903 (max= 1.5025), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:21,389 - root - INFO - Step 21780: lr=1.00E-05, loss= 1.2070 (max= 1.5526), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:21,389 - root - INFO - Step 21780: lr=1.00E-05, loss= 1.2070 (max= 1.5526), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:21,389 - root - INFO - Step 21780: lr=1.00E-05, loss= 1.2070 (max= 1.5526), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:21,389 - root - INFO - Step 21780: lr=1.00E-05, loss= 1.2070 (max= 1.5526), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 20:12:21,389 - root - INFO - Step 21780: lr=1.00E-05, loss= 1.2070 (max= 1.5526), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:21,389 - root - INFO - Step 21780: lr=1.00E-05, loss= 1.2070 (max= 1.5526), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:21,389 - root - INFO - Step 21780: lr=1.00E-05, loss= 1.2070 (max= 1.5526), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:21,389 - root - INFO - Step 21780: lr=1.00E-05, loss= 1.2070 (max= 1.5526), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:37,302 - root - INFO - Step 21790: lr=1.00E-05, loss= 1.1654 (max= 1.4456), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:12:37,302 - root - INFO - Step 21790: lr=1.00E-05, loss= 1.1654 (max= 1.4456), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:12:37,302 - root - INFO - Step 21790: lr=1.00E-05, loss= 1.1654 (max= 1.4456), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:12:37,302 - root - INFO - Step 21790: lr=1.00E-05, loss= 1.1654 (max= 1.4456), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:12:37,302 - root - INFO - Step 21790: lr=1.00E-05, loss= 1.1654 (max= 1.4456), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:12:37,302 - root - INFO - Step 21790: lr=1.00E-05, loss= 1.1654 (max= 1.4456), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:12:37,302 - root - INFO - Step 21790: lr=1.00E-05, loss= 1.1654 (max= 1.4456), tps=20596, mfu=42.91%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:12:37,302 - root - INFO - Step 21790: lr=1.00E-05, loss= 1.1654 (max= 1.4456), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:12:53,316 - root - INFO - Step 21800: lr=1.00E-05, loss= 1.2331 (max= 1.6477), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:53,316 - root - INFO - Step 21800: lr=1.00E-05, loss= 1.2331 (max= 1.6477), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:53,316 - root - INFO - Step 21800: lr=1.00E-05, loss= 1.2331 (max= 1.6477), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:53,316 - root - INFO - Step 21800: lr=1.00E-05, loss= 1.2331 (max= 1.6477), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:53,316 - root - INFO - Step 21800: lr=1.00E-05, loss= 1.2331 (max= 1.6477), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:53,316 - root - INFO - Step 21800: lr=1.00E-05, loss= 1.2331 (max= 1.6477), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:53,316 - root - INFO - Step 21800: lr=1.00E-05, loss= 1.2331 (max= 1.6477), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:12:53,317 - root - INFO - Step 21800: lr=1.00E-05, loss= 1.2331 (max= 1.6477), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:09,280 - root - INFO - Step 21810: lr=1.00E-05, loss= 1.1660 (max= 1.4922), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:13:09,280 - root - INFO - Step 21810: lr=1.00E-05, loss= 1.1660 
(max= 1.4922), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:13:09,280 - root - INFO - Step 21810: lr=1.00E-05, loss= 1.1660 (max= 1.4922), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:13:09,280 - root - INFO - Step 21810: lr=1.00E-05, loss= 1.1660 (max= 1.4922), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:13:09,280 - root - INFO - Step 21810: lr=1.00E-05, loss= 1.1660 (max= 1.4922), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:13:09,280 - root - INFO - Step 21810: lr=1.00E-05, loss= 1.1660 (max= 1.4922), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:13:09,280 - root - INFO - Step 21810: lr=1.00E-05, loss= 1.1660 (max= 1.4922), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:13:09,281 - root - INFO - Step 21810: lr=1.00E-05, loss= 1.1660 (max= 1.4922), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:13:25,260 - root - INFO - Step 21820: lr=1.00E-05, loss= 1.1701 (max= 1.4216), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:25,260 - root - INFO - Step 21820: lr=1.00E-05, loss= 1.1701 (max= 1.4216), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:25,260 - root - INFO - Step 21820: lr=1.00E-05, loss= 1.1701 (max= 1.4216), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:25,260 - root - INFO - Step 21820: lr=1.00E-05, loss= 1.1701 (max= 1.4216), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:25,260 - root 
- INFO - Step 21820: lr=1.00E-05, loss= 1.1701 (max= 1.4216), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:25,260 - root - INFO - Step 21820: lr=1.00E-05, loss= 1.1701 (max= 1.4216), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:25,260 - root - INFO - Step 21820: lr=1.00E-05, loss= 1.1701 (max= 1.4216), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:25,261 - root - INFO - Step 21820: lr=1.00E-05, loss= 1.1701 (max= 1.4216), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:41,264 - root - INFO - Step 21830: lr=1.00E-05, loss= 1.1918 (max= 1.5663), tps=20480, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:41,264 - root - INFO - Step 21830: lr=1.00E-05, loss= 1.1918 (max= 1.5663), tps=20480, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:41,264 - root - INFO - Step 21830: lr=1.00E-05, loss= 1.1918 (max= 1.5663), tps=20480, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:41,264 - root - INFO - Step 21830: lr=1.00E-05, loss= 1.1918 (max= 1.5663), tps=20480, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:41,264 - root - INFO - Step 21830: lr=1.00E-05, loss= 1.1918 (max= 1.5663), tps=20480, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:41,264 - root - INFO - Step 21830: lr=1.00E-05, loss= 1.1918 (max= 1.5663), tps=20480, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:41,264 - root - INFO - Step 21830: lr=1.00E-05, loss= 1.1918 (max= 1.5663), tps=20480, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 20:13:41,264 - root - INFO - Step 21830: lr=1.00E-05, loss= 1.1918 (max= 1.5663), tps=20480, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:57,182 - root - INFO - Step 21840: lr=1.00E-05, loss= 1.1822 (max= 1.6191), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:57,183 - root - INFO - Step 21840: lr=1.00E-05, loss= 1.1822 (max= 1.6191), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:57,183 - root - INFO - Step 21840: lr=1.00E-05, loss= 1.1822 (max= 1.6191), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:57,183 - root - INFO - Step 21840: lr=1.00E-05, loss= 1.1822 (max= 1.6191), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:57,183 - root - INFO - Step 21840: lr=1.00E-05, loss= 1.1822 (max= 1.6191), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:57,183 - root - INFO - Step 21840: lr=1.00E-05, loss= 1.1822 (max= 1.6191), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:57,183 - root - INFO - Step 21840: lr=1.00E-05, loss= 1.1822 (max= 1.6191), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:13:57,183 - root - INFO - Step 21840: lr=1.00E-05, loss= 1.1822 (max= 1.6191), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:13,159 - root - INFO - Step 21850: lr=1.00E-05, loss= 1.1639 (max= 1.5421), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:13,159 - root - INFO - Step 21850: lr=1.00E-05, loss= 1.1639 (max= 1.5421), tps=20515, mfu=42.74%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:13,159 - root - INFO - Step 21850: lr=1.00E-05, loss= 1.1639 (max= 1.5421), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:13,159 - root - INFO - Step 21850: lr=1.00E-05, loss= 1.1639 (max= 1.5421), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:13,159 - root - INFO - Step 21850: lr=1.00E-05, loss= 1.1639 (max= 1.5421), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:13,159 - root - INFO - Step 21850: lr=1.00E-05, loss= 1.1639 (max= 1.5421), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:13,159 - root - INFO - Step 21850: lr=1.00E-05, loss= 1.1639 (max= 1.5421), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:13,160 - root - INFO - Step 21850: lr=1.00E-05, loss= 1.1639 (max= 1.5421), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:29,051 - root - INFO - Step 21860: lr=1.00E-05, loss= 1.1834 (max= 2.0959), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:29,051 - root - INFO - Step 21860: lr=1.00E-05, loss= 1.1834 (max= 2.0959), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:29,051 - root - INFO - Step 21860: lr=1.00E-05, loss= 1.1834 (max= 2.0959), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:29,051 - root - INFO - Step 21860: lr=1.00E-05, loss= 1.1834 (max= 2.0959), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:29,051 - root - INFO - Step 21860: lr=1.00E-05, 
loss= 1.1834 (max= 2.0959), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:29,051 - root - INFO - Step 21860: lr=1.00E-05, loss= 1.1834 (max= 2.0959), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:29,051 - root - INFO - Step 21860: lr=1.00E-05, loss= 1.1834 (max= 2.0959), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:29,051 - root - INFO - Step 21860: lr=1.00E-05, loss= 1.1834 (max= 2.0959), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:14:45,006 - root - INFO - Step 21870: lr=1.00E-05, loss= 1.1663 (max= 1.6600), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:14:45,006 - root - INFO - Step 21870: lr=1.00E-05, loss= 1.1663 (max= 1.6600), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:14:45,006 - root - INFO - Step 21870: lr=1.00E-05, loss= 1.1663 (max= 1.6600), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:14:45,006 - root - INFO - Step 21870: lr=1.00E-05, loss= 1.1663 (max= 1.6600), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:14:45,006 - root - INFO - Step 21870: lr=1.00E-05, loss= 1.1663 (max= 1.6600), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:14:45,006 - root - INFO - Step 21870: lr=1.00E-05, loss= 1.1663 (max= 1.6600), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:14:45,006 - root - INFO - Step 21870: lr=1.00E-05, loss= 1.1663 (max= 1.6600), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
20:14:45,006 - root - INFO - Step 21870: lr=1.00E-05, loss= 1.1663 (max= 1.6600), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:00,959 - root - INFO - Step 21880: lr=1.00E-05, loss= 1.2016 (max= 1.6023), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:00,959 - root - INFO - Step 21880: lr=1.00E-05, loss= 1.2016 (max= 1.6023), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:00,959 - root - INFO - Step 21880: lr=1.00E-05, loss= 1.2016 (max= 1.6023), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:00,959 - root - INFO - Step 21880: lr=1.00E-05, loss= 1.2016 (max= 1.6023), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:00,959 - root - INFO - Step 21880: lr=1.00E-05, loss= 1.2016 (max= 1.6023), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:00,959 - root - INFO - Step 21880: lr=1.00E-05, loss= 1.2016 (max= 1.6023), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:00,959 - root - INFO - Step 21880: lr=1.00E-05, loss= 1.2016 (max= 1.6023), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:00,960 - root - INFO - Step 21880: lr=1.00E-05, loss= 1.2016 (max= 1.6023), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:16,907 - root - INFO - Step 21890: lr=1.00E-05, loss= 1.1513 (max= 1.5315), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:16,907 - root - INFO - Step 21890: lr=1.00E-05, loss= 1.1513 (max= 1.5315), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:16,907 - root - INFO - Step 21890: lr=1.00E-05, loss= 1.1513 (max= 1.5315), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:16,907 - root - INFO - Step 21890: lr=1.00E-05, loss= 1.1513 (max= 1.5315), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:16,907 - root - INFO - Step 21890: lr=1.00E-05, loss= 1.1513 (max= 1.5315), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:16,907 - root - INFO - Step 21890: lr=1.00E-05, loss= 1.1513 (max= 1.5315), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:16,907 - root - INFO - Step 21890: lr=1.00E-05, loss= 1.1513 (max= 1.5315), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:16,908 - root - INFO - Step 21890: lr=1.00E-05, loss= 1.1513 (max= 1.5315), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:32,921 - root - INFO - Step 21900: lr=1.00E-05, loss= 1.1700 (max= 1.5804), tps=20467, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:32,921 - root - INFO - Step 21900: lr=1.00E-05, loss= 1.1700 (max= 1.5804), tps=20467, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:32,921 - root - INFO - Step 21900: lr=1.00E-05, loss= 1.1700 (max= 1.5804), tps=20467, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:32,921 - root - INFO - Step 21900: lr=1.00E-05, loss= 1.1700 (max= 1.5804), tps=20467, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:32,921 - root - INFO - Step 21900: lr=1.00E-05, loss= 1.1700 (max= 1.5804), 
tps=20467, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:32,921 - root - INFO - Step 21900: lr=1.00E-05, loss= 1.1700 (max= 1.5804), tps=20467, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:32,921 - root - INFO - Step 21900: lr=1.00E-05, loss= 1.1700 (max= 1.5804), tps=20467, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:32,921 - root - INFO - Step 21900: lr=1.00E-05, loss= 1.1700 (max= 1.5804), tps=20467, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:48,877 - root - INFO - Step 21910: lr=1.00E-05, loss= 1.1831 (max= 1.5802), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:48,877 - root - INFO - Step 21910: lr=1.00E-05, loss= 1.1831 (max= 1.5802), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:48,877 - root - INFO - Step 21910: lr=1.00E-05, loss= 1.1831 (max= 1.5802), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:48,877 - root - INFO - Step 21910: lr=1.00E-05, loss= 1.1831 (max= 1.5802), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:48,877 - root - INFO - Step 21910: lr=1.00E-05, loss= 1.1831 (max= 1.5802), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:48,877 - root - INFO - Step 21910: lr=1.00E-05, loss= 1.1831 (max= 1.5802), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:48,877 - root - INFO - Step 21910: lr=1.00E-05, loss= 1.1831 (max= 1.5802), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:15:48,878 - root - INFO - Step 
21910: lr=1.00E-05, loss= 1.1831 (max= 1.5802), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:04,805 - root - INFO - Step 21920: lr=1.00E-05, loss= 1.1575 (max= 1.5323), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:04,805 - root - INFO - Step 21920: lr=1.00E-05, loss= 1.1575 (max= 1.5323), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:04,805 - root - INFO - Step 21920: lr=1.00E-05, loss= 1.1575 (max= 1.5323), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:04,805 - root - INFO - Step 21920: lr=1.00E-05, loss= 1.1575 (max= 1.5323), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:04,805 - root - INFO - Step 21920: lr=1.00E-05, loss= 1.1575 (max= 1.5323), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:04,805 - root - INFO - Step 21920: lr=1.00E-05, loss= 1.1575 (max= 1.5323), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:04,805 - root - INFO - Step 21920: lr=1.00E-05, loss= 1.1575 (max= 1.5323), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:04,805 - root - INFO - Step 21920: lr=1.00E-05, loss= 1.1575 (max= 1.5323), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:20,777 - root - INFO - Step 21930: lr=1.00E-05, loss= 1.1891 (max= 1.5777), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:20,777 - root - INFO - Step 21930: lr=1.00E-05, loss= 1.1891 (max= 1.5777), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 20:16:20,777 - root - INFO - Step 21930: lr=1.00E-05, loss= 1.1891 (max= 1.5777), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:20,777 - root - INFO - Step 21930: lr=1.00E-05, loss= 1.1891 (max= 1.5777), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:20,777 - root - INFO - Step 21930: lr=1.00E-05, loss= 1.1891 (max= 1.5777), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:20,777 - root - INFO - Step 21930: lr=1.00E-05, loss= 1.1891 (max= 1.5777), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:20,778 - root - INFO - Step 21930: lr=1.00E-05, loss= 1.1891 (max= 1.5777), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:20,778 - root - INFO - Step 21930: lr=1.00E-05, loss= 1.1891 (max= 1.5777), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:26,330 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4081003 +2025-10-24 20:16:36,743 - root - INFO - Step 21940: lr=1.00E-05, loss= 1.1938 (max= 1.5584), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:36,743 - root - INFO - Step 21940: lr=1.00E-05, loss= 1.1938 (max= 1.5584), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:36,744 - root - INFO - Step 21940: lr=1.00E-05, loss= 1.1938 (max= 1.5584), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:36,744 - root - INFO - Step 21940: lr=1.00E-05, loss= 1.1938 (max= 1.5584), tps=20528, mfu=42.77%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:36,744 - root - INFO - Step 21940: lr=1.00E-05, loss= 1.1938 (max= 1.5584), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:36,744 - root - INFO - Step 21940: lr=1.00E-05, loss= 1.1938 (max= 1.5584), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:36,744 - root - INFO - Step 21940: lr=1.00E-05, loss= 1.1938 (max= 1.5584), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:36,744 - root - INFO - Step 21940: lr=1.00E-05, loss= 1.1938 (max= 1.5584), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:16:52,722 - root - INFO - Step 21950: lr=1.00E-05, loss= 1.2007 (max= 1.6411), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:52,722 - root - INFO - Step 21950: lr=1.00E-05, loss= 1.2007 (max= 1.6411), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:52,722 - root - INFO - Step 21950: lr=1.00E-05, loss= 1.2007 (max= 1.6411), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:52,722 - root - INFO - Step 21950: lr=1.00E-05, loss= 1.2007 (max= 1.6411), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:52,722 - root - INFO - Step 21950: lr=1.00E-05, loss= 1.2007 (max= 1.6411), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:52,722 - root - INFO - Step 21950: lr=1.00E-05, loss= 1.2007 (max= 1.6411), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:52,722 - root - INFO - Step 21950: lr=1.00E-05, loss= 1.2007 
(max= 1.6411), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:16:52,723 - root - INFO - Step 21950: lr=1.00E-05, loss= 1.2007 (max= 1.6411), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:17:08,652 - root - INFO - Step 21960: lr=1.00E-05, loss= 1.1687 (max= 1.5606), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:08,652 - root - INFO - Step 21960: lr=1.00E-05, loss= 1.1687 (max= 1.5606), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:08,652 - root - INFO - Step 21960: lr=1.00E-05, loss= 1.1687 (max= 1.5606), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:08,652 - root - INFO - Step 21960: lr=1.00E-05, loss= 1.1687 (max= 1.5606), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:08,652 - root - INFO - Step 21960: lr=1.00E-05, loss= 1.1687 (max= 1.5606), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:08,653 - root - INFO - Step 21960: lr=1.00E-05, loss= 1.1687 (max= 1.5606), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:08,653 - root - INFO - Step 21960: lr=1.00E-05, loss= 1.1687 (max= 1.5606), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:08,653 - root - INFO - Step 21960: lr=1.00E-05, loss= 1.1687 (max= 1.5606), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:24,615 - root - INFO - Step 21970: lr=1.00E-05, loss= 1.1809 (max= 1.6839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:24,616 - root 
- INFO - Step 21970: lr=1.00E-05, loss= 1.1809 (max= 1.6839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:24,616 - root - INFO - Step 21970: lr=1.00E-05, loss= 1.1809 (max= 1.6839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:24,616 - root - INFO - Step 21970: lr=1.00E-05, loss= 1.1809 (max= 1.6839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:24,616 - root - INFO - Step 21970: lr=1.00E-05, loss= 1.1809 (max= 1.6839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:24,616 - root - INFO - Step 21970: lr=1.00E-05, loss= 1.1809 (max= 1.6839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:24,616 - root - INFO - Step 21970: lr=1.00E-05, loss= 1.1809 (max= 1.6839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:24,616 - root - INFO - Step 21970: lr=1.00E-05, loss= 1.1809 (max= 1.6839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:40,562 - root - INFO - Step 21980: lr=1.00E-05, loss= 1.1911 (max= 1.5835), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:17:40,562 - root - INFO - Step 21980: lr=1.00E-05, loss= 1.1911 (max= 1.5835), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:17:40,562 - root - INFO - Step 21980: lr=1.00E-05, loss= 1.1911 (max= 1.5835), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:17:40,562 - root - INFO - Step 21980: lr=1.00E-05, loss= 1.1911 (max= 1.5835), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 20:17:40,563 - root - INFO - Step 21980: lr=1.00E-05, loss= 1.1911 (max= 1.5835), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:17:40,563 - root - INFO - Step 21980: lr=1.00E-05, loss= 1.1911 (max= 1.5835), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:17:40,563 - root - INFO - Step 21980: lr=1.00E-05, loss= 1.1911 (max= 1.5835), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:17:40,563 - root - INFO - Step 21980: lr=1.00E-05, loss= 1.1911 (max= 1.5835), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:17:56,495 - root - INFO - Step 21990: lr=1.00E-05, loss= 1.1670 (max= 1.5633), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:56,495 - root - INFO - Step 21990: lr=1.00E-05, loss= 1.1670 (max= 1.5633), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:56,495 - root - INFO - Step 21990: lr=1.00E-05, loss= 1.1670 (max= 1.5633), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:56,495 - root - INFO - Step 21990: lr=1.00E-05, loss= 1.1670 (max= 1.5633), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:56,495 - root - INFO - Step 21990: lr=1.00E-05, loss= 1.1670 (max= 1.5633), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:56,495 - root - INFO - Step 21990: lr=1.00E-05, loss= 1.1670 (max= 1.5633), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:56,495 - root - INFO - Step 21990: lr=1.00E-05, loss= 1.1670 (max= 1.5633), tps=20572, mfu=42.86%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:17:56,495 - root - INFO - Step 21990: lr=1.00E-05, loss= 1.1670 (max= 1.5633), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-22000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-22000! Save time: 4.465980291366577 +2025-10-24 20:18:12,477 - root - INFO - Step 22000: lr=1.00E-05, loss= 1.1868 (max= 1.6778), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:12,477 - root - INFO - Step 22000: lr=1.00E-05, loss= 1.1868 (max= 1.6778), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:12,477 - root - INFO - Saving a full checkpoint at step 22000 +2025-10-24 20:18:12,477 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:18:12,477 - root - INFO - Saving a full checkpoint at step 22000 +2025-10-24 20:18:12,477 - root - INFO - Step 22000: lr=1.00E-05, loss= 1.1868 (max= 1.6778), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:12,477 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:18:12,477 - root - INFO - Step 22000: lr=1.00E-05, loss= 1.1868 (max= 1.6778), tps=20506, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:12,477 - root - INFO - Step 22000: lr=1.00E-05, loss= 1.1868 (max= 1.6778), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:12,477 - root - INFO - Saving a full checkpoint at step 22000 +2025-10-24 20:18:12,477 - root - INFO - CheckpointManager: State dict keys: 
dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:18:12,477 - root - INFO - Step 22000: lr=1.00E-05, loss= 1.1868 (max= 1.6778), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:12,477 - root - INFO - Saving a full checkpoint at step 22000 +2025-10-24 20:18:12,477 - root - INFO - Saving a full checkpoint at step 22000 +2025-10-24 20:18:12,477 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:18:12,477 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:18:12,477 - root - INFO - Step 22000: lr=1.00E-05, loss= 1.1868 (max= 1.6778), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:12,477 - root - INFO - Saving a full checkpoint at step 22000 +2025-10-24 20:18:12,477 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:18:12,477 - root - INFO - Step 22000: lr=1.00E-05, loss= 1.1868 (max= 1.6778), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:12,477 - root - INFO - Saving a full checkpoint at step 22000 +2025-10-24 20:18:12,478 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:18:12,478 - root - INFO - Saving a full checkpoint at step 22000 +2025-10-24 20:18:12,478 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:18:27,093 - root - INFO - Finished saving the checkpoint in 14.62 seconds +2025-10-24 20:18:27,101 - root - INFO - Finished saving the checkpoint in 14.62 seconds +2025-10-24 20:18:27,101 - root - INFO - Finished saving the checkpoint in 14.62 seconds 
+2025-10-24 20:18:27,101 - root - INFO - Finished saving the checkpoint in 14.62 seconds +2025-10-24 20:18:27,102 - root - INFO - Finished saving the checkpoint in 14.62 seconds +2025-10-24 20:18:27,102 - root - INFO - Finished saving the checkpoint in 14.62 seconds +2025-10-24 20:18:27,102 - root - INFO - Finished saving the checkpoint in 14.62 seconds +2025-10-24 20:18:27,103 - root - INFO - Finished saving the checkpoint in 14.63 seconds +2025-10-24 20:18:42,967 - root - INFO - Step 22010: lr=1.00E-05, loss= 1.1968 (max= 1.6707), tps=10748, mfu=22.39%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:18:42,967 - root - INFO - Step 22010: lr=1.00E-05, loss= 1.1968 (max= 1.6707), tps=10748, mfu=22.39%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:18:42,968 - root - INFO - Step 22010: lr=1.00E-05, loss= 1.1968 (max= 1.6707), tps=10748, mfu=22.39%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:18:42,968 - root - INFO - Step 22010: lr=1.00E-05, loss= 1.1968 (max= 1.6707), tps=10748, mfu=22.39%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:18:42,968 - root - INFO - Step 22010: lr=1.00E-05, loss= 1.1968 (max= 1.6707), tps=10748, mfu=22.39%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:18:42,968 - root - INFO - Step 22010: lr=1.00E-05, loss= 1.1968 (max= 1.6707), tps=10748, mfu=22.39%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:18:42,968 - root - INFO - Step 22010: lr=1.00E-05, loss= 1.1968 (max= 1.6707), tps=10748, mfu=22.39%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:18:42,968 - root - INFO - Step 22010: lr=1.00E-05, loss= 1.1968 (max= 1.6707), tps=10748, mfu=22.39%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:18:58,894 - root - INFO - Step 22020: lr=1.00E-05, 
loss= 1.1610 (max= 1.5360), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:58,894 - root - INFO - Step 22020: lr=1.00E-05, loss= 1.1610 (max= 1.5360), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:58,894 - root - INFO - Step 22020: lr=1.00E-05, loss= 1.1610 (max= 1.5360), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:58,894 - root - INFO - Step 22020: lr=1.00E-05, loss= 1.1610 (max= 1.5360), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:58,894 - root - INFO - Step 22020: lr=1.00E-05, loss= 1.1610 (max= 1.5360), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:58,894 - root - INFO - Step 22020: lr=1.00E-05, loss= 1.1610 (max= 1.5360), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:58,894 - root - INFO - Step 22020: lr=1.00E-05, loss= 1.1610 (max= 1.5360), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:18:58,894 - root - INFO - Step 22020: lr=1.00E-05, loss= 1.1610 (max= 1.5360), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:14,828 - root - INFO - Step 22030: lr=1.00E-05, loss= 1.1697 (max= 1.6910), tps=20569, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:14,828 - root - INFO - Step 22030: lr=1.00E-05, loss= 1.1697 (max= 1.6910), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:14,828 - root - INFO - Step 22030: lr=1.00E-05, loss= 1.1697 (max= 1.6910), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
20:19:14,828 - root - INFO - Step 22030: lr=1.00E-05, loss= 1.1697 (max= 1.6910), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:14,828 - root - INFO - Step 22030: lr=1.00E-05, loss= 1.1697 (max= 1.6910), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:14,829 - root - INFO - Step 22030: lr=1.00E-05, loss= 1.1697 (max= 1.6910), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:14,829 - root - INFO - Step 22030: lr=1.00E-05, loss= 1.1697 (max= 1.6910), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:14,829 - root - INFO - Step 22030: lr=1.00E-05, loss= 1.1697 (max= 1.6910), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:30,788 - root - INFO - Step 22040: lr=1.00E-05, loss= 1.1743 (max= 1.5066), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:30,788 - root - INFO - Step 22040: lr=1.00E-05, loss= 1.1743 (max= 1.5066), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:30,788 - root - INFO - Step 22040: lr=1.00E-05, loss= 1.1743 (max= 1.5066), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:30,788 - root - INFO - Step 22040: lr=1.00E-05, loss= 1.1743 (max= 1.5066), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:30,788 - root - INFO - Step 22040: lr=1.00E-05, loss= 1.1743 (max= 1.5066), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:30,788 - root - INFO - Step 22040: lr=1.00E-05, loss= 1.1743 (max= 1.5066), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:30,788 - root - INFO - Step 22040: lr=1.00E-05, loss= 1.1743 (max= 1.5066), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:30,788 - root - INFO - Step 22040: lr=1.00E-05, loss= 1.1743 (max= 1.5066), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:46,722 - root - INFO - Step 22050: lr=1.00E-05, loss= 1.1427 (max= 1.5232), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:46,722 - root - INFO - Step 22050: lr=1.00E-05, loss= 1.1427 (max= 1.5232), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:46,722 - root - INFO - Step 22050: lr=1.00E-05, loss= 1.1427 (max= 1.5232), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:46,722 - root - INFO - Step 22050: lr=1.00E-05, loss= 1.1427 (max= 1.5232), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:46,722 - root - INFO - Step 22050: lr=1.00E-05, loss= 1.1427 (max= 1.5232), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:46,722 - root - INFO - Step 22050: lr=1.00E-05, loss= 1.1427 (max= 1.5232), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:46,722 - root - INFO - Step 22050: lr=1.00E-05, loss= 1.1427 (max= 1.5232), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:19:46,722 - root - INFO - Step 22050: lr=1.00E-05, loss= 1.1427 (max= 1.5232), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:02,633 - root - INFO - Step 22060: lr=1.00E-05, loss= 1.2306 (max= 1.8741), 
tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:02,633 - root - INFO - Step 22060: lr=1.00E-05, loss= 1.2306 (max= 1.8741), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:02,633 - root - INFO - Step 22060: lr=1.00E-05, loss= 1.2306 (max= 1.8741), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:02,633 - root - INFO - Step 22060: lr=1.00E-05, loss= 1.2306 (max= 1.8741), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:02,633 - root - INFO - Step 22060: lr=1.00E-05, loss= 1.2306 (max= 1.8741), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:02,633 - root - INFO - Step 22060: lr=1.00E-05, loss= 1.2306 (max= 1.8741), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:02,633 - root - INFO - Step 22060: lr=1.00E-05, loss= 1.2306 (max= 1.8741), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:02,633 - root - INFO - Step 22060: lr=1.00E-05, loss= 1.2306 (max= 1.8741), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:18,561 - root - INFO - Step 22070: lr=1.00E-05, loss= 1.1761 (max= 1.5504), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:18,561 - root - INFO - Step 22070: lr=1.00E-05, loss= 1.1761 (max= 1.5504), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:18,561 - root - INFO - Step 22070: lr=1.00E-05, loss= 1.1761 (max= 1.5504), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:18,561 - root - INFO - Step 
22070: lr=1.00E-05, loss= 1.1761 (max= 1.5504), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:18,561 - root - INFO - Step 22070: lr=1.00E-05, loss= 1.1761 (max= 1.5504), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:18,561 - root - INFO - Step 22070: lr=1.00E-05, loss= 1.1761 (max= 1.5504), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:18,561 - root - INFO - Step 22070: lr=1.00E-05, loss= 1.1761 (max= 1.5504), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:18,561 - root - INFO - Step 22070: lr=1.00E-05, loss= 1.1761 (max= 1.5504), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:34,495 - root - INFO - Step 22080: lr=1.00E-05, loss= 1.1813 (max= 1.5976), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:34,495 - root - INFO - Step 22080: lr=1.00E-05, loss= 1.1813 (max= 1.5976), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:34,495 - root - INFO - Step 22080: lr=1.00E-05, loss= 1.1813 (max= 1.5976), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:34,495 - root - INFO - Step 22080: lr=1.00E-05, loss= 1.1813 (max= 1.5976), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:34,495 - root - INFO - Step 22080: lr=1.00E-05, loss= 1.1813 (max= 1.5976), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:34,495 - root - INFO - Step 22080: lr=1.00E-05, loss= 1.1813 (max= 1.5976), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 20:20:34,495 - root - INFO - Step 22080: lr=1.00E-05, loss= 1.1813 (max= 1.5976), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:34,495 - root - INFO - Step 22080: lr=1.00E-05, loss= 1.1813 (max= 1.5976), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:50,479 - root - INFO - Step 22090: lr=1.00E-05, loss= 1.2002 (max= 1.6930), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:50,479 - root - INFO - Step 22090: lr=1.00E-05, loss= 1.2002 (max= 1.6930), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:50,479 - root - INFO - Step 22090: lr=1.00E-05, loss= 1.2002 (max= 1.6930), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:50,479 - root - INFO - Step 22090: lr=1.00E-05, loss= 1.2002 (max= 1.6930), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:50,479 - root - INFO - Step 22090: lr=1.00E-05, loss= 1.2002 (max= 1.6930), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:50,479 - root - INFO - Step 22090: lr=1.00E-05, loss= 1.2002 (max= 1.6930), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:50,479 - root - INFO - Step 22090: lr=1.00E-05, loss= 1.2002 (max= 1.6930), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:20:50,479 - root - INFO - Step 22090: lr=1.00E-05, loss= 1.2002 (max= 1.6930), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:06,417 - root - INFO - Step 22100: lr=1.00E-05, loss= 1.1936 (max= 1.5044), tps=20564, mfu=42.85%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:06,417 - root - INFO - Step 22100: lr=1.00E-05, loss= 1.1936 (max= 1.5044), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:06,417 - root - INFO - Step 22100: lr=1.00E-05, loss= 1.1936 (max= 1.5044), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:06,417 - root - INFO - Step 22100: lr=1.00E-05, loss= 1.1936 (max= 1.5044), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:06,417 - root - INFO - Step 22100: lr=1.00E-05, loss= 1.1936 (max= 1.5044), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:06,417 - root - INFO - Step 22100: lr=1.00E-05, loss= 1.1936 (max= 1.5044), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:06,417 - root - INFO - Step 22100: lr=1.00E-05, loss= 1.1936 (max= 1.5044), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:06,417 - root - INFO - Step 22100: lr=1.00E-05, loss= 1.1936 (max= 1.5044), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:22,351 - root - INFO - Step 22110: lr=1.00E-05, loss= 1.1750 (max= 1.5962), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:22,351 - root - INFO - Step 22110: lr=1.00E-05, loss= 1.1750 (max= 1.5962), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:22,351 - root - INFO - Step 22110: lr=1.00E-05, loss= 1.1750 (max= 1.5962), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:22,351 - root - INFO - Step 22110: lr=1.00E-05, loss= 1.1750 
(max= 1.5962), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:22,351 - root - INFO - Step 22110: lr=1.00E-05, loss= 1.1750 (max= 1.5962), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:22,351 - root - INFO - Step 22110: lr=1.00E-05, loss= 1.1750 (max= 1.5962), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:22,351 - root - INFO - Step 22110: lr=1.00E-05, loss= 1.1750 (max= 1.5962), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:22,351 - root - INFO - Step 22110: lr=1.00E-05, loss= 1.1750 (max= 1.5962), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:38,342 - root - INFO - Step 22120: lr=1.00E-05, loss= 1.1893 (max= 1.5088), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:38,342 - root - INFO - Step 22120: lr=1.00E-05, loss= 1.1893 (max= 1.5088), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:38,342 - root - INFO - Step 22120: lr=1.00E-05, loss= 1.1893 (max= 1.5088), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:38,342 - root - INFO - Step 22120: lr=1.00E-05, loss= 1.1893 (max= 1.5088), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:38,342 - root - INFO - Step 22120: lr=1.00E-05, loss= 1.1893 (max= 1.5088), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:38,342 - root - INFO - Step 22120: lr=1.00E-05, loss= 1.1893 (max= 1.5088), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:38,342 - root 
- INFO - Step 22120: lr=1.00E-05, loss= 1.1893 (max= 1.5088), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:38,342 - root - INFO - Step 22120: lr=1.00E-05, loss= 1.1893 (max= 1.5088), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:54,293 - root - INFO - Step 22130: lr=1.00E-05, loss= 1.1771 (max= 1.4687), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:54,293 - root - INFO - Step 22130: lr=1.00E-05, loss= 1.1771 (max= 1.4687), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:54,293 - root - INFO - Step 22130: lr=1.00E-05, loss= 1.1771 (max= 1.4687), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:54,293 - root - INFO - Step 22130: lr=1.00E-05, loss= 1.1771 (max= 1.4687), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:54,293 - root - INFO - Step 22130: lr=1.00E-05, loss= 1.1771 (max= 1.4687), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:54,293 - root - INFO - Step 22130: lr=1.00E-05, loss= 1.1771 (max= 1.4687), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:54,293 - root - INFO - Step 22130: lr=1.00E-05, loss= 1.1771 (max= 1.4687), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:21:54,293 - root - INFO - Step 22130: lr=1.00E-05, loss= 1.1771 (max= 1.4687), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:10,253 - root - INFO - Step 22140: lr=1.00E-05, loss= 1.1749 (max= 1.6929), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 20:22:10,253 - root - INFO - Step 22140: lr=1.00E-05, loss= 1.1749 (max= 1.6929), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:10,253 - root - INFO - Step 22140: lr=1.00E-05, loss= 1.1749 (max= 1.6929), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:10,253 - root - INFO - Step 22140: lr=1.00E-05, loss= 1.1749 (max= 1.6929), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:10,253 - root - INFO - Step 22140: lr=1.00E-05, loss= 1.1749 (max= 1.6929), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:10,253 - root - INFO - Step 22140: lr=1.00E-05, loss= 1.1749 (max= 1.6929), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:10,253 - root - INFO - Step 22140: lr=1.00E-05, loss= 1.1749 (max= 1.6929), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:10,253 - root - INFO - Step 22140: lr=1.00E-05, loss= 1.1749 (max= 1.6929), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:26,180 - root - INFO - Step 22150: lr=1.00E-05, loss= 1.1876 (max= 1.5920), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:26,180 - root - INFO - Step 22150: lr=1.00E-05, loss= 1.1876 (max= 1.5920), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:26,180 - root - INFO - Step 22150: lr=1.00E-05, loss= 1.1876 (max= 1.5920), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:26,180 - root - INFO - Step 22150: lr=1.00E-05, loss= 1.1876 (max= 1.5920), tps=20578, mfu=42.88%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:26,180 - root - INFO - Step 22150: lr=1.00E-05, loss= 1.1876 (max= 1.5920), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:26,180 - root - INFO - Step 22150: lr=1.00E-05, loss= 1.1876 (max= 1.5920), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:26,181 - root - INFO - Step 22150: lr=1.00E-05, loss= 1.1876 (max= 1.5920), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:26,181 - root - INFO - Step 22150: lr=1.00E-05, loss= 1.1876 (max= 1.5920), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:42,111 - root - INFO - Step 22160: lr=1.00E-05, loss= 1.1437 (max= 1.6530), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:42,111 - root - INFO - Step 22160: lr=1.00E-05, loss= 1.1437 (max= 1.6530), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:42,111 - root - INFO - Step 22160: lr=1.00E-05, loss= 1.1437 (max= 1.6530), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:42,111 - root - INFO - Step 22160: lr=1.00E-05, loss= 1.1437 (max= 1.6530), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:42,111 - root - INFO - Step 22160: lr=1.00E-05, loss= 1.1437 (max= 1.6530), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:42,111 - root - INFO - Step 22160: lr=1.00E-05, loss= 1.1437 (max= 1.6530), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:42,111 - root - INFO - Step 22160: lr=1.00E-05, 
loss= 1.1437 (max= 1.6530), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:42,111 - root - INFO - Step 22160: lr=1.00E-05, loss= 1.1437 (max= 1.6530), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:58,041 - root - INFO - Step 22170: lr=1.00E-05, loss= 1.1721 (max= 1.5496), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:58,041 - root - INFO - Step 22170: lr=1.00E-05, loss= 1.1721 (max= 1.5496), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:58,041 - root - INFO - Step 22170: lr=1.00E-05, loss= 1.1721 (max= 1.5496), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:58,041 - root - INFO - Step 22170: lr=1.00E-05, loss= 1.1721 (max= 1.5496), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:58,041 - root - INFO - Step 22170: lr=1.00E-05, loss= 1.1721 (max= 1.5496), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:58,041 - root - INFO - Step 22170: lr=1.00E-05, loss= 1.1721 (max= 1.5496), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:58,041 - root - INFO - Step 22170: lr=1.00E-05, loss= 1.1721 (max= 1.5496), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:22:58,041 - root - INFO - Step 22170: lr=1.00E-05, loss= 1.1721 (max= 1.5496), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:13,976 - root - INFO - Step 22180: lr=1.00E-05, loss= 1.1478 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
20:23:13,976 - root - INFO - Step 22180: lr=1.00E-05, loss= 1.1478 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:13,976 - root - INFO - Step 22180: lr=1.00E-05, loss= 1.1478 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:13,976 - root - INFO - Step 22180: lr=1.00E-05, loss= 1.1478 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:13,976 - root - INFO - Step 22180: lr=1.00E-05, loss= 1.1478 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:13,976 - root - INFO - Step 22180: lr=1.00E-05, loss= 1.1478 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:13,976 - root - INFO - Step 22180: lr=1.00E-05, loss= 1.1478 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:13,976 - root - INFO - Step 22180: lr=1.00E-05, loss= 1.1478 (max= 1.5758), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:29,812 - root - INFO - Step 22190: lr=1.00E-05, loss= 1.1739 (max= 1.5921), tps=20695, mfu=43.12%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:29,812 - root - INFO - Step 22190: lr=1.00E-05, loss= 1.1739 (max= 1.5921), tps=20695, mfu=43.12%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:29,812 - root - INFO - Step 22190: lr=1.00E-05, loss= 1.1739 (max= 1.5921), tps=20695, mfu=43.12%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:29,813 - root - INFO - Step 22190: lr=1.00E-05, loss= 1.1739 (max= 1.5921), tps=20695, mfu=43.12%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:29,813 - root - INFO - Step 22190: lr=1.00E-05, loss= 1.1739 (max= 1.5921), tps=20695, mfu=43.12%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:29,813 - root - INFO - Step 22190: lr=1.00E-05, loss= 1.1739 (max= 1.5921), tps=20695, mfu=43.12%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:29,813 - root - INFO - Step 22190: lr=1.00E-05, loss= 1.1739 (max= 1.5921), tps=20696, mfu=43.12%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:29,813 - root - INFO - Step 22190: lr=1.00E-05, loss= 1.1739 (max= 1.5921), tps=20695, mfu=43.12%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:45,736 - root - INFO - Step 22200: lr=1.00E-05, loss= 1.2102 (max= 1.5504), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:45,736 - root - INFO - Step 22200: lr=1.00E-05, loss= 1.2102 (max= 1.5504), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:45,736 - root - INFO - Step 22200: lr=1.00E-05, loss= 1.2102 (max= 1.5504), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:45,736 - root - INFO - Step 22200: lr=1.00E-05, loss= 1.2102 (max= 1.5504), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:45,736 - root - INFO - Step 22200: lr=1.00E-05, loss= 1.2102 (max= 1.5504), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:45,736 - root - INFO - Step 22200: lr=1.00E-05, loss= 1.2102 (max= 1.5504), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:45,736 - root - INFO - Step 22200: lr=1.00E-05, loss= 1.2102 (max= 1.5504), 
tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:23:45,736 - root - INFO - Step 22200: lr=1.00E-05, loss= 1.2102 (max= 1.5504), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:01,660 - root - INFO - Step 22210: lr=1.00E-05, loss= 1.1992 (max= 1.7279), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:01,660 - root - INFO - Step 22210: lr=1.00E-05, loss= 1.1992 (max= 1.7279), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:01,660 - root - INFO - Step 22210: lr=1.00E-05, loss= 1.1992 (max= 1.7279), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:01,660 - root - INFO - Step 22210: lr=1.00E-05, loss= 1.1992 (max= 1.7279), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:01,660 - root - INFO - Step 22210: lr=1.00E-05, loss= 1.1992 (max= 1.7279), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:01,660 - root - INFO - Step 22210: lr=1.00E-05, loss= 1.1992 (max= 1.7279), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:01,660 - root - INFO - Step 22210: lr=1.00E-05, loss= 1.1992 (max= 1.7279), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:01,660 - root - INFO - Step 22210: lr=1.00E-05, loss= 1.1992 (max= 1.7279), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:17,583 - root - INFO - Step 22220: lr=1.00E-05, loss= 1.1637 (max= 1.5535), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:17,584 - root - INFO - Step 
22220: lr=1.00E-05, loss= 1.1637 (max= 1.5535), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:17,584 - root - INFO - Step 22220: lr=1.00E-05, loss= 1.1637 (max= 1.5535), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:17,584 - root - INFO - Step 22220: lr=1.00E-05, loss= 1.1637 (max= 1.5535), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:17,584 - root - INFO - Step 22220: lr=1.00E-05, loss= 1.1637 (max= 1.5535), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:17,584 - root - INFO - Step 22220: lr=1.00E-05, loss= 1.1637 (max= 1.5535), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:17,584 - root - INFO - Step 22220: lr=1.00E-05, loss= 1.1637 (max= 1.5535), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:17,584 - root - INFO - Step 22220: lr=1.00E-05, loss= 1.1637 (max= 1.5535), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:33,488 - root - INFO - Step 22230: lr=1.00E-05, loss= 1.1493 (max= 1.6495), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:33,489 - root - INFO - Step 22230: lr=1.00E-05, loss= 1.1493 (max= 1.6495), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:33,489 - root - INFO - Step 22230: lr=1.00E-05, loss= 1.1493 (max= 1.6495), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:33,489 - root - INFO - Step 22230: lr=1.00E-05, loss= 1.1493 (max= 1.6495), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 20:24:33,489 - root - INFO - Step 22230: lr=1.00E-05, loss= 1.1493 (max= 1.6495), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:33,489 - root - INFO - Step 22230: lr=1.00E-05, loss= 1.1493 (max= 1.6495), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:33,489 - root - INFO - Step 22230: lr=1.00E-05, loss= 1.1493 (max= 1.6495), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:33,489 - root - INFO - Step 22230: lr=1.00E-05, loss= 1.1493 (max= 1.6495), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:49,418 - root - INFO - Step 22240: lr=1.00E-05, loss= 1.1263 (max= 1.5210), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:49,418 - root - INFO - Step 22240: lr=1.00E-05, loss= 1.1263 (max= 1.5210), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:49,418 - root - INFO - Step 22240: lr=1.00E-05, loss= 1.1263 (max= 1.5210), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:49,418 - root - INFO - Step 22240: lr=1.00E-05, loss= 1.1263 (max= 1.5210), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:49,418 - root - INFO - Step 22240: lr=1.00E-05, loss= 1.1263 (max= 1.5210), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:49,419 - root - INFO - Step 22240: lr=1.00E-05, loss= 1.1263 (max= 1.5210), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:49,419 - root - INFO - Step 22240: lr=1.00E-05, loss= 1.1263 (max= 1.5210), tps=20575, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:24:49,419 - root - INFO - Step 22240: lr=1.00E-05, loss= 1.1263 (max= 1.5210), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:05,331 - root - INFO - Step 22250: lr=1.00E-05, loss= 1.1455 (max= 1.5732), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:25:05,331 - root - INFO - Step 22250: lr=1.00E-05, loss= 1.1455 (max= 1.5732), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:25:05,331 - root - INFO - Step 22250: lr=1.00E-05, loss= 1.1455 (max= 1.5732), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:25:05,332 - root - INFO - Step 22250: lr=1.00E-05, loss= 1.1455 (max= 1.5732), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:25:05,332 - root - INFO - Step 22250: lr=1.00E-05, loss= 1.1455 (max= 1.5732), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:25:05,332 - root - INFO - Step 22250: lr=1.00E-05, loss= 1.1455 (max= 1.5732), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:25:05,332 - root - INFO - Step 22250: lr=1.00E-05, loss= 1.1455 (max= 1.5732), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:25:05,332 - root - INFO - Step 22250: lr=1.00E-05, loss= 1.1455 (max= 1.5732), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:25:21,230 - root - INFO - Step 22260: lr=1.00E-05, loss= 1.1688 (max= 1.5731), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:21,230 - root - INFO - Step 22260: lr=1.00E-05, loss= 1.1688 
(max= 1.5731), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:21,230 - root - INFO - Step 22260: lr=1.00E-05, loss= 1.1688 (max= 1.5731), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:21,230 - root - INFO - Step 22260: lr=1.00E-05, loss= 1.1688 (max= 1.5731), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:21,230 - root - INFO - Step 22260: lr=1.00E-05, loss= 1.1688 (max= 1.5731), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:21,230 - root - INFO - Step 22260: lr=1.00E-05, loss= 1.1688 (max= 1.5731), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:21,230 - root - INFO - Step 22260: lr=1.00E-05, loss= 1.1688 (max= 1.5731), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:21,230 - root - INFO - Step 22260: lr=1.00E-05, loss= 1.1688 (max= 1.5731), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:37,221 - root - INFO - Step 22270: lr=1.00E-05, loss= 1.1614 (max= 1.4901), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:37,221 - root - INFO - Step 22270: lr=1.00E-05, loss= 1.1614 (max= 1.4901), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:37,221 - root - INFO - Step 22270: lr=1.00E-05, loss= 1.1614 (max= 1.4901), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:37,221 - root - INFO - Step 22270: lr=1.00E-05, loss= 1.1614 (max= 1.4901), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:37,221 - root 
- INFO - Step 22270: lr=1.00E-05, loss= 1.1614 (max= 1.4901), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:37,221 - root - INFO - Step 22270: lr=1.00E-05, loss= 1.1614 (max= 1.4901), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:37,221 - root - INFO - Step 22270: lr=1.00E-05, loss= 1.1614 (max= 1.4901), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:37,222 - root - INFO - Step 22270: lr=1.00E-05, loss= 1.1614 (max= 1.4901), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:53,130 - root - INFO - Step 22280: lr=1.00E-05, loss= 1.1960 (max= 1.5421), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:53,130 - root - INFO - Step 22280: lr=1.00E-05, loss= 1.1960 (max= 1.5421), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:53,130 - root - INFO - Step 22280: lr=1.00E-05, loss= 1.1960 (max= 1.5421), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:53,130 - root - INFO - Step 22280: lr=1.00E-05, loss= 1.1960 (max= 1.5421), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:53,130 - root - INFO - Step 22280: lr=1.00E-05, loss= 1.1960 (max= 1.5421), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:53,130 - root - INFO - Step 22280: lr=1.00E-05, loss= 1.1960 (max= 1.5421), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:25:53,130 - root - INFO - Step 22280: lr=1.00E-05, loss= 1.1960 (max= 1.5421), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 20:25:53,130 - root - INFO - Step 22280: lr=1.00E-05, loss= 1.1960 (max= 1.5421), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:09,117 - root - INFO - Step 22290: lr=1.00E-05, loss= 1.1765 (max= 1.5543), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:09,117 - root - INFO - Step 22290: lr=1.00E-05, loss= 1.1765 (max= 1.5543), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:09,117 - root - INFO - Step 22290: lr=1.00E-05, loss= 1.1765 (max= 1.5543), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:09,117 - root - INFO - Step 22290: lr=1.00E-05, loss= 1.1765 (max= 1.5543), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:09,117 - root - INFO - Step 22290: lr=1.00E-05, loss= 1.1765 (max= 1.5543), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:09,117 - root - INFO - Step 22290: lr=1.00E-05, loss= 1.1765 (max= 1.5543), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:09,117 - root - INFO - Step 22290: lr=1.00E-05, loss= 1.1765 (max= 1.5543), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:09,117 - root - INFO - Step 22290: lr=1.00E-05, loss= 1.1765 (max= 1.5543), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:25,062 - root - INFO - Step 22300: lr=1.00E-05, loss= 1.1629 (max= 1.4680), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:25,063 - root - INFO - Step 22300: lr=1.00E-05, loss= 1.1629 (max= 1.4680), tps=20555, mfu=42.83%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:25,063 - root - INFO - Step 22300: lr=1.00E-05, loss= 1.1629 (max= 1.4680), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:25,063 - root - INFO - Step 22300: lr=1.00E-05, loss= 1.1629 (max= 1.4680), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:25,063 - root - INFO - Step 22300: lr=1.00E-05, loss= 1.1629 (max= 1.4680), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:25,063 - root - INFO - Step 22300: lr=1.00E-05, loss= 1.1629 (max= 1.4680), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:25,063 - root - INFO - Step 22300: lr=1.00E-05, loss= 1.1629 (max= 1.4680), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:25,063 - root - INFO - Step 22300: lr=1.00E-05, loss= 1.1629 (max= 1.4680), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:41,025 - root - INFO - Step 22310: lr=1.00E-05, loss= 1.1605 (max= 1.4284), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:41,026 - root - INFO - Step 22310: lr=1.00E-05, loss= 1.1605 (max= 1.4284), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:41,026 - root - INFO - Step 22310: lr=1.00E-05, loss= 1.1605 (max= 1.4284), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:41,026 - root - INFO - Step 22310: lr=1.00E-05, loss= 1.1605 (max= 1.4284), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:41,026 - root - INFO - Step 22310: lr=1.00E-05, 
loss= 1.1605 (max= 1.4284), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:41,026 - root - INFO - Step 22310: lr=1.00E-05, loss= 1.1605 (max= 1.4284), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:41,026 - root - INFO - Step 22310: lr=1.00E-05, loss= 1.1605 (max= 1.4284), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:41,026 - root - INFO - Step 22310: lr=1.00E-05, loss= 1.1605 (max= 1.4284), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:56,966 - root - INFO - Step 22320: lr=1.00E-05, loss= 1.2032 (max= 1.6044), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:56,966 - root - INFO - Step 22320: lr=1.00E-05, loss= 1.2032 (max= 1.6044), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:56,966 - root - INFO - Step 22320: lr=1.00E-05, loss= 1.2032 (max= 1.6044), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:56,966 - root - INFO - Step 22320: lr=1.00E-05, loss= 1.2032 (max= 1.6044), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:56,966 - root - INFO - Step 22320: lr=1.00E-05, loss= 1.2032 (max= 1.6044), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:56,966 - root - INFO - Step 22320: lr=1.00E-05, loss= 1.2032 (max= 1.6044), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:26:56,966 - root - INFO - Step 22320: lr=1.00E-05, loss= 1.2032 (max= 1.6044), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
20:26:56,966 - root - INFO - Step 22320: lr=1.00E-05, loss= 1.2032 (max= 1.6044), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:12,884 - root - INFO - Step 22330: lr=1.00E-05, loss= 1.1539 (max= 1.6228), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:12,885 - root - INFO - Step 22330: lr=1.00E-05, loss= 1.1539 (max= 1.6228), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:12,885 - root - INFO - Step 22330: lr=1.00E-05, loss= 1.1539 (max= 1.6228), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:12,885 - root - INFO - Step 22330: lr=1.00E-05, loss= 1.1539 (max= 1.6228), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:12,885 - root - INFO - Step 22330: lr=1.00E-05, loss= 1.1539 (max= 1.6228), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:12,885 - root - INFO - Step 22330: lr=1.00E-05, loss= 1.1539 (max= 1.6228), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:12,885 - root - INFO - Step 22330: lr=1.00E-05, loss= 1.1539 (max= 1.6228), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:12,885 - root - INFO - Step 22330: lr=1.00E-05, loss= 1.1539 (max= 1.6228), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:28,845 - root - INFO - Step 22340: lr=1.00E-05, loss= 1.1568 (max= 1.5490), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:28,845 - root - INFO - Step 22340: lr=1.00E-05, loss= 1.1568 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:28,845 - root - INFO - Step 22340: lr=1.00E-05, loss= 1.1568 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:28,845 - root - INFO - Step 22340: lr=1.00E-05, loss= 1.1568 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:28,845 - root - INFO - Step 22340: lr=1.00E-05, loss= 1.1568 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:28,845 - root - INFO - Step 22340: lr=1.00E-05, loss= 1.1568 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:28,846 - root - INFO - Step 22340: lr=1.00E-05, loss= 1.1568 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:28,846 - root - INFO - Step 22340: lr=1.00E-05, loss= 1.1568 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:27:44,769 - root - INFO - Step 22350: lr=1.00E-05, loss= 1.1775 (max= 1.5379), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:27:44,769 - root - INFO - Step 22350: lr=1.00E-05, loss= 1.1775 (max= 1.5379), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:27:44,769 - root - INFO - Step 22350: lr=1.00E-05, loss= 1.1775 (max= 1.5379), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:27:44,769 - root - INFO - Step 22350: lr=1.00E-05, loss= 1.1775 (max= 1.5379), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:27:44,769 - root - INFO - Step 22350: lr=1.00E-05, loss= 1.1775 (max= 1.5379), 
tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:27:44,769 - root - INFO - Step 22350: lr=1.00E-05, loss= 1.1775 (max= 1.5379), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:27:44,769 - root - INFO - Step 22350: lr=1.00E-05, loss= 1.1775 (max= 1.5379), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:27:44,770 - root - INFO - Step 22350: lr=1.00E-05, loss= 1.1775 (max= 1.5379), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:00,745 - root - INFO - Step 22360: lr=1.00E-05, loss= 1.1578 (max= 1.4564), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:00,745 - root - INFO - Step 22360: lr=1.00E-05, loss= 1.1578 (max= 1.4564), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:00,745 - root - INFO - Step 22360: lr=1.00E-05, loss= 1.1578 (max= 1.4564), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:00,745 - root - INFO - Step 22360: lr=1.00E-05, loss= 1.1578 (max= 1.4564), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:00,745 - root - INFO - Step 22360: lr=1.00E-05, loss= 1.1578 (max= 1.4564), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:00,745 - root - INFO - Step 22360: lr=1.00E-05, loss= 1.1578 (max= 1.4564), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:00,745 - root - INFO - Step 22360: lr=1.00E-05, loss= 1.1578 (max= 1.4564), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:00,745 - root - INFO - Step 
22360: lr=1.00E-05, loss= 1.1578 (max= 1.4564), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:16,656 - root - INFO - Step 22370: lr=1.00E-05, loss= 1.1852 (max= 1.6061), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:16,656 - root - INFO - Step 22370: lr=1.00E-05, loss= 1.1852 (max= 1.6061), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:16,656 - root - INFO - Step 22370: lr=1.00E-05, loss= 1.1852 (max= 1.6061), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:16,656 - root - INFO - Step 22370: lr=1.00E-05, loss= 1.1852 (max= 1.6061), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:16,656 - root - INFO - Step 22370: lr=1.00E-05, loss= 1.1852 (max= 1.6061), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:16,656 - root - INFO - Step 22370: lr=1.00E-05, loss= 1.1852 (max= 1.6061), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:16,656 - root - INFO - Step 22370: lr=1.00E-05, loss= 1.1852 (max= 1.6061), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:16,656 - root - INFO - Step 22370: lr=1.00E-05, loss= 1.1852 (max= 1.6061), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:32,649 - root - INFO - Step 22380: lr=1.00E-05, loss= 1.2055 (max= 1.6079), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:32,649 - root - INFO - Step 22380: lr=1.00E-05, loss= 1.2055 (max= 1.6079), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 20:28:32,649 - root - INFO - Step 22380: lr=1.00E-05, loss= 1.2055 (max= 1.6079), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:32,649 - root - INFO - Step 22380: lr=1.00E-05, loss= 1.2055 (max= 1.6079), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:32,649 - root - INFO - Step 22380: lr=1.00E-05, loss= 1.2055 (max= 1.6079), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:32,649 - root - INFO - Step 22380: lr=1.00E-05, loss= 1.2055 (max= 1.6079), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:32,649 - root - INFO - Step 22380: lr=1.00E-05, loss= 1.2055 (max= 1.6079), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:32,649 - root - INFO - Step 22380: lr=1.00E-05, loss= 1.2055 (max= 1.6079), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:28:48,550 - root - INFO - Step 22390: lr=1.00E-05, loss= 1.2070 (max= 1.6309), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:48,550 - root - INFO - Step 22390: lr=1.00E-05, loss= 1.2070 (max= 1.6309), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:48,550 - root - INFO - Step 22390: lr=1.00E-05, loss= 1.2070 (max= 1.6309), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:48,550 - root - INFO - Step 22390: lr=1.00E-05, loss= 1.2070 (max= 1.6309), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:48,550 - root - INFO - Step 22390: lr=1.00E-05, loss= 1.2070 (max= 1.6309), tps=20612, mfu=42.94%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:48,550 - root - INFO - Step 22390: lr=1.00E-05, loss= 1.2070 (max= 1.6309), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:48,550 - root - INFO - Step 22390: lr=1.00E-05, loss= 1.2070 (max= 1.6309), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:28:48,551 - root - INFO - Step 22390: lr=1.00E-05, loss= 1.2070 (max= 1.6309), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:29:04,489 - root - INFO - Step 22400: lr=1.00E-05, loss= 1.2067 (max= 1.5669), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:04,489 - root - INFO - Step 22400: lr=1.00E-05, loss= 1.2067 (max= 1.5669), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:04,489 - root - INFO - Step 22400: lr=1.00E-05, loss= 1.2067 (max= 1.5669), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:04,489 - root - INFO - Step 22400: lr=1.00E-05, loss= 1.2067 (max= 1.5669), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:04,489 - root - INFO - Step 22400: lr=1.00E-05, loss= 1.2067 (max= 1.5669), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:04,489 - root - INFO - Step 22400: lr=1.00E-05, loss= 1.2067 (max= 1.5669), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:04,489 - root - INFO - Step 22400: lr=1.00E-05, loss= 1.2067 (max= 1.5669), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:04,489 - root - INFO - Step 22400: lr=1.00E-05, loss= 1.2067 
(max= 1.5669), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:20,442 - root - INFO - Step 22410: lr=1.00E-05, loss= 1.1710 (max= 1.5201), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:20,442 - root - INFO - Step 22410: lr=1.00E-05, loss= 1.1710 (max= 1.5201), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:20,442 - root - INFO - Step 22410: lr=1.00E-05, loss= 1.1710 (max= 1.5201), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:20,442 - root - INFO - Step 22410: lr=1.00E-05, loss= 1.1710 (max= 1.5201), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:20,442 - root - INFO - Step 22410: lr=1.00E-05, loss= 1.1710 (max= 1.5201), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:20,442 - root - INFO - Step 22410: lr=1.00E-05, loss= 1.1710 (max= 1.5201), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:20,442 - root - INFO - Step 22410: lr=1.00E-05, loss= 1.1710 (max= 1.5201), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:20,442 - root - INFO - Step 22410: lr=1.00E-05, loss= 1.1710 (max= 1.5201), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:36,420 - root - INFO - Step 22420: lr=1.00E-05, loss= 1.2031 (max= 1.6812), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:36,420 - root - INFO - Step 22420: lr=1.00E-05, loss= 1.2031 (max= 1.6812), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:36,420 - root 
- INFO - Step 22420: lr=1.00E-05, loss= 1.2031 (max= 1.6812), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:36,420 - root - INFO - Step 22420: lr=1.00E-05, loss= 1.2031 (max= 1.6812), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:36,420 - root - INFO - Step 22420: lr=1.00E-05, loss= 1.2031 (max= 1.6812), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:36,420 - root - INFO - Step 22420: lr=1.00E-05, loss= 1.2031 (max= 1.6812), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:36,420 - root - INFO - Step 22420: lr=1.00E-05, loss= 1.2031 (max= 1.6812), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:36,420 - root - INFO - Step 22420: lr=1.00E-05, loss= 1.2031 (max= 1.6812), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:52,388 - root - INFO - Step 22430: lr=1.00E-05, loss= 1.1843 (max= 1.5859), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:52,388 - root - INFO - Step 22430: lr=1.00E-05, loss= 1.1843 (max= 1.5859), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:52,388 - root - INFO - Step 22430: lr=1.00E-05, loss= 1.1843 (max= 1.5859), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:52,388 - root - INFO - Step 22430: lr=1.00E-05, loss= 1.1843 (max= 1.5859), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:52,388 - root - INFO - Step 22430: lr=1.00E-05, loss= 1.1843 (max= 1.5859), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 20:29:52,388 - root - INFO - Step 22430: lr=1.00E-05, loss= 1.1843 (max= 1.5859), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:52,388 - root - INFO - Step 22430: lr=1.00E-05, loss= 1.1843 (max= 1.5859), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:29:52,388 - root - INFO - Step 22430: lr=1.00E-05, loss= 1.1843 (max= 1.5859), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:30:08,304 - root - INFO - Step 22440: lr=1.00E-05, loss= 1.1698 (max= 1.7200), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:08,305 - root - INFO - Step 22440: lr=1.00E-05, loss= 1.1698 (max= 1.7200), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:08,305 - root - INFO - Step 22440: lr=1.00E-05, loss= 1.1698 (max= 1.7200), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:08,305 - root - INFO - Step 22440: lr=1.00E-05, loss= 1.1698 (max= 1.7200), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:08,305 - root - INFO - Step 22440: lr=1.00E-05, loss= 1.1698 (max= 1.7200), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:08,305 - root - INFO - Step 22440: lr=1.00E-05, loss= 1.1698 (max= 1.7200), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:08,305 - root - INFO - Step 22440: lr=1.00E-05, loss= 1.1698 (max= 1.7200), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:08,305 - root - INFO - Step 22440: lr=1.00E-05, loss= 1.1698 (max= 1.7200), tps=20591, mfu=42.90%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:24,195 - root - INFO - Step 22450: lr=1.00E-05, loss= 1.1788 (max= 1.8543), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:24,195 - root - INFO - Step 22450: lr=1.00E-05, loss= 1.1788 (max= 1.8543), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:24,195 - root - INFO - Step 22450: lr=1.00E-05, loss= 1.1788 (max= 1.8543), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:24,195 - root - INFO - Step 22450: lr=1.00E-05, loss= 1.1788 (max= 1.8543), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:24,195 - root - INFO - Step 22450: lr=1.00E-05, loss= 1.1788 (max= 1.8543), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:24,195 - root - INFO - Step 22450: lr=1.00E-05, loss= 1.1788 (max= 1.8543), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:24,195 - root - INFO - Step 22450: lr=1.00E-05, loss= 1.1788 (max= 1.8543), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:24,195 - root - INFO - Step 22450: lr=1.00E-05, loss= 1.1788 (max= 1.8543), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:40,114 - root - INFO - Step 22460: lr=1.00E-05, loss= 1.1675 (max= 1.8026), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:40,114 - root - INFO - Step 22460: lr=1.00E-05, loss= 1.1675 (max= 1.8026), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:40,114 - root - INFO - Step 22460: lr=1.00E-05, 
loss= 1.1675 (max= 1.8026), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:40,114 - root - INFO - Step 22460: lr=1.00E-05, loss= 1.1675 (max= 1.8026), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:40,114 - root - INFO - Step 22460: lr=1.00E-05, loss= 1.1675 (max= 1.8026), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:40,114 - root - INFO - Step 22460: lr=1.00E-05, loss= 1.1675 (max= 1.8026), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:40,114 - root - INFO - Step 22460: lr=1.00E-05, loss= 1.1675 (max= 1.8026), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:40,114 - root - INFO - Step 22460: lr=1.00E-05, loss= 1.1675 (max= 1.8026), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:48,802 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6289006 +2025-10-24 20:30:56,057 - root - INFO - Step 22470: lr=1.00E-05, loss= 1.1917 (max= 1.6942), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:56,057 - root - INFO - Step 22470: lr=1.00E-05, loss= 1.1917 (max= 1.6942), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:56,057 - root - INFO - Step 22470: lr=1.00E-05, loss= 1.1917 (max= 1.6942), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:56,058 - root - INFO - Step 22470: lr=1.00E-05, loss= 1.1917 (max= 1.6942), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
20:30:56,058 - root - INFO - Step 22470: lr=1.00E-05, loss= 1.1917 (max= 1.6942), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:56,058 - root - INFO - Step 22470: lr=1.00E-05, loss= 1.1917 (max= 1.6942), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:56,058 - root - INFO - Step 22470: lr=1.00E-05, loss= 1.1917 (max= 1.6942), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:30:56,058 - root - INFO - Step 22470: lr=1.00E-05, loss= 1.1917 (max= 1.6942), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:12,012 - root - INFO - Step 22480: lr=1.00E-05, loss= 1.1808 (max= 1.5457), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:12,012 - root - INFO - Step 22480: lr=1.00E-05, loss= 1.1808 (max= 1.5457), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:12,012 - root - INFO - Step 22480: lr=1.00E-05, loss= 1.1808 (max= 1.5457), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:12,012 - root - INFO - Step 22480: lr=1.00E-05, loss= 1.1808 (max= 1.5457), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:12,012 - root - INFO - Step 22480: lr=1.00E-05, loss= 1.1808 (max= 1.5457), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:12,012 - root - INFO - Step 22480: lr=1.00E-05, loss= 1.1808 (max= 1.5457), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:12,012 - root - INFO - Step 22480: lr=1.00E-05, loss= 1.1808 (max= 1.5457), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:12,012 - root - INFO - Step 22480: lr=1.00E-05, loss= 1.1808 (max= 1.5457), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:27,931 - root - INFO - Step 22490: lr=1.00E-05, loss= 1.1793 (max= 1.7960), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:27,931 - root - INFO - Step 22490: lr=1.00E-05, loss= 1.1793 (max= 1.7960), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:27,931 - root - INFO - Step 22490: lr=1.00E-05, loss= 1.1793 (max= 1.7960), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:27,931 - root - INFO - Step 22490: lr=1.00E-05, loss= 1.1793 (max= 1.7960), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:27,931 - root - INFO - Step 22490: lr=1.00E-05, loss= 1.1793 (max= 1.7960), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:27,931 - root - INFO - Step 22490: lr=1.00E-05, loss= 1.1793 (max= 1.7960), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:27,931 - root - INFO - Step 22490: lr=1.00E-05, loss= 1.1793 (max= 1.7960), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:27,931 - root - INFO - Step 22490: lr=1.00E-05, loss= 1.1793 (max= 1.7960), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:43,918 - root - INFO - Step 22500: lr=1.00E-05, loss= 1.1815 (max= 1.5819), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:43,918 - root - INFO - Step 22500: lr=1.00E-05, loss= 1.1815 (max= 1.5819), 
tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:43,918 - root - INFO - Step 22500: lr=1.00E-05, loss= 1.1815 (max= 1.5819), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:43,918 - root - INFO - Step 22500: lr=1.00E-05, loss= 1.1815 (max= 1.5819), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:43,918 - root - INFO - Step 22500: lr=1.00E-05, loss= 1.1815 (max= 1.5819), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:43,918 - root - INFO - Step 22500: lr=1.00E-05, loss= 1.1815 (max= 1.5819), tps=20501, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:43,918 - root - INFO - Step 22500: lr=1.00E-05, loss= 1.1815 (max= 1.5819), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:43,918 - root - INFO - Step 22500: lr=1.00E-05, loss= 1.1815 (max= 1.5819), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:31:59,877 - root - INFO - Step 22510: lr=1.00E-05, loss= 1.1841 (max= 1.6526), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:59,877 - root - INFO - Step 22510: lr=1.00E-05, loss= 1.1841 (max= 1.6526), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:59,877 - root - INFO - Step 22510: lr=1.00E-05, loss= 1.1841 (max= 1.6526), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:59,877 - root - INFO - Step 22510: lr=1.00E-05, loss= 1.1841 (max= 1.6526), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:59,877 - root - INFO - Step 
22510: lr=1.00E-05, loss= 1.1841 (max= 1.6526), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:59,877 - root - INFO - Step 22510: lr=1.00E-05, loss= 1.1841 (max= 1.6526), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:59,877 - root - INFO - Step 22510: lr=1.00E-05, loss= 1.1841 (max= 1.6526), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:31:59,877 - root - INFO - Step 22510: lr=1.00E-05, loss= 1.1841 (max= 1.6526), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:15,792 - root - INFO - Step 22520: lr=1.00E-05, loss= 1.2034 (max= 1.8876), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:15,792 - root - INFO - Step 22520: lr=1.00E-05, loss= 1.2034 (max= 1.8876), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:15,792 - root - INFO - Step 22520: lr=1.00E-05, loss= 1.2034 (max= 1.8876), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:15,792 - root - INFO - Step 22520: lr=1.00E-05, loss= 1.2034 (max= 1.8876), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:15,792 - root - INFO - Step 22520: lr=1.00E-05, loss= 1.2034 (max= 1.8876), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:15,792 - root - INFO - Step 22520: lr=1.00E-05, loss= 1.2034 (max= 1.8876), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:15,792 - root - INFO - Step 22520: lr=1.00E-05, loss= 1.2034 (max= 1.8876), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 20:32:15,792 - root - INFO - Step 22520: lr=1.00E-05, loss= 1.2034 (max= 1.8876), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:31,741 - root - INFO - Step 22530: lr=1.00E-05, loss= 1.2189 (max= 1.7693), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:32:31,741 - root - INFO - Step 22530: lr=1.00E-05, loss= 1.2189 (max= 1.7693), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:32:31,741 - root - INFO - Step 22530: lr=1.00E-05, loss= 1.2189 (max= 1.7693), tps=20549, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:32:31,741 - root - INFO - Step 22530: lr=1.00E-05, loss= 1.2189 (max= 1.7693), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:32:31,741 - root - INFO - Step 22530: lr=1.00E-05, loss= 1.2189 (max= 1.7693), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:32:31,741 - root - INFO - Step 22530: lr=1.00E-05, loss= 1.2189 (max= 1.7693), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:32:31,741 - root - INFO - Step 22530: lr=1.00E-05, loss= 1.2189 (max= 1.7693), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:32:31,741 - root - INFO - Step 22530: lr=1.00E-05, loss= 1.2189 (max= 1.7693), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:32:47,690 - root - INFO - Step 22540: lr=1.00E-05, loss= 1.1732 (max= 1.4652), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:47,690 - root - INFO - Step 22540: lr=1.00E-05, loss= 1.1732 (max= 1.4652), tps=20550, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:47,690 - root - INFO - Step 22540: lr=1.00E-05, loss= 1.1732 (max= 1.4652), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:47,690 - root - INFO - Step 22540: lr=1.00E-05, loss= 1.1732 (max= 1.4652), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:47,690 - root - INFO - Step 22540: lr=1.00E-05, loss= 1.1732 (max= 1.4652), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:47,690 - root - INFO - Step 22540: lr=1.00E-05, loss= 1.1732 (max= 1.4652), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:47,690 - root - INFO - Step 22540: lr=1.00E-05, loss= 1.1732 (max= 1.4652), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:32:47,690 - root - INFO - Step 22540: lr=1.00E-05, loss= 1.1732 (max= 1.4652), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:03,640 - root - INFO - Step 22550: lr=1.00E-05, loss= 1.1798 (max= 1.6777), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:03,640 - root - INFO - Step 22550: lr=1.00E-05, loss= 1.1798 (max= 1.6777), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:03,641 - root - INFO - Step 22550: lr=1.00E-05, loss= 1.1798 (max= 1.6777), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:03,641 - root - INFO - Step 22550: lr=1.00E-05, loss= 1.1798 (max= 1.6777), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:03,641 - root - INFO - Step 22550: lr=1.00E-05, loss= 1.1798 
(max= 1.6777), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:03,641 - root - INFO - Step 22550: lr=1.00E-05, loss= 1.1798 (max= 1.6777), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:03,641 - root - INFO - Step 22550: lr=1.00E-05, loss= 1.1798 (max= 1.6777), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:03,641 - root - INFO - Step 22550: lr=1.00E-05, loss= 1.1798 (max= 1.6777), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:19,592 - root - INFO - Step 22560: lr=1.00E-05, loss= 1.1943 (max= 1.5601), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:19,592 - root - INFO - Step 22560: lr=1.00E-05, loss= 1.1943 (max= 1.5601), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:19,592 - root - INFO - Step 22560: lr=1.00E-05, loss= 1.1943 (max= 1.5601), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:19,592 - root - INFO - Step 22560: lr=1.00E-05, loss= 1.1943 (max= 1.5601), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:19,592 - root - INFO - Step 22560: lr=1.00E-05, loss= 1.1943 (max= 1.5601), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:19,592 - root - INFO - Step 22560: lr=1.00E-05, loss= 1.1943 (max= 1.5601), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:19,593 - root - INFO - Step 22560: lr=1.00E-05, loss= 1.1943 (max= 1.5601), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:19,593 - root 
- INFO - Step 22560: lr=1.00E-05, loss= 1.1943 (max= 1.5601), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:35,519 - root - INFO - Step 22570: lr=1.00E-05, loss= 1.1846 (max= 1.6635), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:35,519 - root - INFO - Step 22570: lr=1.00E-05, loss= 1.1846 (max= 1.6635), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:35,519 - root - INFO - Step 22570: lr=1.00E-05, loss= 1.1846 (max= 1.6635), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:35,519 - root - INFO - Step 22570: lr=1.00E-05, loss= 1.1846 (max= 1.6635), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:35,519 - root - INFO - Step 22570: lr=1.00E-05, loss= 1.1846 (max= 1.6635), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:35,519 - root - INFO - Step 22570: lr=1.00E-05, loss= 1.1846 (max= 1.6635), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:35,519 - root - INFO - Step 22570: lr=1.00E-05, loss= 1.1846 (max= 1.6635), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:35,519 - root - INFO - Step 22570: lr=1.00E-05, loss= 1.1846 (max= 1.6635), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:51,474 - root - INFO - Step 22580: lr=1.00E-05, loss= 1.1845 (max= 1.6007), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:51,474 - root - INFO - Step 22580: lr=1.00E-05, loss= 1.1845 (max= 1.6007), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 20:33:51,474 - root - INFO - Step 22580: lr=1.00E-05, loss= 1.1845 (max= 1.6007), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:51,474 - root - INFO - Step 22580: lr=1.00E-05, loss= 1.1845 (max= 1.6007), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:51,474 - root - INFO - Step 22580: lr=1.00E-05, loss= 1.1845 (max= 1.6007), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:51,474 - root - INFO - Step 22580: lr=1.00E-05, loss= 1.1845 (max= 1.6007), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:51,474 - root - INFO - Step 22580: lr=1.00E-05, loss= 1.1845 (max= 1.6007), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:33:51,474 - root - INFO - Step 22580: lr=1.00E-05, loss= 1.1845 (max= 1.6007), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:07,440 - root - INFO - Step 22590: lr=1.00E-05, loss= 1.1733 (max= 1.4870), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:07,440 - root - INFO - Step 22590: lr=1.00E-05, loss= 1.1733 (max= 1.4870), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:07,440 - root - INFO - Step 22590: lr=1.00E-05, loss= 1.1733 (max= 1.4870), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:07,440 - root - INFO - Step 22590: lr=1.00E-05, loss= 1.1733 (max= 1.4870), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:07,441 - root - INFO - Step 22590: lr=1.00E-05, loss= 1.1733 (max= 1.4870), tps=20528, mfu=42.77%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:07,441 - root - INFO - Step 22590: lr=1.00E-05, loss= 1.1733 (max= 1.4870), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:07,441 - root - INFO - Step 22590: lr=1.00E-05, loss= 1.1733 (max= 1.4870), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:07,441 - root - INFO - Step 22590: lr=1.00E-05, loss= 1.1733 (max= 1.4870), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:23,355 - root - INFO - Step 22600: lr=1.00E-05, loss= 1.1981 (max= 1.5552), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:23,355 - root - INFO - Step 22600: lr=1.00E-05, loss= 1.1981 (max= 1.5552), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:23,355 - root - INFO - Step 22600: lr=1.00E-05, loss= 1.1981 (max= 1.5552), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:23,355 - root - INFO - Step 22600: lr=1.00E-05, loss= 1.1981 (max= 1.5552), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:23,355 - root - INFO - Step 22600: lr=1.00E-05, loss= 1.1981 (max= 1.5552), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:23,355 - root - INFO - Step 22600: lr=1.00E-05, loss= 1.1981 (max= 1.5552), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:23,355 - root - INFO - Step 22600: lr=1.00E-05, loss= 1.1981 (max= 1.5552), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:23,355 - root - INFO - Step 22600: lr=1.00E-05, 
loss= 1.1981 (max= 1.5552), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:39,338 - root - INFO - Step 22610: lr=1.00E-05, loss= 1.1692 (max= 1.4672), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:39,338 - root - INFO - Step 22610: lr=1.00E-05, loss= 1.1692 (max= 1.4672), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:39,338 - root - INFO - Step 22610: lr=1.00E-05, loss= 1.1692 (max= 1.4672), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:39,338 - root - INFO - Step 22610: lr=1.00E-05, loss= 1.1692 (max= 1.4672), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:39,338 - root - INFO - Step 22610: lr=1.00E-05, loss= 1.1692 (max= 1.4672), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:39,339 - root - INFO - Step 22610: lr=1.00E-05, loss= 1.1692 (max= 1.4672), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:39,339 - root - INFO - Step 22610: lr=1.00E-05, loss= 1.1692 (max= 1.4672), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:39,339 - root - INFO - Step 22610: lr=1.00E-05, loss= 1.1692 (max= 1.4672), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:55,276 - root - INFO - Step 22620: lr=1.00E-05, loss= 1.1328 (max= 1.5259), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:55,276 - root - INFO - Step 22620: lr=1.00E-05, loss= 1.1328 (max= 1.5259), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
20:34:55,276 - root - INFO - Step 22620: lr=1.00E-05, loss= 1.1328 (max= 1.5259), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:55,276 - root - INFO - Step 22620: lr=1.00E-05, loss= 1.1328 (max= 1.5259), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:55,276 - root - INFO - Step 22620: lr=1.00E-05, loss= 1.1328 (max= 1.5259), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:55,276 - root - INFO - Step 22620: lr=1.00E-05, loss= 1.1328 (max= 1.5259), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:55,276 - root - INFO - Step 22620: lr=1.00E-05, loss= 1.1328 (max= 1.5259), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:34:55,276 - root - INFO - Step 22620: lr=1.00E-05, loss= 1.1328 (max= 1.5259), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:11,254 - root - INFO - Step 22630: lr=1.00E-05, loss= 1.1537 (max= 1.4499), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:11,254 - root - INFO - Step 22630: lr=1.00E-05, loss= 1.1537 (max= 1.4499), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:11,254 - root - INFO - Step 22630: lr=1.00E-05, loss= 1.1537 (max= 1.4499), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:11,254 - root - INFO - Step 22630: lr=1.00E-05, loss= 1.1537 (max= 1.4499), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:11,254 - root - INFO - Step 22630: lr=1.00E-05, loss= 1.1537 (max= 1.4499), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:11,254 - root - INFO - Step 22630: lr=1.00E-05, loss= 1.1537 (max= 1.4499), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:11,254 - root - INFO - Step 22630: lr=1.00E-05, loss= 1.1537 (max= 1.4499), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:11,254 - root - INFO - Step 22630: lr=1.00E-05, loss= 1.1537 (max= 1.4499), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:27,163 - root - INFO - Step 22640: lr=1.00E-05, loss= 1.1903 (max= 1.6132), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:27,163 - root - INFO - Step 22640: lr=1.00E-05, loss= 1.1903 (max= 1.6132), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:27,163 - root - INFO - Step 22640: lr=1.00E-05, loss= 1.1903 (max= 1.6132), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:27,163 - root - INFO - Step 22640: lr=1.00E-05, loss= 1.1903 (max= 1.6132), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:27,164 - root - INFO - Step 22640: lr=1.00E-05, loss= 1.1903 (max= 1.6132), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:27,164 - root - INFO - Step 22640: lr=1.00E-05, loss= 1.1903 (max= 1.6132), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:27,164 - root - INFO - Step 22640: lr=1.00E-05, loss= 1.1903 (max= 1.6132), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:27,164 - root - INFO - Step 22640: lr=1.00E-05, loss= 1.1903 (max= 1.6132), 
tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:43,097 - root - INFO - Step 22650: lr=1.00E-05, loss= 1.2118 (max= 1.5264), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:43,098 - root - INFO - Step 22650: lr=1.00E-05, loss= 1.2118 (max= 1.5264), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:43,098 - root - INFO - Step 22650: lr=1.00E-05, loss= 1.2118 (max= 1.5264), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:43,098 - root - INFO - Step 22650: lr=1.00E-05, loss= 1.2118 (max= 1.5264), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:43,098 - root - INFO - Step 22650: lr=1.00E-05, loss= 1.2118 (max= 1.5264), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:43,098 - root - INFO - Step 22650: lr=1.00E-05, loss= 1.2118 (max= 1.5264), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:43,098 - root - INFO - Step 22650: lr=1.00E-05, loss= 1.2118 (max= 1.5264), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:43,098 - root - INFO - Step 22650: lr=1.00E-05, loss= 1.2118 (max= 1.5264), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:59,002 - root - INFO - Step 22660: lr=1.00E-05, loss= 1.1632 (max= 1.4941), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:59,002 - root - INFO - Step 22660: lr=1.00E-05, loss= 1.1632 (max= 1.4941), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:59,002 - root - INFO - Step 
22660: lr=1.00E-05, loss= 1.1632 (max= 1.4941), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:59,002 - root - INFO - Step 22660: lr=1.00E-05, loss= 1.1632 (max= 1.4941), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:59,002 - root - INFO - Step 22660: lr=1.00E-05, loss= 1.1632 (max= 1.4941), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:59,002 - root - INFO - Step 22660: lr=1.00E-05, loss= 1.1632 (max= 1.4941), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:59,002 - root - INFO - Step 22660: lr=1.00E-05, loss= 1.1632 (max= 1.4941), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:35:59,002 - root - INFO - Step 22660: lr=1.00E-05, loss= 1.1632 (max= 1.4941), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:14,928 - root - INFO - Step 22670: lr=1.00E-05, loss= 1.1861 (max= 1.5995), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:14,928 - root - INFO - Step 22670: lr=1.00E-05, loss= 1.1861 (max= 1.5995), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:14,928 - root - INFO - Step 22670: lr=1.00E-05, loss= 1.1861 (max= 1.5995), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:14,928 - root - INFO - Step 22670: lr=1.00E-05, loss= 1.1861 (max= 1.5995), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:14,929 - root - INFO - Step 22670: lr=1.00E-05, loss= 1.1861 (max= 1.5995), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 20:36:14,929 - root - INFO - Step 22670: lr=1.00E-05, loss= 1.1861 (max= 1.5995), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:14,929 - root - INFO - Step 22670: lr=1.00E-05, loss= 1.1861 (max= 1.5995), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:14,929 - root - INFO - Step 22670: lr=1.00E-05, loss= 1.1861 (max= 1.5995), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:30,893 - root - INFO - Step 22680: lr=1.00E-05, loss= 1.1725 (max= 1.6180), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:30,893 - root - INFO - Step 22680: lr=1.00E-05, loss= 1.1725 (max= 1.6180), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:30,894 - root - INFO - Step 22680: lr=1.00E-05, loss= 1.1725 (max= 1.6180), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:30,894 - root - INFO - Step 22680: lr=1.00E-05, loss= 1.1725 (max= 1.6180), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:30,894 - root - INFO - Step 22680: lr=1.00E-05, loss= 1.1725 (max= 1.6180), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:30,894 - root - INFO - Step 22680: lr=1.00E-05, loss= 1.1725 (max= 1.6180), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:30,894 - root - INFO - Step 22680: lr=1.00E-05, loss= 1.1725 (max= 1.6180), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:30,894 - root - INFO - Step 22680: lr=1.00E-05, loss= 1.1725 (max= 1.6180), tps=20529, mfu=42.77%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:46,814 - root - INFO - Step 22690: lr=1.00E-05, loss= 1.1740 (max= 1.4953), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:46,814 - root - INFO - Step 22690: lr=1.00E-05, loss= 1.1740 (max= 1.4953), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:46,814 - root - INFO - Step 22690: lr=1.00E-05, loss= 1.1740 (max= 1.4953), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:46,814 - root - INFO - Step 22690: lr=1.00E-05, loss= 1.1740 (max= 1.4953), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:46,814 - root - INFO - Step 22690: lr=1.00E-05, loss= 1.1740 (max= 1.4953), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:46,814 - root - INFO - Step 22690: lr=1.00E-05, loss= 1.1740 (max= 1.4953), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:46,814 - root - INFO - Step 22690: lr=1.00E-05, loss= 1.1740 (max= 1.4953), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:36:46,815 - root - INFO - Step 22690: lr=1.00E-05, loss= 1.1740 (max= 1.4953), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:02,753 - root - INFO - Step 22700: lr=1.00E-05, loss= 1.1740 (max= 1.6488), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:02,753 - root - INFO - Step 22700: lr=1.00E-05, loss= 1.1740 (max= 1.6488), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:02,753 - root - INFO - Step 22700: lr=1.00E-05, loss= 1.1740 
(max= 1.6488), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:02,753 - root - INFO - Step 22700: lr=1.00E-05, loss= 1.1740 (max= 1.6488), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:02,753 - root - INFO - Step 22700: lr=1.00E-05, loss= 1.1740 (max= 1.6488), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:02,753 - root - INFO - Step 22700: lr=1.00E-05, loss= 1.1740 (max= 1.6488), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:02,753 - root - INFO - Step 22700: lr=1.00E-05, loss= 1.1740 (max= 1.6488), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:02,753 - root - INFO - Step 22700: lr=1.00E-05, loss= 1.1740 (max= 1.6488), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:18,691 - root - INFO - Step 22710: lr=1.00E-05, loss= 1.1615 (max= 1.4827), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:18,691 - root - INFO - Step 22710: lr=1.00E-05, loss= 1.1615 (max= 1.4827), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:18,691 - root - INFO - Step 22710: lr=1.00E-05, loss= 1.1615 (max= 1.4827), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:18,691 - root - INFO - Step 22710: lr=1.00E-05, loss= 1.1615 (max= 1.4827), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:18,691 - root - INFO - Step 22710: lr=1.00E-05, loss= 1.1615 (max= 1.4827), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:18,691 - root 
- INFO - Step 22710: lr=1.00E-05, loss= 1.1615 (max= 1.4827), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:18,692 - root - INFO - Step 22710: lr=1.00E-05, loss= 1.1615 (max= 1.4827), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:18,692 - root - INFO - Step 22710: lr=1.00E-05, loss= 1.1615 (max= 1.4827), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:34,659 - root - INFO - Step 22720: lr=1.00E-05, loss= 1.1604 (max= 1.5477), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:34,659 - root - INFO - Step 22720: lr=1.00E-05, loss= 1.1604 (max= 1.5477), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:34,659 - root - INFO - Step 22720: lr=1.00E-05, loss= 1.1604 (max= 1.5477), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:34,659 - root - INFO - Step 22720: lr=1.00E-05, loss= 1.1604 (max= 1.5477), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:34,659 - root - INFO - Step 22720: lr=1.00E-05, loss= 1.1604 (max= 1.5477), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:34,659 - root - INFO - Step 22720: lr=1.00E-05, loss= 1.1604 (max= 1.5477), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:34,659 - root - INFO - Step 22720: lr=1.00E-05, loss= 1.1604 (max= 1.5477), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:34,659 - root - INFO - Step 22720: lr=1.00E-05, loss= 1.1604 (max= 1.5477), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 20:37:35,416 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:1200191 +2025-10-24 20:37:50,599 - root - INFO - Step 22730: lr=1.00E-05, loss= 1.1772 (max= 1.5500), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:50,599 - root - INFO - Step 22730: lr=1.00E-05, loss= 1.1772 (max= 1.5500), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:50,599 - root - INFO - Step 22730: lr=1.00E-05, loss= 1.1772 (max= 1.5500), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:50,599 - root - INFO - Step 22730: lr=1.00E-05, loss= 1.1772 (max= 1.5500), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:50,599 - root - INFO - Step 22730: lr=1.00E-05, loss= 1.1772 (max= 1.5500), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:50,599 - root - INFO - Step 22730: lr=1.00E-05, loss= 1.1772 (max= 1.5500), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:50,599 - root - INFO - Step 22730: lr=1.00E-05, loss= 1.1772 (max= 1.5500), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:37:50,599 - root - INFO - Step 22730: lr=1.00E-05, loss= 1.1772 (max= 1.5500), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:06,514 - root - INFO - Step 22740: lr=1.00E-05, loss= 1.1684 (max= 1.5724), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:06,514 - root - INFO - Step 22740: lr=1.00E-05, loss= 1.1684 (max= 1.5724), tps=20594, mfu=42.91%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:06,514 - root - INFO - Step 22740: lr=1.00E-05, loss= 1.1684 (max= 1.5724), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:06,514 - root - INFO - Step 22740: lr=1.00E-05, loss= 1.1684 (max= 1.5724), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:06,514 - root - INFO - Step 22740: lr=1.00E-05, loss= 1.1684 (max= 1.5724), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:06,514 - root - INFO - Step 22740: lr=1.00E-05, loss= 1.1684 (max= 1.5724), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:06,515 - root - INFO - Step 22740: lr=1.00E-05, loss= 1.1684 (max= 1.5724), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:06,515 - root - INFO - Step 22740: lr=1.00E-05, loss= 1.1684 (max= 1.5724), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:22,460 - root - INFO - Step 22750: lr=1.00E-05, loss= 1.1985 (max= 1.6320), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:22,460 - root - INFO - Step 22750: lr=1.00E-05, loss= 1.1985 (max= 1.6320), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:22,460 - root - INFO - Step 22750: lr=1.00E-05, loss= 1.1985 (max= 1.6320), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:22,460 - root - INFO - Step 22750: lr=1.00E-05, loss= 1.1985 (max= 1.6320), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:22,461 - root - INFO - Step 22750: lr=1.00E-05, 
loss= 1.1985 (max= 1.6320), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:22,461 - root - INFO - Step 22750: lr=1.00E-05, loss= 1.1985 (max= 1.6320), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:22,461 - root - INFO - Step 22750: lr=1.00E-05, loss= 1.1985 (max= 1.6320), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:22,461 - root - INFO - Step 22750: lr=1.00E-05, loss= 1.1985 (max= 1.6320), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:38,398 - root - INFO - Step 22760: lr=1.00E-05, loss= 1.1756 (max= 1.6099), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:38,399 - root - INFO - Step 22760: lr=1.00E-05, loss= 1.1756 (max= 1.6099), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:38,399 - root - INFO - Step 22760: lr=1.00E-05, loss= 1.1756 (max= 1.6099), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:38,399 - root - INFO - Step 22760: lr=1.00E-05, loss= 1.1756 (max= 1.6099), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:38,399 - root - INFO - Step 22760: lr=1.00E-05, loss= 1.1756 (max= 1.6099), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:38,399 - root - INFO - Step 22760: lr=1.00E-05, loss= 1.1756 (max= 1.6099), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:38,399 - root - INFO - Step 22760: lr=1.00E-05, loss= 1.1756 (max= 1.6099), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
20:38:38,399 - root - INFO - Step 22760: lr=1.00E-05, loss= 1.1756 (max= 1.6099), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:54,336 - root - INFO - Step 22770: lr=1.00E-05, loss= 1.1497 (max= 1.5296), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:54,337 - root - INFO - Step 22770: lr=1.00E-05, loss= 1.1497 (max= 1.5296), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:54,337 - root - INFO - Step 22770: lr=1.00E-05, loss= 1.1497 (max= 1.5296), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:54,337 - root - INFO - Step 22770: lr=1.00E-05, loss= 1.1497 (max= 1.5296), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:54,337 - root - INFO - Step 22770: lr=1.00E-05, loss= 1.1497 (max= 1.5296), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:54,337 - root - INFO - Step 22770: lr=1.00E-05, loss= 1.1497 (max= 1.5296), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:54,337 - root - INFO - Step 22770: lr=1.00E-05, loss= 1.1497 (max= 1.5296), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:38:54,337 - root - INFO - Step 22770: lr=1.00E-05, loss= 1.1497 (max= 1.5296), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:10,315 - root - INFO - Step 22780: lr=1.00E-05, loss= 1.2012 (max= 1.7391), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:10,315 - root - INFO - Step 22780: lr=1.00E-05, loss= 1.2012 (max= 1.7391), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:10,315 - root - INFO - Step 22780: lr=1.00E-05, loss= 1.2012 (max= 1.7391), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:10,315 - root - INFO - Step 22780: lr=1.00E-05, loss= 1.2012 (max= 1.7391), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:10,315 - root - INFO - Step 22780: lr=1.00E-05, loss= 1.2012 (max= 1.7391), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:10,315 - root - INFO - Step 22780: lr=1.00E-05, loss= 1.2012 (max= 1.7391), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:10,315 - root - INFO - Step 22780: lr=1.00E-05, loss= 1.2012 (max= 1.7391), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:10,315 - root - INFO - Step 22780: lr=1.00E-05, loss= 1.2012 (max= 1.7391), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:26,255 - root - INFO - Step 22790: lr=1.00E-05, loss= 1.1613 (max= 1.5580), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:26,255 - root - INFO - Step 22790: lr=1.00E-05, loss= 1.1613 (max= 1.5580), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:26,255 - root - INFO - Step 22790: lr=1.00E-05, loss= 1.1613 (max= 1.5580), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:26,255 - root - INFO - Step 22790: lr=1.00E-05, loss= 1.1613 (max= 1.5580), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:26,255 - root - INFO - Step 22790: lr=1.00E-05, loss= 1.1613 (max= 1.5580), 
tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:26,255 - root - INFO - Step 22790: lr=1.00E-05, loss= 1.1613 (max= 1.5580), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:26,255 - root - INFO - Step 22790: lr=1.00E-05, loss= 1.1613 (max= 1.5580), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:26,255 - root - INFO - Step 22790: lr=1.00E-05, loss= 1.1613 (max= 1.5580), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:42,163 - root - INFO - Step 22800: lr=1.00E-05, loss= 1.1991 (max= 1.5062), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:42,164 - root - INFO - Step 22800: lr=1.00E-05, loss= 1.1991 (max= 1.5062), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:42,164 - root - INFO - Step 22800: lr=1.00E-05, loss= 1.1991 (max= 1.5062), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:42,164 - root - INFO - Step 22800: lr=1.00E-05, loss= 1.1991 (max= 1.5062), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:42,164 - root - INFO - Step 22800: lr=1.00E-05, loss= 1.1991 (max= 1.5062), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:42,164 - root - INFO - Step 22800: lr=1.00E-05, loss= 1.1991 (max= 1.5062), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:42,164 - root - INFO - Step 22800: lr=1.00E-05, loss= 1.1991 (max= 1.5062), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:42,164 - root - INFO - Step 
22800: lr=1.00E-05, loss= 1.1991 (max= 1.5062), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:58,106 - root - INFO - Step 22810: lr=1.00E-05, loss= 1.1718 (max= 1.5046), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:58,106 - root - INFO - Step 22810: lr=1.00E-05, loss= 1.1718 (max= 1.5046), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:58,106 - root - INFO - Step 22810: lr=1.00E-05, loss= 1.1718 (max= 1.5046), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:58,106 - root - INFO - Step 22810: lr=1.00E-05, loss= 1.1718 (max= 1.5046), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:58,106 - root - INFO - Step 22810: lr=1.00E-05, loss= 1.1718 (max= 1.5046), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:58,106 - root - INFO - Step 22810: lr=1.00E-05, loss= 1.1718 (max= 1.5046), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:58,106 - root - INFO - Step 22810: lr=1.00E-05, loss= 1.1718 (max= 1.5046), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:39:58,106 - root - INFO - Step 22810: lr=1.00E-05, loss= 1.1718 (max= 1.5046), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:40:06,822 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2295308 +2025-10-24 20:40:14,041 - root - INFO - Step 22820: lr=1.00E-05, loss= 1.1852 (max= 1.6277), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 
0.00%) +2025-10-24 20:40:14,041 - root - INFO - Step 22820: lr=1.00E-05, loss= 1.1852 (max= 1.6277), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:14,041 - root - INFO - Step 22820: lr=1.00E-05, loss= 1.1852 (max= 1.6277), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:14,041 - root - INFO - Step 22820: lr=1.00E-05, loss= 1.1852 (max= 1.6277), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:14,041 - root - INFO - Step 22820: lr=1.00E-05, loss= 1.1852 (max= 1.6277), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:14,041 - root - INFO - Step 22820: lr=1.00E-05, loss= 1.1852 (max= 1.6277), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:14,041 - root - INFO - Step 22820: lr=1.00E-05, loss= 1.1852 (max= 1.6277), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:14,041 - root - INFO - Step 22820: lr=1.00E-05, loss= 1.1852 (max= 1.6277), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:29,971 - root - INFO - Step 22830: lr=1.00E-05, loss= 1.2020 (max= 1.5834), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:29,971 - root - INFO - Step 22830: lr=1.00E-05, loss= 1.2020 (max= 1.5834), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:29,971 - root - INFO - Step 22830: lr=1.00E-05, loss= 1.2020 (max= 1.5834), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:29,971 - root - INFO - Step 22830: lr=1.00E-05, loss= 1.2020 (max= 1.5834), tps=20575, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:29,971 - root - INFO - Step 22830: lr=1.00E-05, loss= 1.2020 (max= 1.5834), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:29,971 - root - INFO - Step 22830: lr=1.00E-05, loss= 1.2020 (max= 1.5834), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:29,971 - root - INFO - Step 22830: lr=1.00E-05, loss= 1.2020 (max= 1.5834), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:29,971 - root - INFO - Step 22830: lr=1.00E-05, loss= 1.2020 (max= 1.5834), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:40:45,841 - root - INFO - Step 22840: lr=1.00E-05, loss= 1.2057 (max= 1.6422), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:40:45,841 - root - INFO - Step 22840: lr=1.00E-05, loss= 1.2057 (max= 1.6422), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:40:45,841 - root - INFO - Step 22840: lr=1.00E-05, loss= 1.2057 (max= 1.6422), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:40:45,842 - root - INFO - Step 22840: lr=1.00E-05, loss= 1.2057 (max= 1.6422), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:40:45,842 - root - INFO - Step 22840: lr=1.00E-05, loss= 1.2057 (max= 1.6422), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:40:45,842 - root - INFO - Step 22840: lr=1.00E-05, loss= 1.2057 (max= 1.6422), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:40:45,842 - root - INFO - Step 22840: lr=1.00E-05, loss= 1.2057 
(max= 1.6422), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:40:45,842 - root - INFO - Step 22840: lr=1.00E-05, loss= 1.2057 (max= 1.6422), tps=20651, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:01,784 - root - INFO - Step 22850: lr=1.00E-05, loss= 1.1779 (max= 1.5500), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:01,784 - root - INFO - Step 22850: lr=1.00E-05, loss= 1.1779 (max= 1.5500), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:01,784 - root - INFO - Step 22850: lr=1.00E-05, loss= 1.1779 (max= 1.5500), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:01,784 - root - INFO - Step 22850: lr=1.00E-05, loss= 1.1779 (max= 1.5500), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:01,784 - root - INFO - Step 22850: lr=1.00E-05, loss= 1.1779 (max= 1.5500), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:01,784 - root - INFO - Step 22850: lr=1.00E-05, loss= 1.1779 (max= 1.5500), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:01,784 - root - INFO - Step 22850: lr=1.00E-05, loss= 1.1779 (max= 1.5500), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:01,784 - root - INFO - Step 22850: lr=1.00E-05, loss= 1.1779 (max= 1.5500), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:17,769 - root - INFO - Step 22860: lr=1.00E-05, loss= 1.1647 (max= 1.4980), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:17,769 - root 
- INFO - Step 22860: lr=1.00E-05, loss= 1.1647 (max= 1.4980), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:17,769 - root - INFO - Step 22860: lr=1.00E-05, loss= 1.1647 (max= 1.4980), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:17,769 - root - INFO - Step 22860: lr=1.00E-05, loss= 1.1647 (max= 1.4980), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:17,769 - root - INFO - Step 22860: lr=1.00E-05, loss= 1.1647 (max= 1.4980), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:17,769 - root - INFO - Step 22860: lr=1.00E-05, loss= 1.1647 (max= 1.4980), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:17,769 - root - INFO - Step 22860: lr=1.00E-05, loss= 1.1647 (max= 1.4980), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:17,769 - root - INFO - Step 22860: lr=1.00E-05, loss= 1.1647 (max= 1.4980), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:33,711 - root - INFO - Step 22870: lr=1.00E-05, loss= 1.2038 (max= 1.5420), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:41:33,711 - root - INFO - Step 22870: lr=1.00E-05, loss= 1.2038 (max= 1.5420), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:41:33,711 - root - INFO - Step 22870: lr=1.00E-05, loss= 1.2038 (max= 1.5420), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:41:33,711 - root - INFO - Step 22870: lr=1.00E-05, loss= 1.2038 (max= 1.5420), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 20:41:33,711 - root - INFO - Step 22870: lr=1.00E-05, loss= 1.2038 (max= 1.5420), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:41:33,711 - root - INFO - Step 22870: lr=1.00E-05, loss= 1.2038 (max= 1.5420), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:41:33,711 - root - INFO - Step 22870: lr=1.00E-05, loss= 1.2038 (max= 1.5420), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:41:33,711 - root - INFO - Step 22870: lr=1.00E-05, loss= 1.2038 (max= 1.5420), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:41:49,637 - root - INFO - Step 22880: lr=1.00E-05, loss= 1.1467 (max= 1.5257), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:49,637 - root - INFO - Step 22880: lr=1.00E-05, loss= 1.1467 (max= 1.5257), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:49,637 - root - INFO - Step 22880: lr=1.00E-05, loss= 1.1467 (max= 1.5257), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:49,637 - root - INFO - Step 22880: lr=1.00E-05, loss= 1.1467 (max= 1.5257), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:49,637 - root - INFO - Step 22880: lr=1.00E-05, loss= 1.1467 (max= 1.5257), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:49,638 - root - INFO - Step 22880: lr=1.00E-05, loss= 1.1467 (max= 1.5257), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:49,638 - root - INFO - Step 22880: lr=1.00E-05, loss= 1.1467 (max= 1.5257), tps=20580, mfu=42.88%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:41:49,638 - root - INFO - Step 22880: lr=1.00E-05, loss= 1.1467 (max= 1.5257), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:42:05,583 - root - INFO - Step 22890: lr=1.00E-05, loss= 1.1820 (max= 1.6503), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:05,583 - root - INFO - Step 22890: lr=1.00E-05, loss= 1.1820 (max= 1.6503), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:05,583 - root - INFO - Step 22890: lr=1.00E-05, loss= 1.1820 (max= 1.6503), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:05,583 - root - INFO - Step 22890: lr=1.00E-05, loss= 1.1820 (max= 1.6503), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:05,583 - root - INFO - Step 22890: lr=1.00E-05, loss= 1.1820 (max= 1.6503), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:05,583 - root - INFO - Step 22890: lr=1.00E-05, loss= 1.1820 (max= 1.6503), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:05,583 - root - INFO - Step 22890: lr=1.00E-05, loss= 1.1820 (max= 1.6503), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:05,583 - root - INFO - Step 22890: lr=1.00E-05, loss= 1.1820 (max= 1.6503), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:21,555 - root - INFO - Step 22900: lr=1.00E-05, loss= 1.1703 (max= 1.6880), tps=20521, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:21,555 - root - INFO - Step 22900: lr=1.00E-05, 
loss= 1.1703 (max= 1.6880), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:21,555 - root - INFO - Step 22900: lr=1.00E-05, loss= 1.1703 (max= 1.6880), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:21,555 - root - INFO - Step 22900: lr=1.00E-05, loss= 1.1703 (max= 1.6880), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:21,555 - root - INFO - Step 22900: lr=1.00E-05, loss= 1.1703 (max= 1.6880), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:21,555 - root - INFO - Step 22900: lr=1.00E-05, loss= 1.1703 (max= 1.6880), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:21,555 - root - INFO - Step 22900: lr=1.00E-05, loss= 1.1703 (max= 1.6880), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:21,556 - root - INFO - Step 22900: lr=1.00E-05, loss= 1.1703 (max= 1.6880), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:37,476 - root - INFO - Step 22910: lr=1.00E-05, loss= 1.1912 (max= 1.5393), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:37,476 - root - INFO - Step 22910: lr=1.00E-05, loss= 1.1912 (max= 1.5393), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:37,476 - root - INFO - Step 22910: lr=1.00E-05, loss= 1.1912 (max= 1.5393), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:37,476 - root - INFO - Step 22910: lr=1.00E-05, loss= 1.1912 (max= 1.5393), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
20:42:37,476 - root - INFO - Step 22910: lr=1.00E-05, loss= 1.1912 (max= 1.5393), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:37,476 - root - INFO - Step 22910: lr=1.00E-05, loss= 1.1912 (max= 1.5393), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:37,476 - root - INFO - Step 22910: lr=1.00E-05, loss= 1.1912 (max= 1.5393), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:37,476 - root - INFO - Step 22910: lr=1.00E-05, loss= 1.1912 (max= 1.5393), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:53,421 - root - INFO - Step 22920: lr=1.00E-05, loss= 1.2082 (max= 1.6448), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:53,421 - root - INFO - Step 22920: lr=1.00E-05, loss= 1.2082 (max= 1.6448), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:53,421 - root - INFO - Step 22920: lr=1.00E-05, loss= 1.2082 (max= 1.6448), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:53,421 - root - INFO - Step 22920: lr=1.00E-05, loss= 1.2082 (max= 1.6448), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:53,421 - root - INFO - Step 22920: lr=1.00E-05, loss= 1.2082 (max= 1.6448), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:53,421 - root - INFO - Step 22920: lr=1.00E-05, loss= 1.2082 (max= 1.6448), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:53,421 - root - INFO - Step 22920: lr=1.00E-05, loss= 1.2082 (max= 1.6448), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:42:53,421 - root - INFO - Step 22920: lr=1.00E-05, loss= 1.2082 (max= 1.6448), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:09,351 - root - INFO - Step 22930: lr=1.00E-05, loss= 1.1624 (max= 1.5206), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:09,351 - root - INFO - Step 22930: lr=1.00E-05, loss= 1.1624 (max= 1.5206), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:09,352 - root - INFO - Step 22930: lr=1.00E-05, loss= 1.1624 (max= 1.5206), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:09,352 - root - INFO - Step 22930: lr=1.00E-05, loss= 1.1624 (max= 1.5206), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:09,352 - root - INFO - Step 22930: lr=1.00E-05, loss= 1.1624 (max= 1.5206), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:09,352 - root - INFO - Step 22930: lr=1.00E-05, loss= 1.1624 (max= 1.5206), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:09,352 - root - INFO - Step 22930: lr=1.00E-05, loss= 1.1624 (max= 1.5206), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:09,352 - root - INFO - Step 22930: lr=1.00E-05, loss= 1.1624 (max= 1.5206), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:25,246 - root - INFO - Step 22940: lr=1.00E-05, loss= 1.1283 (max= 1.4362), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:25,246 - root - INFO - Step 22940: lr=1.00E-05, loss= 1.1283 (max= 1.4362), 
tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:25,246 - root - INFO - Step 22940: lr=1.00E-05, loss= 1.1283 (max= 1.4362), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:25,246 - root - INFO - Step 22940: lr=1.00E-05, loss= 1.1283 (max= 1.4362), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:25,246 - root - INFO - Step 22940: lr=1.00E-05, loss= 1.1283 (max= 1.4362), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:25,246 - root - INFO - Step 22940: lr=1.00E-05, loss= 1.1283 (max= 1.4362), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:25,247 - root - INFO - Step 22940: lr=1.00E-05, loss= 1.1283 (max= 1.4362), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:25,247 - root - INFO - Step 22940: lr=1.00E-05, loss= 1.1283 (max= 1.4362), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:41,191 - root - INFO - Step 22950: lr=1.00E-05, loss= 1.1943 (max= 1.5156), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:41,191 - root - INFO - Step 22950: lr=1.00E-05, loss= 1.1943 (max= 1.5156), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:41,191 - root - INFO - Step 22950: lr=1.00E-05, loss= 1.1943 (max= 1.5156), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:41,191 - root - INFO - Step 22950: lr=1.00E-05, loss= 1.1943 (max= 1.5156), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:41,192 - root - INFO - Step 
22950: lr=1.00E-05, loss= 1.1943 (max= 1.5156), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:41,192 - root - INFO - Step 22950: lr=1.00E-05, loss= 1.1943 (max= 1.5156), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:41,192 - root - INFO - Step 22950: lr=1.00E-05, loss= 1.1943 (max= 1.5156), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:41,192 - root - INFO - Step 22950: lr=1.00E-05, loss= 1.1943 (max= 1.5156), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:43:57,197 - root - INFO - Step 22960: lr=1.00E-05, loss= 1.1747 (max= 1.4944), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:57,197 - root - INFO - Step 22960: lr=1.00E-05, loss= 1.1747 (max= 1.4944), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:57,197 - root - INFO - Step 22960: lr=1.00E-05, loss= 1.1747 (max= 1.4944), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:57,197 - root - INFO - Step 22960: lr=1.00E-05, loss= 1.1747 (max= 1.4944), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:57,197 - root - INFO - Step 22960: lr=1.00E-05, loss= 1.1747 (max= 1.4944), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:57,197 - root - INFO - Step 22960: lr=1.00E-05, loss= 1.1747 (max= 1.4944), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:43:57,197 - root - INFO - Step 22960: lr=1.00E-05, loss= 1.1747 (max= 1.4944), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 20:43:57,197 - root - INFO - Step 22960: lr=1.00E-05, loss= 1.1747 (max= 1.4944), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:44:13,103 - root - INFO - Step 22970: lr=1.00E-05, loss= 1.1547 (max= 1.5986), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:13,103 - root - INFO - Step 22970: lr=1.00E-05, loss= 1.1547 (max= 1.5986), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:13,103 - root - INFO - Step 22970: lr=1.00E-05, loss= 1.1547 (max= 1.5986), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:13,103 - root - INFO - Step 22970: lr=1.00E-05, loss= 1.1547 (max= 1.5986), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:13,103 - root - INFO - Step 22970: lr=1.00E-05, loss= 1.1547 (max= 1.5986), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:13,103 - root - INFO - Step 22970: lr=1.00E-05, loss= 1.1547 (max= 1.5986), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:13,103 - root - INFO - Step 22970: lr=1.00E-05, loss= 1.1547 (max= 1.5986), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:13,103 - root - INFO - Step 22970: lr=1.00E-05, loss= 1.1547 (max= 1.5986), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:29,065 - root - INFO - Step 22980: lr=1.00E-05, loss= 1.2151 (max= 1.5861), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:29,065 - root - INFO - Step 22980: lr=1.00E-05, loss= 1.2151 (max= 1.5861), tps=20533, mfu=42.78%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:29,065 - root - INFO - Step 22980: lr=1.00E-05, loss= 1.2151 (max= 1.5861), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:29,065 - root - INFO - Step 22980: lr=1.00E-05, loss= 1.2151 (max= 1.5861), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:29,065 - root - INFO - Step 22980: lr=1.00E-05, loss= 1.2151 (max= 1.5861), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:29,065 - root - INFO - Step 22980: lr=1.00E-05, loss= 1.2151 (max= 1.5861), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:29,066 - root - INFO - Step 22980: lr=1.00E-05, loss= 1.2151 (max= 1.5861), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:29,066 - root - INFO - Step 22980: lr=1.00E-05, loss= 1.2151 (max= 1.5861), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:44,988 - root - INFO - Step 22990: lr=1.00E-05, loss= 1.2039 (max= 1.6067), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:44,988 - root - INFO - Step 22990: lr=1.00E-05, loss= 1.2039 (max= 1.6067), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:44,989 - root - INFO - Step 22990: lr=1.00E-05, loss= 1.2039 (max= 1.6067), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:44,989 - root - INFO - Step 22990: lr=1.00E-05, loss= 1.2039 (max= 1.6067), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:44,989 - root - INFO - Step 22990: lr=1.00E-05, loss= 1.2039 
(max= 1.6067), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:44,989 - root - INFO - Step 22990: lr=1.00E-05, loss= 1.2039 (max= 1.6067), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:44,989 - root - INFO - Step 22990: lr=1.00E-05, loss= 1.2039 (max= 1.6067), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:44:44,989 - root - INFO - Step 22990: lr=1.00E-05, loss= 1.2039 (max= 1.6067), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-23000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-23000! Save time: 4.268286466598511 +2025-10-24 20:45:00,959 - root - INFO - Step 23000: lr=1.00E-05, loss= 1.1772 (max= 1.6118), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:00,959 - root - INFO - Saving a full checkpoint at step 23000 +2025-10-24 20:45:00,959 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:45:00,959 - root - INFO - Step 23000: lr=1.00E-05, loss= 1.1772 (max= 1.6118), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:00,959 - root - INFO - Saving a full checkpoint at step 23000 +2025-10-24 20:45:00,959 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:45:00,959 - root - INFO - Step 23000: lr=1.00E-05, loss= 1.1772 (max= 1.6118), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:00,959 - root - INFO - Saving a full checkpoint at step 23000 +2025-10-24 20:45:00,959 - root - INFO - Step 23000: 
lr=1.00E-05, loss= 1.1772 (max= 1.6118), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:00,959 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:45:00,959 - root - INFO - Step 23000: lr=1.00E-05, loss= 1.1772 (max= 1.6118), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:00,959 - root - INFO - Saving a full checkpoint at step 23000 +2025-10-24 20:45:00,959 - root - INFO - Step 23000: lr=1.00E-05, loss= 1.1772 (max= 1.6118), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:00,959 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:45:00,959 - root - INFO - Saving a full checkpoint at step 23000 +2025-10-24 20:45:00,959 - root - INFO - Step 23000: lr=1.00E-05, loss= 1.1772 (max= 1.6118), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:00,959 - root - INFO - Saving a full checkpoint at step 23000 +2025-10-24 20:45:00,959 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:45:00,959 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:45:00,959 - root - INFO - Saving a full checkpoint at step 23000 +2025-10-24 20:45:00,959 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:45:00,960 - root - INFO - Step 23000: lr=1.00E-05, loss= 1.1772 (max= 1.6118), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:00,960 - root - INFO - Saving a full checkpoint at step 23000 +2025-10-24 
20:45:00,960 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 20:45:15,050 - root - INFO - Finished saving the checkpoint in 14.09 seconds +2025-10-24 20:45:15,059 - root - INFO - Finished saving the checkpoint in 14.10 seconds +2025-10-24 20:45:15,059 - root - INFO - Finished saving the checkpoint in 14.10 seconds +2025-10-24 20:45:15,059 - root - INFO - Finished saving the checkpoint in 14.10 seconds +2025-10-24 20:45:15,060 - root - INFO - Finished saving the checkpoint in 14.10 seconds +2025-10-24 20:45:15,060 - root - INFO - Finished saving the checkpoint in 14.10 seconds +2025-10-24 20:45:15,060 - root - INFO - Finished saving the checkpoint in 14.10 seconds +2025-10-24 20:45:15,064 - root - INFO - Finished saving the checkpoint in 14.10 seconds +2025-10-24 20:45:30,933 - root - INFO - Step 23010: lr=1.00E-05, loss= 1.1781 (max= 1.4973), tps=10933, mfu=22.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:45:30,933 - root - INFO - Step 23010: lr=1.00E-05, loss= 1.1781 (max= 1.4973), tps=10933, mfu=22.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:45:30,933 - root - INFO - Step 23010: lr=1.00E-05, loss= 1.1781 (max= 1.4973), tps=10933, mfu=22.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:45:30,934 - root - INFO - Step 23010: lr=1.00E-05, loss= 1.1781 (max= 1.4973), tps=10933, mfu=22.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:45:30,934 - root - INFO - Step 23010: lr=1.00E-05, loss= 1.1781 (max= 1.4973), tps=10933, mfu=22.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:45:30,934 - root - INFO - Step 23010: lr=1.00E-05, loss= 1.1781 (max= 1.4973), tps=10933, mfu=22.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:45:30,934 - root - INFO - 
Step 23010: lr=1.00E-05, loss= 1.1781 (max= 1.4973), tps=10933, mfu=22.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:45:30,934 - root - INFO - Step 23010: lr=1.00E-05, loss= 1.1781 (max= 1.4973), tps=10933, mfu=22.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:45:46,876 - root - INFO - Step 23020: lr=1.00E-05, loss= 1.2063 (max= 1.5824), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:46,876 - root - INFO - Step 23020: lr=1.00E-05, loss= 1.2063 (max= 1.5824), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:46,876 - root - INFO - Step 23020: lr=1.00E-05, loss= 1.2063 (max= 1.5824), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:46,876 - root - INFO - Step 23020: lr=1.00E-05, loss= 1.2063 (max= 1.5824), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:46,876 - root - INFO - Step 23020: lr=1.00E-05, loss= 1.2063 (max= 1.5824), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:46,876 - root - INFO - Step 23020: lr=1.00E-05, loss= 1.2063 (max= 1.5824), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:46,876 - root - INFO - Step 23020: lr=1.00E-05, loss= 1.2063 (max= 1.5824), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:45:46,876 - root - INFO - Step 23020: lr=1.00E-05, loss= 1.2063 (max= 1.5824), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:02,857 - root - INFO - Step 23030: lr=1.00E-05, loss= 1.1889 (max= 1.5582), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 
0.00%) +2025-10-24 20:46:02,858 - root - INFO - Step 23030: lr=1.00E-05, loss= 1.1889 (max= 1.5582), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:02,858 - root - INFO - Step 23030: lr=1.00E-05, loss= 1.1889 (max= 1.5582), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:02,858 - root - INFO - Step 23030: lr=1.00E-05, loss= 1.1889 (max= 1.5582), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:02,858 - root - INFO - Step 23030: lr=1.00E-05, loss= 1.1889 (max= 1.5582), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:02,858 - root - INFO - Step 23030: lr=1.00E-05, loss= 1.1889 (max= 1.5582), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:02,858 - root - INFO - Step 23030: lr=1.00E-05, loss= 1.1889 (max= 1.5582), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:02,858 - root - INFO - Step 23030: lr=1.00E-05, loss= 1.1889 (max= 1.5582), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:18,788 - root - INFO - Step 23040: lr=1.00E-05, loss= 1.1915 (max= 1.8439), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:18,788 - root - INFO - Step 23040: lr=1.00E-05, loss= 1.1915 (max= 1.8439), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:18,789 - root - INFO - Step 23040: lr=1.00E-05, loss= 1.1915 (max= 1.8439), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:18,789 - root - INFO - Step 23040: lr=1.00E-05, loss= 1.1915 (max= 1.8439), tps=20574, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:18,789 - root - INFO - Step 23040: lr=1.00E-05, loss= 1.1915 (max= 1.8439), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:18,789 - root - INFO - Step 23040: lr=1.00E-05, loss= 1.1915 (max= 1.8439), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:18,789 - root - INFO - Step 23040: lr=1.00E-05, loss= 1.1915 (max= 1.8439), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:18,789 - root - INFO - Step 23040: lr=1.00E-05, loss= 1.1915 (max= 1.8439), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:34,689 - root - INFO - Step 23050: lr=1.00E-05, loss= 1.1599 (max= 1.5608), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:34,689 - root - INFO - Step 23050: lr=1.00E-05, loss= 1.1599 (max= 1.5608), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:34,690 - root - INFO - Step 23050: lr=1.00E-05, loss= 1.1599 (max= 1.5608), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:34,690 - root - INFO - Step 23050: lr=1.00E-05, loss= 1.1599 (max= 1.5608), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:34,690 - root - INFO - Step 23050: lr=1.00E-05, loss= 1.1599 (max= 1.5608), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:34,690 - root - INFO - Step 23050: lr=1.00E-05, loss= 1.1599 (max= 1.5608), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:34,690 - root - INFO - Step 23050: lr=1.00E-05, loss= 1.1599 
(max= 1.5608), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:34,690 - root - INFO - Step 23050: lr=1.00E-05, loss= 1.1599 (max= 1.5608), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:46:50,626 - root - INFO - Step 23060: lr=1.00E-05, loss= 1.1981 (max= 1.6602), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:50,626 - root - INFO - Step 23060: lr=1.00E-05, loss= 1.1981 (max= 1.6602), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:50,626 - root - INFO - Step 23060: lr=1.00E-05, loss= 1.1981 (max= 1.6602), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:50,626 - root - INFO - Step 23060: lr=1.00E-05, loss= 1.1981 (max= 1.6602), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:50,626 - root - INFO - Step 23060: lr=1.00E-05, loss= 1.1981 (max= 1.6602), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:50,626 - root - INFO - Step 23060: lr=1.00E-05, loss= 1.1981 (max= 1.6602), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:50,626 - root - INFO - Step 23060: lr=1.00E-05, loss= 1.1981 (max= 1.6602), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:46:50,627 - root - INFO - Step 23060: lr=1.00E-05, loss= 1.1981 (max= 1.6602), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:47:06,568 - root - INFO - Step 23070: lr=1.00E-05, loss= 1.1809 (max= 1.5068), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:06,568 - root 
- INFO - Step 23070: lr=1.00E-05, loss= 1.1809 (max= 1.5068), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:06,568 - root - INFO - Step 23070: lr=1.00E-05, loss= 1.1809 (max= 1.5068), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:06,568 - root - INFO - Step 23070: lr=1.00E-05, loss= 1.1809 (max= 1.5068), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:06,568 - root - INFO - Step 23070: lr=1.00E-05, loss= 1.1809 (max= 1.5068), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:06,568 - root - INFO - Step 23070: lr=1.00E-05, loss= 1.1809 (max= 1.5068), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:06,568 - root - INFO - Step 23070: lr=1.00E-05, loss= 1.1809 (max= 1.5068), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:06,568 - root - INFO - Step 23070: lr=1.00E-05, loss= 1.1809 (max= 1.5068), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:22,536 - root - INFO - Step 23080: lr=1.00E-05, loss= 1.1673 (max= 1.5246), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:22,536 - root - INFO - Step 23080: lr=1.00E-05, loss= 1.1673 (max= 1.5246), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:22,536 - root - INFO - Step 23080: lr=1.00E-05, loss= 1.1673 (max= 1.5246), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:22,536 - root - INFO - Step 23080: lr=1.00E-05, loss= 1.1673 (max= 1.5246), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 20:47:22,536 - root - INFO - Step 23080: lr=1.00E-05, loss= 1.1673 (max= 1.5246), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:22,536 - root - INFO - Step 23080: lr=1.00E-05, loss= 1.1673 (max= 1.5246), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:22,536 - root - INFO - Step 23080: lr=1.00E-05, loss= 1.1673 (max= 1.5246), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:22,536 - root - INFO - Step 23080: lr=1.00E-05, loss= 1.1673 (max= 1.5246), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:38,493 - root - INFO - Step 23090: lr=1.00E-05, loss= 1.1940 (max= 1.7133), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:38,493 - root - INFO - Step 23090: lr=1.00E-05, loss= 1.1940 (max= 1.7133), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:38,493 - root - INFO - Step 23090: lr=1.00E-05, loss= 1.1940 (max= 1.7133), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:38,493 - root - INFO - Step 23090: lr=1.00E-05, loss= 1.1940 (max= 1.7133), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:38,493 - root - INFO - Step 23090: lr=1.00E-05, loss= 1.1940 (max= 1.7133), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:38,493 - root - INFO - Step 23090: lr=1.00E-05, loss= 1.1940 (max= 1.7133), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:38,493 - root - INFO - Step 23090: lr=1.00E-05, loss= 1.1940 (max= 1.7133), tps=20539, mfu=42.79%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:38,493 - root - INFO - Step 23090: lr=1.00E-05, loss= 1.1940 (max= 1.7133), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:54,408 - root - INFO - Step 23100: lr=1.00E-05, loss= 1.1876 (max= 1.5756), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:54,408 - root - INFO - Step 23100: lr=1.00E-05, loss= 1.1876 (max= 1.5756), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:54,408 - root - INFO - Step 23100: lr=1.00E-05, loss= 1.1876 (max= 1.5756), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:54,408 - root - INFO - Step 23100: lr=1.00E-05, loss= 1.1876 (max= 1.5756), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:54,408 - root - INFO - Step 23100: lr=1.00E-05, loss= 1.1876 (max= 1.5756), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:54,408 - root - INFO - Step 23100: lr=1.00E-05, loss= 1.1876 (max= 1.5756), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:54,408 - root - INFO - Step 23100: lr=1.00E-05, loss= 1.1876 (max= 1.5756), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:47:54,408 - root - INFO - Step 23100: lr=1.00E-05, loss= 1.1876 (max= 1.5756), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:48:10,304 - root - INFO - Step 23110: lr=1.00E-05, loss= 1.1684 (max= 1.5809), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:48:10,305 - root - INFO - Step 23110: lr=1.00E-05, 
loss= 1.1684 (max= 1.5809), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:48:10,305 - root - INFO - Step 23110: lr=1.00E-05, loss= 1.1684 (max= 1.5809), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:48:10,305 - root - INFO - Step 23110: lr=1.00E-05, loss= 1.1684 (max= 1.5809), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:48:10,305 - root - INFO - Step 23110: lr=1.00E-05, loss= 1.1684 (max= 1.5809), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:48:10,305 - root - INFO - Step 23110: lr=1.00E-05, loss= 1.1684 (max= 1.5809), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:48:10,305 - root - INFO - Step 23110: lr=1.00E-05, loss= 1.1684 (max= 1.5809), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:48:10,305 - root - INFO - Step 23110: lr=1.00E-05, loss= 1.1684 (max= 1.5809), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:48:26,248 - root - INFO - Step 23120: lr=1.00E-05, loss= 1.1769 (max= 1.8268), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:26,249 - root - INFO - Step 23120: lr=1.00E-05, loss= 1.1769 (max= 1.8268), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:26,249 - root - INFO - Step 23120: lr=1.00E-05, loss= 1.1769 (max= 1.8268), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:26,249 - root - INFO - Step 23120: lr=1.00E-05, loss= 1.1769 (max= 1.8268), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
20:48:26,249 - root - INFO - Step 23120: lr=1.00E-05, loss= 1.1769 (max= 1.8268), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:26,249 - root - INFO - Step 23120: lr=1.00E-05, loss= 1.1769 (max= 1.8268), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:26,249 - root - INFO - Step 23120: lr=1.00E-05, loss= 1.1769 (max= 1.8268), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:26,249 - root - INFO - Step 23120: lr=1.00E-05, loss= 1.1769 (max= 1.8268), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:42,147 - root - INFO - Step 23130: lr=1.00E-05, loss= 1.1447 (max= 1.5237), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:42,148 - root - INFO - Step 23130: lr=1.00E-05, loss= 1.1447 (max= 1.5237), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:42,148 - root - INFO - Step 23130: lr=1.00E-05, loss= 1.1447 (max= 1.5237), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:42,148 - root - INFO - Step 23130: lr=1.00E-05, loss= 1.1447 (max= 1.5237), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:42,148 - root - INFO - Step 23130: lr=1.00E-05, loss= 1.1447 (max= 1.5237), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:42,148 - root - INFO - Step 23130: lr=1.00E-05, loss= 1.1447 (max= 1.5237), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:42,148 - root - INFO - Step 23130: lr=1.00E-05, loss= 1.1447 (max= 1.5237), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:42,148 - root - INFO - Step 23130: lr=1.00E-05, loss= 1.1447 (max= 1.5237), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:58,103 - root - INFO - Step 23140: lr=1.00E-05, loss= 1.2288 (max= 1.5542), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:58,103 - root - INFO - Step 23140: lr=1.00E-05, loss= 1.2288 (max= 1.5542), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:58,103 - root - INFO - Step 23140: lr=1.00E-05, loss= 1.2288 (max= 1.5542), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:58,103 - root - INFO - Step 23140: lr=1.00E-05, loss= 1.2288 (max= 1.5542), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:58,103 - root - INFO - Step 23140: lr=1.00E-05, loss= 1.2288 (max= 1.5542), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:58,103 - root - INFO - Step 23140: lr=1.00E-05, loss= 1.2288 (max= 1.5542), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:58,103 - root - INFO - Step 23140: lr=1.00E-05, loss= 1.2288 (max= 1.5542), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:48:58,103 - root - INFO - Step 23140: lr=1.00E-05, loss= 1.2288 (max= 1.5542), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:49:14,086 - root - INFO - Step 23150: lr=1.00E-05, loss= 1.1908 (max= 1.8269), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:49:14,086 - root - INFO - Step 23150: lr=1.00E-05, loss= 1.1908 (max= 1.8269), 
tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:49:14,086 - root - INFO - Step 23150: lr=1.00E-05, loss= 1.1908 (max= 1.8269), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:49:14,086 - root - INFO - Step 23150: lr=1.00E-05, loss= 1.1908 (max= 1.8269), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:49:14,086 - root - INFO - Step 23150: lr=1.00E-05, loss= 1.1908 (max= 1.8269), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:49:14,086 - root - INFO - Step 23150: lr=1.00E-05, loss= 1.1908 (max= 1.8269), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:49:14,086 - root - INFO - Step 23150: lr=1.00E-05, loss= 1.1908 (max= 1.8269), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:49:14,086 - root - INFO - Step 23150: lr=1.00E-05, loss= 1.1908 (max= 1.8269), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:49:30,083 - root - INFO - Step 23160: lr=1.00E-05, loss= 1.1944 (max= 1.8959), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:30,083 - root - INFO - Step 23160: lr=1.00E-05, loss= 1.1944 (max= 1.8959), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:30,083 - root - INFO - Step 23160: lr=1.00E-05, loss= 1.1944 (max= 1.8959), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:30,083 - root - INFO - Step 23160: lr=1.00E-05, loss= 1.1944 (max= 1.8959), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:30,083 - root - INFO - Step 
23160: lr=1.00E-05, loss= 1.1944 (max= 1.8959), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:30,083 - root - INFO - Step 23160: lr=1.00E-05, loss= 1.1944 (max= 1.8959), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:30,083 - root - INFO - Step 23160: lr=1.00E-05, loss= 1.1944 (max= 1.8959), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:30,083 - root - INFO - Step 23160: lr=1.00E-05, loss= 1.1944 (max= 1.8959), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:46,047 - root - INFO - Step 23170: lr=1.00E-05, loss= 1.1842 (max= 1.7262), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:46,047 - root - INFO - Step 23170: lr=1.00E-05, loss= 1.1842 (max= 1.7262), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:46,047 - root - INFO - Step 23170: lr=1.00E-05, loss= 1.1842 (max= 1.7262), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:46,047 - root - INFO - Step 23170: lr=1.00E-05, loss= 1.1842 (max= 1.7262), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:46,047 - root - INFO - Step 23170: lr=1.00E-05, loss= 1.1842 (max= 1.7262), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:46,047 - root - INFO - Step 23170: lr=1.00E-05, loss= 1.1842 (max= 1.7262), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:49:46,047 - root - INFO - Step 23170: lr=1.00E-05, loss= 1.1842 (max= 1.7262), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 20:49:46,047 - root - INFO - Step 23170: lr=1.00E-05, loss= 1.1842 (max= 1.7262), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:02,004 - root - INFO - Step 23180: lr=1.00E-05, loss= 1.2060 (max= 1.8694), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:02,004 - root - INFO - Step 23180: lr=1.00E-05, loss= 1.2060 (max= 1.8694), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:02,004 - root - INFO - Step 23180: lr=1.00E-05, loss= 1.2060 (max= 1.8694), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:02,004 - root - INFO - Step 23180: lr=1.00E-05, loss= 1.2060 (max= 1.8694), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:02,004 - root - INFO - Step 23180: lr=1.00E-05, loss= 1.2060 (max= 1.8694), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:02,004 - root - INFO - Step 23180: lr=1.00E-05, loss= 1.2060 (max= 1.8694), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:02,004 - root - INFO - Step 23180: lr=1.00E-05, loss= 1.2060 (max= 1.8694), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:02,004 - root - INFO - Step 23180: lr=1.00E-05, loss= 1.2060 (max= 1.8694), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:17,935 - root - INFO - Step 23190: lr=1.00E-05, loss= 1.2583 (max= 1.7765), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:17,935 - root - INFO - Step 23190: lr=1.00E-05, loss= 1.2583 (max= 1.7765), tps=20572, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:17,936 - root - INFO - Step 23190: lr=1.00E-05, loss= 1.2583 (max= 1.7765), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:17,936 - root - INFO - Step 23190: lr=1.00E-05, loss= 1.2583 (max= 1.7765), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:17,936 - root - INFO - Step 23190: lr=1.00E-05, loss= 1.2583 (max= 1.7765), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:17,936 - root - INFO - Step 23190: lr=1.00E-05, loss= 1.2583 (max= 1.7765), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:17,936 - root - INFO - Step 23190: lr=1.00E-05, loss= 1.2583 (max= 1.7765), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:17,936 - root - INFO - Step 23190: lr=1.00E-05, loss= 1.2583 (max= 1.7765), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:33,839 - root - INFO - Step 23200: lr=1.00E-05, loss= 1.1830 (max= 1.4932), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:33,839 - root - INFO - Step 23200: lr=1.00E-05, loss= 1.1830 (max= 1.4932), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:33,839 - root - INFO - Step 23200: lr=1.00E-05, loss= 1.1830 (max= 1.4932), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:33,839 - root - INFO - Step 23200: lr=1.00E-05, loss= 1.1830 (max= 1.4932), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:33,839 - root - INFO - Step 23200: lr=1.00E-05, loss= 1.1830 
(max= 1.4932), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:33,839 - root - INFO - Step 23200: lr=1.00E-05, loss= 1.1830 (max= 1.4932), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:33,839 - root - INFO - Step 23200: lr=1.00E-05, loss= 1.1830 (max= 1.4932), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:33,839 - root - INFO - Step 23200: lr=1.00E-05, loss= 1.1830 (max= 1.4932), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:50:49,766 - root - INFO - Step 23210: lr=1.00E-05, loss= 1.2150 (max= 1.8475), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:49,767 - root - INFO - Step 23210: lr=1.00E-05, loss= 1.2150 (max= 1.8475), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:49,767 - root - INFO - Step 23210: lr=1.00E-05, loss= 1.2150 (max= 1.8475), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:49,767 - root - INFO - Step 23210: lr=1.00E-05, loss= 1.2150 (max= 1.8475), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:49,767 - root - INFO - Step 23210: lr=1.00E-05, loss= 1.2150 (max= 1.8475), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:49,767 - root - INFO - Step 23210: lr=1.00E-05, loss= 1.2150 (max= 1.8475), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:49,767 - root - INFO - Step 23210: lr=1.00E-05, loss= 1.2150 (max= 1.8475), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:50:49,767 - root 
- INFO - Step 23210: lr=1.00E-05, loss= 1.2150 (max= 1.8475), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:05,670 - root - INFO - Step 23220: lr=1.00E-05, loss= 1.1903 (max= 1.4895), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:05,670 - root - INFO - Step 23220: lr=1.00E-05, loss= 1.1903 (max= 1.4895), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:05,670 - root - INFO - Step 23220: lr=1.00E-05, loss= 1.1903 (max= 1.4895), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:05,670 - root - INFO - Step 23220: lr=1.00E-05, loss= 1.1903 (max= 1.4895), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:05,670 - root - INFO - Step 23220: lr=1.00E-05, loss= 1.1903 (max= 1.4895), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:05,670 - root - INFO - Step 23220: lr=1.00E-05, loss= 1.1903 (max= 1.4895), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:05,670 - root - INFO - Step 23220: lr=1.00E-05, loss= 1.1903 (max= 1.4895), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:05,671 - root - INFO - Step 23220: lr=1.00E-05, loss= 1.1903 (max= 1.4895), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:21,532 - root - INFO - Step 23230: lr=1.00E-05, loss= 1.1938 (max= 1.5777), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:21,533 - root - INFO - Step 23230: lr=1.00E-05, loss= 1.1938 (max= 1.5777), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 20:51:21,533 - root - INFO - Step 23230: lr=1.00E-05, loss= 1.1938 (max= 1.5777), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:21,533 - root - INFO - Step 23230: lr=1.00E-05, loss= 1.1938 (max= 1.5777), tps=20663, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:21,533 - root - INFO - Step 23230: lr=1.00E-05, loss= 1.1938 (max= 1.5777), tps=20663, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:21,533 - root - INFO - Step 23230: lr=1.00E-05, loss= 1.1938 (max= 1.5777), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:21,533 - root - INFO - Step 23230: lr=1.00E-05, loss= 1.1938 (max= 1.5777), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:21,533 - root - INFO - Step 23230: lr=1.00E-05, loss= 1.1938 (max= 1.5777), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:51:37,489 - root - INFO - Step 23240: lr=1.00E-05, loss= 1.1857 (max= 1.4899), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:37,489 - root - INFO - Step 23240: lr=1.00E-05, loss= 1.1857 (max= 1.4899), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:37,489 - root - INFO - Step 23240: lr=1.00E-05, loss= 1.1857 (max= 1.4899), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:37,489 - root - INFO - Step 23240: lr=1.00E-05, loss= 1.1857 (max= 1.4899), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:37,490 - root - INFO - Step 23240: lr=1.00E-05, loss= 1.1857 (max= 1.4899), tps=20540, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:37,490 - root - INFO - Step 23240: lr=1.00E-05, loss= 1.1857 (max= 1.4899), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:37,490 - root - INFO - Step 23240: lr=1.00E-05, loss= 1.1857 (max= 1.4899), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:37,490 - root - INFO - Step 23240: lr=1.00E-05, loss= 1.1857 (max= 1.4899), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:53,387 - root - INFO - Step 23250: lr=1.00E-05, loss= 1.2158 (max= 1.6943), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:53,387 - root - INFO - Step 23250: lr=1.00E-05, loss= 1.2158 (max= 1.6943), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:53,387 - root - INFO - Step 23250: lr=1.00E-05, loss= 1.2158 (max= 1.6943), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:53,387 - root - INFO - Step 23250: lr=1.00E-05, loss= 1.2158 (max= 1.6943), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:53,387 - root - INFO - Step 23250: lr=1.00E-05, loss= 1.2158 (max= 1.6943), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:53,387 - root - INFO - Step 23250: lr=1.00E-05, loss= 1.2158 (max= 1.6943), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:53,387 - root - INFO - Step 23250: lr=1.00E-05, loss= 1.2158 (max= 1.6943), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:51:53,387 - root - INFO - Step 23250: lr=1.00E-05, 
loss= 1.2158 (max= 1.6943), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:09,311 - root - INFO - Step 23260: lr=1.00E-05, loss= 1.1828 (max= 1.5707), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:09,312 - root - INFO - Step 23260: lr=1.00E-05, loss= 1.1828 (max= 1.5707), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:09,312 - root - INFO - Step 23260: lr=1.00E-05, loss= 1.1828 (max= 1.5707), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:09,312 - root - INFO - Step 23260: lr=1.00E-05, loss= 1.1828 (max= 1.5707), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:09,312 - root - INFO - Step 23260: lr=1.00E-05, loss= 1.1828 (max= 1.5707), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:09,312 - root - INFO - Step 23260: lr=1.00E-05, loss= 1.1828 (max= 1.5707), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:09,312 - root - INFO - Step 23260: lr=1.00E-05, loss= 1.1828 (max= 1.5707), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:09,312 - root - INFO - Step 23260: lr=1.00E-05, loss= 1.1828 (max= 1.5707), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:25,243 - root - INFO - Step 23270: lr=1.00E-05, loss= 1.1967 (max= 1.8937), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:25,243 - root - INFO - Step 23270: lr=1.00E-05, loss= 1.1967 (max= 1.8937), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
20:52:25,243 - root - INFO - Step 23270: lr=1.00E-05, loss= 1.1967 (max= 1.8937), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:25,243 - root - INFO - Step 23270: lr=1.00E-05, loss= 1.1967 (max= 1.8937), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:25,243 - root - INFO - Step 23270: lr=1.00E-05, loss= 1.1967 (max= 1.8937), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:25,243 - root - INFO - Step 23270: lr=1.00E-05, loss= 1.1967 (max= 1.8937), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:25,243 - root - INFO - Step 23270: lr=1.00E-05, loss= 1.1967 (max= 1.8937), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:25,243 - root - INFO - Step 23270: lr=1.00E-05, loss= 1.1967 (max= 1.8937), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:41,120 - root - INFO - Step 23280: lr=1.00E-05, loss= 1.1899 (max= 1.5383), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:41,121 - root - INFO - Step 23280: lr=1.00E-05, loss= 1.1899 (max= 1.5383), tps=20643, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:41,121 - root - INFO - Step 23280: lr=1.00E-05, loss= 1.1899 (max= 1.5383), tps=20643, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:41,121 - root - INFO - Step 23280: lr=1.00E-05, loss= 1.1899 (max= 1.5383), tps=20643, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:41,121 - root - INFO - Step 23280: lr=1.00E-05, loss= 1.1899 (max= 1.5383), tps=20643, mfu=43.01%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:41,121 - root - INFO - Step 23280: lr=1.00E-05, loss= 1.1899 (max= 1.5383), tps=20643, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:41,121 - root - INFO - Step 23280: lr=1.00E-05, loss= 1.1899 (max= 1.5383), tps=20643, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:41,121 - root - INFO - Step 23280: lr=1.00E-05, loss= 1.1899 (max= 1.5383), tps=20643, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:57,094 - root - INFO - Step 23290: lr=1.00E-05, loss= 1.1933 (max= 1.5724), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:57,095 - root - INFO - Step 23290: lr=1.00E-05, loss= 1.1933 (max= 1.5724), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:57,095 - root - INFO - Step 23290: lr=1.00E-05, loss= 1.1933 (max= 1.5724), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:57,095 - root - INFO - Step 23290: lr=1.00E-05, loss= 1.1933 (max= 1.5724), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:57,095 - root - INFO - Step 23290: lr=1.00E-05, loss= 1.1933 (max= 1.5724), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:57,095 - root - INFO - Step 23290: lr=1.00E-05, loss= 1.1933 (max= 1.5724), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:57,095 - root - INFO - Step 23290: lr=1.00E-05, loss= 1.1933 (max= 1.5724), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:52:57,095 - root - INFO - Step 23290: lr=1.00E-05, loss= 1.1933 (max= 1.5724), 
tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:13,042 - root - INFO - Step 23300: lr=1.00E-05, loss= 1.2348 (max= 1.6818), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:13,042 - root - INFO - Step 23300: lr=1.00E-05, loss= 1.2348 (max= 1.6818), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:13,042 - root - INFO - Step 23300: lr=1.00E-05, loss= 1.2348 (max= 1.6818), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:13,042 - root - INFO - Step 23300: lr=1.00E-05, loss= 1.2348 (max= 1.6818), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:13,042 - root - INFO - Step 23300: lr=1.00E-05, loss= 1.2348 (max= 1.6818), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:13,042 - root - INFO - Step 23300: lr=1.00E-05, loss= 1.2348 (max= 1.6818), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:13,042 - root - INFO - Step 23300: lr=1.00E-05, loss= 1.2348 (max= 1.6818), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:13,042 - root - INFO - Step 23300: lr=1.00E-05, loss= 1.2348 (max= 1.6818), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:28,962 - root - INFO - Step 23310: lr=1.00E-05, loss= 1.1719 (max= 1.8186), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:28,963 - root - INFO - Step 23310: lr=1.00E-05, loss= 1.1719 (max= 1.8186), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:28,963 - root - INFO - Step 
23310: lr=1.00E-05, loss= 1.1719 (max= 1.8186), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:28,963 - root - INFO - Step 23310: lr=1.00E-05, loss= 1.1719 (max= 1.8186), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:28,963 - root - INFO - Step 23310: lr=1.00E-05, loss= 1.1719 (max= 1.8186), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:28,963 - root - INFO - Step 23310: lr=1.00E-05, loss= 1.1719 (max= 1.8186), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:28,963 - root - INFO - Step 23310: lr=1.00E-05, loss= 1.1719 (max= 1.8186), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:28,963 - root - INFO - Step 23310: lr=1.00E-05, loss= 1.1719 (max= 1.8186), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:44,924 - root - INFO - Step 23320: lr=1.00E-05, loss= 1.1901 (max= 1.5420), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:44,925 - root - INFO - Step 23320: lr=1.00E-05, loss= 1.1901 (max= 1.5420), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:44,925 - root - INFO - Step 23320: lr=1.00E-05, loss= 1.1901 (max= 1.5420), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:44,925 - root - INFO - Step 23320: lr=1.00E-05, loss= 1.1901 (max= 1.5420), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:44,925 - root - INFO - Step 23320: lr=1.00E-05, loss= 1.1901 (max= 1.5420), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 20:53:44,925 - root - INFO - Step 23320: lr=1.00E-05, loss= 1.1901 (max= 1.5420), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:44,925 - root - INFO - Step 23320: lr=1.00E-05, loss= 1.1901 (max= 1.5420), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:53:44,925 - root - INFO - Step 23320: lr=1.00E-05, loss= 1.1901 (max= 1.5420), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:00,854 - root - INFO - Step 23330: lr=1.00E-05, loss= 1.1693 (max= 1.5266), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:00,854 - root - INFO - Step 23330: lr=1.00E-05, loss= 1.1693 (max= 1.5266), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:00,854 - root - INFO - Step 23330: lr=1.00E-05, loss= 1.1693 (max= 1.5266), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:00,855 - root - INFO - Step 23330: lr=1.00E-05, loss= 1.1693 (max= 1.5266), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:00,855 - root - INFO - Step 23330: lr=1.00E-05, loss= 1.1693 (max= 1.5266), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:00,855 - root - INFO - Step 23330: lr=1.00E-05, loss= 1.1693 (max= 1.5266), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:00,855 - root - INFO - Step 23330: lr=1.00E-05, loss= 1.1693 (max= 1.5266), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:00,855 - root - INFO - Step 23330: lr=1.00E-05, loss= 1.1693 (max= 1.5266), tps=20574, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:11,155 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:441892 +2025-10-24 20:54:16,795 - root - INFO - Step 23340: lr=1.00E-05, loss= 1.1761 (max= 1.5058), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:16,795 - root - INFO - Step 23340: lr=1.00E-05, loss= 1.1761 (max= 1.5058), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:16,795 - root - INFO - Step 23340: lr=1.00E-05, loss= 1.1761 (max= 1.5058), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:16,795 - root - INFO - Step 23340: lr=1.00E-05, loss= 1.1761 (max= 1.5058), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:16,795 - root - INFO - Step 23340: lr=1.00E-05, loss= 1.1761 (max= 1.5058), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:16,795 - root - INFO - Step 23340: lr=1.00E-05, loss= 1.1761 (max= 1.5058), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:16,795 - root - INFO - Step 23340: lr=1.00E-05, loss= 1.1761 (max= 1.5058), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:16,795 - root - INFO - Step 23340: lr=1.00E-05, loss= 1.1761 (max= 1.5058), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:32,743 - root - INFO - Step 23350: lr=1.00E-05, loss= 1.2089 (max= 1.5838), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:32,744 - root - INFO - Step 23350: lr=1.00E-05, loss= 
1.2089 (max= 1.5838), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:32,744 - root - INFO - Step 23350: lr=1.00E-05, loss= 1.2089 (max= 1.5838), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:32,744 - root - INFO - Step 23350: lr=1.00E-05, loss= 1.2089 (max= 1.5838), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:32,744 - root - INFO - Step 23350: lr=1.00E-05, loss= 1.2089 (max= 1.5838), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:32,744 - root - INFO - Step 23350: lr=1.00E-05, loss= 1.2089 (max= 1.5838), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:32,744 - root - INFO - Step 23350: lr=1.00E-05, loss= 1.2089 (max= 1.5838), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:32,744 - root - INFO - Step 23350: lr=1.00E-05, loss= 1.2089 (max= 1.5838), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:48,696 - root - INFO - Step 23360: lr=1.00E-05, loss= 1.1910 (max= 1.6838), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:48,696 - root - INFO - Step 23360: lr=1.00E-05, loss= 1.1910 (max= 1.6838), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:48,696 - root - INFO - Step 23360: lr=1.00E-05, loss= 1.1910 (max= 1.6838), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:48,696 - root - INFO - Step 23360: lr=1.00E-05, loss= 1.1910 (max= 1.6838), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:48,696 
- root - INFO - Step 23360: lr=1.00E-05, loss= 1.1910 (max= 1.6838), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:48,696 - root - INFO - Step 23360: lr=1.00E-05, loss= 1.1910 (max= 1.6838), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:48,696 - root - INFO - Step 23360: lr=1.00E-05, loss= 1.1910 (max= 1.6838), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:54:48,697 - root - INFO - Step 23360: lr=1.00E-05, loss= 1.1910 (max= 1.6838), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:04,641 - root - INFO - Step 23370: lr=1.00E-05, loss= 1.1883 (max= 1.8078), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:04,641 - root - INFO - Step 23370: lr=1.00E-05, loss= 1.1883 (max= 1.8078), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:04,641 - root - INFO - Step 23370: lr=1.00E-05, loss= 1.1883 (max= 1.8078), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:04,641 - root - INFO - Step 23370: lr=1.00E-05, loss= 1.1883 (max= 1.8078), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:04,641 - root - INFO - Step 23370: lr=1.00E-05, loss= 1.1883 (max= 1.8078), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:04,641 - root - INFO - Step 23370: lr=1.00E-05, loss= 1.1883 (max= 1.8078), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:04,641 - root - INFO - Step 23370: lr=1.00E-05, loss= 1.1883 (max= 1.8078), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:04,641 - root - INFO - Step 23370: lr=1.00E-05, loss= 1.1883 (max= 1.8078), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:20,582 - root - INFO - Step 23380: lr=1.00E-05, loss= 1.1321 (max= 1.5933), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:20,582 - root - INFO - Step 23380: lr=1.00E-05, loss= 1.1321 (max= 1.5933), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:20,582 - root - INFO - Step 23380: lr=1.00E-05, loss= 1.1321 (max= 1.5933), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:20,583 - root - INFO - Step 23380: lr=1.00E-05, loss= 1.1321 (max= 1.5933), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:20,583 - root - INFO - Step 23380: lr=1.00E-05, loss= 1.1321 (max= 1.5933), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:20,583 - root - INFO - Step 23380: lr=1.00E-05, loss= 1.1321 (max= 1.5933), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:20,583 - root - INFO - Step 23380: lr=1.00E-05, loss= 1.1321 (max= 1.5933), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:20,583 - root - INFO - Step 23380: lr=1.00E-05, loss= 1.1321 (max= 1.5933), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:36,495 - root - INFO - Step 23390: lr=1.00E-05, loss= 1.1924 (max= 1.7875), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:36,496 - root - INFO - Step 23390: lr=1.00E-05, loss= 1.1924 (max= 1.7875), 
tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:36,496 - root - INFO - Step 23390: lr=1.00E-05, loss= 1.1924 (max= 1.7875), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:36,496 - root - INFO - Step 23390: lr=1.00E-05, loss= 1.1924 (max= 1.7875), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:36,496 - root - INFO - Step 23390: lr=1.00E-05, loss= 1.1924 (max= 1.7875), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:36,496 - root - INFO - Step 23390: lr=1.00E-05, loss= 1.1924 (max= 1.7875), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:36,496 - root - INFO - Step 23390: lr=1.00E-05, loss= 1.1924 (max= 1.7875), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:36,496 - root - INFO - Step 23390: lr=1.00E-05, loss= 1.1924 (max= 1.7875), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:52,503 - root - INFO - Step 23400: lr=1.00E-05, loss= 1.1973 (max= 1.6181), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:52,504 - root - INFO - Step 23400: lr=1.00E-05, loss= 1.1973 (max= 1.6181), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:52,504 - root - INFO - Step 23400: lr=1.00E-05, loss= 1.1973 (max= 1.6181), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:52,504 - root - INFO - Step 23400: lr=1.00E-05, loss= 1.1973 (max= 1.6181), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:52,504 - root - INFO - Step 
23400: lr=1.00E-05, loss= 1.1973 (max= 1.6181), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:52,504 - root - INFO - Step 23400: lr=1.00E-05, loss= 1.1973 (max= 1.6181), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:52,504 - root - INFO - Step 23400: lr=1.00E-05, loss= 1.1973 (max= 1.6181), tps=20475, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:55:52,504 - root - INFO - Step 23400: lr=1.00E-05, loss= 1.1973 (max= 1.6181), tps=20474, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:08,437 - root - INFO - Step 23410: lr=1.00E-05, loss= 1.1845 (max= 1.6705), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:08,437 - root - INFO - Step 23410: lr=1.00E-05, loss= 1.1845 (max= 1.6705), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:08,437 - root - INFO - Step 23410: lr=1.00E-05, loss= 1.1845 (max= 1.6705), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:08,437 - root - INFO - Step 23410: lr=1.00E-05, loss= 1.1845 (max= 1.6705), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:08,437 - root - INFO - Step 23410: lr=1.00E-05, loss= 1.1845 (max= 1.6705), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:08,438 - root - INFO - Step 23410: lr=1.00E-05, loss= 1.1845 (max= 1.6705), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:08,438 - root - INFO - Step 23410: lr=1.00E-05, loss= 1.1845 (max= 1.6705), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 20:56:08,438 - root - INFO - Step 23410: lr=1.00E-05, loss= 1.1845 (max= 1.6705), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:24,345 - root - INFO - Step 23420: lr=1.00E-05, loss= 1.2024 (max= 1.5816), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:24,345 - root - INFO - Step 23420: lr=1.00E-05, loss= 1.2024 (max= 1.5816), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:24,345 - root - INFO - Step 23420: lr=1.00E-05, loss= 1.2024 (max= 1.5816), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:24,345 - root - INFO - Step 23420: lr=1.00E-05, loss= 1.2024 (max= 1.5816), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:24,345 - root - INFO - Step 23420: lr=1.00E-05, loss= 1.2024 (max= 1.5816), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:24,345 - root - INFO - Step 23420: lr=1.00E-05, loss= 1.2024 (max= 1.5816), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:24,345 - root - INFO - Step 23420: lr=1.00E-05, loss= 1.2024 (max= 1.5816), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:24,345 - root - INFO - Step 23420: lr=1.00E-05, loss= 1.2024 (max= 1.5816), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:56:40,261 - root - INFO - Step 23430: lr=1.00E-05, loss= 1.2105 (max= 1.7533), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:40,261 - root - INFO - Step 23430: lr=1.00E-05, loss= 1.2105 (max= 1.7533), tps=20592, mfu=42.90%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:40,261 - root - INFO - Step 23430: lr=1.00E-05, loss= 1.2105 (max= 1.7533), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:40,261 - root - INFO - Step 23430: lr=1.00E-05, loss= 1.2105 (max= 1.7533), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:40,261 - root - INFO - Step 23430: lr=1.00E-05, loss= 1.2105 (max= 1.7533), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:40,261 - root - INFO - Step 23430: lr=1.00E-05, loss= 1.2105 (max= 1.7533), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:40,261 - root - INFO - Step 23430: lr=1.00E-05, loss= 1.2105 (max= 1.7533), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:40,261 - root - INFO - Step 23430: lr=1.00E-05, loss= 1.2105 (max= 1.7533), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:56,143 - root - INFO - Step 23440: lr=1.00E-05, loss= 1.1775 (max= 1.6367), tps=20636, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:56,144 - root - INFO - Step 23440: lr=1.00E-05, loss= 1.1775 (max= 1.6367), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:56,144 - root - INFO - Step 23440: lr=1.00E-05, loss= 1.1775 (max= 1.6367), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:56,144 - root - INFO - Step 23440: lr=1.00E-05, loss= 1.1775 (max= 1.6367), tps=20636, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:56,144 - root - INFO - Step 23440: lr=1.00E-05, loss= 1.1775 
(max= 1.6367), tps=20636, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:56,144 - root - INFO - Step 23440: lr=1.00E-05, loss= 1.1775 (max= 1.6367), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:56,144 - root - INFO - Step 23440: lr=1.00E-05, loss= 1.1775 (max= 1.6367), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:56:56,144 - root - INFO - Step 23440: lr=1.00E-05, loss= 1.1775 (max= 1.6367), tps=20636, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:12,064 - root - INFO - Step 23450: lr=1.00E-05, loss= 1.1699 (max= 1.5311), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:12,064 - root - INFO - Step 23450: lr=1.00E-05, loss= 1.1699 (max= 1.5311), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:12,064 - root - INFO - Step 23450: lr=1.00E-05, loss= 1.1699 (max= 1.5311), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:12,064 - root - INFO - Step 23450: lr=1.00E-05, loss= 1.1699 (max= 1.5311), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:12,064 - root - INFO - Step 23450: lr=1.00E-05, loss= 1.1699 (max= 1.5311), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:12,064 - root - INFO - Step 23450: lr=1.00E-05, loss= 1.1699 (max= 1.5311), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:12,064 - root - INFO - Step 23450: lr=1.00E-05, loss= 1.1699 (max= 1.5311), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:12,064 - root 
- INFO - Step 23450: lr=1.00E-05, loss= 1.1699 (max= 1.5311), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:28,015 - root - INFO - Step 23460: lr=1.00E-05, loss= 1.1788 (max= 1.6065), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:28,015 - root - INFO - Step 23460: lr=1.00E-05, loss= 1.1788 (max= 1.6065), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:28,015 - root - INFO - Step 23460: lr=1.00E-05, loss= 1.1788 (max= 1.6065), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:28,015 - root - INFO - Step 23460: lr=1.00E-05, loss= 1.1788 (max= 1.6065), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:28,015 - root - INFO - Step 23460: lr=1.00E-05, loss= 1.1788 (max= 1.6065), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:28,015 - root - INFO - Step 23460: lr=1.00E-05, loss= 1.1788 (max= 1.6065), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:28,015 - root - INFO - Step 23460: lr=1.00E-05, loss= 1.1788 (max= 1.6065), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:28,015 - root - INFO - Step 23460: lr=1.00E-05, loss= 1.1788 (max= 1.6065), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:43,965 - root - INFO - Step 23470: lr=1.00E-05, loss= 1.1745 (max= 1.6436), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:43,965 - root - INFO - Step 23470: lr=1.00E-05, loss= 1.1745 (max= 1.6436), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 20:57:43,965 - root - INFO - Step 23470: lr=1.00E-05, loss= 1.1745 (max= 1.6436), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:43,965 - root - INFO - Step 23470: lr=1.00E-05, loss= 1.1745 (max= 1.6436), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:43,965 - root - INFO - Step 23470: lr=1.00E-05, loss= 1.1745 (max= 1.6436), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:43,965 - root - INFO - Step 23470: lr=1.00E-05, loss= 1.1745 (max= 1.6436), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:43,965 - root - INFO - Step 23470: lr=1.00E-05, loss= 1.1745 (max= 1.6436), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:43,965 - root - INFO - Step 23470: lr=1.00E-05, loss= 1.1745 (max= 1.6436), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:57:59,870 - root - INFO - Step 23480: lr=1.00E-05, loss= 1.1692 (max= 1.5593), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:59,870 - root - INFO - Step 23480: lr=1.00E-05, loss= 1.1692 (max= 1.5593), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:59,870 - root - INFO - Step 23480: lr=1.00E-05, loss= 1.1692 (max= 1.5593), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:59,870 - root - INFO - Step 23480: lr=1.00E-05, loss= 1.1692 (max= 1.5593), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:59,871 - root - INFO - Step 23480: lr=1.00E-05, loss= 1.1692 (max= 1.5593), tps=20606, mfu=42.93%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:59,871 - root - INFO - Step 23480: lr=1.00E-05, loss= 1.1692 (max= 1.5593), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:59,871 - root - INFO - Step 23480: lr=1.00E-05, loss= 1.1692 (max= 1.5593), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:57:59,871 - root - INFO - Step 23480: lr=1.00E-05, loss= 1.1692 (max= 1.5593), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:58:15,768 - root - INFO - Step 23490: lr=1.00E-05, loss= 1.1920 (max= 1.5948), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:58:15,768 - root - INFO - Step 23490: lr=1.00E-05, loss= 1.1920 (max= 1.5948), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:58:15,769 - root - INFO - Step 23490: lr=1.00E-05, loss= 1.1920 (max= 1.5948), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:58:15,769 - root - INFO - Step 23490: lr=1.00E-05, loss= 1.1920 (max= 1.5948), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:58:15,769 - root - INFO - Step 23490: lr=1.00E-05, loss= 1.1920 (max= 1.5948), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:58:15,769 - root - INFO - Step 23490: lr=1.00E-05, loss= 1.1920 (max= 1.5948), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:58:15,769 - root - INFO - Step 23490: lr=1.00E-05, loss= 1.1920 (max= 1.5948), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:58:15,769 - root - INFO - Step 23490: lr=1.00E-05, 
loss= 1.1920 (max= 1.5948), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:58:31,685 - root - INFO - Step 23500: lr=1.00E-05, loss= 1.1628 (max= 1.5317), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:31,686 - root - INFO - Step 23500: lr=1.00E-05, loss= 1.1628 (max= 1.5317), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:31,686 - root - INFO - Step 23500: lr=1.00E-05, loss= 1.1628 (max= 1.5317), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:31,686 - root - INFO - Step 23500: lr=1.00E-05, loss= 1.1628 (max= 1.5317), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:31,686 - root - INFO - Step 23500: lr=1.00E-05, loss= 1.1628 (max= 1.5317), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:31,686 - root - INFO - Step 23500: lr=1.00E-05, loss= 1.1628 (max= 1.5317), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:31,686 - root - INFO - Step 23500: lr=1.00E-05, loss= 1.1628 (max= 1.5317), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:31,686 - root - INFO - Step 23500: lr=1.00E-05, loss= 1.1628 (max= 1.5317), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:47,618 - root - INFO - Step 23510: lr=1.00E-05, loss= 1.1361 (max= 1.5294), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:47,618 - root - INFO - Step 23510: lr=1.00E-05, loss= 1.1361 (max= 1.5294), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
20:58:47,618 - root - INFO - Step 23510: lr=1.00E-05, loss= 1.1361 (max= 1.5294), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:47,618 - root - INFO - Step 23510: lr=1.00E-05, loss= 1.1361 (max= 1.5294), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:47,618 - root - INFO - Step 23510: lr=1.00E-05, loss= 1.1361 (max= 1.5294), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:47,618 - root - INFO - Step 23510: lr=1.00E-05, loss= 1.1361 (max= 1.5294), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:47,618 - root - INFO - Step 23510: lr=1.00E-05, loss= 1.1361 (max= 1.5294), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:58:47,618 - root - INFO - Step 23510: lr=1.00E-05, loss= 1.1361 (max= 1.5294), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:03,534 - root - INFO - Step 23520: lr=1.00E-05, loss= 1.1810 (max= 1.5838), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:03,534 - root - INFO - Step 23520: lr=1.00E-05, loss= 1.1810 (max= 1.5838), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:03,534 - root - INFO - Step 23520: lr=1.00E-05, loss= 1.1810 (max= 1.5838), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:03,534 - root - INFO - Step 23520: lr=1.00E-05, loss= 1.1810 (max= 1.5838), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:03,535 - root - INFO - Step 23520: lr=1.00E-05, loss= 1.1810 (max= 1.5838), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:03,535 - root - INFO - Step 23520: lr=1.00E-05, loss= 1.1810 (max= 1.5838), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:03,535 - root - INFO - Step 23520: lr=1.00E-05, loss= 1.1810 (max= 1.5838), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:03,535 - root - INFO - Step 23520: lr=1.00E-05, loss= 1.1810 (max= 1.5838), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:19,440 - root - INFO - Step 23530: lr=1.00E-05, loss= 1.1850 (max= 1.5336), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:19,441 - root - INFO - Step 23530: lr=1.00E-05, loss= 1.1850 (max= 1.5336), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:19,441 - root - INFO - Step 23530: lr=1.00E-05, loss= 1.1850 (max= 1.5336), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:19,441 - root - INFO - Step 23530: lr=1.00E-05, loss= 1.1850 (max= 1.5336), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:19,441 - root - INFO - Step 23530: lr=1.00E-05, loss= 1.1850 (max= 1.5336), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:19,441 - root - INFO - Step 23530: lr=1.00E-05, loss= 1.1850 (max= 1.5336), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:19,441 - root - INFO - Step 23530: lr=1.00E-05, loss= 1.1850 (max= 1.5336), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:19,441 - root - INFO - Step 23530: lr=1.00E-05, loss= 1.1850 (max= 1.5336), 
tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:35,383 - root - INFO - Step 23540: lr=1.00E-05, loss= 1.1826 (max= 1.5728), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:35,384 - root - INFO - Step 23540: lr=1.00E-05, loss= 1.1826 (max= 1.5728), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:35,384 - root - INFO - Step 23540: lr=1.00E-05, loss= 1.1826 (max= 1.5728), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:35,384 - root - INFO - Step 23540: lr=1.00E-05, loss= 1.1826 (max= 1.5728), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:35,384 - root - INFO - Step 23540: lr=1.00E-05, loss= 1.1826 (max= 1.5728), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:35,384 - root - INFO - Step 23540: lr=1.00E-05, loss= 1.1826 (max= 1.5728), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:35,384 - root - INFO - Step 23540: lr=1.00E-05, loss= 1.1826 (max= 1.5728), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:35,384 - root - INFO - Step 23540: lr=1.00E-05, loss= 1.1826 (max= 1.5728), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 20:59:51,329 - root - INFO - Step 23550: lr=1.00E-05, loss= 1.2016 (max= 1.6553), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:59:51,329 - root - INFO - Step 23550: lr=1.00E-05, loss= 1.2016 (max= 1.6553), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:59:51,329 - root - INFO - Step 
23550: lr=1.00E-05, loss= 1.2016 (max= 1.6553), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:59:51,329 - root - INFO - Step 23550: lr=1.00E-05, loss= 1.2016 (max= 1.6553), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:59:51,329 - root - INFO - Step 23550: lr=1.00E-05, loss= 1.2016 (max= 1.6553), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:59:51,329 - root - INFO - Step 23550: lr=1.00E-05, loss= 1.2016 (max= 1.6553), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:59:51,329 - root - INFO - Step 23550: lr=1.00E-05, loss= 1.2016 (max= 1.6553), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:59:51,329 - root - INFO - Step 23550: lr=1.00E-05, loss= 1.2016 (max= 1.6553), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 20:59:51,339 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:3875130 +2025-10-24 21:00:07,307 - root - INFO - Step 23560: lr=1.00E-05, loss= 1.1814 (max= 1.5099), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:07,307 - root - INFO - Step 23560: lr=1.00E-05, loss= 1.1814 (max= 1.5099), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:07,307 - root - INFO - Step 23560: lr=1.00E-05, loss= 1.1814 (max= 1.5099), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:07,307 - root - INFO - Step 23560: lr=1.00E-05, loss= 1.1814 (max= 1.5099), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-24 21:00:07,307 - root - INFO - Step 23560: lr=1.00E-05, loss= 1.1814 (max= 1.5099), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:07,307 - root - INFO - Step 23560: lr=1.00E-05, loss= 1.1814 (max= 1.5099), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:07,307 - root - INFO - Step 23560: lr=1.00E-05, loss= 1.1814 (max= 1.5099), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:07,307 - root - INFO - Step 23560: lr=1.00E-05, loss= 1.1814 (max= 1.5099), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:23,260 - root - INFO - Step 23570: lr=1.00E-05, loss= 1.1796 (max= 1.5011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:00:23,261 - root - INFO - Step 23570: lr=1.00E-05, loss= 1.1796 (max= 1.5011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:00:23,261 - root - INFO - Step 23570: lr=1.00E-05, loss= 1.1796 (max= 1.5011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:00:23,261 - root - INFO - Step 23570: lr=1.00E-05, loss= 1.1796 (max= 1.5011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:00:23,261 - root - INFO - Step 23570: lr=1.00E-05, loss= 1.1796 (max= 1.5011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:00:23,261 - root - INFO - Step 23570: lr=1.00E-05, loss= 1.1796 (max= 1.5011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:00:23,261 - root - INFO - Step 23570: lr=1.00E-05, loss= 1.1796 (max= 1.5011), tps=20544, mfu=42.80%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:00:23,261 - root - INFO - Step 23570: lr=1.00E-05, loss= 1.1796 (max= 1.5011), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:00:39,186 - root - INFO - Step 23580: lr=1.00E-05, loss= 1.1516 (max= 1.6937), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:39,186 - root - INFO - Step 23580: lr=1.00E-05, loss= 1.1516 (max= 1.6937), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:39,186 - root - INFO - Step 23580: lr=1.00E-05, loss= 1.1516 (max= 1.6937), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:39,186 - root - INFO - Step 23580: lr=1.00E-05, loss= 1.1516 (max= 1.6937), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:39,186 - root - INFO - Step 23580: lr=1.00E-05, loss= 1.1516 (max= 1.6937), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:39,186 - root - INFO - Step 23580: lr=1.00E-05, loss= 1.1516 (max= 1.6937), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:39,186 - root - INFO - Step 23580: lr=1.00E-05, loss= 1.1516 (max= 1.6937), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:39,186 - root - INFO - Step 23580: lr=1.00E-05, loss= 1.1516 (max= 1.6937), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:55,108 - root - INFO - Step 23590: lr=1.00E-05, loss= 1.1880 (max= 1.5359), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:55,108 - root - INFO - Step 23590: lr=1.00E-05, loss= 1.1880 
(max= 1.5359), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:55,108 - root - INFO - Step 23590: lr=1.00E-05, loss= 1.1880 (max= 1.5359), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:55,108 - root - INFO - Step 23590: lr=1.00E-05, loss= 1.1880 (max= 1.5359), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:55,108 - root - INFO - Step 23590: lr=1.00E-05, loss= 1.1880 (max= 1.5359), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:55,108 - root - INFO - Step 23590: lr=1.00E-05, loss= 1.1880 (max= 1.5359), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:55,108 - root - INFO - Step 23590: lr=1.00E-05, loss= 1.1880 (max= 1.5359), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:00:55,108 - root - INFO - Step 23590: lr=1.00E-05, loss= 1.1880 (max= 1.5359), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:11,012 - root - INFO - Step 23600: lr=1.00E-05, loss= 1.1769 (max= 1.5430), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:11,012 - root - INFO - Step 23600: lr=1.00E-05, loss= 1.1769 (max= 1.5430), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:11,012 - root - INFO - Step 23600: lr=1.00E-05, loss= 1.1769 (max= 1.5430), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:11,012 - root - INFO - Step 23600: lr=1.00E-05, loss= 1.1769 (max= 1.5430), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:11,012 - root 
- INFO - Step 23600: lr=1.00E-05, loss= 1.1769 (max= 1.5430), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:11,012 - root - INFO - Step 23600: lr=1.00E-05, loss= 1.1769 (max= 1.5430), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:11,013 - root - INFO - Step 23600: lr=1.00E-05, loss= 1.1769 (max= 1.5430), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:11,013 - root - INFO - Step 23600: lr=1.00E-05, loss= 1.1769 (max= 1.5430), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:26,922 - root - INFO - Step 23610: lr=1.00E-05, loss= 1.1930 (max= 1.6003), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:26,922 - root - INFO - Step 23610: lr=1.00E-05, loss= 1.1930 (max= 1.6003), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:26,922 - root - INFO - Step 23610: lr=1.00E-05, loss= 1.1930 (max= 1.6003), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:26,922 - root - INFO - Step 23610: lr=1.00E-05, loss= 1.1930 (max= 1.6003), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:26,923 - root - INFO - Step 23610: lr=1.00E-05, loss= 1.1930 (max= 1.6003), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:26,923 - root - INFO - Step 23610: lr=1.00E-05, loss= 1.1930 (max= 1.6003), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:26,923 - root - INFO - Step 23610: lr=1.00E-05, loss= 1.1930 (max= 1.6003), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 21:01:26,923 - root - INFO - Step 23610: lr=1.00E-05, loss= 1.1930 (max= 1.6003), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:42,852 - root - INFO - Step 23620: lr=1.00E-05, loss= 1.1870 (max= 1.8544), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:42,852 - root - INFO - Step 23620: lr=1.00E-05, loss= 1.1870 (max= 1.8544), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:42,852 - root - INFO - Step 23620: lr=1.00E-05, loss= 1.1870 (max= 1.8544), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:42,852 - root - INFO - Step 23620: lr=1.00E-05, loss= 1.1870 (max= 1.8544), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:42,852 - root - INFO - Step 23620: lr=1.00E-05, loss= 1.1870 (max= 1.8544), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:42,852 - root - INFO - Step 23620: lr=1.00E-05, loss= 1.1870 (max= 1.8544), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:42,852 - root - INFO - Step 23620: lr=1.00E-05, loss= 1.1870 (max= 1.8544), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:42,852 - root - INFO - Step 23620: lr=1.00E-05, loss= 1.1870 (max= 1.8544), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:01:58,762 - root - INFO - Step 23630: lr=1.00E-05, loss= 1.2021 (max= 2.2382), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:01:58,762 - root - INFO - Step 23630: lr=1.00E-05, loss= 1.2021 (max= 2.2382), tps=20601, mfu=42.92%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:01:58,762 - root - INFO - Step 23630: lr=1.00E-05, loss= 1.2021 (max= 2.2382), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:01:58,762 - root - INFO - Step 23630: lr=1.00E-05, loss= 1.2021 (max= 2.2382), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:01:58,762 - root - INFO - Step 23630: lr=1.00E-05, loss= 1.2021 (max= 2.2382), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:01:58,762 - root - INFO - Step 23630: lr=1.00E-05, loss= 1.2021 (max= 2.2382), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:01:58,762 - root - INFO - Step 23630: lr=1.00E-05, loss= 1.2021 (max= 2.2382), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:01:58,762 - root - INFO - Step 23630: lr=1.00E-05, loss= 1.2021 (max= 2.2382), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:02:14,706 - root - INFO - Step 23640: lr=1.00E-05, loss= 1.1778 (max= 1.5451), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:14,706 - root - INFO - Step 23640: lr=1.00E-05, loss= 1.1778 (max= 1.5451), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:14,706 - root - INFO - Step 23640: lr=1.00E-05, loss= 1.1778 (max= 1.5451), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:14,706 - root - INFO - Step 23640: lr=1.00E-05, loss= 1.1778 (max= 1.5451), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:14,706 - root - INFO - Step 23640: lr=1.00E-05, 
loss= 1.1778 (max= 1.5451), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:14,706 - root - INFO - Step 23640: lr=1.00E-05, loss= 1.1778 (max= 1.5451), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:14,706 - root - INFO - Step 23640: lr=1.00E-05, loss= 1.1778 (max= 1.5451), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:14,706 - root - INFO - Step 23640: lr=1.00E-05, loss= 1.1778 (max= 1.5451), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:30,654 - root - INFO - Step 23650: lr=1.00E-05, loss= 1.1838 (max= 1.5118), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:02:30,654 - root - INFO - Step 23650: lr=1.00E-05, loss= 1.1838 (max= 1.5118), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:02:30,654 - root - INFO - Step 23650: lr=1.00E-05, loss= 1.1838 (max= 1.5118), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:02:30,654 - root - INFO - Step 23650: lr=1.00E-05, loss= 1.1838 (max= 1.5118), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:02:30,655 - root - INFO - Step 23650: lr=1.00E-05, loss= 1.1838 (max= 1.5118), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:02:30,655 - root - INFO - Step 23650: lr=1.00E-05, loss= 1.1838 (max= 1.5118), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:02:30,655 - root - INFO - Step 23650: lr=1.00E-05, loss= 1.1838 (max= 1.5118), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
21:02:30,655 - root - INFO - Step 23650: lr=1.00E-05, loss= 1.1838 (max= 1.5118), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:02:46,578 - root - INFO - Step 23660: lr=1.00E-05, loss= 1.2119 (max= 1.5761), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:46,578 - root - INFO - Step 23660: lr=1.00E-05, loss= 1.2119 (max= 1.5761), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:46,579 - root - INFO - Step 23660: lr=1.00E-05, loss= 1.2119 (max= 1.5761), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:46,579 - root - INFO - Step 23660: lr=1.00E-05, loss= 1.2119 (max= 1.5761), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:46,579 - root - INFO - Step 23660: lr=1.00E-05, loss= 1.2119 (max= 1.5761), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:46,579 - root - INFO - Step 23660: lr=1.00E-05, loss= 1.2119 (max= 1.5761), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:46,579 - root - INFO - Step 23660: lr=1.00E-05, loss= 1.2119 (max= 1.5761), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:02:46,579 - root - INFO - Step 23660: lr=1.00E-05, loss= 1.2119 (max= 1.5761), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:03:02,529 - root - INFO - Step 23670: lr=1.00E-05, loss= 1.2082 (max= 1.6942), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:02,529 - root - INFO - Step 23670: lr=1.00E-05, loss= 1.2082 (max= 1.6942), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:02,529 - root - INFO - Step 23670: lr=1.00E-05, loss= 1.2082 (max= 1.6942), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:02,529 - root - INFO - Step 23670: lr=1.00E-05, loss= 1.2082 (max= 1.6942), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:02,529 - root - INFO - Step 23670: lr=1.00E-05, loss= 1.2082 (max= 1.6942), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:02,529 - root - INFO - Step 23670: lr=1.00E-05, loss= 1.2082 (max= 1.6942), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:02,529 - root - INFO - Step 23670: lr=1.00E-05, loss= 1.2082 (max= 1.6942), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:02,529 - root - INFO - Step 23670: lr=1.00E-05, loss= 1.2082 (max= 1.6942), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:18,505 - root - INFO - Step 23680: lr=1.00E-05, loss= 1.1557 (max= 1.5307), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:18,505 - root - INFO - Step 23680: lr=1.00E-05, loss= 1.1557 (max= 1.5307), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:18,505 - root - INFO - Step 23680: lr=1.00E-05, loss= 1.1557 (max= 1.5307), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:18,505 - root - INFO - Step 23680: lr=1.00E-05, loss= 1.1557 (max= 1.5307), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:18,505 - root - INFO - Step 23680: lr=1.00E-05, loss= 1.1557 (max= 1.5307), 
tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:18,505 - root - INFO - Step 23680: lr=1.00E-05, loss= 1.1557 (max= 1.5307), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:18,505 - root - INFO - Step 23680: lr=1.00E-05, loss= 1.1557 (max= 1.5307), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:18,505 - root - INFO - Step 23680: lr=1.00E-05, loss= 1.1557 (max= 1.5307), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:34,471 - root - INFO - Step 23690: lr=1.00E-05, loss= 1.1816 (max= 1.6619), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:34,471 - root - INFO - Step 23690: lr=1.00E-05, loss= 1.1816 (max= 1.6619), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:34,471 - root - INFO - Step 23690: lr=1.00E-05, loss= 1.1816 (max= 1.6619), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:34,471 - root - INFO - Step 23690: lr=1.00E-05, loss= 1.1816 (max= 1.6619), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:34,471 - root - INFO - Step 23690: lr=1.00E-05, loss= 1.1816 (max= 1.6619), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:34,471 - root - INFO - Step 23690: lr=1.00E-05, loss= 1.1816 (max= 1.6619), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:34,471 - root - INFO - Step 23690: lr=1.00E-05, loss= 1.1816 (max= 1.6619), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:34,471 - root - INFO - Step 
23690: lr=1.00E-05, loss= 1.1816 (max= 1.6619), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:50,389 - root - INFO - Step 23700: lr=1.00E-05, loss= 1.2294 (max= 1.7965), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:50,389 - root - INFO - Step 23700: lr=1.00E-05, loss= 1.2294 (max= 1.7965), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:50,389 - root - INFO - Step 23700: lr=1.00E-05, loss= 1.2294 (max= 1.7965), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:50,389 - root - INFO - Step 23700: lr=1.00E-05, loss= 1.2294 (max= 1.7965), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:50,389 - root - INFO - Step 23700: lr=1.00E-05, loss= 1.2294 (max= 1.7965), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:50,389 - root - INFO - Step 23700: lr=1.00E-05, loss= 1.2294 (max= 1.7965), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:50,389 - root - INFO - Step 23700: lr=1.00E-05, loss= 1.2294 (max= 1.7965), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:03:50,389 - root - INFO - Step 23700: lr=1.00E-05, loss= 1.2294 (max= 1.7965), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:06,337 - root - INFO - Step 23710: lr=1.00E-05, loss= 1.1714 (max= 1.6722), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:06,337 - root - INFO - Step 23710: lr=1.00E-05, loss= 1.1714 (max= 1.6722), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 21:04:06,337 - root - INFO - Step 23710: lr=1.00E-05, loss= 1.1714 (max= 1.6722), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:06,337 - root - INFO - Step 23710: lr=1.00E-05, loss= 1.1714 (max= 1.6722), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:06,337 - root - INFO - Step 23710: lr=1.00E-05, loss= 1.1714 (max= 1.6722), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:06,337 - root - INFO - Step 23710: lr=1.00E-05, loss= 1.1714 (max= 1.6722), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:06,337 - root - INFO - Step 23710: lr=1.00E-05, loss= 1.1714 (max= 1.6722), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:06,337 - root - INFO - Step 23710: lr=1.00E-05, loss= 1.1714 (max= 1.6722), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:22,257 - root - INFO - Step 23720: lr=1.00E-05, loss= 1.1950 (max= 1.6524), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:22,257 - root - INFO - Step 23720: lr=1.00E-05, loss= 1.1950 (max= 1.6524), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:22,257 - root - INFO - Step 23720: lr=1.00E-05, loss= 1.1950 (max= 1.6524), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:22,257 - root - INFO - Step 23720: lr=1.00E-05, loss= 1.1950 (max= 1.6524), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:22,257 - root - INFO - Step 23720: lr=1.00E-05, loss= 1.1950 (max= 1.6524), tps=20588, mfu=42.89%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:22,257 - root - INFO - Step 23720: lr=1.00E-05, loss= 1.1950 (max= 1.6524), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:22,257 - root - INFO - Step 23720: lr=1.00E-05, loss= 1.1950 (max= 1.6524), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:22,257 - root - INFO - Step 23720: lr=1.00E-05, loss= 1.1950 (max= 1.6524), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:38,231 - root - INFO - Step 23730: lr=1.00E-05, loss= 1.1517 (max= 1.4865), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:38,231 - root - INFO - Step 23730: lr=1.00E-05, loss= 1.1517 (max= 1.4865), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:38,231 - root - INFO - Step 23730: lr=1.00E-05, loss= 1.1517 (max= 1.4865), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:38,231 - root - INFO - Step 23730: lr=1.00E-05, loss= 1.1517 (max= 1.4865), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:38,231 - root - INFO - Step 23730: lr=1.00E-05, loss= 1.1517 (max= 1.4865), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:38,231 - root - INFO - Step 23730: lr=1.00E-05, loss= 1.1517 (max= 1.4865), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:38,231 - root - INFO - Step 23730: lr=1.00E-05, loss= 1.1517 (max= 1.4865), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:38,231 - root - INFO - Step 23730: lr=1.00E-05, loss= 1.1517 
(max= 1.4865), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:04:54,120 - root - INFO - Step 23740: lr=1.00E-05, loss= 1.1902 (max= 1.9160), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:54,120 - root - INFO - Step 23740: lr=1.00E-05, loss= 1.1902 (max= 1.9160), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:54,120 - root - INFO - Step 23740: lr=1.00E-05, loss= 1.1902 (max= 1.9160), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:54,120 - root - INFO - Step 23740: lr=1.00E-05, loss= 1.1902 (max= 1.9160), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:54,120 - root - INFO - Step 23740: lr=1.00E-05, loss= 1.1902 (max= 1.9160), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:54,121 - root - INFO - Step 23740: lr=1.00E-05, loss= 1.1902 (max= 1.9160), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:54,121 - root - INFO - Step 23740: lr=1.00E-05, loss= 1.1902 (max= 1.9160), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:04:54,121 - root - INFO - Step 23740: lr=1.00E-05, loss= 1.1902 (max= 1.9160), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:10,009 - root - INFO - Step 23750: lr=1.00E-05, loss= 1.2075 (max= 1.5470), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:10,009 - root - INFO - Step 23750: lr=1.00E-05, loss= 1.2075 (max= 1.5470), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:10,009 - root 
- INFO - Step 23750: lr=1.00E-05, loss= 1.2075 (max= 1.5470), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:10,009 - root - INFO - Step 23750: lr=1.00E-05, loss= 1.2075 (max= 1.5470), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:10,009 - root - INFO - Step 23750: lr=1.00E-05, loss= 1.2075 (max= 1.5470), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:10,009 - root - INFO - Step 23750: lr=1.00E-05, loss= 1.2075 (max= 1.5470), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:10,009 - root - INFO - Step 23750: lr=1.00E-05, loss= 1.2075 (max= 1.5470), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:10,009 - root - INFO - Step 23750: lr=1.00E-05, loss= 1.2075 (max= 1.5470), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:25,919 - root - INFO - Step 23760: lr=1.00E-05, loss= 1.2180 (max= 1.5987), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:05:25,919 - root - INFO - Step 23760: lr=1.00E-05, loss= 1.2180 (max= 1.5987), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:05:25,919 - root - INFO - Step 23760: lr=1.00E-05, loss= 1.2180 (max= 1.5987), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:05:25,919 - root - INFO - Step 23760: lr=1.00E-05, loss= 1.2180 (max= 1.5987), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:05:25,919 - root - INFO - Step 23760: lr=1.00E-05, loss= 1.2180 (max= 1.5987), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 21:05:25,919 - root - INFO - Step 23760: lr=1.00E-05, loss= 1.2180 (max= 1.5987), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:05:25,919 - root - INFO - Step 23760: lr=1.00E-05, loss= 1.2180 (max= 1.5987), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:05:25,920 - root - INFO - Step 23760: lr=1.00E-05, loss= 1.2180 (max= 1.5987), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:05:41,841 - root - INFO - Step 23770: lr=1.00E-05, loss= 1.1800 (max= 1.5270), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:41,841 - root - INFO - Step 23770: lr=1.00E-05, loss= 1.1800 (max= 1.5270), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:41,841 - root - INFO - Step 23770: lr=1.00E-05, loss= 1.1800 (max= 1.5270), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:41,841 - root - INFO - Step 23770: lr=1.00E-05, loss= 1.1800 (max= 1.5270), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:41,841 - root - INFO - Step 23770: lr=1.00E-05, loss= 1.1800 (max= 1.5270), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:41,841 - root - INFO - Step 23770: lr=1.00E-05, loss= 1.1800 (max= 1.5270), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:41,841 - root - INFO - Step 23770: lr=1.00E-05, loss= 1.1800 (max= 1.5270), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:41,842 - root - INFO - Step 23770: lr=1.00E-05, loss= 1.1800 (max= 1.5270), tps=20585, mfu=42.89%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:57,775 - root - INFO - Step 23780: lr=1.00E-05, loss= 1.2125 (max= 1.9647), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:57,775 - root - INFO - Step 23780: lr=1.00E-05, loss= 1.2125 (max= 1.9647), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:57,775 - root - INFO - Step 23780: lr=1.00E-05, loss= 1.2125 (max= 1.9647), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:57,775 - root - INFO - Step 23780: lr=1.00E-05, loss= 1.2125 (max= 1.9647), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:57,775 - root - INFO - Step 23780: lr=1.00E-05, loss= 1.2125 (max= 1.9647), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:57,775 - root - INFO - Step 23780: lr=1.00E-05, loss= 1.2125 (max= 1.9647), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:57,775 - root - INFO - Step 23780: lr=1.00E-05, loss= 1.2125 (max= 1.9647), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:05:57,775 - root - INFO - Step 23780: lr=1.00E-05, loss= 1.2125 (max= 1.9647), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:06:06,503 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4206988 +2025-10-24 21:06:13,725 - root - INFO - Step 23790: lr=1.00E-05, loss= 1.1664 (max= 1.6458), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:13,726 - root - INFO - Step 23790: lr=1.00E-05, 
loss= 1.1664 (max= 1.6458), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:13,726 - root - INFO - Step 23790: lr=1.00E-05, loss= 1.1664 (max= 1.6458), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:13,726 - root - INFO - Step 23790: lr=1.00E-05, loss= 1.1664 (max= 1.6458), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:13,726 - root - INFO - Step 23790: lr=1.00E-05, loss= 1.1664 (max= 1.6458), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:13,726 - root - INFO - Step 23790: lr=1.00E-05, loss= 1.1664 (max= 1.6458), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:13,726 - root - INFO - Step 23790: lr=1.00E-05, loss= 1.1664 (max= 1.6458), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:13,726 - root - INFO - Step 23790: lr=1.00E-05, loss= 1.1664 (max= 1.6458), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:29,669 - root - INFO - Step 23800: lr=1.00E-05, loss= 1.1680 (max= 1.5421), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:29,669 - root - INFO - Step 23800: lr=1.00E-05, loss= 1.1680 (max= 1.5421), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:29,669 - root - INFO - Step 23800: lr=1.00E-05, loss= 1.1680 (max= 1.5421), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:29,669 - root - INFO - Step 23800: lr=1.00E-05, loss= 1.1680 (max= 1.5421), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
21:06:29,669 - root - INFO - Step 23800: lr=1.00E-05, loss= 1.1680 (max= 1.5421), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:29,669 - root - INFO - Step 23800: lr=1.00E-05, loss= 1.1680 (max= 1.5421), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:29,669 - root - INFO - Step 23800: lr=1.00E-05, loss= 1.1680 (max= 1.5421), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:29,669 - root - INFO - Step 23800: lr=1.00E-05, loss= 1.1680 (max= 1.5421), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:06:45,610 - root - INFO - Step 23810: lr=1.00E-05, loss= 1.1842 (max= 1.6610), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:06:45,610 - root - INFO - Step 23810: lr=1.00E-05, loss= 1.1842 (max= 1.6610), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:06:45,610 - root - INFO - Step 23810: lr=1.00E-05, loss= 1.1842 (max= 1.6610), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:06:45,610 - root - INFO - Step 23810: lr=1.00E-05, loss= 1.1842 (max= 1.6610), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:06:45,610 - root - INFO - Step 23810: lr=1.00E-05, loss= 1.1842 (max= 1.6610), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:06:45,610 - root - INFO - Step 23810: lr=1.00E-05, loss= 1.1842 (max= 1.6610), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:06:45,610 - root - INFO - Step 23810: lr=1.00E-05, loss= 1.1842 (max= 1.6610), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:06:45,610 - root - INFO - Step 23810: lr=1.00E-05, loss= 1.1842 (max= 1.6610), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:01,574 - root - INFO - Step 23820: lr=1.00E-05, loss= 1.2066 (max= 1.8520), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:01,574 - root - INFO - Step 23820: lr=1.00E-05, loss= 1.2066 (max= 1.8520), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:01,574 - root - INFO - Step 23820: lr=1.00E-05, loss= 1.2066 (max= 1.8520), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:01,574 - root - INFO - Step 23820: lr=1.00E-05, loss= 1.2066 (max= 1.8520), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:01,574 - root - INFO - Step 23820: lr=1.00E-05, loss= 1.2066 (max= 1.8520), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:01,574 - root - INFO - Step 23820: lr=1.00E-05, loss= 1.2066 (max= 1.8520), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:01,574 - root - INFO - Step 23820: lr=1.00E-05, loss= 1.2066 (max= 1.8520), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:01,574 - root - INFO - Step 23820: lr=1.00E-05, loss= 1.2066 (max= 1.8520), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:17,509 - root - INFO - Step 23830: lr=1.00E-05, loss= 1.1975 (max= 1.6380), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:07:17,509 - root - INFO - Step 23830: lr=1.00E-05, loss= 1.1975 (max= 1.6380), 
tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:07:17,509 - root - INFO - Step 23830: lr=1.00E-05, loss= 1.1975 (max= 1.6380), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:07:17,509 - root - INFO - Step 23830: lr=1.00E-05, loss= 1.1975 (max= 1.6380), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:07:17,509 - root - INFO - Step 23830: lr=1.00E-05, loss= 1.1975 (max= 1.6380), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:07:17,509 - root - INFO - Step 23830: lr=1.00E-05, loss= 1.1975 (max= 1.6380), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:07:17,509 - root - INFO - Step 23830: lr=1.00E-05, loss= 1.1975 (max= 1.6380), tps=20569, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:07:17,509 - root - INFO - Step 23830: lr=1.00E-05, loss= 1.1975 (max= 1.6380), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:07:33,437 - root - INFO - Step 23840: lr=1.00E-05, loss= 1.1818 (max= 1.5636), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:33,437 - root - INFO - Step 23840: lr=1.00E-05, loss= 1.1818 (max= 1.5636), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:33,437 - root - INFO - Step 23840: lr=1.00E-05, loss= 1.1818 (max= 1.5636), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:33,437 - root - INFO - Step 23840: lr=1.00E-05, loss= 1.1818 (max= 1.5636), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:33,437 - root - INFO - Step 
23840: lr=1.00E-05, loss= 1.1818 (max= 1.5636), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:33,438 - root - INFO - Step 23840: lr=1.00E-05, loss= 1.1818 (max= 1.5636), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:33,438 - root - INFO - Step 23840: lr=1.00E-05, loss= 1.1818 (max= 1.5636), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:33,438 - root - INFO - Step 23840: lr=1.00E-05, loss= 1.1818 (max= 1.5636), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:49,352 - root - INFO - Step 23850: lr=1.00E-05, loss= 1.1923 (max= 1.8466), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:49,352 - root - INFO - Step 23850: lr=1.00E-05, loss= 1.1923 (max= 1.8466), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:49,352 - root - INFO - Step 23850: lr=1.00E-05, loss= 1.1923 (max= 1.8466), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:49,352 - root - INFO - Step 23850: lr=1.00E-05, loss= 1.1923 (max= 1.8466), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:49,352 - root - INFO - Step 23850: lr=1.00E-05, loss= 1.1923 (max= 1.8466), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:49,352 - root - INFO - Step 23850: lr=1.00E-05, loss= 1.1923 (max= 1.8466), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:07:49,352 - root - INFO - Step 23850: lr=1.00E-05, loss= 1.1923 (max= 1.8466), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 21:07:49,352 - root - INFO - Step 23850: lr=1.00E-05, loss= 1.1923 (max= 1.8466), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:05,253 - root - INFO - Step 23860: lr=1.00E-05, loss= 1.1965 (max= 1.7335), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:08:05,253 - root - INFO - Step 23860: lr=1.00E-05, loss= 1.1965 (max= 1.7335), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:08:05,253 - root - INFO - Step 23860: lr=1.00E-05, loss= 1.1965 (max= 1.7335), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:08:05,253 - root - INFO - Step 23860: lr=1.00E-05, loss= 1.1965 (max= 1.7335), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:08:05,253 - root - INFO - Step 23860: lr=1.00E-05, loss= 1.1965 (max= 1.7335), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:08:05,253 - root - INFO - Step 23860: lr=1.00E-05, loss= 1.1965 (max= 1.7335), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:08:05,253 - root - INFO - Step 23860: lr=1.00E-05, loss= 1.1965 (max= 1.7335), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:08:05,253 - root - INFO - Step 23860: lr=1.00E-05, loss= 1.1965 (max= 1.7335), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:08:21,221 - root - INFO - Step 23870: lr=1.00E-05, loss= 1.1864 (max= 1.5825), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:21,221 - root - INFO - Step 23870: lr=1.00E-05, loss= 1.1864 (max= 1.5825), tps=20525, mfu=42.76%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:21,221 - root - INFO - Step 23870: lr=1.00E-05, loss= 1.1864 (max= 1.5825), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:21,221 - root - INFO - Step 23870: lr=1.00E-05, loss= 1.1864 (max= 1.5825), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:21,221 - root - INFO - Step 23870: lr=1.00E-05, loss= 1.1864 (max= 1.5825), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:21,221 - root - INFO - Step 23870: lr=1.00E-05, loss= 1.1864 (max= 1.5825), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:21,221 - root - INFO - Step 23870: lr=1.00E-05, loss= 1.1864 (max= 1.5825), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:21,221 - root - INFO - Step 23870: lr=1.00E-05, loss= 1.1864 (max= 1.5825), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:37,152 - root - INFO - Step 23880: lr=1.00E-05, loss= 1.1867 (max= 1.5875), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:37,152 - root - INFO - Step 23880: lr=1.00E-05, loss= 1.1867 (max= 1.5875), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:37,152 - root - INFO - Step 23880: lr=1.00E-05, loss= 1.1867 (max= 1.5875), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:37,152 - root - INFO - Step 23880: lr=1.00E-05, loss= 1.1867 (max= 1.5875), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:37,152 - root - INFO - Step 23880: lr=1.00E-05, loss= 1.1867 
(max= 1.5875), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:37,152 - root - INFO - Step 23880: lr=1.00E-05, loss= 1.1867 (max= 1.5875), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:37,152 - root - INFO - Step 23880: lr=1.00E-05, loss= 1.1867 (max= 1.5875), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:37,152 - root - INFO - Step 23880: lr=1.00E-05, loss= 1.1867 (max= 1.5875), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:53,084 - root - INFO - Step 23890: lr=1.00E-05, loss= 1.1837 (max= 1.6411), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:53,084 - root - INFO - Step 23890: lr=1.00E-05, loss= 1.1837 (max= 1.6411), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:53,084 - root - INFO - Step 23890: lr=1.00E-05, loss= 1.1837 (max= 1.6411), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:53,084 - root - INFO - Step 23890: lr=1.00E-05, loss= 1.1837 (max= 1.6411), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:53,084 - root - INFO - Step 23890: lr=1.00E-05, loss= 1.1837 (max= 1.6411), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:53,085 - root - INFO - Step 23890: lr=1.00E-05, loss= 1.1837 (max= 1.6411), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:53,085 - root - INFO - Step 23890: lr=1.00E-05, loss= 1.1837 (max= 1.6411), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:08:53,085 - root 
- INFO - Step 23890: lr=1.00E-05, loss= 1.1837 (max= 1.6411), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:09,037 - root - INFO - Step 23900: lr=1.00E-05, loss= 1.1531 (max= 1.5379), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:09,037 - root - INFO - Step 23900: lr=1.00E-05, loss= 1.1531 (max= 1.5379), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:09,037 - root - INFO - Step 23900: lr=1.00E-05, loss= 1.1531 (max= 1.5379), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:09,037 - root - INFO - Step 23900: lr=1.00E-05, loss= 1.1531 (max= 1.5379), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:09,037 - root - INFO - Step 23900: lr=1.00E-05, loss= 1.1531 (max= 1.5379), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:09,037 - root - INFO - Step 23900: lr=1.00E-05, loss= 1.1531 (max= 1.5379), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:09,037 - root - INFO - Step 23900: lr=1.00E-05, loss= 1.1531 (max= 1.5379), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:09,037 - root - INFO - Step 23900: lr=1.00E-05, loss= 1.1531 (max= 1.5379), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:24,951 - root - INFO - Step 23910: lr=1.00E-05, loss= 1.1905 (max= 1.7794), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:24,951 - root - INFO - Step 23910: lr=1.00E-05, loss= 1.1905 (max= 1.7794), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 21:09:24,951 - root - INFO - Step 23910: lr=1.00E-05, loss= 1.1905 (max= 1.7794), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:24,951 - root - INFO - Step 23910: lr=1.00E-05, loss= 1.1905 (max= 1.7794), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:24,951 - root - INFO - Step 23910: lr=1.00E-05, loss= 1.1905 (max= 1.7794), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:24,951 - root - INFO - Step 23910: lr=1.00E-05, loss= 1.1905 (max= 1.7794), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:24,951 - root - INFO - Step 23910: lr=1.00E-05, loss= 1.1905 (max= 1.7794), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:24,952 - root - INFO - Step 23910: lr=1.00E-05, loss= 1.1905 (max= 1.7794), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:40,897 - root - INFO - Step 23920: lr=1.00E-05, loss= 1.1497 (max= 1.4776), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:40,897 - root - INFO - Step 23920: lr=1.00E-05, loss= 1.1497 (max= 1.4776), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:40,897 - root - INFO - Step 23920: lr=1.00E-05, loss= 1.1497 (max= 1.4776), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:40,897 - root - INFO - Step 23920: lr=1.00E-05, loss= 1.1497 (max= 1.4776), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:40,897 - root - INFO - Step 23920: lr=1.00E-05, loss= 1.1497 (max= 1.4776), tps=20554, mfu=42.82%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:40,897 - root - INFO - Step 23920: lr=1.00E-05, loss= 1.1497 (max= 1.4776), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:40,897 - root - INFO - Step 23920: lr=1.00E-05, loss= 1.1497 (max= 1.4776), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:40,897 - root - INFO - Step 23920: lr=1.00E-05, loss= 1.1497 (max= 1.4776), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:09:56,882 - root - INFO - Step 23930: lr=1.00E-05, loss= 1.1737 (max= 1.5570), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:56,882 - root - INFO - Step 23930: lr=1.00E-05, loss= 1.1737 (max= 1.5570), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:56,882 - root - INFO - Step 23930: lr=1.00E-05, loss= 1.1737 (max= 1.5570), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:56,882 - root - INFO - Step 23930: lr=1.00E-05, loss= 1.1737 (max= 1.5570), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:56,882 - root - INFO - Step 23930: lr=1.00E-05, loss= 1.1737 (max= 1.5570), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:56,882 - root - INFO - Step 23930: lr=1.00E-05, loss= 1.1737 (max= 1.5570), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:56,882 - root - INFO - Step 23930: lr=1.00E-05, loss= 1.1737 (max= 1.5570), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:09:56,882 - root - INFO - Step 23930: lr=1.00E-05, 
loss= 1.1737 (max= 1.5570), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:12,816 - root - INFO - Step 23940: lr=1.00E-05, loss= 1.1633 (max= 1.5184), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:10:12,816 - root - INFO - Step 23940: lr=1.00E-05, loss= 1.1633 (max= 1.5184), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:10:12,816 - root - INFO - Step 23940: lr=1.00E-05, loss= 1.1633 (max= 1.5184), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:10:12,816 - root - INFO - Step 23940: lr=1.00E-05, loss= 1.1633 (max= 1.5184), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:10:12,816 - root - INFO - Step 23940: lr=1.00E-05, loss= 1.1633 (max= 1.5184), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:10:12,816 - root - INFO - Step 23940: lr=1.00E-05, loss= 1.1633 (max= 1.5184), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:10:12,816 - root - INFO - Step 23940: lr=1.00E-05, loss= 1.1633 (max= 1.5184), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:10:12,816 - root - INFO - Step 23940: lr=1.00E-05, loss= 1.1633 (max= 1.5184), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:10:28,746 - root - INFO - Step 23950: lr=1.00E-05, loss= 1.1996 (max= 1.4897), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:28,746 - root - INFO - Step 23950: lr=1.00E-05, loss= 1.1996 (max= 1.4897), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
21:10:28,746 - root - INFO - Step 23950: lr=1.00E-05, loss= 1.1996 (max= 1.4897), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:28,746 - root - INFO - Step 23950: lr=1.00E-05, loss= 1.1996 (max= 1.4897), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:28,746 - root - INFO - Step 23950: lr=1.00E-05, loss= 1.1996 (max= 1.4897), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:28,746 - root - INFO - Step 23950: lr=1.00E-05, loss= 1.1996 (max= 1.4897), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:28,746 - root - INFO - Step 23950: lr=1.00E-05, loss= 1.1996 (max= 1.4897), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:28,747 - root - INFO - Step 23950: lr=1.00E-05, loss= 1.1996 (max= 1.4897), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:44,656 - root - INFO - Step 23960: lr=1.00E-05, loss= 1.1655 (max= 1.7417), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:44,656 - root - INFO - Step 23960: lr=1.00E-05, loss= 1.1655 (max= 1.7417), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:44,656 - root - INFO - Step 23960: lr=1.00E-05, loss= 1.1655 (max= 1.7417), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:44,656 - root - INFO - Step 23960: lr=1.00E-05, loss= 1.1655 (max= 1.7417), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:44,656 - root - INFO - Step 23960: lr=1.00E-05, loss= 1.1655 (max= 1.7417), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:44,656 - root - INFO - Step 23960: lr=1.00E-05, loss= 1.1655 (max= 1.7417), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:44,656 - root - INFO - Step 23960: lr=1.00E-05, loss= 1.1655 (max= 1.7417), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:10:44,656 - root - INFO - Step 23960: lr=1.00E-05, loss= 1.1655 (max= 1.7417), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:00,599 - root - INFO - Step 23970: lr=1.00E-05, loss= 1.1966 (max= 1.8171), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:00,599 - root - INFO - Step 23970: lr=1.00E-05, loss= 1.1966 (max= 1.8171), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:00,599 - root - INFO - Step 23970: lr=1.00E-05, loss= 1.1966 (max= 1.8171), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:00,599 - root - INFO - Step 23970: lr=1.00E-05, loss= 1.1966 (max= 1.8171), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:00,599 - root - INFO - Step 23970: lr=1.00E-05, loss= 1.1966 (max= 1.8171), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:00,599 - root - INFO - Step 23970: lr=1.00E-05, loss= 1.1966 (max= 1.8171), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:00,599 - root - INFO - Step 23970: lr=1.00E-05, loss= 1.1966 (max= 1.8171), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:00,599 - root - INFO - Step 23970: lr=1.00E-05, loss= 1.1966 (max= 1.8171), 
tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:16,502 - root - INFO - Step 23980: lr=1.00E-05, loss= 1.1828 (max= 1.5588), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:16,502 - root - INFO - Step 23980: lr=1.00E-05, loss= 1.1828 (max= 1.5588), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:16,502 - root - INFO - Step 23980: lr=1.00E-05, loss= 1.1828 (max= 1.5588), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:16,502 - root - INFO - Step 23980: lr=1.00E-05, loss= 1.1828 (max= 1.5588), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:16,502 - root - INFO - Step 23980: lr=1.00E-05, loss= 1.1828 (max= 1.5588), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:16,502 - root - INFO - Step 23980: lr=1.00E-05, loss= 1.1828 (max= 1.5588), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:16,502 - root - INFO - Step 23980: lr=1.00E-05, loss= 1.1828 (max= 1.5588), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:16,502 - root - INFO - Step 23980: lr=1.00E-05, loss= 1.1828 (max= 1.5588), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:32,403 - root - INFO - Step 23990: lr=1.00E-05, loss= 1.1904 (max= 1.5041), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:32,403 - root - INFO - Step 23990: lr=1.00E-05, loss= 1.1904 (max= 1.5041), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:32,403 - root - INFO - Step 
23990: lr=1.00E-05, loss= 1.1904 (max= 1.5041), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:32,403 - root - INFO - Step 23990: lr=1.00E-05, loss= 1.1904 (max= 1.5041), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:32,403 - root - INFO - Step 23990: lr=1.00E-05, loss= 1.1904 (max= 1.5041), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:32,403 - root - INFO - Step 23990: lr=1.00E-05, loss= 1.1904 (max= 1.5041), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:32,403 - root - INFO - Step 23990: lr=1.00E-05, loss= 1.1904 (max= 1.5041), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:32,403 - root - INFO - Step 23990: lr=1.00E-05, loss= 1.1904 (max= 1.5041), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:32,418 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:5734653 +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-24000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-24000! 
Save time: 4.438583612442017 +2025-10-24 21:11:48,299 - root - INFO - Step 24000: lr=1.00E-05, loss= 1.1767 (max= 1.5267), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:48,299 - root - INFO - Step 24000: lr=1.00E-05, loss= 1.1767 (max= 1.5267), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:48,299 - root - INFO - Step 24000: lr=1.00E-05, loss= 1.1767 (max= 1.5267), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:48,300 - root - INFO - Saving a full checkpoint at step 24000 +2025-10-24 21:11:48,300 - root - INFO - Saving a full checkpoint at step 24000 +2025-10-24 21:11:48,300 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:11:48,300 - root - INFO - Step 24000: lr=1.00E-05, loss= 1.1767 (max= 1.5267), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:48,300 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:11:48,300 - root - INFO - Saving a full checkpoint at step 24000 +2025-10-24 21:11:48,300 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:11:48,300 - root - INFO - Step 24000: lr=1.00E-05, loss= 1.1767 (max= 1.5267), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:48,300 - root - INFO - Saving a full checkpoint at step 24000 +2025-10-24 21:11:48,300 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:11:48,300 - root - INFO - Step 24000: lr=1.00E-05, loss= 1.1767 (max= 1.5267), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:48,300 - root - INFO - Saving a full checkpoint at step 24000 +2025-10-24 21:11:48,300 - root - INFO - Step 24000: lr=1.00E-05, loss= 1.1767 (max= 1.5267), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:48,300 - root - INFO - Step 24000: lr=1.00E-05, loss= 1.1767 (max= 1.5267), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:11:48,300 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:11:48,300 - root - INFO - Saving a full checkpoint at step 24000 +2025-10-24 21:11:48,300 - root - INFO - Saving a full checkpoint at step 24000 +2025-10-24 21:11:48,300 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:11:48,300 - root - INFO - Saving a full checkpoint at step 24000 +2025-10-24 21:11:48,300 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:11:48,300 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:12:02,704 - root - INFO - Finished saving the checkpoint in 14.40 seconds +2025-10-24 21:12:02,712 - root - INFO - Finished saving the checkpoint in 14.41 seconds +2025-10-24 21:12:02,712 - root - INFO - Finished saving the checkpoint in 14.41 seconds +2025-10-24 21:12:02,712 - root - INFO - Finished saving the checkpoint in 14.41 seconds +2025-10-24 21:12:02,712 - root - INFO - Finished saving the checkpoint in 14.41 seconds +2025-10-24 21:12:02,712 - root - INFO - Finished saving the checkpoint in 14.41 seconds +2025-10-24 21:12:02,713 - root - INFO - Finished saving the checkpoint in 14.41 seconds +2025-10-24 21:12:02,713 - root - INFO - Finished saving the checkpoint 
in 14.41 seconds +2025-10-24 21:12:18,619 - root - INFO - Step 24010: lr=1.00E-05, loss= 1.1547 (max= 1.5454), tps=10809, mfu=22.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:18,619 - root - INFO - Step 24010: lr=1.00E-05, loss= 1.1547 (max= 1.5454), tps=10809, mfu=22.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:18,619 - root - INFO - Step 24010: lr=1.00E-05, loss= 1.1547 (max= 1.5454), tps=10809, mfu=22.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:18,619 - root - INFO - Step 24010: lr=1.00E-05, loss= 1.1547 (max= 1.5454), tps=10809, mfu=22.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:18,619 - root - INFO - Step 24010: lr=1.00E-05, loss= 1.1547 (max= 1.5454), tps=10809, mfu=22.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:18,619 - root - INFO - Step 24010: lr=1.00E-05, loss= 1.1547 (max= 1.5454), tps=10809, mfu=22.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:18,619 - root - INFO - Step 24010: lr=1.00E-05, loss= 1.1547 (max= 1.5454), tps=10809, mfu=22.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:18,619 - root - INFO - Step 24010: lr=1.00E-05, loss= 1.1547 (max= 1.5454), tps=10809, mfu=22.52%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:34,539 - root - INFO - Step 24020: lr=1.00E-05, loss= 1.1389 (max= 1.4958), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:12:34,539 - root - INFO - Step 24020: lr=1.00E-05, loss= 1.1389 (max= 1.4958), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:12:34,539 - root - INFO - Step 24020: lr=1.00E-05, loss= 1.1389 (max= 1.4958), tps=20587, mfu=42.89%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:12:34,539 - root - INFO - Step 24020: lr=1.00E-05, loss= 1.1389 (max= 1.4958), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:12:34,539 - root - INFO - Step 24020: lr=1.00E-05, loss= 1.1389 (max= 1.4958), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:12:34,539 - root - INFO - Step 24020: lr=1.00E-05, loss= 1.1389 (max= 1.4958), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:12:34,539 - root - INFO - Step 24020: lr=1.00E-05, loss= 1.1389 (max= 1.4958), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:12:34,539 - root - INFO - Step 24020: lr=1.00E-05, loss= 1.1389 (max= 1.4958), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:12:50,439 - root - INFO - Step 24030: lr=1.00E-05, loss= 1.2058 (max= 1.6678), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:50,439 - root - INFO - Step 24030: lr=1.00E-05, loss= 1.2058 (max= 1.6678), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:50,439 - root - INFO - Step 24030: lr=1.00E-05, loss= 1.2058 (max= 1.6678), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:50,439 - root - INFO - Step 24030: lr=1.00E-05, loss= 1.2058 (max= 1.6678), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:50,439 - root - INFO - Step 24030: lr=1.00E-05, loss= 1.2058 (max= 1.6678), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:50,439 - root - INFO - Step 24030: lr=1.00E-05, 
loss= 1.2058 (max= 1.6678), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:50,439 - root - INFO - Step 24030: lr=1.00E-05, loss= 1.2058 (max= 1.6678), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:12:50,439 - root - INFO - Step 24030: lr=1.00E-05, loss= 1.2058 (max= 1.6678), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:06,399 - root - INFO - Step 24040: lr=1.00E-05, loss= 1.1925 (max= 1.7448), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:06,399 - root - INFO - Step 24040: lr=1.00E-05, loss= 1.1925 (max= 1.7448), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:06,400 - root - INFO - Step 24040: lr=1.00E-05, loss= 1.1925 (max= 1.7448), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:06,400 - root - INFO - Step 24040: lr=1.00E-05, loss= 1.1925 (max= 1.7448), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:06,400 - root - INFO - Step 24040: lr=1.00E-05, loss= 1.1925 (max= 1.7448), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:06,400 - root - INFO - Step 24040: lr=1.00E-05, loss= 1.1925 (max= 1.7448), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:06,400 - root - INFO - Step 24040: lr=1.00E-05, loss= 1.1925 (max= 1.7448), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:06,400 - root - INFO - Step 24040: lr=1.00E-05, loss= 1.1925 (max= 1.7448), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
21:13:22,341 - root - INFO - Step 24050: lr=1.00E-05, loss= 1.1644 (max= 1.6247), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:22,341 - root - INFO - Step 24050: lr=1.00E-05, loss= 1.1644 (max= 1.6247), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:22,341 - root - INFO - Step 24050: lr=1.00E-05, loss= 1.1644 (max= 1.6247), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:22,341 - root - INFO - Step 24050: lr=1.00E-05, loss= 1.1644 (max= 1.6247), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:22,341 - root - INFO - Step 24050: lr=1.00E-05, loss= 1.1644 (max= 1.6247), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:22,341 - root - INFO - Step 24050: lr=1.00E-05, loss= 1.1644 (max= 1.6247), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:22,341 - root - INFO - Step 24050: lr=1.00E-05, loss= 1.1644 (max= 1.6247), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:22,341 - root - INFO - Step 24050: lr=1.00E-05, loss= 1.1644 (max= 1.6247), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:38,232 - root - INFO - Step 24060: lr=1.00E-05, loss= 1.1717 (max= 1.5344), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:13:38,233 - root - INFO - Step 24060: lr=1.00E-05, loss= 1.1717 (max= 1.5344), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:13:38,233 - root - INFO - Step 24060: lr=1.00E-05, loss= 1.1717 (max= 1.5344), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:13:38,233 - root - INFO - Step 24060: lr=1.00E-05, loss= 1.1717 (max= 1.5344), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:13:38,233 - root - INFO - Step 24060: lr=1.00E-05, loss= 1.1717 (max= 1.5344), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:13:38,233 - root - INFO - Step 24060: lr=1.00E-05, loss= 1.1717 (max= 1.5344), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:13:38,233 - root - INFO - Step 24060: lr=1.00E-05, loss= 1.1717 (max= 1.5344), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:13:38,233 - root - INFO - Step 24060: lr=1.00E-05, loss= 1.1717 (max= 1.5344), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:13:54,173 - root - INFO - Step 24070: lr=1.00E-05, loss= 1.1814 (max= 1.6619), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:54,173 - root - INFO - Step 24070: lr=1.00E-05, loss= 1.1814 (max= 1.6619), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:54,173 - root - INFO - Step 24070: lr=1.00E-05, loss= 1.1814 (max= 1.6619), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:54,173 - root - INFO - Step 24070: lr=1.00E-05, loss= 1.1814 (max= 1.6619), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:54,173 - root - INFO - Step 24070: lr=1.00E-05, loss= 1.1814 (max= 1.6619), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:54,173 - root - INFO - Step 24070: lr=1.00E-05, loss= 1.1814 (max= 1.6619), 
tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:54,173 - root - INFO - Step 24070: lr=1.00E-05, loss= 1.1814 (max= 1.6619), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:13:54,173 - root - INFO - Step 24070: lr=1.00E-05, loss= 1.1814 (max= 1.6619), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:14:10,124 - root - INFO - Step 24080: lr=1.00E-05, loss= 1.1945 (max= 1.7287), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:10,124 - root - INFO - Step 24080: lr=1.00E-05, loss= 1.1945 (max= 1.7287), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:10,124 - root - INFO - Step 24080: lr=1.00E-05, loss= 1.1945 (max= 1.7287), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:10,124 - root - INFO - Step 24080: lr=1.00E-05, loss= 1.1945 (max= 1.7287), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:10,124 - root - INFO - Step 24080: lr=1.00E-05, loss= 1.1945 (max= 1.7287), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:10,124 - root - INFO - Step 24080: lr=1.00E-05, loss= 1.1945 (max= 1.7287), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:10,124 - root - INFO - Step 24080: lr=1.00E-05, loss= 1.1945 (max= 1.7287), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:10,124 - root - INFO - Step 24080: lr=1.00E-05, loss= 1.1945 (max= 1.7287), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:26,104 - root - INFO - Step 
24090: lr=1.00E-05, loss= 1.1763 (max= 1.6025), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:26,104 - root - INFO - Step 24090: lr=1.00E-05, loss= 1.1763 (max= 1.6025), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:26,104 - root - INFO - Step 24090: lr=1.00E-05, loss= 1.1763 (max= 1.6025), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:26,104 - root - INFO - Step 24090: lr=1.00E-05, loss= 1.1763 (max= 1.6025), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:26,104 - root - INFO - Step 24090: lr=1.00E-05, loss= 1.1763 (max= 1.6025), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:26,104 - root - INFO - Step 24090: lr=1.00E-05, loss= 1.1763 (max= 1.6025), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:26,104 - root - INFO - Step 24090: lr=1.00E-05, loss= 1.1763 (max= 1.6025), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:26,104 - root - INFO - Step 24090: lr=1.00E-05, loss= 1.1763 (max= 1.6025), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:42,076 - root - INFO - Step 24100: lr=1.00E-05, loss= 1.1925 (max= 1.4754), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:42,076 - root - INFO - Step 24100: lr=1.00E-05, loss= 1.1925 (max= 1.4754), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:42,076 - root - INFO - Step 24100: lr=1.00E-05, loss= 1.1925 (max= 1.4754), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 21:14:42,076 - root - INFO - Step 24100: lr=1.00E-05, loss= 1.1925 (max= 1.4754), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:42,077 - root - INFO - Step 24100: lr=1.00E-05, loss= 1.1925 (max= 1.4754), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:42,077 - root - INFO - Step 24100: lr=1.00E-05, loss= 1.1925 (max= 1.4754), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:42,077 - root - INFO - Step 24100: lr=1.00E-05, loss= 1.1925 (max= 1.4754), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:42,077 - root - INFO - Step 24100: lr=1.00E-05, loss= 1.1925 (max= 1.4754), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:14:58,011 - root - INFO - Step 24110: lr=1.00E-05, loss= 1.1606 (max= 1.8183), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:14:58,011 - root - INFO - Step 24110: lr=1.00E-05, loss= 1.1606 (max= 1.8183), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:14:58,011 - root - INFO - Step 24110: lr=1.00E-05, loss= 1.1606 (max= 1.8183), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:14:58,011 - root - INFO - Step 24110: lr=1.00E-05, loss= 1.1606 (max= 1.8183), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:14:58,011 - root - INFO - Step 24110: lr=1.00E-05, loss= 1.1606 (max= 1.8183), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:14:58,011 - root - INFO - Step 24110: lr=1.00E-05, loss= 1.1606 (max= 1.8183), tps=20569, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:14:58,011 - root - INFO - Step 24110: lr=1.00E-05, loss= 1.1606 (max= 1.8183), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:14:58,011 - root - INFO - Step 24110: lr=1.00E-05, loss= 1.1606 (max= 1.8183), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:13,917 - root - INFO - Step 24120: lr=1.00E-05, loss= 1.1987 (max= 1.8712), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:13,917 - root - INFO - Step 24120: lr=1.00E-05, loss= 1.1987 (max= 1.8712), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:13,917 - root - INFO - Step 24120: lr=1.00E-05, loss= 1.1987 (max= 1.8712), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:13,917 - root - INFO - Step 24120: lr=1.00E-05, loss= 1.1987 (max= 1.8712), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:13,917 - root - INFO - Step 24120: lr=1.00E-05, loss= 1.1987 (max= 1.8712), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:13,917 - root - INFO - Step 24120: lr=1.00E-05, loss= 1.1987 (max= 1.8712), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:13,917 - root - INFO - Step 24120: lr=1.00E-05, loss= 1.1987 (max= 1.8712), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:13,917 - root - INFO - Step 24120: lr=1.00E-05, loss= 1.1987 (max= 1.8712), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:29,846 - root - INFO - Step 24130: lr=1.00E-05, loss= 1.2010 
(max= 1.6470), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:29,846 - root - INFO - Step 24130: lr=1.00E-05, loss= 1.2010 (max= 1.6470), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:29,846 - root - INFO - Step 24130: lr=1.00E-05, loss= 1.2010 (max= 1.6470), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:29,846 - root - INFO - Step 24130: lr=1.00E-05, loss= 1.2010 (max= 1.6470), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:29,846 - root - INFO - Step 24130: lr=1.00E-05, loss= 1.2010 (max= 1.6470), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:29,846 - root - INFO - Step 24130: lr=1.00E-05, loss= 1.2010 (max= 1.6470), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:29,846 - root - INFO - Step 24130: lr=1.00E-05, loss= 1.2010 (max= 1.6470), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:29,847 - root - INFO - Step 24130: lr=1.00E-05, loss= 1.2010 (max= 1.6470), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:15:45,820 - root - INFO - Step 24140: lr=1.00E-05, loss= 1.1711 (max= 1.4280), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:15:45,820 - root - INFO - Step 24140: lr=1.00E-05, loss= 1.1711 (max= 1.4280), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:15:45,820 - root - INFO - Step 24140: lr=1.00E-05, loss= 1.1711 (max= 1.4280), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:15:45,820 - root 
- INFO - Step 24140: lr=1.00E-05, loss= 1.1711 (max= 1.4280), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:15:45,820 - root - INFO - Step 24140: lr=1.00E-05, loss= 1.1711 (max= 1.4280), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:15:45,820 - root - INFO - Step 24140: lr=1.00E-05, loss= 1.1711 (max= 1.4280), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:15:45,820 - root - INFO - Step 24140: lr=1.00E-05, loss= 1.1711 (max= 1.4280), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:15:45,820 - root - INFO - Step 24140: lr=1.00E-05, loss= 1.1711 (max= 1.4280), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:01,771 - root - INFO - Step 24150: lr=1.00E-05, loss= 1.1907 (max= 1.6872), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:01,772 - root - INFO - Step 24150: lr=1.00E-05, loss= 1.1907 (max= 1.6872), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:01,772 - root - INFO - Step 24150: lr=1.00E-05, loss= 1.1907 (max= 1.6872), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:01,772 - root - INFO - Step 24150: lr=1.00E-05, loss= 1.1907 (max= 1.6872), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:01,772 - root - INFO - Step 24150: lr=1.00E-05, loss= 1.1907 (max= 1.6872), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:01,772 - root - INFO - Step 24150: lr=1.00E-05, loss= 1.1907 (max= 1.6872), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 21:16:01,772 - root - INFO - Step 24150: lr=1.00E-05, loss= 1.1907 (max= 1.6872), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:01,772 - root - INFO - Step 24150: lr=1.00E-05, loss= 1.1907 (max= 1.6872), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:17,716 - root - INFO - Step 24160: lr=1.00E-05, loss= 1.1810 (max= 1.5942), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:17,716 - root - INFO - Step 24160: lr=1.00E-05, loss= 1.1810 (max= 1.5942), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:17,716 - root - INFO - Step 24160: lr=1.00E-05, loss= 1.1810 (max= 1.5942), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:17,716 - root - INFO - Step 24160: lr=1.00E-05, loss= 1.1810 (max= 1.5942), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:17,716 - root - INFO - Step 24160: lr=1.00E-05, loss= 1.1810 (max= 1.5942), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:17,716 - root - INFO - Step 24160: lr=1.00E-05, loss= 1.1810 (max= 1.5942), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:17,717 - root - INFO - Step 24160: lr=1.00E-05, loss= 1.1810 (max= 1.5942), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:17,717 - root - INFO - Step 24160: lr=1.00E-05, loss= 1.1810 (max= 1.5942), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:33,653 - root - INFO - Step 24170: lr=1.00E-05, loss= 1.1824 (max= 1.6749), tps=20566, mfu=42.85%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:33,653 - root - INFO - Step 24170: lr=1.00E-05, loss= 1.1824 (max= 1.6749), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:33,653 - root - INFO - Step 24170: lr=1.00E-05, loss= 1.1824 (max= 1.6749), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:33,653 - root - INFO - Step 24170: lr=1.00E-05, loss= 1.1824 (max= 1.6749), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:33,653 - root - INFO - Step 24170: lr=1.00E-05, loss= 1.1824 (max= 1.6749), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:33,653 - root - INFO - Step 24170: lr=1.00E-05, loss= 1.1824 (max= 1.6749), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:33,653 - root - INFO - Step 24170: lr=1.00E-05, loss= 1.1824 (max= 1.6749), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:33,653 - root - INFO - Step 24170: lr=1.00E-05, loss= 1.1824 (max= 1.6749), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:16:49,637 - root - INFO - Step 24180: lr=1.00E-05, loss= 1.1749 (max= 1.5726), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:49,637 - root - INFO - Step 24180: lr=1.00E-05, loss= 1.1749 (max= 1.5726), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:49,637 - root - INFO - Step 24180: lr=1.00E-05, loss= 1.1749 (max= 1.5726), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:49,637 - root - INFO - Step 24180: lr=1.00E-05, 
loss= 1.1749 (max= 1.5726), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:49,637 - root - INFO - Step 24180: lr=1.00E-05, loss= 1.1749 (max= 1.5726), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:49,637 - root - INFO - Step 24180: lr=1.00E-05, loss= 1.1749 (max= 1.5726), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:49,637 - root - INFO - Step 24180: lr=1.00E-05, loss= 1.1749 (max= 1.5726), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:16:49,637 - root - INFO - Step 24180: lr=1.00E-05, loss= 1.1749 (max= 1.5726), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:17:05,590 - root - INFO - Step 24190: lr=1.00E-05, loss= 1.1847 (max= 1.6740), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:17:05,590 - root - INFO - Step 24190: lr=1.00E-05, loss= 1.1847 (max= 1.6740), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:17:05,590 - root - INFO - Step 24190: lr=1.00E-05, loss= 1.1847 (max= 1.6740), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:17:05,590 - root - INFO - Step 24190: lr=1.00E-05, loss= 1.1847 (max= 1.6740), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:17:05,590 - root - INFO - Step 24190: lr=1.00E-05, loss= 1.1847 (max= 1.6740), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:17:05,590 - root - INFO - Step 24190: lr=1.00E-05, loss= 1.1847 (max= 1.6740), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
21:17:05,590 - root - INFO - Step 24190: lr=1.00E-05, loss= 1.1847 (max= 1.6740), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:17:05,590 - root - INFO - Step 24190: lr=1.00E-05, loss= 1.1847 (max= 1.6740), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:17:21,553 - root - INFO - Step 24200: lr=1.00E-05, loss= 1.1970 (max= 1.7179), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:21,553 - root - INFO - Step 24200: lr=1.00E-05, loss= 1.1970 (max= 1.7179), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:21,553 - root - INFO - Step 24200: lr=1.00E-05, loss= 1.1970 (max= 1.7179), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:21,553 - root - INFO - Step 24200: lr=1.00E-05, loss= 1.1970 (max= 1.7179), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:21,553 - root - INFO - Step 24200: lr=1.00E-05, loss= 1.1970 (max= 1.7179), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:21,553 - root - INFO - Step 24200: lr=1.00E-05, loss= 1.1970 (max= 1.7179), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:21,553 - root - INFO - Step 24200: lr=1.00E-05, loss= 1.1970 (max= 1.7179), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:21,553 - root - INFO - Step 24200: lr=1.00E-05, loss= 1.1970 (max= 1.7179), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:37,501 - root - INFO - Step 24210: lr=1.00E-05, loss= 1.1799 (max= 1.6797), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:37,501 - root - INFO - Step 24210: lr=1.00E-05, loss= 1.1799 (max= 1.6797), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:37,501 - root - INFO - Step 24210: lr=1.00E-05, loss= 1.1799 (max= 1.6797), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:37,501 - root - INFO - Step 24210: lr=1.00E-05, loss= 1.1799 (max= 1.6797), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:37,501 - root - INFO - Step 24210: lr=1.00E-05, loss= 1.1799 (max= 1.6797), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:37,501 - root - INFO - Step 24210: lr=1.00E-05, loss= 1.1799 (max= 1.6797), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:37,501 - root - INFO - Step 24210: lr=1.00E-05, loss= 1.1799 (max= 1.6797), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:37,501 - root - INFO - Step 24210: lr=1.00E-05, loss= 1.1799 (max= 1.6797), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:53,463 - root - INFO - Step 24220: lr=1.00E-05, loss= 1.1959 (max= 1.7170), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:53,463 - root - INFO - Step 24220: lr=1.00E-05, loss= 1.1959 (max= 1.7170), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:53,463 - root - INFO - Step 24220: lr=1.00E-05, loss= 1.1959 (max= 1.7170), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:53,463 - root - INFO - Step 24220: lr=1.00E-05, loss= 1.1959 (max= 1.7170), 
tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:53,463 - root - INFO - Step 24220: lr=1.00E-05, loss= 1.1959 (max= 1.7170), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:53,463 - root - INFO - Step 24220: lr=1.00E-05, loss= 1.1959 (max= 1.7170), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:53,463 - root - INFO - Step 24220: lr=1.00E-05, loss= 1.1959 (max= 1.7170), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:17:53,463 - root - INFO - Step 24220: lr=1.00E-05, loss= 1.1959 (max= 1.7170), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:09,381 - root - INFO - Step 24230: lr=1.00E-05, loss= 1.1865 (max= 1.6170), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:18:09,381 - root - INFO - Step 24230: lr=1.00E-05, loss= 1.1865 (max= 1.6170), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:18:09,381 - root - INFO - Step 24230: lr=1.00E-05, loss= 1.1865 (max= 1.6170), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:18:09,381 - root - INFO - Step 24230: lr=1.00E-05, loss= 1.1865 (max= 1.6170), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:18:09,381 - root - INFO - Step 24230: lr=1.00E-05, loss= 1.1865 (max= 1.6170), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:18:09,381 - root - INFO - Step 24230: lr=1.00E-05, loss= 1.1865 (max= 1.6170), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:18:09,381 - root - INFO - Step 
24230: lr=1.00E-05, loss= 1.1865 (max= 1.6170), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:18:09,381 - root - INFO - Step 24230: lr=1.00E-05, loss= 1.1865 (max= 1.6170), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:18:25,287 - root - INFO - Step 24240: lr=1.00E-05, loss= 1.2052 (max= 1.7585), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:25,287 - root - INFO - Step 24240: lr=1.00E-05, loss= 1.2052 (max= 1.7585), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:25,287 - root - INFO - Step 24240: lr=1.00E-05, loss= 1.2052 (max= 1.7585), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:25,288 - root - INFO - Step 24240: lr=1.00E-05, loss= 1.2052 (max= 1.7585), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:25,288 - root - INFO - Step 24240: lr=1.00E-05, loss= 1.2052 (max= 1.7585), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:25,288 - root - INFO - Step 24240: lr=1.00E-05, loss= 1.2052 (max= 1.7585), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:25,288 - root - INFO - Step 24240: lr=1.00E-05, loss= 1.2052 (max= 1.7585), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:25,288 - root - INFO - Step 24240: lr=1.00E-05, loss= 1.2052 (max= 1.7585), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:41,195 - root - INFO - Step 24250: lr=1.00E-05, loss= 1.1977 (max= 1.5107), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 21:18:41,195 - root - INFO - Step 24250: lr=1.00E-05, loss= 1.1977 (max= 1.5107), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:41,195 - root - INFO - Step 24250: lr=1.00E-05, loss= 1.1977 (max= 1.5107), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:41,195 - root - INFO - Step 24250: lr=1.00E-05, loss= 1.1977 (max= 1.5107), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:41,195 - root - INFO - Step 24250: lr=1.00E-05, loss= 1.1977 (max= 1.5107), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:41,195 - root - INFO - Step 24250: lr=1.00E-05, loss= 1.1977 (max= 1.5107), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:41,195 - root - INFO - Step 24250: lr=1.00E-05, loss= 1.1977 (max= 1.5107), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:41,195 - root - INFO - Step 24250: lr=1.00E-05, loss= 1.1977 (max= 1.5107), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:57,146 - root - INFO - Step 24260: lr=1.00E-05, loss= 1.1443 (max= 1.5600), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:57,146 - root - INFO - Step 24260: lr=1.00E-05, loss= 1.1443 (max= 1.5600), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:57,146 - root - INFO - Step 24260: lr=1.00E-05, loss= 1.1443 (max= 1.5600), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:57,146 - root - INFO - Step 24260: lr=1.00E-05, loss= 1.1443 (max= 1.5600), tps=20547, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:57,146 - root - INFO - Step 24260: lr=1.00E-05, loss= 1.1443 (max= 1.5600), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:57,146 - root - INFO - Step 24260: lr=1.00E-05, loss= 1.1443 (max= 1.5600), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:57,146 - root - INFO - Step 24260: lr=1.00E-05, loss= 1.1443 (max= 1.5600), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:18:57,146 - root - INFO - Step 24260: lr=1.00E-05, loss= 1.1443 (max= 1.5600), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:13,108 - root - INFO - Step 24270: lr=1.00E-05, loss= 1.1740 (max= 1.5985), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:13,108 - root - INFO - Step 24270: lr=1.00E-05, loss= 1.1740 (max= 1.5985), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:13,108 - root - INFO - Step 24270: lr=1.00E-05, loss= 1.1740 (max= 1.5985), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:13,108 - root - INFO - Step 24270: lr=1.00E-05, loss= 1.1740 (max= 1.5985), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:13,108 - root - INFO - Step 24270: lr=1.00E-05, loss= 1.1740 (max= 1.5985), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:13,108 - root - INFO - Step 24270: lr=1.00E-05, loss= 1.1740 (max= 1.5985), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:13,108 - root - INFO - Step 24270: lr=1.00E-05, loss= 1.1740 
(max= 1.5985), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:13,108 - root - INFO - Step 24270: lr=1.00E-05, loss= 1.1740 (max= 1.5985), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:29,090 - root - INFO - Step 24280: lr=1.00E-05, loss= 1.1694 (max= 1.6256), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:29,090 - root - INFO - Step 24280: lr=1.00E-05, loss= 1.1694 (max= 1.6256), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:29,090 - root - INFO - Step 24280: lr=1.00E-05, loss= 1.1694 (max= 1.6256), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:29,090 - root - INFO - Step 24280: lr=1.00E-05, loss= 1.1694 (max= 1.6256), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:29,090 - root - INFO - Step 24280: lr=1.00E-05, loss= 1.1694 (max= 1.6256), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:29,090 - root - INFO - Step 24280: lr=1.00E-05, loss= 1.1694 (max= 1.6256), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:29,091 - root - INFO - Step 24280: lr=1.00E-05, loss= 1.1694 (max= 1.6256), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:29,091 - root - INFO - Step 24280: lr=1.00E-05, loss= 1.1694 (max= 1.6256), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:44,990 - root - INFO - Step 24290: lr=1.00E-05, loss= 1.1480 (max= 1.5374), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:44,990 - root 
- INFO - Step 24290: lr=1.00E-05, loss= 1.1480 (max= 1.5374), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:44,990 - root - INFO - Step 24290: lr=1.00E-05, loss= 1.1480 (max= 1.5374), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:44,990 - root - INFO - Step 24290: lr=1.00E-05, loss= 1.1480 (max= 1.5374), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:44,990 - root - INFO - Step 24290: lr=1.00E-05, loss= 1.1480 (max= 1.5374), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:44,990 - root - INFO - Step 24290: lr=1.00E-05, loss= 1.1480 (max= 1.5374), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:44,990 - root - INFO - Step 24290: lr=1.00E-05, loss= 1.1480 (max= 1.5374), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:19:44,990 - root - INFO - Step 24290: lr=1.00E-05, loss= 1.1480 (max= 1.5374), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:00,889 - root - INFO - Step 24300: lr=1.00E-05, loss= 1.1915 (max= 1.5636), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:00,889 - root - INFO - Step 24300: lr=1.00E-05, loss= 1.1915 (max= 1.5636), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:00,889 - root - INFO - Step 24300: lr=1.00E-05, loss= 1.1915 (max= 1.5636), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:00,889 - root - INFO - Step 24300: lr=1.00E-05, loss= 1.1915 (max= 1.5636), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 21:20:00,889 - root - INFO - Step 24300: lr=1.00E-05, loss= 1.1915 (max= 1.5636), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:00,889 - root - INFO - Step 24300: lr=1.00E-05, loss= 1.1915 (max= 1.5636), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:00,889 - root - INFO - Step 24300: lr=1.00E-05, loss= 1.1915 (max= 1.5636), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:00,889 - root - INFO - Step 24300: lr=1.00E-05, loss= 1.1915 (max= 1.5636), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:16,865 - root - INFO - Step 24310: lr=1.00E-05, loss= 1.2011 (max= 1.6707), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:16,865 - root - INFO - Step 24310: lr=1.00E-05, loss= 1.2011 (max= 1.6707), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:16,865 - root - INFO - Step 24310: lr=1.00E-05, loss= 1.2011 (max= 1.6707), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:16,865 - root - INFO - Step 24310: lr=1.00E-05, loss= 1.2011 (max= 1.6707), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:16,865 - root - INFO - Step 24310: lr=1.00E-05, loss= 1.2011 (max= 1.6707), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:16,865 - root - INFO - Step 24310: lr=1.00E-05, loss= 1.2011 (max= 1.6707), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:16,865 - root - INFO - Step 24310: lr=1.00E-05, loss= 1.2011 (max= 1.6707), tps=20515, mfu=42.74%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:16,865 - root - INFO - Step 24310: lr=1.00E-05, loss= 1.2011 (max= 1.6707), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:32,880 - root - INFO - Step 24320: lr=1.00E-05, loss= 1.1542 (max= 1.4761), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:32,880 - root - INFO - Step 24320: lr=1.00E-05, loss= 1.1542 (max= 1.4761), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:32,880 - root - INFO - Step 24320: lr=1.00E-05, loss= 1.1542 (max= 1.4761), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:32,880 - root - INFO - Step 24320: lr=1.00E-05, loss= 1.1542 (max= 1.4761), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:32,880 - root - INFO - Step 24320: lr=1.00E-05, loss= 1.1542 (max= 1.4761), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:32,880 - root - INFO - Step 24320: lr=1.00E-05, loss= 1.1542 (max= 1.4761), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:32,880 - root - INFO - Step 24320: lr=1.00E-05, loss= 1.1542 (max= 1.4761), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:32,880 - root - INFO - Step 24320: lr=1.00E-05, loss= 1.1542 (max= 1.4761), tps=20466, mfu=42.64%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:20:48,829 - root - INFO - Step 24330: lr=1.00E-05, loss= 1.1687 (max= 1.5651), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:20:48,829 - root - INFO - Step 24330: lr=1.00E-05, 
loss= 1.1687 (max= 1.5651), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:20:48,829 - root - INFO - Step 24330: lr=1.00E-05, loss= 1.1687 (max= 1.5651), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:20:48,829 - root - INFO - Step 24330: lr=1.00E-05, loss= 1.1687 (max= 1.5651), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:20:48,829 - root - INFO - Step 24330: lr=1.00E-05, loss= 1.1687 (max= 1.5651), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:20:48,829 - root - INFO - Step 24330: lr=1.00E-05, loss= 1.1687 (max= 1.5651), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:20:48,829 - root - INFO - Step 24330: lr=1.00E-05, loss= 1.1687 (max= 1.5651), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:20:48,829 - root - INFO - Step 24330: lr=1.00E-05, loss= 1.1687 (max= 1.5651), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:04,745 - root - INFO - Step 24340: lr=1.00E-05, loss= 1.1987 (max= 1.8191), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:04,745 - root - INFO - Step 24340: lr=1.00E-05, loss= 1.1987 (max= 1.8191), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:04,745 - root - INFO - Step 24340: lr=1.00E-05, loss= 1.1987 (max= 1.8191), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:04,745 - root - INFO - Step 24340: lr=1.00E-05, loss= 1.1987 (max= 1.8191), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
21:21:04,745 - root - INFO - Step 24340: lr=1.00E-05, loss= 1.1987 (max= 1.8191), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:04,745 - root - INFO - Step 24340: lr=1.00E-05, loss= 1.1987 (max= 1.8191), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:04,746 - root - INFO - Step 24340: lr=1.00E-05, loss= 1.1987 (max= 1.8191), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:04,746 - root - INFO - Step 24340: lr=1.00E-05, loss= 1.1987 (max= 1.8191), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:20,682 - root - INFO - Step 24350: lr=1.00E-05, loss= 1.1683 (max= 1.4934), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:20,682 - root - INFO - Step 24350: lr=1.00E-05, loss= 1.1683 (max= 1.4934), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:20,682 - root - INFO - Step 24350: lr=1.00E-05, loss= 1.1683 (max= 1.4934), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:20,682 - root - INFO - Step 24350: lr=1.00E-05, loss= 1.1683 (max= 1.4934), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:20,682 - root - INFO - Step 24350: lr=1.00E-05, loss= 1.1683 (max= 1.4934), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:20,682 - root - INFO - Step 24350: lr=1.00E-05, loss= 1.1683 (max= 1.4934), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:20,682 - root - INFO - Step 24350: lr=1.00E-05, loss= 1.1683 (max= 1.4934), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:20,682 - root - INFO - Step 24350: lr=1.00E-05, loss= 1.1683 (max= 1.4934), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:36,614 - root - INFO - Step 24360: lr=1.00E-05, loss= 1.1850 (max= 1.6127), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:36,614 - root - INFO - Step 24360: lr=1.00E-05, loss= 1.1850 (max= 1.6127), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:36,614 - root - INFO - Step 24360: lr=1.00E-05, loss= 1.1850 (max= 1.6127), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:36,614 - root - INFO - Step 24360: lr=1.00E-05, loss= 1.1850 (max= 1.6127), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:36,614 - root - INFO - Step 24360: lr=1.00E-05, loss= 1.1850 (max= 1.6127), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:36,614 - root - INFO - Step 24360: lr=1.00E-05, loss= 1.1850 (max= 1.6127), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:36,614 - root - INFO - Step 24360: lr=1.00E-05, loss= 1.1850 (max= 1.6127), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:36,614 - root - INFO - Step 24360: lr=1.00E-05, loss= 1.1850 (max= 1.6127), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:21:52,490 - root - INFO - Step 24370: lr=1.00E-05, loss= 1.1937 (max= 1.6646), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:52,490 - root - INFO - Step 24370: lr=1.00E-05, loss= 1.1937 (max= 1.6646), 
tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:52,490 - root - INFO - Step 24370: lr=1.00E-05, loss= 1.1937 (max= 1.6646), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:52,490 - root - INFO - Step 24370: lr=1.00E-05, loss= 1.1937 (max= 1.6646), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:52,490 - root - INFO - Step 24370: lr=1.00E-05, loss= 1.1937 (max= 1.6646), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:52,490 - root - INFO - Step 24370: lr=1.00E-05, loss= 1.1937 (max= 1.6646), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:52,490 - root - INFO - Step 24370: lr=1.00E-05, loss= 1.1937 (max= 1.6646), tps=20645, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:21:52,491 - root - INFO - Step 24370: lr=1.00E-05, loss= 1.1937 (max= 1.6646), tps=20644, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:22:08,434 - root - INFO - Step 24380: lr=1.00E-05, loss= 1.1937 (max= 1.6537), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:22:08,434 - root - INFO - Step 24380: lr=1.00E-05, loss= 1.1937 (max= 1.6537), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:22:08,434 - root - INFO - Step 24380: lr=1.00E-05, loss= 1.1937 (max= 1.6537), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:22:08,434 - root - INFO - Step 24380: lr=1.00E-05, loss= 1.1937 (max= 1.6537), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:22:08,434 - root - INFO - Step 
24380: lr=1.00E-05, loss= 1.1937 (max= 1.6537), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:22:08,434 - root - INFO - Step 24380: lr=1.00E-05, loss= 1.1937 (max= 1.6537), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:22:08,434 - root - INFO - Step 24380: lr=1.00E-05, loss= 1.1937 (max= 1.6537), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:22:08,434 - root - INFO - Step 24380: lr=1.00E-05, loss= 1.1937 (max= 1.6537), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:22:24,372 - root - INFO - Step 24390: lr=1.00E-05, loss= 1.1879 (max= 1.8255), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:24,373 - root - INFO - Step 24390: lr=1.00E-05, loss= 1.1879 (max= 1.8255), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:24,373 - root - INFO - Step 24390: lr=1.00E-05, loss= 1.1879 (max= 1.8255), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:24,373 - root - INFO - Step 24390: lr=1.00E-05, loss= 1.1879 (max= 1.8255), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:24,373 - root - INFO - Step 24390: lr=1.00E-05, loss= 1.1879 (max= 1.8255), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:24,373 - root - INFO - Step 24390: lr=1.00E-05, loss= 1.1879 (max= 1.8255), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:24,373 - root - INFO - Step 24390: lr=1.00E-05, loss= 1.1879 (max= 1.8255), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 21:22:24,373 - root - INFO - Step 24390: lr=1.00E-05, loss= 1.1879 (max= 1.8255), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:40,290 - root - INFO - Step 24400: lr=1.00E-05, loss= 1.1481 (max= 1.5667), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:40,290 - root - INFO - Step 24400: lr=1.00E-05, loss= 1.1481 (max= 1.5667), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:40,290 - root - INFO - Step 24400: lr=1.00E-05, loss= 1.1481 (max= 1.5667), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:40,290 - root - INFO - Step 24400: lr=1.00E-05, loss= 1.1481 (max= 1.5667), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:40,290 - root - INFO - Step 24400: lr=1.00E-05, loss= 1.1481 (max= 1.5667), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:40,290 - root - INFO - Step 24400: lr=1.00E-05, loss= 1.1481 (max= 1.5667), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:40,290 - root - INFO - Step 24400: lr=1.00E-05, loss= 1.1481 (max= 1.5667), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:40,291 - root - INFO - Step 24400: lr=1.00E-05, loss= 1.1481 (max= 1.5667), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:56,207 - root - INFO - Step 24410: lr=1.00E-05, loss= 1.1928 (max= 1.6575), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:56,207 - root - INFO - Step 24410: lr=1.00E-05, loss= 1.1928 (max= 1.6575), tps=20592, mfu=42.90%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:56,207 - root - INFO - Step 24410: lr=1.00E-05, loss= 1.1928 (max= 1.6575), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:56,207 - root - INFO - Step 24410: lr=1.00E-05, loss= 1.1928 (max= 1.6575), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:56,207 - root - INFO - Step 24410: lr=1.00E-05, loss= 1.1928 (max= 1.6575), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:56,207 - root - INFO - Step 24410: lr=1.00E-05, loss= 1.1928 (max= 1.6575), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:56,207 - root - INFO - Step 24410: lr=1.00E-05, loss= 1.1928 (max= 1.6575), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:22:56,207 - root - INFO - Step 24410: lr=1.00E-05, loss= 1.1928 (max= 1.6575), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:23:12,185 - root - INFO - Step 24420: lr=1.00E-05, loss= 1.2137 (max= 1.6762), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:12,185 - root - INFO - Step 24420: lr=1.00E-05, loss= 1.2137 (max= 1.6762), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:12,185 - root - INFO - Step 24420: lr=1.00E-05, loss= 1.2137 (max= 1.6762), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:12,185 - root - INFO - Step 24420: lr=1.00E-05, loss= 1.2137 (max= 1.6762), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:12,185 - root - INFO - Step 24420: lr=1.00E-05, loss= 1.2137 
(max= 1.6762), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:12,185 - root - INFO - Step 24420: lr=1.00E-05, loss= 1.2137 (max= 1.6762), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:12,185 - root - INFO - Step 24420: lr=1.00E-05, loss= 1.2137 (max= 1.6762), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:12,186 - root - INFO - Step 24420: lr=1.00E-05, loss= 1.2137 (max= 1.6762), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:28,125 - root - INFO - Step 24430: lr=1.00E-05, loss= 1.1673 (max= 1.4965), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:28,125 - root - INFO - Step 24430: lr=1.00E-05, loss= 1.1673 (max= 1.4965), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:28,125 - root - INFO - Step 24430: lr=1.00E-05, loss= 1.1673 (max= 1.4965), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:28,125 - root - INFO - Step 24430: lr=1.00E-05, loss= 1.1673 (max= 1.4965), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:28,125 - root - INFO - Step 24430: lr=1.00E-05, loss= 1.1673 (max= 1.4965), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:28,125 - root - INFO - Step 24430: lr=1.00E-05, loss= 1.1673 (max= 1.4965), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:28,125 - root - INFO - Step 24430: lr=1.00E-05, loss= 1.1673 (max= 1.4965), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:28,125 - root 
- INFO - Step 24430: lr=1.00E-05, loss= 1.1673 (max= 1.4965), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:44,057 - root - INFO - Step 24440: lr=1.00E-05, loss= 1.1786 (max= 1.6403), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:23:44,057 - root - INFO - Step 24440: lr=1.00E-05, loss= 1.1786 (max= 1.6403), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:23:44,057 - root - INFO - Step 24440: lr=1.00E-05, loss= 1.1786 (max= 1.6403), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:23:44,057 - root - INFO - Step 24440: lr=1.00E-05, loss= 1.1786 (max= 1.6403), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:23:44,057 - root - INFO - Step 24440: lr=1.00E-05, loss= 1.1786 (max= 1.6403), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:23:44,057 - root - INFO - Step 24440: lr=1.00E-05, loss= 1.1786 (max= 1.6403), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:23:44,057 - root - INFO - Step 24440: lr=1.00E-05, loss= 1.1786 (max= 1.6403), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:23:44,057 - root - INFO - Step 24440: lr=1.00E-05, loss= 1.1786 (max= 1.6403), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:23:59,990 - root - INFO - Step 24450: lr=1.00E-05, loss= 1.1755 (max= 1.5874), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:59,990 - root - INFO - Step 24450: lr=1.00E-05, loss= 1.1755 (max= 1.5874), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 21:23:59,990 - root - INFO - Step 24450: lr=1.00E-05, loss= 1.1755 (max= 1.5874), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:59,990 - root - INFO - Step 24450: lr=1.00E-05, loss= 1.1755 (max= 1.5874), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:59,990 - root - INFO - Step 24450: lr=1.00E-05, loss= 1.1755 (max= 1.5874), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:59,990 - root - INFO - Step 24450: lr=1.00E-05, loss= 1.1755 (max= 1.5874), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:59,990 - root - INFO - Step 24450: lr=1.00E-05, loss= 1.1755 (max= 1.5874), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:23:59,990 - root - INFO - Step 24450: lr=1.00E-05, loss= 1.1755 (max= 1.5874), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:24:15,921 - root - INFO - Step 24460: lr=1.00E-05, loss= 1.1715 (max= 1.6412), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:15,922 - root - INFO - Step 24460: lr=1.00E-05, loss= 1.1715 (max= 1.6412), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:15,922 - root - INFO - Step 24460: lr=1.00E-05, loss= 1.1715 (max= 1.6412), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:15,922 - root - INFO - Step 24460: lr=1.00E-05, loss= 1.1715 (max= 1.6412), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:15,922 - root - INFO - Step 24460: lr=1.00E-05, loss= 1.1715 (max= 1.6412), tps=20572, mfu=42.86%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:15,922 - root - INFO - Step 24460: lr=1.00E-05, loss= 1.1715 (max= 1.6412), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:15,922 - root - INFO - Step 24460: lr=1.00E-05, loss= 1.1715 (max= 1.6412), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:15,922 - root - INFO - Step 24460: lr=1.00E-05, loss= 1.1715 (max= 1.6412), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:31,921 - root - INFO - Step 24470: lr=1.00E-05, loss= 1.1824 (max= 1.5982), tps=20485, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:31,921 - root - INFO - Step 24470: lr=1.00E-05, loss= 1.1824 (max= 1.5982), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:31,921 - root - INFO - Step 24470: lr=1.00E-05, loss= 1.1824 (max= 1.5982), tps=20485, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:31,921 - root - INFO - Step 24470: lr=1.00E-05, loss= 1.1824 (max= 1.5982), tps=20485, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:31,921 - root - INFO - Step 24470: lr=1.00E-05, loss= 1.1824 (max= 1.5982), tps=20485, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:31,921 - root - INFO - Step 24470: lr=1.00E-05, loss= 1.1824 (max= 1.5982), tps=20485, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:31,921 - root - INFO - Step 24470: lr=1.00E-05, loss= 1.1824 (max= 1.5982), tps=20485, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:31,921 - root - INFO - Step 24470: lr=1.00E-05, 
loss= 1.1824 (max= 1.5982), tps=20485, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:47,901 - root - INFO - Step 24480: lr=1.00E-05, loss= 1.1843 (max= 1.6491), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:47,901 - root - INFO - Step 24480: lr=1.00E-05, loss= 1.1843 (max= 1.6491), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:47,901 - root - INFO - Step 24480: lr=1.00E-05, loss= 1.1843 (max= 1.6491), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:47,901 - root - INFO - Step 24480: lr=1.00E-05, loss= 1.1843 (max= 1.6491), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:47,901 - root - INFO - Step 24480: lr=1.00E-05, loss= 1.1843 (max= 1.6491), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:47,901 - root - INFO - Step 24480: lr=1.00E-05, loss= 1.1843 (max= 1.6491), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:47,901 - root - INFO - Step 24480: lr=1.00E-05, loss= 1.1843 (max= 1.6491), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:24:47,901 - root - INFO - Step 24480: lr=1.00E-05, loss= 1.1843 (max= 1.6491), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:03,834 - root - INFO - Step 24490: lr=1.00E-05, loss= 1.1959 (max= 1.6008), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:03,835 - root - INFO - Step 24490: lr=1.00E-05, loss= 1.1959 (max= 1.6008), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
21:25:03,835 - root - INFO - Step 24490: lr=1.00E-05, loss= 1.1959 (max= 1.6008), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:03,835 - root - INFO - Step 24490: lr=1.00E-05, loss= 1.1959 (max= 1.6008), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:03,835 - root - INFO - Step 24490: lr=1.00E-05, loss= 1.1959 (max= 1.6008), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:03,835 - root - INFO - Step 24490: lr=1.00E-05, loss= 1.1959 (max= 1.6008), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:03,835 - root - INFO - Step 24490: lr=1.00E-05, loss= 1.1959 (max= 1.6008), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:03,835 - root - INFO - Step 24490: lr=1.00E-05, loss= 1.1959 (max= 1.6008), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:19,722 - root - INFO - Step 24500: lr=1.00E-05, loss= 1.1687 (max= 1.5852), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:25:19,722 - root - INFO - Step 24500: lr=1.00E-05, loss= 1.1687 (max= 1.5852), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:25:19,722 - root - INFO - Step 24500: lr=1.00E-05, loss= 1.1687 (max= 1.5852), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:25:19,722 - root - INFO - Step 24500: lr=1.00E-05, loss= 1.1687 (max= 1.5852), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:25:19,722 - root - INFO - Step 24500: lr=1.00E-05, loss= 1.1687 (max= 1.5852), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:25:19,722 - root - INFO - Step 24500: lr=1.00E-05, loss= 1.1687 (max= 1.5852), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:25:19,722 - root - INFO - Step 24500: lr=1.00E-05, loss= 1.1687 (max= 1.5852), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:25:19,723 - root - INFO - Step 24500: lr=1.00E-05, loss= 1.1687 (max= 1.5852), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:25:35,605 - root - INFO - Step 24510: lr=1.00E-05, loss= 1.1638 (max= 1.6435), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:35,605 - root - INFO - Step 24510: lr=1.00E-05, loss= 1.1638 (max= 1.6435), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:35,605 - root - INFO - Step 24510: lr=1.00E-05, loss= 1.1638 (max= 1.6435), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:35,605 - root - INFO - Step 24510: lr=1.00E-05, loss= 1.1638 (max= 1.6435), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:35,605 - root - INFO - Step 24510: lr=1.00E-05, loss= 1.1638 (max= 1.6435), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:35,605 - root - INFO - Step 24510: lr=1.00E-05, loss= 1.1638 (max= 1.6435), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:35,606 - root - INFO - Step 24510: lr=1.00E-05, loss= 1.1638 (max= 1.6435), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:35,606 - root - INFO - Step 24510: lr=1.00E-05, loss= 1.1638 (max= 1.6435), 
tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:51,557 - root - INFO - Step 24520: lr=1.00E-05, loss= 1.1855 (max= 1.5070), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:51,557 - root - INFO - Step 24520: lr=1.00E-05, loss= 1.1855 (max= 1.5070), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:51,557 - root - INFO - Step 24520: lr=1.00E-05, loss= 1.1855 (max= 1.5070), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:51,557 - root - INFO - Step 24520: lr=1.00E-05, loss= 1.1855 (max= 1.5070), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:51,558 - root - INFO - Step 24520: lr=1.00E-05, loss= 1.1855 (max= 1.5070), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:51,558 - root - INFO - Step 24520: lr=1.00E-05, loss= 1.1855 (max= 1.5070), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:51,558 - root - INFO - Step 24520: lr=1.00E-05, loss= 1.1855 (max= 1.5070), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:25:51,558 - root - INFO - Step 24520: lr=1.00E-05, loss= 1.1855 (max= 1.5070), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:07,497 - root - INFO - Step 24530: lr=1.00E-05, loss= 1.1712 (max= 1.5511), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:07,497 - root - INFO - Step 24530: lr=1.00E-05, loss= 1.1712 (max= 1.5511), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:07,497 - root - INFO - Step 
24530: lr=1.00E-05, loss= 1.1712 (max= 1.5511), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:07,497 - root - INFO - Step 24530: lr=1.00E-05, loss= 1.1712 (max= 1.5511), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:07,497 - root - INFO - Step 24530: lr=1.00E-05, loss= 1.1712 (max= 1.5511), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:07,497 - root - INFO - Step 24530: lr=1.00E-05, loss= 1.1712 (max= 1.5511), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:07,497 - root - INFO - Step 24530: lr=1.00E-05, loss= 1.1712 (max= 1.5511), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:07,498 - root - INFO - Step 24530: lr=1.00E-05, loss= 1.1712 (max= 1.5511), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:23,453 - root - INFO - Step 24540: lr=1.00E-05, loss= 1.1999 (max= 1.7639), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:23,453 - root - INFO - Step 24540: lr=1.00E-05, loss= 1.1999 (max= 1.7639), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:23,453 - root - INFO - Step 24540: lr=1.00E-05, loss= 1.1999 (max= 1.7639), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:23,453 - root - INFO - Step 24540: lr=1.00E-05, loss= 1.1999 (max= 1.7639), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:23,453 - root - INFO - Step 24540: lr=1.00E-05, loss= 1.1999 (max= 1.7639), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 21:26:23,453 - root - INFO - Step 24540: lr=1.00E-05, loss= 1.1999 (max= 1.7639), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:23,453 - root - INFO - Step 24540: lr=1.00E-05, loss= 1.1999 (max= 1.7639), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:23,453 - root - INFO - Step 24540: lr=1.00E-05, loss= 1.1999 (max= 1.7639), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:26:39,434 - root - INFO - Step 24550: lr=1.00E-05, loss= 1.1434 (max= 1.5628), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:26:39,434 - root - INFO - Step 24550: lr=1.00E-05, loss= 1.1434 (max= 1.5628), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:26:39,434 - root - INFO - Step 24550: lr=1.00E-05, loss= 1.1434 (max= 1.5628), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:26:39,434 - root - INFO - Step 24550: lr=1.00E-05, loss= 1.1434 (max= 1.5628), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:26:39,434 - root - INFO - Step 24550: lr=1.00E-05, loss= 1.1434 (max= 1.5628), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:26:39,434 - root - INFO - Step 24550: lr=1.00E-05, loss= 1.1434 (max= 1.5628), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:26:39,434 - root - INFO - Step 24550: lr=1.00E-05, loss= 1.1434 (max= 1.5628), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:26:39,434 - root - INFO - Step 24550: lr=1.00E-05, loss= 1.1434 (max= 1.5628), tps=20508, mfu=42.73%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:26:55,369 - root - INFO - Step 24560: lr=1.00E-05, loss= 1.1930 (max= 1.6049), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.01s (max=0.04s, 2.67%) +2025-10-24 21:26:55,369 - root - INFO - Step 24560: lr=1.00E-05, loss= 1.1930 (max= 1.6049), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.01s (max=0.04s, 2.67%) +2025-10-24 21:26:55,369 - root - INFO - Step 24560: lr=1.00E-05, loss= 1.1930 (max= 1.6049), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.01s (max=0.04s, 2.67%) +2025-10-24 21:26:55,369 - root - INFO - Step 24560: lr=1.00E-05, loss= 1.1930 (max= 1.6049), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.01s (max=0.04s, 2.67%) +2025-10-24 21:26:55,369 - root - INFO - Step 24560: lr=1.00E-05, loss= 1.1930 (max= 1.6049), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.01s (max=0.04s, 2.67%) +2025-10-24 21:26:55,369 - root - INFO - Step 24560: lr=1.00E-05, loss= 1.1930 (max= 1.6049), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.01s (max=0.04s, 2.67%) +2025-10-24 21:26:55,369 - root - INFO - Step 24560: lr=1.00E-05, loss= 1.1930 (max= 1.6049), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.01s (max=0.04s, 2.67%) +2025-10-24 21:26:55,369 - root - INFO - Step 24560: lr=1.00E-05, loss= 1.1930 (max= 1.6049), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.01s (max=0.04s, 2.67%) +2025-10-24 21:27:11,284 - root - INFO - Step 24570: lr=1.00E-05, loss= 1.2091 (max= 1.5861), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:11,284 - root - INFO - Step 24570: lr=1.00E-05, loss= 1.2091 (max= 1.5861), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:11,284 - root - INFO - Step 24570: lr=1.00E-05, loss= 1.2091 
(max= 1.5861), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:11,284 - root - INFO - Step 24570: lr=1.00E-05, loss= 1.2091 (max= 1.5861), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:11,284 - root - INFO - Step 24570: lr=1.00E-05, loss= 1.2091 (max= 1.5861), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:11,284 - root - INFO - Step 24570: lr=1.00E-05, loss= 1.2091 (max= 1.5861), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:11,284 - root - INFO - Step 24570: lr=1.00E-05, loss= 1.2091 (max= 1.5861), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:11,284 - root - INFO - Step 24570: lr=1.00E-05, loss= 1.2091 (max= 1.5861), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:27,218 - root - INFO - Step 24580: lr=1.00E-05, loss= 1.1606 (max= 1.6303), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:27,219 - root - INFO - Step 24580: lr=1.00E-05, loss= 1.1606 (max= 1.6303), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:27,219 - root - INFO - Step 24580: lr=1.00E-05, loss= 1.1606 (max= 1.6303), tps=20569, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:27,219 - root - INFO - Step 24580: lr=1.00E-05, loss= 1.1606 (max= 1.6303), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:27,219 - root - INFO - Step 24580: lr=1.00E-05, loss= 1.1606 (max= 1.6303), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:27,219 - root 
- INFO - Step 24580: lr=1.00E-05, loss= 1.1606 (max= 1.6303), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:27,219 - root - INFO - Step 24580: lr=1.00E-05, loss= 1.1606 (max= 1.6303), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:27,219 - root - INFO - Step 24580: lr=1.00E-05, loss= 1.1606 (max= 1.6303), tps=20569, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:43,143 - root - INFO - Step 24590: lr=1.00E-05, loss= 1.1909 (max= 1.6510), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:43,144 - root - INFO - Step 24590: lr=1.00E-05, loss= 1.1909 (max= 1.6510), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:43,144 - root - INFO - Step 24590: lr=1.00E-05, loss= 1.1909 (max= 1.6510), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:43,144 - root - INFO - Step 24590: lr=1.00E-05, loss= 1.1909 (max= 1.6510), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:43,144 - root - INFO - Step 24590: lr=1.00E-05, loss= 1.1909 (max= 1.6510), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:43,144 - root - INFO - Step 24590: lr=1.00E-05, loss= 1.1909 (max= 1.6510), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:43,144 - root - INFO - Step 24590: lr=1.00E-05, loss= 1.1909 (max= 1.6510), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:43,144 - root - INFO - Step 24590: lr=1.00E-05, loss= 1.1909 (max= 1.6510), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 21:27:59,088 - root - INFO - Step 24600: lr=1.00E-05, loss= 1.1942 (max= 1.7487), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:59,088 - root - INFO - Step 24600: lr=1.00E-05, loss= 1.1942 (max= 1.7487), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:59,088 - root - INFO - Step 24600: lr=1.00E-05, loss= 1.1942 (max= 1.7487), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:59,088 - root - INFO - Step 24600: lr=1.00E-05, loss= 1.1942 (max= 1.7487), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:59,088 - root - INFO - Step 24600: lr=1.00E-05, loss= 1.1942 (max= 1.7487), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:59,088 - root - INFO - Step 24600: lr=1.00E-05, loss= 1.1942 (max= 1.7487), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:59,088 - root - INFO - Step 24600: lr=1.00E-05, loss= 1.1942 (max= 1.7487), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:27:59,088 - root - INFO - Step 24600: lr=1.00E-05, loss= 1.1942 (max= 1.7487), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:15,059 - root - INFO - Step 24610: lr=1.00E-05, loss= 1.1928 (max= 1.7287), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:15,060 - root - INFO - Step 24610: lr=1.00E-05, loss= 1.1928 (max= 1.7287), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:15,060 - root - INFO - Step 24610: lr=1.00E-05, loss= 1.1928 (max= 1.7287), tps=20521, mfu=42.76%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:15,060 - root - INFO - Step 24610: lr=1.00E-05, loss= 1.1928 (max= 1.7287), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:15,060 - root - INFO - Step 24610: lr=1.00E-05, loss= 1.1928 (max= 1.7287), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:15,060 - root - INFO - Step 24610: lr=1.00E-05, loss= 1.1928 (max= 1.7287), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:15,060 - root - INFO - Step 24610: lr=1.00E-05, loss= 1.1928 (max= 1.7287), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:15,060 - root - INFO - Step 24610: lr=1.00E-05, loss= 1.1928 (max= 1.7287), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:31,000 - root - INFO - Step 24620: lr=1.00E-05, loss= 1.1660 (max= 1.6164), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:31,001 - root - INFO - Step 24620: lr=1.00E-05, loss= 1.1660 (max= 1.6164), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:31,001 - root - INFO - Step 24620: lr=1.00E-05, loss= 1.1660 (max= 1.6164), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:31,001 - root - INFO - Step 24620: lr=1.00E-05, loss= 1.1660 (max= 1.6164), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:31,001 - root - INFO - Step 24620: lr=1.00E-05, loss= 1.1660 (max= 1.6164), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:31,001 - root - INFO - Step 24620: lr=1.00E-05, 
loss= 1.1660 (max= 1.6164), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:31,001 - root - INFO - Step 24620: lr=1.00E-05, loss= 1.1660 (max= 1.6164), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:31,001 - root - INFO - Step 24620: lr=1.00E-05, loss= 1.1660 (max= 1.6164), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:46,977 - root - INFO - Step 24630: lr=1.00E-05, loss= 1.1732 (max= 1.6110), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:46,977 - root - INFO - Step 24630: lr=1.00E-05, loss= 1.1732 (max= 1.6110), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:46,977 - root - INFO - Step 24630: lr=1.00E-05, loss= 1.1732 (max= 1.6110), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:46,977 - root - INFO - Step 24630: lr=1.00E-05, loss= 1.1732 (max= 1.6110), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:46,977 - root - INFO - Step 24630: lr=1.00E-05, loss= 1.1732 (max= 1.6110), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:46,977 - root - INFO - Step 24630: lr=1.00E-05, loss= 1.1732 (max= 1.6110), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:46,977 - root - INFO - Step 24630: lr=1.00E-05, loss= 1.1732 (max= 1.6110), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:28:46,977 - root - INFO - Step 24630: lr=1.00E-05, loss= 1.1732 (max= 1.6110), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
21:29:02,900 - root - INFO - Step 24640: lr=1.00E-05, loss= 1.1748 (max= 1.5027), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:02,900 - root - INFO - Step 24640: lr=1.00E-05, loss= 1.1748 (max= 1.5027), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:02,900 - root - INFO - Step 24640: lr=1.00E-05, loss= 1.1748 (max= 1.5027), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:02,900 - root - INFO - Step 24640: lr=1.00E-05, loss= 1.1748 (max= 1.5027), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:02,900 - root - INFO - Step 24640: lr=1.00E-05, loss= 1.1748 (max= 1.5027), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:02,900 - root - INFO - Step 24640: lr=1.00E-05, loss= 1.1748 (max= 1.5027), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:02,900 - root - INFO - Step 24640: lr=1.00E-05, loss= 1.1748 (max= 1.5027), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:02,900 - root - INFO - Step 24640: lr=1.00E-05, loss= 1.1748 (max= 1.5027), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:18,832 - root - INFO - Step 24650: lr=1.00E-05, loss= 1.1448 (max= 1.5277), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:18,832 - root - INFO - Step 24650: lr=1.00E-05, loss= 1.1448 (max= 1.5277), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:18,832 - root - INFO - Step 24650: lr=1.00E-05, loss= 1.1448 (max= 1.5277), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:18,832 - root - INFO - Step 24650: lr=1.00E-05, loss= 1.1448 (max= 1.5277), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:18,832 - root - INFO - Step 24650: lr=1.00E-05, loss= 1.1448 (max= 1.5277), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:18,832 - root - INFO - Step 24650: lr=1.00E-05, loss= 1.1448 (max= 1.5277), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:18,832 - root - INFO - Step 24650: lr=1.00E-05, loss= 1.1448 (max= 1.5277), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:18,832 - root - INFO - Step 24650: lr=1.00E-05, loss= 1.1448 (max= 1.5277), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:34,764 - root - INFO - Step 24660: lr=1.00E-05, loss= 1.1911 (max= 1.5485), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:34,764 - root - INFO - Step 24660: lr=1.00E-05, loss= 1.1911 (max= 1.5485), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:34,764 - root - INFO - Step 24660: lr=1.00E-05, loss= 1.1911 (max= 1.5485), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:34,764 - root - INFO - Step 24660: lr=1.00E-05, loss= 1.1911 (max= 1.5485), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:34,764 - root - INFO - Step 24660: lr=1.00E-05, loss= 1.1911 (max= 1.5485), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:34,764 - root - INFO - Step 24660: lr=1.00E-05, loss= 1.1911 (max= 1.5485), 
tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:34,764 - root - INFO - Step 24660: lr=1.00E-05, loss= 1.1911 (max= 1.5485), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:34,764 - root - INFO - Step 24660: lr=1.00E-05, loss= 1.1911 (max= 1.5485), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:50,710 - root - INFO - Step 24670: lr=1.00E-05, loss= 1.1515 (max= 1.5829), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:50,710 - root - INFO - Step 24670: lr=1.00E-05, loss= 1.1515 (max= 1.5829), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:50,710 - root - INFO - Step 24670: lr=1.00E-05, loss= 1.1515 (max= 1.5829), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:50,710 - root - INFO - Step 24670: lr=1.00E-05, loss= 1.1515 (max= 1.5829), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:50,710 - root - INFO - Step 24670: lr=1.00E-05, loss= 1.1515 (max= 1.5829), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:50,710 - root - INFO - Step 24670: lr=1.00E-05, loss= 1.1515 (max= 1.5829), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:50,710 - root - INFO - Step 24670: lr=1.00E-05, loss= 1.1515 (max= 1.5829), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:29:50,710 - root - INFO - Step 24670: lr=1.00E-05, loss= 1.1515 (max= 1.5829), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:06,657 - root - INFO - Step 
24680: lr=1.00E-05, loss= 1.1846 (max= 1.6710), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:06,657 - root - INFO - Step 24680: lr=1.00E-05, loss= 1.1846 (max= 1.6710), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:06,657 - root - INFO - Step 24680: lr=1.00E-05, loss= 1.1846 (max= 1.6710), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:06,657 - root - INFO - Step 24680: lr=1.00E-05, loss= 1.1846 (max= 1.6710), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:06,657 - root - INFO - Step 24680: lr=1.00E-05, loss= 1.1846 (max= 1.6710), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:06,657 - root - INFO - Step 24680: lr=1.00E-05, loss= 1.1846 (max= 1.6710), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:06,657 - root - INFO - Step 24680: lr=1.00E-05, loss= 1.1846 (max= 1.6710), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:06,658 - root - INFO - Step 24680: lr=1.00E-05, loss= 1.1846 (max= 1.6710), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:22,566 - root - INFO - Step 24690: lr=1.00E-05, loss= 1.2173 (max= 1.6852), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:22,566 - root - INFO - Step 24690: lr=1.00E-05, loss= 1.2173 (max= 1.6852), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:22,566 - root - INFO - Step 24690: lr=1.00E-05, loss= 1.2173 (max= 1.6852), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 21:30:22,566 - root - INFO - Step 24690: lr=1.00E-05, loss= 1.2173 (max= 1.6852), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:22,567 - root - INFO - Step 24690: lr=1.00E-05, loss= 1.2173 (max= 1.6852), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:22,567 - root - INFO - Step 24690: lr=1.00E-05, loss= 1.2173 (max= 1.6852), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:22,567 - root - INFO - Step 24690: lr=1.00E-05, loss= 1.2173 (max= 1.6852), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:22,567 - root - INFO - Step 24690: lr=1.00E-05, loss= 1.2173 (max= 1.6852), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:38,486 - root - INFO - Step 24700: lr=1.00E-05, loss= 1.1818 (max= 1.6808), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:38,486 - root - INFO - Step 24700: lr=1.00E-05, loss= 1.1818 (max= 1.6808), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:38,487 - root - INFO - Step 24700: lr=1.00E-05, loss= 1.1818 (max= 1.6808), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:38,487 - root - INFO - Step 24700: lr=1.00E-05, loss= 1.1818 (max= 1.6808), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:38,487 - root - INFO - Step 24700: lr=1.00E-05, loss= 1.1818 (max= 1.6808), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:38,487 - root - INFO - Step 24700: lr=1.00E-05, loss= 1.1818 (max= 1.6808), tps=20587, mfu=42.89%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:38,487 - root - INFO - Step 24700: lr=1.00E-05, loss= 1.1818 (max= 1.6808), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:38,487 - root - INFO - Step 24700: lr=1.00E-05, loss= 1.1818 (max= 1.6808), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:54,421 - root - INFO - Step 24710: lr=1.00E-05, loss= 1.1901 (max= 1.5486), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:54,421 - root - INFO - Step 24710: lr=1.00E-05, loss= 1.1901 (max= 1.5486), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:54,421 - root - INFO - Step 24710: lr=1.00E-05, loss= 1.1901 (max= 1.5486), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:54,421 - root - INFO - Step 24710: lr=1.00E-05, loss= 1.1901 (max= 1.5486), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:54,421 - root - INFO - Step 24710: lr=1.00E-05, loss= 1.1901 (max= 1.5486), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:54,421 - root - INFO - Step 24710: lr=1.00E-05, loss= 1.1901 (max= 1.5486), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:54,422 - root - INFO - Step 24710: lr=1.00E-05, loss= 1.1901 (max= 1.5486), tps=20569, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:30:54,422 - root - INFO - Step 24710: lr=1.00E-05, loss= 1.1901 (max= 1.5486), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:10,400 - root - INFO - Step 24720: lr=1.00E-05, loss= 1.1702 
(max= 1.6276), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:10,400 - root - INFO - Step 24720: lr=1.00E-05, loss= 1.1702 (max= 1.6276), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:10,400 - root - INFO - Step 24720: lr=1.00E-05, loss= 1.1702 (max= 1.6276), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:10,400 - root - INFO - Step 24720: lr=1.00E-05, loss= 1.1702 (max= 1.6276), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:10,400 - root - INFO - Step 24720: lr=1.00E-05, loss= 1.1702 (max= 1.6276), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:10,400 - root - INFO - Step 24720: lr=1.00E-05, loss= 1.1702 (max= 1.6276), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:10,400 - root - INFO - Step 24720: lr=1.00E-05, loss= 1.1702 (max= 1.6276), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:10,401 - root - INFO - Step 24720: lr=1.00E-05, loss= 1.1702 (max= 1.6276), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:26,341 - root - INFO - Step 24730: lr=1.00E-05, loss= 1.1914 (max= 1.5387), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:26,341 - root - INFO - Step 24730: lr=1.00E-05, loss= 1.1914 (max= 1.5387), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:26,342 - root - INFO - Step 24730: lr=1.00E-05, loss= 1.1914 (max= 1.5387), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:26,342 - root 
- INFO - Step 24730: lr=1.00E-05, loss= 1.1914 (max= 1.5387), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:26,342 - root - INFO - Step 24730: lr=1.00E-05, loss= 1.1914 (max= 1.5387), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:26,342 - root - INFO - Step 24730: lr=1.00E-05, loss= 1.1914 (max= 1.5387), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:26,342 - root - INFO - Step 24730: lr=1.00E-05, loss= 1.1914 (max= 1.5387), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:26,342 - root - INFO - Step 24730: lr=1.00E-05, loss= 1.1914 (max= 1.5387), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:42,267 - root - INFO - Step 24740: lr=1.00E-05, loss= 1.1720 (max= 2.0128), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:42,268 - root - INFO - Step 24740: lr=1.00E-05, loss= 1.1720 (max= 2.0128), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:42,268 - root - INFO - Step 24740: lr=1.00E-05, loss= 1.1720 (max= 2.0128), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:42,268 - root - INFO - Step 24740: lr=1.00E-05, loss= 1.1720 (max= 2.0128), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:42,268 - root - INFO - Step 24740: lr=1.00E-05, loss= 1.1720 (max= 2.0128), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:42,268 - root - INFO - Step 24740: lr=1.00E-05, loss= 1.1720 (max= 2.0128), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 21:31:42,268 - root - INFO - Step 24740: lr=1.00E-05, loss= 1.1720 (max= 2.0128), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:42,268 - root - INFO - Step 24740: lr=1.00E-05, loss= 1.1720 (max= 2.0128), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:58,171 - root - INFO - Step 24750: lr=1.00E-05, loss= 1.1865 (max= 1.6213), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:58,171 - root - INFO - Step 24750: lr=1.00E-05, loss= 1.1865 (max= 1.6213), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:58,171 - root - INFO - Step 24750: lr=1.00E-05, loss= 1.1865 (max= 1.6213), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:58,171 - root - INFO - Step 24750: lr=1.00E-05, loss= 1.1865 (max= 1.6213), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:58,171 - root - INFO - Step 24750: lr=1.00E-05, loss= 1.1865 (max= 1.6213), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:58,171 - root - INFO - Step 24750: lr=1.00E-05, loss= 1.1865 (max= 1.6213), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:58,171 - root - INFO - Step 24750: lr=1.00E-05, loss= 1.1865 (max= 1.6213), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:31:58,171 - root - INFO - Step 24750: lr=1.00E-05, loss= 1.1865 (max= 1.6213), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:32:14,099 - root - INFO - Step 24760: lr=1.00E-05, loss= 1.2156 (max= 1.6076), tps=20577, mfu=42.87%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:32:14,099 - root - INFO - Step 24760: lr=1.00E-05, loss= 1.2156 (max= 1.6076), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:32:14,099 - root - INFO - Step 24760: lr=1.00E-05, loss= 1.2156 (max= 1.6076), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:32:14,099 - root - INFO - Step 24760: lr=1.00E-05, loss= 1.2156 (max= 1.6076), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:32:14,099 - root - INFO - Step 24760: lr=1.00E-05, loss= 1.2156 (max= 1.6076), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:32:14,099 - root - INFO - Step 24760: lr=1.00E-05, loss= 1.2156 (max= 1.6076), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:32:14,099 - root - INFO - Step 24760: lr=1.00E-05, loss= 1.2156 (max= 1.6076), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:32:14,099 - root - INFO - Step 24760: lr=1.00E-05, loss= 1.2156 (max= 1.6076), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:32:29,961 - root - INFO - Step 24770: lr=1.00E-05, loss= 1.1924 (max= 1.7624), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:29,961 - root - INFO - Step 24770: lr=1.00E-05, loss= 1.1924 (max= 1.7624), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:29,961 - root - INFO - Step 24770: lr=1.00E-05, loss= 1.1924 (max= 1.7624), tps=20663, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:29,961 - root - INFO - Step 24770: lr=1.00E-05, 
loss= 1.1924 (max= 1.7624), tps=20663, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:29,961 - root - INFO - Step 24770: lr=1.00E-05, loss= 1.1924 (max= 1.7624), tps=20663, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:29,961 - root - INFO - Step 24770: lr=1.00E-05, loss= 1.1924 (max= 1.7624), tps=20663, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:29,961 - root - INFO - Step 24770: lr=1.00E-05, loss= 1.1924 (max= 1.7624), tps=20663, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:29,961 - root - INFO - Step 24770: lr=1.00E-05, loss= 1.1924 (max= 1.7624), tps=20663, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:45,894 - root - INFO - Step 24780: lr=1.00E-05, loss= 1.1531 (max= 1.6327), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:45,894 - root - INFO - Step 24780: lr=1.00E-05, loss= 1.1531 (max= 1.6327), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:45,895 - root - INFO - Step 24780: lr=1.00E-05, loss= 1.1531 (max= 1.6327), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:45,895 - root - INFO - Step 24780: lr=1.00E-05, loss= 1.1531 (max= 1.6327), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:45,895 - root - INFO - Step 24780: lr=1.00E-05, loss= 1.1531 (max= 1.6327), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:45,895 - root - INFO - Step 24780: lr=1.00E-05, loss= 1.1531 (max= 1.6327), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
21:32:45,895 - root - INFO - Step 24780: lr=1.00E-05, loss= 1.1531 (max= 1.6327), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:32:45,895 - root - INFO - Step 24780: lr=1.00E-05, loss= 1.1531 (max= 1.6327), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:01,812 - root - INFO - Step 24790: lr=1.00E-05, loss= 1.1882 (max= 1.6002), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:01,812 - root - INFO - Step 24790: lr=1.00E-05, loss= 1.1882 (max= 1.6002), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:01,812 - root - INFO - Step 24790: lr=1.00E-05, loss= 1.1882 (max= 1.6002), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:01,812 - root - INFO - Step 24790: lr=1.00E-05, loss= 1.1882 (max= 1.6002), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:01,812 - root - INFO - Step 24790: lr=1.00E-05, loss= 1.1882 (max= 1.6002), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:01,812 - root - INFO - Step 24790: lr=1.00E-05, loss= 1.1882 (max= 1.6002), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:01,812 - root - INFO - Step 24790: lr=1.00E-05, loss= 1.1882 (max= 1.6002), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:01,812 - root - INFO - Step 24790: lr=1.00E-05, loss= 1.1882 (max= 1.6002), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:17,742 - root - INFO - Step 24800: lr=1.00E-05, loss= 1.1761 (max= 1.5841), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:17,743 - root - INFO - Step 24800: lr=1.00E-05, loss= 1.1761 (max= 1.5841), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:17,743 - root - INFO - Step 24800: lr=1.00E-05, loss= 1.1761 (max= 1.5841), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:17,743 - root - INFO - Step 24800: lr=1.00E-05, loss= 1.1761 (max= 1.5841), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:17,743 - root - INFO - Step 24800: lr=1.00E-05, loss= 1.1761 (max= 1.5841), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:17,743 - root - INFO - Step 24800: lr=1.00E-05, loss= 1.1761 (max= 1.5841), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:17,743 - root - INFO - Step 24800: lr=1.00E-05, loss= 1.1761 (max= 1.5841), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:17,743 - root - INFO - Step 24800: lr=1.00E-05, loss= 1.1761 (max= 1.5841), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:33,682 - root - INFO - Step 24810: lr=1.00E-05, loss= 1.1946 (max= 1.5163), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:33,682 - root - INFO - Step 24810: lr=1.00E-05, loss= 1.1946 (max= 1.5163), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:33,682 - root - INFO - Step 24810: lr=1.00E-05, loss= 1.1946 (max= 1.5163), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:33,682 - root - INFO - Step 24810: lr=1.00E-05, loss= 1.1946 (max= 1.5163), 
tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:33,682 - root - INFO - Step 24810: lr=1.00E-05, loss= 1.1946 (max= 1.5163), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:33,682 - root - INFO - Step 24810: lr=1.00E-05, loss= 1.1946 (max= 1.5163), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:33,682 - root - INFO - Step 24810: lr=1.00E-05, loss= 1.1946 (max= 1.5163), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:33,682 - root - INFO - Step 24810: lr=1.00E-05, loss= 1.1946 (max= 1.5163), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:33:49,616 - root - INFO - Step 24820: lr=1.00E-05, loss= 1.1649 (max= 1.5690), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:49,616 - root - INFO - Step 24820: lr=1.00E-05, loss= 1.1649 (max= 1.5690), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:49,616 - root - INFO - Step 24820: lr=1.00E-05, loss= 1.1649 (max= 1.5690), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:49,617 - root - INFO - Step 24820: lr=1.00E-05, loss= 1.1649 (max= 1.5690), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:49,617 - root - INFO - Step 24820: lr=1.00E-05, loss= 1.1649 (max= 1.5690), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:49,617 - root - INFO - Step 24820: lr=1.00E-05, loss= 1.1649 (max= 1.5690), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:49,617 - root - INFO - Step 
24820: lr=1.00E-05, loss= 1.1649 (max= 1.5690), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:33:49,617 - root - INFO - Step 24820: lr=1.00E-05, loss= 1.1649 (max= 1.5690), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:34:05,525 - root - INFO - Step 24830: lr=1.00E-05, loss= 1.2039 (max= 1.7589), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:05,525 - root - INFO - Step 24830: lr=1.00E-05, loss= 1.2039 (max= 1.7589), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:05,525 - root - INFO - Step 24830: lr=1.00E-05, loss= 1.2039 (max= 1.7589), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:05,525 - root - INFO - Step 24830: lr=1.00E-05, loss= 1.2039 (max= 1.7589), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:05,525 - root - INFO - Step 24830: lr=1.00E-05, loss= 1.2039 (max= 1.7589), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:05,525 - root - INFO - Step 24830: lr=1.00E-05, loss= 1.2039 (max= 1.7589), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:05,525 - root - INFO - Step 24830: lr=1.00E-05, loss= 1.2039 (max= 1.7589), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:05,525 - root - INFO - Step 24830: lr=1.00E-05, loss= 1.2039 (max= 1.7589), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:21,436 - root - INFO - Step 24840: lr=1.00E-05, loss= 1.2077 (max= 1.4586), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 21:34:21,436 - root - INFO - Step 24840: lr=1.00E-05, loss= 1.2077 (max= 1.4586), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:34:21,436 - root - INFO - Step 24840: lr=1.00E-05, loss= 1.2077 (max= 1.4586), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:34:21,436 - root - INFO - Step 24840: lr=1.00E-05, loss= 1.2077 (max= 1.4586), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:34:21,436 - root - INFO - Step 24840: lr=1.00E-05, loss= 1.2077 (max= 1.4586), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:34:21,436 - root - INFO - Step 24840: lr=1.00E-05, loss= 1.2077 (max= 1.4586), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:34:21,436 - root - INFO - Step 24840: lr=1.00E-05, loss= 1.2077 (max= 1.4586), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:34:21,436 - root - INFO - Step 24840: lr=1.00E-05, loss= 1.2077 (max= 1.4586), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:34:37,377 - root - INFO - Step 24850: lr=1.00E-05, loss= 1.1850 (max= 1.5523), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:37,377 - root - INFO - Step 24850: lr=1.00E-05, loss= 1.1850 (max= 1.5523), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:37,377 - root - INFO - Step 24850: lr=1.00E-05, loss= 1.1850 (max= 1.5523), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:37,377 - root - INFO - Step 24850: lr=1.00E-05, loss= 1.1850 (max= 1.5523), tps=20560, mfu=42.84%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:37,377 - root - INFO - Step 24850: lr=1.00E-05, loss= 1.1850 (max= 1.5523), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:37,377 - root - INFO - Step 24850: lr=1.00E-05, loss= 1.1850 (max= 1.5523), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:37,377 - root - INFO - Step 24850: lr=1.00E-05, loss= 1.1850 (max= 1.5523), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:37,378 - root - INFO - Step 24850: lr=1.00E-05, loss= 1.1850 (max= 1.5523), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:53,355 - root - INFO - Step 24860: lr=1.00E-05, loss= 1.1849 (max= 1.8265), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:53,355 - root - INFO - Step 24860: lr=1.00E-05, loss= 1.1849 (max= 1.8265), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:53,355 - root - INFO - Step 24860: lr=1.00E-05, loss= 1.1849 (max= 1.8265), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:53,355 - root - INFO - Step 24860: lr=1.00E-05, loss= 1.1849 (max= 1.8265), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:53,355 - root - INFO - Step 24860: lr=1.00E-05, loss= 1.1849 (max= 1.8265), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:53,355 - root - INFO - Step 24860: lr=1.00E-05, loss= 1.1849 (max= 1.8265), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:53,355 - root - INFO - Step 24860: lr=1.00E-05, loss= 1.1849 
(max= 1.8265), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:34:53,355 - root - INFO - Step 24860: lr=1.00E-05, loss= 1.1849 (max= 1.8265), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:09,271 - root - INFO - Step 24870: lr=1.00E-05, loss= 1.2144 (max= 1.6072), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:09,271 - root - INFO - Step 24870: lr=1.00E-05, loss= 1.2144 (max= 1.6072), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:09,271 - root - INFO - Step 24870: lr=1.00E-05, loss= 1.2144 (max= 1.6072), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:09,271 - root - INFO - Step 24870: lr=1.00E-05, loss= 1.2144 (max= 1.6072), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:09,271 - root - INFO - Step 24870: lr=1.00E-05, loss= 1.2144 (max= 1.6072), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:09,271 - root - INFO - Step 24870: lr=1.00E-05, loss= 1.2144 (max= 1.6072), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:09,271 - root - INFO - Step 24870: lr=1.00E-05, loss= 1.2144 (max= 1.6072), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:09,271 - root - INFO - Step 24870: lr=1.00E-05, loss= 1.2144 (max= 1.6072), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:25,182 - root - INFO - Step 24880: lr=1.00E-05, loss= 1.1881 (max= 1.5082), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:25,182 - root 
- INFO - Step 24880: lr=1.00E-05, loss= 1.1881 (max= 1.5082), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:25,183 - root - INFO - Step 24880: lr=1.00E-05, loss= 1.1881 (max= 1.5082), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:25,183 - root - INFO - Step 24880: lr=1.00E-05, loss= 1.1881 (max= 1.5082), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:25,183 - root - INFO - Step 24880: lr=1.00E-05, loss= 1.1881 (max= 1.5082), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:25,183 - root - INFO - Step 24880: lr=1.00E-05, loss= 1.1881 (max= 1.5082), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:25,183 - root - INFO - Step 24880: lr=1.00E-05, loss= 1.1881 (max= 1.5082), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:25,183 - root - INFO - Step 24880: lr=1.00E-05, loss= 1.1881 (max= 1.5082), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:41,151 - root - INFO - Step 24890: lr=1.00E-05, loss= 1.2170 (max= 1.8732), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:41,151 - root - INFO - Step 24890: lr=1.00E-05, loss= 1.2170 (max= 1.8732), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:41,151 - root - INFO - Step 24890: lr=1.00E-05, loss= 1.2170 (max= 1.8732), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:41,151 - root - INFO - Step 24890: lr=1.00E-05, loss= 1.2170 (max= 1.8732), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 21:35:41,151 - root - INFO - Step 24890: lr=1.00E-05, loss= 1.2170 (max= 1.8732), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:41,151 - root - INFO - Step 24890: lr=1.00E-05, loss= 1.2170 (max= 1.8732), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:41,151 - root - INFO - Step 24890: lr=1.00E-05, loss= 1.2170 (max= 1.8732), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:41,152 - root - INFO - Step 24890: lr=1.00E-05, loss= 1.2170 (max= 1.8732), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:35:57,089 - root - INFO - Step 24900: lr=1.00E-05, loss= 1.2086 (max= 1.5511), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:57,089 - root - INFO - Step 24900: lr=1.00E-05, loss= 1.2086 (max= 1.5511), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:57,089 - root - INFO - Step 24900: lr=1.00E-05, loss= 1.2086 (max= 1.5511), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:57,089 - root - INFO - Step 24900: lr=1.00E-05, loss= 1.2086 (max= 1.5511), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:57,089 - root - INFO - Step 24900: lr=1.00E-05, loss= 1.2086 (max= 1.5511), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:57,089 - root - INFO - Step 24900: lr=1.00E-05, loss= 1.2086 (max= 1.5511), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:57,089 - root - INFO - Step 24900: lr=1.00E-05, loss= 1.2086 (max= 1.5511), tps=20565, mfu=42.85%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:35:57,089 - root - INFO - Step 24900: lr=1.00E-05, loss= 1.2086 (max= 1.5511), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:36:13,022 - root - INFO - Step 24910: lr=1.00E-05, loss= 1.2173 (max= 1.6692), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:13,022 - root - INFO - Step 24910: lr=1.00E-05, loss= 1.2173 (max= 1.6692), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:13,022 - root - INFO - Step 24910: lr=1.00E-05, loss= 1.2173 (max= 1.6692), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:13,022 - root - INFO - Step 24910: lr=1.00E-05, loss= 1.2173 (max= 1.6692), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:13,022 - root - INFO - Step 24910: lr=1.00E-05, loss= 1.2173 (max= 1.6692), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:13,022 - root - INFO - Step 24910: lr=1.00E-05, loss= 1.2173 (max= 1.6692), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:13,022 - root - INFO - Step 24910: lr=1.00E-05, loss= 1.2173 (max= 1.6692), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:13,022 - root - INFO - Step 24910: lr=1.00E-05, loss= 1.2173 (max= 1.6692), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:28,971 - root - INFO - Step 24920: lr=1.00E-05, loss= 1.1946 (max= 1.5874), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:28,971 - root - INFO - Step 24920: lr=1.00E-05, 
loss= 1.1946 (max= 1.5874), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:28,971 - root - INFO - Step 24920: lr=1.00E-05, loss= 1.1946 (max= 1.5874), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:28,971 - root - INFO - Step 24920: lr=1.00E-05, loss= 1.1946 (max= 1.5874), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:28,971 - root - INFO - Step 24920: lr=1.00E-05, loss= 1.1946 (max= 1.5874), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:28,971 - root - INFO - Step 24920: lr=1.00E-05, loss= 1.1946 (max= 1.5874), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:28,971 - root - INFO - Step 24920: lr=1.00E-05, loss= 1.1946 (max= 1.5874), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:28,971 - root - INFO - Step 24920: lr=1.00E-05, loss= 1.1946 (max= 1.5874), tps=20549, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:36:44,880 - root - INFO - Step 24930: lr=1.00E-05, loss= 1.1988 (max= 1.5560), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:36:44,880 - root - INFO - Step 24930: lr=1.00E-05, loss= 1.1988 (max= 1.5560), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:36:44,880 - root - INFO - Step 24930: lr=1.00E-05, loss= 1.1988 (max= 1.5560), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:36:44,880 - root - INFO - Step 24930: lr=1.00E-05, loss= 1.1988 (max= 1.5560), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
21:36:44,880 - root - INFO - Step 24930: lr=1.00E-05, loss= 1.1988 (max= 1.5560), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:36:44,880 - root - INFO - Step 24930: lr=1.00E-05, loss= 1.1988 (max= 1.5560), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:36:44,880 - root - INFO - Step 24930: lr=1.00E-05, loss= 1.1988 (max= 1.5560), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:36:44,880 - root - INFO - Step 24930: lr=1.00E-05, loss= 1.1988 (max= 1.5560), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:00,852 - root - INFO - Step 24940: lr=1.00E-05, loss= 1.1906 (max= 1.6459), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:00,852 - root - INFO - Step 24940: lr=1.00E-05, loss= 1.1906 (max= 1.6459), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:00,852 - root - INFO - Step 24940: lr=1.00E-05, loss= 1.1906 (max= 1.6459), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:00,853 - root - INFO - Step 24940: lr=1.00E-05, loss= 1.1906 (max= 1.6459), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:00,853 - root - INFO - Step 24940: lr=1.00E-05, loss= 1.1906 (max= 1.6459), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:00,853 - root - INFO - Step 24940: lr=1.00E-05, loss= 1.1906 (max= 1.6459), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:00,853 - root - INFO - Step 24940: lr=1.00E-05, loss= 1.1906 (max= 1.6459), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:00,853 - root - INFO - Step 24940: lr=1.00E-05, loss= 1.1906 (max= 1.6459), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:01,607 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:3365056 +2025-10-24 21:37:16,816 - root - INFO - Step 24950: lr=1.00E-05, loss= 1.1927 (max= 1.6034), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:16,816 - root - INFO - Step 24950: lr=1.00E-05, loss= 1.1927 (max= 1.6034), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:16,816 - root - INFO - Step 24950: lr=1.00E-05, loss= 1.1927 (max= 1.6034), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:16,816 - root - INFO - Step 24950: lr=1.00E-05, loss= 1.1927 (max= 1.6034), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:16,816 - root - INFO - Step 24950: lr=1.00E-05, loss= 1.1927 (max= 1.6034), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:16,816 - root - INFO - Step 24950: lr=1.00E-05, loss= 1.1927 (max= 1.6034), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:16,817 - root - INFO - Step 24950: lr=1.00E-05, loss= 1.1927 (max= 1.6034), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:16,817 - root - INFO - Step 24950: lr=1.00E-05, loss= 1.1927 (max= 1.6034), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:32,794 - root - INFO - Step 24960: lr=1.00E-05, loss= 1.1964 (max= 1.6856), 
tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:37:32,794 - root - INFO - Step 24960: lr=1.00E-05, loss= 1.1964 (max= 1.6856), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:37:32,795 - root - INFO - Step 24960: lr=1.00E-05, loss= 1.1964 (max= 1.6856), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:37:32,795 - root - INFO - Step 24960: lr=1.00E-05, loss= 1.1964 (max= 1.6856), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:37:32,795 - root - INFO - Step 24960: lr=1.00E-05, loss= 1.1964 (max= 1.6856), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:37:32,795 - root - INFO - Step 24960: lr=1.00E-05, loss= 1.1964 (max= 1.6856), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:37:32,795 - root - INFO - Step 24960: lr=1.00E-05, loss= 1.1964 (max= 1.6856), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:37:32,795 - root - INFO - Step 24960: lr=1.00E-05, loss= 1.1964 (max= 1.6856), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:37:48,778 - root - INFO - Step 24970: lr=1.00E-05, loss= 1.1848 (max= 1.6129), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:48,778 - root - INFO - Step 24970: lr=1.00E-05, loss= 1.1848 (max= 1.6129), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:48,778 - root - INFO - Step 24970: lr=1.00E-05, loss= 1.1848 (max= 1.6129), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:48,778 - root - INFO - Step 
24970: lr=1.00E-05, loss= 1.1848 (max= 1.6129), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:48,778 - root - INFO - Step 24970: lr=1.00E-05, loss= 1.1848 (max= 1.6129), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:48,778 - root - INFO - Step 24970: lr=1.00E-05, loss= 1.1848 (max= 1.6129), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:48,779 - root - INFO - Step 24970: lr=1.00E-05, loss= 1.1848 (max= 1.6129), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:37:48,779 - root - INFO - Step 24970: lr=1.00E-05, loss= 1.1848 (max= 1.6129), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:04,740 - root - INFO - Step 24980: lr=1.00E-05, loss= 1.1833 (max= 1.5294), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:04,740 - root - INFO - Step 24980: lr=1.00E-05, loss= 1.1833 (max= 1.5294), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:04,740 - root - INFO - Step 24980: lr=1.00E-05, loss= 1.1833 (max= 1.5294), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:04,740 - root - INFO - Step 24980: lr=1.00E-05, loss= 1.1833 (max= 1.5294), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:04,740 - root - INFO - Step 24980: lr=1.00E-05, loss= 1.1833 (max= 1.5294), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:04,740 - root - INFO - Step 24980: lr=1.00E-05, loss= 1.1833 (max= 1.5294), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 21:38:04,740 - root - INFO - Step 24980: lr=1.00E-05, loss= 1.1833 (max= 1.5294), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:04,740 - root - INFO - Step 24980: lr=1.00E-05, loss= 1.1833 (max= 1.5294), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:20,692 - root - INFO - Step 24990: lr=1.00E-05, loss= 1.1841 (max= 1.5429), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:38:20,692 - root - INFO - Step 24990: lr=1.00E-05, loss= 1.1841 (max= 1.5429), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:38:20,692 - root - INFO - Step 24990: lr=1.00E-05, loss= 1.1841 (max= 1.5429), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:38:20,693 - root - INFO - Step 24990: lr=1.00E-05, loss= 1.1841 (max= 1.5429), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:38:20,693 - root - INFO - Step 24990: lr=1.00E-05, loss= 1.1841 (max= 1.5429), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:38:20,693 - root - INFO - Step 24990: lr=1.00E-05, loss= 1.1841 (max= 1.5429), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:38:20,693 - root - INFO - Step 24990: lr=1.00E-05, loss= 1.1841 (max= 1.5429), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:38:20,693 - root - INFO - Step 24990: lr=1.00E-05, loss= 1.1841 (max= 1.5429), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-25000 +Dataset successfully saved to 
jobs/munin-7b-open-stage1/checkpoints/dataloader/step-25000! Save time: 4.300216197967529 +2025-10-24 21:38:36,607 - root - INFO - Step 25000: lr=1.00E-05, loss= 1.2098 (max= 1.5572), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:36,607 - root - INFO - Step 25000: lr=1.00E-05, loss= 1.2098 (max= 1.5572), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:36,607 - root - INFO - Saving a full checkpoint at step 25000 +2025-10-24 21:38:36,607 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:38:36,607 - root - INFO - Saving a full checkpoint at step 25000 +2025-10-24 21:38:36,607 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:38:36,607 - root - INFO - Step 25000: lr=1.00E-05, loss= 1.2098 (max= 1.5572), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:36,607 - root - INFO - Step 25000: lr=1.00E-05, loss= 1.2098 (max= 1.5572), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:36,607 - root - INFO - Saving a full checkpoint at step 25000 +2025-10-24 21:38:36,607 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:38:36,607 - root - INFO - Saving a full checkpoint at step 25000 +2025-10-24 21:38:36,607 - root - INFO - Step 25000: lr=1.00E-05, loss= 1.2098 (max= 1.5572), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:36,607 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:38:36,607 - root - INFO - Saving a full checkpoint at step 25000 +2025-10-24 
21:38:36,607 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:38:36,607 - root - INFO - Step 25000: lr=1.00E-05, loss= 1.2098 (max= 1.5572), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:36,607 - root - INFO - Step 25000: lr=1.00E-05, loss= 1.2098 (max= 1.5572), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:36,608 - root - INFO - Saving a full checkpoint at step 25000 +2025-10-24 21:38:36,608 - root - INFO - Saving a full checkpoint at step 25000 +2025-10-24 21:38:36,608 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:38:36,608 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:38:36,608 - root - INFO - Step 25000: lr=1.00E-05, loss= 1.2098 (max= 1.5572), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:38:36,608 - root - INFO - Saving a full checkpoint at step 25000 +2025-10-24 21:38:36,608 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 21:38:51,529 - root - INFO - Finished saving the checkpoint in 14.92 seconds +2025-10-24 21:38:51,535 - root - INFO - Finished saving the checkpoint in 14.93 seconds +2025-10-24 21:38:51,536 - root - INFO - Finished saving the checkpoint in 14.93 seconds +2025-10-24 21:38:51,536 - root - INFO - Finished saving the checkpoint in 14.93 seconds +2025-10-24 21:38:51,536 - root - INFO - Finished saving the checkpoint in 14.93 seconds +2025-10-24 21:38:51,537 - root - INFO - Finished saving the checkpoint in 14.93 seconds +2025-10-24 21:38:51,537 - root - INFO - Finished saving the checkpoint in 14.93 seconds +2025-10-24 
21:38:51,538 - root - INFO - Finished saving the checkpoint in 14.93 seconds +2025-10-24 21:39:07,393 - root - INFO - Step 25010: lr=1.00E-05, loss= 1.2195 (max= 1.5932), tps=10645, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:07,393 - root - INFO - Step 25010: lr=1.00E-05, loss= 1.2195 (max= 1.5932), tps=10645, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:07,393 - root - INFO - Step 25010: lr=1.00E-05, loss= 1.2195 (max= 1.5932), tps=10645, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:07,393 - root - INFO - Step 25010: lr=1.00E-05, loss= 1.2195 (max= 1.5932), tps=10645, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:07,393 - root - INFO - Step 25010: lr=1.00E-05, loss= 1.2195 (max= 1.5932), tps=10645, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:07,393 - root - INFO - Step 25010: lr=1.00E-05, loss= 1.2195 (max= 1.5932), tps=10645, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:07,393 - root - INFO - Step 25010: lr=1.00E-05, loss= 1.2195 (max= 1.5932), tps=10645, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:07,393 - root - INFO - Step 25010: lr=1.00E-05, loss= 1.2195 (max= 1.5932), tps=10645, mfu=22.18%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:23,329 - root - INFO - Step 25020: lr=1.00E-05, loss= 1.1872 (max= 1.6990), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:23,329 - root - INFO - Step 25020: lr=1.00E-05, loss= 1.1872 (max= 1.6990), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:23,330 - root - INFO - Step 25020: 
lr=1.00E-05, loss= 1.1872 (max= 1.6990), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:23,330 - root - INFO - Step 25020: lr=1.00E-05, loss= 1.1872 (max= 1.6990), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:23,330 - root - INFO - Step 25020: lr=1.00E-05, loss= 1.1872 (max= 1.6990), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:23,330 - root - INFO - Step 25020: lr=1.00E-05, loss= 1.1872 (max= 1.6990), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:23,330 - root - INFO - Step 25020: lr=1.00E-05, loss= 1.1872 (max= 1.6990), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:23,330 - root - INFO - Step 25020: lr=1.00E-05, loss= 1.1872 (max= 1.6990), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:39:39,290 - root - INFO - Step 25030: lr=1.00E-05, loss= 1.2014 (max= 1.6138), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:39,290 - root - INFO - Step 25030: lr=1.00E-05, loss= 1.2014 (max= 1.6138), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:39,291 - root - INFO - Step 25030: lr=1.00E-05, loss= 1.2014 (max= 1.6138), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:39,291 - root - INFO - Step 25030: lr=1.00E-05, loss= 1.2014 (max= 1.6138), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:39,291 - root - INFO - Step 25030: lr=1.00E-05, loss= 1.2014 (max= 1.6138), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 21:39:39,291 - root - INFO - Step 25030: lr=1.00E-05, loss= 1.2014 (max= 1.6138), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:39,291 - root - INFO - Step 25030: lr=1.00E-05, loss= 1.2014 (max= 1.6138), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:39,291 - root - INFO - Step 25030: lr=1.00E-05, loss= 1.2014 (max= 1.6138), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:55,243 - root - INFO - Step 25040: lr=1.00E-05, loss= 1.1896 (max= 1.6083), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:55,243 - root - INFO - Step 25040: lr=1.00E-05, loss= 1.1896 (max= 1.6083), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:55,243 - root - INFO - Step 25040: lr=1.00E-05, loss= 1.1896 (max= 1.6083), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:55,244 - root - INFO - Step 25040: lr=1.00E-05, loss= 1.1896 (max= 1.6083), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:55,244 - root - INFO - Step 25040: lr=1.00E-05, loss= 1.1896 (max= 1.6083), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:55,244 - root - INFO - Step 25040: lr=1.00E-05, loss= 1.1896 (max= 1.6083), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:55,244 - root - INFO - Step 25040: lr=1.00E-05, loss= 1.1896 (max= 1.6083), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:39:55,244 - root - INFO - Step 25040: lr=1.00E-05, loss= 1.1896 (max= 1.6083), tps=20545, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:40:11,124 - root - INFO - Step 25050: lr=1.00E-05, loss= 1.1740 (max= 1.5236), tps=20637, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:11,124 - root - INFO - Step 25050: lr=1.00E-05, loss= 1.1740 (max= 1.5236), tps=20637, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:11,125 - root - INFO - Step 25050: lr=1.00E-05, loss= 1.1740 (max= 1.5236), tps=20638, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:11,125 - root - INFO - Step 25050: lr=1.00E-05, loss= 1.1740 (max= 1.5236), tps=20638, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:11,125 - root - INFO - Step 25050: lr=1.00E-05, loss= 1.1740 (max= 1.5236), tps=20637, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:11,125 - root - INFO - Step 25050: lr=1.00E-05, loss= 1.1740 (max= 1.5236), tps=20638, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:11,125 - root - INFO - Step 25050: lr=1.00E-05, loss= 1.1740 (max= 1.5236), tps=20638, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:11,125 - root - INFO - Step 25050: lr=1.00E-05, loss= 1.1740 (max= 1.5236), tps=20638, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:27,069 - root - INFO - Step 25060: lr=1.00E-05, loss= 1.2102 (max= 1.5222), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:40:27,069 - root - INFO - Step 25060: lr=1.00E-05, loss= 1.2102 (max= 1.5222), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:40:27,069 - root - INFO - Step 25060: lr=1.00E-05, loss= 1.2102 
(max= 1.5222), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:40:27,069 - root - INFO - Step 25060: lr=1.00E-05, loss= 1.2102 (max= 1.5222), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:40:27,069 - root - INFO - Step 25060: lr=1.00E-05, loss= 1.2102 (max= 1.5222), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:40:27,069 - root - INFO - Step 25060: lr=1.00E-05, loss= 1.2102 (max= 1.5222), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:40:27,069 - root - INFO - Step 25060: lr=1.00E-05, loss= 1.2102 (max= 1.5222), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:40:27,069 - root - INFO - Step 25060: lr=1.00E-05, loss= 1.2102 (max= 1.5222), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:40:43,006 - root - INFO - Step 25070: lr=1.00E-05, loss= 1.1975 (max= 1.9498), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:43,006 - root - INFO - Step 25070: lr=1.00E-05, loss= 1.1975 (max= 1.9498), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:43,007 - root - INFO - Step 25070: lr=1.00E-05, loss= 1.1975 (max= 1.9498), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:43,007 - root - INFO - Step 25070: lr=1.00E-05, loss= 1.1975 (max= 1.9498), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:43,007 - root - INFO - Step 25070: lr=1.00E-05, loss= 1.1975 (max= 1.9498), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:43,007 - root 
- INFO - Step 25070: lr=1.00E-05, loss= 1.1975 (max= 1.9498), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:43,007 - root - INFO - Step 25070: lr=1.00E-05, loss= 1.1975 (max= 1.9498), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:43,007 - root - INFO - Step 25070: lr=1.00E-05, loss= 1.1975 (max= 1.9498), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:58,928 - root - INFO - Step 25080: lr=1.00E-05, loss= 1.1870 (max= 1.5396), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:58,928 - root - INFO - Step 25080: lr=1.00E-05, loss= 1.1870 (max= 1.5396), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:58,928 - root - INFO - Step 25080: lr=1.00E-05, loss= 1.1870 (max= 1.5396), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:58,928 - root - INFO - Step 25080: lr=1.00E-05, loss= 1.1870 (max= 1.5396), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:58,928 - root - INFO - Step 25080: lr=1.00E-05, loss= 1.1870 (max= 1.5396), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:58,928 - root - INFO - Step 25080: lr=1.00E-05, loss= 1.1870 (max= 1.5396), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:58,928 - root - INFO - Step 25080: lr=1.00E-05, loss= 1.1870 (max= 1.5396), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:40:58,928 - root - INFO - Step 25080: lr=1.00E-05, loss= 1.1870 (max= 1.5396), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 21:41:14,853 - root - INFO - Step 25090: lr=1.00E-05, loss= 1.1913 (max= 1.7110), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:41:14,853 - root - INFO - Step 25090: lr=1.00E-05, loss= 1.1913 (max= 1.7110), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:41:14,854 - root - INFO - Step 25090: lr=1.00E-05, loss= 1.1913 (max= 1.7110), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:41:14,854 - root - INFO - Step 25090: lr=1.00E-05, loss= 1.1913 (max= 1.7110), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:41:14,854 - root - INFO - Step 25090: lr=1.00E-05, loss= 1.1913 (max= 1.7110), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:41:14,854 - root - INFO - Step 25090: lr=1.00E-05, loss= 1.1913 (max= 1.7110), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:41:14,854 - root - INFO - Step 25090: lr=1.00E-05, loss= 1.1913 (max= 1.7110), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:41:14,854 - root - INFO - Step 25090: lr=1.00E-05, loss= 1.1913 (max= 1.7110), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:41:30,792 - root - INFO - Step 25100: lr=1.00E-05, loss= 1.2058 (max= 1.8168), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:30,792 - root - INFO - Step 25100: lr=1.00E-05, loss= 1.2058 (max= 1.8168), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:30,792 - root - INFO - Step 25100: lr=1.00E-05, loss= 1.2058 (max= 1.8168), tps=20564, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:30,792 - root - INFO - Step 25100: lr=1.00E-05, loss= 1.2058 (max= 1.8168), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:30,792 - root - INFO - Step 25100: lr=1.00E-05, loss= 1.2058 (max= 1.8168), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:30,792 - root - INFO - Step 25100: lr=1.00E-05, loss= 1.2058 (max= 1.8168), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:30,792 - root - INFO - Step 25100: lr=1.00E-05, loss= 1.2058 (max= 1.8168), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:30,793 - root - INFO - Step 25100: lr=1.00E-05, loss= 1.2058 (max= 1.8168), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:46,754 - root - INFO - Step 25110: lr=1.00E-05, loss= 1.1732 (max= 1.5839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:46,754 - root - INFO - Step 25110: lr=1.00E-05, loss= 1.1732 (max= 1.5839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:46,755 - root - INFO - Step 25110: lr=1.00E-05, loss= 1.1732 (max= 1.5839), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:46,755 - root - INFO - Step 25110: lr=1.00E-05, loss= 1.1732 (max= 1.5839), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:46,755 - root - INFO - Step 25110: lr=1.00E-05, loss= 1.1732 (max= 1.5839), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:46,755 - root - INFO - Step 25110: lr=1.00E-05, 
loss= 1.1732 (max= 1.5839), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:46,755 - root - INFO - Step 25110: lr=1.00E-05, loss= 1.1732 (max= 1.5839), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:41:46,755 - root - INFO - Step 25110: lr=1.00E-05, loss= 1.1732 (max= 1.5839), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:42:02,654 - root - INFO - Step 25120: lr=1.00E-05, loss= 1.1812 (max= 1.5429), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:02,654 - root - INFO - Step 25120: lr=1.00E-05, loss= 1.1812 (max= 1.5429), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:02,654 - root - INFO - Step 25120: lr=1.00E-05, loss= 1.1812 (max= 1.5429), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:02,654 - root - INFO - Step 25120: lr=1.00E-05, loss= 1.1812 (max= 1.5429), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:02,654 - root - INFO - Step 25120: lr=1.00E-05, loss= 1.1812 (max= 1.5429), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:02,654 - root - INFO - Step 25120: lr=1.00E-05, loss= 1.1812 (max= 1.5429), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:02,655 - root - INFO - Step 25120: lr=1.00E-05, loss= 1.1812 (max= 1.5429), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:02,655 - root - INFO - Step 25120: lr=1.00E-05, loss= 1.1812 (max= 1.5429), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
21:42:18,615 - root - INFO - Step 25130: lr=1.00E-05, loss= 1.1908 (max= 1.5179), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:18,615 - root - INFO - Step 25130: lr=1.00E-05, loss= 1.1908 (max= 1.5179), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:18,615 - root - INFO - Step 25130: lr=1.00E-05, loss= 1.1908 (max= 1.5179), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:18,615 - root - INFO - Step 25130: lr=1.00E-05, loss= 1.1908 (max= 1.5179), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:18,615 - root - INFO - Step 25130: lr=1.00E-05, loss= 1.1908 (max= 1.5179), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:18,615 - root - INFO - Step 25130: lr=1.00E-05, loss= 1.1908 (max= 1.5179), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:18,615 - root - INFO - Step 25130: lr=1.00E-05, loss= 1.1908 (max= 1.5179), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:18,615 - root - INFO - Step 25130: lr=1.00E-05, loss= 1.1908 (max= 1.5179), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:34,570 - root - INFO - Step 25140: lr=1.00E-05, loss= 1.2232 (max= 1.7025), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:34,570 - root - INFO - Step 25140: lr=1.00E-05, loss= 1.2232 (max= 1.7025), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:34,570 - root - INFO - Step 25140: lr=1.00E-05, loss= 1.2232 (max= 1.7025), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:34,570 - root - INFO - Step 25140: lr=1.00E-05, loss= 1.2232 (max= 1.7025), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:34,570 - root - INFO - Step 25140: lr=1.00E-05, loss= 1.2232 (max= 1.7025), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:34,570 - root - INFO - Step 25140: lr=1.00E-05, loss= 1.2232 (max= 1.7025), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:34,570 - root - INFO - Step 25140: lr=1.00E-05, loss= 1.2232 (max= 1.7025), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:34,570 - root - INFO - Step 25140: lr=1.00E-05, loss= 1.2232 (max= 1.7025), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:50,537 - root - INFO - Step 25150: lr=1.00E-05, loss= 1.1984 (max= 1.6797), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:50,537 - root - INFO - Step 25150: lr=1.00E-05, loss= 1.1984 (max= 1.6797), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:50,537 - root - INFO - Step 25150: lr=1.00E-05, loss= 1.1984 (max= 1.6797), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:50,537 - root - INFO - Step 25150: lr=1.00E-05, loss= 1.1984 (max= 1.6797), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:50,537 - root - INFO - Step 25150: lr=1.00E-05, loss= 1.1984 (max= 1.6797), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:50,537 - root - INFO - Step 25150: lr=1.00E-05, loss= 1.1984 (max= 1.6797), 
tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:50,537 - root - INFO - Step 25150: lr=1.00E-05, loss= 1.1984 (max= 1.6797), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:42:50,537 - root - INFO - Step 25150: lr=1.00E-05, loss= 1.1984 (max= 1.6797), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:43:06,436 - root - INFO - Step 25160: lr=1.00E-05, loss= 1.1610 (max= 1.4970), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:06,436 - root - INFO - Step 25160: lr=1.00E-05, loss= 1.1610 (max= 1.4970), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:06,436 - root - INFO - Step 25160: lr=1.00E-05, loss= 1.1610 (max= 1.4970), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:06,436 - root - INFO - Step 25160: lr=1.00E-05, loss= 1.1610 (max= 1.4970), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:06,436 - root - INFO - Step 25160: lr=1.00E-05, loss= 1.1610 (max= 1.4970), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:06,436 - root - INFO - Step 25160: lr=1.00E-05, loss= 1.1610 (max= 1.4970), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:06,437 - root - INFO - Step 25160: lr=1.00E-05, loss= 1.1610 (max= 1.4970), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:06,437 - root - INFO - Step 25160: lr=1.00E-05, loss= 1.1610 (max= 1.4970), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:22,329 - root - INFO - Step 
25170: lr=1.00E-05, loss= 1.2026 (max= 1.6228), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:22,329 - root - INFO - Step 25170: lr=1.00E-05, loss= 1.2026 (max= 1.6228), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:22,329 - root - INFO - Step 25170: lr=1.00E-05, loss= 1.2026 (max= 1.6228), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:22,329 - root - INFO - Step 25170: lr=1.00E-05, loss= 1.2026 (max= 1.6228), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:22,329 - root - INFO - Step 25170: lr=1.00E-05, loss= 1.2026 (max= 1.6228), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:22,329 - root - INFO - Step 25170: lr=1.00E-05, loss= 1.2026 (max= 1.6228), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:22,329 - root - INFO - Step 25170: lr=1.00E-05, loss= 1.2026 (max= 1.6228), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:22,330 - root - INFO - Step 25170: lr=1.00E-05, loss= 1.2026 (max= 1.6228), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:38,273 - root - INFO - Step 25180: lr=1.00E-05, loss= 1.2071 (max= 1.6590), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:43:38,273 - root - INFO - Step 25180: lr=1.00E-05, loss= 1.2071 (max= 1.6590), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:43:38,273 - root - INFO - Step 25180: lr=1.00E-05, loss= 1.2071 (max= 1.6590), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 21:43:38,273 - root - INFO - Step 25180: lr=1.00E-05, loss= 1.2071 (max= 1.6590), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:43:38,273 - root - INFO - Step 25180: lr=1.00E-05, loss= 1.2071 (max= 1.6590), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:43:38,273 - root - INFO - Step 25180: lr=1.00E-05, loss= 1.2071 (max= 1.6590), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:43:38,273 - root - INFO - Step 25180: lr=1.00E-05, loss= 1.2071 (max= 1.6590), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:43:38,273 - root - INFO - Step 25180: lr=1.00E-05, loss= 1.2071 (max= 1.6590), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:43:54,202 - root - INFO - Step 25190: lr=1.00E-05, loss= 1.1948 (max= 1.4927), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:54,202 - root - INFO - Step 25190: lr=1.00E-05, loss= 1.1948 (max= 1.4927), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:54,202 - root - INFO - Step 25190: lr=1.00E-05, loss= 1.1948 (max= 1.4927), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:54,202 - root - INFO - Step 25190: lr=1.00E-05, loss= 1.1948 (max= 1.4927), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:54,202 - root - INFO - Step 25190: lr=1.00E-05, loss= 1.1948 (max= 1.4927), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:54,202 - root - INFO - Step 25190: lr=1.00E-05, loss= 1.1948 (max= 1.4927), tps=20575, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:54,202 - root - INFO - Step 25190: lr=1.00E-05, loss= 1.1948 (max= 1.4927), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:43:54,202 - root - INFO - Step 25190: lr=1.00E-05, loss= 1.1948 (max= 1.4927), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:10,097 - root - INFO - Step 25200: lr=1.00E-05, loss= 1.1668 (max= 1.5309), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:10,097 - root - INFO - Step 25200: lr=1.00E-05, loss= 1.1668 (max= 1.5309), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:10,097 - root - INFO - Step 25200: lr=1.00E-05, loss= 1.1668 (max= 1.5309), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:10,097 - root - INFO - Step 25200: lr=1.00E-05, loss= 1.1668 (max= 1.5309), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:10,098 - root - INFO - Step 25200: lr=1.00E-05, loss= 1.1668 (max= 1.5309), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:10,098 - root - INFO - Step 25200: lr=1.00E-05, loss= 1.1668 (max= 1.5309), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:10,098 - root - INFO - Step 25200: lr=1.00E-05, loss= 1.1668 (max= 1.5309), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:10,098 - root - INFO - Step 25200: lr=1.00E-05, loss= 1.1668 (max= 1.5309), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:25,988 - root - INFO - Step 25210: lr=1.00E-05, loss= 1.2012 
(max= 1.6316), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:44:25,988 - root - INFO - Step 25210: lr=1.00E-05, loss= 1.2012 (max= 1.6316), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:44:25,988 - root - INFO - Step 25210: lr=1.00E-05, loss= 1.2012 (max= 1.6316), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:44:25,989 - root - INFO - Step 25210: lr=1.00E-05, loss= 1.2012 (max= 1.6316), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:44:25,989 - root - INFO - Step 25210: lr=1.00E-05, loss= 1.2012 (max= 1.6316), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:44:25,989 - root - INFO - Step 25210: lr=1.00E-05, loss= 1.2012 (max= 1.6316), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:44:25,989 - root - INFO - Step 25210: lr=1.00E-05, loss= 1.2012 (max= 1.6316), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:44:25,989 - root - INFO - Step 25210: lr=1.00E-05, loss= 1.2012 (max= 1.6316), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:44:41,928 - root - INFO - Step 25220: lr=1.00E-05, loss= 1.2101 (max= 1.8231), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:41,928 - root - INFO - Step 25220: lr=1.00E-05, loss= 1.2101 (max= 1.8231), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:41,928 - root - INFO - Step 25220: lr=1.00E-05, loss= 1.2101 (max= 1.8231), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:41,928 - root 
- INFO - Step 25220: lr=1.00E-05, loss= 1.2101 (max= 1.8231), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:41,928 - root - INFO - Step 25220: lr=1.00E-05, loss= 1.2101 (max= 1.8231), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:41,928 - root - INFO - Step 25220: lr=1.00E-05, loss= 1.2101 (max= 1.8231), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:41,928 - root - INFO - Step 25220: lr=1.00E-05, loss= 1.2101 (max= 1.8231), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:41,928 - root - INFO - Step 25220: lr=1.00E-05, loss= 1.2101 (max= 1.8231), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:57,890 - root - INFO - Step 25230: lr=1.00E-05, loss= 1.1936 (max= 1.5154), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:57,890 - root - INFO - Step 25230: lr=1.00E-05, loss= 1.1936 (max= 1.5154), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:57,890 - root - INFO - Step 25230: lr=1.00E-05, loss= 1.1936 (max= 1.5154), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:57,890 - root - INFO - Step 25230: lr=1.00E-05, loss= 1.1936 (max= 1.5154), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:57,890 - root - INFO - Step 25230: lr=1.00E-05, loss= 1.1936 (max= 1.5154), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:57,890 - root - INFO - Step 25230: lr=1.00E-05, loss= 1.1936 (max= 1.5154), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 21:44:57,890 - root - INFO - Step 25230: lr=1.00E-05, loss= 1.1936 (max= 1.5154), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:44:57,890 - root - INFO - Step 25230: lr=1.00E-05, loss= 1.1936 (max= 1.5154), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:13,777 - root - INFO - Step 25240: lr=1.00E-05, loss= 1.2082 (max= 1.4915), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:13,777 - root - INFO - Step 25240: lr=1.00E-05, loss= 1.2082 (max= 1.4915), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:13,777 - root - INFO - Step 25240: lr=1.00E-05, loss= 1.2082 (max= 1.4915), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:13,777 - root - INFO - Step 25240: lr=1.00E-05, loss= 1.2082 (max= 1.4915), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:13,778 - root - INFO - Step 25240: lr=1.00E-05, loss= 1.2082 (max= 1.4915), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:13,778 - root - INFO - Step 25240: lr=1.00E-05, loss= 1.2082 (max= 1.4915), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:13,778 - root - INFO - Step 25240: lr=1.00E-05, loss= 1.2082 (max= 1.4915), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:13,778 - root - INFO - Step 25240: lr=1.00E-05, loss= 1.2082 (max= 1.4915), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:29,756 - root - INFO - Step 25250: lr=1.00E-05, loss= 1.1883 (max= 1.5460), tps=20512, mfu=42.74%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:45:29,756 - root - INFO - Step 25250: lr=1.00E-05, loss= 1.1883 (max= 1.5460), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:45:29,756 - root - INFO - Step 25250: lr=1.00E-05, loss= 1.1883 (max= 1.5460), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:45:29,756 - root - INFO - Step 25250: lr=1.00E-05, loss= 1.1883 (max= 1.5460), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:45:29,757 - root - INFO - Step 25250: lr=1.00E-05, loss= 1.1883 (max= 1.5460), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:45:29,757 - root - INFO - Step 25250: lr=1.00E-05, loss= 1.1883 (max= 1.5460), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:45:29,757 - root - INFO - Step 25250: lr=1.00E-05, loss= 1.1883 (max= 1.5460), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:45:29,757 - root - INFO - Step 25250: lr=1.00E-05, loss= 1.1883 (max= 1.5460), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:45:45,670 - root - INFO - Step 25260: lr=1.00E-05, loss= 1.1938 (max= 1.5804), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:45,670 - root - INFO - Step 25260: lr=1.00E-05, loss= 1.1938 (max= 1.5804), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:45,670 - root - INFO - Step 25260: lr=1.00E-05, loss= 1.1938 (max= 1.5804), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:45,670 - root - INFO - Step 25260: lr=1.00E-05, 
loss= 1.1938 (max= 1.5804), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:45,670 - root - INFO - Step 25260: lr=1.00E-05, loss= 1.1938 (max= 1.5804), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:45,670 - root - INFO - Step 25260: lr=1.00E-05, loss= 1.1938 (max= 1.5804), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:45,670 - root - INFO - Step 25260: lr=1.00E-05, loss= 1.1938 (max= 1.5804), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:45:45,670 - root - INFO - Step 25260: lr=1.00E-05, loss= 1.1938 (max= 1.5804), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:01,659 - root - INFO - Step 25270: lr=1.00E-05, loss= 1.1811 (max= 1.6102), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:01,659 - root - INFO - Step 25270: lr=1.00E-05, loss= 1.1811 (max= 1.6102), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:01,659 - root - INFO - Step 25270: lr=1.00E-05, loss= 1.1811 (max= 1.6102), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:01,659 - root - INFO - Step 25270: lr=1.00E-05, loss= 1.1811 (max= 1.6102), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:01,659 - root - INFO - Step 25270: lr=1.00E-05, loss= 1.1811 (max= 1.6102), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:01,659 - root - INFO - Step 25270: lr=1.00E-05, loss= 1.1811 (max= 1.6102), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
21:46:01,660 - root - INFO - Step 25270: lr=1.00E-05, loss= 1.1811 (max= 1.6102), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:01,660 - root - INFO - Step 25270: lr=1.00E-05, loss= 1.1811 (max= 1.6102), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:17,553 - root - INFO - Step 25280: lr=1.00E-05, loss= 1.1708 (max= 1.7032), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:46:17,553 - root - INFO - Step 25280: lr=1.00E-05, loss= 1.1708 (max= 1.7032), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:46:17,553 - root - INFO - Step 25280: lr=1.00E-05, loss= 1.1708 (max= 1.7032), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:46:17,553 - root - INFO - Step 25280: lr=1.00E-05, loss= 1.1708 (max= 1.7032), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:46:17,553 - root - INFO - Step 25280: lr=1.00E-05, loss= 1.1708 (max= 1.7032), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:46:17,553 - root - INFO - Step 25280: lr=1.00E-05, loss= 1.1708 (max= 1.7032), tps=20621, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:46:17,553 - root - INFO - Step 25280: lr=1.00E-05, loss= 1.1708 (max= 1.7032), tps=20621, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:46:17,553 - root - INFO - Step 25280: lr=1.00E-05, loss= 1.1708 (max= 1.7032), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:46:33,514 - root - INFO - Step 25290: lr=1.00E-05, loss= 1.2098 (max= 1.9217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:33,514 - root - INFO - Step 25290: lr=1.00E-05, loss= 1.2098 (max= 1.9217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:33,514 - root - INFO - Step 25290: lr=1.00E-05, loss= 1.2098 (max= 1.9217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:33,514 - root - INFO - Step 25290: lr=1.00E-05, loss= 1.2098 (max= 1.9217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:33,514 - root - INFO - Step 25290: lr=1.00E-05, loss= 1.2098 (max= 1.9217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:33,514 - root - INFO - Step 25290: lr=1.00E-05, loss= 1.2098 (max= 1.9217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:33,515 - root - INFO - Step 25290: lr=1.00E-05, loss= 1.2098 (max= 1.9217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:33,515 - root - INFO - Step 25290: lr=1.00E-05, loss= 1.2098 (max= 1.9217), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:49,476 - root - INFO - Step 25300: lr=1.00E-05, loss= 1.1883 (max= 1.6489), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:49,476 - root - INFO - Step 25300: lr=1.00E-05, loss= 1.1883 (max= 1.6489), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:49,476 - root - INFO - Step 25300: lr=1.00E-05, loss= 1.1883 (max= 1.6489), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:49,476 - root - INFO - Step 25300: lr=1.00E-05, loss= 1.1883 (max= 1.6489), 
tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:49,477 - root - INFO - Step 25300: lr=1.00E-05, loss= 1.1883 (max= 1.6489), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:49,477 - root - INFO - Step 25300: lr=1.00E-05, loss= 1.1883 (max= 1.6489), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:49,477 - root - INFO - Step 25300: lr=1.00E-05, loss= 1.1883 (max= 1.6489), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:46:49,477 - root - INFO - Step 25300: lr=1.00E-05, loss= 1.1883 (max= 1.6489), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:05,443 - root - INFO - Step 25310: lr=1.00E-05, loss= 1.2188 (max= 1.5737), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:05,443 - root - INFO - Step 25310: lr=1.00E-05, loss= 1.2188 (max= 1.5737), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:05,443 - root - INFO - Step 25310: lr=1.00E-05, loss= 1.2188 (max= 1.5737), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:05,443 - root - INFO - Step 25310: lr=1.00E-05, loss= 1.2188 (max= 1.5737), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:05,443 - root - INFO - Step 25310: lr=1.00E-05, loss= 1.2188 (max= 1.5737), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:05,443 - root - INFO - Step 25310: lr=1.00E-05, loss= 1.2188 (max= 1.5737), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:05,443 - root - INFO - Step 
25310: lr=1.00E-05, loss= 1.2188 (max= 1.5737), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:05,444 - root - INFO - Step 25310: lr=1.00E-05, loss= 1.2188 (max= 1.5737), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:21,354 - root - INFO - Step 25320: lr=1.00E-05, loss= 1.1823 (max= 1.6368), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:21,354 - root - INFO - Step 25320: lr=1.00E-05, loss= 1.1823 (max= 1.6368), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:21,354 - root - INFO - Step 25320: lr=1.00E-05, loss= 1.1823 (max= 1.6368), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:21,354 - root - INFO - Step 25320: lr=1.00E-05, loss= 1.1823 (max= 1.6368), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:21,354 - root - INFO - Step 25320: lr=1.00E-05, loss= 1.1823 (max= 1.6368), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:21,354 - root - INFO - Step 25320: lr=1.00E-05, loss= 1.1823 (max= 1.6368), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:21,354 - root - INFO - Step 25320: lr=1.00E-05, loss= 1.1823 (max= 1.6368), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:21,354 - root - INFO - Step 25320: lr=1.00E-05, loss= 1.1823 (max= 1.6368), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:37,339 - root - INFO - Step 25330: lr=1.00E-05, loss= 1.1801 (max= 1.6052), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 21:47:37,339 - root - INFO - Step 25330: lr=1.00E-05, loss= 1.1801 (max= 1.6052), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:37,339 - root - INFO - Step 25330: lr=1.00E-05, loss= 1.1801 (max= 1.6052), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:37,340 - root - INFO - Step 25330: lr=1.00E-05, loss= 1.1801 (max= 1.6052), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:37,340 - root - INFO - Step 25330: lr=1.00E-05, loss= 1.1801 (max= 1.6052), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:37,340 - root - INFO - Step 25330: lr=1.00E-05, loss= 1.1801 (max= 1.6052), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:37,340 - root - INFO - Step 25330: lr=1.00E-05, loss= 1.1801 (max= 1.6052), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:37,340 - root - INFO - Step 25330: lr=1.00E-05, loss= 1.1801 (max= 1.6052), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:53,298 - root - INFO - Step 25340: lr=1.00E-05, loss= 1.2068 (max= 1.5155), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:53,298 - root - INFO - Step 25340: lr=1.00E-05, loss= 1.2068 (max= 1.5155), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:53,298 - root - INFO - Step 25340: lr=1.00E-05, loss= 1.2068 (max= 1.5155), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:53,298 - root - INFO - Step 25340: lr=1.00E-05, loss= 1.2068 (max= 1.5155), tps=20537, mfu=42.79%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:53,299 - root - INFO - Step 25340: lr=1.00E-05, loss= 1.2068 (max= 1.5155), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:53,299 - root - INFO - Step 25340: lr=1.00E-05, loss= 1.2068 (max= 1.5155), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:53,299 - root - INFO - Step 25340: lr=1.00E-05, loss= 1.2068 (max= 1.5155), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:47:53,299 - root - INFO - Step 25340: lr=1.00E-05, loss= 1.2068 (max= 1.5155), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:09,172 - root - INFO - Step 25350: lr=1.00E-05, loss= 1.1971 (max= 1.6708), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:09,172 - root - INFO - Step 25350: lr=1.00E-05, loss= 1.1971 (max= 1.6708), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:09,172 - root - INFO - Step 25350: lr=1.00E-05, loss= 1.1971 (max= 1.6708), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:09,172 - root - INFO - Step 25350: lr=1.00E-05, loss= 1.1971 (max= 1.6708), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:09,172 - root - INFO - Step 25350: lr=1.00E-05, loss= 1.1971 (max= 1.6708), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:09,172 - root - INFO - Step 25350: lr=1.00E-05, loss= 1.1971 (max= 1.6708), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:09,172 - root - INFO - Step 25350: lr=1.00E-05, loss= 1.1971 
(max= 1.6708), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:09,172 - root - INFO - Step 25350: lr=1.00E-05, loss= 1.1971 (max= 1.6708), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:25,068 - root - INFO - Step 25360: lr=1.00E-05, loss= 1.1832 (max= 1.5558), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:25,068 - root - INFO - Step 25360: lr=1.00E-05, loss= 1.1832 (max= 1.5558), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:25,068 - root - INFO - Step 25360: lr=1.00E-05, loss= 1.1832 (max= 1.5558), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:25,068 - root - INFO - Step 25360: lr=1.00E-05, loss= 1.1832 (max= 1.5558), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:25,068 - root - INFO - Step 25360: lr=1.00E-05, loss= 1.1832 (max= 1.5558), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:25,068 - root - INFO - Step 25360: lr=1.00E-05, loss= 1.1832 (max= 1.5558), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:25,068 - root - INFO - Step 25360: lr=1.00E-05, loss= 1.1832 (max= 1.5558), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:25,068 - root - INFO - Step 25360: lr=1.00E-05, loss= 1.1832 (max= 1.5558), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:48:41,019 - root - INFO - Step 25370: lr=1.00E-05, loss= 1.1950 (max= 1.6607), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:41,019 - root 
- INFO - Step 25370: lr=1.00E-05, loss= 1.1950 (max= 1.6607), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:41,019 - root - INFO - Step 25370: lr=1.00E-05, loss= 1.1950 (max= 1.6607), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:41,019 - root - INFO - Step 25370: lr=1.00E-05, loss= 1.1950 (max= 1.6607), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:41,019 - root - INFO - Step 25370: lr=1.00E-05, loss= 1.1950 (max= 1.6607), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:41,019 - root - INFO - Step 25370: lr=1.00E-05, loss= 1.1950 (max= 1.6607), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:41,019 - root - INFO - Step 25370: lr=1.00E-05, loss= 1.1950 (max= 1.6607), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:41,019 - root - INFO - Step 25370: lr=1.00E-05, loss= 1.1950 (max= 1.6607), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:56,979 - root - INFO - Step 25380: lr=1.00E-05, loss= 1.2154 (max= 1.6220), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:56,979 - root - INFO - Step 25380: lr=1.00E-05, loss= 1.2154 (max= 1.6220), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:56,979 - root - INFO - Step 25380: lr=1.00E-05, loss= 1.2154 (max= 1.6220), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:56,979 - root - INFO - Step 25380: lr=1.00E-05, loss= 1.2154 (max= 1.6220), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 21:48:56,979 - root - INFO - Step 25380: lr=1.00E-05, loss= 1.2154 (max= 1.6220), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:56,979 - root - INFO - Step 25380: lr=1.00E-05, loss= 1.2154 (max= 1.6220), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:56,979 - root - INFO - Step 25380: lr=1.00E-05, loss= 1.2154 (max= 1.6220), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:48:56,979 - root - INFO - Step 25380: lr=1.00E-05, loss= 1.2154 (max= 1.6220), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:49:12,979 - root - INFO - Step 25390: lr=1.00E-05, loss= 1.1481 (max= 1.5362), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:49:12,979 - root - INFO - Step 25390: lr=1.00E-05, loss= 1.1481 (max= 1.5362), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:49:12,979 - root - INFO - Step 25390: lr=1.00E-05, loss= 1.1481 (max= 1.5362), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:49:12,979 - root - INFO - Step 25390: lr=1.00E-05, loss= 1.1481 (max= 1.5362), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:49:12,980 - root - INFO - Step 25390: lr=1.00E-05, loss= 1.1481 (max= 1.5362), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:49:12,980 - root - INFO - Step 25390: lr=1.00E-05, loss= 1.1481 (max= 1.5362), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:49:12,980 - root - INFO - Step 25390: lr=1.00E-05, loss= 1.1481 (max= 1.5362), tps=20484, mfu=42.68%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:49:12,980 - root - INFO - Step 25390: lr=1.00E-05, loss= 1.1481 (max= 1.5362), tps=20484, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:49:28,894 - root - INFO - Step 25400: lr=1.00E-05, loss= 1.1810 (max= 1.5263), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:28,894 - root - INFO - Step 25400: lr=1.00E-05, loss= 1.1810 (max= 1.5263), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:28,894 - root - INFO - Step 25400: lr=1.00E-05, loss= 1.1810 (max= 1.5263), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:28,894 - root - INFO - Step 25400: lr=1.00E-05, loss= 1.1810 (max= 1.5263), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:28,894 - root - INFO - Step 25400: lr=1.00E-05, loss= 1.1810 (max= 1.5263), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:28,894 - root - INFO - Step 25400: lr=1.00E-05, loss= 1.1810 (max= 1.5263), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:28,894 - root - INFO - Step 25400: lr=1.00E-05, loss= 1.1810 (max= 1.5263), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:28,894 - root - INFO - Step 25400: lr=1.00E-05, loss= 1.1810 (max= 1.5263), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:44,794 - root - INFO - Step 25410: lr=1.00E-05, loss= 1.1717 (max= 1.6752), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:44,794 - root - INFO - Step 25410: lr=1.00E-05, 
loss= 1.1717 (max= 1.6752), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:44,794 - root - INFO - Step 25410: lr=1.00E-05, loss= 1.1717 (max= 1.6752), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:44,795 - root - INFO - Step 25410: lr=1.00E-05, loss= 1.1717 (max= 1.6752), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:44,795 - root - INFO - Step 25410: lr=1.00E-05, loss= 1.1717 (max= 1.6752), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:44,795 - root - INFO - Step 25410: lr=1.00E-05, loss= 1.1717 (max= 1.6752), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:44,795 - root - INFO - Step 25410: lr=1.00E-05, loss= 1.1717 (max= 1.6752), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:49:44,795 - root - INFO - Step 25410: lr=1.00E-05, loss= 1.1717 (max= 1.6752), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:00,713 - root - INFO - Step 25420: lr=1.00E-05, loss= 1.1765 (max= 1.6240), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:00,713 - root - INFO - Step 25420: lr=1.00E-05, loss= 1.1765 (max= 1.6240), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:00,713 - root - INFO - Step 25420: lr=1.00E-05, loss= 1.1765 (max= 1.6240), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:00,713 - root - INFO - Step 25420: lr=1.00E-05, loss= 1.1765 (max= 1.6240), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
21:50:00,713 - root - INFO - Step 25420: lr=1.00E-05, loss= 1.1765 (max= 1.6240), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:00,713 - root - INFO - Step 25420: lr=1.00E-05, loss= 1.1765 (max= 1.6240), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:00,714 - root - INFO - Step 25420: lr=1.00E-05, loss= 1.1765 (max= 1.6240), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:00,714 - root - INFO - Step 25420: lr=1.00E-05, loss= 1.1765 (max= 1.6240), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:16,641 - root - INFO - Step 25430: lr=1.00E-05, loss= 1.2031 (max= 1.5829), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:16,641 - root - INFO - Step 25430: lr=1.00E-05, loss= 1.2031 (max= 1.5829), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:16,641 - root - INFO - Step 25430: lr=1.00E-05, loss= 1.2031 (max= 1.5829), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:16,641 - root - INFO - Step 25430: lr=1.00E-05, loss= 1.2031 (max= 1.5829), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:16,641 - root - INFO - Step 25430: lr=1.00E-05, loss= 1.2031 (max= 1.5829), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:16,641 - root - INFO - Step 25430: lr=1.00E-05, loss= 1.2031 (max= 1.5829), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:16,642 - root - INFO - Step 25430: lr=1.00E-05, loss= 1.2031 (max= 1.5829), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:16,642 - root - INFO - Step 25430: lr=1.00E-05, loss= 1.2031 (max= 1.5829), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:32,543 - root - INFO - Step 25440: lr=1.00E-05, loss= 1.1869 (max= 1.6116), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:32,543 - root - INFO - Step 25440: lr=1.00E-05, loss= 1.1869 (max= 1.6116), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:32,543 - root - INFO - Step 25440: lr=1.00E-05, loss= 1.1869 (max= 1.6116), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:32,543 - root - INFO - Step 25440: lr=1.00E-05, loss= 1.1869 (max= 1.6116), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:32,543 - root - INFO - Step 25440: lr=1.00E-05, loss= 1.1869 (max= 1.6116), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:32,543 - root - INFO - Step 25440: lr=1.00E-05, loss= 1.1869 (max= 1.6116), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:32,543 - root - INFO - Step 25440: lr=1.00E-05, loss= 1.1869 (max= 1.6116), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:32,543 - root - INFO - Step 25440: lr=1.00E-05, loss= 1.1869 (max= 1.6116), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:48,488 - root - INFO - Step 25450: lr=1.00E-05, loss= 1.2113 (max= 1.5746), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:48,488 - root - INFO - Step 25450: lr=1.00E-05, loss= 1.2113 (max= 1.5746), 
tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:48,488 - root - INFO - Step 25450: lr=1.00E-05, loss= 1.2113 (max= 1.5746), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:48,488 - root - INFO - Step 25450: lr=1.00E-05, loss= 1.2113 (max= 1.5746), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:48,488 - root - INFO - Step 25450: lr=1.00E-05, loss= 1.2113 (max= 1.5746), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:48,488 - root - INFO - Step 25450: lr=1.00E-05, loss= 1.2113 (max= 1.5746), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:48,488 - root - INFO - Step 25450: lr=1.00E-05, loss= 1.2113 (max= 1.5746), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:48,488 - root - INFO - Step 25450: lr=1.00E-05, loss= 1.2113 (max= 1.5746), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:50:49,230 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6001089 +2025-10-24 21:51:04,476 - root - INFO - Step 25460: lr=1.00E-05, loss= 1.1727 (max= 1.6026), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:51:04,476 - root - INFO - Step 25460: lr=1.00E-05, loss= 1.1727 (max= 1.6026), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:51:04,476 - root - INFO - Step 25460: lr=1.00E-05, loss= 1.1727 (max= 1.6026), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:51:04,476 - root - INFO - 
Step 25460: lr=1.00E-05, loss= 1.1727 (max= 1.6026), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:51:04,476 - root - INFO - Step 25460: lr=1.00E-05, loss= 1.1727 (max= 1.6026), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:51:04,476 - root - INFO - Step 25460: lr=1.00E-05, loss= 1.1727 (max= 1.6026), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:51:04,476 - root - INFO - Step 25460: lr=1.00E-05, loss= 1.1727 (max= 1.6026), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:51:04,476 - root - INFO - Step 25460: lr=1.00E-05, loss= 1.1727 (max= 1.6026), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:51:20,431 - root - INFO - Step 25470: lr=1.00E-05, loss= 1.1545 (max= 1.5574), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:20,431 - root - INFO - Step 25470: lr=1.00E-05, loss= 1.1545 (max= 1.5574), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:20,431 - root - INFO - Step 25470: lr=1.00E-05, loss= 1.1545 (max= 1.5574), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:20,431 - root - INFO - Step 25470: lr=1.00E-05, loss= 1.1545 (max= 1.5574), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:20,431 - root - INFO - Step 25470: lr=1.00E-05, loss= 1.1545 (max= 1.5574), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:20,431 - root - INFO - Step 25470: lr=1.00E-05, loss= 1.1545 (max= 1.5574), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 
0.00%) +2025-10-24 21:51:20,431 - root - INFO - Step 25470: lr=1.00E-05, loss= 1.1545 (max= 1.5574), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:20,431 - root - INFO - Step 25470: lr=1.00E-05, loss= 1.1545 (max= 1.5574), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:36,409 - root - INFO - Step 25480: lr=1.00E-05, loss= 1.1756 (max= 1.6594), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:36,409 - root - INFO - Step 25480: lr=1.00E-05, loss= 1.1756 (max= 1.6594), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:36,409 - root - INFO - Step 25480: lr=1.00E-05, loss= 1.1756 (max= 1.6594), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:36,409 - root - INFO - Step 25480: lr=1.00E-05, loss= 1.1756 (max= 1.6594), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:36,409 - root - INFO - Step 25480: lr=1.00E-05, loss= 1.1756 (max= 1.6594), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:36,409 - root - INFO - Step 25480: lr=1.00E-05, loss= 1.1756 (max= 1.6594), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:36,409 - root - INFO - Step 25480: lr=1.00E-05, loss= 1.1756 (max= 1.6594), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:36,409 - root - INFO - Step 25480: lr=1.00E-05, loss= 1.1756 (max= 1.6594), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:52,338 - root - INFO - Step 25490: lr=1.00E-05, loss= 1.2000 (max= 1.5591), tps=20575, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:52,338 - root - INFO - Step 25490: lr=1.00E-05, loss= 1.2000 (max= 1.5591), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:52,338 - root - INFO - Step 25490: lr=1.00E-05, loss= 1.2000 (max= 1.5591), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:52,338 - root - INFO - Step 25490: lr=1.00E-05, loss= 1.2000 (max= 1.5591), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:52,338 - root - INFO - Step 25490: lr=1.00E-05, loss= 1.2000 (max= 1.5591), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:52,338 - root - INFO - Step 25490: lr=1.00E-05, loss= 1.2000 (max= 1.5591), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:52,338 - root - INFO - Step 25490: lr=1.00E-05, loss= 1.2000 (max= 1.5591), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:51:52,339 - root - INFO - Step 25490: lr=1.00E-05, loss= 1.2000 (max= 1.5591), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:52:08,271 - root - INFO - Step 25500: lr=1.00E-05, loss= 1.1987 (max= 1.5383), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:08,271 - root - INFO - Step 25500: lr=1.00E-05, loss= 1.1987 (max= 1.5383), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:08,271 - root - INFO - Step 25500: lr=1.00E-05, loss= 1.1987 (max= 1.5383), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:08,271 - root - INFO - Step 25500: lr=1.00E-05, loss= 1.1987 
(max= 1.5383), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:08,271 - root - INFO - Step 25500: lr=1.00E-05, loss= 1.1987 (max= 1.5383), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:08,271 - root - INFO - Step 25500: lr=1.00E-05, loss= 1.1987 (max= 1.5383), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:08,271 - root - INFO - Step 25500: lr=1.00E-05, loss= 1.1987 (max= 1.5383), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:08,271 - root - INFO - Step 25500: lr=1.00E-05, loss= 1.1987 (max= 1.5383), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:24,213 - root - INFO - Step 25510: lr=1.00E-05, loss= 1.1762 (max= 1.7916), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:24,214 - root - INFO - Step 25510: lr=1.00E-05, loss= 1.1762 (max= 1.7916), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:24,214 - root - INFO - Step 25510: lr=1.00E-05, loss= 1.1762 (max= 1.7916), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:24,214 - root - INFO - Step 25510: lr=1.00E-05, loss= 1.1762 (max= 1.7916), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:24,214 - root - INFO - Step 25510: lr=1.00E-05, loss= 1.1762 (max= 1.7916), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:24,214 - root - INFO - Step 25510: lr=1.00E-05, loss= 1.1762 (max= 1.7916), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:24,214 - root 
- INFO - Step 25510: lr=1.00E-05, loss= 1.1762 (max= 1.7916), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:24,214 - root - INFO - Step 25510: lr=1.00E-05, loss= 1.1762 (max= 1.7916), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:40,145 - root - INFO - Step 25520: lr=1.00E-05, loss= 1.2060 (max= 1.6717), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:40,145 - root - INFO - Step 25520: lr=1.00E-05, loss= 1.2060 (max= 1.6717), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:40,145 - root - INFO - Step 25520: lr=1.00E-05, loss= 1.2060 (max= 1.6717), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:40,145 - root - INFO - Step 25520: lr=1.00E-05, loss= 1.2060 (max= 1.6717), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:40,145 - root - INFO - Step 25520: lr=1.00E-05, loss= 1.2060 (max= 1.6717), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:40,145 - root - INFO - Step 25520: lr=1.00E-05, loss= 1.2060 (max= 1.6717), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:40,145 - root - INFO - Step 25520: lr=1.00E-05, loss= 1.2060 (max= 1.6717), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:40,146 - root - INFO - Step 25520: lr=1.00E-05, loss= 1.2060 (max= 1.6717), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:52:56,085 - root - INFO - Step 25530: lr=1.00E-05, loss= 1.1710 (max= 1.5291), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 21:52:56,085 - root - INFO - Step 25530: lr=1.00E-05, loss= 1.1710 (max= 1.5291), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:52:56,085 - root - INFO - Step 25530: lr=1.00E-05, loss= 1.1710 (max= 1.5291), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:52:56,085 - root - INFO - Step 25530: lr=1.00E-05, loss= 1.1710 (max= 1.5291), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:52:56,085 - root - INFO - Step 25530: lr=1.00E-05, loss= 1.1710 (max= 1.5291), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:52:56,085 - root - INFO - Step 25530: lr=1.00E-05, loss= 1.1710 (max= 1.5291), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:52:56,085 - root - INFO - Step 25530: lr=1.00E-05, loss= 1.1710 (max= 1.5291), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:52:56,085 - root - INFO - Step 25530: lr=1.00E-05, loss= 1.1710 (max= 1.5291), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:53:11,996 - root - INFO - Step 25540: lr=1.00E-05, loss= 1.1698 (max= 1.5688), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:11,996 - root - INFO - Step 25540: lr=1.00E-05, loss= 1.1698 (max= 1.5688), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:11,996 - root - INFO - Step 25540: lr=1.00E-05, loss= 1.1698 (max= 1.5688), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:11,996 - root - INFO - Step 25540: lr=1.00E-05, loss= 1.1698 (max= 1.5688), tps=20598, mfu=42.92%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:11,996 - root - INFO - Step 25540: lr=1.00E-05, loss= 1.1698 (max= 1.5688), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:11,996 - root - INFO - Step 25540: lr=1.00E-05, loss= 1.1698 (max= 1.5688), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:11,996 - root - INFO - Step 25540: lr=1.00E-05, loss= 1.1698 (max= 1.5688), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:11,996 - root - INFO - Step 25540: lr=1.00E-05, loss= 1.1698 (max= 1.5688), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:27,951 - root - INFO - Step 25550: lr=1.00E-05, loss= 1.1453 (max= 1.5404), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:53:27,951 - root - INFO - Step 25550: lr=1.00E-05, loss= 1.1453 (max= 1.5404), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:53:27,951 - root - INFO - Step 25550: lr=1.00E-05, loss= 1.1453 (max= 1.5404), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:53:27,951 - root - INFO - Step 25550: lr=1.00E-05, loss= 1.1453 (max= 1.5404), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:53:27,951 - root - INFO - Step 25550: lr=1.00E-05, loss= 1.1453 (max= 1.5404), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:53:27,951 - root - INFO - Step 25550: lr=1.00E-05, loss= 1.1453 (max= 1.5404), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:53:27,951 - root - INFO - Step 25550: lr=1.00E-05, 
loss= 1.1453 (max= 1.5404), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:53:27,951 - root - INFO - Step 25550: lr=1.00E-05, loss= 1.1453 (max= 1.5404), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:53:43,853 - root - INFO - Step 25560: lr=1.00E-05, loss= 1.2082 (max= 1.5950), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:43,853 - root - INFO - Step 25560: lr=1.00E-05, loss= 1.2082 (max= 1.5950), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:43,854 - root - INFO - Step 25560: lr=1.00E-05, loss= 1.2082 (max= 1.5950), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:43,854 - root - INFO - Step 25560: lr=1.00E-05, loss= 1.2082 (max= 1.5950), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:43,854 - root - INFO - Step 25560: lr=1.00E-05, loss= 1.2082 (max= 1.5950), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:43,854 - root - INFO - Step 25560: lr=1.00E-05, loss= 1.2082 (max= 1.5950), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:43,854 - root - INFO - Step 25560: lr=1.00E-05, loss= 1.2082 (max= 1.5950), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:43,854 - root - INFO - Step 25560: lr=1.00E-05, loss= 1.2082 (max= 1.5950), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:59,829 - root - INFO - Step 25570: lr=1.00E-05, loss= 1.1877 (max= 1.5040), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
21:53:59,829 - root - INFO - Step 25570: lr=1.00E-05, loss= 1.1877 (max= 1.5040), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:59,829 - root - INFO - Step 25570: lr=1.00E-05, loss= 1.1877 (max= 1.5040), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:59,829 - root - INFO - Step 25570: lr=1.00E-05, loss= 1.1877 (max= 1.5040), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:59,829 - root - INFO - Step 25570: lr=1.00E-05, loss= 1.1877 (max= 1.5040), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:59,829 - root - INFO - Step 25570: lr=1.00E-05, loss= 1.1877 (max= 1.5040), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:59,829 - root - INFO - Step 25570: lr=1.00E-05, loss= 1.1877 (max= 1.5040), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:53:59,829 - root - INFO - Step 25570: lr=1.00E-05, loss= 1.1877 (max= 1.5040), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:54:15,757 - root - INFO - Step 25580: lr=1.00E-05, loss= 1.1797 (max= 1.5707), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:54:15,757 - root - INFO - Step 25580: lr=1.00E-05, loss= 1.1797 (max= 1.5707), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:54:15,757 - root - INFO - Step 25580: lr=1.00E-05, loss= 1.1797 (max= 1.5707), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:54:15,757 - root - INFO - Step 25580: lr=1.00E-05, loss= 1.1797 (max= 1.5707), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:54:15,757 - root - INFO - Step 25580: lr=1.00E-05, loss= 1.1797 (max= 1.5707), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:54:15,757 - root - INFO - Step 25580: lr=1.00E-05, loss= 1.1797 (max= 1.5707), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:54:15,757 - root - INFO - Step 25580: lr=1.00E-05, loss= 1.1797 (max= 1.5707), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:54:15,757 - root - INFO - Step 25580: lr=1.00E-05, loss= 1.1797 (max= 1.5707), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:54:31,738 - root - INFO - Step 25590: lr=1.00E-05, loss= 1.1813 (max= 1.9542), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:31,738 - root - INFO - Step 25590: lr=1.00E-05, loss= 1.1813 (max= 1.9542), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:31,738 - root - INFO - Step 25590: lr=1.00E-05, loss= 1.1813 (max= 1.9542), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:31,738 - root - INFO - Step 25590: lr=1.00E-05, loss= 1.1813 (max= 1.9542), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:31,739 - root - INFO - Step 25590: lr=1.00E-05, loss= 1.1813 (max= 1.9542), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:31,739 - root - INFO - Step 25590: lr=1.00E-05, loss= 1.1813 (max= 1.9542), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:31,739 - root - INFO - Step 25590: lr=1.00E-05, loss= 1.1813 (max= 1.9542), 
tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:31,739 - root - INFO - Step 25590: lr=1.00E-05, loss= 1.1813 (max= 1.9542), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:47,706 - root - INFO - Step 25600: lr=1.00E-05, loss= 1.1972 (max= 1.5293), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:47,706 - root - INFO - Step 25600: lr=1.00E-05, loss= 1.1972 (max= 1.5293), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:47,706 - root - INFO - Step 25600: lr=1.00E-05, loss= 1.1972 (max= 1.5293), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:47,707 - root - INFO - Step 25600: lr=1.00E-05, loss= 1.1972 (max= 1.5293), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:47,707 - root - INFO - Step 25600: lr=1.00E-05, loss= 1.1972 (max= 1.5293), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:47,707 - root - INFO - Step 25600: lr=1.00E-05, loss= 1.1972 (max= 1.5293), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:47,707 - root - INFO - Step 25600: lr=1.00E-05, loss= 1.1972 (max= 1.5293), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:54:47,707 - root - INFO - Step 25600: lr=1.00E-05, loss= 1.1972 (max= 1.5293), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:03,682 - root - INFO - Step 25610: lr=1.00E-05, loss= 1.1713 (max= 1.6850), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:03,682 - root - INFO - Step 
25610: lr=1.00E-05, loss= 1.1713 (max= 1.6850), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:03,682 - root - INFO - Step 25610: lr=1.00E-05, loss= 1.1713 (max= 1.6850), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:03,682 - root - INFO - Step 25610: lr=1.00E-05, loss= 1.1713 (max= 1.6850), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:03,682 - root - INFO - Step 25610: lr=1.00E-05, loss= 1.1713 (max= 1.6850), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:03,682 - root - INFO - Step 25610: lr=1.00E-05, loss= 1.1713 (max= 1.6850), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:03,682 - root - INFO - Step 25610: lr=1.00E-05, loss= 1.1713 (max= 1.6850), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:03,682 - root - INFO - Step 25610: lr=1.00E-05, loss= 1.1713 (max= 1.6850), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:19,631 - root - INFO - Step 25620: lr=1.00E-05, loss= 1.1811 (max= 1.5427), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:19,631 - root - INFO - Step 25620: lr=1.00E-05, loss= 1.1811 (max= 1.5427), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:19,631 - root - INFO - Step 25620: lr=1.00E-05, loss= 1.1811 (max= 1.5427), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:19,631 - root - INFO - Step 25620: lr=1.00E-05, loss= 1.1811 (max= 1.5427), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 21:55:19,631 - root - INFO - Step 25620: lr=1.00E-05, loss= 1.1811 (max= 1.5427), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:19,632 - root - INFO - Step 25620: lr=1.00E-05, loss= 1.1811 (max= 1.5427), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:19,632 - root - INFO - Step 25620: lr=1.00E-05, loss= 1.1811 (max= 1.5427), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:19,632 - root - INFO - Step 25620: lr=1.00E-05, loss= 1.1811 (max= 1.5427), tps=20549, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:35,521 - root - INFO - Step 25630: lr=1.00E-05, loss= 1.1928 (max= 1.6087), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:35,521 - root - INFO - Step 25630: lr=1.00E-05, loss= 1.1928 (max= 1.6087), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:35,521 - root - INFO - Step 25630: lr=1.00E-05, loss= 1.1928 (max= 1.6087), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:35,521 - root - INFO - Step 25630: lr=1.00E-05, loss= 1.1928 (max= 1.6087), tps=20626, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:35,521 - root - INFO - Step 25630: lr=1.00E-05, loss= 1.1928 (max= 1.6087), tps=20626, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:35,522 - root - INFO - Step 25630: lr=1.00E-05, loss= 1.1928 (max= 1.6087), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:35,522 - root - INFO - Step 25630: lr=1.00E-05, loss= 1.1928 (max= 1.6087), tps=20626, mfu=42.98%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:35,522 - root - INFO - Step 25630: lr=1.00E-05, loss= 1.1928 (max= 1.6087), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:51,484 - root - INFO - Step 25640: lr=1.00E-05, loss= 1.1912 (max= 1.6527), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:51,484 - root - INFO - Step 25640: lr=1.00E-05, loss= 1.1912 (max= 1.6527), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:51,484 - root - INFO - Step 25640: lr=1.00E-05, loss= 1.1912 (max= 1.6527), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:51,484 - root - INFO - Step 25640: lr=1.00E-05, loss= 1.1912 (max= 1.6527), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:51,484 - root - INFO - Step 25640: lr=1.00E-05, loss= 1.1912 (max= 1.6527), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:51,484 - root - INFO - Step 25640: lr=1.00E-05, loss= 1.1912 (max= 1.6527), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:51,484 - root - INFO - Step 25640: lr=1.00E-05, loss= 1.1912 (max= 1.6527), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:55:51,484 - root - INFO - Step 25640: lr=1.00E-05, loss= 1.1912 (max= 1.6527), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:56:07,479 - root - INFO - Step 25650: lr=1.00E-05, loss= 1.1785 (max= 1.5924), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:07,479 - root - INFO - Step 25650: lr=1.00E-05, loss= 1.1785 
(max= 1.5924), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:07,479 - root - INFO - Step 25650: lr=1.00E-05, loss= 1.1785 (max= 1.5924), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:07,480 - root - INFO - Step 25650: lr=1.00E-05, loss= 1.1785 (max= 1.5924), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:07,480 - root - INFO - Step 25650: lr=1.00E-05, loss= 1.1785 (max= 1.5924), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:07,480 - root - INFO - Step 25650: lr=1.00E-05, loss= 1.1785 (max= 1.5924), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:07,480 - root - INFO - Step 25650: lr=1.00E-05, loss= 1.1785 (max= 1.5924), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:07,480 - root - INFO - Step 25650: lr=1.00E-05, loss= 1.1785 (max= 1.5924), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:23,380 - root - INFO - Step 25660: lr=1.00E-05, loss= 1.1850 (max= 1.5965), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:23,380 - root - INFO - Step 25660: lr=1.00E-05, loss= 1.1850 (max= 1.5965), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:23,380 - root - INFO - Step 25660: lr=1.00E-05, loss= 1.1850 (max= 1.5965), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:23,380 - root - INFO - Step 25660: lr=1.00E-05, loss= 1.1850 (max= 1.5965), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:23,380 - root 
- INFO - Step 25660: lr=1.00E-05, loss= 1.1850 (max= 1.5965), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:23,380 - root - INFO - Step 25660: lr=1.00E-05, loss= 1.1850 (max= 1.5965), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:23,380 - root - INFO - Step 25660: lr=1.00E-05, loss= 1.1850 (max= 1.5965), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:23,380 - root - INFO - Step 25660: lr=1.00E-05, loss= 1.1850 (max= 1.5965), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:39,355 - root - INFO - Step 25670: lr=1.00E-05, loss= 1.1569 (max= 1.6386), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:56:39,355 - root - INFO - Step 25670: lr=1.00E-05, loss= 1.1569 (max= 1.6386), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:56:39,356 - root - INFO - Step 25670: lr=1.00E-05, loss= 1.1569 (max= 1.6386), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:56:39,356 - root - INFO - Step 25670: lr=1.00E-05, loss= 1.1569 (max= 1.6386), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:56:39,356 - root - INFO - Step 25670: lr=1.00E-05, loss= 1.1569 (max= 1.6386), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:56:39,356 - root - INFO - Step 25670: lr=1.00E-05, loss= 1.1569 (max= 1.6386), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:56:39,356 - root - INFO - Step 25670: lr=1.00E-05, loss= 1.1569 (max= 1.6386), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 21:56:39,356 - root - INFO - Step 25670: lr=1.00E-05, loss= 1.1569 (max= 1.6386), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:56:55,290 - root - INFO - Step 25680: lr=1.00E-05, loss= 1.1757 (max= 1.4700), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:55,290 - root - INFO - Step 25680: lr=1.00E-05, loss= 1.1757 (max= 1.4700), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:55,290 - root - INFO - Step 25680: lr=1.00E-05, loss= 1.1757 (max= 1.4700), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:55,290 - root - INFO - Step 25680: lr=1.00E-05, loss= 1.1757 (max= 1.4700), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:55,290 - root - INFO - Step 25680: lr=1.00E-05, loss= 1.1757 (max= 1.4700), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:55,290 - root - INFO - Step 25680: lr=1.00E-05, loss= 1.1757 (max= 1.4700), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:55,291 - root - INFO - Step 25680: lr=1.00E-05, loss= 1.1757 (max= 1.4700), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:56:55,291 - root - INFO - Step 25680: lr=1.00E-05, loss= 1.1757 (max= 1.4700), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:57:11,221 - root - INFO - Step 25690: lr=1.00E-05, loss= 1.1655 (max= 1.9178), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:57:11,221 - root - INFO - Step 25690: lr=1.00E-05, loss= 1.1655 (max= 1.9178), tps=20573, mfu=42.86%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:57:11,221 - root - INFO - Step 25690: lr=1.00E-05, loss= 1.1655 (max= 1.9178), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:57:11,221 - root - INFO - Step 25690: lr=1.00E-05, loss= 1.1655 (max= 1.9178), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:57:11,221 - root - INFO - Step 25690: lr=1.00E-05, loss= 1.1655 (max= 1.9178), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:57:11,221 - root - INFO - Step 25690: lr=1.00E-05, loss= 1.1655 (max= 1.9178), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:57:11,222 - root - INFO - Step 25690: lr=1.00E-05, loss= 1.1655 (max= 1.9178), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:57:11,222 - root - INFO - Step 25690: lr=1.00E-05, loss= 1.1655 (max= 1.9178), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:57:27,185 - root - INFO - Step 25700: lr=1.00E-05, loss= 1.1519 (max= 1.5762), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:27,185 - root - INFO - Step 25700: lr=1.00E-05, loss= 1.1519 (max= 1.5762), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:27,185 - root - INFO - Step 25700: lr=1.00E-05, loss= 1.1519 (max= 1.5762), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:27,185 - root - INFO - Step 25700: lr=1.00E-05, loss= 1.1519 (max= 1.5762), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:27,185 - root - INFO - Step 25700: lr=1.00E-05, 
loss= 1.1519 (max= 1.5762), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:27,185 - root - INFO - Step 25700: lr=1.00E-05, loss= 1.1519 (max= 1.5762), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:27,185 - root - INFO - Step 25700: lr=1.00E-05, loss= 1.1519 (max= 1.5762), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:27,185 - root - INFO - Step 25700: lr=1.00E-05, loss= 1.1519 (max= 1.5762), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:43,133 - root - INFO - Step 25710: lr=1.00E-05, loss= 1.1844 (max= 1.7215), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:43,133 - root - INFO - Step 25710: lr=1.00E-05, loss= 1.1844 (max= 1.7215), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:43,134 - root - INFO - Step 25710: lr=1.00E-05, loss= 1.1844 (max= 1.7215), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:43,134 - root - INFO - Step 25710: lr=1.00E-05, loss= 1.1844 (max= 1.7215), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:43,134 - root - INFO - Step 25710: lr=1.00E-05, loss= 1.1844 (max= 1.7215), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:43,134 - root - INFO - Step 25710: lr=1.00E-05, loss= 1.1844 (max= 1.7215), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:43,134 - root - INFO - Step 25710: lr=1.00E-05, loss= 1.1844 (max= 1.7215), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
21:57:43,134 - root - INFO - Step 25710: lr=1.00E-05, loss= 1.1844 (max= 1.7215), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:59,029 - root - INFO - Step 25720: lr=1.00E-05, loss= 1.1987 (max= 2.0187), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:59,029 - root - INFO - Step 25720: lr=1.00E-05, loss= 1.1987 (max= 2.0187), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:59,030 - root - INFO - Step 25720: lr=1.00E-05, loss= 1.1987 (max= 2.0187), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:59,030 - root - INFO - Step 25720: lr=1.00E-05, loss= 1.1987 (max= 2.0187), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:59,030 - root - INFO - Step 25720: lr=1.00E-05, loss= 1.1987 (max= 2.0187), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:59,030 - root - INFO - Step 25720: lr=1.00E-05, loss= 1.1987 (max= 2.0187), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:59,030 - root - INFO - Step 25720: lr=1.00E-05, loss= 1.1987 (max= 2.0187), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:57:59,030 - root - INFO - Step 25720: lr=1.00E-05, loss= 1.1987 (max= 2.0187), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:58:14,971 - root - INFO - Step 25730: lr=1.00E-05, loss= 1.1295 (max= 1.5275), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:14,971 - root - INFO - Step 25730: lr=1.00E-05, loss= 1.1295 (max= 1.5275), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:14,971 - root - INFO - Step 25730: lr=1.00E-05, loss= 1.1295 (max= 1.5275), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:14,971 - root - INFO - Step 25730: lr=1.00E-05, loss= 1.1295 (max= 1.5275), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:14,971 - root - INFO - Step 25730: lr=1.00E-05, loss= 1.1295 (max= 1.5275), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:14,971 - root - INFO - Step 25730: lr=1.00E-05, loss= 1.1295 (max= 1.5275), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:14,971 - root - INFO - Step 25730: lr=1.00E-05, loss= 1.1295 (max= 1.5275), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:14,972 - root - INFO - Step 25730: lr=1.00E-05, loss= 1.1295 (max= 1.5275), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:30,941 - root - INFO - Step 25740: lr=1.00E-05, loss= 1.1945 (max= 1.6161), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:58:30,941 - root - INFO - Step 25740: lr=1.00E-05, loss= 1.1945 (max= 1.6161), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:58:30,941 - root - INFO - Step 25740: lr=1.00E-05, loss= 1.1945 (max= 1.6161), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:58:30,941 - root - INFO - Step 25740: lr=1.00E-05, loss= 1.1945 (max= 1.6161), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:58:30,941 - root - INFO - Step 25740: lr=1.00E-05, loss= 1.1945 (max= 1.6161), 
tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:58:30,941 - root - INFO - Step 25740: lr=1.00E-05, loss= 1.1945 (max= 1.6161), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:58:30,942 - root - INFO - Step 25740: lr=1.00E-05, loss= 1.1945 (max= 1.6161), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:58:30,942 - root - INFO - Step 25740: lr=1.00E-05, loss= 1.1945 (max= 1.6161), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:58:46,899 - root - INFO - Step 25750: lr=1.00E-05, loss= 1.1728 (max= 1.5480), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:46,899 - root - INFO - Step 25750: lr=1.00E-05, loss= 1.1728 (max= 1.5480), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:46,899 - root - INFO - Step 25750: lr=1.00E-05, loss= 1.1728 (max= 1.5480), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:46,899 - root - INFO - Step 25750: lr=1.00E-05, loss= 1.1728 (max= 1.5480), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:46,899 - root - INFO - Step 25750: lr=1.00E-05, loss= 1.1728 (max= 1.5480), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:46,899 - root - INFO - Step 25750: lr=1.00E-05, loss= 1.1728 (max= 1.5480), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:46,899 - root - INFO - Step 25750: lr=1.00E-05, loss= 1.1728 (max= 1.5480), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:58:46,900 - root - INFO - Step 
25750: lr=1.00E-05, loss= 1.1728 (max= 1.5480), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:02,787 - root - INFO - Step 25760: lr=1.00E-05, loss= 1.1995 (max= 2.0082), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:59:02,788 - root - INFO - Step 25760: lr=1.00E-05, loss= 1.1995 (max= 2.0082), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:59:02,788 - root - INFO - Step 25760: lr=1.00E-05, loss= 1.1995 (max= 2.0082), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:59:02,788 - root - INFO - Step 25760: lr=1.00E-05, loss= 1.1995 (max= 2.0082), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:59:02,788 - root - INFO - Step 25760: lr=1.00E-05, loss= 1.1995 (max= 2.0082), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:59:02,788 - root - INFO - Step 25760: lr=1.00E-05, loss= 1.1995 (max= 2.0082), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:59:02,788 - root - INFO - Step 25760: lr=1.00E-05, loss= 1.1995 (max= 2.0082), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:59:02,788 - root - INFO - Step 25760: lr=1.00E-05, loss= 1.1995 (max= 2.0082), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 21:59:18,723 - root - INFO - Step 25770: lr=1.00E-05, loss= 1.2020 (max= 1.7770), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:18,723 - root - INFO - Step 25770: lr=1.00E-05, loss= 1.2020 (max= 1.7770), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 21:59:18,723 - root - INFO - Step 25770: lr=1.00E-05, loss= 1.2020 (max= 1.7770), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:18,723 - root - INFO - Step 25770: lr=1.00E-05, loss= 1.2020 (max= 1.7770), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:18,723 - root - INFO - Step 25770: lr=1.00E-05, loss= 1.2020 (max= 1.7770), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:18,723 - root - INFO - Step 25770: lr=1.00E-05, loss= 1.2020 (max= 1.7770), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:18,723 - root - INFO - Step 25770: lr=1.00E-05, loss= 1.2020 (max= 1.7770), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:18,723 - root - INFO - Step 25770: lr=1.00E-05, loss= 1.2020 (max= 1.7770), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:34,663 - root - INFO - Step 25780: lr=1.00E-05, loss= 1.2028 (max= 1.7286), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:34,663 - root - INFO - Step 25780: lr=1.00E-05, loss= 1.2028 (max= 1.7286), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:34,663 - root - INFO - Step 25780: lr=1.00E-05, loss= 1.2028 (max= 1.7286), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:34,663 - root - INFO - Step 25780: lr=1.00E-05, loss= 1.2028 (max= 1.7286), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:34,663 - root - INFO - Step 25780: lr=1.00E-05, loss= 1.2028 (max= 1.7286), tps=20561, mfu=42.84%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:34,663 - root - INFO - Step 25780: lr=1.00E-05, loss= 1.2028 (max= 1.7286), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:34,664 - root - INFO - Step 25780: lr=1.00E-05, loss= 1.2028 (max= 1.7286), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:34,664 - root - INFO - Step 25780: lr=1.00E-05, loss= 1.2028 (max= 1.7286), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:50,621 - root - INFO - Step 25790: lr=1.00E-05, loss= 1.1779 (max= 1.5816), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:50,621 - root - INFO - Step 25790: lr=1.00E-05, loss= 1.1779 (max= 1.5816), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:50,621 - root - INFO - Step 25790: lr=1.00E-05, loss= 1.1779 (max= 1.5816), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:50,621 - root - INFO - Step 25790: lr=1.00E-05, loss= 1.1779 (max= 1.5816), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:50,622 - root - INFO - Step 25790: lr=1.00E-05, loss= 1.1779 (max= 1.5816), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:50,622 - root - INFO - Step 25790: lr=1.00E-05, loss= 1.1779 (max= 1.5816), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:50,622 - root - INFO - Step 25790: lr=1.00E-05, loss= 1.1779 (max= 1.5816), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 21:59:50,622 - root - INFO - Step 25790: lr=1.00E-05, loss= 1.1779 
(max= 1.5816), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:06,583 - root - INFO - Step 25800: lr=1.00E-05, loss= 1.1612 (max= 1.6349), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:06,583 - root - INFO - Step 25800: lr=1.00E-05, loss= 1.1612 (max= 1.6349), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:06,583 - root - INFO - Step 25800: lr=1.00E-05, loss= 1.1612 (max= 1.6349), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:06,583 - root - INFO - Step 25800: lr=1.00E-05, loss= 1.1612 (max= 1.6349), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:06,584 - root - INFO - Step 25800: lr=1.00E-05, loss= 1.1612 (max= 1.6349), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:06,584 - root - INFO - Step 25800: lr=1.00E-05, loss= 1.1612 (max= 1.6349), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:06,584 - root - INFO - Step 25800: lr=1.00E-05, loss= 1.1612 (max= 1.6349), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:06,584 - root - INFO - Step 25800: lr=1.00E-05, loss= 1.1612 (max= 1.6349), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:22,499 - root - INFO - Step 25810: lr=1.00E-05, loss= 1.1579 (max= 1.9300), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:00:22,499 - root - INFO - Step 25810: lr=1.00E-05, loss= 1.1579 (max= 1.9300), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:00:22,499 - root 
- INFO - Step 25810: lr=1.00E-05, loss= 1.1579 (max= 1.9300), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:00:22,499 - root - INFO - Step 25810: lr=1.00E-05, loss= 1.1579 (max= 1.9300), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:00:22,499 - root - INFO - Step 25810: lr=1.00E-05, loss= 1.1579 (max= 1.9300), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:00:22,499 - root - INFO - Step 25810: lr=1.00E-05, loss= 1.1579 (max= 1.9300), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:00:22,499 - root - INFO - Step 25810: lr=1.00E-05, loss= 1.1579 (max= 1.9300), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:00:22,499 - root - INFO - Step 25810: lr=1.00E-05, loss= 1.1579 (max= 1.9300), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:00:38,385 - root - INFO - Step 25820: lr=1.00E-05, loss= 1.1922 (max= 1.5251), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:38,385 - root - INFO - Step 25820: lr=1.00E-05, loss= 1.1922 (max= 1.5251), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:38,385 - root - INFO - Step 25820: lr=1.00E-05, loss= 1.1922 (max= 1.5251), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:38,385 - root - INFO - Step 25820: lr=1.00E-05, loss= 1.1922 (max= 1.5251), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:38,385 - root - INFO - Step 25820: lr=1.00E-05, loss= 1.1922 (max= 1.5251), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:00:38,385 - root - INFO - Step 25820: lr=1.00E-05, loss= 1.1922 (max= 1.5251), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:38,385 - root - INFO - Step 25820: lr=1.00E-05, loss= 1.1922 (max= 1.5251), tps=20631, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:38,386 - root - INFO - Step 25820: lr=1.00E-05, loss= 1.1922 (max= 1.5251), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:54,341 - root - INFO - Step 25830: lr=1.00E-05, loss= 1.1916 (max= 1.7769), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:54,342 - root - INFO - Step 25830: lr=1.00E-05, loss= 1.1916 (max= 1.7769), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:54,342 - root - INFO - Step 25830: lr=1.00E-05, loss= 1.1916 (max= 1.7769), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:54,342 - root - INFO - Step 25830: lr=1.00E-05, loss= 1.1916 (max= 1.7769), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:54,342 - root - INFO - Step 25830: lr=1.00E-05, loss= 1.1916 (max= 1.7769), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:54,342 - root - INFO - Step 25830: lr=1.00E-05, loss= 1.1916 (max= 1.7769), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:54,342 - root - INFO - Step 25830: lr=1.00E-05, loss= 1.1916 (max= 1.7769), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:00:54,342 - root - INFO - Step 25830: lr=1.00E-05, loss= 1.1916 (max= 1.7769), tps=20540, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:10,279 - root - INFO - Step 25840: lr=1.00E-05, loss= 1.1794 (max= 1.5664), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:10,279 - root - INFO - Step 25840: lr=1.00E-05, loss= 1.1794 (max= 1.5664), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:10,279 - root - INFO - Step 25840: lr=1.00E-05, loss= 1.1794 (max= 1.5664), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:10,279 - root - INFO - Step 25840: lr=1.00E-05, loss= 1.1794 (max= 1.5664), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:10,279 - root - INFO - Step 25840: lr=1.00E-05, loss= 1.1794 (max= 1.5664), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:10,279 - root - INFO - Step 25840: lr=1.00E-05, loss= 1.1794 (max= 1.5664), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:10,279 - root - INFO - Step 25840: lr=1.00E-05, loss= 1.1794 (max= 1.5664), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:10,279 - root - INFO - Step 25840: lr=1.00E-05, loss= 1.1794 (max= 1.5664), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:26,204 - root - INFO - Step 25850: lr=1.00E-05, loss= 1.1528 (max= 1.6084), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:26,204 - root - INFO - Step 25850: lr=1.00E-05, loss= 1.1528 (max= 1.6084), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:26,204 - root - INFO - Step 25850: lr=1.00E-05, 
loss= 1.1528 (max= 1.6084), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:26,204 - root - INFO - Step 25850: lr=1.00E-05, loss= 1.1528 (max= 1.6084), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:26,204 - root - INFO - Step 25850: lr=1.00E-05, loss= 1.1528 (max= 1.6084), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:26,205 - root - INFO - Step 25850: lr=1.00E-05, loss= 1.1528 (max= 1.6084), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:26,205 - root - INFO - Step 25850: lr=1.00E-05, loss= 1.1528 (max= 1.6084), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:26,205 - root - INFO - Step 25850: lr=1.00E-05, loss= 1.1528 (max= 1.6084), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:42,149 - root - INFO - Step 25860: lr=1.00E-05, loss= 1.1951 (max= 1.8293), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:42,150 - root - INFO - Step 25860: lr=1.00E-05, loss= 1.1951 (max= 1.8293), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:42,150 - root - INFO - Step 25860: lr=1.00E-05, loss= 1.1951 (max= 1.8293), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:42,150 - root - INFO - Step 25860: lr=1.00E-05, loss= 1.1951 (max= 1.8293), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:42,150 - root - INFO - Step 25860: lr=1.00E-05, loss= 1.1951 (max= 1.8293), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
22:01:42,150 - root - INFO - Step 25860: lr=1.00E-05, loss= 1.1951 (max= 1.8293), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:42,150 - root - INFO - Step 25860: lr=1.00E-05, loss= 1.1951 (max= 1.8293), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:42,150 - root - INFO - Step 25860: lr=1.00E-05, loss= 1.1951 (max= 1.8293), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:01:58,065 - root - INFO - Step 25870: lr=1.00E-05, loss= 1.1374 (max= 1.5982), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:58,065 - root - INFO - Step 25870: lr=1.00E-05, loss= 1.1374 (max= 1.5982), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:58,065 - root - INFO - Step 25870: lr=1.00E-05, loss= 1.1374 (max= 1.5982), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:58,065 - root - INFO - Step 25870: lr=1.00E-05, loss= 1.1374 (max= 1.5982), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:58,065 - root - INFO - Step 25870: lr=1.00E-05, loss= 1.1374 (max= 1.5982), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:58,065 - root - INFO - Step 25870: lr=1.00E-05, loss= 1.1374 (max= 1.5982), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:58,065 - root - INFO - Step 25870: lr=1.00E-05, loss= 1.1374 (max= 1.5982), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:01:58,065 - root - INFO - Step 25870: lr=1.00E-05, loss= 1.1374 (max= 1.5982), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:02:14,031 - root - INFO - Step 25880: lr=1.00E-05, loss= 1.1663 (max= 1.5760), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:14,031 - root - INFO - Step 25880: lr=1.00E-05, loss= 1.1663 (max= 1.5760), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:14,031 - root - INFO - Step 25880: lr=1.00E-05, loss= 1.1663 (max= 1.5760), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:14,031 - root - INFO - Step 25880: lr=1.00E-05, loss= 1.1663 (max= 1.5760), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:14,031 - root - INFO - Step 25880: lr=1.00E-05, loss= 1.1663 (max= 1.5760), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:14,031 - root - INFO - Step 25880: lr=1.00E-05, loss= 1.1663 (max= 1.5760), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:14,031 - root - INFO - Step 25880: lr=1.00E-05, loss= 1.1663 (max= 1.5760), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:14,032 - root - INFO - Step 25880: lr=1.00E-05, loss= 1.1663 (max= 1.5760), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:29,927 - root - INFO - Step 25890: lr=1.00E-05, loss= 1.1956 (max= 1.9634), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:29,927 - root - INFO - Step 25890: lr=1.00E-05, loss= 1.1956 (max= 1.9634), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:29,927 - root - INFO - Step 25890: lr=1.00E-05, loss= 1.1956 (max= 1.9634), 
tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:29,927 - root - INFO - Step 25890: lr=1.00E-05, loss= 1.1956 (max= 1.9634), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:29,927 - root - INFO - Step 25890: lr=1.00E-05, loss= 1.1956 (max= 1.9634), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:29,927 - root - INFO - Step 25890: lr=1.00E-05, loss= 1.1956 (max= 1.9634), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:29,927 - root - INFO - Step 25890: lr=1.00E-05, loss= 1.1956 (max= 1.9634), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:29,927 - root - INFO - Step 25890: lr=1.00E-05, loss= 1.1956 (max= 1.9634), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:45,901 - root - INFO - Step 25900: lr=1.00E-05, loss= 1.1917 (max= 1.8238), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:45,901 - root - INFO - Step 25900: lr=1.00E-05, loss= 1.1917 (max= 1.8238), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:45,901 - root - INFO - Step 25900: lr=1.00E-05, loss= 1.1917 (max= 1.8238), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:45,901 - root - INFO - Step 25900: lr=1.00E-05, loss= 1.1917 (max= 1.8238), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:45,901 - root - INFO - Step 25900: lr=1.00E-05, loss= 1.1917 (max= 1.8238), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:45,901 - root - INFO - Step 
25900: lr=1.00E-05, loss= 1.1917 (max= 1.8238), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:45,901 - root - INFO - Step 25900: lr=1.00E-05, loss= 1.1917 (max= 1.8238), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:02:45,901 - root - INFO - Step 25900: lr=1.00E-05, loss= 1.1917 (max= 1.8238), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:01,828 - root - INFO - Step 25910: lr=1.00E-05, loss= 1.1844 (max= 1.5584), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:01,828 - root - INFO - Step 25910: lr=1.00E-05, loss= 1.1844 (max= 1.5584), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:01,828 - root - INFO - Step 25910: lr=1.00E-05, loss= 1.1844 (max= 1.5584), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:01,828 - root - INFO - Step 25910: lr=1.00E-05, loss= 1.1844 (max= 1.5584), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:01,828 - root - INFO - Step 25910: lr=1.00E-05, loss= 1.1844 (max= 1.5584), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:01,828 - root - INFO - Step 25910: lr=1.00E-05, loss= 1.1844 (max= 1.5584), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:01,828 - root - INFO - Step 25910: lr=1.00E-05, loss= 1.1844 (max= 1.5584), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:01,829 - root - INFO - Step 25910: lr=1.00E-05, loss= 1.1844 (max= 1.5584), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:03:17,780 - root - INFO - Step 25920: lr=1.00E-05, loss= 1.1750 (max= 1.6397), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:17,780 - root - INFO - Step 25920: lr=1.00E-05, loss= 1.1750 (max= 1.6397), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:17,780 - root - INFO - Step 25920: lr=1.00E-05, loss= 1.1750 (max= 1.6397), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:17,781 - root - INFO - Step 25920: lr=1.00E-05, loss= 1.1750 (max= 1.6397), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:17,781 - root - INFO - Step 25920: lr=1.00E-05, loss= 1.1750 (max= 1.6397), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:17,781 - root - INFO - Step 25920: lr=1.00E-05, loss= 1.1750 (max= 1.6397), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:17,781 - root - INFO - Step 25920: lr=1.00E-05, loss= 1.1750 (max= 1.6397), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:17,781 - root - INFO - Step 25920: lr=1.00E-05, loss= 1.1750 (max= 1.6397), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:33,747 - root - INFO - Step 25930: lr=1.00E-05, loss= 1.1944 (max= 1.6086), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:33,747 - root - INFO - Step 25930: lr=1.00E-05, loss= 1.1944 (max= 1.6086), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:33,747 - root - INFO - Step 25930: lr=1.00E-05, loss= 1.1944 (max= 1.6086), tps=20527, mfu=42.77%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:33,747 - root - INFO - Step 25930: lr=1.00E-05, loss= 1.1944 (max= 1.6086), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:33,748 - root - INFO - Step 25930: lr=1.00E-05, loss= 1.1944 (max= 1.6086), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:33,748 - root - INFO - Step 25930: lr=1.00E-05, loss= 1.1944 (max= 1.6086), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:33,748 - root - INFO - Step 25930: lr=1.00E-05, loss= 1.1944 (max= 1.6086), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:33,748 - root - INFO - Step 25930: lr=1.00E-05, loss= 1.1944 (max= 1.6086), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:49,706 - root - INFO - Step 25940: lr=1.00E-05, loss= 1.1611 (max= 1.5143), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:49,706 - root - INFO - Step 25940: lr=1.00E-05, loss= 1.1611 (max= 1.5143), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:49,706 - root - INFO - Step 25940: lr=1.00E-05, loss= 1.1611 (max= 1.5143), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:49,706 - root - INFO - Step 25940: lr=1.00E-05, loss= 1.1611 (max= 1.5143), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:49,706 - root - INFO - Step 25940: lr=1.00E-05, loss= 1.1611 (max= 1.5143), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:49,706 - root - INFO - Step 25940: lr=1.00E-05, loss= 1.1611 
(max= 1.5143), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:49,707 - root - INFO - Step 25940: lr=1.00E-05, loss= 1.1611 (max= 1.5143), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:03:49,707 - root - INFO - Step 25940: lr=1.00E-05, loss= 1.1611 (max= 1.5143), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:05,636 - root - INFO - Step 25950: lr=1.00E-05, loss= 1.1696 (max= 1.4872), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:05,636 - root - INFO - Step 25950: lr=1.00E-05, loss= 1.1696 (max= 1.4872), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:05,636 - root - INFO - Step 25950: lr=1.00E-05, loss= 1.1696 (max= 1.4872), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:05,636 - root - INFO - Step 25950: lr=1.00E-05, loss= 1.1696 (max= 1.4872), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:05,636 - root - INFO - Step 25950: lr=1.00E-05, loss= 1.1696 (max= 1.4872), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:05,636 - root - INFO - Step 25950: lr=1.00E-05, loss= 1.1696 (max= 1.4872), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:05,637 - root - INFO - Step 25950: lr=1.00E-05, loss= 1.1696 (max= 1.4872), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:05,637 - root - INFO - Step 25950: lr=1.00E-05, loss= 1.1696 (max= 1.4872), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:21,603 - root 
- INFO - Step 25960: lr=1.00E-05, loss= 1.1722 (max= 1.5052), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:04:21,603 - root - INFO - Step 25960: lr=1.00E-05, loss= 1.1722 (max= 1.5052), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:04:21,603 - root - INFO - Step 25960: lr=1.00E-05, loss= 1.1722 (max= 1.5052), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:04:21,604 - root - INFO - Step 25960: lr=1.00E-05, loss= 1.1722 (max= 1.5052), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:04:21,604 - root - INFO - Step 25960: lr=1.00E-05, loss= 1.1722 (max= 1.5052), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:04:21,604 - root - INFO - Step 25960: lr=1.00E-05, loss= 1.1722 (max= 1.5052), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:04:21,604 - root - INFO - Step 25960: lr=1.00E-05, loss= 1.1722 (max= 1.5052), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:04:21,604 - root - INFO - Step 25960: lr=1.00E-05, loss= 1.1722 (max= 1.5052), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:04:37,589 - root - INFO - Step 25970: lr=1.00E-05, loss= 1.1783 (max= 1.5623), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:37,589 - root - INFO - Step 25970: lr=1.00E-05, loss= 1.1783 (max= 1.5623), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:37,589 - root - INFO - Step 25970: lr=1.00E-05, loss= 1.1783 (max= 1.5623), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:04:37,590 - root - INFO - Step 25970: lr=1.00E-05, loss= 1.1783 (max= 1.5623), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:37,590 - root - INFO - Step 25970: lr=1.00E-05, loss= 1.1783 (max= 1.5623), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:37,590 - root - INFO - Step 25970: lr=1.00E-05, loss= 1.1783 (max= 1.5623), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:37,590 - root - INFO - Step 25970: lr=1.00E-05, loss= 1.1783 (max= 1.5623), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:37,590 - root - INFO - Step 25970: lr=1.00E-05, loss= 1.1783 (max= 1.5623), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:53,492 - root - INFO - Step 25980: lr=1.00E-05, loss= 1.2044 (max= 1.5042), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:53,492 - root - INFO - Step 25980: lr=1.00E-05, loss= 1.2044 (max= 1.5042), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:53,492 - root - INFO - Step 25980: lr=1.00E-05, loss= 1.2044 (max= 1.5042), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:53,493 - root - INFO - Step 25980: lr=1.00E-05, loss= 1.2044 (max= 1.5042), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:53,493 - root - INFO - Step 25980: lr=1.00E-05, loss= 1.2044 (max= 1.5042), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:53,493 - root - INFO - Step 25980: lr=1.00E-05, loss= 1.2044 (max= 1.5042), tps=20609, mfu=42.94%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:53,493 - root - INFO - Step 25980: lr=1.00E-05, loss= 1.2044 (max= 1.5042), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:04:53,493 - root - INFO - Step 25980: lr=1.00E-05, loss= 1.2044 (max= 1.5042), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:09,424 - root - INFO - Step 25990: lr=1.00E-05, loss= 1.1898 (max= 1.5690), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:09,424 - root - INFO - Step 25990: lr=1.00E-05, loss= 1.1898 (max= 1.5690), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:09,424 - root - INFO - Step 25990: lr=1.00E-05, loss= 1.1898 (max= 1.5690), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:09,424 - root - INFO - Step 25990: lr=1.00E-05, loss= 1.1898 (max= 1.5690), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:09,424 - root - INFO - Step 25990: lr=1.00E-05, loss= 1.1898 (max= 1.5690), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:09,424 - root - INFO - Step 25990: lr=1.00E-05, loss= 1.1898 (max= 1.5690), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:09,424 - root - INFO - Step 25990: lr=1.00E-05, loss= 1.1898 (max= 1.5690), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:09,424 - root - INFO - Step 25990: lr=1.00E-05, loss= 1.1898 (max= 1.5690), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to 
jobs/munin-7b-open-stage1/checkpoints/dataloader/step-26000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-26000! Save time: 4.424546480178833 +2025-10-24 22:05:25,369 - root - INFO - Step 26000: lr=1.00E-05, loss= 1.1817 (max= 1.5390), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:25,369 - root - INFO - Saving a full checkpoint at step 26000 +2025-10-24 22:05:25,369 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:05:25,369 - root - INFO - Step 26000: lr=1.00E-05, loss= 1.1817 (max= 1.5390), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:25,369 - root - INFO - Saving a full checkpoint at step 26000 +2025-10-24 22:05:25,369 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:05:25,369 - root - INFO - Step 26000: lr=1.00E-05, loss= 1.1817 (max= 1.5390), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:25,369 - root - INFO - Step 26000: lr=1.00E-05, loss= 1.1817 (max= 1.5390), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:25,370 - root - INFO - Saving a full checkpoint at step 26000 +2025-10-24 22:05:25,370 - root - INFO - Step 26000: lr=1.00E-05, loss= 1.1817 (max= 1.5390), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:25,370 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:05:25,370 - root - INFO - Saving a full checkpoint at step 26000 +2025-10-24 22:05:25,370 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 
+2025-10-24 22:05:25,370 - root - INFO - Saving a full checkpoint at step 26000 +2025-10-24 22:05:25,370 - root - INFO - Step 26000: lr=1.00E-05, loss= 1.1817 (max= 1.5390), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:25,370 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:05:25,370 - root - INFO - Step 26000: lr=1.00E-05, loss= 1.1817 (max= 1.5390), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:25,370 - root - INFO - Saving a full checkpoint at step 26000 +2025-10-24 22:05:25,370 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:05:25,370 - root - INFO - Saving a full checkpoint at step 26000 +2025-10-24 22:05:25,370 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:05:25,370 - root - INFO - Step 26000: lr=1.00E-05, loss= 1.1817 (max= 1.5390), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:05:25,370 - root - INFO - Saving a full checkpoint at step 26000 +2025-10-24 22:05:25,370 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:05:39,053 - root - INFO - Finished saving the checkpoint in 13.68 seconds +2025-10-24 22:05:39,060 - root - INFO - Finished saving the checkpoint in 13.69 seconds +2025-10-24 22:05:39,060 - root - INFO - Finished saving the checkpoint in 13.69 seconds +2025-10-24 22:05:39,060 - root - INFO - Finished saving the checkpoint in 13.69 seconds +2025-10-24 22:05:39,062 - root - INFO - Finished saving the checkpoint in 13.69 seconds +2025-10-24 22:05:39,062 - root - INFO - Finished saving the checkpoint in 13.69 seconds +2025-10-24 
22:05:39,062 - root - INFO - Finished saving the checkpoint in 13.69 seconds +2025-10-24 22:05:39,064 - root - INFO - Finished saving the checkpoint in 13.69 seconds +2025-10-24 22:05:54,922 - root - INFO - Step 26010: lr=1.00E-05, loss= 1.1909 (max= 1.5178), tps=11089, mfu=23.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:05:54,922 - root - INFO - Step 26010: lr=1.00E-05, loss= 1.1909 (max= 1.5178), tps=11089, mfu=23.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:05:54,922 - root - INFO - Step 26010: lr=1.00E-05, loss= 1.1909 (max= 1.5178), tps=11089, mfu=23.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:05:54,922 - root - INFO - Step 26010: lr=1.00E-05, loss= 1.1909 (max= 1.5178), tps=11089, mfu=23.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:05:54,922 - root - INFO - Step 26010: lr=1.00E-05, loss= 1.1909 (max= 1.5178), tps=11089, mfu=23.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:05:54,922 - root - INFO - Step 26010: lr=1.00E-05, loss= 1.1909 (max= 1.5178), tps=11089, mfu=23.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:05:54,923 - root - INFO - Step 26010: lr=1.00E-05, loss= 1.1909 (max= 1.5178), tps=11089, mfu=23.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:05:54,923 - root - INFO - Step 26010: lr=1.00E-05, loss= 1.1909 (max= 1.5178), tps=11089, mfu=23.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:06:10,885 - root - INFO - Step 26020: lr=1.00E-05, loss= 1.1997 (max= 1.6937), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:10,885 - root - INFO - Step 26020: lr=1.00E-05, loss= 1.1997 (max= 1.6937), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:10,885 - root - INFO - Step 26020: lr=1.00E-05, loss= 1.1997 (max= 1.6937), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:10,885 - root - INFO - Step 26020: lr=1.00E-05, loss= 1.1997 (max= 1.6937), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:10,885 - root - INFO - Step 26020: lr=1.00E-05, loss= 1.1997 (max= 1.6937), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:10,885 - root - INFO - Step 26020: lr=1.00E-05, loss= 1.1997 (max= 1.6937), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:10,885 - root - INFO - Step 26020: lr=1.00E-05, loss= 1.1997 (max= 1.6937), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:10,885 - root - INFO - Step 26020: lr=1.00E-05, loss= 1.1997 (max= 1.6937), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:26,838 - root - INFO - Step 26030: lr=1.00E-05, loss= 1.1468 (max= 1.5490), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:26,838 - root - INFO - Step 26030: lr=1.00E-05, loss= 1.1468 (max= 1.5490), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:26,838 - root - INFO - Step 26030: lr=1.00E-05, loss= 1.1468 (max= 1.5490), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:26,838 - root - INFO - Step 26030: lr=1.00E-05, loss= 1.1468 (max= 1.5490), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:26,838 - root - INFO - Step 26030: lr=1.00E-05, loss= 1.1468 (max= 1.5490), 
tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:26,838 - root - INFO - Step 26030: lr=1.00E-05, loss= 1.1468 (max= 1.5490), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:26,838 - root - INFO - Step 26030: lr=1.00E-05, loss= 1.1468 (max= 1.5490), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:26,838 - root - INFO - Step 26030: lr=1.00E-05, loss= 1.1468 (max= 1.5490), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:42,835 - root - INFO - Step 26040: lr=1.00E-05, loss= 1.1834 (max= 1.5420), tps=20487, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:42,835 - root - INFO - Step 26040: lr=1.00E-05, loss= 1.1834 (max= 1.5420), tps=20487, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:42,835 - root - INFO - Step 26040: lr=1.00E-05, loss= 1.1834 (max= 1.5420), tps=20487, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:42,835 - root - INFO - Step 26040: lr=1.00E-05, loss= 1.1834 (max= 1.5420), tps=20487, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:42,835 - root - INFO - Step 26040: lr=1.00E-05, loss= 1.1834 (max= 1.5420), tps=20487, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:42,835 - root - INFO - Step 26040: lr=1.00E-05, loss= 1.1834 (max= 1.5420), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:42,836 - root - INFO - Step 26040: lr=1.00E-05, loss= 1.1834 (max= 1.5420), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:42,836 - root - INFO - Step 
26040: lr=1.00E-05, loss= 1.1834 (max= 1.5420), tps=20487, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:58,783 - root - INFO - Step 26050: lr=1.00E-05, loss= 1.1856 (max= 1.5902), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:58,783 - root - INFO - Step 26050: lr=1.00E-05, loss= 1.1856 (max= 1.5902), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:58,784 - root - INFO - Step 26050: lr=1.00E-05, loss= 1.1856 (max= 1.5902), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:58,784 - root - INFO - Step 26050: lr=1.00E-05, loss= 1.1856 (max= 1.5902), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:58,784 - root - INFO - Step 26050: lr=1.00E-05, loss= 1.1856 (max= 1.5902), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:58,784 - root - INFO - Step 26050: lr=1.00E-05, loss= 1.1856 (max= 1.5902), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:58,784 - root - INFO - Step 26050: lr=1.00E-05, loss= 1.1856 (max= 1.5902), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:06:58,784 - root - INFO - Step 26050: lr=1.00E-05, loss= 1.1856 (max= 1.5902), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:14,687 - root - INFO - Step 26060: lr=1.00E-05, loss= 1.1701 (max= 1.5854), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:14,687 - root - INFO - Step 26060: lr=1.00E-05, loss= 1.1701 (max= 1.5854), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:07:14,687 - root - INFO - Step 26060: lr=1.00E-05, loss= 1.1701 (max= 1.5854), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:14,687 - root - INFO - Step 26060: lr=1.00E-05, loss= 1.1701 (max= 1.5854), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:14,687 - root - INFO - Step 26060: lr=1.00E-05, loss= 1.1701 (max= 1.5854), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:14,687 - root - INFO - Step 26060: lr=1.00E-05, loss= 1.1701 (max= 1.5854), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:14,687 - root - INFO - Step 26060: lr=1.00E-05, loss= 1.1701 (max= 1.5854), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:14,687 - root - INFO - Step 26060: lr=1.00E-05, loss= 1.1701 (max= 1.5854), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:30,639 - root - INFO - Step 26070: lr=1.00E-05, loss= 1.1961 (max= 1.7241), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:30,639 - root - INFO - Step 26070: lr=1.00E-05, loss= 1.1961 (max= 1.7241), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:30,639 - root - INFO - Step 26070: lr=1.00E-05, loss= 1.1961 (max= 1.7241), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:30,639 - root - INFO - Step 26070: lr=1.00E-05, loss= 1.1961 (max= 1.7241), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:30,639 - root - INFO - Step 26070: lr=1.00E-05, loss= 1.1961 (max= 1.7241), tps=20545, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:30,639 - root - INFO - Step 26070: lr=1.00E-05, loss= 1.1961 (max= 1.7241), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:30,640 - root - INFO - Step 26070: lr=1.00E-05, loss= 1.1961 (max= 1.7241), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:30,640 - root - INFO - Step 26070: lr=1.00E-05, loss= 1.1961 (max= 1.7241), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:40,957 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2896721 +2025-10-24 22:07:46,567 - root - INFO - Step 26080: lr=1.00E-05, loss= 1.1370 (max= 1.4608), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:46,567 - root - INFO - Step 26080: lr=1.00E-05, loss= 1.1370 (max= 1.4608), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:46,568 - root - INFO - Step 26080: lr=1.00E-05, loss= 1.1370 (max= 1.4608), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:46,568 - root - INFO - Step 26080: lr=1.00E-05, loss= 1.1370 (max= 1.4608), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:46,568 - root - INFO - Step 26080: lr=1.00E-05, loss= 1.1370 (max= 1.4608), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:46,568 - root - INFO - Step 26080: lr=1.00E-05, loss= 1.1370 (max= 1.4608), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:46,568 - root - INFO - Step 26080: lr=1.00E-05, loss= 
1.1370 (max= 1.4608), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:07:46,568 - root - INFO - Step 26080: lr=1.00E-05, loss= 1.1370 (max= 1.4608), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:02,534 - root - INFO - Step 26090: lr=1.00E-05, loss= 1.1451 (max= 1.6972), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:02,534 - root - INFO - Step 26090: lr=1.00E-05, loss= 1.1451 (max= 1.6972), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:02,534 - root - INFO - Step 26090: lr=1.00E-05, loss= 1.1451 (max= 1.6972), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:02,534 - root - INFO - Step 26090: lr=1.00E-05, loss= 1.1451 (max= 1.6972), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:02,534 - root - INFO - Step 26090: lr=1.00E-05, loss= 1.1451 (max= 1.6972), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:02,534 - root - INFO - Step 26090: lr=1.00E-05, loss= 1.1451 (max= 1.6972), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:02,534 - root - INFO - Step 26090: lr=1.00E-05, loss= 1.1451 (max= 1.6972), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:02,534 - root - INFO - Step 26090: lr=1.00E-05, loss= 1.1451 (max= 1.6972), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:18,480 - root - INFO - Step 26100: lr=1.00E-05, loss= 1.1508 (max= 1.7661), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:18,480 
- root - INFO - Step 26100: lr=1.00E-05, loss= 1.1508 (max= 1.7661), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:18,480 - root - INFO - Step 26100: lr=1.00E-05, loss= 1.1508 (max= 1.7661), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:18,480 - root - INFO - Step 26100: lr=1.00E-05, loss= 1.1508 (max= 1.7661), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:18,480 - root - INFO - Step 26100: lr=1.00E-05, loss= 1.1508 (max= 1.7661), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:18,481 - root - INFO - Step 26100: lr=1.00E-05, loss= 1.1508 (max= 1.7661), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:18,481 - root - INFO - Step 26100: lr=1.00E-05, loss= 1.1508 (max= 1.7661), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:18,481 - root - INFO - Step 26100: lr=1.00E-05, loss= 1.1508 (max= 1.7661), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:34,434 - root - INFO - Step 26110: lr=1.00E-05, loss= 1.1459 (max= 1.6074), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:34,434 - root - INFO - Step 26110: lr=1.00E-05, loss= 1.1459 (max= 1.6074), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:34,434 - root - INFO - Step 26110: lr=1.00E-05, loss= 1.1459 (max= 1.6074), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:34,434 - root - INFO - Step 26110: lr=1.00E-05, loss= 1.1459 (max= 1.6074), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:34,434 - root - INFO - Step 26110: lr=1.00E-05, loss= 1.1459 (max= 1.6074), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:34,434 - root - INFO - Step 26110: lr=1.00E-05, loss= 1.1459 (max= 1.6074), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:34,434 - root - INFO - Step 26110: lr=1.00E-05, loss= 1.1459 (max= 1.6074), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:34,434 - root - INFO - Step 26110: lr=1.00E-05, loss= 1.1459 (max= 1.6074), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:50,354 - root - INFO - Step 26120: lr=1.00E-05, loss= 1.1762 (max= 1.5398), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:50,354 - root - INFO - Step 26120: lr=1.00E-05, loss= 1.1762 (max= 1.5398), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:50,354 - root - INFO - Step 26120: lr=1.00E-05, loss= 1.1762 (max= 1.5398), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:50,355 - root - INFO - Step 26120: lr=1.00E-05, loss= 1.1762 (max= 1.5398), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:50,355 - root - INFO - Step 26120: lr=1.00E-05, loss= 1.1762 (max= 1.5398), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:50,355 - root - INFO - Step 26120: lr=1.00E-05, loss= 1.1762 (max= 1.5398), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:50,355 - root - INFO - Step 26120: lr=1.00E-05, loss= 1.1762 (max= 1.5398), 
tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:08:50,355 - root - INFO - Step 26120: lr=1.00E-05, loss= 1.1762 (max= 1.5398), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:06,296 - root - INFO - Step 26130: lr=1.00E-05, loss= 1.2091 (max= 1.6222), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:06,296 - root - INFO - Step 26130: lr=1.00E-05, loss= 1.2091 (max= 1.6222), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:06,296 - root - INFO - Step 26130: lr=1.00E-05, loss= 1.2091 (max= 1.6222), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:06,296 - root - INFO - Step 26130: lr=1.00E-05, loss= 1.2091 (max= 1.6222), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:06,296 - root - INFO - Step 26130: lr=1.00E-05, loss= 1.2091 (max= 1.6222), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:06,296 - root - INFO - Step 26130: lr=1.00E-05, loss= 1.2091 (max= 1.6222), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:06,296 - root - INFO - Step 26130: lr=1.00E-05, loss= 1.2091 (max= 1.6222), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:06,296 - root - INFO - Step 26130: lr=1.00E-05, loss= 1.2091 (max= 1.6222), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:22,231 - root - INFO - Step 26140: lr=1.00E-05, loss= 1.1794 (max= 1.9099), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:09:22,231 - root - INFO - Step 
26140: lr=1.00E-05, loss= 1.1794 (max= 1.9099), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:09:22,231 - root - INFO - Step 26140: lr=1.00E-05, loss= 1.1794 (max= 1.9099), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:09:22,231 - root - INFO - Step 26140: lr=1.00E-05, loss= 1.1794 (max= 1.9099), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:09:22,231 - root - INFO - Step 26140: lr=1.00E-05, loss= 1.1794 (max= 1.9099), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:09:22,231 - root - INFO - Step 26140: lr=1.00E-05, loss= 1.1794 (max= 1.9099), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:09:22,231 - root - INFO - Step 26140: lr=1.00E-05, loss= 1.1794 (max= 1.9099), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:09:22,231 - root - INFO - Step 26140: lr=1.00E-05, loss= 1.1794 (max= 1.9099), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:09:38,183 - root - INFO - Step 26150: lr=1.00E-05, loss= 1.1793 (max= 1.6365), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:38,183 - root - INFO - Step 26150: lr=1.00E-05, loss= 1.1793 (max= 1.6365), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:38,184 - root - INFO - Step 26150: lr=1.00E-05, loss= 1.1793 (max= 1.6365), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:38,184 - root - INFO - Step 26150: lr=1.00E-05, loss= 1.1793 (max= 1.6365), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:09:38,184 - root - INFO - Step 26150: lr=1.00E-05, loss= 1.1793 (max= 1.6365), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:38,184 - root - INFO - Step 26150: lr=1.00E-05, loss= 1.1793 (max= 1.6365), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:38,184 - root - INFO - Step 26150: lr=1.00E-05, loss= 1.1793 (max= 1.6365), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:38,184 - root - INFO - Step 26150: lr=1.00E-05, loss= 1.1793 (max= 1.6365), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:54,149 - root - INFO - Step 26160: lr=1.00E-05, loss= 1.1587 (max= 1.5454), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:54,149 - root - INFO - Step 26160: lr=1.00E-05, loss= 1.1587 (max= 1.5454), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:54,149 - root - INFO - Step 26160: lr=1.00E-05, loss= 1.1587 (max= 1.5454), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:54,149 - root - INFO - Step 26160: lr=1.00E-05, loss= 1.1587 (max= 1.5454), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:54,149 - root - INFO - Step 26160: lr=1.00E-05, loss= 1.1587 (max= 1.5454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:54,149 - root - INFO - Step 26160: lr=1.00E-05, loss= 1.1587 (max= 1.5454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:54,149 - root - INFO - Step 26160: lr=1.00E-05, loss= 1.1587 (max= 1.5454), tps=20528, mfu=42.77%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:09:54,150 - root - INFO - Step 26160: lr=1.00E-05, loss= 1.1587 (max= 1.5454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:10,071 - root - INFO - Step 26170: lr=1.00E-05, loss= 1.1801 (max= 1.5953), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:10,071 - root - INFO - Step 26170: lr=1.00E-05, loss= 1.1801 (max= 1.5953), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:10,071 - root - INFO - Step 26170: lr=1.00E-05, loss= 1.1801 (max= 1.5953), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:10,071 - root - INFO - Step 26170: lr=1.00E-05, loss= 1.1801 (max= 1.5953), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:10,071 - root - INFO - Step 26170: lr=1.00E-05, loss= 1.1801 (max= 1.5953), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:10,071 - root - INFO - Step 26170: lr=1.00E-05, loss= 1.1801 (max= 1.5953), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:10,071 - root - INFO - Step 26170: lr=1.00E-05, loss= 1.1801 (max= 1.5953), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:10,072 - root - INFO - Step 26170: lr=1.00E-05, loss= 1.1801 (max= 1.5953), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:26,029 - root - INFO - Step 26180: lr=1.00E-05, loss= 1.2072 (max= 1.9266), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:26,029 - root - INFO - Step 26180: lr=1.00E-05, loss= 1.2072 
(max= 1.9266), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:26,029 - root - INFO - Step 26180: lr=1.00E-05, loss= 1.2072 (max= 1.9266), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:26,029 - root - INFO - Step 26180: lr=1.00E-05, loss= 1.2072 (max= 1.9266), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:26,029 - root - INFO - Step 26180: lr=1.00E-05, loss= 1.2072 (max= 1.9266), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:26,029 - root - INFO - Step 26180: lr=1.00E-05, loss= 1.2072 (max= 1.9266), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:26,029 - root - INFO - Step 26180: lr=1.00E-05, loss= 1.2072 (max= 1.9266), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:26,029 - root - INFO - Step 26180: lr=1.00E-05, loss= 1.2072 (max= 1.9266), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:42,016 - root - INFO - Step 26190: lr=1.00E-05, loss= 1.1803 (max= 1.6575), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:42,016 - root - INFO - Step 26190: lr=1.00E-05, loss= 1.1803 (max= 1.6575), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:42,016 - root - INFO - Step 26190: lr=1.00E-05, loss= 1.1803 (max= 1.6575), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:42,016 - root - INFO - Step 26190: lr=1.00E-05, loss= 1.1803 (max= 1.6575), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:42,016 - root 
- INFO - Step 26190: lr=1.00E-05, loss= 1.1803 (max= 1.6575), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:42,016 - root - INFO - Step 26190: lr=1.00E-05, loss= 1.1803 (max= 1.6575), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:42,016 - root - INFO - Step 26190: lr=1.00E-05, loss= 1.1803 (max= 1.6575), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:42,016 - root - INFO - Step 26190: lr=1.00E-05, loss= 1.1803 (max= 1.6575), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:57,926 - root - INFO - Step 26200: lr=1.00E-05, loss= 1.1623 (max= 1.6131), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:57,926 - root - INFO - Step 26200: lr=1.00E-05, loss= 1.1623 (max= 1.6131), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:57,926 - root - INFO - Step 26200: lr=1.00E-05, loss= 1.1623 (max= 1.6131), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:57,926 - root - INFO - Step 26200: lr=1.00E-05, loss= 1.1623 (max= 1.6131), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:57,926 - root - INFO - Step 26200: lr=1.00E-05, loss= 1.1623 (max= 1.6131), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:57,926 - root - INFO - Step 26200: lr=1.00E-05, loss= 1.1623 (max= 1.6131), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:10:57,926 - root - INFO - Step 26200: lr=1.00E-05, loss= 1.1623 (max= 1.6131), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:10:57,927 - root - INFO - Step 26200: lr=1.00E-05, loss= 1.1623 (max= 1.6131), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:13,853 - root - INFO - Step 26210: lr=1.00E-05, loss= 1.1778 (max= 1.5967), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:13,854 - root - INFO - Step 26210: lr=1.00E-05, loss= 1.1778 (max= 1.5967), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:13,854 - root - INFO - Step 26210: lr=1.00E-05, loss= 1.1778 (max= 1.5967), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:13,854 - root - INFO - Step 26210: lr=1.00E-05, loss= 1.1778 (max= 1.5967), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:13,854 - root - INFO - Step 26210: lr=1.00E-05, loss= 1.1778 (max= 1.5967), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:13,854 - root - INFO - Step 26210: lr=1.00E-05, loss= 1.1778 (max= 1.5967), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:13,854 - root - INFO - Step 26210: lr=1.00E-05, loss= 1.1778 (max= 1.5967), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:13,854 - root - INFO - Step 26210: lr=1.00E-05, loss= 1.1778 (max= 1.5967), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:29,765 - root - INFO - Step 26220: lr=1.00E-05, loss= 1.1671 (max= 1.5525), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:29,765 - root - INFO - Step 26220: lr=1.00E-05, loss= 1.1671 (max= 1.5525), tps=20599, mfu=42.92%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:29,765 - root - INFO - Step 26220: lr=1.00E-05, loss= 1.1671 (max= 1.5525), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:29,765 - root - INFO - Step 26220: lr=1.00E-05, loss= 1.1671 (max= 1.5525), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:29,765 - root - INFO - Step 26220: lr=1.00E-05, loss= 1.1671 (max= 1.5525), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:29,765 - root - INFO - Step 26220: lr=1.00E-05, loss= 1.1671 (max= 1.5525), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:29,766 - root - INFO - Step 26220: lr=1.00E-05, loss= 1.1671 (max= 1.5525), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:29,766 - root - INFO - Step 26220: lr=1.00E-05, loss= 1.1671 (max= 1.5525), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:45,669 - root - INFO - Step 26230: lr=1.00E-05, loss= 1.1951 (max= 1.5540), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:45,669 - root - INFO - Step 26230: lr=1.00E-05, loss= 1.1951 (max= 1.5540), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:45,669 - root - INFO - Step 26230: lr=1.00E-05, loss= 1.1951 (max= 1.5540), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:45,669 - root - INFO - Step 26230: lr=1.00E-05, loss= 1.1951 (max= 1.5540), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:45,669 - root - INFO - Step 26230: lr=1.00E-05, 
loss= 1.1951 (max= 1.5540), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:45,669 - root - INFO - Step 26230: lr=1.00E-05, loss= 1.1951 (max= 1.5540), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:45,669 - root - INFO - Step 26230: lr=1.00E-05, loss= 1.1951 (max= 1.5540), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:11:45,669 - root - INFO - Step 26230: lr=1.00E-05, loss= 1.1951 (max= 1.5540), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:01,638 - root - INFO - Step 26240: lr=1.00E-05, loss= 1.1719 (max= 1.5666), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:01,638 - root - INFO - Step 26240: lr=1.00E-05, loss= 1.1719 (max= 1.5666), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:01,638 - root - INFO - Step 26240: lr=1.00E-05, loss= 1.1719 (max= 1.5666), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:01,638 - root - INFO - Step 26240: lr=1.00E-05, loss= 1.1719 (max= 1.5666), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:01,638 - root - INFO - Step 26240: lr=1.00E-05, loss= 1.1719 (max= 1.5666), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:01,639 - root - INFO - Step 26240: lr=1.00E-05, loss= 1.1719 (max= 1.5666), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:01,639 - root - INFO - Step 26240: lr=1.00E-05, loss= 1.1719 (max= 1.5666), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
22:12:01,639 - root - INFO - Step 26240: lr=1.00E-05, loss= 1.1719 (max= 1.5666), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:17,565 - root - INFO - Step 26250: lr=1.00E-05, loss= 1.1486 (max= 1.5183), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:17,565 - root - INFO - Step 26250: lr=1.00E-05, loss= 1.1486 (max= 1.5183), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:17,565 - root - INFO - Step 26250: lr=1.00E-05, loss= 1.1486 (max= 1.5183), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:17,565 - root - INFO - Step 26250: lr=1.00E-05, loss= 1.1486 (max= 1.5183), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:17,565 - root - INFO - Step 26250: lr=1.00E-05, loss= 1.1486 (max= 1.5183), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:17,565 - root - INFO - Step 26250: lr=1.00E-05, loss= 1.1486 (max= 1.5183), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:17,565 - root - INFO - Step 26250: lr=1.00E-05, loss= 1.1486 (max= 1.5183), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:17,565 - root - INFO - Step 26250: lr=1.00E-05, loss= 1.1486 (max= 1.5183), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:12:33,459 - root - INFO - Step 26260: lr=1.00E-05, loss= 1.2170 (max= 1.6836), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:33,460 - root - INFO - Step 26260: lr=1.00E-05, loss= 1.2170 (max= 1.6836), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:33,460 - root - INFO - Step 26260: lr=1.00E-05, loss= 1.2170 (max= 1.6836), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:33,460 - root - INFO - Step 26260: lr=1.00E-05, loss= 1.2170 (max= 1.6836), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:33,460 - root - INFO - Step 26260: lr=1.00E-05, loss= 1.2170 (max= 1.6836), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:33,460 - root - INFO - Step 26260: lr=1.00E-05, loss= 1.2170 (max= 1.6836), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:33,460 - root - INFO - Step 26260: lr=1.00E-05, loss= 1.2170 (max= 1.6836), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:33,460 - root - INFO - Step 26260: lr=1.00E-05, loss= 1.2170 (max= 1.6836), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:49,403 - root - INFO - Step 26270: lr=1.00E-05, loss= 1.1614 (max= 1.5719), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:49,403 - root - INFO - Step 26270: lr=1.00E-05, loss= 1.1614 (max= 1.5719), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:49,403 - root - INFO - Step 26270: lr=1.00E-05, loss= 1.1614 (max= 1.5719), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:49,403 - root - INFO - Step 26270: lr=1.00E-05, loss= 1.1614 (max= 1.5719), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:49,404 - root - INFO - Step 26270: lr=1.00E-05, loss= 1.1614 (max= 1.5719), 
tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:49,404 - root - INFO - Step 26270: lr=1.00E-05, loss= 1.1614 (max= 1.5719), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:49,404 - root - INFO - Step 26270: lr=1.00E-05, loss= 1.1614 (max= 1.5719), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:12:49,404 - root - INFO - Step 26270: lr=1.00E-05, loss= 1.1614 (max= 1.5719), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:05,356 - root - INFO - Step 26280: lr=1.00E-05, loss= 1.1514 (max= 1.7114), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:05,356 - root - INFO - Step 26280: lr=1.00E-05, loss= 1.1514 (max= 1.7114), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:05,356 - root - INFO - Step 26280: lr=1.00E-05, loss= 1.1514 (max= 1.7114), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:05,356 - root - INFO - Step 26280: lr=1.00E-05, loss= 1.1514 (max= 1.7114), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:05,356 - root - INFO - Step 26280: lr=1.00E-05, loss= 1.1514 (max= 1.7114), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:05,356 - root - INFO - Step 26280: lr=1.00E-05, loss= 1.1514 (max= 1.7114), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:05,356 - root - INFO - Step 26280: lr=1.00E-05, loss= 1.1514 (max= 1.7114), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:05,356 - root - INFO - Step 
26280: lr=1.00E-05, loss= 1.1514 (max= 1.7114), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:21,276 - root - INFO - Step 26290: lr=1.00E-05, loss= 1.1896 (max= 1.5431), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:21,276 - root - INFO - Step 26290: lr=1.00E-05, loss= 1.1896 (max= 1.5431), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:21,276 - root - INFO - Step 26290: lr=1.00E-05, loss= 1.1896 (max= 1.5431), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:21,276 - root - INFO - Step 26290: lr=1.00E-05, loss= 1.1896 (max= 1.5431), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:21,276 - root - INFO - Step 26290: lr=1.00E-05, loss= 1.1896 (max= 1.5431), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:21,276 - root - INFO - Step 26290: lr=1.00E-05, loss= 1.1896 (max= 1.5431), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:21,276 - root - INFO - Step 26290: lr=1.00E-05, loss= 1.1896 (max= 1.5431), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:21,276 - root - INFO - Step 26290: lr=1.00E-05, loss= 1.1896 (max= 1.5431), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:37,201 - root - INFO - Step 26300: lr=1.00E-05, loss= 1.2040 (max= 1.5879), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:37,201 - root - INFO - Step 26300: lr=1.00E-05, loss= 1.2040 (max= 1.5879), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:13:37,201 - root - INFO - Step 26300: lr=1.00E-05, loss= 1.2040 (max= 1.5879), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:37,202 - root - INFO - Step 26300: lr=1.00E-05, loss= 1.2040 (max= 1.5879), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:37,202 - root - INFO - Step 26300: lr=1.00E-05, loss= 1.2040 (max= 1.5879), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:37,202 - root - INFO - Step 26300: lr=1.00E-05, loss= 1.2040 (max= 1.5879), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:37,202 - root - INFO - Step 26300: lr=1.00E-05, loss= 1.2040 (max= 1.5879), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:37,202 - root - INFO - Step 26300: lr=1.00E-05, loss= 1.2040 (max= 1.5879), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:13:53,156 - root - INFO - Step 26310: lr=1.00E-05, loss= 1.2092 (max= 1.8062), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:13:53,156 - root - INFO - Step 26310: lr=1.00E-05, loss= 1.2092 (max= 1.8062), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:13:53,156 - root - INFO - Step 26310: lr=1.00E-05, loss= 1.2092 (max= 1.8062), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:13:53,156 - root - INFO - Step 26310: lr=1.00E-05, loss= 1.2092 (max= 1.8062), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:13:53,156 - root - INFO - Step 26310: lr=1.00E-05, loss= 1.2092 (max= 1.8062), tps=20542, mfu=42.80%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:13:53,157 - root - INFO - Step 26310: lr=1.00E-05, loss= 1.2092 (max= 1.8062), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:13:53,157 - root - INFO - Step 26310: lr=1.00E-05, loss= 1.2092 (max= 1.8062), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:13:53,157 - root - INFO - Step 26310: lr=1.00E-05, loss= 1.2092 (max= 1.8062), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:09,129 - root - INFO - Step 26320: lr=1.00E-05, loss= 1.1535 (max= 1.6444), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:09,129 - root - INFO - Step 26320: lr=1.00E-05, loss= 1.1535 (max= 1.6444), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:09,129 - root - INFO - Step 26320: lr=1.00E-05, loss= 1.1535 (max= 1.6444), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:09,129 - root - INFO - Step 26320: lr=1.00E-05, loss= 1.1535 (max= 1.6444), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:09,129 - root - INFO - Step 26320: lr=1.00E-05, loss= 1.1535 (max= 1.6444), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:09,129 - root - INFO - Step 26320: lr=1.00E-05, loss= 1.1535 (max= 1.6444), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:09,129 - root - INFO - Step 26320: lr=1.00E-05, loss= 1.1535 (max= 1.6444), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:09,129 - root - INFO - Step 26320: lr=1.00E-05, loss= 1.1535 
(max= 1.6444), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:25,036 - root - INFO - Step 26330: lr=1.00E-05, loss= 1.1787 (max= 1.7790), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:25,036 - root - INFO - Step 26330: lr=1.00E-05, loss= 1.1787 (max= 1.7790), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:25,036 - root - INFO - Step 26330: lr=1.00E-05, loss= 1.1787 (max= 1.7790), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:25,036 - root - INFO - Step 26330: lr=1.00E-05, loss= 1.1787 (max= 1.7790), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:25,036 - root - INFO - Step 26330: lr=1.00E-05, loss= 1.1787 (max= 1.7790), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:25,036 - root - INFO - Step 26330: lr=1.00E-05, loss= 1.1787 (max= 1.7790), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:25,036 - root - INFO - Step 26330: lr=1.00E-05, loss= 1.1787 (max= 1.7790), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:25,036 - root - INFO - Step 26330: lr=1.00E-05, loss= 1.1787 (max= 1.7790), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:14:40,967 - root - INFO - Step 26340: lr=1.00E-05, loss= 1.1878 (max= 1.5104), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:40,967 - root - INFO - Step 26340: lr=1.00E-05, loss= 1.1878 (max= 1.5104), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:40,967 - root 
- INFO - Step 26340: lr=1.00E-05, loss= 1.1878 (max= 1.5104), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:40,967 - root - INFO - Step 26340: lr=1.00E-05, loss= 1.1878 (max= 1.5104), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:40,967 - root - INFO - Step 26340: lr=1.00E-05, loss= 1.1878 (max= 1.5104), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:40,967 - root - INFO - Step 26340: lr=1.00E-05, loss= 1.1878 (max= 1.5104), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:40,967 - root - INFO - Step 26340: lr=1.00E-05, loss= 1.1878 (max= 1.5104), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:40,967 - root - INFO - Step 26340: lr=1.00E-05, loss= 1.1878 (max= 1.5104), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:56,922 - root - INFO - Step 26350: lr=1.00E-05, loss= 1.2042 (max= 1.6244), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:56,922 - root - INFO - Step 26350: lr=1.00E-05, loss= 1.2042 (max= 1.6244), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:56,922 - root - INFO - Step 26350: lr=1.00E-05, loss= 1.2042 (max= 1.6244), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:56,922 - root - INFO - Step 26350: lr=1.00E-05, loss= 1.2042 (max= 1.6244), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:56,922 - root - INFO - Step 26350: lr=1.00E-05, loss= 1.2042 (max= 1.6244), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 22:14:56,922 - root - INFO - Step 26350: lr=1.00E-05, loss= 1.2042 (max= 1.6244), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:56,923 - root - INFO - Step 26350: lr=1.00E-05, loss= 1.2042 (max= 1.6244), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:14:56,923 - root - INFO - Step 26350: lr=1.00E-05, loss= 1.2042 (max= 1.6244), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:12,861 - root - INFO - Step 26360: lr=1.00E-05, loss= 1.2195 (max= 1.5895), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:12,861 - root - INFO - Step 26360: lr=1.00E-05, loss= 1.2195 (max= 1.5895), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:12,861 - root - INFO - Step 26360: lr=1.00E-05, loss= 1.2195 (max= 1.5895), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:12,861 - root - INFO - Step 26360: lr=1.00E-05, loss= 1.2195 (max= 1.5895), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:12,861 - root - INFO - Step 26360: lr=1.00E-05, loss= 1.2195 (max= 1.5895), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:12,861 - root - INFO - Step 26360: lr=1.00E-05, loss= 1.2195 (max= 1.5895), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:12,861 - root - INFO - Step 26360: lr=1.00E-05, loss= 1.2195 (max= 1.5895), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:12,862 - root - INFO - Step 26360: lr=1.00E-05, loss= 1.2195 (max= 1.5895), tps=20563, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:28,784 - root - INFO - Step 26370: lr=1.00E-05, loss= 1.1894 (max= 1.5710), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:28,784 - root - INFO - Step 26370: lr=1.00E-05, loss= 1.1894 (max= 1.5710), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:28,784 - root - INFO - Step 26370: lr=1.00E-05, loss= 1.1894 (max= 1.5710), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:28,784 - root - INFO - Step 26370: lr=1.00E-05, loss= 1.1894 (max= 1.5710), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:28,784 - root - INFO - Step 26370: lr=1.00E-05, loss= 1.1894 (max= 1.5710), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:28,784 - root - INFO - Step 26370: lr=1.00E-05, loss= 1.1894 (max= 1.5710), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:28,784 - root - INFO - Step 26370: lr=1.00E-05, loss= 1.1894 (max= 1.5710), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:28,784 - root - INFO - Step 26370: lr=1.00E-05, loss= 1.1894 (max= 1.5710), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:15:44,766 - root - INFO - Step 26380: lr=1.00E-05, loss= 1.1740 (max= 1.6816), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:15:44,766 - root - INFO - Step 26380: lr=1.00E-05, loss= 1.1740 (max= 1.6816), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:15:44,766 - root - INFO - Step 26380: lr=1.00E-05, 
loss= 1.1740 (max= 1.6816), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:15:44,766 - root - INFO - Step 26380: lr=1.00E-05, loss= 1.1740 (max= 1.6816), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:15:44,766 - root - INFO - Step 26380: lr=1.00E-05, loss= 1.1740 (max= 1.6816), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:15:44,766 - root - INFO - Step 26380: lr=1.00E-05, loss= 1.1740 (max= 1.6816), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:15:44,766 - root - INFO - Step 26380: lr=1.00E-05, loss= 1.1740 (max= 1.6816), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:15:44,766 - root - INFO - Step 26380: lr=1.00E-05, loss= 1.1740 (max= 1.6816), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:16:00,669 - root - INFO - Step 26390: lr=1.00E-05, loss= 1.1781 (max= 1.5842), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:16:00,669 - root - INFO - Step 26390: lr=1.00E-05, loss= 1.1781 (max= 1.5842), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:16:00,669 - root - INFO - Step 26390: lr=1.00E-05, loss= 1.1781 (max= 1.5842), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:16:00,669 - root - INFO - Step 26390: lr=1.00E-05, loss= 1.1781 (max= 1.5842), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:16:00,669 - root - INFO - Step 26390: lr=1.00E-05, loss= 1.1781 (max= 1.5842), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
22:16:00,670 - root - INFO - Step 26390: lr=1.00E-05, loss= 1.1781 (max= 1.5842), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:16:00,670 - root - INFO - Step 26390: lr=1.00E-05, loss= 1.1781 (max= 1.5842), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:16:00,670 - root - INFO - Step 26390: lr=1.00E-05, loss= 1.1781 (max= 1.5842), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:16:16,659 - root - INFO - Step 26400: lr=1.00E-05, loss= 1.1846 (max= 1.7167), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:16,659 - root - INFO - Step 26400: lr=1.00E-05, loss= 1.1846 (max= 1.7167), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:16,659 - root - INFO - Step 26400: lr=1.00E-05, loss= 1.1846 (max= 1.7167), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:16,659 - root - INFO - Step 26400: lr=1.00E-05, loss= 1.1846 (max= 1.7167), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:16,659 - root - INFO - Step 26400: lr=1.00E-05, loss= 1.1846 (max= 1.7167), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:16,659 - root - INFO - Step 26400: lr=1.00E-05, loss= 1.1846 (max= 1.7167), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:16,659 - root - INFO - Step 26400: lr=1.00E-05, loss= 1.1846 (max= 1.7167), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:16,659 - root - INFO - Step 26400: lr=1.00E-05, loss= 1.1846 (max= 1.7167), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:32,605 - root - INFO - Step 26410: lr=1.00E-05, loss= 1.1891 (max= 1.8166), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:32,605 - root - INFO - Step 26410: lr=1.00E-05, loss= 1.1891 (max= 1.8166), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:32,605 - root - INFO - Step 26410: lr=1.00E-05, loss= 1.1891 (max= 1.8166), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:32,605 - root - INFO - Step 26410: lr=1.00E-05, loss= 1.1891 (max= 1.8166), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:32,605 - root - INFO - Step 26410: lr=1.00E-05, loss= 1.1891 (max= 1.8166), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:32,605 - root - INFO - Step 26410: lr=1.00E-05, loss= 1.1891 (max= 1.8166), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:32,605 - root - INFO - Step 26410: lr=1.00E-05, loss= 1.1891 (max= 1.8166), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:32,605 - root - INFO - Step 26410: lr=1.00E-05, loss= 1.1891 (max= 1.8166), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:48,552 - root - INFO - Step 26420: lr=1.00E-05, loss= 1.2155 (max= 1.9878), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:48,552 - root - INFO - Step 26420: lr=1.00E-05, loss= 1.2155 (max= 1.9878), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:48,552 - root - INFO - Step 26420: lr=1.00E-05, loss= 1.2155 (max= 1.9878), 
tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:48,552 - root - INFO - Step 26420: lr=1.00E-05, loss= 1.2155 (max= 1.9878), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:48,552 - root - INFO - Step 26420: lr=1.00E-05, loss= 1.2155 (max= 1.9878), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:48,552 - root - INFO - Step 26420: lr=1.00E-05, loss= 1.2155 (max= 1.9878), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:48,552 - root - INFO - Step 26420: lr=1.00E-05, loss= 1.2155 (max= 1.9878), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:16:48,552 - root - INFO - Step 26420: lr=1.00E-05, loss= 1.2155 (max= 1.9878), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:17:04,508 - root - INFO - Step 26430: lr=1.00E-05, loss= 1.1645 (max= 1.6231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:04,508 - root - INFO - Step 26430: lr=1.00E-05, loss= 1.1645 (max= 1.6231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:04,508 - root - INFO - Step 26430: lr=1.00E-05, loss= 1.1645 (max= 1.6231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:04,508 - root - INFO - Step 26430: lr=1.00E-05, loss= 1.1645 (max= 1.6231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:04,508 - root - INFO - Step 26430: lr=1.00E-05, loss= 1.1645 (max= 1.6231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:04,508 - root - INFO - Step 
26430: lr=1.00E-05, loss= 1.1645 (max= 1.6231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:04,508 - root - INFO - Step 26430: lr=1.00E-05, loss= 1.1645 (max= 1.6231), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:04,508 - root - INFO - Step 26430: lr=1.00E-05, loss= 1.1645 (max= 1.6231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:20,428 - root - INFO - Step 26440: lr=1.00E-05, loss= 1.1807 (max= 1.5722), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:20,428 - root - INFO - Step 26440: lr=1.00E-05, loss= 1.1807 (max= 1.5722), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:20,428 - root - INFO - Step 26440: lr=1.00E-05, loss= 1.1807 (max= 1.5722), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:20,428 - root - INFO - Step 26440: lr=1.00E-05, loss= 1.1807 (max= 1.5722), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:20,429 - root - INFO - Step 26440: lr=1.00E-05, loss= 1.1807 (max= 1.5722), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:20,429 - root - INFO - Step 26440: lr=1.00E-05, loss= 1.1807 (max= 1.5722), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:20,429 - root - INFO - Step 26440: lr=1.00E-05, loss= 1.1807 (max= 1.5722), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:20,429 - root - INFO - Step 26440: lr=1.00E-05, loss= 1.1807 (max= 1.5722), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:17:36,375 - root - INFO - Step 26450: lr=1.00E-05, loss= 1.2101 (max= 1.7279), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:36,375 - root - INFO - Step 26450: lr=1.00E-05, loss= 1.2101 (max= 1.7279), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:36,375 - root - INFO - Step 26450: lr=1.00E-05, loss= 1.2101 (max= 1.7279), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:36,375 - root - INFO - Step 26450: lr=1.00E-05, loss= 1.2101 (max= 1.7279), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:36,376 - root - INFO - Step 26450: lr=1.00E-05, loss= 1.2101 (max= 1.7279), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:36,376 - root - INFO - Step 26450: lr=1.00E-05, loss= 1.2101 (max= 1.7279), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:36,376 - root - INFO - Step 26450: lr=1.00E-05, loss= 1.2101 (max= 1.7279), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:36,376 - root - INFO - Step 26450: lr=1.00E-05, loss= 1.2101 (max= 1.7279), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:17:52,273 - root - INFO - Step 26460: lr=1.00E-05, loss= 1.1989 (max= 1.5389), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:17:52,273 - root - INFO - Step 26460: lr=1.00E-05, loss= 1.1989 (max= 1.5389), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:17:52,273 - root - INFO - Step 26460: lr=1.00E-05, loss= 1.1989 (max= 1.5389), tps=20616, mfu=42.95%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:17:52,273 - root - INFO - Step 26460: lr=1.00E-05, loss= 1.1989 (max= 1.5389), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:17:52,273 - root - INFO - Step 26460: lr=1.00E-05, loss= 1.1989 (max= 1.5389), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:17:52,273 - root - INFO - Step 26460: lr=1.00E-05, loss= 1.1989 (max= 1.5389), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:17:52,273 - root - INFO - Step 26460: lr=1.00E-05, loss= 1.1989 (max= 1.5389), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:17:52,273 - root - INFO - Step 26460: lr=1.00E-05, loss= 1.1989 (max= 1.5389), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:08,200 - root - INFO - Step 26470: lr=1.00E-05, loss= 1.1909 (max= 1.5013), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:18:08,200 - root - INFO - Step 26470: lr=1.00E-05, loss= 1.1909 (max= 1.5013), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:18:08,200 - root - INFO - Step 26470: lr=1.00E-05, loss= 1.1909 (max= 1.5013), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:18:08,200 - root - INFO - Step 26470: lr=1.00E-05, loss= 1.1909 (max= 1.5013), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:18:08,200 - root - INFO - Step 26470: lr=1.00E-05, loss= 1.1909 (max= 1.5013), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:18:08,200 - root - INFO - Step 26470: lr=1.00E-05, loss= 1.1909 
(max= 1.5013), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:18:08,200 - root - INFO - Step 26470: lr=1.00E-05, loss= 1.1909 (max= 1.5013), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:18:08,200 - root - INFO - Step 26470: lr=1.00E-05, loss= 1.1909 (max= 1.5013), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:18:24,123 - root - INFO - Step 26480: lr=1.00E-05, loss= 1.1708 (max= 1.5463), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:24,123 - root - INFO - Step 26480: lr=1.00E-05, loss= 1.1708 (max= 1.5463), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:24,123 - root - INFO - Step 26480: lr=1.00E-05, loss= 1.1708 (max= 1.5463), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:24,123 - root - INFO - Step 26480: lr=1.00E-05, loss= 1.1708 (max= 1.5463), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:24,123 - root - INFO - Step 26480: lr=1.00E-05, loss= 1.1708 (max= 1.5463), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:24,123 - root - INFO - Step 26480: lr=1.00E-05, loss= 1.1708 (max= 1.5463), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:24,124 - root - INFO - Step 26480: lr=1.00E-05, loss= 1.1708 (max= 1.5463), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:24,124 - root - INFO - Step 26480: lr=1.00E-05, loss= 1.1708 (max= 1.5463), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:40,082 - root 
- INFO - Step 26490: lr=1.00E-05, loss= 1.1633 (max= 1.5178), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:40,082 - root - INFO - Step 26490: lr=1.00E-05, loss= 1.1633 (max= 1.5178), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:40,082 - root - INFO - Step 26490: lr=1.00E-05, loss= 1.1633 (max= 1.5178), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:40,082 - root - INFO - Step 26490: lr=1.00E-05, loss= 1.1633 (max= 1.5178), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:40,082 - root - INFO - Step 26490: lr=1.00E-05, loss= 1.1633 (max= 1.5178), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:40,082 - root - INFO - Step 26490: lr=1.00E-05, loss= 1.1633 (max= 1.5178), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:40,082 - root - INFO - Step 26490: lr=1.00E-05, loss= 1.1633 (max= 1.5178), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:40,082 - root - INFO - Step 26490: lr=1.00E-05, loss= 1.1633 (max= 1.5178), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:56,023 - root - INFO - Step 26500: lr=1.00E-05, loss= 1.2001 (max= 1.6193), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:56,023 - root - INFO - Step 26500: lr=1.00E-05, loss= 1.2001 (max= 1.6193), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:56,023 - root - INFO - Step 26500: lr=1.00E-05, loss= 1.2001 (max= 1.6193), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 22:18:56,023 - root - INFO - Step 26500: lr=1.00E-05, loss= 1.2001 (max= 1.6193), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:56,023 - root - INFO - Step 26500: lr=1.00E-05, loss= 1.2001 (max= 1.6193), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:56,023 - root - INFO - Step 26500: lr=1.00E-05, loss= 1.2001 (max= 1.6193), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:56,023 - root - INFO - Step 26500: lr=1.00E-05, loss= 1.2001 (max= 1.6193), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:18:56,023 - root - INFO - Step 26500: lr=1.00E-05, loss= 1.2001 (max= 1.6193), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:11,927 - root - INFO - Step 26510: lr=1.00E-05, loss= 1.1925 (max= 1.5646), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:11,927 - root - INFO - Step 26510: lr=1.00E-05, loss= 1.1925 (max= 1.5646), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:11,927 - root - INFO - Step 26510: lr=1.00E-05, loss= 1.1925 (max= 1.5646), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:11,927 - root - INFO - Step 26510: lr=1.00E-05, loss= 1.1925 (max= 1.5646), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:11,927 - root - INFO - Step 26510: lr=1.00E-05, loss= 1.1925 (max= 1.5646), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:11,927 - root - INFO - Step 26510: lr=1.00E-05, loss= 1.1925 (max= 1.5646), tps=20607, mfu=42.94%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:11,928 - root - INFO - Step 26510: lr=1.00E-05, loss= 1.1925 (max= 1.5646), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:11,928 - root - INFO - Step 26510: lr=1.00E-05, loss= 1.1925 (max= 1.5646), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:27,890 - root - INFO - Step 26520: lr=1.00E-05, loss= 1.1994 (max= 1.5865), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:27,890 - root - INFO - Step 26520: lr=1.00E-05, loss= 1.1994 (max= 1.5865), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:27,890 - root - INFO - Step 26520: lr=1.00E-05, loss= 1.1994 (max= 1.5865), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:27,890 - root - INFO - Step 26520: lr=1.00E-05, loss= 1.1994 (max= 1.5865), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:27,890 - root - INFO - Step 26520: lr=1.00E-05, loss= 1.1994 (max= 1.5865), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:27,891 - root - INFO - Step 26520: lr=1.00E-05, loss= 1.1994 (max= 1.5865), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:27,891 - root - INFO - Step 26520: lr=1.00E-05, loss= 1.1994 (max= 1.5865), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:27,891 - root - INFO - Step 26520: lr=1.00E-05, loss= 1.1994 (max= 1.5865), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:43,829 - root - INFO - Step 26530: lr=1.00E-05, 
loss= 1.1503 (max= 1.5753), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:43,829 - root - INFO - Step 26530: lr=1.00E-05, loss= 1.1503 (max= 1.5753), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:43,829 - root - INFO - Step 26530: lr=1.00E-05, loss= 1.1503 (max= 1.5753), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:43,829 - root - INFO - Step 26530: lr=1.00E-05, loss= 1.1503 (max= 1.5753), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:43,829 - root - INFO - Step 26530: lr=1.00E-05, loss= 1.1503 (max= 1.5753), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:43,829 - root - INFO - Step 26530: lr=1.00E-05, loss= 1.1503 (max= 1.5753), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:43,829 - root - INFO - Step 26530: lr=1.00E-05, loss= 1.1503 (max= 1.5753), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:43,829 - root - INFO - Step 26530: lr=1.00E-05, loss= 1.1503 (max= 1.5753), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:19:59,790 - root - INFO - Step 26540: lr=1.00E-05, loss= 1.1961 (max= 1.4987), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:59,790 - root - INFO - Step 26540: lr=1.00E-05, loss= 1.1961 (max= 1.4987), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:59,791 - root - INFO - Step 26540: lr=1.00E-05, loss= 1.1961 (max= 1.4987), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
22:19:59,791 - root - INFO - Step 26540: lr=1.00E-05, loss= 1.1961 (max= 1.4987), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:59,791 - root - INFO - Step 26540: lr=1.00E-05, loss= 1.1961 (max= 1.4987), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:59,791 - root - INFO - Step 26540: lr=1.00E-05, loss= 1.1961 (max= 1.4987), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:59,791 - root - INFO - Step 26540: lr=1.00E-05, loss= 1.1961 (max= 1.4987), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:19:59,791 - root - INFO - Step 26540: lr=1.00E-05, loss= 1.1961 (max= 1.4987), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:15,710 - root - INFO - Step 26550: lr=1.00E-05, loss= 1.1990 (max= 1.5399), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:15,710 - root - INFO - Step 26550: lr=1.00E-05, loss= 1.1990 (max= 1.5399), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:15,710 - root - INFO - Step 26550: lr=1.00E-05, loss= 1.1990 (max= 1.5399), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:15,710 - root - INFO - Step 26550: lr=1.00E-05, loss= 1.1990 (max= 1.5399), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:15,710 - root - INFO - Step 26550: lr=1.00E-05, loss= 1.1990 (max= 1.5399), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:15,710 - root - INFO - Step 26550: lr=1.00E-05, loss= 1.1990 (max= 1.5399), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:15,710 - root - INFO - Step 26550: lr=1.00E-05, loss= 1.1990 (max= 1.5399), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:15,711 - root - INFO - Step 26550: lr=1.00E-05, loss= 1.1990 (max= 1.5399), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:31,657 - root - INFO - Step 26560: lr=1.00E-05, loss= 1.1878 (max= 1.5727), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:31,657 - root - INFO - Step 26560: lr=1.00E-05, loss= 1.1878 (max= 1.5727), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:31,657 - root - INFO - Step 26560: lr=1.00E-05, loss= 1.1878 (max= 1.5727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:31,657 - root - INFO - Step 26560: lr=1.00E-05, loss= 1.1878 (max= 1.5727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:31,657 - root - INFO - Step 26560: lr=1.00E-05, loss= 1.1878 (max= 1.5727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:31,657 - root - INFO - Step 26560: lr=1.00E-05, loss= 1.1878 (max= 1.5727), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:31,657 - root - INFO - Step 26560: lr=1.00E-05, loss= 1.1878 (max= 1.5727), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:31,657 - root - INFO - Step 26560: lr=1.00E-05, loss= 1.1878 (max= 1.5727), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:47,603 - root - INFO - Step 26570: lr=1.00E-05, loss= 1.1750 (max= 1.9029), 
tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:47,603 - root - INFO - Step 26570: lr=1.00E-05, loss= 1.1750 (max= 1.9029), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:47,603 - root - INFO - Step 26570: lr=1.00E-05, loss= 1.1750 (max= 1.9029), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:47,603 - root - INFO - Step 26570: lr=1.00E-05, loss= 1.1750 (max= 1.9029), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:47,603 - root - INFO - Step 26570: lr=1.00E-05, loss= 1.1750 (max= 1.9029), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:47,603 - root - INFO - Step 26570: lr=1.00E-05, loss= 1.1750 (max= 1.9029), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:47,603 - root - INFO - Step 26570: lr=1.00E-05, loss= 1.1750 (max= 1.9029), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:20:47,604 - root - INFO - Step 26570: lr=1.00E-05, loss= 1.1750 (max= 1.9029), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:21:03,492 - root - INFO - Step 26580: lr=1.00E-05, loss= 1.1615 (max= 1.5121), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:03,492 - root - INFO - Step 26580: lr=1.00E-05, loss= 1.1615 (max= 1.5121), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:03,492 - root - INFO - Step 26580: lr=1.00E-05, loss= 1.1615 (max= 1.5121), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:03,492 - root - INFO - Step 
26580: lr=1.00E-05, loss= 1.1615 (max= 1.5121), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:03,492 - root - INFO - Step 26580: lr=1.00E-05, loss= 1.1615 (max= 1.5121), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:03,492 - root - INFO - Step 26580: lr=1.00E-05, loss= 1.1615 (max= 1.5121), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:03,492 - root - INFO - Step 26580: lr=1.00E-05, loss= 1.1615 (max= 1.5121), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:03,493 - root - INFO - Step 26580: lr=1.00E-05, loss= 1.1615 (max= 1.5121), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:19,365 - root - INFO - Step 26590: lr=1.00E-05, loss= 1.1785 (max= 1.5792), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:21:19,365 - root - INFO - Step 26590: lr=1.00E-05, loss= 1.1785 (max= 1.5792), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:21:19,365 - root - INFO - Step 26590: lr=1.00E-05, loss= 1.1785 (max= 1.5792), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:21:19,365 - root - INFO - Step 26590: lr=1.00E-05, loss= 1.1785 (max= 1.5792), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:21:19,365 - root - INFO - Step 26590: lr=1.00E-05, loss= 1.1785 (max= 1.5792), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:21:19,365 - root - INFO - Step 26590: lr=1.00E-05, loss= 1.1785 (max= 1.5792), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:21:19,365 - root - INFO - Step 26590: lr=1.00E-05, loss= 1.1785 (max= 1.5792), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:21:19,365 - root - INFO - Step 26590: lr=1.00E-05, loss= 1.1785 (max= 1.5792), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:21:35,311 - root - INFO - Step 26600: lr=1.00E-05, loss= 1.1800 (max= 1.7182), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:35,311 - root - INFO - Step 26600: lr=1.00E-05, loss= 1.1800 (max= 1.7182), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:35,311 - root - INFO - Step 26600: lr=1.00E-05, loss= 1.1800 (max= 1.7182), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:35,311 - root - INFO - Step 26600: lr=1.00E-05, loss= 1.1800 (max= 1.7182), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:35,311 - root - INFO - Step 26600: lr=1.00E-05, loss= 1.1800 (max= 1.7182), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:35,311 - root - INFO - Step 26600: lr=1.00E-05, loss= 1.1800 (max= 1.7182), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:35,311 - root - INFO - Step 26600: lr=1.00E-05, loss= 1.1800 (max= 1.7182), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:35,311 - root - INFO - Step 26600: lr=1.00E-05, loss= 1.1800 (max= 1.7182), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:51,259 - root - INFO - Step 26610: lr=1.00E-05, loss= 1.1737 (max= 1.7161), tps=20551, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:51,259 - root - INFO - Step 26610: lr=1.00E-05, loss= 1.1737 (max= 1.7161), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:51,259 - root - INFO - Step 26610: lr=1.00E-05, loss= 1.1737 (max= 1.7161), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:51,259 - root - INFO - Step 26610: lr=1.00E-05, loss= 1.1737 (max= 1.7161), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:51,259 - root - INFO - Step 26610: lr=1.00E-05, loss= 1.1737 (max= 1.7161), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:51,259 - root - INFO - Step 26610: lr=1.00E-05, loss= 1.1737 (max= 1.7161), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:51,259 - root - INFO - Step 26610: lr=1.00E-05, loss= 1.1737 (max= 1.7161), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:21:51,259 - root - INFO - Step 26610: lr=1.00E-05, loss= 1.1737 (max= 1.7161), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:07,228 - root - INFO - Step 26620: lr=1.00E-05, loss= 1.1912 (max= 1.7879), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:07,228 - root - INFO - Step 26620: lr=1.00E-05, loss= 1.1912 (max= 1.7879), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:07,228 - root - INFO - Step 26620: lr=1.00E-05, loss= 1.1912 (max= 1.7879), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:07,228 - root - INFO - Step 26620: lr=1.00E-05, loss= 1.1912 
(max= 1.7879), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:07,228 - root - INFO - Step 26620: lr=1.00E-05, loss= 1.1912 (max= 1.7879), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:07,228 - root - INFO - Step 26620: lr=1.00E-05, loss= 1.1912 (max= 1.7879), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:07,228 - root - INFO - Step 26620: lr=1.00E-05, loss= 1.1912 (max= 1.7879), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:07,228 - root - INFO - Step 26620: lr=1.00E-05, loss= 1.1912 (max= 1.7879), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:23,169 - root - INFO - Step 26630: lr=1.00E-05, loss= 1.1951 (max= 1.7302), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:23,169 - root - INFO - Step 26630: lr=1.00E-05, loss= 1.1951 (max= 1.7302), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:23,169 - root - INFO - Step 26630: lr=1.00E-05, loss= 1.1951 (max= 1.7302), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:23,169 - root - INFO - Step 26630: lr=1.00E-05, loss= 1.1951 (max= 1.7302), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:23,169 - root - INFO - Step 26630: lr=1.00E-05, loss= 1.1951 (max= 1.7302), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:23,169 - root - INFO - Step 26630: lr=1.00E-05, loss= 1.1951 (max= 1.7302), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:23,169 - root 
- INFO - Step 26630: lr=1.00E-05, loss= 1.1951 (max= 1.7302), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:23,169 - root - INFO - Step 26630: lr=1.00E-05, loss= 1.1951 (max= 1.7302), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:39,114 - root - INFO - Step 26640: lr=1.00E-05, loss= 1.1681 (max= 1.8466), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:39,114 - root - INFO - Step 26640: lr=1.00E-05, loss= 1.1681 (max= 1.8466), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:39,114 - root - INFO - Step 26640: lr=1.00E-05, loss= 1.1681 (max= 1.8466), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:39,114 - root - INFO - Step 26640: lr=1.00E-05, loss= 1.1681 (max= 1.8466), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:39,114 - root - INFO - Step 26640: lr=1.00E-05, loss= 1.1681 (max= 1.8466), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:39,114 - root - INFO - Step 26640: lr=1.00E-05, loss= 1.1681 (max= 1.8466), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:39,114 - root - INFO - Step 26640: lr=1.00E-05, loss= 1.1681 (max= 1.8466), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:39,114 - root - INFO - Step 26640: lr=1.00E-05, loss= 1.1681 (max= 1.8466), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:22:55,038 - root - INFO - Step 26650: lr=1.00E-05, loss= 1.2013 (max= 1.9952), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:22:55,038 - root - INFO - Step 26650: lr=1.00E-05, loss= 1.2013 (max= 1.9952), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:55,038 - root - INFO - Step 26650: lr=1.00E-05, loss= 1.2013 (max= 1.9952), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:55,038 - root - INFO - Step 26650: lr=1.00E-05, loss= 1.2013 (max= 1.9952), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:55,039 - root - INFO - Step 26650: lr=1.00E-05, loss= 1.2013 (max= 1.9952), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:55,039 - root - INFO - Step 26650: lr=1.00E-05, loss= 1.2013 (max= 1.9952), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:55,039 - root - INFO - Step 26650: lr=1.00E-05, loss= 1.2013 (max= 1.9952), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:22:55,039 - root - INFO - Step 26650: lr=1.00E-05, loss= 1.2013 (max= 1.9952), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:23:11,001 - root - INFO - Step 26660: lr=1.00E-05, loss= 1.1781 (max= 1.5373), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:11,001 - root - INFO - Step 26660: lr=1.00E-05, loss= 1.1781 (max= 1.5373), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:11,001 - root - INFO - Step 26660: lr=1.00E-05, loss= 1.1781 (max= 1.5373), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:11,002 - root - INFO - Step 26660: lr=1.00E-05, loss= 1.1781 (max= 1.5373), tps=20531, mfu=42.78%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:11,002 - root - INFO - Step 26660: lr=1.00E-05, loss= 1.1781 (max= 1.5373), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:11,002 - root - INFO - Step 26660: lr=1.00E-05, loss= 1.1781 (max= 1.5373), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:11,002 - root - INFO - Step 26660: lr=1.00E-05, loss= 1.1781 (max= 1.5373), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:11,002 - root - INFO - Step 26660: lr=1.00E-05, loss= 1.1781 (max= 1.5373), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:26,921 - root - INFO - Step 26670: lr=1.00E-05, loss= 1.1729 (max= 1.4582), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:26,921 - root - INFO - Step 26670: lr=1.00E-05, loss= 1.1729 (max= 1.4582), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:26,921 - root - INFO - Step 26670: lr=1.00E-05, loss= 1.1729 (max= 1.4582), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:26,921 - root - INFO - Step 26670: lr=1.00E-05, loss= 1.1729 (max= 1.4582), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:26,921 - root - INFO - Step 26670: lr=1.00E-05, loss= 1.1729 (max= 1.4582), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:26,921 - root - INFO - Step 26670: lr=1.00E-05, loss= 1.1729 (max= 1.4582), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:26,921 - root - INFO - Step 26670: lr=1.00E-05, 
loss= 1.1729 (max= 1.4582), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:26,922 - root - INFO - Step 26670: lr=1.00E-05, loss= 1.1729 (max= 1.4582), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:42,851 - root - INFO - Step 26680: lr=1.00E-05, loss= 1.2086 (max= 1.5478), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:42,851 - root - INFO - Step 26680: lr=1.00E-05, loss= 1.2086 (max= 1.5478), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:42,851 - root - INFO - Step 26680: lr=1.00E-05, loss= 1.2086 (max= 1.5478), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:42,851 - root - INFO - Step 26680: lr=1.00E-05, loss= 1.2086 (max= 1.5478), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:42,851 - root - INFO - Step 26680: lr=1.00E-05, loss= 1.2086 (max= 1.5478), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:42,851 - root - INFO - Step 26680: lr=1.00E-05, loss= 1.2086 (max= 1.5478), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:42,851 - root - INFO - Step 26680: lr=1.00E-05, loss= 1.2086 (max= 1.5478), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:42,851 - root - INFO - Step 26680: lr=1.00E-05, loss= 1.2086 (max= 1.5478), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:23:58,789 - root - INFO - Step 26690: lr=1.00E-05, loss= 1.1746 (max= 1.6987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
22:23:58,789 - root - INFO - Step 26690: lr=1.00E-05, loss= 1.1746 (max= 1.6987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:23:58,789 - root - INFO - Step 26690: lr=1.00E-05, loss= 1.1746 (max= 1.6987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:23:58,789 - root - INFO - Step 26690: lr=1.00E-05, loss= 1.1746 (max= 1.6987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:23:58,789 - root - INFO - Step 26690: lr=1.00E-05, loss= 1.1746 (max= 1.6987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:23:58,789 - root - INFO - Step 26690: lr=1.00E-05, loss= 1.1746 (max= 1.6987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:23:58,789 - root - INFO - Step 26690: lr=1.00E-05, loss= 1.1746 (max= 1.6987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:23:58,789 - root - INFO - Step 26690: lr=1.00E-05, loss= 1.1746 (max= 1.6987), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:24:14,779 - root - INFO - Step 26700: lr=1.00E-05, loss= 1.1946 (max= 1.7912), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:24:14,779 - root - INFO - Step 26700: lr=1.00E-05, loss= 1.1946 (max= 1.7912), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:24:14,779 - root - INFO - Step 26700: lr=1.00E-05, loss= 1.1946 (max= 1.7912), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:24:14,779 - root - INFO - Step 26700: lr=1.00E-05, loss= 1.1946 (max= 1.7912), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:24:14,779 - root - INFO - Step 26700: lr=1.00E-05, loss= 1.1946 (max= 1.7912), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:24:14,779 - root - INFO - Step 26700: lr=1.00E-05, loss= 1.1946 (max= 1.7912), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:24:14,779 - root - INFO - Step 26700: lr=1.00E-05, loss= 1.1946 (max= 1.7912), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:24:14,779 - root - INFO - Step 26700: lr=1.00E-05, loss= 1.1946 (max= 1.7912), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:24:30,717 - root - INFO - Step 26710: lr=1.00E-05, loss= 1.1633 (max= 1.5280), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:30,717 - root - INFO - Step 26710: lr=1.00E-05, loss= 1.1633 (max= 1.5280), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:30,717 - root - INFO - Step 26710: lr=1.00E-05, loss= 1.1633 (max= 1.5280), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:30,717 - root - INFO - Step 26710: lr=1.00E-05, loss= 1.1633 (max= 1.5280), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:30,718 - root - INFO - Step 26710: lr=1.00E-05, loss= 1.1633 (max= 1.5280), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:30,718 - root - INFO - Step 26710: lr=1.00E-05, loss= 1.1633 (max= 1.5280), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:30,718 - root - INFO - Step 26710: lr=1.00E-05, loss= 1.1633 (max= 1.5280), 
tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:30,718 - root - INFO - Step 26710: lr=1.00E-05, loss= 1.1633 (max= 1.5280), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:46,609 - root - INFO - Step 26720: lr=1.00E-05, loss= 1.2066 (max= 1.7801), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:46,609 - root - INFO - Step 26720: lr=1.00E-05, loss= 1.2066 (max= 1.7801), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:46,609 - root - INFO - Step 26720: lr=1.00E-05, loss= 1.2066 (max= 1.7801), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:46,609 - root - INFO - Step 26720: lr=1.00E-05, loss= 1.2066 (max= 1.7801), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:46,609 - root - INFO - Step 26720: lr=1.00E-05, loss= 1.2066 (max= 1.7801), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:46,609 - root - INFO - Step 26720: lr=1.00E-05, loss= 1.2066 (max= 1.7801), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:46,609 - root - INFO - Step 26720: lr=1.00E-05, loss= 1.2066 (max= 1.7801), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:24:46,609 - root - INFO - Step 26720: lr=1.00E-05, loss= 1.2066 (max= 1.7801), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:25:02,508 - root - INFO - Step 26730: lr=1.00E-05, loss= 1.1732 (max= 1.6149), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:02,508 - root - INFO - Step 
26730: lr=1.00E-05, loss= 1.1732 (max= 1.6149), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:02,508 - root - INFO - Step 26730: lr=1.00E-05, loss= 1.1732 (max= 1.6149), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:02,508 - root - INFO - Step 26730: lr=1.00E-05, loss= 1.1732 (max= 1.6149), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:02,508 - root - INFO - Step 26730: lr=1.00E-05, loss= 1.1732 (max= 1.6149), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:02,509 - root - INFO - Step 26730: lr=1.00E-05, loss= 1.1732 (max= 1.6149), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:02,509 - root - INFO - Step 26730: lr=1.00E-05, loss= 1.1732 (max= 1.6149), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:02,509 - root - INFO - Step 26730: lr=1.00E-05, loss= 1.1732 (max= 1.6149), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:18,397 - root - INFO - Step 26740: lr=1.00E-05, loss= 1.1970 (max= 1.5577), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:18,397 - root - INFO - Step 26740: lr=1.00E-05, loss= 1.1970 (max= 1.5577), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:18,397 - root - INFO - Step 26740: lr=1.00E-05, loss= 1.1970 (max= 1.5577), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:18,397 - root - INFO - Step 26740: lr=1.00E-05, loss= 1.1970 (max= 1.5577), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:25:18,397 - root - INFO - Step 26740: lr=1.00E-05, loss= 1.1970 (max= 1.5577), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:18,397 - root - INFO - Step 26740: lr=1.00E-05, loss= 1.1970 (max= 1.5577), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:18,397 - root - INFO - Step 26740: lr=1.00E-05, loss= 1.1970 (max= 1.5577), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:18,398 - root - INFO - Step 26740: lr=1.00E-05, loss= 1.1970 (max= 1.5577), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:34,285 - root - INFO - Step 26750: lr=1.00E-05, loss= 1.1855 (max= 1.5429), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:34,285 - root - INFO - Step 26750: lr=1.00E-05, loss= 1.1855 (max= 1.5429), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:34,285 - root - INFO - Step 26750: lr=1.00E-05, loss= 1.1855 (max= 1.5429), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:34,285 - root - INFO - Step 26750: lr=1.00E-05, loss= 1.1855 (max= 1.5429), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:34,285 - root - INFO - Step 26750: lr=1.00E-05, loss= 1.1855 (max= 1.5429), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:34,285 - root - INFO - Step 26750: lr=1.00E-05, loss= 1.1855 (max= 1.5429), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:34,285 - root - INFO - Step 26750: lr=1.00E-05, loss= 1.1855 (max= 1.5429), tps=20629, mfu=42.98%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:34,285 - root - INFO - Step 26750: lr=1.00E-05, loss= 1.1855 (max= 1.5429), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:50,230 - root - INFO - Step 26760: lr=1.00E-05, loss= 1.2032 (max= 1.7362), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:50,230 - root - INFO - Step 26760: lr=1.00E-05, loss= 1.2032 (max= 1.7362), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:50,230 - root - INFO - Step 26760: lr=1.00E-05, loss= 1.2032 (max= 1.7362), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:50,230 - root - INFO - Step 26760: lr=1.00E-05, loss= 1.2032 (max= 1.7362), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:50,230 - root - INFO - Step 26760: lr=1.00E-05, loss= 1.2032 (max= 1.7362), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:50,230 - root - INFO - Step 26760: lr=1.00E-05, loss= 1.2032 (max= 1.7362), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:50,230 - root - INFO - Step 26760: lr=1.00E-05, loss= 1.2032 (max= 1.7362), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:25:50,230 - root - INFO - Step 26760: lr=1.00E-05, loss= 1.2032 (max= 1.7362), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:06,158 - root - INFO - Step 26770: lr=1.00E-05, loss= 1.1660 (max= 1.5880), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:06,158 - root - INFO - Step 26770: lr=1.00E-05, loss= 1.1660 
(max= 1.5880), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:06,159 - root - INFO - Step 26770: lr=1.00E-05, loss= 1.1660 (max= 1.5880), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:06,159 - root - INFO - Step 26770: lr=1.00E-05, loss= 1.1660 (max= 1.5880), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:06,159 - root - INFO - Step 26770: lr=1.00E-05, loss= 1.1660 (max= 1.5880), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:06,159 - root - INFO - Step 26770: lr=1.00E-05, loss= 1.1660 (max= 1.5880), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:06,159 - root - INFO - Step 26770: lr=1.00E-05, loss= 1.1660 (max= 1.5880), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:06,159 - root - INFO - Step 26770: lr=1.00E-05, loss= 1.1660 (max= 1.5880), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:22,072 - root - INFO - Step 26780: lr=1.00E-05, loss= 1.2253 (max= 1.5786), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:22,072 - root - INFO - Step 26780: lr=1.00E-05, loss= 1.2253 (max= 1.5786), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:22,072 - root - INFO - Step 26780: lr=1.00E-05, loss= 1.2253 (max= 1.5786), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:22,072 - root - INFO - Step 26780: lr=1.00E-05, loss= 1.2253 (max= 1.5786), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:22,072 - root 
- INFO - Step 26780: lr=1.00E-05, loss= 1.2253 (max= 1.5786), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:22,072 - root - INFO - Step 26780: lr=1.00E-05, loss= 1.2253 (max= 1.5786), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:22,072 - root - INFO - Step 26780: lr=1.00E-05, loss= 1.2253 (max= 1.5786), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:22,073 - root - INFO - Step 26780: lr=1.00E-05, loss= 1.2253 (max= 1.5786), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:37,996 - root - INFO - Step 26790: lr=1.00E-05, loss= 1.2060 (max= 1.8090), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:37,996 - root - INFO - Step 26790: lr=1.00E-05, loss= 1.2060 (max= 1.8090), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:37,997 - root - INFO - Step 26790: lr=1.00E-05, loss= 1.2060 (max= 1.8090), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:37,997 - root - INFO - Step 26790: lr=1.00E-05, loss= 1.2060 (max= 1.8090), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:37,997 - root - INFO - Step 26790: lr=1.00E-05, loss= 1.2060 (max= 1.8090), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:37,997 - root - INFO - Step 26790: lr=1.00E-05, loss= 1.2060 (max= 1.8090), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:37,997 - root - INFO - Step 26790: lr=1.00E-05, loss= 1.2060 (max= 1.8090), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:26:37,997 - root - INFO - Step 26790: lr=1.00E-05, loss= 1.2060 (max= 1.8090), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:53,908 - root - INFO - Step 26800: lr=1.00E-05, loss= 1.1763 (max= 1.5221), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:53,908 - root - INFO - Step 26800: lr=1.00E-05, loss= 1.1763 (max= 1.5221), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:53,908 - root - INFO - Step 26800: lr=1.00E-05, loss= 1.1763 (max= 1.5221), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:53,908 - root - INFO - Step 26800: lr=1.00E-05, loss= 1.1763 (max= 1.5221), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:53,908 - root - INFO - Step 26800: lr=1.00E-05, loss= 1.1763 (max= 1.5221), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:53,908 - root - INFO - Step 26800: lr=1.00E-05, loss= 1.1763 (max= 1.5221), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:53,908 - root - INFO - Step 26800: lr=1.00E-05, loss= 1.1763 (max= 1.5221), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:26:53,908 - root - INFO - Step 26800: lr=1.00E-05, loss= 1.1763 (max= 1.5221), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:09,840 - root - INFO - Step 26810: lr=1.00E-05, loss= 1.1631 (max= 1.7840), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:09,841 - root - INFO - Step 26810: lr=1.00E-05, loss= 1.1631 (max= 1.7840), tps=20571, mfu=42.86%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:09,841 - root - INFO - Step 26810: lr=1.00E-05, loss= 1.1631 (max= 1.7840), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:09,841 - root - INFO - Step 26810: lr=1.00E-05, loss= 1.1631 (max= 1.7840), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:09,841 - root - INFO - Step 26810: lr=1.00E-05, loss= 1.1631 (max= 1.7840), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:09,841 - root - INFO - Step 26810: lr=1.00E-05, loss= 1.1631 (max= 1.7840), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:09,841 - root - INFO - Step 26810: lr=1.00E-05, loss= 1.1631 (max= 1.7840), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:09,841 - root - INFO - Step 26810: lr=1.00E-05, loss= 1.1631 (max= 1.7840), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:25,819 - root - INFO - Step 26820: lr=1.00E-05, loss= 1.1687 (max= 1.5663), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:25,819 - root - INFO - Step 26820: lr=1.00E-05, loss= 1.1687 (max= 1.5663), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:25,819 - root - INFO - Step 26820: lr=1.00E-05, loss= 1.1687 (max= 1.5663), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:25,819 - root - INFO - Step 26820: lr=1.00E-05, loss= 1.1687 (max= 1.5663), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:25,819 - root - INFO - Step 26820: lr=1.00E-05, 
loss= 1.1687 (max= 1.5663), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:25,819 - root - INFO - Step 26820: lr=1.00E-05, loss= 1.1687 (max= 1.5663), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:25,819 - root - INFO - Step 26820: lr=1.00E-05, loss= 1.1687 (max= 1.5663), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:25,819 - root - INFO - Step 26820: lr=1.00E-05, loss= 1.1687 (max= 1.5663), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:27:41,749 - root - INFO - Step 26830: lr=1.00E-05, loss= 1.1976 (max= 1.6557), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:41,749 - root - INFO - Step 26830: lr=1.00E-05, loss= 1.1976 (max= 1.6557), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:41,749 - root - INFO - Step 26830: lr=1.00E-05, loss= 1.1976 (max= 1.6557), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:41,749 - root - INFO - Step 26830: lr=1.00E-05, loss= 1.1976 (max= 1.6557), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:41,749 - root - INFO - Step 26830: lr=1.00E-05, loss= 1.1976 (max= 1.6557), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:41,749 - root - INFO - Step 26830: lr=1.00E-05, loss= 1.1976 (max= 1.6557), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:41,749 - root - INFO - Step 26830: lr=1.00E-05, loss= 1.1976 (max= 1.6557), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
22:27:41,749 - root - INFO - Step 26830: lr=1.00E-05, loss= 1.1976 (max= 1.6557), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:57,712 - root - INFO - Step 26840: lr=1.00E-05, loss= 1.1927 (max= 1.5884), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:57,712 - root - INFO - Step 26840: lr=1.00E-05, loss= 1.1927 (max= 1.5884), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:57,712 - root - INFO - Step 26840: lr=1.00E-05, loss= 1.1927 (max= 1.5884), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:57,712 - root - INFO - Step 26840: lr=1.00E-05, loss= 1.1927 (max= 1.5884), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:57,712 - root - INFO - Step 26840: lr=1.00E-05, loss= 1.1927 (max= 1.5884), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:57,712 - root - INFO - Step 26840: lr=1.00E-05, loss= 1.1927 (max= 1.5884), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:57,712 - root - INFO - Step 26840: lr=1.00E-05, loss= 1.1927 (max= 1.5884), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:27:57,712 - root - INFO - Step 26840: lr=1.00E-05, loss= 1.1927 (max= 1.5884), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:13,689 - root - INFO - Step 26850: lr=1.00E-05, loss= 1.1758 (max= 1.4734), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:13,689 - root - INFO - Step 26850: lr=1.00E-05, loss= 1.1758 (max= 1.4734), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:13,689 - root - INFO - Step 26850: lr=1.00E-05, loss= 1.1758 (max= 1.4734), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:13,689 - root - INFO - Step 26850: lr=1.00E-05, loss= 1.1758 (max= 1.4734), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:13,689 - root - INFO - Step 26850: lr=1.00E-05, loss= 1.1758 (max= 1.4734), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:13,689 - root - INFO - Step 26850: lr=1.00E-05, loss= 1.1758 (max= 1.4734), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:13,690 - root - INFO - Step 26850: lr=1.00E-05, loss= 1.1758 (max= 1.4734), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:13,690 - root - INFO - Step 26850: lr=1.00E-05, loss= 1.1758 (max= 1.4734), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:29,628 - root - INFO - Step 26860: lr=1.00E-05, loss= 1.2326 (max= 1.6139), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:29,628 - root - INFO - Step 26860: lr=1.00E-05, loss= 1.2326 (max= 1.6139), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:29,628 - root - INFO - Step 26860: lr=1.00E-05, loss= 1.2326 (max= 1.6139), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:29,628 - root - INFO - Step 26860: lr=1.00E-05, loss= 1.2326 (max= 1.6139), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:29,628 - root - INFO - Step 26860: lr=1.00E-05, loss= 1.2326 (max= 1.6139), 
tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:29,628 - root - INFO - Step 26860: lr=1.00E-05, loss= 1.2326 (max= 1.6139), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:29,628 - root - INFO - Step 26860: lr=1.00E-05, loss= 1.2326 (max= 1.6139), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:29,629 - root - INFO - Step 26860: lr=1.00E-05, loss= 1.2326 (max= 1.6139), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:28:45,606 - root - INFO - Step 26870: lr=1.00E-05, loss= 1.1669 (max= 1.6040), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:28:45,606 - root - INFO - Step 26870: lr=1.00E-05, loss= 1.1669 (max= 1.6040), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:28:45,606 - root - INFO - Step 26870: lr=1.00E-05, loss= 1.1669 (max= 1.6040), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:28:45,606 - root - INFO - Step 26870: lr=1.00E-05, loss= 1.1669 (max= 1.6040), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:28:45,606 - root - INFO - Step 26870: lr=1.00E-05, loss= 1.1669 (max= 1.6040), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:28:45,606 - root - INFO - Step 26870: lr=1.00E-05, loss= 1.1669 (max= 1.6040), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:28:45,606 - root - INFO - Step 26870: lr=1.00E-05, loss= 1.1669 (max= 1.6040), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:28:45,606 - root - INFO - Step 
26870: lr=1.00E-05, loss= 1.1669 (max= 1.6040), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:01,576 - root - INFO - Step 26880: lr=1.00E-05, loss= 1.1958 (max= 1.8228), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:01,576 - root - INFO - Step 26880: lr=1.00E-05, loss= 1.1958 (max= 1.8228), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:01,576 - root - INFO - Step 26880: lr=1.00E-05, loss= 1.1958 (max= 1.8228), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:01,576 - root - INFO - Step 26880: lr=1.00E-05, loss= 1.1958 (max= 1.8228), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:01,576 - root - INFO - Step 26880: lr=1.00E-05, loss= 1.1958 (max= 1.8228), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:01,576 - root - INFO - Step 26880: lr=1.00E-05, loss= 1.1958 (max= 1.8228), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:01,576 - root - INFO - Step 26880: lr=1.00E-05, loss= 1.1958 (max= 1.8228), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:01,576 - root - INFO - Step 26880: lr=1.00E-05, loss= 1.1958 (max= 1.8228), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:17,496 - root - INFO - Step 26890: lr=1.00E-05, loss= 1.1853 (max= 1.5784), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:17,496 - root - INFO - Step 26890: lr=1.00E-05, loss= 1.1853 (max= 1.5784), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:29:17,496 - root - INFO - Step 26890: lr=1.00E-05, loss= 1.1853 (max= 1.5784), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:17,496 - root - INFO - Step 26890: lr=1.00E-05, loss= 1.1853 (max= 1.5784), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:17,496 - root - INFO - Step 26890: lr=1.00E-05, loss= 1.1853 (max= 1.5784), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:17,496 - root - INFO - Step 26890: lr=1.00E-05, loss= 1.1853 (max= 1.5784), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:17,496 - root - INFO - Step 26890: lr=1.00E-05, loss= 1.1853 (max= 1.5784), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:17,496 - root - INFO - Step 26890: lr=1.00E-05, loss= 1.1853 (max= 1.5784), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:33,453 - root - INFO - Step 26900: lr=1.00E-05, loss= 1.1833 (max= 1.5605), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:33,453 - root - INFO - Step 26900: lr=1.00E-05, loss= 1.1833 (max= 1.5605), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:33,453 - root - INFO - Step 26900: lr=1.00E-05, loss= 1.1833 (max= 1.5605), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:33,453 - root - INFO - Step 26900: lr=1.00E-05, loss= 1.1833 (max= 1.5605), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:33,453 - root - INFO - Step 26900: lr=1.00E-05, loss= 1.1833 (max= 1.5605), tps=20539, mfu=42.79%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:33,453 - root - INFO - Step 26900: lr=1.00E-05, loss= 1.1833 (max= 1.5605), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:33,453 - root - INFO - Step 26900: lr=1.00E-05, loss= 1.1833 (max= 1.5605), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:33,454 - root - INFO - Step 26900: lr=1.00E-05, loss= 1.1833 (max= 1.5605), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:29:49,380 - root - INFO - Step 26910: lr=1.00E-05, loss= 1.1930 (max= 1.5950), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:29:49,380 - root - INFO - Step 26910: lr=1.00E-05, loss= 1.1930 (max= 1.5950), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:29:49,380 - root - INFO - Step 26910: lr=1.00E-05, loss= 1.1930 (max= 1.5950), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:29:49,380 - root - INFO - Step 26910: lr=1.00E-05, loss= 1.1930 (max= 1.5950), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:29:49,380 - root - INFO - Step 26910: lr=1.00E-05, loss= 1.1930 (max= 1.5950), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:29:49,380 - root - INFO - Step 26910: lr=1.00E-05, loss= 1.1930 (max= 1.5950), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:29:49,380 - root - INFO - Step 26910: lr=1.00E-05, loss= 1.1930 (max= 1.5950), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:29:49,380 - root - INFO - Step 26910: lr=1.00E-05, loss= 1.1930 
(max= 1.5950), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:05,283 - root - INFO - Step 26920: lr=1.00E-05, loss= 1.1996 (max= 1.5966), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:05,283 - root - INFO - Step 26920: lr=1.00E-05, loss= 1.1996 (max= 1.5966), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:05,283 - root - INFO - Step 26920: lr=1.00E-05, loss= 1.1996 (max= 1.5966), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:05,283 - root - INFO - Step 26920: lr=1.00E-05, loss= 1.1996 (max= 1.5966), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:05,283 - root - INFO - Step 26920: lr=1.00E-05, loss= 1.1996 (max= 1.5966), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:05,283 - root - INFO - Step 26920: lr=1.00E-05, loss= 1.1996 (max= 1.5966), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:05,283 - root - INFO - Step 26920: lr=1.00E-05, loss= 1.1996 (max= 1.5966), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:05,284 - root - INFO - Step 26920: lr=1.00E-05, loss= 1.1996 (max= 1.5966), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:21,216 - root - INFO - Step 26930: lr=1.00E-05, loss= 1.1830 (max= 1.5633), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:21,216 - root - INFO - Step 26930: lr=1.00E-05, loss= 1.1830 (max= 1.5633), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:21,217 - root 
- INFO - Step 26930: lr=1.00E-05, loss= 1.1830 (max= 1.5633), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:21,217 - root - INFO - Step 26930: lr=1.00E-05, loss= 1.1830 (max= 1.5633), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:21,217 - root - INFO - Step 26930: lr=1.00E-05, loss= 1.1830 (max= 1.5633), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:21,217 - root - INFO - Step 26930: lr=1.00E-05, loss= 1.1830 (max= 1.5633), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:21,217 - root - INFO - Step 26930: lr=1.00E-05, loss= 1.1830 (max= 1.5633), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:21,217 - root - INFO - Step 26930: lr=1.00E-05, loss= 1.1830 (max= 1.5633), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:37,196 - root - INFO - Step 26940: lr=1.00E-05, loss= 1.1945 (max= 1.7801), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:30:37,196 - root - INFO - Step 26940: lr=1.00E-05, loss= 1.1945 (max= 1.7801), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:30:37,196 - root - INFO - Step 26940: lr=1.00E-05, loss= 1.1945 (max= 1.7801), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:30:37,196 - root - INFO - Step 26940: lr=1.00E-05, loss= 1.1945 (max= 1.7801), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:30:37,196 - root - INFO - Step 26940: lr=1.00E-05, loss= 1.1945 (max= 1.7801), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:30:37,196 - root - INFO - Step 26940: lr=1.00E-05, loss= 1.1945 (max= 1.7801), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:30:37,196 - root - INFO - Step 26940: lr=1.00E-05, loss= 1.1945 (max= 1.7801), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:30:37,196 - root - INFO - Step 26940: lr=1.00E-05, loss= 1.1945 (max= 1.7801), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:30:53,111 - root - INFO - Step 26950: lr=1.00E-05, loss= 1.1441 (max= 1.5254), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:53,111 - root - INFO - Step 26950: lr=1.00E-05, loss= 1.1441 (max= 1.5254), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:53,111 - root - INFO - Step 26950: lr=1.00E-05, loss= 1.1441 (max= 1.5254), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:53,111 - root - INFO - Step 26950: lr=1.00E-05, loss= 1.1441 (max= 1.5254), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:53,112 - root - INFO - Step 26950: lr=1.00E-05, loss= 1.1441 (max= 1.5254), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:53,112 - root - INFO - Step 26950: lr=1.00E-05, loss= 1.1441 (max= 1.5254), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:53,112 - root - INFO - Step 26950: lr=1.00E-05, loss= 1.1441 (max= 1.5254), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:30:53,112 - root - INFO - Step 26950: lr=1.00E-05, loss= 1.1441 (max= 1.5254), tps=20593, mfu=42.91%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:09,018 - root - INFO - Step 26960: lr=1.00E-05, loss= 1.1755 (max= 1.4987), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:31:09,018 - root - INFO - Step 26960: lr=1.00E-05, loss= 1.1755 (max= 1.4987), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:31:09,018 - root - INFO - Step 26960: lr=1.00E-05, loss= 1.1755 (max= 1.4987), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:31:09,018 - root - INFO - Step 26960: lr=1.00E-05, loss= 1.1755 (max= 1.4987), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:31:09,018 - root - INFO - Step 26960: lr=1.00E-05, loss= 1.1755 (max= 1.4987), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:31:09,018 - root - INFO - Step 26960: lr=1.00E-05, loss= 1.1755 (max= 1.4987), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:31:09,018 - root - INFO - Step 26960: lr=1.00E-05, loss= 1.1755 (max= 1.4987), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:31:09,018 - root - INFO - Step 26960: lr=1.00E-05, loss= 1.1755 (max= 1.4987), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:31:24,957 - root - INFO - Step 26970: lr=1.00E-05, loss= 1.1689 (max= 1.8929), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:24,957 - root - INFO - Step 26970: lr=1.00E-05, loss= 1.1689 (max= 1.8929), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:24,957 - root - INFO - Step 26970: lr=1.00E-05, 
loss= 1.1689 (max= 1.8929), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:24,957 - root - INFO - Step 26970: lr=1.00E-05, loss= 1.1689 (max= 1.8929), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:24,957 - root - INFO - Step 26970: lr=1.00E-05, loss= 1.1689 (max= 1.8929), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:24,957 - root - INFO - Step 26970: lr=1.00E-05, loss= 1.1689 (max= 1.8929), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:24,957 - root - INFO - Step 26970: lr=1.00E-05, loss= 1.1689 (max= 1.8929), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:24,957 - root - INFO - Step 26970: lr=1.00E-05, loss= 1.1689 (max= 1.8929), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:40,921 - root - INFO - Step 26980: lr=1.00E-05, loss= 1.1890 (max= 1.7212), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:40,921 - root - INFO - Step 26980: lr=1.00E-05, loss= 1.1890 (max= 1.7212), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:40,921 - root - INFO - Step 26980: lr=1.00E-05, loss= 1.1890 (max= 1.7212), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:40,921 - root - INFO - Step 26980: lr=1.00E-05, loss= 1.1890 (max= 1.7212), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:40,921 - root - INFO - Step 26980: lr=1.00E-05, loss= 1.1890 (max= 1.7212), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
22:31:40,921 - root - INFO - Step 26980: lr=1.00E-05, loss= 1.1890 (max= 1.7212), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:40,921 - root - INFO - Step 26980: lr=1.00E-05, loss= 1.1890 (max= 1.7212), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:40,921 - root - INFO - Step 26980: lr=1.00E-05, loss= 1.1890 (max= 1.7212), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:56,830 - root - INFO - Step 26990: lr=1.00E-05, loss= 1.2007 (max= 1.6865), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:56,830 - root - INFO - Step 26990: lr=1.00E-05, loss= 1.2007 (max= 1.6865), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:56,830 - root - INFO - Step 26990: lr=1.00E-05, loss= 1.2007 (max= 1.6865), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:56,830 - root - INFO - Step 26990: lr=1.00E-05, loss= 1.2007 (max= 1.6865), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:56,830 - root - INFO - Step 26990: lr=1.00E-05, loss= 1.2007 (max= 1.6865), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:56,830 - root - INFO - Step 26990: lr=1.00E-05, loss= 1.2007 (max= 1.6865), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:56,830 - root - INFO - Step 26990: lr=1.00E-05, loss= 1.2007 (max= 1.6865), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:31:56,831 - root - INFO - Step 26990: lr=1.00E-05, loss= 1.2007 (max= 1.6865), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-27000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-27000! Save time: 4.3537187576293945 +2025-10-24 22:32:12,767 - root - INFO - Step 27000: lr=1.00E-05, loss= 1.2007 (max= 1.6913), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:12,767 - root - INFO - Step 27000: lr=1.00E-05, loss= 1.2007 (max= 1.6913), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:12,767 - root - INFO - Saving a full checkpoint at step 27000 +2025-10-24 22:32:12,767 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:32:12,767 - root - INFO - Saving a full checkpoint at step 27000 +2025-10-24 22:32:12,767 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:32:12,767 - root - INFO - Step 27000: lr=1.00E-05, loss= 1.2007 (max= 1.6913), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:12,767 - root - INFO - Step 27000: lr=1.00E-05, loss= 1.2007 (max= 1.6913), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:12,767 - root - INFO - Step 27000: lr=1.00E-05, loss= 1.2007 (max= 1.6913), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:12,768 - root - INFO - Saving a full checkpoint at step 27000 +2025-10-24 22:32:12,767 - root - INFO - Step 27000: lr=1.00E-05, loss= 1.2007 (max= 1.6913), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:12,768 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 
'optimizer', 'lr_scheduler']) +2025-10-24 22:32:12,768 - root - INFO - Saving a full checkpoint at step 27000 +2025-10-24 22:32:12,767 - root - INFO - Step 27000: lr=1.00E-05, loss= 1.2007 (max= 1.6913), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:12,768 - root - INFO - Saving a full checkpoint at step 27000 +2025-10-24 22:32:12,768 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:32:12,768 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:32:12,768 - root - INFO - Saving a full checkpoint at step 27000 +2025-10-24 22:32:12,768 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:32:12,768 - root - INFO - Saving a full checkpoint at step 27000 +2025-10-24 22:32:12,768 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:32:12,768 - root - INFO - Step 27000: lr=1.00E-05, loss= 1.2007 (max= 1.6913), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:12,768 - root - INFO - Saving a full checkpoint at step 27000 +2025-10-24 22:32:12,768 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:32:27,266 - root - INFO - Finished saving the checkpoint in 14.50 seconds +2025-10-24 22:32:27,273 - root - INFO - Finished saving the checkpoint in 14.50 seconds +2025-10-24 22:32:27,273 - root - INFO - Finished saving the checkpoint in 14.51 seconds +2025-10-24 22:32:27,274 - root - INFO - Finished saving the checkpoint in 14.51 seconds +2025-10-24 22:32:27,274 - root - INFO - Finished saving the checkpoint in 14.51 seconds +2025-10-24 22:32:27,274 - root - INFO - 
Finished saving the checkpoint in 14.51 seconds +2025-10-24 22:32:27,274 - root - INFO - Finished saving the checkpoint in 14.51 seconds +2025-10-24 22:32:27,275 - root - INFO - Finished saving the checkpoint in 14.51 seconds +2025-10-24 22:32:43,125 - root - INFO - Step 27010: lr=1.00E-05, loss= 1.1820 (max= 1.4686), tps=10795, mfu=22.49%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:43,125 - root - INFO - Step 27010: lr=1.00E-05, loss= 1.1820 (max= 1.4686), tps=10795, mfu=22.49%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:43,126 - root - INFO - Step 27010: lr=1.00E-05, loss= 1.1820 (max= 1.4686), tps=10795, mfu=22.49%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:43,126 - root - INFO - Step 27010: lr=1.00E-05, loss= 1.1820 (max= 1.4686), tps=10795, mfu=22.49%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:43,126 - root - INFO - Step 27010: lr=1.00E-05, loss= 1.1820 (max= 1.4686), tps=10795, mfu=22.49%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:43,126 - root - INFO - Step 27010: lr=1.00E-05, loss= 1.1820 (max= 1.4686), tps=10795, mfu=22.49%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:43,126 - root - INFO - Step 27010: lr=1.00E-05, loss= 1.1820 (max= 1.4686), tps=10795, mfu=22.49%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:43,126 - root - INFO - Step 27010: lr=1.00E-05, loss= 1.1820 (max= 1.4686), tps=10795, mfu=22.49%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:59,021 - root - INFO - Step 27020: lr=1.00E-05, loss= 1.1462 (max= 1.4963), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:59,021 - root - INFO - Step 27020: lr=1.00E-05, loss= 1.1462 (max= 1.4963), 
tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:59,021 - root - INFO - Step 27020: lr=1.00E-05, loss= 1.1462 (max= 1.4963), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:59,021 - root - INFO - Step 27020: lr=1.00E-05, loss= 1.1462 (max= 1.4963), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:59,021 - root - INFO - Step 27020: lr=1.00E-05, loss= 1.1462 (max= 1.4963), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:59,021 - root - INFO - Step 27020: lr=1.00E-05, loss= 1.1462 (max= 1.4963), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:59,021 - root - INFO - Step 27020: lr=1.00E-05, loss= 1.1462 (max= 1.4963), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:32:59,021 - root - INFO - Step 27020: lr=1.00E-05, loss= 1.1462 (max= 1.4963), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:33:15,009 - root - INFO - Step 27030: lr=1.00E-05, loss= 1.1875 (max= 1.9042), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:15,009 - root - INFO - Step 27030: lr=1.00E-05, loss= 1.1875 (max= 1.9042), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:15,009 - root - INFO - Step 27030: lr=1.00E-05, loss= 1.1875 (max= 1.9042), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:15,009 - root - INFO - Step 27030: lr=1.00E-05, loss= 1.1875 (max= 1.9042), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:15,009 - root - INFO - Step 
27030: lr=1.00E-05, loss= 1.1875 (max= 1.9042), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:15,009 - root - INFO - Step 27030: lr=1.00E-05, loss= 1.1875 (max= 1.9042), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:15,009 - root - INFO - Step 27030: lr=1.00E-05, loss= 1.1875 (max= 1.9042), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:15,010 - root - INFO - Step 27030: lr=1.00E-05, loss= 1.1875 (max= 1.9042), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:30,920 - root - INFO - Step 27040: lr=1.00E-05, loss= 1.1671 (max= 1.6072), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:33:30,920 - root - INFO - Step 27040: lr=1.00E-05, loss= 1.1671 (max= 1.6072), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:33:30,920 - root - INFO - Step 27040: lr=1.00E-05, loss= 1.1671 (max= 1.6072), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:33:30,920 - root - INFO - Step 27040: lr=1.00E-05, loss= 1.1671 (max= 1.6072), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:33:30,920 - root - INFO - Step 27040: lr=1.00E-05, loss= 1.1671 (max= 1.6072), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:33:30,920 - root - INFO - Step 27040: lr=1.00E-05, loss= 1.1671 (max= 1.6072), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:33:30,920 - root - INFO - Step 27040: lr=1.00E-05, loss= 1.1671 (max= 1.6072), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 22:33:30,920 - root - INFO - Step 27040: lr=1.00E-05, loss= 1.1671 (max= 1.6072), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:33:46,892 - root - INFO - Step 27050: lr=1.00E-05, loss= 1.1844 (max= 1.5026), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:46,892 - root - INFO - Step 27050: lr=1.00E-05, loss= 1.1844 (max= 1.5026), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:46,892 - root - INFO - Step 27050: lr=1.00E-05, loss= 1.1844 (max= 1.5026), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:46,892 - root - INFO - Step 27050: lr=1.00E-05, loss= 1.1844 (max= 1.5026), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:46,892 - root - INFO - Step 27050: lr=1.00E-05, loss= 1.1844 (max= 1.5026), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:46,892 - root - INFO - Step 27050: lr=1.00E-05, loss= 1.1844 (max= 1.5026), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:46,892 - root - INFO - Step 27050: lr=1.00E-05, loss= 1.1844 (max= 1.5026), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:46,892 - root - INFO - Step 27050: lr=1.00E-05, loss= 1.1844 (max= 1.5026), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:33:52,377 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2227547 +2025-10-24 22:34:02,813 - root - INFO - Step 27060: lr=1.00E-05, loss= 1.1695 (max= 1.6472), tps=20586, mfu=42.89%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:02,813 - root - INFO - Step 27060: lr=1.00E-05, loss= 1.1695 (max= 1.6472), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:02,813 - root - INFO - Step 27060: lr=1.00E-05, loss= 1.1695 (max= 1.6472), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:02,813 - root - INFO - Step 27060: lr=1.00E-05, loss= 1.1695 (max= 1.6472), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:02,813 - root - INFO - Step 27060: lr=1.00E-05, loss= 1.1695 (max= 1.6472), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:02,813 - root - INFO - Step 27060: lr=1.00E-05, loss= 1.1695 (max= 1.6472), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:02,813 - root - INFO - Step 27060: lr=1.00E-05, loss= 1.1695 (max= 1.6472), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:02,813 - root - INFO - Step 27060: lr=1.00E-05, loss= 1.1695 (max= 1.6472), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:18,738 - root - INFO - Step 27070: lr=1.00E-05, loss= 1.2190 (max= 1.6067), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:18,739 - root - INFO - Step 27070: lr=1.00E-05, loss= 1.2190 (max= 1.6067), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:18,739 - root - INFO - Step 27070: lr=1.00E-05, loss= 1.2190 (max= 1.6067), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:18,739 - root - INFO - Step 27070: lr=1.00E-05, loss= 1.2190 
(max= 1.6067), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:18,739 - root - INFO - Step 27070: lr=1.00E-05, loss= 1.2190 (max= 1.6067), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:18,739 - root - INFO - Step 27070: lr=1.00E-05, loss= 1.2190 (max= 1.6067), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:18,739 - root - INFO - Step 27070: lr=1.00E-05, loss= 1.2190 (max= 1.6067), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:18,739 - root - INFO - Step 27070: lr=1.00E-05, loss= 1.2190 (max= 1.6067), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:34,701 - root - INFO - Step 27080: lr=1.00E-05, loss= 1.1764 (max= 1.4680), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:34,701 - root - INFO - Step 27080: lr=1.00E-05, loss= 1.1764 (max= 1.4680), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:34,701 - root - INFO - Step 27080: lr=1.00E-05, loss= 1.1764 (max= 1.4680), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:34,701 - root - INFO - Step 27080: lr=1.00E-05, loss= 1.1764 (max= 1.4680), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:34,701 - root - INFO - Step 27080: lr=1.00E-05, loss= 1.1764 (max= 1.4680), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:34,701 - root - INFO - Step 27080: lr=1.00E-05, loss= 1.1764 (max= 1.4680), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:34,701 - root 
- INFO - Step 27080: lr=1.00E-05, loss= 1.1764 (max= 1.4680), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:34,701 - root - INFO - Step 27080: lr=1.00E-05, loss= 1.1764 (max= 1.4680), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:50,667 - root - INFO - Step 27090: lr=1.00E-05, loss= 1.1874 (max= 1.5860), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:50,667 - root - INFO - Step 27090: lr=1.00E-05, loss= 1.1874 (max= 1.5860), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:50,667 - root - INFO - Step 27090: lr=1.00E-05, loss= 1.1874 (max= 1.5860), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:50,667 - root - INFO - Step 27090: lr=1.00E-05, loss= 1.1874 (max= 1.5860), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:50,667 - root - INFO - Step 27090: lr=1.00E-05, loss= 1.1874 (max= 1.5860), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:50,667 - root - INFO - Step 27090: lr=1.00E-05, loss= 1.1874 (max= 1.5860), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:50,668 - root - INFO - Step 27090: lr=1.00E-05, loss= 1.1874 (max= 1.5860), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:34:50,668 - root - INFO - Step 27090: lr=1.00E-05, loss= 1.1874 (max= 1.5860), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:06,622 - root - INFO - Step 27100: lr=1.00E-05, loss= 1.1927 (max= 1.5629), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:35:06,622 - root - INFO - Step 27100: lr=1.00E-05, loss= 1.1927 (max= 1.5629), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:06,622 - root - INFO - Step 27100: lr=1.00E-05, loss= 1.1927 (max= 1.5629), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:06,622 - root - INFO - Step 27100: lr=1.00E-05, loss= 1.1927 (max= 1.5629), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:06,622 - root - INFO - Step 27100: lr=1.00E-05, loss= 1.1927 (max= 1.5629), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:06,622 - root - INFO - Step 27100: lr=1.00E-05, loss= 1.1927 (max= 1.5629), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:06,622 - root - INFO - Step 27100: lr=1.00E-05, loss= 1.1927 (max= 1.5629), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:06,622 - root - INFO - Step 27100: lr=1.00E-05, loss= 1.1927 (max= 1.5629), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:22,596 - root - INFO - Step 27110: lr=1.00E-05, loss= 1.1911 (max= 1.6760), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:22,596 - root - INFO - Step 27110: lr=1.00E-05, loss= 1.1911 (max= 1.6760), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:22,596 - root - INFO - Step 27110: lr=1.00E-05, loss= 1.1911 (max= 1.6760), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:22,596 - root - INFO - Step 27110: lr=1.00E-05, loss= 1.1911 (max= 1.6760), tps=20517, mfu=42.75%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:22,596 - root - INFO - Step 27110: lr=1.00E-05, loss= 1.1911 (max= 1.6760), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:22,596 - root - INFO - Step 27110: lr=1.00E-05, loss= 1.1911 (max= 1.6760), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:22,596 - root - INFO - Step 27110: lr=1.00E-05, loss= 1.1911 (max= 1.6760), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:22,596 - root - INFO - Step 27110: lr=1.00E-05, loss= 1.1911 (max= 1.6760), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:38,569 - root - INFO - Step 27120: lr=1.00E-05, loss= 1.1792 (max= 1.5523), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:38,569 - root - INFO - Step 27120: lr=1.00E-05, loss= 1.1792 (max= 1.5523), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:38,569 - root - INFO - Step 27120: lr=1.00E-05, loss= 1.1792 (max= 1.5523), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:38,569 - root - INFO - Step 27120: lr=1.00E-05, loss= 1.1792 (max= 1.5523), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:38,569 - root - INFO - Step 27120: lr=1.00E-05, loss= 1.1792 (max= 1.5523), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:38,569 - root - INFO - Step 27120: lr=1.00E-05, loss= 1.1792 (max= 1.5523), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:38,569 - root - INFO - Step 27120: lr=1.00E-05, 
loss= 1.1792 (max= 1.5523), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:38,569 - root - INFO - Step 27120: lr=1.00E-05, loss= 1.1792 (max= 1.5523), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:35:54,538 - root - INFO - Step 27130: lr=1.00E-05, loss= 1.1488 (max= 1.6189), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:35:54,538 - root - INFO - Step 27130: lr=1.00E-05, loss= 1.1488 (max= 1.6189), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:35:54,538 - root - INFO - Step 27130: lr=1.00E-05, loss= 1.1488 (max= 1.6189), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:35:54,538 - root - INFO - Step 27130: lr=1.00E-05, loss= 1.1488 (max= 1.6189), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:35:54,538 - root - INFO - Step 27130: lr=1.00E-05, loss= 1.1488 (max= 1.6189), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:35:54,538 - root - INFO - Step 27130: lr=1.00E-05, loss= 1.1488 (max= 1.6189), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:35:54,538 - root - INFO - Step 27130: lr=1.00E-05, loss= 1.1488 (max= 1.6189), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:35:54,538 - root - INFO - Step 27130: lr=1.00E-05, loss= 1.1488 (max= 1.6189), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:36:10,487 - root - INFO - Step 27140: lr=1.00E-05, loss= 1.1551 (max= 1.5891), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
22:36:10,487 - root - INFO - Step 27140: lr=1.00E-05, loss= 1.1551 (max= 1.5891), tps=20549, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:10,487 - root - INFO - Step 27140: lr=1.00E-05, loss= 1.1551 (max= 1.5891), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:10,487 - root - INFO - Step 27140: lr=1.00E-05, loss= 1.1551 (max= 1.5891), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:10,487 - root - INFO - Step 27140: lr=1.00E-05, loss= 1.1551 (max= 1.5891), tps=20549, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:10,487 - root - INFO - Step 27140: lr=1.00E-05, loss= 1.1551 (max= 1.5891), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:10,487 - root - INFO - Step 27140: lr=1.00E-05, loss= 1.1551 (max= 1.5891), tps=20549, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:10,488 - root - INFO - Step 27140: lr=1.00E-05, loss= 1.1551 (max= 1.5891), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:26,391 - root - INFO - Step 27150: lr=1.00E-05, loss= 1.1904 (max= 1.5633), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:36:26,391 - root - INFO - Step 27150: lr=1.00E-05, loss= 1.1904 (max= 1.5633), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:36:26,391 - root - INFO - Step 27150: lr=1.00E-05, loss= 1.1904 (max= 1.5633), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:36:26,392 - root - INFO - Step 27150: lr=1.00E-05, loss= 1.1904 (max= 1.5633), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:36:26,392 - root - INFO - Step 27150: lr=1.00E-05, loss= 1.1904 (max= 1.5633), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:36:26,392 - root - INFO - Step 27150: lr=1.00E-05, loss= 1.1904 (max= 1.5633), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:36:26,392 - root - INFO - Step 27150: lr=1.00E-05, loss= 1.1904 (max= 1.5633), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:36:26,392 - root - INFO - Step 27150: lr=1.00E-05, loss= 1.1904 (max= 1.5633), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:36:42,394 - root - INFO - Step 27160: lr=1.00E-05, loss= 1.1918 (max= 1.5921), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:42,394 - root - INFO - Step 27160: lr=1.00E-05, loss= 1.1918 (max= 1.5921), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:42,394 - root - INFO - Step 27160: lr=1.00E-05, loss= 1.1918 (max= 1.5921), tps=20480, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:42,394 - root - INFO - Step 27160: lr=1.00E-05, loss= 1.1918 (max= 1.5921), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:42,394 - root - INFO - Step 27160: lr=1.00E-05, loss= 1.1918 (max= 1.5921), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:42,394 - root - INFO - Step 27160: lr=1.00E-05, loss= 1.1918 (max= 1.5921), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:42,394 - root - INFO - Step 27160: lr=1.00E-05, loss= 1.1918 (max= 1.5921), 
tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:42,395 - root - INFO - Step 27160: lr=1.00E-05, loss= 1.1918 (max= 1.5921), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:58,370 - root - INFO - Step 27170: lr=1.00E-05, loss= 1.1896 (max= 1.5600), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:58,370 - root - INFO - Step 27170: lr=1.00E-05, loss= 1.1896 (max= 1.5600), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:58,370 - root - INFO - Step 27170: lr=1.00E-05, loss= 1.1896 (max= 1.5600), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:58,370 - root - INFO - Step 27170: lr=1.00E-05, loss= 1.1896 (max= 1.5600), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:58,370 - root - INFO - Step 27170: lr=1.00E-05, loss= 1.1896 (max= 1.5600), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:58,370 - root - INFO - Step 27170: lr=1.00E-05, loss= 1.1896 (max= 1.5600), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:58,370 - root - INFO - Step 27170: lr=1.00E-05, loss= 1.1896 (max= 1.5600), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:36:58,370 - root - INFO - Step 27170: lr=1.00E-05, loss= 1.1896 (max= 1.5600), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:37:14,308 - root - INFO - Step 27180: lr=1.00E-05, loss= 1.2129 (max= 1.5149), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:37:14,308 - root - INFO - Step 
27180: lr=1.00E-05, loss= 1.2129 (max= 1.5149), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:37:14,308 - root - INFO - Step 27180: lr=1.00E-05, loss= 1.2129 (max= 1.5149), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:37:14,308 - root - INFO - Step 27180: lr=1.00E-05, loss= 1.2129 (max= 1.5149), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:37:14,308 - root - INFO - Step 27180: lr=1.00E-05, loss= 1.2129 (max= 1.5149), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:37:14,308 - root - INFO - Step 27180: lr=1.00E-05, loss= 1.2129 (max= 1.5149), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:37:14,308 - root - INFO - Step 27180: lr=1.00E-05, loss= 1.2129 (max= 1.5149), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:37:14,308 - root - INFO - Step 27180: lr=1.00E-05, loss= 1.2129 (max= 1.5149), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:37:30,269 - root - INFO - Step 27190: lr=1.00E-05, loss= 1.1849 (max= 1.5722), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:30,269 - root - INFO - Step 27190: lr=1.00E-05, loss= 1.1849 (max= 1.5722), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:30,269 - root - INFO - Step 27190: lr=1.00E-05, loss= 1.1849 (max= 1.5722), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:30,269 - root - INFO - Step 27190: lr=1.00E-05, loss= 1.1849 (max= 1.5722), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 22:37:30,269 - root - INFO - Step 27190: lr=1.00E-05, loss= 1.1849 (max= 1.5722), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:30,269 - root - INFO - Step 27190: lr=1.00E-05, loss= 1.1849 (max= 1.5722), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:30,269 - root - INFO - Step 27190: lr=1.00E-05, loss= 1.1849 (max= 1.5722), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:30,270 - root - INFO - Step 27190: lr=1.00E-05, loss= 1.1849 (max= 1.5722), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:46,188 - root - INFO - Step 27200: lr=1.00E-05, loss= 1.1959 (max= 1.8595), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:46,188 - root - INFO - Step 27200: lr=1.00E-05, loss= 1.1959 (max= 1.8595), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:46,188 - root - INFO - Step 27200: lr=1.00E-05, loss= 1.1959 (max= 1.8595), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:46,188 - root - INFO - Step 27200: lr=1.00E-05, loss= 1.1959 (max= 1.8595), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:46,188 - root - INFO - Step 27200: lr=1.00E-05, loss= 1.1959 (max= 1.8595), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:46,188 - root - INFO - Step 27200: lr=1.00E-05, loss= 1.1959 (max= 1.8595), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:46,188 - root - INFO - Step 27200: lr=1.00E-05, loss= 1.1959 (max= 1.8595), tps=20589, mfu=42.90%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:37:46,189 - root - INFO - Step 27200: lr=1.00E-05, loss= 1.1959 (max= 1.8595), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:02,106 - root - INFO - Step 27210: lr=1.00E-05, loss= 1.1965 (max= 1.6153), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:02,106 - root - INFO - Step 27210: lr=1.00E-05, loss= 1.1965 (max= 1.6153), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:02,106 - root - INFO - Step 27210: lr=1.00E-05, loss= 1.1965 (max= 1.6153), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:02,107 - root - INFO - Step 27210: lr=1.00E-05, loss= 1.1965 (max= 1.6153), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:02,107 - root - INFO - Step 27210: lr=1.00E-05, loss= 1.1965 (max= 1.6153), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:02,107 - root - INFO - Step 27210: lr=1.00E-05, loss= 1.1965 (max= 1.6153), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:02,107 - root - INFO - Step 27210: lr=1.00E-05, loss= 1.1965 (max= 1.6153), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:02,107 - root - INFO - Step 27210: lr=1.00E-05, loss= 1.1965 (max= 1.6153), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:18,045 - root - INFO - Step 27220: lr=1.00E-05, loss= 1.1953 (max= 1.7011), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:18,045 - root - INFO - Step 27220: lr=1.00E-05, loss= 1.1953 
(max= 1.7011), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:18,045 - root - INFO - Step 27220: lr=1.00E-05, loss= 1.1953 (max= 1.7011), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:18,045 - root - INFO - Step 27220: lr=1.00E-05, loss= 1.1953 (max= 1.7011), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:18,045 - root - INFO - Step 27220: lr=1.00E-05, loss= 1.1953 (max= 1.7011), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:18,045 - root - INFO - Step 27220: lr=1.00E-05, loss= 1.1953 (max= 1.7011), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:18,045 - root - INFO - Step 27220: lr=1.00E-05, loss= 1.1953 (max= 1.7011), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:18,045 - root - INFO - Step 27220: lr=1.00E-05, loss= 1.1953 (max= 1.7011), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:33,999 - root - INFO - Step 27230: lr=1.00E-05, loss= 1.1857 (max= 1.6129), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:33,999 - root - INFO - Step 27230: lr=1.00E-05, loss= 1.1857 (max= 1.6129), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:33,999 - root - INFO - Step 27230: lr=1.00E-05, loss= 1.1857 (max= 1.6129), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:33,999 - root - INFO - Step 27230: lr=1.00E-05, loss= 1.1857 (max= 1.6129), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:33,999 - root 
- INFO - Step 27230: lr=1.00E-05, loss= 1.1857 (max= 1.6129), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:33,999 - root - INFO - Step 27230: lr=1.00E-05, loss= 1.1857 (max= 1.6129), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:33,999 - root - INFO - Step 27230: lr=1.00E-05, loss= 1.1857 (max= 1.6129), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:33,999 - root - INFO - Step 27230: lr=1.00E-05, loss= 1.1857 (max= 1.6129), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:38:49,920 - root - INFO - Step 27240: lr=1.00E-05, loss= 1.1691 (max= 1.5532), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:38:49,920 - root - INFO - Step 27240: lr=1.00E-05, loss= 1.1691 (max= 1.5532), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:38:49,920 - root - INFO - Step 27240: lr=1.00E-05, loss= 1.1691 (max= 1.5532), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:38:49,920 - root - INFO - Step 27240: lr=1.00E-05, loss= 1.1691 (max= 1.5532), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:38:49,920 - root - INFO - Step 27240: lr=1.00E-05, loss= 1.1691 (max= 1.5532), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:38:49,920 - root - INFO - Step 27240: lr=1.00E-05, loss= 1.1691 (max= 1.5532), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:38:49,920 - root - INFO - Step 27240: lr=1.00E-05, loss= 1.1691 (max= 1.5532), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:38:49,921 - root - INFO - Step 27240: lr=1.00E-05, loss= 1.1691 (max= 1.5532), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:05,822 - root - INFO - Step 27250: lr=1.00E-05, loss= 1.2329 (max= 1.8101), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:05,822 - root - INFO - Step 27250: lr=1.00E-05, loss= 1.2329 (max= 1.8101), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:05,822 - root - INFO - Step 27250: lr=1.00E-05, loss= 1.2329 (max= 1.8101), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:05,823 - root - INFO - Step 27250: lr=1.00E-05, loss= 1.2329 (max= 1.8101), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:05,823 - root - INFO - Step 27250: lr=1.00E-05, loss= 1.2329 (max= 1.8101), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:05,823 - root - INFO - Step 27250: lr=1.00E-05, loss= 1.2329 (max= 1.8101), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:05,823 - root - INFO - Step 27250: lr=1.00E-05, loss= 1.2329 (max= 1.8101), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:05,823 - root - INFO - Step 27250: lr=1.00E-05, loss= 1.2329 (max= 1.8101), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:21,771 - root - INFO - Step 27260: lr=1.00E-05, loss= 1.2060 (max= 1.5529), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:21,771 - root - INFO - Step 27260: lr=1.00E-05, loss= 1.2060 (max= 1.5529), tps=20550, mfu=42.82%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:21,771 - root - INFO - Step 27260: lr=1.00E-05, loss= 1.2060 (max= 1.5529), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:21,771 - root - INFO - Step 27260: lr=1.00E-05, loss= 1.2060 (max= 1.5529), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:21,771 - root - INFO - Step 27260: lr=1.00E-05, loss= 1.2060 (max= 1.5529), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:21,771 - root - INFO - Step 27260: lr=1.00E-05, loss= 1.2060 (max= 1.5529), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:21,771 - root - INFO - Step 27260: lr=1.00E-05, loss= 1.2060 (max= 1.5529), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:21,771 - root - INFO - Step 27260: lr=1.00E-05, loss= 1.2060 (max= 1.5529), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:37,744 - root - INFO - Step 27270: lr=1.00E-05, loss= 1.2067 (max= 1.5859), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:37,744 - root - INFO - Step 27270: lr=1.00E-05, loss= 1.2067 (max= 1.5859), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:37,744 - root - INFO - Step 27270: lr=1.00E-05, loss= 1.2067 (max= 1.5859), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:37,744 - root - INFO - Step 27270: lr=1.00E-05, loss= 1.2067 (max= 1.5859), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:37,744 - root - INFO - Step 27270: lr=1.00E-05, 
loss= 1.2067 (max= 1.5859), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:37,744 - root - INFO - Step 27270: lr=1.00E-05, loss= 1.2067 (max= 1.5859), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:37,744 - root - INFO - Step 27270: lr=1.00E-05, loss= 1.2067 (max= 1.5859), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:37,744 - root - INFO - Step 27270: lr=1.00E-05, loss= 1.2067 (max= 1.5859), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:39:53,708 - root - INFO - Step 27280: lr=1.00E-05, loss= 1.1757 (max= 1.5440), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:39:53,708 - root - INFO - Step 27280: lr=1.00E-05, loss= 1.1757 (max= 1.5440), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:39:53,708 - root - INFO - Step 27280: lr=1.00E-05, loss= 1.1757 (max= 1.5440), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:39:53,709 - root - INFO - Step 27280: lr=1.00E-05, loss= 1.1757 (max= 1.5440), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:39:53,709 - root - INFO - Step 27280: lr=1.00E-05, loss= 1.1757 (max= 1.5440), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:39:53,709 - root - INFO - Step 27280: lr=1.00E-05, loss= 1.1757 (max= 1.5440), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:39:53,709 - root - INFO - Step 27280: lr=1.00E-05, loss= 1.1757 (max= 1.5440), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
22:39:53,709 - root - INFO - Step 27280: lr=1.00E-05, loss= 1.1757 (max= 1.5440), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:40:09,649 - root - INFO - Step 27290: lr=1.00E-05, loss= 1.1995 (max= 1.8853), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:09,649 - root - INFO - Step 27290: lr=1.00E-05, loss= 1.1995 (max= 1.8853), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:09,649 - root - INFO - Step 27290: lr=1.00E-05, loss= 1.1995 (max= 1.8853), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:09,649 - root - INFO - Step 27290: lr=1.00E-05, loss= 1.1995 (max= 1.8853), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:09,649 - root - INFO - Step 27290: lr=1.00E-05, loss= 1.1995 (max= 1.8853), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:09,649 - root - INFO - Step 27290: lr=1.00E-05, loss= 1.1995 (max= 1.8853), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:09,649 - root - INFO - Step 27290: lr=1.00E-05, loss= 1.1995 (max= 1.8853), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:09,649 - root - INFO - Step 27290: lr=1.00E-05, loss= 1.1995 (max= 1.8853), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:25,602 - root - INFO - Step 27300: lr=1.00E-05, loss= 1.2116 (max= 1.8172), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:25,602 - root - INFO - Step 27300: lr=1.00E-05, loss= 1.2116 (max= 1.8172), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:25,602 - root - INFO - Step 27300: lr=1.00E-05, loss= 1.2116 (max= 1.8172), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:25,602 - root - INFO - Step 27300: lr=1.00E-05, loss= 1.2116 (max= 1.8172), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:25,603 - root - INFO - Step 27300: lr=1.00E-05, loss= 1.2116 (max= 1.8172), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:25,603 - root - INFO - Step 27300: lr=1.00E-05, loss= 1.2116 (max= 1.8172), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:25,603 - root - INFO - Step 27300: lr=1.00E-05, loss= 1.2116 (max= 1.8172), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:25,603 - root - INFO - Step 27300: lr=1.00E-05, loss= 1.2116 (max= 1.8172), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:41,519 - root - INFO - Step 27310: lr=1.00E-05, loss= 1.1898 (max= 1.4714), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:41,519 - root - INFO - Step 27310: lr=1.00E-05, loss= 1.1898 (max= 1.4714), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:41,520 - root - INFO - Step 27310: lr=1.00E-05, loss= 1.1898 (max= 1.4714), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:41,520 - root - INFO - Step 27310: lr=1.00E-05, loss= 1.1898 (max= 1.4714), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:41,520 - root - INFO - Step 27310: lr=1.00E-05, loss= 1.1898 (max= 1.4714), 
tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:41,520 - root - INFO - Step 27310: lr=1.00E-05, loss= 1.1898 (max= 1.4714), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:41,520 - root - INFO - Step 27310: lr=1.00E-05, loss= 1.1898 (max= 1.4714), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:41,520 - root - INFO - Step 27310: lr=1.00E-05, loss= 1.1898 (max= 1.4714), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:40:57,423 - root - INFO - Step 27320: lr=1.00E-05, loss= 1.2022 (max= 1.5329), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:40:57,423 - root - INFO - Step 27320: lr=1.00E-05, loss= 1.2022 (max= 1.5329), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:40:57,423 - root - INFO - Step 27320: lr=1.00E-05, loss= 1.2022 (max= 1.5329), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:40:57,424 - root - INFO - Step 27320: lr=1.00E-05, loss= 1.2022 (max= 1.5329), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:40:57,424 - root - INFO - Step 27320: lr=1.00E-05, loss= 1.2022 (max= 1.5329), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:40:57,424 - root - INFO - Step 27320: lr=1.00E-05, loss= 1.2022 (max= 1.5329), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:40:57,424 - root - INFO - Step 27320: lr=1.00E-05, loss= 1.2022 (max= 1.5329), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:40:57,424 - root - INFO - Step 
27320: lr=1.00E-05, loss= 1.2022 (max= 1.5329), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:41:13,398 - root - INFO - Step 27330: lr=1.00E-05, loss= 1.1956 (max= 1.5358), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:13,398 - root - INFO - Step 27330: lr=1.00E-05, loss= 1.1956 (max= 1.5358), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:13,398 - root - INFO - Step 27330: lr=1.00E-05, loss= 1.1956 (max= 1.5358), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:13,398 - root - INFO - Step 27330: lr=1.00E-05, loss= 1.1956 (max= 1.5358), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:13,398 - root - INFO - Step 27330: lr=1.00E-05, loss= 1.1956 (max= 1.5358), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:13,398 - root - INFO - Step 27330: lr=1.00E-05, loss= 1.1956 (max= 1.5358), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:13,398 - root - INFO - Step 27330: lr=1.00E-05, loss= 1.1956 (max= 1.5358), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:13,399 - root - INFO - Step 27330: lr=1.00E-05, loss= 1.1956 (max= 1.5358), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:29,357 - root - INFO - Step 27340: lr=1.00E-05, loss= 1.1936 (max= 1.5879), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:29,357 - root - INFO - Step 27340: lr=1.00E-05, loss= 1.1936 (max= 1.5879), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:41:29,357 - root - INFO - Step 27340: lr=1.00E-05, loss= 1.1936 (max= 1.5879), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:29,357 - root - INFO - Step 27340: lr=1.00E-05, loss= 1.1936 (max= 1.5879), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:29,357 - root - INFO - Step 27340: lr=1.00E-05, loss= 1.1936 (max= 1.5879), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:29,357 - root - INFO - Step 27340: lr=1.00E-05, loss= 1.1936 (max= 1.5879), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:29,357 - root - INFO - Step 27340: lr=1.00E-05, loss= 1.1936 (max= 1.5879), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:29,357 - root - INFO - Step 27340: lr=1.00E-05, loss= 1.1936 (max= 1.5879), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:45,301 - root - INFO - Step 27350: lr=1.00E-05, loss= 1.1700 (max= 1.5871), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:45,301 - root - INFO - Step 27350: lr=1.00E-05, loss= 1.1700 (max= 1.5871), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:45,301 - root - INFO - Step 27350: lr=1.00E-05, loss= 1.1700 (max= 1.5871), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:45,302 - root - INFO - Step 27350: lr=1.00E-05, loss= 1.1700 (max= 1.5871), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:45,302 - root - INFO - Step 27350: lr=1.00E-05, loss= 1.1700 (max= 1.5871), tps=20555, mfu=42.83%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:45,302 - root - INFO - Step 27350: lr=1.00E-05, loss= 1.1700 (max= 1.5871), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:45,302 - root - INFO - Step 27350: lr=1.00E-05, loss= 1.1700 (max= 1.5871), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:41:45,302 - root - INFO - Step 27350: lr=1.00E-05, loss= 1.1700 (max= 1.5871), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:01,263 - root - INFO - Step 27360: lr=1.00E-05, loss= 1.1757 (max= 1.5541), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:01,264 - root - INFO - Step 27360: lr=1.00E-05, loss= 1.1757 (max= 1.5541), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:01,264 - root - INFO - Step 27360: lr=1.00E-05, loss= 1.1757 (max= 1.5541), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:01,264 - root - INFO - Step 27360: lr=1.00E-05, loss= 1.1757 (max= 1.5541), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:01,264 - root - INFO - Step 27360: lr=1.00E-05, loss= 1.1757 (max= 1.5541), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:01,264 - root - INFO - Step 27360: lr=1.00E-05, loss= 1.1757 (max= 1.5541), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:01,264 - root - INFO - Step 27360: lr=1.00E-05, loss= 1.1757 (max= 1.5541), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:01,264 - root - INFO - Step 27360: lr=1.00E-05, loss= 1.1757 
(max= 1.5541), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:17,179 - root - INFO - Step 27370: lr=1.00E-05, loss= 1.1972 (max= 1.8069), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:17,180 - root - INFO - Step 27370: lr=1.00E-05, loss= 1.1972 (max= 1.8069), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:17,180 - root - INFO - Step 27370: lr=1.00E-05, loss= 1.1972 (max= 1.8069), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:17,180 - root - INFO - Step 27370: lr=1.00E-05, loss= 1.1972 (max= 1.8069), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:17,180 - root - INFO - Step 27370: lr=1.00E-05, loss= 1.1972 (max= 1.8069), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:17,180 - root - INFO - Step 27370: lr=1.00E-05, loss= 1.1972 (max= 1.8069), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:17,180 - root - INFO - Step 27370: lr=1.00E-05, loss= 1.1972 (max= 1.8069), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:17,180 - root - INFO - Step 27370: lr=1.00E-05, loss= 1.1972 (max= 1.8069), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:33,124 - root - INFO - Step 27380: lr=1.00E-05, loss= 1.1823 (max= 1.5732), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:33,124 - root - INFO - Step 27380: lr=1.00E-05, loss= 1.1823 (max= 1.5732), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:33,124 - root 
- INFO - Step 27380: lr=1.00E-05, loss= 1.1823 (max= 1.5732), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:33,124 - root - INFO - Step 27380: lr=1.00E-05, loss= 1.1823 (max= 1.5732), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:33,124 - root - INFO - Step 27380: lr=1.00E-05, loss= 1.1823 (max= 1.5732), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:33,124 - root - INFO - Step 27380: lr=1.00E-05, loss= 1.1823 (max= 1.5732), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:33,124 - root - INFO - Step 27380: lr=1.00E-05, loss= 1.1823 (max= 1.5732), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:33,124 - root - INFO - Step 27380: lr=1.00E-05, loss= 1.1823 (max= 1.5732), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:49,082 - root - INFO - Step 27390: lr=1.00E-05, loss= 1.1919 (max= 1.6096), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:49,082 - root - INFO - Step 27390: lr=1.00E-05, loss= 1.1919 (max= 1.6096), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:49,082 - root - INFO - Step 27390: lr=1.00E-05, loss= 1.1919 (max= 1.6096), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:49,082 - root - INFO - Step 27390: lr=1.00E-05, loss= 1.1919 (max= 1.6096), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:49,082 - root - INFO - Step 27390: lr=1.00E-05, loss= 1.1919 (max= 1.6096), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:42:49,082 - root - INFO - Step 27390: lr=1.00E-05, loss= 1.1919 (max= 1.6096), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:49,082 - root - INFO - Step 27390: lr=1.00E-05, loss= 1.1919 (max= 1.6096), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:42:49,082 - root - INFO - Step 27390: lr=1.00E-05, loss= 1.1919 (max= 1.6096), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:05,045 - root - INFO - Step 27400: lr=1.00E-05, loss= 1.2120 (max= 1.7702), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:05,045 - root - INFO - Step 27400: lr=1.00E-05, loss= 1.2120 (max= 1.7702), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:05,045 - root - INFO - Step 27400: lr=1.00E-05, loss= 1.2120 (max= 1.7702), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:05,045 - root - INFO - Step 27400: lr=1.00E-05, loss= 1.2120 (max= 1.7702), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:05,045 - root - INFO - Step 27400: lr=1.00E-05, loss= 1.2120 (max= 1.7702), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:05,045 - root - INFO - Step 27400: lr=1.00E-05, loss= 1.2120 (max= 1.7702), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:05,045 - root - INFO - Step 27400: lr=1.00E-05, loss= 1.2120 (max= 1.7702), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:05,045 - root - INFO - Step 27400: lr=1.00E-05, loss= 1.2120 (max= 1.7702), tps=20532, mfu=42.78%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:20,949 - root - INFO - Step 27410: lr=1.00E-05, loss= 1.1899 (max= 1.6504), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:20,949 - root - INFO - Step 27410: lr=1.00E-05, loss= 1.1899 (max= 1.6504), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:20,949 - root - INFO - Step 27410: lr=1.00E-05, loss= 1.1899 (max= 1.6504), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:20,949 - root - INFO - Step 27410: lr=1.00E-05, loss= 1.1899 (max= 1.6504), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:20,949 - root - INFO - Step 27410: lr=1.00E-05, loss= 1.1899 (max= 1.6504), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:20,949 - root - INFO - Step 27410: lr=1.00E-05, loss= 1.1899 (max= 1.6504), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:20,949 - root - INFO - Step 27410: lr=1.00E-05, loss= 1.1899 (max= 1.6504), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:20,949 - root - INFO - Step 27410: lr=1.00E-05, loss= 1.1899 (max= 1.6504), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:36,906 - root - INFO - Step 27420: lr=1.00E-05, loss= 1.2194 (max= 1.5588), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:36,906 - root - INFO - Step 27420: lr=1.00E-05, loss= 1.2194 (max= 1.5588), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:36,906 - root - INFO - Step 27420: lr=1.00E-05, 
loss= 1.2194 (max= 1.5588), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:36,906 - root - INFO - Step 27420: lr=1.00E-05, loss= 1.2194 (max= 1.5588), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:36,906 - root - INFO - Step 27420: lr=1.00E-05, loss= 1.2194 (max= 1.5588), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:36,906 - root - INFO - Step 27420: lr=1.00E-05, loss= 1.2194 (max= 1.5588), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:36,906 - root - INFO - Step 27420: lr=1.00E-05, loss= 1.2194 (max= 1.5588), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:36,907 - root - INFO - Step 27420: lr=1.00E-05, loss= 1.2194 (max= 1.5588), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:52,869 - root - INFO - Step 27430: lr=1.00E-05, loss= 1.1860 (max= 1.5693), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:52,869 - root - INFO - Step 27430: lr=1.00E-05, loss= 1.1860 (max= 1.5693), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:52,869 - root - INFO - Step 27430: lr=1.00E-05, loss= 1.1860 (max= 1.5693), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:52,869 - root - INFO - Step 27430: lr=1.00E-05, loss= 1.1860 (max= 1.5693), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:52,870 - root - INFO - Step 27430: lr=1.00E-05, loss= 1.1860 (max= 1.5693), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
22:43:52,870 - root - INFO - Step 27430: lr=1.00E-05, loss= 1.1860 (max= 1.5693), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:52,870 - root - INFO - Step 27430: lr=1.00E-05, loss= 1.1860 (max= 1.5693), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:43:52,870 - root - INFO - Step 27430: lr=1.00E-05, loss= 1.1860 (max= 1.5693), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:08,779 - root - INFO - Step 27440: lr=1.00E-05, loss= 1.1950 (max= 1.8322), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:08,779 - root - INFO - Step 27440: lr=1.00E-05, loss= 1.1950 (max= 1.8322), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:08,779 - root - INFO - Step 27440: lr=1.00E-05, loss= 1.1950 (max= 1.8322), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:08,779 - root - INFO - Step 27440: lr=1.00E-05, loss= 1.1950 (max= 1.8322), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:08,779 - root - INFO - Step 27440: lr=1.00E-05, loss= 1.1950 (max= 1.8322), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:08,779 - root - INFO - Step 27440: lr=1.00E-05, loss= 1.1950 (max= 1.8322), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:08,779 - root - INFO - Step 27440: lr=1.00E-05, loss= 1.1950 (max= 1.8322), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:08,780 - root - INFO - Step 27440: lr=1.00E-05, loss= 1.1950 (max= 1.8322), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:24,733 - root - INFO - Step 27450: lr=1.00E-05, loss= 1.1718 (max= 1.5076), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:24,733 - root - INFO - Step 27450: lr=1.00E-05, loss= 1.1718 (max= 1.5076), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:24,733 - root - INFO - Step 27450: lr=1.00E-05, loss= 1.1718 (max= 1.5076), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:24,733 - root - INFO - Step 27450: lr=1.00E-05, loss= 1.1718 (max= 1.5076), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:24,733 - root - INFO - Step 27450: lr=1.00E-05, loss= 1.1718 (max= 1.5076), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:24,733 - root - INFO - Step 27450: lr=1.00E-05, loss= 1.1718 (max= 1.5076), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:24,733 - root - INFO - Step 27450: lr=1.00E-05, loss= 1.1718 (max= 1.5076), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:24,734 - root - INFO - Step 27450: lr=1.00E-05, loss= 1.1718 (max= 1.5076), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:40,670 - root - INFO - Step 27460: lr=1.00E-05, loss= 1.1909 (max= 1.6114), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:40,670 - root - INFO - Step 27460: lr=1.00E-05, loss= 1.1909 (max= 1.6114), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:40,670 - root - INFO - Step 27460: lr=1.00E-05, loss= 1.1909 (max= 1.6114), 
tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:40,670 - root - INFO - Step 27460: lr=1.00E-05, loss= 1.1909 (max= 1.6114), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:40,670 - root - INFO - Step 27460: lr=1.00E-05, loss= 1.1909 (max= 1.6114), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:40,670 - root - INFO - Step 27460: lr=1.00E-05, loss= 1.1909 (max= 1.6114), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:40,670 - root - INFO - Step 27460: lr=1.00E-05, loss= 1.1909 (max= 1.6114), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:40,670 - root - INFO - Step 27460: lr=1.00E-05, loss= 1.1909 (max= 1.6114), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:56,631 - root - INFO - Step 27470: lr=1.00E-05, loss= 1.1793 (max= 1.6988), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:56,631 - root - INFO - Step 27470: lr=1.00E-05, loss= 1.1793 (max= 1.6988), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:56,631 - root - INFO - Step 27470: lr=1.00E-05, loss= 1.1793 (max= 1.6988), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:56,631 - root - INFO - Step 27470: lr=1.00E-05, loss= 1.1793 (max= 1.6988), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:56,631 - root - INFO - Step 27470: lr=1.00E-05, loss= 1.1793 (max= 1.6988), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:56,631 - root - INFO - Step 
27470: lr=1.00E-05, loss= 1.1793 (max= 1.6988), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:56,631 - root - INFO - Step 27470: lr=1.00E-05, loss= 1.1793 (max= 1.6988), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:44:56,631 - root - INFO - Step 27470: lr=1.00E-05, loss= 1.1793 (max= 1.6988), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:12,539 - root - INFO - Step 27480: lr=1.00E-05, loss= 1.1689 (max= 1.5692), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:12,539 - root - INFO - Step 27480: lr=1.00E-05, loss= 1.1689 (max= 1.5692), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:12,539 - root - INFO - Step 27480: lr=1.00E-05, loss= 1.1689 (max= 1.5692), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:12,539 - root - INFO - Step 27480: lr=1.00E-05, loss= 1.1689 (max= 1.5692), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:12,539 - root - INFO - Step 27480: lr=1.00E-05, loss= 1.1689 (max= 1.5692), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:12,539 - root - INFO - Step 27480: lr=1.00E-05, loss= 1.1689 (max= 1.5692), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:12,539 - root - INFO - Step 27480: lr=1.00E-05, loss= 1.1689 (max= 1.5692), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:12,539 - root - INFO - Step 27480: lr=1.00E-05, loss= 1.1689 (max= 1.5692), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:45:28,528 - root - INFO - Step 27490: lr=1.00E-05, loss= 1.1724 (max= 1.6571), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:28,528 - root - INFO - Step 27490: lr=1.00E-05, loss= 1.1724 (max= 1.6571), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:28,528 - root - INFO - Step 27490: lr=1.00E-05, loss= 1.1724 (max= 1.6571), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:28,528 - root - INFO - Step 27490: lr=1.00E-05, loss= 1.1724 (max= 1.6571), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:28,528 - root - INFO - Step 27490: lr=1.00E-05, loss= 1.1724 (max= 1.6571), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:28,528 - root - INFO - Step 27490: lr=1.00E-05, loss= 1.1724 (max= 1.6571), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:28,528 - root - INFO - Step 27490: lr=1.00E-05, loss= 1.1724 (max= 1.6571), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:28,528 - root - INFO - Step 27490: lr=1.00E-05, loss= 1.1724 (max= 1.6571), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:44,475 - root - INFO - Step 27500: lr=1.00E-05, loss= 1.1936 (max= 1.5336), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:44,475 - root - INFO - Step 27500: lr=1.00E-05, loss= 1.1936 (max= 1.5336), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:44,475 - root - INFO - Step 27500: lr=1.00E-05, loss= 1.1936 (max= 1.5336), tps=20553, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:44,475 - root - INFO - Step 27500: lr=1.00E-05, loss= 1.1936 (max= 1.5336), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:44,475 - root - INFO - Step 27500: lr=1.00E-05, loss= 1.1936 (max= 1.5336), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:44,475 - root - INFO - Step 27500: lr=1.00E-05, loss= 1.1936 (max= 1.5336), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:44,475 - root - INFO - Step 27500: lr=1.00E-05, loss= 1.1936 (max= 1.5336), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:45:44,475 - root - INFO - Step 27500: lr=1.00E-05, loss= 1.1936 (max= 1.5336), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:00,446 - root - INFO - Step 27510: lr=1.00E-05, loss= 1.1884 (max= 1.7381), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:00,446 - root - INFO - Step 27510: lr=1.00E-05, loss= 1.1884 (max= 1.7381), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:00,446 - root - INFO - Step 27510: lr=1.00E-05, loss= 1.1884 (max= 1.7381), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:00,446 - root - INFO - Step 27510: lr=1.00E-05, loss= 1.1884 (max= 1.7381), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:00,446 - root - INFO - Step 27510: lr=1.00E-05, loss= 1.1884 (max= 1.7381), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:00,446 - root - INFO - Step 27510: lr=1.00E-05, loss= 1.1884 
(max= 1.7381), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:00,446 - root - INFO - Step 27510: lr=1.00E-05, loss= 1.1884 (max= 1.7381), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:00,446 - root - INFO - Step 27510: lr=1.00E-05, loss= 1.1884 (max= 1.7381), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:16,358 - root - INFO - Step 27520: lr=1.00E-05, loss= 1.1920 (max= 1.5624), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:46:16,358 - root - INFO - Step 27520: lr=1.00E-05, loss= 1.1920 (max= 1.5624), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:46:16,358 - root - INFO - Step 27520: lr=1.00E-05, loss= 1.1920 (max= 1.5624), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:46:16,358 - root - INFO - Step 27520: lr=1.00E-05, loss= 1.1920 (max= 1.5624), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:46:16,359 - root - INFO - Step 27520: lr=1.00E-05, loss= 1.1920 (max= 1.5624), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:46:16,359 - root - INFO - Step 27520: lr=1.00E-05, loss= 1.1920 (max= 1.5624), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:46:16,359 - root - INFO - Step 27520: lr=1.00E-05, loss= 1.1920 (max= 1.5624), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:46:16,359 - root - INFO - Step 27520: lr=1.00E-05, loss= 1.1920 (max= 1.5624), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:46:32,287 - root 
- INFO - Step 27530: lr=1.00E-05, loss= 1.1907 (max= 1.8868), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:32,287 - root - INFO - Step 27530: lr=1.00E-05, loss= 1.1907 (max= 1.8868), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:32,287 - root - INFO - Step 27530: lr=1.00E-05, loss= 1.1907 (max= 1.8868), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:32,287 - root - INFO - Step 27530: lr=1.00E-05, loss= 1.1907 (max= 1.8868), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:32,287 - root - INFO - Step 27530: lr=1.00E-05, loss= 1.1907 (max= 1.8868), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:32,287 - root - INFO - Step 27530: lr=1.00E-05, loss= 1.1907 (max= 1.8868), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:32,287 - root - INFO - Step 27530: lr=1.00E-05, loss= 1.1907 (max= 1.8868), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:32,287 - root - INFO - Step 27530: lr=1.00E-05, loss= 1.1907 (max= 1.8868), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:48,229 - root - INFO - Step 27540: lr=1.00E-05, loss= 1.1728 (max= 1.8552), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:48,229 - root - INFO - Step 27540: lr=1.00E-05, loss= 1.1728 (max= 1.8552), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:48,229 - root - INFO - Step 27540: lr=1.00E-05, loss= 1.1728 (max= 1.8552), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:46:48,229 - root - INFO - Step 27540: lr=1.00E-05, loss= 1.1728 (max= 1.8552), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:48,229 - root - INFO - Step 27540: lr=1.00E-05, loss= 1.1728 (max= 1.8552), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:48,229 - root - INFO - Step 27540: lr=1.00E-05, loss= 1.1728 (max= 1.8552), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:48,229 - root - INFO - Step 27540: lr=1.00E-05, loss= 1.1728 (max= 1.8552), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:46:48,229 - root - INFO - Step 27540: lr=1.00E-05, loss= 1.1728 (max= 1.8552), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:47:04,156 - root - INFO - Step 27550: lr=1.00E-05, loss= 1.2169 (max= 1.8850), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:04,156 - root - INFO - Step 27550: lr=1.00E-05, loss= 1.2169 (max= 1.8850), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:04,156 - root - INFO - Step 27550: lr=1.00E-05, loss= 1.2169 (max= 1.8850), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:04,156 - root - INFO - Step 27550: lr=1.00E-05, loss= 1.2169 (max= 1.8850), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:04,156 - root - INFO - Step 27550: lr=1.00E-05, loss= 1.2169 (max= 1.8850), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:04,157 - root - INFO - Step 27550: lr=1.00E-05, loss= 1.2169 (max= 1.8850), tps=20578, mfu=42.87%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:04,157 - root - INFO - Step 27550: lr=1.00E-05, loss= 1.2169 (max= 1.8850), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:04,157 - root - INFO - Step 27550: lr=1.00E-05, loss= 1.2169 (max= 1.8850), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:20,127 - root - INFO - Step 27560: lr=1.00E-05, loss= 1.1848 (max= 1.7201), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:47:20,127 - root - INFO - Step 27560: lr=1.00E-05, loss= 1.1848 (max= 1.7201), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:47:20,127 - root - INFO - Step 27560: lr=1.00E-05, loss= 1.1848 (max= 1.7201), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:47:20,127 - root - INFO - Step 27560: lr=1.00E-05, loss= 1.1848 (max= 1.7201), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:47:20,127 - root - INFO - Step 27560: lr=1.00E-05, loss= 1.1848 (max= 1.7201), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:47:20,127 - root - INFO - Step 27560: lr=1.00E-05, loss= 1.1848 (max= 1.7201), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:47:20,127 - root - INFO - Step 27560: lr=1.00E-05, loss= 1.1848 (max= 1.7201), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:47:20,127 - root - INFO - Step 27560: lr=1.00E-05, loss= 1.1848 (max= 1.7201), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:47:36,028 - root - INFO - Step 27570: lr=1.00E-05, 
loss= 1.2087 (max= 1.6653), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:36,028 - root - INFO - Step 27570: lr=1.00E-05, loss= 1.2087 (max= 1.6653), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:36,028 - root - INFO - Step 27570: lr=1.00E-05, loss= 1.2087 (max= 1.6653), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:36,028 - root - INFO - Step 27570: lr=1.00E-05, loss= 1.2087 (max= 1.6653), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:36,029 - root - INFO - Step 27570: lr=1.00E-05, loss= 1.2087 (max= 1.6653), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:36,029 - root - INFO - Step 27570: lr=1.00E-05, loss= 1.2087 (max= 1.6653), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:36,029 - root - INFO - Step 27570: lr=1.00E-05, loss= 1.2087 (max= 1.6653), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:36,029 - root - INFO - Step 27570: lr=1.00E-05, loss= 1.2087 (max= 1.6653), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:51,980 - root - INFO - Step 27580: lr=1.00E-05, loss= 1.2132 (max= 1.7554), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:51,980 - root - INFO - Step 27580: lr=1.00E-05, loss= 1.2132 (max= 1.7554), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:51,980 - root - INFO - Step 27580: lr=1.00E-05, loss= 1.2132 (max= 1.7554), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
22:47:51,980 - root - INFO - Step 27580: lr=1.00E-05, loss= 1.2132 (max= 1.7554), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:51,980 - root - INFO - Step 27580: lr=1.00E-05, loss= 1.2132 (max= 1.7554), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:51,980 - root - INFO - Step 27580: lr=1.00E-05, loss= 1.2132 (max= 1.7554), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:51,980 - root - INFO - Step 27580: lr=1.00E-05, loss= 1.2132 (max= 1.7554), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:47:51,980 - root - INFO - Step 27580: lr=1.00E-05, loss= 1.2132 (max= 1.7554), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:07,914 - root - INFO - Step 27590: lr=1.00E-05, loss= 1.1945 (max= 1.6111), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:07,914 - root - INFO - Step 27590: lr=1.00E-05, loss= 1.1945 (max= 1.6111), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:07,914 - root - INFO - Step 27590: lr=1.00E-05, loss= 1.1945 (max= 1.6111), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:07,914 - root - INFO - Step 27590: lr=1.00E-05, loss= 1.1945 (max= 1.6111), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:07,914 - root - INFO - Step 27590: lr=1.00E-05, loss= 1.1945 (max= 1.6111), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:07,914 - root - INFO - Step 27590: lr=1.00E-05, loss= 1.1945 (max= 1.6111), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:07,914 - root - INFO - Step 27590: lr=1.00E-05, loss= 1.1945 (max= 1.6111), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:07,914 - root - INFO - Step 27590: lr=1.00E-05, loss= 1.1945 (max= 1.6111), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:23,885 - root - INFO - Step 27600: lr=1.00E-05, loss= 1.2150 (max= 1.5845), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:23,885 - root - INFO - Step 27600: lr=1.00E-05, loss= 1.2150 (max= 1.5845), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:23,885 - root - INFO - Step 27600: lr=1.00E-05, loss= 1.2150 (max= 1.5845), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:23,885 - root - INFO - Step 27600: lr=1.00E-05, loss= 1.2150 (max= 1.5845), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:23,885 - root - INFO - Step 27600: lr=1.00E-05, loss= 1.2150 (max= 1.5845), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:23,885 - root - INFO - Step 27600: lr=1.00E-05, loss= 1.2150 (max= 1.5845), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:23,885 - root - INFO - Step 27600: lr=1.00E-05, loss= 1.2150 (max= 1.5845), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:23,885 - root - INFO - Step 27600: lr=1.00E-05, loss= 1.2150 (max= 1.5845), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:39,815 - root - INFO - Step 27610: lr=1.00E-05, loss= 1.1729 (max= 1.7800), 
tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:39,815 - root - INFO - Step 27610: lr=1.00E-05, loss= 1.1729 (max= 1.7800), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:39,815 - root - INFO - Step 27610: lr=1.00E-05, loss= 1.1729 (max= 1.7800), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:39,815 - root - INFO - Step 27610: lr=1.00E-05, loss= 1.1729 (max= 1.7800), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:39,815 - root - INFO - Step 27610: lr=1.00E-05, loss= 1.1729 (max= 1.7800), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:39,815 - root - INFO - Step 27610: lr=1.00E-05, loss= 1.1729 (max= 1.7800), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:39,815 - root - INFO - Step 27610: lr=1.00E-05, loss= 1.1729 (max= 1.7800), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:39,815 - root - INFO - Step 27610: lr=1.00E-05, loss= 1.1729 (max= 1.7800), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:48:55,731 - root - INFO - Step 27620: lr=1.00E-05, loss= 1.2043 (max= 1.6153), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:55,731 - root - INFO - Step 27620: lr=1.00E-05, loss= 1.2043 (max= 1.6153), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:55,731 - root - INFO - Step 27620: lr=1.00E-05, loss= 1.2043 (max= 1.6153), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:55,731 - root - INFO - Step 
27620: lr=1.00E-05, loss= 1.2043 (max= 1.6153), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:55,731 - root - INFO - Step 27620: lr=1.00E-05, loss= 1.2043 (max= 1.6153), tps=20593, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:55,731 - root - INFO - Step 27620: lr=1.00E-05, loss= 1.2043 (max= 1.6153), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:55,731 - root - INFO - Step 27620: lr=1.00E-05, loss= 1.2043 (max= 1.6153), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:48:55,731 - root - INFO - Step 27620: lr=1.00E-05, loss= 1.2043 (max= 1.6153), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:49:11,676 - root - INFO - Step 27630: lr=1.00E-05, loss= 1.1834 (max= 1.6436), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:11,676 - root - INFO - Step 27630: lr=1.00E-05, loss= 1.1834 (max= 1.6436), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:11,676 - root - INFO - Step 27630: lr=1.00E-05, loss= 1.1834 (max= 1.6436), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:11,676 - root - INFO - Step 27630: lr=1.00E-05, loss= 1.1834 (max= 1.6436), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:11,676 - root - INFO - Step 27630: lr=1.00E-05, loss= 1.1834 (max= 1.6436), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:11,676 - root - INFO - Step 27630: lr=1.00E-05, loss= 1.1834 (max= 1.6436), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 22:49:11,676 - root - INFO - Step 27630: lr=1.00E-05, loss= 1.1834 (max= 1.6436), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:11,676 - root - INFO - Step 27630: lr=1.00E-05, loss= 1.1834 (max= 1.6436), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:27,603 - root - INFO - Step 27640: lr=1.00E-05, loss= 1.2098 (max= 1.6472), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:49:27,603 - root - INFO - Step 27640: lr=1.00E-05, loss= 1.2098 (max= 1.6472), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:49:27,603 - root - INFO - Step 27640: lr=1.00E-05, loss= 1.2098 (max= 1.6472), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:49:27,603 - root - INFO - Step 27640: lr=1.00E-05, loss= 1.2098 (max= 1.6472), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:49:27,604 - root - INFO - Step 27640: lr=1.00E-05, loss= 1.2098 (max= 1.6472), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:49:27,604 - root - INFO - Step 27640: lr=1.00E-05, loss= 1.2098 (max= 1.6472), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:49:27,604 - root - INFO - Step 27640: lr=1.00E-05, loss= 1.2098 (max= 1.6472), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:49:27,604 - root - INFO - Step 27640: lr=1.00E-05, loss= 1.2098 (max= 1.6472), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:49:43,532 - root - INFO - Step 27650: lr=1.00E-05, loss= 1.1512 (max= 1.4061), tps=20576, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:43,532 - root - INFO - Step 27650: lr=1.00E-05, loss= 1.1512 (max= 1.4061), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:43,532 - root - INFO - Step 27650: lr=1.00E-05, loss= 1.1512 (max= 1.4061), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:43,532 - root - INFO - Step 27650: lr=1.00E-05, loss= 1.1512 (max= 1.4061), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:43,532 - root - INFO - Step 27650: lr=1.00E-05, loss= 1.1512 (max= 1.4061), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:43,532 - root - INFO - Step 27650: lr=1.00E-05, loss= 1.1512 (max= 1.4061), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:43,532 - root - INFO - Step 27650: lr=1.00E-05, loss= 1.1512 (max= 1.4061), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:43,532 - root - INFO - Step 27650: lr=1.00E-05, loss= 1.1512 (max= 1.4061), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:59,443 - root - INFO - Step 27660: lr=1.00E-05, loss= 1.1734 (max= 1.6520), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:59,443 - root - INFO - Step 27660: lr=1.00E-05, loss= 1.1734 (max= 1.6520), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:59,443 - root - INFO - Step 27660: lr=1.00E-05, loss= 1.1734 (max= 1.6520), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:59,443 - root - INFO - Step 27660: lr=1.00E-05, loss= 1.1734 
(max= 1.6520), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:59,443 - root - INFO - Step 27660: lr=1.00E-05, loss= 1.1734 (max= 1.6520), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:59,443 - root - INFO - Step 27660: lr=1.00E-05, loss= 1.1734 (max= 1.6520), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:59,443 - root - INFO - Step 27660: lr=1.00E-05, loss= 1.1734 (max= 1.6520), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:49:59,444 - root - INFO - Step 27660: lr=1.00E-05, loss= 1.1734 (max= 1.6520), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:15,436 - root - INFO - Step 27670: lr=1.00E-05, loss= 1.1901 (max= 1.5031), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:15,436 - root - INFO - Step 27670: lr=1.00E-05, loss= 1.1901 (max= 1.5031), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:15,436 - root - INFO - Step 27670: lr=1.00E-05, loss= 1.1901 (max= 1.5031), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:15,436 - root - INFO - Step 27670: lr=1.00E-05, loss= 1.1901 (max= 1.5031), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:15,436 - root - INFO - Step 27670: lr=1.00E-05, loss= 1.1901 (max= 1.5031), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:15,436 - root - INFO - Step 27670: lr=1.00E-05, loss= 1.1901 (max= 1.5031), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:15,436 - root 
- INFO - Step 27670: lr=1.00E-05, loss= 1.1901 (max= 1.5031), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:15,436 - root - INFO - Step 27670: lr=1.00E-05, loss= 1.1901 (max= 1.5031), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:31,386 - root - INFO - Step 27680: lr=1.00E-05, loss= 1.2008 (max= 1.6074), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:31,386 - root - INFO - Step 27680: lr=1.00E-05, loss= 1.2008 (max= 1.6074), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:31,386 - root - INFO - Step 27680: lr=1.00E-05, loss= 1.2008 (max= 1.6074), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:31,386 - root - INFO - Step 27680: lr=1.00E-05, loss= 1.2008 (max= 1.6074), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:31,386 - root - INFO - Step 27680: lr=1.00E-05, loss= 1.2008 (max= 1.6074), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:31,386 - root - INFO - Step 27680: lr=1.00E-05, loss= 1.2008 (max= 1.6074), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:31,386 - root - INFO - Step 27680: lr=1.00E-05, loss= 1.2008 (max= 1.6074), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:31,386 - root - INFO - Step 27680: lr=1.00E-05, loss= 1.2008 (max= 1.6074), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:47,343 - root - INFO - Step 27690: lr=1.00E-05, loss= 1.1521 (max= 1.6547), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:50:47,343 - root - INFO - Step 27690: lr=1.00E-05, loss= 1.1521 (max= 1.6547), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:47,343 - root - INFO - Step 27690: lr=1.00E-05, loss= 1.1521 (max= 1.6547), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:47,343 - root - INFO - Step 27690: lr=1.00E-05, loss= 1.1521 (max= 1.6547), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:47,343 - root - INFO - Step 27690: lr=1.00E-05, loss= 1.1521 (max= 1.6547), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:47,343 - root - INFO - Step 27690: lr=1.00E-05, loss= 1.1521 (max= 1.6547), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:47,343 - root - INFO - Step 27690: lr=1.00E-05, loss= 1.1521 (max= 1.6547), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:50:47,343 - root - INFO - Step 27690: lr=1.00E-05, loss= 1.1521 (max= 1.6547), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:03,278 - root - INFO - Step 27700: lr=1.00E-05, loss= 1.1683 (max= 1.8834), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:03,278 - root - INFO - Step 27700: lr=1.00E-05, loss= 1.1683 (max= 1.8834), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:03,278 - root - INFO - Step 27700: lr=1.00E-05, loss= 1.1683 (max= 1.8834), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:03,278 - root - INFO - Step 27700: lr=1.00E-05, loss= 1.1683 (max= 1.8834), tps=20567, mfu=42.85%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:03,278 - root - INFO - Step 27700: lr=1.00E-05, loss= 1.1683 (max= 1.8834), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:03,278 - root - INFO - Step 27700: lr=1.00E-05, loss= 1.1683 (max= 1.8834), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:03,278 - root - INFO - Step 27700: lr=1.00E-05, loss= 1.1683 (max= 1.8834), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:03,279 - root - INFO - Step 27700: lr=1.00E-05, loss= 1.1683 (max= 1.8834), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:19,200 - root - INFO - Step 27710: lr=1.00E-05, loss= 1.2140 (max= 1.5718), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:19,200 - root - INFO - Step 27710: lr=1.00E-05, loss= 1.2140 (max= 1.5718), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:19,200 - root - INFO - Step 27710: lr=1.00E-05, loss= 1.2140 (max= 1.5718), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:19,201 - root - INFO - Step 27710: lr=1.00E-05, loss= 1.2140 (max= 1.5718), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:19,201 - root - INFO - Step 27710: lr=1.00E-05, loss= 1.2140 (max= 1.5718), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:19,201 - root - INFO - Step 27710: lr=1.00E-05, loss= 1.2140 (max= 1.5718), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:19,201 - root - INFO - Step 27710: lr=1.00E-05, 
loss= 1.2140 (max= 1.5718), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:19,201 - root - INFO - Step 27710: lr=1.00E-05, loss= 1.2140 (max= 1.5718), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:35,190 - root - INFO - Step 27720: lr=1.00E-05, loss= 1.1997 (max= 1.7264), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:35,190 - root - INFO - Step 27720: lr=1.00E-05, loss= 1.1997 (max= 1.7264), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:35,191 - root - INFO - Step 27720: lr=1.00E-05, loss= 1.1997 (max= 1.7264), tps=20497, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:35,191 - root - INFO - Step 27720: lr=1.00E-05, loss= 1.1997 (max= 1.7264), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:35,191 - root - INFO - Step 27720: lr=1.00E-05, loss= 1.1997 (max= 1.7264), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:35,191 - root - INFO - Step 27720: lr=1.00E-05, loss= 1.1997 (max= 1.7264), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:35,191 - root - INFO - Step 27720: lr=1.00E-05, loss= 1.1997 (max= 1.7264), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:35,191 - root - INFO - Step 27720: lr=1.00E-05, loss= 1.1997 (max= 1.7264), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:51,144 - root - INFO - Step 27730: lr=1.00E-05, loss= 1.2055 (max= 1.5759), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
22:51:51,144 - root - INFO - Step 27730: lr=1.00E-05, loss= 1.2055 (max= 1.5759), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:51,144 - root - INFO - Step 27730: lr=1.00E-05, loss= 1.2055 (max= 1.5759), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:51,144 - root - INFO - Step 27730: lr=1.00E-05, loss= 1.2055 (max= 1.5759), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:51,144 - root - INFO - Step 27730: lr=1.00E-05, loss= 1.2055 (max= 1.5759), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:51,144 - root - INFO - Step 27730: lr=1.00E-05, loss= 1.2055 (max= 1.5759), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:51,144 - root - INFO - Step 27730: lr=1.00E-05, loss= 1.2055 (max= 1.5759), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:51:51,144 - root - INFO - Step 27730: lr=1.00E-05, loss= 1.2055 (max= 1.5759), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:07,105 - root - INFO - Step 27740: lr=1.00E-05, loss= 1.1571 (max= 1.9113), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:07,105 - root - INFO - Step 27740: lr=1.00E-05, loss= 1.1571 (max= 1.9113), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:07,105 - root - INFO - Step 27740: lr=1.00E-05, loss= 1.1571 (max= 1.9113), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:07,105 - root - INFO - Step 27740: lr=1.00E-05, loss= 1.1571 (max= 1.9113), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:07,105 - root - INFO - Step 27740: lr=1.00E-05, loss= 1.1571 (max= 1.9113), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:07,105 - root - INFO - Step 27740: lr=1.00E-05, loss= 1.1571 (max= 1.9113), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:07,105 - root - INFO - Step 27740: lr=1.00E-05, loss= 1.1571 (max= 1.9113), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:07,105 - root - INFO - Step 27740: lr=1.00E-05, loss= 1.1571 (max= 1.9113), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:23,068 - root - INFO - Step 27750: lr=1.00E-05, loss= 1.1796 (max= 1.7077), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:23,068 - root - INFO - Step 27750: lr=1.00E-05, loss= 1.1796 (max= 1.7077), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:23,068 - root - INFO - Step 27750: lr=1.00E-05, loss= 1.1796 (max= 1.7077), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:23,068 - root - INFO - Step 27750: lr=1.00E-05, loss= 1.1796 (max= 1.7077), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:23,068 - root - INFO - Step 27750: lr=1.00E-05, loss= 1.1796 (max= 1.7077), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:23,068 - root - INFO - Step 27750: lr=1.00E-05, loss= 1.1796 (max= 1.7077), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:23,068 - root - INFO - Step 27750: lr=1.00E-05, loss= 1.1796 (max= 1.7077), 
tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:23,068 - root - INFO - Step 27750: lr=1.00E-05, loss= 1.1796 (max= 1.7077), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:39,003 - root - INFO - Step 27760: lr=1.00E-05, loss= 1.1880 (max= 1.6426), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:39,003 - root - INFO - Step 27760: lr=1.00E-05, loss= 1.1880 (max= 1.6426), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:39,003 - root - INFO - Step 27760: lr=1.00E-05, loss= 1.1880 (max= 1.6426), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:39,003 - root - INFO - Step 27760: lr=1.00E-05, loss= 1.1880 (max= 1.6426), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:39,003 - root - INFO - Step 27760: lr=1.00E-05, loss= 1.1880 (max= 1.6426), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:39,003 - root - INFO - Step 27760: lr=1.00E-05, loss= 1.1880 (max= 1.6426), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:39,003 - root - INFO - Step 27760: lr=1.00E-05, loss= 1.1880 (max= 1.6426), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:39,003 - root - INFO - Step 27760: lr=1.00E-05, loss= 1.1880 (max= 1.6426), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:52:54,945 - root - INFO - Step 27770: lr=1.00E-05, loss= 1.1786 (max= 1.7053), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:54,945 - root - INFO - Step 
27770: lr=1.00E-05, loss= 1.1786 (max= 1.7053), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:54,945 - root - INFO - Step 27770: lr=1.00E-05, loss= 1.1786 (max= 1.7053), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:54,945 - root - INFO - Step 27770: lr=1.00E-05, loss= 1.1786 (max= 1.7053), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:54,945 - root - INFO - Step 27770: lr=1.00E-05, loss= 1.1786 (max= 1.7053), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:54,945 - root - INFO - Step 27770: lr=1.00E-05, loss= 1.1786 (max= 1.7053), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:54,945 - root - INFO - Step 27770: lr=1.00E-05, loss= 1.1786 (max= 1.7053), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:52:54,945 - root - INFO - Step 27770: lr=1.00E-05, loss= 1.1786 (max= 1.7053), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:10,894 - root - INFO - Step 27780: lr=1.00E-05, loss= 1.2037 (max= 1.6403), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:10,894 - root - INFO - Step 27780: lr=1.00E-05, loss= 1.2037 (max= 1.6403), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:10,894 - root - INFO - Step 27780: lr=1.00E-05, loss= 1.2037 (max= 1.6403), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:10,894 - root - INFO - Step 27780: lr=1.00E-05, loss= 1.2037 (max= 1.6403), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 22:53:10,894 - root - INFO - Step 27780: lr=1.00E-05, loss= 1.2037 (max= 1.6403), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:10,894 - root - INFO - Step 27780: lr=1.00E-05, loss= 1.2037 (max= 1.6403), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:10,894 - root - INFO - Step 27780: lr=1.00E-05, loss= 1.2037 (max= 1.6403), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:10,894 - root - INFO - Step 27780: lr=1.00E-05, loss= 1.2037 (max= 1.6403), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:26,846 - root - INFO - Step 27790: lr=1.00E-05, loss= 1.1622 (max= 1.5059), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:26,846 - root - INFO - Step 27790: lr=1.00E-05, loss= 1.1622 (max= 1.5059), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:26,846 - root - INFO - Step 27790: lr=1.00E-05, loss= 1.1622 (max= 1.5059), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:26,846 - root - INFO - Step 27790: lr=1.00E-05, loss= 1.1622 (max= 1.5059), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:26,846 - root - INFO - Step 27790: lr=1.00E-05, loss= 1.1622 (max= 1.5059), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:26,846 - root - INFO - Step 27790: lr=1.00E-05, loss= 1.1622 (max= 1.5059), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:26,846 - root - INFO - Step 27790: lr=1.00E-05, loss= 1.1622 (max= 1.5059), tps=20546, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:26,846 - root - INFO - Step 27790: lr=1.00E-05, loss= 1.1622 (max= 1.5059), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:42,778 - root - INFO - Step 27800: lr=1.00E-05, loss= 1.2199 (max= 1.8523), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:42,778 - root - INFO - Step 27800: lr=1.00E-05, loss= 1.2199 (max= 1.8523), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:42,778 - root - INFO - Step 27800: lr=1.00E-05, loss= 1.2199 (max= 1.8523), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:42,778 - root - INFO - Step 27800: lr=1.00E-05, loss= 1.2199 (max= 1.8523), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:42,778 - root - INFO - Step 27800: lr=1.00E-05, loss= 1.2199 (max= 1.8523), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:42,778 - root - INFO - Step 27800: lr=1.00E-05, loss= 1.2199 (max= 1.8523), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:42,778 - root - INFO - Step 27800: lr=1.00E-05, loss= 1.2199 (max= 1.8523), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:42,778 - root - INFO - Step 27800: lr=1.00E-05, loss= 1.2199 (max= 1.8523), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:53:58,776 - root - INFO - Step 27810: lr=1.00E-05, loss= 1.1958 (max= 1.6687), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:58,776 - root - INFO - Step 27810: lr=1.00E-05, loss= 1.1958 
(max= 1.6687), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:58,776 - root - INFO - Step 27810: lr=1.00E-05, loss= 1.1958 (max= 1.6687), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:58,776 - root - INFO - Step 27810: lr=1.00E-05, loss= 1.1958 (max= 1.6687), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:58,776 - root - INFO - Step 27810: lr=1.00E-05, loss= 1.1958 (max= 1.6687), tps=20487, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:58,776 - root - INFO - Step 27810: lr=1.00E-05, loss= 1.1958 (max= 1.6687), tps=20487, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:58,776 - root - INFO - Step 27810: lr=1.00E-05, loss= 1.1958 (max= 1.6687), tps=20487, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:53:58,776 - root - INFO - Step 27810: lr=1.00E-05, loss= 1.1958 (max= 1.6687), tps=20487, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:14,797 - root - INFO - Step 27820: lr=1.00E-05, loss= 1.1859 (max= 1.5004), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:14,797 - root - INFO - Step 27820: lr=1.00E-05, loss= 1.1859 (max= 1.5004), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:14,797 - root - INFO - Step 27820: lr=1.00E-05, loss= 1.1859 (max= 1.5004), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:14,797 - root - INFO - Step 27820: lr=1.00E-05, loss= 1.1859 (max= 1.5004), tps=20458, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:14,797 - root 
- INFO - Step 27820: lr=1.00E-05, loss= 1.1859 (max= 1.5004), tps=20458, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:14,797 - root - INFO - Step 27820: lr=1.00E-05, loss= 1.1859 (max= 1.5004), tps=20458, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:14,797 - root - INFO - Step 27820: lr=1.00E-05, loss= 1.1859 (max= 1.5004), tps=20458, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:14,797 - root - INFO - Step 27820: lr=1.00E-05, loss= 1.1859 (max= 1.5004), tps=20457, mfu=42.62%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:30,723 - root - INFO - Step 27830: lr=1.00E-05, loss= 1.1730 (max= 1.6060), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:30,723 - root - INFO - Step 27830: lr=1.00E-05, loss= 1.1730 (max= 1.6060), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:30,723 - root - INFO - Step 27830: lr=1.00E-05, loss= 1.1730 (max= 1.6060), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:30,723 - root - INFO - Step 27830: lr=1.00E-05, loss= 1.1730 (max= 1.6060), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:30,723 - root - INFO - Step 27830: lr=1.00E-05, loss= 1.1730 (max= 1.6060), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:30,723 - root - INFO - Step 27830: lr=1.00E-05, loss= 1.1730 (max= 1.6060), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:30,723 - root - INFO - Step 27830: lr=1.00E-05, loss= 1.1730 (max= 1.6060), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 22:54:30,723 - root - INFO - Step 27830: lr=1.00E-05, loss= 1.1730 (max= 1.6060), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:46,646 - root - INFO - Step 27840: lr=1.00E-05, loss= 1.1922 (max= 1.5981), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:46,646 - root - INFO - Step 27840: lr=1.00E-05, loss= 1.1922 (max= 1.5981), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:46,646 - root - INFO - Step 27840: lr=1.00E-05, loss= 1.1922 (max= 1.5981), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:46,646 - root - INFO - Step 27840: lr=1.00E-05, loss= 1.1922 (max= 1.5981), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:46,646 - root - INFO - Step 27840: lr=1.00E-05, loss= 1.1922 (max= 1.5981), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:46,646 - root - INFO - Step 27840: lr=1.00E-05, loss= 1.1922 (max= 1.5981), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:46,646 - root - INFO - Step 27840: lr=1.00E-05, loss= 1.1922 (max= 1.5981), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:54:46,646 - root - INFO - Step 27840: lr=1.00E-05, loss= 1.1922 (max= 1.5981), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:02,567 - root - INFO - Step 27850: lr=1.00E-05, loss= 1.2061 (max= 2.0390), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:02,567 - root - INFO - Step 27850: lr=1.00E-05, loss= 1.2061 (max= 2.0390), tps=20586, mfu=42.89%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:02,567 - root - INFO - Step 27850: lr=1.00E-05, loss= 1.2061 (max= 2.0390), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:02,567 - root - INFO - Step 27850: lr=1.00E-05, loss= 1.2061 (max= 2.0390), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:02,567 - root - INFO - Step 27850: lr=1.00E-05, loss= 1.2061 (max= 2.0390), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:02,567 - root - INFO - Step 27850: lr=1.00E-05, loss= 1.2061 (max= 2.0390), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:02,567 - root - INFO - Step 27850: lr=1.00E-05, loss= 1.2061 (max= 2.0390), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:02,567 - root - INFO - Step 27850: lr=1.00E-05, loss= 1.2061 (max= 2.0390), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:18,500 - root - INFO - Step 27860: lr=1.00E-05, loss= 1.1879 (max= 1.5040), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:18,500 - root - INFO - Step 27860: lr=1.00E-05, loss= 1.1879 (max= 1.5040), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:18,501 - root - INFO - Step 27860: lr=1.00E-05, loss= 1.1879 (max= 1.5040), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:18,501 - root - INFO - Step 27860: lr=1.00E-05, loss= 1.1879 (max= 1.5040), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:18,501 - root - INFO - Step 27860: lr=1.00E-05, 
loss= 1.1879 (max= 1.5040), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:18,501 - root - INFO - Step 27860: lr=1.00E-05, loss= 1.1879 (max= 1.5040), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:18,501 - root - INFO - Step 27860: lr=1.00E-05, loss= 1.1879 (max= 1.5040), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:18,501 - root - INFO - Step 27860: lr=1.00E-05, loss= 1.1879 (max= 1.5040), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:34,440 - root - INFO - Step 27870: lr=1.00E-05, loss= 1.1432 (max= 1.6270), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:34,440 - root - INFO - Step 27870: lr=1.00E-05, loss= 1.1432 (max= 1.6270), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:34,440 - root - INFO - Step 27870: lr=1.00E-05, loss= 1.1432 (max= 1.6270), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:34,440 - root - INFO - Step 27870: lr=1.00E-05, loss= 1.1432 (max= 1.6270), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:34,440 - root - INFO - Step 27870: lr=1.00E-05, loss= 1.1432 (max= 1.6270), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:34,440 - root - INFO - Step 27870: lr=1.00E-05, loss= 1.1432 (max= 1.6270), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:34,440 - root - INFO - Step 27870: lr=1.00E-05, loss= 1.1432 (max= 1.6270), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
22:55:34,440 - root - INFO - Step 27870: lr=1.00E-05, loss= 1.1432 (max= 1.6270), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:50,390 - root - INFO - Step 27880: lr=1.00E-05, loss= 1.1811 (max= 1.6511), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:50,390 - root - INFO - Step 27880: lr=1.00E-05, loss= 1.1811 (max= 1.6511), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:50,390 - root - INFO - Step 27880: lr=1.00E-05, loss= 1.1811 (max= 1.6511), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:50,390 - root - INFO - Step 27880: lr=1.00E-05, loss= 1.1811 (max= 1.6511), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:50,390 - root - INFO - Step 27880: lr=1.00E-05, loss= 1.1811 (max= 1.6511), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:50,390 - root - INFO - Step 27880: lr=1.00E-05, loss= 1.1811 (max= 1.6511), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:50,390 - root - INFO - Step 27880: lr=1.00E-05, loss= 1.1811 (max= 1.6511), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:55:50,390 - root - INFO - Step 27880: lr=1.00E-05, loss= 1.1811 (max= 1.6511), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:06,303 - root - INFO - Step 27890: lr=1.00E-05, loss= 1.1874 (max= 1.5526), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:06,303 - root - INFO - Step 27890: lr=1.00E-05, loss= 1.1874 (max= 1.5526), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:06,304 - root - INFO - Step 27890: lr=1.00E-05, loss= 1.1874 (max= 1.5526), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:06,304 - root - INFO - Step 27890: lr=1.00E-05, loss= 1.1874 (max= 1.5526), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:06,304 - root - INFO - Step 27890: lr=1.00E-05, loss= 1.1874 (max= 1.5526), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:06,304 - root - INFO - Step 27890: lr=1.00E-05, loss= 1.1874 (max= 1.5526), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:06,304 - root - INFO - Step 27890: lr=1.00E-05, loss= 1.1874 (max= 1.5526), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:06,304 - root - INFO - Step 27890: lr=1.00E-05, loss= 1.1874 (max= 1.5526), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:22,195 - root - INFO - Step 27900: lr=1.00E-05, loss= 1.2064 (max= 1.5234), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:22,195 - root - INFO - Step 27900: lr=1.00E-05, loss= 1.2064 (max= 1.5234), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:22,195 - root - INFO - Step 27900: lr=1.00E-05, loss= 1.2064 (max= 1.5234), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:22,196 - root - INFO - Step 27900: lr=1.00E-05, loss= 1.2064 (max= 1.5234), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:22,196 - root - INFO - Step 27900: lr=1.00E-05, loss= 1.2064 (max= 1.5234), 
tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:22,196 - root - INFO - Step 27900: lr=1.00E-05, loss= 1.2064 (max= 1.5234), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:22,196 - root - INFO - Step 27900: lr=1.00E-05, loss= 1.2064 (max= 1.5234), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:22,196 - root - INFO - Step 27900: lr=1.00E-05, loss= 1.2064 (max= 1.5234), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:38,090 - root - INFO - Step 27910: lr=1.00E-05, loss= 1.1778 (max= 1.7048), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:38,090 - root - INFO - Step 27910: lr=1.00E-05, loss= 1.1778 (max= 1.7048), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:38,090 - root - INFO - Step 27910: lr=1.00E-05, loss= 1.1778 (max= 1.7048), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:38,090 - root - INFO - Step 27910: lr=1.00E-05, loss= 1.1778 (max= 1.7048), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:38,090 - root - INFO - Step 27910: lr=1.00E-05, loss= 1.1778 (max= 1.7048), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:38,090 - root - INFO - Step 27910: lr=1.00E-05, loss= 1.1778 (max= 1.7048), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:38,090 - root - INFO - Step 27910: lr=1.00E-05, loss= 1.1778 (max= 1.7048), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:38,090 - root - INFO - Step 
27910: lr=1.00E-05, loss= 1.1778 (max= 1.7048), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:54,037 - root - INFO - Step 27920: lr=1.00E-05, loss= 1.1897 (max= 1.6703), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:54,037 - root - INFO - Step 27920: lr=1.00E-05, loss= 1.1897 (max= 1.6703), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:54,037 - root - INFO - Step 27920: lr=1.00E-05, loss= 1.1897 (max= 1.6703), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:54,037 - root - INFO - Step 27920: lr=1.00E-05, loss= 1.1897 (max= 1.6703), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:54,037 - root - INFO - Step 27920: lr=1.00E-05, loss= 1.1897 (max= 1.6703), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:54,037 - root - INFO - Step 27920: lr=1.00E-05, loss= 1.1897 (max= 1.6703), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:54,037 - root - INFO - Step 27920: lr=1.00E-05, loss= 1.1897 (max= 1.6703), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:56:54,037 - root - INFO - Step 27920: lr=1.00E-05, loss= 1.1897 (max= 1.6703), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:09,927 - root - INFO - Step 27930: lr=1.00E-05, loss= 1.1798 (max= 1.7284), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:09,927 - root - INFO - Step 27930: lr=1.00E-05, loss= 1.1798 (max= 1.7284), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 22:57:09,927 - root - INFO - Step 27930: lr=1.00E-05, loss= 1.1798 (max= 1.7284), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:09,927 - root - INFO - Step 27930: lr=1.00E-05, loss= 1.1798 (max= 1.7284), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:09,928 - root - INFO - Step 27930: lr=1.00E-05, loss= 1.1798 (max= 1.7284), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:09,928 - root - INFO - Step 27930: lr=1.00E-05, loss= 1.1798 (max= 1.7284), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:09,928 - root - INFO - Step 27930: lr=1.00E-05, loss= 1.1798 (max= 1.7284), tps=20626, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:09,928 - root - INFO - Step 27930: lr=1.00E-05, loss= 1.1798 (max= 1.7284), tps=20625, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:25,855 - root - INFO - Step 27940: lr=1.00E-05, loss= 1.1947 (max= 1.5281), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:57:25,855 - root - INFO - Step 27940: lr=1.00E-05, loss= 1.1947 (max= 1.5281), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:57:25,855 - root - INFO - Step 27940: lr=1.00E-05, loss= 1.1947 (max= 1.5281), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:57:25,855 - root - INFO - Step 27940: lr=1.00E-05, loss= 1.1947 (max= 1.5281), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:57:25,855 - root - INFO - Step 27940: lr=1.00E-05, loss= 1.1947 (max= 1.5281), tps=20578, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:57:25,855 - root - INFO - Step 27940: lr=1.00E-05, loss= 1.1947 (max= 1.5281), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:57:25,855 - root - INFO - Step 27940: lr=1.00E-05, loss= 1.1947 (max= 1.5281), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:57:25,855 - root - INFO - Step 27940: lr=1.00E-05, loss= 1.1947 (max= 1.5281), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:57:41,826 - root - INFO - Step 27950: lr=1.00E-05, loss= 1.2042 (max= 1.6341), tps=20521, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:41,826 - root - INFO - Step 27950: lr=1.00E-05, loss= 1.2042 (max= 1.6341), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:41,826 - root - INFO - Step 27950: lr=1.00E-05, loss= 1.2042 (max= 1.6341), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:41,826 - root - INFO - Step 27950: lr=1.00E-05, loss= 1.2042 (max= 1.6341), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:41,827 - root - INFO - Step 27950: lr=1.00E-05, loss= 1.2042 (max= 1.6341), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:41,827 - root - INFO - Step 27950: lr=1.00E-05, loss= 1.2042 (max= 1.6341), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:41,827 - root - INFO - Step 27950: lr=1.00E-05, loss= 1.2042 (max= 1.6341), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:41,827 - root - INFO - Step 27950: lr=1.00E-05, loss= 1.2042 
(max= 1.6341), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:57,734 - root - INFO - Step 27960: lr=1.00E-05, loss= 1.1943 (max= 1.6386), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:57,734 - root - INFO - Step 27960: lr=1.00E-05, loss= 1.1943 (max= 1.6386), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:57,734 - root - INFO - Step 27960: lr=1.00E-05, loss= 1.1943 (max= 1.6386), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:57,734 - root - INFO - Step 27960: lr=1.00E-05, loss= 1.1943 (max= 1.6386), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:57,734 - root - INFO - Step 27960: lr=1.00E-05, loss= 1.1943 (max= 1.6386), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:57,735 - root - INFO - Step 27960: lr=1.00E-05, loss= 1.1943 (max= 1.6386), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:57,735 - root - INFO - Step 27960: lr=1.00E-05, loss= 1.1943 (max= 1.6386), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:57:57,735 - root - INFO - Step 27960: lr=1.00E-05, loss= 1.1943 (max= 1.6386), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:58:13,639 - root - INFO - Step 27970: lr=1.00E-05, loss= 1.1679 (max= 1.6298), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:58:13,639 - root - INFO - Step 27970: lr=1.00E-05, loss= 1.1679 (max= 1.6298), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:58:13,639 - root 
- INFO - Step 27970: lr=1.00E-05, loss= 1.1679 (max= 1.6298), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:58:13,639 - root - INFO - Step 27970: lr=1.00E-05, loss= 1.1679 (max= 1.6298), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:58:13,639 - root - INFO - Step 27970: lr=1.00E-05, loss= 1.1679 (max= 1.6298), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:58:13,639 - root - INFO - Step 27970: lr=1.00E-05, loss= 1.1679 (max= 1.6298), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:58:13,640 - root - INFO - Step 27970: lr=1.00E-05, loss= 1.1679 (max= 1.6298), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:58:13,640 - root - INFO - Step 27970: lr=1.00E-05, loss= 1.1679 (max= 1.6298), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:58:29,588 - root - INFO - Step 27980: lr=1.00E-05, loss= 1.1898 (max= 1.9376), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:29,588 - root - INFO - Step 27980: lr=1.00E-05, loss= 1.1898 (max= 1.9376), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:29,588 - root - INFO - Step 27980: lr=1.00E-05, loss= 1.1898 (max= 1.9376), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:29,588 - root - INFO - Step 27980: lr=1.00E-05, loss= 1.1898 (max= 1.9376), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:29,588 - root - INFO - Step 27980: lr=1.00E-05, loss= 1.1898 (max= 1.9376), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 22:58:29,588 - root - INFO - Step 27980: lr=1.00E-05, loss= 1.1898 (max= 1.9376), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:29,588 - root - INFO - Step 27980: lr=1.00E-05, loss= 1.1898 (max= 1.9376), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:29,588 - root - INFO - Step 27980: lr=1.00E-05, loss= 1.1898 (max= 1.9376), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:45,551 - root - INFO - Step 27990: lr=1.00E-05, loss= 1.1929 (max= 1.5489), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:45,551 - root - INFO - Step 27990: lr=1.00E-05, loss= 1.1929 (max= 1.5489), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:45,551 - root - INFO - Step 27990: lr=1.00E-05, loss= 1.1929 (max= 1.5489), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:45,551 - root - INFO - Step 27990: lr=1.00E-05, loss= 1.1929 (max= 1.5489), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:45,551 - root - INFO - Step 27990: lr=1.00E-05, loss= 1.1929 (max= 1.5489), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:45,551 - root - INFO - Step 27990: lr=1.00E-05, loss= 1.1929 (max= 1.5489), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:45,551 - root - INFO - Step 27990: lr=1.00E-05, loss= 1.1929 (max= 1.5489), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:58:45,552 - root - INFO - Step 27990: lr=1.00E-05, loss= 1.1929 (max= 1.5489), tps=20531, mfu=42.78%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-28000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-28000! Save time: 4.526084899902344 +2025-10-24 22:59:01,502 - root - INFO - Step 28000: lr=1.00E-05, loss= 1.2160 (max= 1.7129), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:01,502 - root - INFO - Step 28000: lr=1.00E-05, loss= 1.2160 (max= 1.7129), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:01,502 - root - INFO - Step 28000: lr=1.00E-05, loss= 1.2160 (max= 1.7129), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:01,502 - root - INFO - Saving a full checkpoint at step 28000 +2025-10-24 22:59:01,502 - root - INFO - Saving a full checkpoint at step 28000 +2025-10-24 22:59:01,502 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:59:01,502 - root - INFO - Saving a full checkpoint at step 28000 +2025-10-24 22:59:01,502 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:59:01,502 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:59:01,502 - root - INFO - Step 28000: lr=1.00E-05, loss= 1.2160 (max= 1.7129), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:01,502 - root - INFO - Step 28000: lr=1.00E-05, loss= 1.2160 (max= 1.7129), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:01,502 - root - INFO - Saving a full checkpoint at step 28000 +2025-10-24 22:59:01,502 - root - INFO - Saving a full 
checkpoint at step 28000 +2025-10-24 22:59:01,502 - root - INFO - Step 28000: lr=1.00E-05, loss= 1.2160 (max= 1.7129), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:01,502 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:59:01,502 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:59:01,502 - root - INFO - Step 28000: lr=1.00E-05, loss= 1.2160 (max= 1.7129), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:01,502 - root - INFO - Saving a full checkpoint at step 28000 +2025-10-24 22:59:01,502 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:59:01,502 - root - INFO - Step 28000: lr=1.00E-05, loss= 1.2160 (max= 1.7129), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:01,502 - root - INFO - Saving a full checkpoint at step 28000 +2025-10-24 22:59:01,502 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:59:01,502 - root - INFO - Saving a full checkpoint at step 28000 +2025-10-24 22:59:01,502 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 22:59:16,701 - root - INFO - Finished saving the checkpoint in 15.20 seconds +2025-10-24 22:59:16,707 - root - INFO - Finished saving the checkpoint in 15.21 seconds +2025-10-24 22:59:16,708 - root - INFO - Finished saving the checkpoint in 15.21 seconds +2025-10-24 22:59:16,708 - root - INFO - Finished saving the checkpoint in 15.21 seconds +2025-10-24 22:59:16,708 - root - INFO - Finished saving the checkpoint in 15.21 seconds +2025-10-24 
22:59:16,709 - root - INFO - Finished saving the checkpoint in 15.21 seconds +2025-10-24 22:59:16,709 - root - INFO - Finished saving the checkpoint in 15.21 seconds +2025-10-24 22:59:16,711 - root - INFO - Finished saving the checkpoint in 15.21 seconds +2025-10-24 22:59:32,574 - root - INFO - Step 28010: lr=1.00E-05, loss= 1.1833 (max= 1.6813), tps=10547, mfu=21.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:59:32,574 - root - INFO - Step 28010: lr=1.00E-05, loss= 1.1833 (max= 1.6813), tps=10547, mfu=21.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:59:32,574 - root - INFO - Step 28010: lr=1.00E-05, loss= 1.1833 (max= 1.6813), tps=10547, mfu=21.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:59:32,574 - root - INFO - Step 28010: lr=1.00E-05, loss= 1.1833 (max= 1.6813), tps=10547, mfu=21.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:59:32,574 - root - INFO - Step 28010: lr=1.00E-05, loss= 1.1833 (max= 1.6813), tps=10547, mfu=21.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:59:32,574 - root - INFO - Step 28010: lr=1.00E-05, loss= 1.1833 (max= 1.6813), tps=10547, mfu=21.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:59:32,574 - root - INFO - Step 28010: lr=1.00E-05, loss= 1.1833 (max= 1.6813), tps=10547, mfu=21.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:59:32,575 - root - INFO - Step 28010: lr=1.00E-05, loss= 1.1833 (max= 1.6813), tps=10547, mfu=21.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 22:59:48,554 - root - INFO - Step 28020: lr=1.00E-05, loss= 1.2190 (max= 1.6692), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:48,554 - root - INFO - Step 28020: lr=1.00E-05, 
loss= 1.2190 (max= 1.6692), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:48,554 - root - INFO - Step 28020: lr=1.00E-05, loss= 1.2190 (max= 1.6692), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:48,554 - root - INFO - Step 28020: lr=1.00E-05, loss= 1.2190 (max= 1.6692), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:48,554 - root - INFO - Step 28020: lr=1.00E-05, loss= 1.2190 (max= 1.6692), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:48,554 - root - INFO - Step 28020: lr=1.00E-05, loss= 1.2190 (max= 1.6692), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:48,555 - root - INFO - Step 28020: lr=1.00E-05, loss= 1.2190 (max= 1.6692), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 22:59:48,555 - root - INFO - Step 28020: lr=1.00E-05, loss= 1.2190 (max= 1.6692), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:04,471 - root - INFO - Step 28030: lr=1.00E-05, loss= 1.1843 (max= 1.5256), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:04,471 - root - INFO - Step 28030: lr=1.00E-05, loss= 1.1843 (max= 1.5256), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:04,471 - root - INFO - Step 28030: lr=1.00E-05, loss= 1.1843 (max= 1.5256), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:04,471 - root - INFO - Step 28030: lr=1.00E-05, loss= 1.1843 (max= 1.5256), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
23:00:04,471 - root - INFO - Step 28030: lr=1.00E-05, loss= 1.1843 (max= 1.5256), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:04,471 - root - INFO - Step 28030: lr=1.00E-05, loss= 1.1843 (max= 1.5256), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:04,471 - root - INFO - Step 28030: lr=1.00E-05, loss= 1.1843 (max= 1.5256), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:04,471 - root - INFO - Step 28030: lr=1.00E-05, loss= 1.1843 (max= 1.5256), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:20,407 - root - INFO - Step 28040: lr=1.00E-05, loss= 1.2245 (max= 1.6055), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:20,407 - root - INFO - Step 28040: lr=1.00E-05, loss= 1.2245 (max= 1.6055), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:20,407 - root - INFO - Step 28040: lr=1.00E-05, loss= 1.2245 (max= 1.6055), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:20,407 - root - INFO - Step 28040: lr=1.00E-05, loss= 1.2245 (max= 1.6055), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:20,407 - root - INFO - Step 28040: lr=1.00E-05, loss= 1.2245 (max= 1.6055), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:20,407 - root - INFO - Step 28040: lr=1.00E-05, loss= 1.2245 (max= 1.6055), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:20,407 - root - INFO - Step 28040: lr=1.00E-05, loss= 1.2245 (max= 1.6055), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:20,407 - root - INFO - Step 28040: lr=1.00E-05, loss= 1.2245 (max= 1.6055), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:36,374 - root - INFO - Step 28050: lr=1.00E-05, loss= 1.2070 (max= 1.5698), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:36,374 - root - INFO - Step 28050: lr=1.00E-05, loss= 1.2070 (max= 1.5698), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:36,374 - root - INFO - Step 28050: lr=1.00E-05, loss= 1.2070 (max= 1.5698), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:36,375 - root - INFO - Step 28050: lr=1.00E-05, loss= 1.2070 (max= 1.5698), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:36,375 - root - INFO - Step 28050: lr=1.00E-05, loss= 1.2070 (max= 1.5698), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:36,375 - root - INFO - Step 28050: lr=1.00E-05, loss= 1.2070 (max= 1.5698), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:36,375 - root - INFO - Step 28050: lr=1.00E-05, loss= 1.2070 (max= 1.5698), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:36,375 - root - INFO - Step 28050: lr=1.00E-05, loss= 1.2070 (max= 1.5698), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:52,295 - root - INFO - Step 28060: lr=1.00E-05, loss= 1.1771 (max= 1.5436), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:52,295 - root - INFO - Step 28060: lr=1.00E-05, loss= 1.1771 (max= 1.5436), 
tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:52,295 - root - INFO - Step 28060: lr=1.00E-05, loss= 1.1771 (max= 1.5436), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:52,296 - root - INFO - Step 28060: lr=1.00E-05, loss= 1.1771 (max= 1.5436), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:52,296 - root - INFO - Step 28060: lr=1.00E-05, loss= 1.1771 (max= 1.5436), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:52,296 - root - INFO - Step 28060: lr=1.00E-05, loss= 1.1771 (max= 1.5436), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:52,296 - root - INFO - Step 28060: lr=1.00E-05, loss= 1.1771 (max= 1.5436), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:00:52,296 - root - INFO - Step 28060: lr=1.00E-05, loss= 1.1771 (max= 1.5436), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:08,218 - root - INFO - Step 28070: lr=1.00E-05, loss= 1.1963 (max= 1.6862), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:08,218 - root - INFO - Step 28070: lr=1.00E-05, loss= 1.1963 (max= 1.6862), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:08,218 - root - INFO - Step 28070: lr=1.00E-05, loss= 1.1963 (max= 1.6862), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:08,218 - root - INFO - Step 28070: lr=1.00E-05, loss= 1.1963 (max= 1.6862), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:08,218 - root - INFO - Step 
28070: lr=1.00E-05, loss= 1.1963 (max= 1.6862), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:08,218 - root - INFO - Step 28070: lr=1.00E-05, loss= 1.1963 (max= 1.6862), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:08,218 - root - INFO - Step 28070: lr=1.00E-05, loss= 1.1963 (max= 1.6862), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:08,218 - root - INFO - Step 28070: lr=1.00E-05, loss= 1.1963 (max= 1.6862), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:24,112 - root - INFO - Step 28080: lr=1.00E-05, loss= 1.2135 (max= 1.6213), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:24,112 - root - INFO - Step 28080: lr=1.00E-05, loss= 1.2135 (max= 1.6213), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:24,112 - root - INFO - Step 28080: lr=1.00E-05, loss= 1.2135 (max= 1.6213), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:24,113 - root - INFO - Step 28080: lr=1.00E-05, loss= 1.2135 (max= 1.6213), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:24,113 - root - INFO - Step 28080: lr=1.00E-05, loss= 1.2135 (max= 1.6213), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:24,113 - root - INFO - Step 28080: lr=1.00E-05, loss= 1.2135 (max= 1.6213), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:24,113 - root - INFO - Step 28080: lr=1.00E-05, loss= 1.2135 (max= 1.6213), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:01:24,113 - root - INFO - Step 28080: lr=1.00E-05, loss= 1.2135 (max= 1.6213), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:01:40,078 - root - INFO - Step 28090: lr=1.00E-05, loss= 1.1979 (max= 1.9985), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:40,078 - root - INFO - Step 28090: lr=1.00E-05, loss= 1.1979 (max= 1.9985), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:40,078 - root - INFO - Step 28090: lr=1.00E-05, loss= 1.1979 (max= 1.9985), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:40,078 - root - INFO - Step 28090: lr=1.00E-05, loss= 1.1979 (max= 1.9985), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:40,079 - root - INFO - Step 28090: lr=1.00E-05, loss= 1.1979 (max= 1.9985), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:40,079 - root - INFO - Step 28090: lr=1.00E-05, loss= 1.1979 (max= 1.9985), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:40,079 - root - INFO - Step 28090: lr=1.00E-05, loss= 1.1979 (max= 1.9985), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:40,079 - root - INFO - Step 28090: lr=1.00E-05, loss= 1.1979 (max= 1.9985), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:56,025 - root - INFO - Step 28100: lr=1.00E-05, loss= 1.1926 (max= 1.7133), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:56,025 - root - INFO - Step 28100: lr=1.00E-05, loss= 1.1926 (max= 1.7133), tps=20553, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:56,025 - root - INFO - Step 28100: lr=1.00E-05, loss= 1.1926 (max= 1.7133), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:56,025 - root - INFO - Step 28100: lr=1.00E-05, loss= 1.1926 (max= 1.7133), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:56,025 - root - INFO - Step 28100: lr=1.00E-05, loss= 1.1926 (max= 1.7133), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:56,025 - root - INFO - Step 28100: lr=1.00E-05, loss= 1.1926 (max= 1.7133), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:56,025 - root - INFO - Step 28100: lr=1.00E-05, loss= 1.1926 (max= 1.7133), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:01:56,025 - root - INFO - Step 28100: lr=1.00E-05, loss= 1.1926 (max= 1.7133), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:11,900 - root - INFO - Step 28110: lr=1.00E-05, loss= 1.1981 (max= 1.7536), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:11,899 - root - INFO - Step 28110: lr=1.00E-05, loss= 1.1981 (max= 1.7536), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:11,900 - root - INFO - Step 28110: lr=1.00E-05, loss= 1.1981 (max= 1.7536), tps=20645, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:11,900 - root - INFO - Step 28110: lr=1.00E-05, loss= 1.1981 (max= 1.7536), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:11,900 - root - INFO - Step 28110: lr=1.00E-05, loss= 1.1981 
(max= 1.7536), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:11,900 - root - INFO - Step 28110: lr=1.00E-05, loss= 1.1981 (max= 1.7536), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:11,900 - root - INFO - Step 28110: lr=1.00E-05, loss= 1.1981 (max= 1.7536), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:11,900 - root - INFO - Step 28110: lr=1.00E-05, loss= 1.1981 (max= 1.7536), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:27,841 - root - INFO - Step 28120: lr=1.00E-05, loss= 1.2142 (max= 1.6578), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:27,841 - root - INFO - Step 28120: lr=1.00E-05, loss= 1.2142 (max= 1.6578), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:27,841 - root - INFO - Step 28120: lr=1.00E-05, loss= 1.2142 (max= 1.6578), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:27,841 - root - INFO - Step 28120: lr=1.00E-05, loss= 1.2142 (max= 1.6578), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:27,841 - root - INFO - Step 28120: lr=1.00E-05, loss= 1.2142 (max= 1.6578), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:27,841 - root - INFO - Step 28120: lr=1.00E-05, loss= 1.2142 (max= 1.6578), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:27,841 - root - INFO - Step 28120: lr=1.00E-05, loss= 1.2142 (max= 1.6578), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:27,841 - root 
- INFO - Step 28120: lr=1.00E-05, loss= 1.2142 (max= 1.6578), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:43,814 - root - INFO - Step 28130: lr=1.00E-05, loss= 1.1798 (max= 1.6948), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:02:43,814 - root - INFO - Step 28130: lr=1.00E-05, loss= 1.1798 (max= 1.6948), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:02:43,814 - root - INFO - Step 28130: lr=1.00E-05, loss= 1.1798 (max= 1.6948), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:02:43,814 - root - INFO - Step 28130: lr=1.00E-05, loss= 1.1798 (max= 1.6948), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:02:43,814 - root - INFO - Step 28130: lr=1.00E-05, loss= 1.1798 (max= 1.6948), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:02:43,814 - root - INFO - Step 28130: lr=1.00E-05, loss= 1.1798 (max= 1.6948), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:02:43,814 - root - INFO - Step 28130: lr=1.00E-05, loss= 1.1798 (max= 1.6948), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:02:43,814 - root - INFO - Step 28130: lr=1.00E-05, loss= 1.1798 (max= 1.6948), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:02:59,713 - root - INFO - Step 28140: lr=1.00E-05, loss= 1.2065 (max= 1.7437), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:59,713 - root - INFO - Step 28140: lr=1.00E-05, loss= 1.2065 (max= 1.7437), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 23:02:59,713 - root - INFO - Step 28140: lr=1.00E-05, loss= 1.2065 (max= 1.7437), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:59,713 - root - INFO - Step 28140: lr=1.00E-05, loss= 1.2065 (max= 1.7437), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:59,713 - root - INFO - Step 28140: lr=1.00E-05, loss= 1.2065 (max= 1.7437), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:59,713 - root - INFO - Step 28140: lr=1.00E-05, loss= 1.2065 (max= 1.7437), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:59,713 - root - INFO - Step 28140: lr=1.00E-05, loss= 1.2065 (max= 1.7437), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:02:59,713 - root - INFO - Step 28140: lr=1.00E-05, loss= 1.2065 (max= 1.7437), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:15,619 - root - INFO - Step 28150: lr=1.00E-05, loss= 1.1834 (max= 1.6357), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:15,619 - root - INFO - Step 28150: lr=1.00E-05, loss= 1.1834 (max= 1.6357), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:15,619 - root - INFO - Step 28150: lr=1.00E-05, loss= 1.1834 (max= 1.6357), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:15,619 - root - INFO - Step 28150: lr=1.00E-05, loss= 1.1834 (max= 1.6357), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:15,619 - root - INFO - Step 28150: lr=1.00E-05, loss= 1.1834 (max= 1.6357), tps=20605, mfu=42.93%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:15,619 - root - INFO - Step 28150: lr=1.00E-05, loss= 1.1834 (max= 1.6357), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:15,619 - root - INFO - Step 28150: lr=1.00E-05, loss= 1.1834 (max= 1.6357), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:15,619 - root - INFO - Step 28150: lr=1.00E-05, loss= 1.1834 (max= 1.6357), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:31,545 - root - INFO - Step 28160: lr=1.00E-05, loss= 1.2166 (max= 1.5380), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:03:31,545 - root - INFO - Step 28160: lr=1.00E-05, loss= 1.2166 (max= 1.5380), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:03:31,546 - root - INFO - Step 28160: lr=1.00E-05, loss= 1.2166 (max= 1.5380), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:03:31,546 - root - INFO - Step 28160: lr=1.00E-05, loss= 1.2166 (max= 1.5380), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:03:31,546 - root - INFO - Step 28160: lr=1.00E-05, loss= 1.2166 (max= 1.5380), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:03:31,546 - root - INFO - Step 28160: lr=1.00E-05, loss= 1.2166 (max= 1.5380), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:03:31,546 - root - INFO - Step 28160: lr=1.00E-05, loss= 1.2166 (max= 1.5380), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:03:31,546 - root - INFO - Step 28160: lr=1.00E-05, 
loss= 1.2166 (max= 1.5380), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:03:47,452 - root - INFO - Step 28170: lr=1.00E-05, loss= 1.1857 (max= 1.5486), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:47,452 - root - INFO - Step 28170: lr=1.00E-05, loss= 1.1857 (max= 1.5486), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:47,452 - root - INFO - Step 28170: lr=1.00E-05, loss= 1.1857 (max= 1.5486), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:47,452 - root - INFO - Step 28170: lr=1.00E-05, loss= 1.1857 (max= 1.5486), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:47,452 - root - INFO - Step 28170: lr=1.00E-05, loss= 1.1857 (max= 1.5486), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:47,452 - root - INFO - Step 28170: lr=1.00E-05, loss= 1.1857 (max= 1.5486), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:47,452 - root - INFO - Step 28170: lr=1.00E-05, loss= 1.1857 (max= 1.5486), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:03:47,452 - root - INFO - Step 28170: lr=1.00E-05, loss= 1.1857 (max= 1.5486), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:03,406 - root - INFO - Step 28180: lr=1.00E-05, loss= 1.1589 (max= 1.5910), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:03,406 - root - INFO - Step 28180: lr=1.00E-05, loss= 1.1589 (max= 1.5910), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
23:04:03,406 - root - INFO - Step 28180: lr=1.00E-05, loss= 1.1589 (max= 1.5910), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:03,406 - root - INFO - Step 28180: lr=1.00E-05, loss= 1.1589 (max= 1.5910), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:03,407 - root - INFO - Step 28180: lr=1.00E-05, loss= 1.1589 (max= 1.5910), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:03,407 - root - INFO - Step 28180: lr=1.00E-05, loss= 1.1589 (max= 1.5910), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:03,407 - root - INFO - Step 28180: lr=1.00E-05, loss= 1.1589 (max= 1.5910), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:03,407 - root - INFO - Step 28180: lr=1.00E-05, loss= 1.1589 (max= 1.5910), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:19,363 - root - INFO - Step 28190: lr=1.00E-05, loss= 1.1947 (max= 1.5864), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:19,363 - root - INFO - Step 28190: lr=1.00E-05, loss= 1.1947 (max= 1.5864), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:19,363 - root - INFO - Step 28190: lr=1.00E-05, loss= 1.1947 (max= 1.5864), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:19,364 - root - INFO - Step 28190: lr=1.00E-05, loss= 1.1947 (max= 1.5864), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:19,364 - root - INFO - Step 28190: lr=1.00E-05, loss= 1.1947 (max= 1.5864), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:19,364 - root - INFO - Step 28190: lr=1.00E-05, loss= 1.1947 (max= 1.5864), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:19,364 - root - INFO - Step 28190: lr=1.00E-05, loss= 1.1947 (max= 1.5864), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:19,364 - root - INFO - Step 28190: lr=1.00E-05, loss= 1.1947 (max= 1.5864), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:35,286 - root - INFO - Step 28200: lr=1.00E-05, loss= 1.1816 (max= 1.5463), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:35,286 - root - INFO - Step 28200: lr=1.00E-05, loss= 1.1816 (max= 1.5463), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:35,286 - root - INFO - Step 28200: lr=1.00E-05, loss= 1.1816 (max= 1.5463), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:35,286 - root - INFO - Step 28200: lr=1.00E-05, loss= 1.1816 (max= 1.5463), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:35,286 - root - INFO - Step 28200: lr=1.00E-05, loss= 1.1816 (max= 1.5463), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:35,286 - root - INFO - Step 28200: lr=1.00E-05, loss= 1.1816 (max= 1.5463), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:35,286 - root - INFO - Step 28200: lr=1.00E-05, loss= 1.1816 (max= 1.5463), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:35,286 - root - INFO - Step 28200: lr=1.00E-05, loss= 1.1816 (max= 1.5463), 
tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:04:51,241 - root - INFO - Step 28210: lr=1.00E-05, loss= 1.2206 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:04:51,241 - root - INFO - Step 28210: lr=1.00E-05, loss= 1.2206 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:04:51,241 - root - INFO - Step 28210: lr=1.00E-05, loss= 1.2206 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:04:51,241 - root - INFO - Step 28210: lr=1.00E-05, loss= 1.2206 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:04:51,241 - root - INFO - Step 28210: lr=1.00E-05, loss= 1.2206 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:04:51,242 - root - INFO - Step 28210: lr=1.00E-05, loss= 1.2206 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:04:51,242 - root - INFO - Step 28210: lr=1.00E-05, loss= 1.2206 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:04:51,242 - root - INFO - Step 28210: lr=1.00E-05, loss= 1.2206 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:05:07,126 - root - INFO - Step 28220: lr=1.00E-05, loss= 1.1883 (max= 1.5497), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:07,126 - root - INFO - Step 28220: lr=1.00E-05, loss= 1.1883 (max= 1.5497), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:07,126 - root - INFO - Step 
28220: lr=1.00E-05, loss= 1.1883 (max= 1.5497), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:07,126 - root - INFO - Step 28220: lr=1.00E-05, loss= 1.1883 (max= 1.5497), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:07,126 - root - INFO - Step 28220: lr=1.00E-05, loss= 1.1883 (max= 1.5497), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:07,126 - root - INFO - Step 28220: lr=1.00E-05, loss= 1.1883 (max= 1.5497), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:07,127 - root - INFO - Step 28220: lr=1.00E-05, loss= 1.1883 (max= 1.5497), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:07,127 - root - INFO - Step 28220: lr=1.00E-05, loss= 1.1883 (max= 1.5497), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:23,098 - root - INFO - Step 28230: lr=1.00E-05, loss= 1.1988 (max= 1.6523), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:05:23,098 - root - INFO - Step 28230: lr=1.00E-05, loss= 1.1988 (max= 1.6523), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:05:23,098 - root - INFO - Step 28230: lr=1.00E-05, loss= 1.1988 (max= 1.6523), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:05:23,098 - root - INFO - Step 28230: lr=1.00E-05, loss= 1.1988 (max= 1.6523), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:05:23,098 - root - INFO - Step 28230: lr=1.00E-05, loss= 1.1988 (max= 1.6523), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:05:23,098 - root - INFO - Step 28230: lr=1.00E-05, loss= 1.1988 (max= 1.6523), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:05:23,098 - root - INFO - Step 28230: lr=1.00E-05, loss= 1.1988 (max= 1.6523), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:05:23,098 - root - INFO - Step 28230: lr=1.00E-05, loss= 1.1988 (max= 1.6523), tps=20521, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:05:33,451 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:1836153 +2025-10-24 23:05:39,085 - root - INFO - Step 28240: lr=1.00E-05, loss= 1.1559 (max= 1.5733), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:39,085 - root - INFO - Step 28240: lr=1.00E-05, loss= 1.1559 (max= 1.5733), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:39,086 - root - INFO - Step 28240: lr=1.00E-05, loss= 1.1559 (max= 1.5733), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:39,086 - root - INFO - Step 28240: lr=1.00E-05, loss= 1.1559 (max= 1.5733), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:39,086 - root - INFO - Step 28240: lr=1.00E-05, loss= 1.1559 (max= 1.5733), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:39,086 - root - INFO - Step 28240: lr=1.00E-05, loss= 1.1559 (max= 1.5733), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:39,086 - root - INFO - Step 28240: lr=1.00E-05, loss= 1.1559 (max= 1.5733), tps=20500, mfu=42.71%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:39,086 - root - INFO - Step 28240: lr=1.00E-05, loss= 1.1559 (max= 1.5733), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:55,042 - root - INFO - Step 28250: lr=1.00E-05, loss= 1.1908 (max= 1.7652), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:55,042 - root - INFO - Step 28250: lr=1.00E-05, loss= 1.1908 (max= 1.7652), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:55,042 - root - INFO - Step 28250: lr=1.00E-05, loss= 1.1908 (max= 1.7652), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:55,042 - root - INFO - Step 28250: lr=1.00E-05, loss= 1.1908 (max= 1.7652), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:55,042 - root - INFO - Step 28250: lr=1.00E-05, loss= 1.1908 (max= 1.7652), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:55,042 - root - INFO - Step 28250: lr=1.00E-05, loss= 1.1908 (max= 1.7652), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:55,042 - root - INFO - Step 28250: lr=1.00E-05, loss= 1.1908 (max= 1.7652), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:05:55,042 - root - INFO - Step 28250: lr=1.00E-05, loss= 1.1908 (max= 1.7652), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:10,976 - root - INFO - Step 28260: lr=1.00E-05, loss= 1.2043 (max= 1.7129), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:10,976 - root - INFO - Step 28260: lr=1.00E-05, loss= 1.2043 
(max= 1.7129), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:10,976 - root - INFO - Step 28260: lr=1.00E-05, loss= 1.2043 (max= 1.7129), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:10,976 - root - INFO - Step 28260: lr=1.00E-05, loss= 1.2043 (max= 1.7129), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:10,976 - root - INFO - Step 28260: lr=1.00E-05, loss= 1.2043 (max= 1.7129), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:10,976 - root - INFO - Step 28260: lr=1.00E-05, loss= 1.2043 (max= 1.7129), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:10,976 - root - INFO - Step 28260: lr=1.00E-05, loss= 1.2043 (max= 1.7129), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:10,976 - root - INFO - Step 28260: lr=1.00E-05, loss= 1.2043 (max= 1.7129), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:26,886 - root - INFO - Step 28270: lr=1.00E-05, loss= 1.2242 (max= 1.5614), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:26,886 - root - INFO - Step 28270: lr=1.00E-05, loss= 1.2242 (max= 1.5614), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:26,886 - root - INFO - Step 28270: lr=1.00E-05, loss= 1.2242 (max= 1.5614), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:26,886 - root - INFO - Step 28270: lr=1.00E-05, loss= 1.2242 (max= 1.5614), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:26,886 - root 
- INFO - Step 28270: lr=1.00E-05, loss= 1.2242 (max= 1.5614), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:26,886 - root - INFO - Step 28270: lr=1.00E-05, loss= 1.2242 (max= 1.5614), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:26,886 - root - INFO - Step 28270: lr=1.00E-05, loss= 1.2242 (max= 1.5614), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:26,886 - root - INFO - Step 28270: lr=1.00E-05, loss= 1.2242 (max= 1.5614), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:42,850 - root - INFO - Step 28280: lr=1.00E-05, loss= 1.1999 (max= 1.5889), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:42,850 - root - INFO - Step 28280: lr=1.00E-05, loss= 1.1999 (max= 1.5889), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:42,850 - root - INFO - Step 28280: lr=1.00E-05, loss= 1.1999 (max= 1.5889), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:42,850 - root - INFO - Step 28280: lr=1.00E-05, loss= 1.1999 (max= 1.5889), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:42,850 - root - INFO - Step 28280: lr=1.00E-05, loss= 1.1999 (max= 1.5889), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:42,850 - root - INFO - Step 28280: lr=1.00E-05, loss= 1.1999 (max= 1.5889), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:42,850 - root - INFO - Step 28280: lr=1.00E-05, loss= 1.1999 (max= 1.5889), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 23:06:42,850 - root - INFO - Step 28280: lr=1.00E-05, loss= 1.1999 (max= 1.5889), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:06:58,756 - root - INFO - Step 28290: lr=1.00E-05, loss= 1.1846 (max= 1.5557), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:58,756 - root - INFO - Step 28290: lr=1.00E-05, loss= 1.1846 (max= 1.5557), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:58,756 - root - INFO - Step 28290: lr=1.00E-05, loss= 1.1846 (max= 1.5557), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:58,756 - root - INFO - Step 28290: lr=1.00E-05, loss= 1.1846 (max= 1.5557), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:58,756 - root - INFO - Step 28290: lr=1.00E-05, loss= 1.1846 (max= 1.5557), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:58,756 - root - INFO - Step 28290: lr=1.00E-05, loss= 1.1846 (max= 1.5557), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:58,756 - root - INFO - Step 28290: lr=1.00E-05, loss= 1.1846 (max= 1.5557), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:06:58,756 - root - INFO - Step 28290: lr=1.00E-05, loss= 1.1846 (max= 1.5557), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:07:14,717 - root - INFO - Step 28300: lr=1.00E-05, loss= 1.2025 (max= 1.5744), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:14,717 - root - INFO - Step 28300: lr=1.00E-05, loss= 1.2025 (max= 1.5744), tps=20533, mfu=42.78%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:14,717 - root - INFO - Step 28300: lr=1.00E-05, loss= 1.2025 (max= 1.5744), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:14,718 - root - INFO - Step 28300: lr=1.00E-05, loss= 1.2025 (max= 1.5744), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:14,718 - root - INFO - Step 28300: lr=1.00E-05, loss= 1.2025 (max= 1.5744), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:14,718 - root - INFO - Step 28300: lr=1.00E-05, loss= 1.2025 (max= 1.5744), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:14,718 - root - INFO - Step 28300: lr=1.00E-05, loss= 1.2025 (max= 1.5744), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:14,718 - root - INFO - Step 28300: lr=1.00E-05, loss= 1.2025 (max= 1.5744), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:30,677 - root - INFO - Step 28310: lr=1.00E-05, loss= 1.2340 (max= 1.6617), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:07:30,677 - root - INFO - Step 28310: lr=1.00E-05, loss= 1.2340 (max= 1.6617), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:07:30,677 - root - INFO - Step 28310: lr=1.00E-05, loss= 1.2340 (max= 1.6617), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:07:30,677 - root - INFO - Step 28310: lr=1.00E-05, loss= 1.2340 (max= 1.6617), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:07:30,677 - root - INFO - Step 28310: lr=1.00E-05, 
loss= 1.2340 (max= 1.6617), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:07:30,677 - root - INFO - Step 28310: lr=1.00E-05, loss= 1.2340 (max= 1.6617), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:07:30,677 - root - INFO - Step 28310: lr=1.00E-05, loss= 1.2340 (max= 1.6617), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:07:30,677 - root - INFO - Step 28310: lr=1.00E-05, loss= 1.2340 (max= 1.6617), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:07:46,617 - root - INFO - Step 28320: lr=1.00E-05, loss= 1.1341 (max= 1.5497), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:46,617 - root - INFO - Step 28320: lr=1.00E-05, loss= 1.1341 (max= 1.5497), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:46,617 - root - INFO - Step 28320: lr=1.00E-05, loss= 1.1341 (max= 1.5497), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:46,617 - root - INFO - Step 28320: lr=1.00E-05, loss= 1.1341 (max= 1.5497), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:46,617 - root - INFO - Step 28320: lr=1.00E-05, loss= 1.1341 (max= 1.5497), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:46,617 - root - INFO - Step 28320: lr=1.00E-05, loss= 1.1341 (max= 1.5497), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:07:46,617 - root - INFO - Step 28320: lr=1.00E-05, loss= 1.1341 (max= 1.5497), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
23:07:46,617 - root - INFO - Step 28320: lr=1.00E-05, loss= 1.1341 (max= 1.5497), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:08:02,610 - root - INFO - Step 28330: lr=1.00E-05, loss= 1.2158 (max= 1.7251), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:08:02,610 - root - INFO - Step 28330: lr=1.00E-05, loss= 1.2158 (max= 1.7251), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:08:02,610 - root - INFO - Step 28330: lr=1.00E-05, loss= 1.2158 (max= 1.7251), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:08:02,610 - root - INFO - Step 28330: lr=1.00E-05, loss= 1.2158 (max= 1.7251), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:08:02,610 - root - INFO - Step 28330: lr=1.00E-05, loss= 1.2158 (max= 1.7251), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:08:02,610 - root - INFO - Step 28330: lr=1.00E-05, loss= 1.2158 (max= 1.7251), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:08:02,610 - root - INFO - Step 28330: lr=1.00E-05, loss= 1.2158 (max= 1.7251), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:08:02,610 - root - INFO - Step 28330: lr=1.00E-05, loss= 1.2158 (max= 1.7251), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:08:08,110 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:5124780 +2025-10-24 23:08:18,547 - root - INFO - Step 28340: lr=1.00E-05, loss= 1.1667 (max= 1.5793), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:18,547 - root - INFO - Step 28340: lr=1.00E-05, loss= 1.1667 (max= 1.5793), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:18,548 - root - INFO - Step 28340: lr=1.00E-05, loss= 1.1667 (max= 1.5793), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:18,548 - root - INFO - Step 28340: lr=1.00E-05, loss= 1.1667 (max= 1.5793), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:18,548 - root - INFO - Step 28340: lr=1.00E-05, loss= 1.1667 (max= 1.5793), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:18,548 - root - INFO - Step 28340: lr=1.00E-05, loss= 1.1667 (max= 1.5793), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:18,548 - root - INFO - Step 28340: lr=1.00E-05, loss= 1.1667 (max= 1.5793), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:18,548 - root - INFO - Step 28340: lr=1.00E-05, loss= 1.1667 (max= 1.5793), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:34,516 - root - INFO - Step 28350: lr=1.00E-05, loss= 1.1873 (max= 1.6500), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:34,516 - root - INFO - Step 28350: lr=1.00E-05, loss= 1.1873 (max= 1.6500), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:34,516 - root - INFO - Step 28350: lr=1.00E-05, loss= 1.1873 (max= 1.6500), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:34,516 - root - INFO - Step 28350: lr=1.00E-05, loss= 1.1873 (max= 1.6500), 
tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:34,516 - root - INFO - Step 28350: lr=1.00E-05, loss= 1.1873 (max= 1.6500), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:34,516 - root - INFO - Step 28350: lr=1.00E-05, loss= 1.1873 (max= 1.6500), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:34,516 - root - INFO - Step 28350: lr=1.00E-05, loss= 1.1873 (max= 1.6500), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:34,516 - root - INFO - Step 28350: lr=1.00E-05, loss= 1.1873 (max= 1.6500), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:50,450 - root - INFO - Step 28360: lr=1.00E-05, loss= 1.1963 (max= 1.6027), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:50,450 - root - INFO - Step 28360: lr=1.00E-05, loss= 1.1963 (max= 1.6027), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:50,450 - root - INFO - Step 28360: lr=1.00E-05, loss= 1.1963 (max= 1.6027), tps=20569, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:50,450 - root - INFO - Step 28360: lr=1.00E-05, loss= 1.1963 (max= 1.6027), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:50,450 - root - INFO - Step 28360: lr=1.00E-05, loss= 1.1963 (max= 1.6027), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:50,450 - root - INFO - Step 28360: lr=1.00E-05, loss= 1.1963 (max= 1.6027), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:50,450 - root - INFO - Step 
28360: lr=1.00E-05, loss= 1.1963 (max= 1.6027), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:08:50,450 - root - INFO - Step 28360: lr=1.00E-05, loss= 1.1963 (max= 1.6027), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:06,363 - root - INFO - Step 28370: lr=1.00E-05, loss= 1.1752 (max= 1.5698), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:06,363 - root - INFO - Step 28370: lr=1.00E-05, loss= 1.1752 (max= 1.5698), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:06,363 - root - INFO - Step 28370: lr=1.00E-05, loss= 1.1752 (max= 1.5698), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:06,363 - root - INFO - Step 28370: lr=1.00E-05, loss= 1.1752 (max= 1.5698), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:06,363 - root - INFO - Step 28370: lr=1.00E-05, loss= 1.1752 (max= 1.5698), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:06,363 - root - INFO - Step 28370: lr=1.00E-05, loss= 1.1752 (max= 1.5698), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:06,363 - root - INFO - Step 28370: lr=1.00E-05, loss= 1.1752 (max= 1.5698), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:06,363 - root - INFO - Step 28370: lr=1.00E-05, loss= 1.1752 (max= 1.5698), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:22,313 - root - INFO - Step 28380: lr=1.00E-05, loss= 1.1944 (max= 1.8513), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 23:09:22,313 - root - INFO - Step 28380: lr=1.00E-05, loss= 1.1944 (max= 1.8513), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:22,313 - root - INFO - Step 28380: lr=1.00E-05, loss= 1.1944 (max= 1.8513), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:22,313 - root - INFO - Step 28380: lr=1.00E-05, loss= 1.1944 (max= 1.8513), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:22,313 - root - INFO - Step 28380: lr=1.00E-05, loss= 1.1944 (max= 1.8513), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:22,313 - root - INFO - Step 28380: lr=1.00E-05, loss= 1.1944 (max= 1.8513), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:22,313 - root - INFO - Step 28380: lr=1.00E-05, loss= 1.1944 (max= 1.8513), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:22,313 - root - INFO - Step 28380: lr=1.00E-05, loss= 1.1944 (max= 1.8513), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:38,260 - root - INFO - Step 28390: lr=1.00E-05, loss= 1.1740 (max= 1.5584), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:38,260 - root - INFO - Step 28390: lr=1.00E-05, loss= 1.1740 (max= 1.5584), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:38,260 - root - INFO - Step 28390: lr=1.00E-05, loss= 1.1740 (max= 1.5584), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:38,260 - root - INFO - Step 28390: lr=1.00E-05, loss= 1.1740 (max= 1.5584), tps=20552, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:38,260 - root - INFO - Step 28390: lr=1.00E-05, loss= 1.1740 (max= 1.5584), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:38,260 - root - INFO - Step 28390: lr=1.00E-05, loss= 1.1740 (max= 1.5584), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:38,260 - root - INFO - Step 28390: lr=1.00E-05, loss= 1.1740 (max= 1.5584), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:38,260 - root - INFO - Step 28390: lr=1.00E-05, loss= 1.1740 (max= 1.5584), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:54,193 - root - INFO - Step 28400: lr=1.00E-05, loss= 1.2145 (max= 1.7013), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:54,193 - root - INFO - Step 28400: lr=1.00E-05, loss= 1.2145 (max= 1.7013), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:54,193 - root - INFO - Step 28400: lr=1.00E-05, loss= 1.2145 (max= 1.7013), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:54,194 - root - INFO - Step 28400: lr=1.00E-05, loss= 1.2145 (max= 1.7013), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:54,194 - root - INFO - Step 28400: lr=1.00E-05, loss= 1.2145 (max= 1.7013), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:54,194 - root - INFO - Step 28400: lr=1.00E-05, loss= 1.2145 (max= 1.7013), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:54,194 - root - INFO - Step 28400: lr=1.00E-05, loss= 1.2145 
(max= 1.7013), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:09:54,194 - root - INFO - Step 28400: lr=1.00E-05, loss= 1.2145 (max= 1.7013), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:10,133 - root - INFO - Step 28410: lr=1.00E-05, loss= 1.1945 (max= 1.5524), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:10,133 - root - INFO - Step 28410: lr=1.00E-05, loss= 1.1945 (max= 1.5524), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:10,133 - root - INFO - Step 28410: lr=1.00E-05, loss= 1.1945 (max= 1.5524), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:10,133 - root - INFO - Step 28410: lr=1.00E-05, loss= 1.1945 (max= 1.5524), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:10,133 - root - INFO - Step 28410: lr=1.00E-05, loss= 1.1945 (max= 1.5524), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:10,133 - root - INFO - Step 28410: lr=1.00E-05, loss= 1.1945 (max= 1.5524), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:10,133 - root - INFO - Step 28410: lr=1.00E-05, loss= 1.1945 (max= 1.5524), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:10,133 - root - INFO - Step 28410: lr=1.00E-05, loss= 1.1945 (max= 1.5524), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:26,051 - root - INFO - Step 28420: lr=1.00E-05, loss= 1.2009 (max= 1.7231), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:26,051 - root 
- INFO - Step 28420: lr=1.00E-05, loss= 1.2009 (max= 1.7231), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:26,051 - root - INFO - Step 28420: lr=1.00E-05, loss= 1.2009 (max= 1.7231), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:26,051 - root - INFO - Step 28420: lr=1.00E-05, loss= 1.2009 (max= 1.7231), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:26,051 - root - INFO - Step 28420: lr=1.00E-05, loss= 1.2009 (max= 1.7231), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:26,051 - root - INFO - Step 28420: lr=1.00E-05, loss= 1.2009 (max= 1.7231), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:26,051 - root - INFO - Step 28420: lr=1.00E-05, loss= 1.2009 (max= 1.7231), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:26,051 - root - INFO - Step 28420: lr=1.00E-05, loss= 1.2009 (max= 1.7231), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:10:41,949 - root - INFO - Step 28430: lr=1.00E-05, loss= 1.1924 (max= 2.0739), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:41,949 - root - INFO - Step 28430: lr=1.00E-05, loss= 1.1924 (max= 2.0739), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:41,949 - root - INFO - Step 28430: lr=1.00E-05, loss= 1.1924 (max= 2.0739), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:41,949 - root - INFO - Step 28430: lr=1.00E-05, loss= 1.1924 (max= 2.0739), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 23:10:41,949 - root - INFO - Step 28430: lr=1.00E-05, loss= 1.1924 (max= 2.0739), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:41,949 - root - INFO - Step 28430: lr=1.00E-05, loss= 1.1924 (max= 2.0739), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:41,950 - root - INFO - Step 28430: lr=1.00E-05, loss= 1.1924 (max= 2.0739), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:41,950 - root - INFO - Step 28430: lr=1.00E-05, loss= 1.1924 (max= 2.0739), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:57,848 - root - INFO - Step 28440: lr=1.00E-05, loss= 1.1738 (max= 1.6375), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:57,848 - root - INFO - Step 28440: lr=1.00E-05, loss= 1.1738 (max= 1.6375), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:57,848 - root - INFO - Step 28440: lr=1.00E-05, loss= 1.1738 (max= 1.6375), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:57,848 - root - INFO - Step 28440: lr=1.00E-05, loss= 1.1738 (max= 1.6375), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:57,848 - root - INFO - Step 28440: lr=1.00E-05, loss= 1.1738 (max= 1.6375), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:57,848 - root - INFO - Step 28440: lr=1.00E-05, loss= 1.1738 (max= 1.6375), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:57,848 - root - INFO - Step 28440: lr=1.00E-05, loss= 1.1738 (max= 1.6375), tps=20615, mfu=42.95%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:10:57,848 - root - INFO - Step 28440: lr=1.00E-05, loss= 1.1738 (max= 1.6375), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:11:13,759 - root - INFO - Step 28450: lr=1.00E-05, loss= 1.1440 (max= 1.5190), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:13,759 - root - INFO - Step 28450: lr=1.00E-05, loss= 1.1440 (max= 1.5190), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:13,759 - root - INFO - Step 28450: lr=1.00E-05, loss= 1.1440 (max= 1.5190), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:13,759 - root - INFO - Step 28450: lr=1.00E-05, loss= 1.1440 (max= 1.5190), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:13,759 - root - INFO - Step 28450: lr=1.00E-05, loss= 1.1440 (max= 1.5190), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:13,759 - root - INFO - Step 28450: lr=1.00E-05, loss= 1.1440 (max= 1.5190), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:13,759 - root - INFO - Step 28450: lr=1.00E-05, loss= 1.1440 (max= 1.5190), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:13,759 - root - INFO - Step 28450: lr=1.00E-05, loss= 1.1440 (max= 1.5190), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:29,693 - root - INFO - Step 28460: lr=1.00E-05, loss= 1.1852 (max= 1.6097), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:29,693 - root - INFO - Step 28460: lr=1.00E-05, 
loss= 1.1852 (max= 1.6097), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:29,693 - root - INFO - Step 28460: lr=1.00E-05, loss= 1.1852 (max= 1.6097), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:29,693 - root - INFO - Step 28460: lr=1.00E-05, loss= 1.1852 (max= 1.6097), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:29,693 - root - INFO - Step 28460: lr=1.00E-05, loss= 1.1852 (max= 1.6097), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:29,693 - root - INFO - Step 28460: lr=1.00E-05, loss= 1.1852 (max= 1.6097), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:29,693 - root - INFO - Step 28460: lr=1.00E-05, loss= 1.1852 (max= 1.6097), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:29,693 - root - INFO - Step 28460: lr=1.00E-05, loss= 1.1852 (max= 1.6097), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:11:45,617 - root - INFO - Step 28470: lr=1.00E-05, loss= 1.1985 (max= 1.6465), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:11:45,617 - root - INFO - Step 28470: lr=1.00E-05, loss= 1.1985 (max= 1.6465), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:11:45,617 - root - INFO - Step 28470: lr=1.00E-05, loss= 1.1985 (max= 1.6465), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:11:45,617 - root - INFO - Step 28470: lr=1.00E-05, loss= 1.1985 (max= 1.6465), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
23:11:45,617 - root - INFO - Step 28470: lr=1.00E-05, loss= 1.1985 (max= 1.6465), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:11:45,617 - root - INFO - Step 28470: lr=1.00E-05, loss= 1.1985 (max= 1.6465), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:11:45,617 - root - INFO - Step 28470: lr=1.00E-05, loss= 1.1985 (max= 1.6465), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:11:45,617 - root - INFO - Step 28470: lr=1.00E-05, loss= 1.1985 (max= 1.6465), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:01,511 - root - INFO - Step 28480: lr=1.00E-05, loss= 1.1611 (max= 1.5761), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:01,511 - root - INFO - Step 28480: lr=1.00E-05, loss= 1.1611 (max= 1.5761), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:01,511 - root - INFO - Step 28480: lr=1.00E-05, loss= 1.1611 (max= 1.5761), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:01,512 - root - INFO - Step 28480: lr=1.00E-05, loss= 1.1611 (max= 1.5761), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:01,512 - root - INFO - Step 28480: lr=1.00E-05, loss= 1.1611 (max= 1.5761), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:01,512 - root - INFO - Step 28480: lr=1.00E-05, loss= 1.1611 (max= 1.5761), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:01,512 - root - INFO - Step 28480: lr=1.00E-05, loss= 1.1611 (max= 1.5761), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:01,512 - root - INFO - Step 28480: lr=1.00E-05, loss= 1.1611 (max= 1.5761), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:17,442 - root - INFO - Step 28490: lr=1.00E-05, loss= 1.1725 (max= 1.6801), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:17,442 - root - INFO - Step 28490: lr=1.00E-05, loss= 1.1725 (max= 1.6801), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:17,442 - root - INFO - Step 28490: lr=1.00E-05, loss= 1.1725 (max= 1.6801), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:17,442 - root - INFO - Step 28490: lr=1.00E-05, loss= 1.1725 (max= 1.6801), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:17,443 - root - INFO - Step 28490: lr=1.00E-05, loss= 1.1725 (max= 1.6801), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:17,443 - root - INFO - Step 28490: lr=1.00E-05, loss= 1.1725 (max= 1.6801), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:17,443 - root - INFO - Step 28490: lr=1.00E-05, loss= 1.1725 (max= 1.6801), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:17,443 - root - INFO - Step 28490: lr=1.00E-05, loss= 1.1725 (max= 1.6801), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:33,430 - root - INFO - Step 28500: lr=1.00E-05, loss= 1.1949 (max= 1.7056), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:33,430 - root - INFO - Step 28500: lr=1.00E-05, loss= 1.1949 (max= 1.7056), 
tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:33,430 - root - INFO - Step 28500: lr=1.00E-05, loss= 1.1949 (max= 1.7056), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:33,430 - root - INFO - Step 28500: lr=1.00E-05, loss= 1.1949 (max= 1.7056), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:33,430 - root - INFO - Step 28500: lr=1.00E-05, loss= 1.1949 (max= 1.7056), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:33,430 - root - INFO - Step 28500: lr=1.00E-05, loss= 1.1949 (max= 1.7056), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:33,430 - root - INFO - Step 28500: lr=1.00E-05, loss= 1.1949 (max= 1.7056), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:33,430 - root - INFO - Step 28500: lr=1.00E-05, loss= 1.1949 (max= 1.7056), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:12:49,373 - root - INFO - Step 28510: lr=1.00E-05, loss= 1.1782 (max= 1.7504), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:49,373 - root - INFO - Step 28510: lr=1.00E-05, loss= 1.1782 (max= 1.7504), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:49,373 - root - INFO - Step 28510: lr=1.00E-05, loss= 1.1782 (max= 1.7504), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:49,373 - root - INFO - Step 28510: lr=1.00E-05, loss= 1.1782 (max= 1.7504), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:49,373 - root - INFO - Step 
28510: lr=1.00E-05, loss= 1.1782 (max= 1.7504), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:49,373 - root - INFO - Step 28510: lr=1.00E-05, loss= 1.1782 (max= 1.7504), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:49,374 - root - INFO - Step 28510: lr=1.00E-05, loss= 1.1782 (max= 1.7504), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:12:49,374 - root - INFO - Step 28510: lr=1.00E-05, loss= 1.1782 (max= 1.7504), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:13:05,282 - root - INFO - Step 28520: lr=1.00E-05, loss= 1.2124 (max= 1.6152), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:05,282 - root - INFO - Step 28520: lr=1.00E-05, loss= 1.2124 (max= 1.6152), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:05,282 - root - INFO - Step 28520: lr=1.00E-05, loss= 1.2124 (max= 1.6152), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:05,282 - root - INFO - Step 28520: lr=1.00E-05, loss= 1.2124 (max= 1.6152), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:05,282 - root - INFO - Step 28520: lr=1.00E-05, loss= 1.2124 (max= 1.6152), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:05,282 - root - INFO - Step 28520: lr=1.00E-05, loss= 1.2124 (max= 1.6152), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:05,282 - root - INFO - Step 28520: lr=1.00E-05, loss= 1.2124 (max= 1.6152), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:13:05,282 - root - INFO - Step 28520: lr=1.00E-05, loss= 1.2124 (max= 1.6152), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:21,233 - root - INFO - Step 28530: lr=1.00E-05, loss= 1.1879 (max= 1.5500), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:21,233 - root - INFO - Step 28530: lr=1.00E-05, loss= 1.1879 (max= 1.5500), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:21,233 - root - INFO - Step 28530: lr=1.00E-05, loss= 1.1879 (max= 1.5500), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:21,233 - root - INFO - Step 28530: lr=1.00E-05, loss= 1.1879 (max= 1.5500), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:21,233 - root - INFO - Step 28530: lr=1.00E-05, loss= 1.1879 (max= 1.5500), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:21,233 - root - INFO - Step 28530: lr=1.00E-05, loss= 1.1879 (max= 1.5500), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:21,233 - root - INFO - Step 28530: lr=1.00E-05, loss= 1.1879 (max= 1.5500), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:21,233 - root - INFO - Step 28530: lr=1.00E-05, loss= 1.1879 (max= 1.5500), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:37,138 - root - INFO - Step 28540: lr=1.00E-05, loss= 1.2195 (max= 1.6434), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:37,138 - root - INFO - Step 28540: lr=1.00E-05, loss= 1.2195 (max= 1.6434), tps=20607, mfu=42.93%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:37,138 - root - INFO - Step 28540: lr=1.00E-05, loss= 1.2195 (max= 1.6434), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:37,138 - root - INFO - Step 28540: lr=1.00E-05, loss= 1.2195 (max= 1.6434), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:37,138 - root - INFO - Step 28540: lr=1.00E-05, loss= 1.2195 (max= 1.6434), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:37,138 - root - INFO - Step 28540: lr=1.00E-05, loss= 1.2195 (max= 1.6434), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:37,138 - root - INFO - Step 28540: lr=1.00E-05, loss= 1.2195 (max= 1.6434), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:37,138 - root - INFO - Step 28540: lr=1.00E-05, loss= 1.2195 (max= 1.6434), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:53,039 - root - INFO - Step 28550: lr=1.00E-05, loss= 1.1981 (max= 1.6629), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:53,039 - root - INFO - Step 28550: lr=1.00E-05, loss= 1.1981 (max= 1.6629), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:53,039 - root - INFO - Step 28550: lr=1.00E-05, loss= 1.1981 (max= 1.6629), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:53,039 - root - INFO - Step 28550: lr=1.00E-05, loss= 1.1981 (max= 1.6629), tps=20612, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:53,039 - root - INFO - Step 28550: lr=1.00E-05, loss= 1.1981 
(max= 1.6629), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:53,039 - root - INFO - Step 28550: lr=1.00E-05, loss= 1.1981 (max= 1.6629), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:53,039 - root - INFO - Step 28550: lr=1.00E-05, loss= 1.1981 (max= 1.6629), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:13:53,039 - root - INFO - Step 28550: lr=1.00E-05, loss= 1.1981 (max= 1.6629), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:09,008 - root - INFO - Step 28560: lr=1.00E-05, loss= 1.1528 (max= 1.6623), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:09,008 - root - INFO - Step 28560: lr=1.00E-05, loss= 1.1528 (max= 1.6623), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:09,008 - root - INFO - Step 28560: lr=1.00E-05, loss= 1.1528 (max= 1.6623), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:09,008 - root - INFO - Step 28560: lr=1.00E-05, loss= 1.1528 (max= 1.6623), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:09,008 - root - INFO - Step 28560: lr=1.00E-05, loss= 1.1528 (max= 1.6623), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:09,008 - root - INFO - Step 28560: lr=1.00E-05, loss= 1.1528 (max= 1.6623), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:09,008 - root - INFO - Step 28560: lr=1.00E-05, loss= 1.1528 (max= 1.6623), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:09,008 - root 
- INFO - Step 28560: lr=1.00E-05, loss= 1.1528 (max= 1.6623), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:24,900 - root - INFO - Step 28570: lr=1.00E-05, loss= 1.2031 (max= 1.5934), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:24,900 - root - INFO - Step 28570: lr=1.00E-05, loss= 1.2031 (max= 1.5934), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:24,900 - root - INFO - Step 28570: lr=1.00E-05, loss= 1.2031 (max= 1.5934), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:24,900 - root - INFO - Step 28570: lr=1.00E-05, loss= 1.2031 (max= 1.5934), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:24,900 - root - INFO - Step 28570: lr=1.00E-05, loss= 1.2031 (max= 1.5934), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:24,900 - root - INFO - Step 28570: lr=1.00E-05, loss= 1.2031 (max= 1.5934), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:24,900 - root - INFO - Step 28570: lr=1.00E-05, loss= 1.2031 (max= 1.5934), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:24,900 - root - INFO - Step 28570: lr=1.00E-05, loss= 1.2031 (max= 1.5934), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:40,876 - root - INFO - Step 28580: lr=1.00E-05, loss= 1.1821 (max= 1.5381), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:40,876 - root - INFO - Step 28580: lr=1.00E-05, loss= 1.1821 (max= 1.5381), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 23:14:40,876 - root - INFO - Step 28580: lr=1.00E-05, loss= 1.1821 (max= 1.5381), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:40,876 - root - INFO - Step 28580: lr=1.00E-05, loss= 1.1821 (max= 1.5381), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:40,876 - root - INFO - Step 28580: lr=1.00E-05, loss= 1.1821 (max= 1.5381), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:40,876 - root - INFO - Step 28580: lr=1.00E-05, loss= 1.1821 (max= 1.5381), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:40,876 - root - INFO - Step 28580: lr=1.00E-05, loss= 1.1821 (max= 1.5381), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:40,876 - root - INFO - Step 28580: lr=1.00E-05, loss= 1.1821 (max= 1.5381), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:56,794 - root - INFO - Step 28590: lr=1.00E-05, loss= 1.1757 (max= 1.5214), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:56,794 - root - INFO - Step 28590: lr=1.00E-05, loss= 1.1757 (max= 1.5214), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:56,794 - root - INFO - Step 28590: lr=1.00E-05, loss= 1.1757 (max= 1.5214), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:56,794 - root - INFO - Step 28590: lr=1.00E-05, loss= 1.1757 (max= 1.5214), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:56,794 - root - INFO - Step 28590: lr=1.00E-05, loss= 1.1757 (max= 1.5214), tps=20589, mfu=42.90%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:56,794 - root - INFO - Step 28590: lr=1.00E-05, loss= 1.1757 (max= 1.5214), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:56,794 - root - INFO - Step 28590: lr=1.00E-05, loss= 1.1757 (max= 1.5214), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:14:56,794 - root - INFO - Step 28590: lr=1.00E-05, loss= 1.1757 (max= 1.5214), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:12,732 - root - INFO - Step 28600: lr=1.00E-05, loss= 1.1947 (max= 1.6746), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:12,732 - root - INFO - Step 28600: lr=1.00E-05, loss= 1.1947 (max= 1.6746), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:12,732 - root - INFO - Step 28600: lr=1.00E-05, loss= 1.1947 (max= 1.6746), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:12,732 - root - INFO - Step 28600: lr=1.00E-05, loss= 1.1947 (max= 1.6746), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:12,733 - root - INFO - Step 28600: lr=1.00E-05, loss= 1.1947 (max= 1.6746), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:12,733 - root - INFO - Step 28600: lr=1.00E-05, loss= 1.1947 (max= 1.6746), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:12,733 - root - INFO - Step 28600: lr=1.00E-05, loss= 1.1947 (max= 1.6746), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:12,733 - root - INFO - Step 28600: lr=1.00E-05, 
loss= 1.1947 (max= 1.6746), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:28,666 - root - INFO - Step 28610: lr=1.00E-05, loss= 1.2015 (max= 1.6413), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:28,666 - root - INFO - Step 28610: lr=1.00E-05, loss= 1.2015 (max= 1.6413), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:28,666 - root - INFO - Step 28610: lr=1.00E-05, loss= 1.2015 (max= 1.6413), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:28,666 - root - INFO - Step 28610: lr=1.00E-05, loss= 1.2015 (max= 1.6413), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:28,667 - root - INFO - Step 28610: lr=1.00E-05, loss= 1.2015 (max= 1.6413), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:28,667 - root - INFO - Step 28610: lr=1.00E-05, loss= 1.2015 (max= 1.6413), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:28,667 - root - INFO - Step 28610: lr=1.00E-05, loss= 1.2015 (max= 1.6413), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:28,667 - root - INFO - Step 28610: lr=1.00E-05, loss= 1.2015 (max= 1.6413), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:44,595 - root - INFO - Step 28620: lr=1.00E-05, loss= 1.1902 (max= 1.4995), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:44,595 - root - INFO - Step 28620: lr=1.00E-05, loss= 1.1902 (max= 1.4995), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
23:15:44,595 - root - INFO - Step 28620: lr=1.00E-05, loss= 1.1902 (max= 1.4995), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:44,595 - root - INFO - Step 28620: lr=1.00E-05, loss= 1.1902 (max= 1.4995), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:44,595 - root - INFO - Step 28620: lr=1.00E-05, loss= 1.1902 (max= 1.4995), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:44,595 - root - INFO - Step 28620: lr=1.00E-05, loss= 1.1902 (max= 1.4995), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:44,595 - root - INFO - Step 28620: lr=1.00E-05, loss= 1.1902 (max= 1.4995), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:15:44,595 - root - INFO - Step 28620: lr=1.00E-05, loss= 1.1902 (max= 1.4995), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:00,494 - root - INFO - Step 28630: lr=1.00E-05, loss= 1.1636 (max= 1.7165), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:00,495 - root - INFO - Step 28630: lr=1.00E-05, loss= 1.1636 (max= 1.7165), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:00,495 - root - INFO - Step 28630: lr=1.00E-05, loss= 1.1636 (max= 1.7165), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:00,495 - root - INFO - Step 28630: lr=1.00E-05, loss= 1.1636 (max= 1.7165), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:00,495 - root - INFO - Step 28630: lr=1.00E-05, loss= 1.1636 (max= 1.7165), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:00,495 - root - INFO - Step 28630: lr=1.00E-05, loss= 1.1636 (max= 1.7165), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:00,495 - root - INFO - Step 28630: lr=1.00E-05, loss= 1.1636 (max= 1.7165), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:00,495 - root - INFO - Step 28630: lr=1.00E-05, loss= 1.1636 (max= 1.7165), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:16,423 - root - INFO - Step 28640: lr=1.00E-05, loss= 1.1810 (max= 1.6937), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:16:16,423 - root - INFO - Step 28640: lr=1.00E-05, loss= 1.1810 (max= 1.6937), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:16:16,423 - root - INFO - Step 28640: lr=1.00E-05, loss= 1.1810 (max= 1.6937), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:16:16,423 - root - INFO - Step 28640: lr=1.00E-05, loss= 1.1810 (max= 1.6937), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:16:16,423 - root - INFO - Step 28640: lr=1.00E-05, loss= 1.1810 (max= 1.6937), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:16:16,423 - root - INFO - Step 28640: lr=1.00E-05, loss= 1.1810 (max= 1.6937), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:16:16,423 - root - INFO - Step 28640: lr=1.00E-05, loss= 1.1810 (max= 1.6937), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:16:16,423 - root - INFO - Step 28640: lr=1.00E-05, loss= 1.1810 (max= 1.6937), 
tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:16:32,399 - root - INFO - Step 28650: lr=1.00E-05, loss= 1.2134 (max= 1.5991), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:32,399 - root - INFO - Step 28650: lr=1.00E-05, loss= 1.2134 (max= 1.5991), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:32,399 - root - INFO - Step 28650: lr=1.00E-05, loss= 1.2134 (max= 1.5991), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:32,399 - root - INFO - Step 28650: lr=1.00E-05, loss= 1.2134 (max= 1.5991), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:32,399 - root - INFO - Step 28650: lr=1.00E-05, loss= 1.2134 (max= 1.5991), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:32,399 - root - INFO - Step 28650: lr=1.00E-05, loss= 1.2134 (max= 1.5991), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:32,400 - root - INFO - Step 28650: lr=1.00E-05, loss= 1.2134 (max= 1.5991), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:32,400 - root - INFO - Step 28650: lr=1.00E-05, loss= 1.2134 (max= 1.5991), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:48,352 - root - INFO - Step 28660: lr=1.00E-05, loss= 1.2097 (max= 1.6187), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:48,352 - root - INFO - Step 28660: lr=1.00E-05, loss= 1.2097 (max= 1.6187), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:48,352 - root - INFO - Step 
28660: lr=1.00E-05, loss= 1.2097 (max= 1.6187), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:48,352 - root - INFO - Step 28660: lr=1.00E-05, loss= 1.2097 (max= 1.6187), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:48,352 - root - INFO - Step 28660: lr=1.00E-05, loss= 1.2097 (max= 1.6187), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:48,352 - root - INFO - Step 28660: lr=1.00E-05, loss= 1.2097 (max= 1.6187), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:48,352 - root - INFO - Step 28660: lr=1.00E-05, loss= 1.2097 (max= 1.6187), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:16:48,352 - root - INFO - Step 28660: lr=1.00E-05, loss= 1.2097 (max= 1.6187), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:04,268 - root - INFO - Step 28670: lr=1.00E-05, loss= 1.1916 (max= 1.5363), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:04,268 - root - INFO - Step 28670: lr=1.00E-05, loss= 1.1916 (max= 1.5363), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:04,268 - root - INFO - Step 28670: lr=1.00E-05, loss= 1.1916 (max= 1.5363), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:04,268 - root - INFO - Step 28670: lr=1.00E-05, loss= 1.1916 (max= 1.5363), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:04,268 - root - INFO - Step 28670: lr=1.00E-05, loss= 1.1916 (max= 1.5363), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:17:04,268 - root - INFO - Step 28670: lr=1.00E-05, loss= 1.1916 (max= 1.5363), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:04,269 - root - INFO - Step 28670: lr=1.00E-05, loss= 1.1916 (max= 1.5363), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:04,269 - root - INFO - Step 28670: lr=1.00E-05, loss= 1.1916 (max= 1.5363), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:20,190 - root - INFO - Step 28680: lr=1.00E-05, loss= 1.1872 (max= 1.5892), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:20,190 - root - INFO - Step 28680: lr=1.00E-05, loss= 1.1872 (max= 1.5892), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:20,190 - root - INFO - Step 28680: lr=1.00E-05, loss= 1.1872 (max= 1.5892), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:20,190 - root - INFO - Step 28680: lr=1.00E-05, loss= 1.1872 (max= 1.5892), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:20,190 - root - INFO - Step 28680: lr=1.00E-05, loss= 1.1872 (max= 1.5892), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:20,190 - root - INFO - Step 28680: lr=1.00E-05, loss= 1.1872 (max= 1.5892), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:20,190 - root - INFO - Step 28680: lr=1.00E-05, loss= 1.1872 (max= 1.5892), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:20,190 - root - INFO - Step 28680: lr=1.00E-05, loss= 1.1872 (max= 1.5892), tps=20585, mfu=42.89%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:36,143 - root - INFO - Step 28690: lr=1.00E-05, loss= 1.1859 (max= 1.7166), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:36,143 - root - INFO - Step 28690: lr=1.00E-05, loss= 1.1859 (max= 1.7166), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:36,143 - root - INFO - Step 28690: lr=1.00E-05, loss= 1.1859 (max= 1.7166), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:36,143 - root - INFO - Step 28690: lr=1.00E-05, loss= 1.1859 (max= 1.7166), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:36,143 - root - INFO - Step 28690: lr=1.00E-05, loss= 1.1859 (max= 1.7166), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:36,143 - root - INFO - Step 28690: lr=1.00E-05, loss= 1.1859 (max= 1.7166), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:36,143 - root - INFO - Step 28690: lr=1.00E-05, loss= 1.1859 (max= 1.7166), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:36,143 - root - INFO - Step 28690: lr=1.00E-05, loss= 1.1859 (max= 1.7166), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:17:52,067 - root - INFO - Step 28700: lr=1.00E-05, loss= 1.2145 (max= 1.6085), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:17:52,067 - root - INFO - Step 28700: lr=1.00E-05, loss= 1.2145 (max= 1.6085), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:17:52,067 - root - INFO - Step 28700: lr=1.00E-05, loss= 1.2145 
(max= 1.6085), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:17:52,067 - root - INFO - Step 28700: lr=1.00E-05, loss= 1.2145 (max= 1.6085), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:17:52,068 - root - INFO - Step 28700: lr=1.00E-05, loss= 1.2145 (max= 1.6085), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:17:52,068 - root - INFO - Step 28700: lr=1.00E-05, loss= 1.2145 (max= 1.6085), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:17:52,068 - root - INFO - Step 28700: lr=1.00E-05, loss= 1.2145 (max= 1.6085), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:17:52,068 - root - INFO - Step 28700: lr=1.00E-05, loss= 1.2145 (max= 1.6085), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:08,031 - root - INFO - Step 28710: lr=1.00E-05, loss= 1.2308 (max= 1.7891), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:18:08,031 - root - INFO - Step 28710: lr=1.00E-05, loss= 1.2308 (max= 1.7891), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:18:08,031 - root - INFO - Step 28710: lr=1.00E-05, loss= 1.2308 (max= 1.7891), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:18:08,031 - root - INFO - Step 28710: lr=1.00E-05, loss= 1.2308 (max= 1.7891), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:18:08,031 - root - INFO - Step 28710: lr=1.00E-05, loss= 1.2308 (max= 1.7891), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:18:08,031 - root 
- INFO - Step 28710: lr=1.00E-05, loss= 1.2308 (max= 1.7891), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:18:08,031 - root - INFO - Step 28710: lr=1.00E-05, loss= 1.2308 (max= 1.7891), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:18:08,032 - root - INFO - Step 28710: lr=1.00E-05, loss= 1.2308 (max= 1.7891), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:18:23,991 - root - INFO - Step 28720: lr=1.00E-05, loss= 1.1686 (max= 1.4834), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:23,991 - root - INFO - Step 28720: lr=1.00E-05, loss= 1.1686 (max= 1.4834), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:23,991 - root - INFO - Step 28720: lr=1.00E-05, loss= 1.1686 (max= 1.4834), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:23,991 - root - INFO - Step 28720: lr=1.00E-05, loss= 1.1686 (max= 1.4834), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:23,991 - root - INFO - Step 28720: lr=1.00E-05, loss= 1.1686 (max= 1.4834), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:23,991 - root - INFO - Step 28720: lr=1.00E-05, loss= 1.1686 (max= 1.4834), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:23,991 - root - INFO - Step 28720: lr=1.00E-05, loss= 1.1686 (max= 1.4834), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:23,991 - root - INFO - Step 28720: lr=1.00E-05, loss= 1.1686 (max= 1.4834), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 23:18:39,988 - root - INFO - Step 28730: lr=1.00E-05, loss= 1.1673 (max= 1.5217), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:39,988 - root - INFO - Step 28730: lr=1.00E-05, loss= 1.1673 (max= 1.5217), tps=20487, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:39,988 - root - INFO - Step 28730: lr=1.00E-05, loss= 1.1673 (max= 1.5217), tps=20487, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:39,988 - root - INFO - Step 28730: lr=1.00E-05, loss= 1.1673 (max= 1.5217), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:39,988 - root - INFO - Step 28730: lr=1.00E-05, loss= 1.1673 (max= 1.5217), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:39,988 - root - INFO - Step 28730: lr=1.00E-05, loss= 1.1673 (max= 1.5217), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:39,988 - root - INFO - Step 28730: lr=1.00E-05, loss= 1.1673 (max= 1.5217), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:39,989 - root - INFO - Step 28730: lr=1.00E-05, loss= 1.1673 (max= 1.5217), tps=20487, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:55,942 - root - INFO - Step 28740: lr=1.00E-05, loss= 1.1809 (max= 1.5481), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:55,942 - root - INFO - Step 28740: lr=1.00E-05, loss= 1.1809 (max= 1.5481), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:55,942 - root - INFO - Step 28740: lr=1.00E-05, loss= 1.1809 (max= 1.5481), tps=20543, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:55,942 - root - INFO - Step 28740: lr=1.00E-05, loss= 1.1809 (max= 1.5481), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:55,942 - root - INFO - Step 28740: lr=1.00E-05, loss= 1.1809 (max= 1.5481), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:55,942 - root - INFO - Step 28740: lr=1.00E-05, loss= 1.1809 (max= 1.5481), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:55,942 - root - INFO - Step 28740: lr=1.00E-05, loss= 1.1809 (max= 1.5481), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:18:55,943 - root - INFO - Step 28740: lr=1.00E-05, loss= 1.1809 (max= 1.5481), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:19:11,917 - root - INFO - Step 28750: lr=1.00E-05, loss= 1.1798 (max= 1.8228), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:11,917 - root - INFO - Step 28750: lr=1.00E-05, loss= 1.1798 (max= 1.8228), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:11,917 - root - INFO - Step 28750: lr=1.00E-05, loss= 1.1798 (max= 1.8228), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:11,917 - root - INFO - Step 28750: lr=1.00E-05, loss= 1.1798 (max= 1.8228), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:11,918 - root - INFO - Step 28750: lr=1.00E-05, loss= 1.1798 (max= 1.8228), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:11,918 - root - INFO - Step 28750: lr=1.00E-05, 
loss= 1.1798 (max= 1.8228), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:11,918 - root - INFO - Step 28750: lr=1.00E-05, loss= 1.1798 (max= 1.8228), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:11,918 - root - INFO - Step 28750: lr=1.00E-05, loss= 1.1798 (max= 1.8228), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:27,817 - root - INFO - Step 28760: lr=1.00E-05, loss= 1.1716 (max= 1.6010), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:27,817 - root - INFO - Step 28760: lr=1.00E-05, loss= 1.1716 (max= 1.6010), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:27,817 - root - INFO - Step 28760: lr=1.00E-05, loss= 1.1716 (max= 1.6010), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:27,817 - root - INFO - Step 28760: lr=1.00E-05, loss= 1.1716 (max= 1.6010), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:27,817 - root - INFO - Step 28760: lr=1.00E-05, loss= 1.1716 (max= 1.6010), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:27,817 - root - INFO - Step 28760: lr=1.00E-05, loss= 1.1716 (max= 1.6010), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:27,817 - root - INFO - Step 28760: lr=1.00E-05, loss= 1.1716 (max= 1.6010), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:27,817 - root - INFO - Step 28760: lr=1.00E-05, loss= 1.1716 (max= 1.6010), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
23:19:43,755 - root - INFO - Step 28770: lr=1.00E-05, loss= 1.1699 (max= 1.5461), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:43,755 - root - INFO - Step 28770: lr=1.00E-05, loss= 1.1699 (max= 1.5461), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:43,755 - root - INFO - Step 28770: lr=1.00E-05, loss= 1.1699 (max= 1.5461), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:43,755 - root - INFO - Step 28770: lr=1.00E-05, loss= 1.1699 (max= 1.5461), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:43,756 - root - INFO - Step 28770: lr=1.00E-05, loss= 1.1699 (max= 1.5461), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:43,756 - root - INFO - Step 28770: lr=1.00E-05, loss= 1.1699 (max= 1.5461), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:43,756 - root - INFO - Step 28770: lr=1.00E-05, loss= 1.1699 (max= 1.5461), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:43,756 - root - INFO - Step 28770: lr=1.00E-05, loss= 1.1699 (max= 1.5461), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:59,694 - root - INFO - Step 28780: lr=1.00E-05, loss= 1.1752 (max= 1.5616), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:59,694 - root - INFO - Step 28780: lr=1.00E-05, loss= 1.1752 (max= 1.5616), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:59,694 - root - INFO - Step 28780: lr=1.00E-05, loss= 1.1752 (max= 1.5616), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:59,694 - root - INFO - Step 28780: lr=1.00E-05, loss= 1.1752 (max= 1.5616), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:59,694 - root - INFO - Step 28780: lr=1.00E-05, loss= 1.1752 (max= 1.5616), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:59,694 - root - INFO - Step 28780: lr=1.00E-05, loss= 1.1752 (max= 1.5616), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:59,694 - root - INFO - Step 28780: lr=1.00E-05, loss= 1.1752 (max= 1.5616), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:19:59,694 - root - INFO - Step 28780: lr=1.00E-05, loss= 1.1752 (max= 1.5616), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:15,620 - root - INFO - Step 28790: lr=1.00E-05, loss= 1.1739 (max= 1.6866), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:15,620 - root - INFO - Step 28790: lr=1.00E-05, loss= 1.1739 (max= 1.6866), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:15,620 - root - INFO - Step 28790: lr=1.00E-05, loss= 1.1739 (max= 1.6866), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:15,620 - root - INFO - Step 28790: lr=1.00E-05, loss= 1.1739 (max= 1.6866), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:15,621 - root - INFO - Step 28790: lr=1.00E-05, loss= 1.1739 (max= 1.6866), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:15,621 - root - INFO - Step 28790: lr=1.00E-05, loss= 1.1739 (max= 1.6866), 
tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:15,621 - root - INFO - Step 28790: lr=1.00E-05, loss= 1.1739 (max= 1.6866), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:15,621 - root - INFO - Step 28790: lr=1.00E-05, loss= 1.1739 (max= 1.6866), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:31,521 - root - INFO - Step 28800: lr=1.00E-05, loss= 1.1922 (max= 1.6126), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:31,521 - root - INFO - Step 28800: lr=1.00E-05, loss= 1.1922 (max= 1.6126), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:31,521 - root - INFO - Step 28800: lr=1.00E-05, loss= 1.1922 (max= 1.6126), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:31,521 - root - INFO - Step 28800: lr=1.00E-05, loss= 1.1922 (max= 1.6126), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:31,521 - root - INFO - Step 28800: lr=1.00E-05, loss= 1.1922 (max= 1.6126), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:31,521 - root - INFO - Step 28800: lr=1.00E-05, loss= 1.1922 (max= 1.6126), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:31,521 - root - INFO - Step 28800: lr=1.00E-05, loss= 1.1922 (max= 1.6126), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:31,521 - root - INFO - Step 28800: lr=1.00E-05, loss= 1.1922 (max= 1.6126), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:47,428 - root - INFO - Step 
28810: lr=1.00E-05, loss= 1.2306 (max= 1.7293), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:47,428 - root - INFO - Step 28810: lr=1.00E-05, loss= 1.2306 (max= 1.7293), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:47,428 - root - INFO - Step 28810: lr=1.00E-05, loss= 1.2306 (max= 1.7293), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:47,428 - root - INFO - Step 28810: lr=1.00E-05, loss= 1.2306 (max= 1.7293), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:47,429 - root - INFO - Step 28810: lr=1.00E-05, loss= 1.2306 (max= 1.7293), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:47,429 - root - INFO - Step 28810: lr=1.00E-05, loss= 1.2306 (max= 1.7293), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:47,429 - root - INFO - Step 28810: lr=1.00E-05, loss= 1.2306 (max= 1.7293), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:20:47,429 - root - INFO - Step 28810: lr=1.00E-05, loss= 1.2306 (max= 1.7293), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:03,363 - root - INFO - Step 28820: lr=1.00E-05, loss= 1.1906 (max= 1.5890), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:03,363 - root - INFO - Step 28820: lr=1.00E-05, loss= 1.1906 (max= 1.5890), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:03,363 - root - INFO - Step 28820: lr=1.00E-05, loss= 1.1906 (max= 1.5890), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:21:03,363 - root - INFO - Step 28820: lr=1.00E-05, loss= 1.1906 (max= 1.5890), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:03,363 - root - INFO - Step 28820: lr=1.00E-05, loss= 1.1906 (max= 1.5890), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:03,363 - root - INFO - Step 28820: lr=1.00E-05, loss= 1.1906 (max= 1.5890), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:03,363 - root - INFO - Step 28820: lr=1.00E-05, loss= 1.1906 (max= 1.5890), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:03,363 - root - INFO - Step 28820: lr=1.00E-05, loss= 1.1906 (max= 1.5890), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:19,281 - root - INFO - Step 28830: lr=1.00E-05, loss= 1.2318 (max= 1.6278), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:19,281 - root - INFO - Step 28830: lr=1.00E-05, loss= 1.2318 (max= 1.6278), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:19,281 - root - INFO - Step 28830: lr=1.00E-05, loss= 1.2318 (max= 1.6278), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:19,281 - root - INFO - Step 28830: lr=1.00E-05, loss= 1.2318 (max= 1.6278), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:19,281 - root - INFO - Step 28830: lr=1.00E-05, loss= 1.2318 (max= 1.6278), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:19,281 - root - INFO - Step 28830: lr=1.00E-05, loss= 1.2318 (max= 1.6278), tps=20590, mfu=42.90%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:19,281 - root - INFO - Step 28830: lr=1.00E-05, loss= 1.2318 (max= 1.6278), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:19,281 - root - INFO - Step 28830: lr=1.00E-05, loss= 1.2318 (max= 1.6278), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:35,245 - root - INFO - Step 28840: lr=1.00E-05, loss= 1.1712 (max= 1.6869), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:35,245 - root - INFO - Step 28840: lr=1.00E-05, loss= 1.1712 (max= 1.6869), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:35,245 - root - INFO - Step 28840: lr=1.00E-05, loss= 1.1712 (max= 1.6869), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:35,245 - root - INFO - Step 28840: lr=1.00E-05, loss= 1.1712 (max= 1.6869), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:35,245 - root - INFO - Step 28840: lr=1.00E-05, loss= 1.1712 (max= 1.6869), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:35,245 - root - INFO - Step 28840: lr=1.00E-05, loss= 1.1712 (max= 1.6869), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:35,245 - root - INFO - Step 28840: lr=1.00E-05, loss= 1.1712 (max= 1.6869), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:35,245 - root - INFO - Step 28840: lr=1.00E-05, loss= 1.1712 (max= 1.6869), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:51,192 - root - INFO - Step 28850: lr=1.00E-05, loss= 1.1952 
(max= 1.7158), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:51,192 - root - INFO - Step 28850: lr=1.00E-05, loss= 1.1952 (max= 1.7158), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:51,192 - root - INFO - Step 28850: lr=1.00E-05, loss= 1.1952 (max= 1.7158), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:51,193 - root - INFO - Step 28850: lr=1.00E-05, loss= 1.1952 (max= 1.7158), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:51,193 - root - INFO - Step 28850: lr=1.00E-05, loss= 1.1952 (max= 1.7158), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:51,193 - root - INFO - Step 28850: lr=1.00E-05, loss= 1.1952 (max= 1.7158), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:51,193 - root - INFO - Step 28850: lr=1.00E-05, loss= 1.1952 (max= 1.7158), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:21:51,193 - root - INFO - Step 28850: lr=1.00E-05, loss= 1.1952 (max= 1.7158), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:07,136 - root - INFO - Step 28860: lr=1.00E-05, loss= 1.2287 (max= 1.7112), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:07,136 - root - INFO - Step 28860: lr=1.00E-05, loss= 1.2287 (max= 1.7112), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:07,136 - root - INFO - Step 28860: lr=1.00E-05, loss= 1.2287 (max= 1.7112), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:07,136 - root 
- INFO - Step 28860: lr=1.00E-05, loss= 1.2287 (max= 1.7112), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:07,136 - root - INFO - Step 28860: lr=1.00E-05, loss= 1.2287 (max= 1.7112), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:07,137 - root - INFO - Step 28860: lr=1.00E-05, loss= 1.2287 (max= 1.7112), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:07,137 - root - INFO - Step 28860: lr=1.00E-05, loss= 1.2287 (max= 1.7112), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:07,137 - root - INFO - Step 28860: lr=1.00E-05, loss= 1.2287 (max= 1.7112), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:23,102 - root - INFO - Step 28870: lr=1.00E-05, loss= 1.2013 (max= 1.9308), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:22:23,102 - root - INFO - Step 28870: lr=1.00E-05, loss= 1.2013 (max= 1.9308), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:22:23,102 - root - INFO - Step 28870: lr=1.00E-05, loss= 1.2013 (max= 1.9308), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:22:23,102 - root - INFO - Step 28870: lr=1.00E-05, loss= 1.2013 (max= 1.9308), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:22:23,102 - root - INFO - Step 28870: lr=1.00E-05, loss= 1.2013 (max= 1.9308), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:22:23,102 - root - INFO - Step 28870: lr=1.00E-05, loss= 1.2013 (max= 1.9308), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 23:22:23,102 - root - INFO - Step 28870: lr=1.00E-05, loss= 1.2013 (max= 1.9308), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:22:23,103 - root - INFO - Step 28870: lr=1.00E-05, loss= 1.2013 (max= 1.9308), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:22:39,019 - root - INFO - Step 28880: lr=1.00E-05, loss= 1.2065 (max= 1.6891), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:39,019 - root - INFO - Step 28880: lr=1.00E-05, loss= 1.2065 (max= 1.6891), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:39,019 - root - INFO - Step 28880: lr=1.00E-05, loss= 1.2065 (max= 1.6891), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:39,019 - root - INFO - Step 28880: lr=1.00E-05, loss= 1.2065 (max= 1.6891), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:39,019 - root - INFO - Step 28880: lr=1.00E-05, loss= 1.2065 (max= 1.6891), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:39,019 - root - INFO - Step 28880: lr=1.00E-05, loss= 1.2065 (max= 1.6891), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:39,019 - root - INFO - Step 28880: lr=1.00E-05, loss= 1.2065 (max= 1.6891), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:39,019 - root - INFO - Step 28880: lr=1.00E-05, loss= 1.2065 (max= 1.6891), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:54,943 - root - INFO - Step 28890: lr=1.00E-05, loss= 1.2052 (max= 1.5733), tps=20582, mfu=42.88%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:54,943 - root - INFO - Step 28890: lr=1.00E-05, loss= 1.2052 (max= 1.5733), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:54,943 - root - INFO - Step 28890: lr=1.00E-05, loss= 1.2052 (max= 1.5733), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:54,943 - root - INFO - Step 28890: lr=1.00E-05, loss= 1.2052 (max= 1.5733), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:54,943 - root - INFO - Step 28890: lr=1.00E-05, loss= 1.2052 (max= 1.5733), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:54,943 - root - INFO - Step 28890: lr=1.00E-05, loss= 1.2052 (max= 1.5733), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:54,943 - root - INFO - Step 28890: lr=1.00E-05, loss= 1.2052 (max= 1.5733), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:22:54,943 - root - INFO - Step 28890: lr=1.00E-05, loss= 1.2052 (max= 1.5733), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:10,899 - root - INFO - Step 28900: lr=1.00E-05, loss= 1.1727 (max= 1.5463), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:10,899 - root - INFO - Step 28900: lr=1.00E-05, loss= 1.1727 (max= 1.5463), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:10,899 - root - INFO - Step 28900: lr=1.00E-05, loss= 1.1727 (max= 1.5463), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:10,900 - root - INFO - Step 28900: lr=1.00E-05, 
loss= 1.1727 (max= 1.5463), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:10,900 - root - INFO - Step 28900: lr=1.00E-05, loss= 1.1727 (max= 1.5463), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:10,900 - root - INFO - Step 28900: lr=1.00E-05, loss= 1.1727 (max= 1.5463), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:10,900 - root - INFO - Step 28900: lr=1.00E-05, loss= 1.1727 (max= 1.5463), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:10,900 - root - INFO - Step 28900: lr=1.00E-05, loss= 1.1727 (max= 1.5463), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:26,794 - root - INFO - Step 28910: lr=1.00E-05, loss= 1.2156 (max= 1.7578), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:23:26,794 - root - INFO - Step 28910: lr=1.00E-05, loss= 1.2156 (max= 1.7578), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:23:26,794 - root - INFO - Step 28910: lr=1.00E-05, loss= 1.2156 (max= 1.7578), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:23:26,794 - root - INFO - Step 28910: lr=1.00E-05, loss= 1.2156 (max= 1.7578), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:23:26,794 - root - INFO - Step 28910: lr=1.00E-05, loss= 1.2156 (max= 1.7578), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:23:26,794 - root - INFO - Step 28910: lr=1.00E-05, loss= 1.2156 (max= 1.7578), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
23:23:26,794 - root - INFO - Step 28910: lr=1.00E-05, loss= 1.2156 (max= 1.7578), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:23:26,794 - root - INFO - Step 28910: lr=1.00E-05, loss= 1.2156 (max= 1.7578), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:23:42,742 - root - INFO - Step 28920: lr=1.00E-05, loss= 1.1568 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:42,742 - root - INFO - Step 28920: lr=1.00E-05, loss= 1.1568 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:42,742 - root - INFO - Step 28920: lr=1.00E-05, loss= 1.1568 (max= 1.5483), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:42,742 - root - INFO - Step 28920: lr=1.00E-05, loss= 1.1568 (max= 1.5483), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:42,742 - root - INFO - Step 28920: lr=1.00E-05, loss= 1.1568 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:42,742 - root - INFO - Step 28920: lr=1.00E-05, loss= 1.1568 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:42,742 - root - INFO - Step 28920: lr=1.00E-05, loss= 1.1568 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:42,742 - root - INFO - Step 28920: lr=1.00E-05, loss= 1.1568 (max= 1.5483), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:58,705 - root - INFO - Step 28930: lr=1.00E-05, loss= 1.1928 (max= 1.5525), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:58,705 - root - INFO - Step 28930: lr=1.00E-05, loss= 1.1928 (max= 1.5525), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:58,705 - root - INFO - Step 28930: lr=1.00E-05, loss= 1.1928 (max= 1.5525), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:58,705 - root - INFO - Step 28930: lr=1.00E-05, loss= 1.1928 (max= 1.5525), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:58,705 - root - INFO - Step 28930: lr=1.00E-05, loss= 1.1928 (max= 1.5525), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:58,705 - root - INFO - Step 28930: lr=1.00E-05, loss= 1.1928 (max= 1.5525), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:58,705 - root - INFO - Step 28930: lr=1.00E-05, loss= 1.1928 (max= 1.5525), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:23:58,706 - root - INFO - Step 28930: lr=1.00E-05, loss= 1.1928 (max= 1.5525), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:14,624 - root - INFO - Step 28940: lr=1.00E-05, loss= 1.1724 (max= 1.9806), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:14,624 - root - INFO - Step 28940: lr=1.00E-05, loss= 1.1724 (max= 1.9806), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:14,624 - root - INFO - Step 28940: lr=1.00E-05, loss= 1.1724 (max= 1.9806), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:14,624 - root - INFO - Step 28940: lr=1.00E-05, loss= 1.1724 (max= 1.9806), 
tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:14,625 - root - INFO - Step 28940: lr=1.00E-05, loss= 1.1724 (max= 1.9806), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:14,625 - root - INFO - Step 28940: lr=1.00E-05, loss= 1.1724 (max= 1.9806), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:14,625 - root - INFO - Step 28940: lr=1.00E-05, loss= 1.1724 (max= 1.9806), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:14,625 - root - INFO - Step 28940: lr=1.00E-05, loss= 1.1724 (max= 1.9806), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:30,570 - root - INFO - Step 28950: lr=1.00E-05, loss= 1.2104 (max= 1.8184), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:30,570 - root - INFO - Step 28950: lr=1.00E-05, loss= 1.2104 (max= 1.8184), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:30,570 - root - INFO - Step 28950: lr=1.00E-05, loss= 1.2104 (max= 1.8184), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:30,570 - root - INFO - Step 28950: lr=1.00E-05, loss= 1.2104 (max= 1.8184), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:30,571 - root - INFO - Step 28950: lr=1.00E-05, loss= 1.2104 (max= 1.8184), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:30,571 - root - INFO - Step 28950: lr=1.00E-05, loss= 1.2104 (max= 1.8184), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:30,571 - root - INFO - Step 
28950: lr=1.00E-05, loss= 1.2104 (max= 1.8184), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:30,571 - root - INFO - Step 28950: lr=1.00E-05, loss= 1.2104 (max= 1.8184), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:46,537 - root - INFO - Step 28960: lr=1.00E-05, loss= 1.2054 (max= 1.6436), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:46,537 - root - INFO - Step 28960: lr=1.00E-05, loss= 1.2054 (max= 1.6436), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:46,537 - root - INFO - Step 28960: lr=1.00E-05, loss= 1.2054 (max= 1.6436), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:46,537 - root - INFO - Step 28960: lr=1.00E-05, loss= 1.2054 (max= 1.6436), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:46,537 - root - INFO - Step 28960: lr=1.00E-05, loss= 1.2054 (max= 1.6436), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:46,537 - root - INFO - Step 28960: lr=1.00E-05, loss= 1.2054 (max= 1.6436), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:46,537 - root - INFO - Step 28960: lr=1.00E-05, loss= 1.2054 (max= 1.6436), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:24:46,537 - root - INFO - Step 28960: lr=1.00E-05, loss= 1.2054 (max= 1.6436), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:02,466 - root - INFO - Step 28970: lr=1.00E-05, loss= 1.1614 (max= 1.6060), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:25:02,466 - root - INFO - Step 28970: lr=1.00E-05, loss= 1.1614 (max= 1.6060), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:02,466 - root - INFO - Step 28970: lr=1.00E-05, loss= 1.1614 (max= 1.6060), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:02,466 - root - INFO - Step 28970: lr=1.00E-05, loss= 1.1614 (max= 1.6060), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:02,466 - root - INFO - Step 28970: lr=1.00E-05, loss= 1.1614 (max= 1.6060), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:02,466 - root - INFO - Step 28970: lr=1.00E-05, loss= 1.1614 (max= 1.6060), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:02,466 - root - INFO - Step 28970: lr=1.00E-05, loss= 1.1614 (max= 1.6060), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:02,466 - root - INFO - Step 28970: lr=1.00E-05, loss= 1.1614 (max= 1.6060), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:18,413 - root - INFO - Step 28980: lr=1.00E-05, loss= 1.1861 (max= 1.5460), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:18,413 - root - INFO - Step 28980: lr=1.00E-05, loss= 1.1861 (max= 1.5460), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:18,413 - root - INFO - Step 28980: lr=1.00E-05, loss= 1.1861 (max= 1.5460), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:18,413 - root - INFO - Step 28980: lr=1.00E-05, loss= 1.1861 (max= 1.5460), tps=20552, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:18,413 - root - INFO - Step 28980: lr=1.00E-05, loss= 1.1861 (max= 1.5460), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:18,413 - root - INFO - Step 28980: lr=1.00E-05, loss= 1.1861 (max= 1.5460), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:18,413 - root - INFO - Step 28980: lr=1.00E-05, loss= 1.1861 (max= 1.5460), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:18,413 - root - INFO - Step 28980: lr=1.00E-05, loss= 1.1861 (max= 1.5460), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:34,370 - root - INFO - Step 28990: lr=1.00E-05, loss= 1.1998 (max= 1.5153), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:34,370 - root - INFO - Step 28990: lr=1.00E-05, loss= 1.1998 (max= 1.5153), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:34,370 - root - INFO - Step 28990: lr=1.00E-05, loss= 1.1998 (max= 1.5153), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:34,370 - root - INFO - Step 28990: lr=1.00E-05, loss= 1.1998 (max= 1.5153), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:34,371 - root - INFO - Step 28990: lr=1.00E-05, loss= 1.1998 (max= 1.5153), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:34,371 - root - INFO - Step 28990: lr=1.00E-05, loss= 1.1998 (max= 1.5153), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:34,371 - root - INFO - Step 28990: lr=1.00E-05, loss= 1.1998 
(max= 1.5153), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:34,371 - root - INFO - Step 28990: lr=1.00E-05, loss= 1.1998 (max= 1.5153), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-29000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-29000! Save time: 4.3778393268585205 +2025-10-24 23:25:50,301 - root - INFO - Step 29000: lr=1.00E-05, loss= 1.1884 (max= 1.6507), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:50,301 - root - INFO - Step 29000: lr=1.00E-05, loss= 1.1884 (max= 1.6507), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:50,301 - root - INFO - Saving a full checkpoint at step 29000 +2025-10-24 23:25:50,301 - root - INFO - Step 29000: lr=1.00E-05, loss= 1.1884 (max= 1.6507), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:50,301 - root - INFO - Step 29000: lr=1.00E-05, loss= 1.1884 (max= 1.6507), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:50,301 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:25:50,301 - root - INFO - Saving a full checkpoint at step 29000 +2025-10-24 23:25:50,301 - root - INFO - Saving a full checkpoint at step 29000 +2025-10-24 23:25:50,301 - root - INFO - Saving a full checkpoint at step 29000 +2025-10-24 23:25:50,301 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:25:50,301 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 
23:25:50,301 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:25:50,301 - root - INFO - Step 29000: lr=1.00E-05, loss= 1.1884 (max= 1.6507), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:50,301 - root - INFO - Step 29000: lr=1.00E-05, loss= 1.1884 (max= 1.6507), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:50,301 - root - INFO - Saving a full checkpoint at step 29000 +2025-10-24 23:25:50,301 - root - INFO - Saving a full checkpoint at step 29000 +2025-10-24 23:25:50,301 - root - INFO - Step 29000: lr=1.00E-05, loss= 1.1884 (max= 1.6507), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:50,301 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:25:50,301 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:25:50,301 - root - INFO - Step 29000: lr=1.00E-05, loss= 1.1884 (max= 1.6507), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:25:50,302 - root - INFO - Saving a full checkpoint at step 29000 +2025-10-24 23:25:50,302 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:25:50,302 - root - INFO - Saving a full checkpoint at step 29000 +2025-10-24 23:25:50,302 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:26:05,551 - root - INFO - Finished saving the checkpoint in 15.25 seconds +2025-10-24 23:26:05,557 - root - INFO - Finished saving the checkpoint in 15.26 seconds +2025-10-24 23:26:05,558 - root - INFO - Finished saving 
the checkpoint in 15.26 seconds +2025-10-24 23:26:05,558 - root - INFO - Finished saving the checkpoint in 15.26 seconds +2025-10-24 23:26:05,558 - root - INFO - Finished saving the checkpoint in 15.26 seconds +2025-10-24 23:26:05,558 - root - INFO - Finished saving the checkpoint in 15.26 seconds +2025-10-24 23:26:05,559 - root - INFO - Finished saving the checkpoint in 15.26 seconds +2025-10-24 23:26:05,559 - root - INFO - Finished saving the checkpoint in 15.26 seconds +2025-10-24 23:26:21,464 - root - INFO - Step 29010: lr=1.00E-05, loss= 1.1675 (max= 1.7738), tps=10516, mfu=21.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:21,464 - root - INFO - Step 29010: lr=1.00E-05, loss= 1.1675 (max= 1.7738), tps=10516, mfu=21.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:21,464 - root - INFO - Step 29010: lr=1.00E-05, loss= 1.1675 (max= 1.7738), tps=10516, mfu=21.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:21,464 - root - INFO - Step 29010: lr=1.00E-05, loss= 1.1675 (max= 1.7738), tps=10516, mfu=21.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:21,464 - root - INFO - Step 29010: lr=1.00E-05, loss= 1.1675 (max= 1.7738), tps=10516, mfu=21.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:21,464 - root - INFO - Step 29010: lr=1.00E-05, loss= 1.1675 (max= 1.7738), tps=10516, mfu=21.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:21,464 - root - INFO - Step 29010: lr=1.00E-05, loss= 1.1675 (max= 1.7738), tps=10516, mfu=21.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:21,464 - root - INFO - Step 29010: lr=1.00E-05, loss= 1.1675 (max= 1.7738), tps=10516, mfu=21.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:37,337 - root - 
INFO - Step 29020: lr=1.00E-05, loss= 1.1995 (max= 1.6575), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:26:37,337 - root - INFO - Step 29020: lr=1.00E-05, loss= 1.1995 (max= 1.6575), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:26:37,337 - root - INFO - Step 29020: lr=1.00E-05, loss= 1.1995 (max= 1.6575), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:26:37,337 - root - INFO - Step 29020: lr=1.00E-05, loss= 1.1995 (max= 1.6575), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:26:37,337 - root - INFO - Step 29020: lr=1.00E-05, loss= 1.1995 (max= 1.6575), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:26:37,338 - root - INFO - Step 29020: lr=1.00E-05, loss= 1.1995 (max= 1.6575), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:26:37,338 - root - INFO - Step 29020: lr=1.00E-05, loss= 1.1995 (max= 1.6575), tps=20648, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:26:37,338 - root - INFO - Step 29020: lr=1.00E-05, loss= 1.1995 (max= 1.6575), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:26:53,258 - root - INFO - Step 29030: lr=1.00E-05, loss= 1.1766 (max= 1.5135), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:53,258 - root - INFO - Step 29030: lr=1.00E-05, loss= 1.1766 (max= 1.5135), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:53,258 - root - INFO - Step 29030: lr=1.00E-05, loss= 1.1766 (max= 1.5135), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 23:26:53,258 - root - INFO - Step 29030: lr=1.00E-05, loss= 1.1766 (max= 1.5135), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:53,258 - root - INFO - Step 29030: lr=1.00E-05, loss= 1.1766 (max= 1.5135), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:53,258 - root - INFO - Step 29030: lr=1.00E-05, loss= 1.1766 (max= 1.5135), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:53,258 - root - INFO - Step 29030: lr=1.00E-05, loss= 1.1766 (max= 1.5135), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:26:53,258 - root - INFO - Step 29030: lr=1.00E-05, loss= 1.1766 (max= 1.5135), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:27:09,111 - root - INFO - Step 29040: lr=1.00E-05, loss= 1.1765 (max= 1.6478), tps=20674, mfu=43.08%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:09,111 - root - INFO - Step 29040: lr=1.00E-05, loss= 1.1765 (max= 1.6478), tps=20674, mfu=43.08%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:09,111 - root - INFO - Step 29040: lr=1.00E-05, loss= 1.1765 (max= 1.6478), tps=20674, mfu=43.08%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:09,111 - root - INFO - Step 29040: lr=1.00E-05, loss= 1.1765 (max= 1.6478), tps=20674, mfu=43.08%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:09,111 - root - INFO - Step 29040: lr=1.00E-05, loss= 1.1765 (max= 1.6478), tps=20674, mfu=43.08%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:09,111 - root - INFO - Step 29040: lr=1.00E-05, loss= 1.1765 (max= 1.6478), tps=20674, mfu=43.08%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:09,111 - root - INFO - Step 29040: lr=1.00E-05, loss= 1.1765 (max= 1.6478), tps=20674, mfu=43.08%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:09,111 - root - INFO - Step 29040: lr=1.00E-05, loss= 1.1765 (max= 1.6478), tps=20674, mfu=43.07%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:25,082 - root - INFO - Step 29050: lr=1.00E-05, loss= 1.1709 (max= 1.5456), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:27:25,082 - root - INFO - Step 29050: lr=1.00E-05, loss= 1.1709 (max= 1.5456), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:27:25,082 - root - INFO - Step 29050: lr=1.00E-05, loss= 1.1709 (max= 1.5456), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:27:25,082 - root - INFO - Step 29050: lr=1.00E-05, loss= 1.1709 (max= 1.5456), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:27:25,082 - root - INFO - Step 29050: lr=1.00E-05, loss= 1.1709 (max= 1.5456), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:27:25,082 - root - INFO - Step 29050: lr=1.00E-05, loss= 1.1709 (max= 1.5456), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:27:25,082 - root - INFO - Step 29050: lr=1.00E-05, loss= 1.1709 (max= 1.5456), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:27:25,082 - root - INFO - Step 29050: lr=1.00E-05, loss= 1.1709 (max= 1.5456), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:27:41,004 - root - INFO - Step 29060: lr=1.00E-05, 
loss= 1.1874 (max= 1.5716), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:41,004 - root - INFO - Step 29060: lr=1.00E-05, loss= 1.1874 (max= 1.5716), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:41,004 - root - INFO - Step 29060: lr=1.00E-05, loss= 1.1874 (max= 1.5716), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:41,004 - root - INFO - Step 29060: lr=1.00E-05, loss= 1.1874 (max= 1.5716), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:41,004 - root - INFO - Step 29060: lr=1.00E-05, loss= 1.1874 (max= 1.5716), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:41,004 - root - INFO - Step 29060: lr=1.00E-05, loss= 1.1874 (max= 1.5716), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:41,004 - root - INFO - Step 29060: lr=1.00E-05, loss= 1.1874 (max= 1.5716), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:41,004 - root - INFO - Step 29060: lr=1.00E-05, loss= 1.1874 (max= 1.5716), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:56,949 - root - INFO - Step 29070: lr=1.00E-05, loss= 1.2353 (max= 1.7170), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:56,949 - root - INFO - Step 29070: lr=1.00E-05, loss= 1.2353 (max= 1.7170), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:56,949 - root - INFO - Step 29070: lr=1.00E-05, loss= 1.2353 (max= 1.7170), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
23:27:56,950 - root - INFO - Step 29070: lr=1.00E-05, loss= 1.2353 (max= 1.7170), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:56,950 - root - INFO - Step 29070: lr=1.00E-05, loss= 1.2353 (max= 1.7170), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:56,950 - root - INFO - Step 29070: lr=1.00E-05, loss= 1.2353 (max= 1.7170), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:56,950 - root - INFO - Step 29070: lr=1.00E-05, loss= 1.2353 (max= 1.7170), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:27:56,950 - root - INFO - Step 29070: lr=1.00E-05, loss= 1.2353 (max= 1.7170), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:28:12,911 - root - INFO - Step 29080: lr=1.00E-05, loss= 1.2041 (max= 1.6247), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:12,911 - root - INFO - Step 29080: lr=1.00E-05, loss= 1.2041 (max= 1.6247), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:12,911 - root - INFO - Step 29080: lr=1.00E-05, loss= 1.2041 (max= 1.6247), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:12,911 - root - INFO - Step 29080: lr=1.00E-05, loss= 1.2041 (max= 1.6247), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:12,911 - root - INFO - Step 29080: lr=1.00E-05, loss= 1.2041 (max= 1.6247), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:12,911 - root - INFO - Step 29080: lr=1.00E-05, loss= 1.2041 (max= 1.6247), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:12,911 - root - INFO - Step 29080: lr=1.00E-05, loss= 1.2041 (max= 1.6247), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:12,911 - root - INFO - Step 29080: lr=1.00E-05, loss= 1.2041 (max= 1.6247), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:28,879 - root - INFO - Step 29090: lr=1.00E-05, loss= 1.1609 (max= 1.5959), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:28:28,879 - root - INFO - Step 29090: lr=1.00E-05, loss= 1.1609 (max= 1.5959), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:28:28,879 - root - INFO - Step 29090: lr=1.00E-05, loss= 1.1609 (max= 1.5959), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:28:28,879 - root - INFO - Step 29090: lr=1.00E-05, loss= 1.1609 (max= 1.5959), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:28:28,879 - root - INFO - Step 29090: lr=1.00E-05, loss= 1.1609 (max= 1.5959), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:28:28,879 - root - INFO - Step 29090: lr=1.00E-05, loss= 1.1609 (max= 1.5959), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:28:28,879 - root - INFO - Step 29090: lr=1.00E-05, loss= 1.1609 (max= 1.5959), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:28:28,879 - root - INFO - Step 29090: lr=1.00E-05, loss= 1.1609 (max= 1.5959), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:28:44,813 - root - INFO - Step 29100: lr=1.00E-05, loss= 1.1820 (max= 1.6104), 
tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:44,813 - root - INFO - Step 29100: lr=1.00E-05, loss= 1.1820 (max= 1.6104), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:44,813 - root - INFO - Step 29100: lr=1.00E-05, loss= 1.1820 (max= 1.6104), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:44,813 - root - INFO - Step 29100: lr=1.00E-05, loss= 1.1820 (max= 1.6104), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:44,814 - root - INFO - Step 29100: lr=1.00E-05, loss= 1.1820 (max= 1.6104), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:44,814 - root - INFO - Step 29100: lr=1.00E-05, loss= 1.1820 (max= 1.6104), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:44,814 - root - INFO - Step 29100: lr=1.00E-05, loss= 1.1820 (max= 1.6104), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:28:44,814 - root - INFO - Step 29100: lr=1.00E-05, loss= 1.1820 (max= 1.6104), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:00,792 - root - INFO - Step 29110: lr=1.00E-05, loss= 1.1755 (max= 1.5443), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:00,792 - root - INFO - Step 29110: lr=1.00E-05, loss= 1.1755 (max= 1.5443), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:00,792 - root - INFO - Step 29110: lr=1.00E-05, loss= 1.1755 (max= 1.5443), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:00,792 - root - INFO - Step 
29110: lr=1.00E-05, loss= 1.1755 (max= 1.5443), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:00,792 - root - INFO - Step 29110: lr=1.00E-05, loss= 1.1755 (max= 1.5443), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:00,792 - root - INFO - Step 29110: lr=1.00E-05, loss= 1.1755 (max= 1.5443), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:00,792 - root - INFO - Step 29110: lr=1.00E-05, loss= 1.1755 (max= 1.5443), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:00,792 - root - INFO - Step 29110: lr=1.00E-05, loss= 1.1755 (max= 1.5443), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:16,798 - root - INFO - Step 29120: lr=1.00E-05, loss= 1.1916 (max= 1.5719), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:16,798 - root - INFO - Step 29120: lr=1.00E-05, loss= 1.1916 (max= 1.5719), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:16,798 - root - INFO - Step 29120: lr=1.00E-05, loss= 1.1916 (max= 1.5719), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:16,798 - root - INFO - Step 29120: lr=1.00E-05, loss= 1.1916 (max= 1.5719), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:16,798 - root - INFO - Step 29120: lr=1.00E-05, loss= 1.1916 (max= 1.5719), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:16,798 - root - INFO - Step 29120: lr=1.00E-05, loss= 1.1916 (max= 1.5719), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:29:16,798 - root - INFO - Step 29120: lr=1.00E-05, loss= 1.1916 (max= 1.5719), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:16,798 - root - INFO - Step 29120: lr=1.00E-05, loss= 1.1916 (max= 1.5719), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:32,750 - root - INFO - Step 29130: lr=1.00E-05, loss= 1.1903 (max= 1.7168), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:32,751 - root - INFO - Step 29130: lr=1.00E-05, loss= 1.1903 (max= 1.7168), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:32,751 - root - INFO - Step 29130: lr=1.00E-05, loss= 1.1903 (max= 1.7168), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:32,751 - root - INFO - Step 29130: lr=1.00E-05, loss= 1.1903 (max= 1.7168), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:32,751 - root - INFO - Step 29130: lr=1.00E-05, loss= 1.1903 (max= 1.7168), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:32,751 - root - INFO - Step 29130: lr=1.00E-05, loss= 1.1903 (max= 1.7168), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:32,751 - root - INFO - Step 29130: lr=1.00E-05, loss= 1.1903 (max= 1.7168), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:32,751 - root - INFO - Step 29130: lr=1.00E-05, loss= 1.1903 (max= 1.7168), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:29:48,691 - root - INFO - Step 29140: lr=1.00E-05, loss= 1.1810 (max= 1.5681), tps=20561, mfu=42.84%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:48,691 - root - INFO - Step 29140: lr=1.00E-05, loss= 1.1810 (max= 1.5681), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:48,691 - root - INFO - Step 29140: lr=1.00E-05, loss= 1.1810 (max= 1.5681), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:48,691 - root - INFO - Step 29140: lr=1.00E-05, loss= 1.1810 (max= 1.5681), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:48,691 - root - INFO - Step 29140: lr=1.00E-05, loss= 1.1810 (max= 1.5681), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:48,691 - root - INFO - Step 29140: lr=1.00E-05, loss= 1.1810 (max= 1.5681), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:48,691 - root - INFO - Step 29140: lr=1.00E-05, loss= 1.1810 (max= 1.5681), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:29:48,691 - root - INFO - Step 29140: lr=1.00E-05, loss= 1.1810 (max= 1.5681), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:30:04,612 - root - INFO - Step 29150: lr=1.00E-05, loss= 1.1775 (max= 1.6026), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:04,612 - root - INFO - Step 29150: lr=1.00E-05, loss= 1.1775 (max= 1.6026), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:04,612 - root - INFO - Step 29150: lr=1.00E-05, loss= 1.1775 (max= 1.6026), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:04,612 - root - INFO - Step 29150: lr=1.00E-05, loss= 1.1775 
(max= 1.6026), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:04,612 - root - INFO - Step 29150: lr=1.00E-05, loss= 1.1775 (max= 1.6026), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:04,612 - root - INFO - Step 29150: lr=1.00E-05, loss= 1.1775 (max= 1.6026), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:04,612 - root - INFO - Step 29150: lr=1.00E-05, loss= 1.1775 (max= 1.6026), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:04,612 - root - INFO - Step 29150: lr=1.00E-05, loss= 1.1775 (max= 1.6026), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:20,571 - root - INFO - Step 29160: lr=1.00E-05, loss= 1.1722 (max= 1.5072), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:30:20,571 - root - INFO - Step 29160: lr=1.00E-05, loss= 1.1722 (max= 1.5072), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:30:20,572 - root - INFO - Step 29160: lr=1.00E-05, loss= 1.1722 (max= 1.5072), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:30:20,572 - root - INFO - Step 29160: lr=1.00E-05, loss= 1.1722 (max= 1.5072), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:30:20,572 - root - INFO - Step 29160: lr=1.00E-05, loss= 1.1722 (max= 1.5072), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:30:20,572 - root - INFO - Step 29160: lr=1.00E-05, loss= 1.1722 (max= 1.5072), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:30:20,572 - root 
- INFO - Step 29160: lr=1.00E-05, loss= 1.1722 (max= 1.5072), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:30:20,572 - root - INFO - Step 29160: lr=1.00E-05, loss= 1.1722 (max= 1.5072), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:30:36,504 - root - INFO - Step 29170: lr=1.00E-05, loss= 1.1541 (max= 1.5314), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:36,504 - root - INFO - Step 29170: lr=1.00E-05, loss= 1.1541 (max= 1.5314), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:36,504 - root - INFO - Step 29170: lr=1.00E-05, loss= 1.1541 (max= 1.5314), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:36,504 - root - INFO - Step 29170: lr=1.00E-05, loss= 1.1541 (max= 1.5314), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:36,504 - root - INFO - Step 29170: lr=1.00E-05, loss= 1.1541 (max= 1.5314), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:36,504 - root - INFO - Step 29170: lr=1.00E-05, loss= 1.1541 (max= 1.5314), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:36,504 - root - INFO - Step 29170: lr=1.00E-05, loss= 1.1541 (max= 1.5314), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:36,504 - root - INFO - Step 29170: lr=1.00E-05, loss= 1.1541 (max= 1.5314), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:52,464 - root - INFO - Step 29180: lr=1.00E-05, loss= 1.2185 (max= 1.5342), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 23:30:52,464 - root - INFO - Step 29180: lr=1.00E-05, loss= 1.2185 (max= 1.5342), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:52,464 - root - INFO - Step 29180: lr=1.00E-05, loss= 1.2185 (max= 1.5342), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:52,464 - root - INFO - Step 29180: lr=1.00E-05, loss= 1.2185 (max= 1.5342), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:52,464 - root - INFO - Step 29180: lr=1.00E-05, loss= 1.2185 (max= 1.5342), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:52,464 - root - INFO - Step 29180: lr=1.00E-05, loss= 1.2185 (max= 1.5342), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:52,464 - root - INFO - Step 29180: lr=1.00E-05, loss= 1.2185 (max= 1.5342), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:30:52,464 - root - INFO - Step 29180: lr=1.00E-05, loss= 1.2185 (max= 1.5342), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:08,417 - root - INFO - Step 29190: lr=1.00E-05, loss= 1.1641 (max= 1.7912), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:08,417 - root - INFO - Step 29190: lr=1.00E-05, loss= 1.1641 (max= 1.7912), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:08,417 - root - INFO - Step 29190: lr=1.00E-05, loss= 1.1641 (max= 1.7912), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:08,417 - root - INFO - Step 29190: lr=1.00E-05, loss= 1.1641 (max= 1.7912), tps=20544, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:08,417 - root - INFO - Step 29190: lr=1.00E-05, loss= 1.1641 (max= 1.7912), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:08,417 - root - INFO - Step 29190: lr=1.00E-05, loss= 1.1641 (max= 1.7912), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:08,417 - root - INFO - Step 29190: lr=1.00E-05, loss= 1.1641 (max= 1.7912), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:08,417 - root - INFO - Step 29190: lr=1.00E-05, loss= 1.1641 (max= 1.7912), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:24,368 - root - INFO - Step 29200: lr=1.00E-05, loss= 1.1929 (max= 1.5367), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:24,368 - root - INFO - Step 29200: lr=1.00E-05, loss= 1.1929 (max= 1.5367), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:24,368 - root - INFO - Step 29200: lr=1.00E-05, loss= 1.1929 (max= 1.5367), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:24,368 - root - INFO - Step 29200: lr=1.00E-05, loss= 1.1929 (max= 1.5367), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:24,368 - root - INFO - Step 29200: lr=1.00E-05, loss= 1.1929 (max= 1.5367), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:24,368 - root - INFO - Step 29200: lr=1.00E-05, loss= 1.1929 (max= 1.5367), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:24,368 - root - INFO - Step 29200: lr=1.00E-05, 
loss= 1.1929 (max= 1.5367), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:24,368 - root - INFO - Step 29200: lr=1.00E-05, loss= 1.1929 (max= 1.5367), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:40,310 - root - INFO - Step 29210: lr=1.00E-05, loss= 1.1737 (max= 1.6109), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:40,310 - root - INFO - Step 29210: lr=1.00E-05, loss= 1.1737 (max= 1.6109), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:40,310 - root - INFO - Step 29210: lr=1.00E-05, loss= 1.1737 (max= 1.6109), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:40,310 - root - INFO - Step 29210: lr=1.00E-05, loss= 1.1737 (max= 1.6109), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:40,310 - root - INFO - Step 29210: lr=1.00E-05, loss= 1.1737 (max= 1.6109), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:40,310 - root - INFO - Step 29210: lr=1.00E-05, loss= 1.1737 (max= 1.6109), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:40,310 - root - INFO - Step 29210: lr=1.00E-05, loss= 1.1737 (max= 1.6109), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:40,310 - root - INFO - Step 29210: lr=1.00E-05, loss= 1.1737 (max= 1.6109), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:50,615 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:7501525 +2025-10-24 
23:31:56,239 - root - INFO - Step 29220: lr=1.00E-05, loss= 1.1871 (max= 1.7814), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:56,239 - root - INFO - Step 29220: lr=1.00E-05, loss= 1.1871 (max= 1.7814), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:56,240 - root - INFO - Step 29220: lr=1.00E-05, loss= 1.1871 (max= 1.7814), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:56,240 - root - INFO - Step 29220: lr=1.00E-05, loss= 1.1871 (max= 1.7814), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:56,240 - root - INFO - Step 29220: lr=1.00E-05, loss= 1.1871 (max= 1.7814), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:56,240 - root - INFO - Step 29220: lr=1.00E-05, loss= 1.1871 (max= 1.7814), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:56,240 - root - INFO - Step 29220: lr=1.00E-05, loss= 1.1871 (max= 1.7814), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:31:56,240 - root - INFO - Step 29220: lr=1.00E-05, loss= 1.1871 (max= 1.7814), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:12,162 - root - INFO - Step 29230: lr=1.00E-05, loss= 1.1841 (max= 1.6605), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:32:12,162 - root - INFO - Step 29230: lr=1.00E-05, loss= 1.1841 (max= 1.6605), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:32:12,162 - root - INFO - Step 29230: lr=1.00E-05, loss= 1.1841 (max= 1.6605), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:32:12,162 - root - INFO - Step 29230: lr=1.00E-05, loss= 1.1841 (max= 1.6605), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:32:12,162 - root - INFO - Step 29230: lr=1.00E-05, loss= 1.1841 (max= 1.6605), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:32:12,162 - root - INFO - Step 29230: lr=1.00E-05, loss= 1.1841 (max= 1.6605), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:32:12,162 - root - INFO - Step 29230: lr=1.00E-05, loss= 1.1841 (max= 1.6605), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:32:12,162 - root - INFO - Step 29230: lr=1.00E-05, loss= 1.1841 (max= 1.6605), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:32:28,134 - root - INFO - Step 29240: lr=1.00E-05, loss= 1.1791 (max= 1.6416), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:28,134 - root - INFO - Step 29240: lr=1.00E-05, loss= 1.1791 (max= 1.6416), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:28,134 - root - INFO - Step 29240: lr=1.00E-05, loss= 1.1791 (max= 1.6416), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:28,134 - root - INFO - Step 29240: lr=1.00E-05, loss= 1.1791 (max= 1.6416), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:28,135 - root - INFO - Step 29240: lr=1.00E-05, loss= 1.1791 (max= 1.6416), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:28,135 - root - INFO - Step 29240: lr=1.00E-05, loss= 1.1791 (max= 1.6416), 
tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:28,135 - root - INFO - Step 29240: lr=1.00E-05, loss= 1.1791 (max= 1.6416), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:28,135 - root - INFO - Step 29240: lr=1.00E-05, loss= 1.1791 (max= 1.6416), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:44,076 - root - INFO - Step 29250: lr=1.00E-05, loss= 1.1928 (max= 1.5616), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:44,076 - root - INFO - Step 29250: lr=1.00E-05, loss= 1.1928 (max= 1.5616), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:44,076 - root - INFO - Step 29250: lr=1.00E-05, loss= 1.1928 (max= 1.5616), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:44,076 - root - INFO - Step 29250: lr=1.00E-05, loss= 1.1928 (max= 1.5616), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:44,076 - root - INFO - Step 29250: lr=1.00E-05, loss= 1.1928 (max= 1.5616), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:44,076 - root - INFO - Step 29250: lr=1.00E-05, loss= 1.1928 (max= 1.5616), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:44,076 - root - INFO - Step 29250: lr=1.00E-05, loss= 1.1928 (max= 1.5616), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:32:44,077 - root - INFO - Step 29250: lr=1.00E-05, loss= 1.1928 (max= 1.5616), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:00,014 - root - INFO - Step 
29260: lr=1.00E-05, loss= 1.1660 (max= 1.7892), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:00,014 - root - INFO - Step 29260: lr=1.00E-05, loss= 1.1660 (max= 1.7892), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:00,014 - root - INFO - Step 29260: lr=1.00E-05, loss= 1.1660 (max= 1.7892), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:00,014 - root - INFO - Step 29260: lr=1.00E-05, loss= 1.1660 (max= 1.7892), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:00,014 - root - INFO - Step 29260: lr=1.00E-05, loss= 1.1660 (max= 1.7892), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:00,014 - root - INFO - Step 29260: lr=1.00E-05, loss= 1.1660 (max= 1.7892), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:00,014 - root - INFO - Step 29260: lr=1.00E-05, loss= 1.1660 (max= 1.7892), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:00,014 - root - INFO - Step 29260: lr=1.00E-05, loss= 1.1660 (max= 1.7892), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:15,950 - root - INFO - Step 29270: lr=1.00E-05, loss= 1.1971 (max= 1.9072), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:15,950 - root - INFO - Step 29270: lr=1.00E-05, loss= 1.1971 (max= 1.9072), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:15,950 - root - INFO - Step 29270: lr=1.00E-05, loss= 1.1971 (max= 1.9072), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 23:33:15,950 - root - INFO - Step 29270: lr=1.00E-05, loss= 1.1971 (max= 1.9072), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:15,951 - root - INFO - Step 29270: lr=1.00E-05, loss= 1.1971 (max= 1.9072), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:15,951 - root - INFO - Step 29270: lr=1.00E-05, loss= 1.1971 (max= 1.9072), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:15,951 - root - INFO - Step 29270: lr=1.00E-05, loss= 1.1971 (max= 1.9072), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:15,951 - root - INFO - Step 29270: lr=1.00E-05, loss= 1.1971 (max= 1.9072), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:33:31,919 - root - INFO - Step 29280: lr=1.00E-05, loss= 1.2175 (max= 1.5939), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:31,919 - root - INFO - Step 29280: lr=1.00E-05, loss= 1.2175 (max= 1.5939), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:31,919 - root - INFO - Step 29280: lr=1.00E-05, loss= 1.2175 (max= 1.5939), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:31,919 - root - INFO - Step 29280: lr=1.00E-05, loss= 1.2175 (max= 1.5939), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:31,919 - root - INFO - Step 29280: lr=1.00E-05, loss= 1.2175 (max= 1.5939), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:31,919 - root - INFO - Step 29280: lr=1.00E-05, loss= 1.2175 (max= 1.5939), tps=20524, mfu=42.76%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:31,919 - root - INFO - Step 29280: lr=1.00E-05, loss= 1.2175 (max= 1.5939), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:31,919 - root - INFO - Step 29280: lr=1.00E-05, loss= 1.2175 (max= 1.5939), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:47,842 - root - INFO - Step 29290: lr=1.00E-05, loss= 1.1595 (max= 1.5771), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:47,842 - root - INFO - Step 29290: lr=1.00E-05, loss= 1.1595 (max= 1.5771), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:47,842 - root - INFO - Step 29290: lr=1.00E-05, loss= 1.1595 (max= 1.5771), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:47,842 - root - INFO - Step 29290: lr=1.00E-05, loss= 1.1595 (max= 1.5771), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:47,842 - root - INFO - Step 29290: lr=1.00E-05, loss= 1.1595 (max= 1.5771), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:47,842 - root - INFO - Step 29290: lr=1.00E-05, loss= 1.1595 (max= 1.5771), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:47,842 - root - INFO - Step 29290: lr=1.00E-05, loss= 1.1595 (max= 1.5771), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:33:47,842 - root - INFO - Step 29290: lr=1.00E-05, loss= 1.1595 (max= 1.5771), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:03,780 - root - INFO - Step 29300: lr=1.00E-05, loss= 1.1617 
(max= 1.7184), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:03,780 - root - INFO - Step 29300: lr=1.00E-05, loss= 1.1617 (max= 1.7184), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:03,780 - root - INFO - Step 29300: lr=1.00E-05, loss= 1.1617 (max= 1.7184), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:03,780 - root - INFO - Step 29300: lr=1.00E-05, loss= 1.1617 (max= 1.7184), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:03,780 - root - INFO - Step 29300: lr=1.00E-05, loss= 1.1617 (max= 1.7184), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:03,780 - root - INFO - Step 29300: lr=1.00E-05, loss= 1.1617 (max= 1.7184), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:03,780 - root - INFO - Step 29300: lr=1.00E-05, loss= 1.1617 (max= 1.7184), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:03,780 - root - INFO - Step 29300: lr=1.00E-05, loss= 1.1617 (max= 1.7184), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:19,713 - root - INFO - Step 29310: lr=1.00E-05, loss= 1.1820 (max= 1.4904), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:19,713 - root - INFO - Step 29310: lr=1.00E-05, loss= 1.1820 (max= 1.4904), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:19,713 - root - INFO - Step 29310: lr=1.00E-05, loss= 1.1820 (max= 1.4904), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:19,713 - root 
- INFO - Step 29310: lr=1.00E-05, loss= 1.1820 (max= 1.4904), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:19,713 - root - INFO - Step 29310: lr=1.00E-05, loss= 1.1820 (max= 1.4904), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:19,713 - root - INFO - Step 29310: lr=1.00E-05, loss= 1.1820 (max= 1.4904), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:19,713 - root - INFO - Step 29310: lr=1.00E-05, loss= 1.1820 (max= 1.4904), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:19,713 - root - INFO - Step 29310: lr=1.00E-05, loss= 1.1820 (max= 1.4904), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:35,653 - root - INFO - Step 29320: lr=1.00E-05, loss= 1.2026 (max= 1.7670), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:35,653 - root - INFO - Step 29320: lr=1.00E-05, loss= 1.2026 (max= 1.7670), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:35,653 - root - INFO - Step 29320: lr=1.00E-05, loss= 1.2026 (max= 1.7670), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:35,653 - root - INFO - Step 29320: lr=1.00E-05, loss= 1.2026 (max= 1.7670), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:35,653 - root - INFO - Step 29320: lr=1.00E-05, loss= 1.2026 (max= 1.7670), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:35,653 - root - INFO - Step 29320: lr=1.00E-05, loss= 1.2026 (max= 1.7670), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 23:34:35,653 - root - INFO - Step 29320: lr=1.00E-05, loss= 1.2026 (max= 1.7670), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:35,653 - root - INFO - Step 29320: lr=1.00E-05, loss= 1.2026 (max= 1.7670), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:51,595 - root - INFO - Step 29330: lr=1.00E-05, loss= 1.2190 (max= 1.4927), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:51,595 - root - INFO - Step 29330: lr=1.00E-05, loss= 1.2190 (max= 1.4927), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:51,595 - root - INFO - Step 29330: lr=1.00E-05, loss= 1.2190 (max= 1.4927), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:51,595 - root - INFO - Step 29330: lr=1.00E-05, loss= 1.2190 (max= 1.4927), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:51,595 - root - INFO - Step 29330: lr=1.00E-05, loss= 1.2190 (max= 1.4927), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:51,595 - root - INFO - Step 29330: lr=1.00E-05, loss= 1.2190 (max= 1.4927), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:51,595 - root - INFO - Step 29330: lr=1.00E-05, loss= 1.2190 (max= 1.4927), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:34:51,596 - root - INFO - Step 29330: lr=1.00E-05, loss= 1.2190 (max= 1.4927), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:07,532 - root - INFO - Step 29340: lr=1.00E-05, loss= 1.1659 (max= 1.6672), tps=20565, mfu=42.85%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:07,532 - root - INFO - Step 29340: lr=1.00E-05, loss= 1.1659 (max= 1.6672), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:07,532 - root - INFO - Step 29340: lr=1.00E-05, loss= 1.1659 (max= 1.6672), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:07,533 - root - INFO - Step 29340: lr=1.00E-05, loss= 1.1659 (max= 1.6672), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:07,533 - root - INFO - Step 29340: lr=1.00E-05, loss= 1.1659 (max= 1.6672), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:07,533 - root - INFO - Step 29340: lr=1.00E-05, loss= 1.1659 (max= 1.6672), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:07,533 - root - INFO - Step 29340: lr=1.00E-05, loss= 1.1659 (max= 1.6672), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:07,533 - root - INFO - Step 29340: lr=1.00E-05, loss= 1.1659 (max= 1.6672), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:23,488 - root - INFO - Step 29350: lr=1.00E-05, loss= 1.1806 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:23,488 - root - INFO - Step 29350: lr=1.00E-05, loss= 1.1806 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:23,489 - root - INFO - Step 29350: lr=1.00E-05, loss= 1.1806 (max= 1.6020), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:23,489 - root - INFO - Step 29350: lr=1.00E-05, 
loss= 1.1806 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:23,489 - root - INFO - Step 29350: lr=1.00E-05, loss= 1.1806 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:23,489 - root - INFO - Step 29350: lr=1.00E-05, loss= 1.1806 (max= 1.6020), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:23,489 - root - INFO - Step 29350: lr=1.00E-05, loss= 1.1806 (max= 1.6020), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:23,489 - root - INFO - Step 29350: lr=1.00E-05, loss= 1.1806 (max= 1.6020), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:39,366 - root - INFO - Step 29360: lr=1.00E-05, loss= 1.2138 (max= 1.8134), tps=20641, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:39,366 - root - INFO - Step 29360: lr=1.00E-05, loss= 1.2138 (max= 1.8134), tps=20641, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:39,367 - root - INFO - Step 29360: lr=1.00E-05, loss= 1.2138 (max= 1.8134), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:39,367 - root - INFO - Step 29360: lr=1.00E-05, loss= 1.2138 (max= 1.8134), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:39,367 - root - INFO - Step 29360: lr=1.00E-05, loss= 1.2138 (max= 1.8134), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:39,367 - root - INFO - Step 29360: lr=1.00E-05, loss= 1.2138 (max= 1.8134), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
23:35:39,367 - root - INFO - Step 29360: lr=1.00E-05, loss= 1.2138 (max= 1.8134), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:39,367 - root - INFO - Step 29360: lr=1.00E-05, loss= 1.2138 (max= 1.8134), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:35:55,300 - root - INFO - Step 29370: lr=1.00E-05, loss= 1.1735 (max= 1.4659), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:35:55,300 - root - INFO - Step 29370: lr=1.00E-05, loss= 1.1735 (max= 1.4659), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:35:55,300 - root - INFO - Step 29370: lr=1.00E-05, loss= 1.1735 (max= 1.4659), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:35:55,300 - root - INFO - Step 29370: lr=1.00E-05, loss= 1.1735 (max= 1.4659), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:35:55,300 - root - INFO - Step 29370: lr=1.00E-05, loss= 1.1735 (max= 1.4659), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:35:55,300 - root - INFO - Step 29370: lr=1.00E-05, loss= 1.1735 (max= 1.4659), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:35:55,300 - root - INFO - Step 29370: lr=1.00E-05, loss= 1.1735 (max= 1.4659), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:35:55,300 - root - INFO - Step 29370: lr=1.00E-05, loss= 1.1735 (max= 1.4659), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:11,274 - root - INFO - Step 29380: lr=1.00E-05, loss= 1.1799 (max= 1.9818), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:11,274 - root - INFO - Step 29380: lr=1.00E-05, loss= 1.1799 (max= 1.9818), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:11,274 - root - INFO - Step 29380: lr=1.00E-05, loss= 1.1799 (max= 1.9818), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:11,274 - root - INFO - Step 29380: lr=1.00E-05, loss= 1.1799 (max= 1.9818), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:11,274 - root - INFO - Step 29380: lr=1.00E-05, loss= 1.1799 (max= 1.9818), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:11,275 - root - INFO - Step 29380: lr=1.00E-05, loss= 1.1799 (max= 1.9818), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:11,275 - root - INFO - Step 29380: lr=1.00E-05, loss= 1.1799 (max= 1.9818), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:11,275 - root - INFO - Step 29380: lr=1.00E-05, loss= 1.1799 (max= 1.9818), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:27,202 - root - INFO - Step 29390: lr=1.00E-05, loss= 1.1742 (max= 1.4693), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:27,202 - root - INFO - Step 29390: lr=1.00E-05, loss= 1.1742 (max= 1.4693), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:27,202 - root - INFO - Step 29390: lr=1.00E-05, loss= 1.1742 (max= 1.4693), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:27,202 - root - INFO - Step 29390: lr=1.00E-05, loss= 1.1742 (max= 1.4693), 
tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:27,202 - root - INFO - Step 29390: lr=1.00E-05, loss= 1.1742 (max= 1.4693), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:27,202 - root - INFO - Step 29390: lr=1.00E-05, loss= 1.1742 (max= 1.4693), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:27,202 - root - INFO - Step 29390: lr=1.00E-05, loss= 1.1742 (max= 1.4693), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:27,202 - root - INFO - Step 29390: lr=1.00E-05, loss= 1.1742 (max= 1.4693), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:43,179 - root - INFO - Step 29400: lr=1.00E-05, loss= 1.1771 (max= 1.6221), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:43,179 - root - INFO - Step 29400: lr=1.00E-05, loss= 1.1771 (max= 1.6221), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:43,179 - root - INFO - Step 29400: lr=1.00E-05, loss= 1.1771 (max= 1.6221), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:43,179 - root - INFO - Step 29400: lr=1.00E-05, loss= 1.1771 (max= 1.6221), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:43,179 - root - INFO - Step 29400: lr=1.00E-05, loss= 1.1771 (max= 1.6221), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:43,179 - root - INFO - Step 29400: lr=1.00E-05, loss= 1.1771 (max= 1.6221), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:43,179 - root - INFO - Step 
29400: lr=1.00E-05, loss= 1.1771 (max= 1.6221), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:43,179 - root - INFO - Step 29400: lr=1.00E-05, loss= 1.1771 (max= 1.6221), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:36:59,144 - root - INFO - Step 29410: lr=1.00E-05, loss= 1.1915 (max= 1.5519), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:59,144 - root - INFO - Step 29410: lr=1.00E-05, loss= 1.1915 (max= 1.5519), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:59,144 - root - INFO - Step 29410: lr=1.00E-05, loss= 1.1915 (max= 1.5519), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:59,144 - root - INFO - Step 29410: lr=1.00E-05, loss= 1.1915 (max= 1.5519), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:59,144 - root - INFO - Step 29410: lr=1.00E-05, loss= 1.1915 (max= 1.5519), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:59,144 - root - INFO - Step 29410: lr=1.00E-05, loss= 1.1915 (max= 1.5519), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:59,144 - root - INFO - Step 29410: lr=1.00E-05, loss= 1.1915 (max= 1.5519), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:36:59,144 - root - INFO - Step 29410: lr=1.00E-05, loss= 1.1915 (max= 1.5519), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:37:15,064 - root - INFO - Step 29420: lr=1.00E-05, loss= 1.1654 (max= 1.5287), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:37:15,064 - root - INFO - Step 29420: lr=1.00E-05, loss= 1.1654 (max= 1.5287), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:37:15,064 - root - INFO - Step 29420: lr=1.00E-05, loss= 1.1654 (max= 1.5287), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:37:15,064 - root - INFO - Step 29420: lr=1.00E-05, loss= 1.1654 (max= 1.5287), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:37:15,064 - root - INFO - Step 29420: lr=1.00E-05, loss= 1.1654 (max= 1.5287), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:37:15,064 - root - INFO - Step 29420: lr=1.00E-05, loss= 1.1654 (max= 1.5287), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:37:15,064 - root - INFO - Step 29420: lr=1.00E-05, loss= 1.1654 (max= 1.5287), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:37:15,064 - root - INFO - Step 29420: lr=1.00E-05, loss= 1.1654 (max= 1.5287), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:37:30,971 - root - INFO - Step 29430: lr=1.00E-05, loss= 1.1989 (max= 1.8124), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:30,971 - root - INFO - Step 29430: lr=1.00E-05, loss= 1.1989 (max= 1.8124), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:30,971 - root - INFO - Step 29430: lr=1.00E-05, loss= 1.1989 (max= 1.8124), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:30,971 - root - INFO - Step 29430: lr=1.00E-05, loss= 1.1989 (max= 1.8124), tps=20603, mfu=42.93%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:30,971 - root - INFO - Step 29430: lr=1.00E-05, loss= 1.1989 (max= 1.8124), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:30,971 - root - INFO - Step 29430: lr=1.00E-05, loss= 1.1989 (max= 1.8124), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:30,971 - root - INFO - Step 29430: lr=1.00E-05, loss= 1.1989 (max= 1.8124), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:30,971 - root - INFO - Step 29430: lr=1.00E-05, loss= 1.1989 (max= 1.8124), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:46,898 - root - INFO - Step 29440: lr=1.00E-05, loss= 1.1796 (max= 1.6576), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:46,899 - root - INFO - Step 29440: lr=1.00E-05, loss= 1.1796 (max= 1.6576), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:46,899 - root - INFO - Step 29440: lr=1.00E-05, loss= 1.1796 (max= 1.6576), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:46,899 - root - INFO - Step 29440: lr=1.00E-05, loss= 1.1796 (max= 1.6576), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:46,899 - root - INFO - Step 29440: lr=1.00E-05, loss= 1.1796 (max= 1.6576), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:46,899 - root - INFO - Step 29440: lr=1.00E-05, loss= 1.1796 (max= 1.6576), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:46,899 - root - INFO - Step 29440: lr=1.00E-05, loss= 1.1796 
(max= 1.6576), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:37:46,899 - root - INFO - Step 29440: lr=1.00E-05, loss= 1.1796 (max= 1.6576), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:02,791 - root - INFO - Step 29450: lr=1.00E-05, loss= 1.1798 (max= 1.6100), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:02,791 - root - INFO - Step 29450: lr=1.00E-05, loss= 1.1798 (max= 1.6100), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:02,791 - root - INFO - Step 29450: lr=1.00E-05, loss= 1.1798 (max= 1.6100), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:02,791 - root - INFO - Step 29450: lr=1.00E-05, loss= 1.1798 (max= 1.6100), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:02,791 - root - INFO - Step 29450: lr=1.00E-05, loss= 1.1798 (max= 1.6100), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:02,791 - root - INFO - Step 29450: lr=1.00E-05, loss= 1.1798 (max= 1.6100), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:02,791 - root - INFO - Step 29450: lr=1.00E-05, loss= 1.1798 (max= 1.6100), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:02,791 - root - INFO - Step 29450: lr=1.00E-05, loss= 1.1798 (max= 1.6100), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:18,726 - root - INFO - Step 29460: lr=1.00E-05, loss= 1.2088 (max= 1.6218), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:18,726 - root 
- INFO - Step 29460: lr=1.00E-05, loss= 1.2088 (max= 1.6218), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:18,726 - root - INFO - Step 29460: lr=1.00E-05, loss= 1.2088 (max= 1.6218), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:18,726 - root - INFO - Step 29460: lr=1.00E-05, loss= 1.2088 (max= 1.6218), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:18,726 - root - INFO - Step 29460: lr=1.00E-05, loss= 1.2088 (max= 1.6218), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:18,726 - root - INFO - Step 29460: lr=1.00E-05, loss= 1.2088 (max= 1.6218), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:18,726 - root - INFO - Step 29460: lr=1.00E-05, loss= 1.2088 (max= 1.6218), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:18,726 - root - INFO - Step 29460: lr=1.00E-05, loss= 1.2088 (max= 1.6218), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:34,676 - root - INFO - Step 29470: lr=1.00E-05, loss= 1.2055 (max= 1.5920), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:34,676 - root - INFO - Step 29470: lr=1.00E-05, loss= 1.2055 (max= 1.5920), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:34,676 - root - INFO - Step 29470: lr=1.00E-05, loss= 1.2055 (max= 1.5920), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:34,676 - root - INFO - Step 29470: lr=1.00E-05, loss= 1.2055 (max= 1.5920), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 23:38:34,676 - root - INFO - Step 29470: lr=1.00E-05, loss= 1.2055 (max= 1.5920), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:34,677 - root - INFO - Step 29470: lr=1.00E-05, loss= 1.2055 (max= 1.5920), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:34,677 - root - INFO - Step 29470: lr=1.00E-05, loss= 1.2055 (max= 1.5920), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:34,677 - root - INFO - Step 29470: lr=1.00E-05, loss= 1.2055 (max= 1.5920), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:38:50,590 - root - INFO - Step 29480: lr=1.00E-05, loss= 1.2025 (max= 1.8934), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:50,590 - root - INFO - Step 29480: lr=1.00E-05, loss= 1.2025 (max= 1.8934), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:50,590 - root - INFO - Step 29480: lr=1.00E-05, loss= 1.2025 (max= 1.8934), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:50,590 - root - INFO - Step 29480: lr=1.00E-05, loss= 1.2025 (max= 1.8934), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:50,590 - root - INFO - Step 29480: lr=1.00E-05, loss= 1.2025 (max= 1.8934), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:50,590 - root - INFO - Step 29480: lr=1.00E-05, loss= 1.2025 (max= 1.8934), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:50,590 - root - INFO - Step 29480: lr=1.00E-05, loss= 1.2025 (max= 1.8934), tps=20596, mfu=42.91%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:38:50,590 - root - INFO - Step 29480: lr=1.00E-05, loss= 1.2025 (max= 1.8934), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:06,563 - root - INFO - Step 29490: lr=1.00E-05, loss= 1.1966 (max= 1.7044), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:06,563 - root - INFO - Step 29490: lr=1.00E-05, loss= 1.1966 (max= 1.7044), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:06,564 - root - INFO - Step 29490: lr=1.00E-05, loss= 1.1966 (max= 1.7044), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:06,564 - root - INFO - Step 29490: lr=1.00E-05, loss= 1.1966 (max= 1.7044), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:06,564 - root - INFO - Step 29490: lr=1.00E-05, loss= 1.1966 (max= 1.7044), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:06,564 - root - INFO - Step 29490: lr=1.00E-05, loss= 1.1966 (max= 1.7044), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:06,564 - root - INFO - Step 29490: lr=1.00E-05, loss= 1.1966 (max= 1.7044), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:06,564 - root - INFO - Step 29490: lr=1.00E-05, loss= 1.1966 (max= 1.7044), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:22,515 - root - INFO - Step 29500: lr=1.00E-05, loss= 1.1728 (max= 1.5156), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:22,515 - root - INFO - Step 29500: lr=1.00E-05, 
loss= 1.1728 (max= 1.5156), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:22,515 - root - INFO - Step 29500: lr=1.00E-05, loss= 1.1728 (max= 1.5156), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:22,515 - root - INFO - Step 29500: lr=1.00E-05, loss= 1.1728 (max= 1.5156), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:22,515 - root - INFO - Step 29500: lr=1.00E-05, loss= 1.1728 (max= 1.5156), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:22,515 - root - INFO - Step 29500: lr=1.00E-05, loss= 1.1728 (max= 1.5156), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:22,515 - root - INFO - Step 29500: lr=1.00E-05, loss= 1.1728 (max= 1.5156), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:22,515 - root - INFO - Step 29500: lr=1.00E-05, loss= 1.1728 (max= 1.5156), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:38,467 - root - INFO - Step 29510: lr=1.00E-05, loss= 1.2079 (max= 1.5636), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:38,467 - root - INFO - Step 29510: lr=1.00E-05, loss= 1.2079 (max= 1.5636), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:38,467 - root - INFO - Step 29510: lr=1.00E-05, loss= 1.2079 (max= 1.5636), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:38,467 - root - INFO - Step 29510: lr=1.00E-05, loss= 1.2079 (max= 1.5636), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
23:39:38,467 - root - INFO - Step 29510: lr=1.00E-05, loss= 1.2079 (max= 1.5636), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:38,467 - root - INFO - Step 29510: lr=1.00E-05, loss= 1.2079 (max= 1.5636), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:38,467 - root - INFO - Step 29510: lr=1.00E-05, loss= 1.2079 (max= 1.5636), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:38,467 - root - INFO - Step 29510: lr=1.00E-05, loss= 1.2079 (max= 1.5636), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:39:54,377 - root - INFO - Step 29520: lr=1.00E-05, loss= 1.1888 (max= 1.8627), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:39:54,377 - root - INFO - Step 29520: lr=1.00E-05, loss= 1.1888 (max= 1.8627), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:39:54,377 - root - INFO - Step 29520: lr=1.00E-05, loss= 1.1888 (max= 1.8627), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:39:54,377 - root - INFO - Step 29520: lr=1.00E-05, loss= 1.1888 (max= 1.8627), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:39:54,377 - root - INFO - Step 29520: lr=1.00E-05, loss= 1.1888 (max= 1.8627), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:39:54,377 - root - INFO - Step 29520: lr=1.00E-05, loss= 1.1888 (max= 1.8627), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:39:54,377 - root - INFO - Step 29520: lr=1.00E-05, loss= 1.1888 (max= 1.8627), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:39:54,377 - root - INFO - Step 29520: lr=1.00E-05, loss= 1.1888 (max= 1.8627), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:10,315 - root - INFO - Step 29530: lr=1.00E-05, loss= 1.2241 (max= 1.5979), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:10,315 - root - INFO - Step 29530: lr=1.00E-05, loss= 1.2241 (max= 1.5979), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:10,315 - root - INFO - Step 29530: lr=1.00E-05, loss= 1.2241 (max= 1.5979), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:10,315 - root - INFO - Step 29530: lr=1.00E-05, loss= 1.2241 (max= 1.5979), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:10,315 - root - INFO - Step 29530: lr=1.00E-05, loss= 1.2241 (max= 1.5979), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:10,315 - root - INFO - Step 29530: lr=1.00E-05, loss= 1.2241 (max= 1.5979), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:10,315 - root - INFO - Step 29530: lr=1.00E-05, loss= 1.2241 (max= 1.5979), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:10,315 - root - INFO - Step 29530: lr=1.00E-05, loss= 1.2241 (max= 1.5979), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:26,236 - root - INFO - Step 29540: lr=1.00E-05, loss= 1.1791 (max= 1.6211), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:26,237 - root - INFO - Step 29540: lr=1.00E-05, loss= 1.1791 (max= 1.6211), 
tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:26,237 - root - INFO - Step 29540: lr=1.00E-05, loss= 1.1791 (max= 1.6211), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:26,237 - root - INFO - Step 29540: lr=1.00E-05, loss= 1.1791 (max= 1.6211), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:26,237 - root - INFO - Step 29540: lr=1.00E-05, loss= 1.1791 (max= 1.6211), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:26,237 - root - INFO - Step 29540: lr=1.00E-05, loss= 1.1791 (max= 1.6211), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:26,237 - root - INFO - Step 29540: lr=1.00E-05, loss= 1.1791 (max= 1.6211), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:26,237 - root - INFO - Step 29540: lr=1.00E-05, loss= 1.1791 (max= 1.6211), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:42,150 - root - INFO - Step 29550: lr=1.00E-05, loss= 1.1891 (max= 1.7017), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:42,150 - root - INFO - Step 29550: lr=1.00E-05, loss= 1.1891 (max= 1.7017), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:42,150 - root - INFO - Step 29550: lr=1.00E-05, loss= 1.1891 (max= 1.7017), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:42,150 - root - INFO - Step 29550: lr=1.00E-05, loss= 1.1891 (max= 1.7017), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:42,150 - root - INFO - Step 
29550: lr=1.00E-05, loss= 1.1891 (max= 1.7017), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:42,150 - root - INFO - Step 29550: lr=1.00E-05, loss= 1.1891 (max= 1.7017), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:42,150 - root - INFO - Step 29550: lr=1.00E-05, loss= 1.1891 (max= 1.7017), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:42,150 - root - INFO - Step 29550: lr=1.00E-05, loss= 1.1891 (max= 1.7017), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:40:58,117 - root - INFO - Step 29560: lr=1.00E-05, loss= 1.1809 (max= 1.5874), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:58,118 - root - INFO - Step 29560: lr=1.00E-05, loss= 1.1809 (max= 1.5874), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:58,118 - root - INFO - Step 29560: lr=1.00E-05, loss= 1.1809 (max= 1.5874), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:58,118 - root - INFO - Step 29560: lr=1.00E-05, loss= 1.1809 (max= 1.5874), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:58,118 - root - INFO - Step 29560: lr=1.00E-05, loss= 1.1809 (max= 1.5874), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:58,118 - root - INFO - Step 29560: lr=1.00E-05, loss= 1.1809 (max= 1.5874), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:40:58,118 - root - INFO - Step 29560: lr=1.00E-05, loss= 1.1809 (max= 1.5874), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:40:58,118 - root - INFO - Step 29560: lr=1.00E-05, loss= 1.1809 (max= 1.5874), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:41:14,038 - root - INFO - Step 29570: lr=1.00E-05, loss= 1.1708 (max= 1.5865), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:41:14,038 - root - INFO - Step 29570: lr=1.00E-05, loss= 1.1708 (max= 1.5865), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:41:14,038 - root - INFO - Step 29570: lr=1.00E-05, loss= 1.1708 (max= 1.5865), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:41:14,038 - root - INFO - Step 29570: lr=1.00E-05, loss= 1.1708 (max= 1.5865), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:41:14,038 - root - INFO - Step 29570: lr=1.00E-05, loss= 1.1708 (max= 1.5865), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:41:14,038 - root - INFO - Step 29570: lr=1.00E-05, loss= 1.1708 (max= 1.5865), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:41:14,038 - root - INFO - Step 29570: lr=1.00E-05, loss= 1.1708 (max= 1.5865), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:41:14,038 - root - INFO - Step 29570: lr=1.00E-05, loss= 1.1708 (max= 1.5865), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:41:30,019 - root - INFO - Step 29580: lr=1.00E-05, loss= 1.1805 (max= 1.6380), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:30,019 - root - INFO - Step 29580: lr=1.00E-05, loss= 1.1805 (max= 1.6380), tps=20508, mfu=42.73%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:30,019 - root - INFO - Step 29580: lr=1.00E-05, loss= 1.1805 (max= 1.6380), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:30,019 - root - INFO - Step 29580: lr=1.00E-05, loss= 1.1805 (max= 1.6380), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:30,019 - root - INFO - Step 29580: lr=1.00E-05, loss= 1.1805 (max= 1.6380), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:30,019 - root - INFO - Step 29580: lr=1.00E-05, loss= 1.1805 (max= 1.6380), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:30,019 - root - INFO - Step 29580: lr=1.00E-05, loss= 1.1805 (max= 1.6380), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:30,019 - root - INFO - Step 29580: lr=1.00E-05, loss= 1.1805 (max= 1.6380), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:45,932 - root - INFO - Step 29590: lr=1.00E-05, loss= 1.2217 (max= 1.6455), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:45,932 - root - INFO - Step 29590: lr=1.00E-05, loss= 1.2217 (max= 1.6455), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:45,932 - root - INFO - Step 29590: lr=1.00E-05, loss= 1.2217 (max= 1.6455), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:45,932 - root - INFO - Step 29590: lr=1.00E-05, loss= 1.2217 (max= 1.6455), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:45,932 - root - INFO - Step 29590: lr=1.00E-05, loss= 1.2217 
(max= 1.6455), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:45,932 - root - INFO - Step 29590: lr=1.00E-05, loss= 1.2217 (max= 1.6455), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:45,932 - root - INFO - Step 29590: lr=1.00E-05, loss= 1.2217 (max= 1.6455), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:41:45,933 - root - INFO - Step 29590: lr=1.00E-05, loss= 1.2217 (max= 1.6455), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:01,840 - root - INFO - Step 29600: lr=1.00E-05, loss= 1.1724 (max= 1.5392), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:01,841 - root - INFO - Step 29600: lr=1.00E-05, loss= 1.1724 (max= 1.5392), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:01,841 - root - INFO - Step 29600: lr=1.00E-05, loss= 1.1724 (max= 1.5392), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:01,841 - root - INFO - Step 29600: lr=1.00E-05, loss= 1.1724 (max= 1.5392), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:01,841 - root - INFO - Step 29600: lr=1.00E-05, loss= 1.1724 (max= 1.5392), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:01,841 - root - INFO - Step 29600: lr=1.00E-05, loss= 1.1724 (max= 1.5392), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:01,841 - root - INFO - Step 29600: lr=1.00E-05, loss= 1.1724 (max= 1.5392), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:01,841 - root 
- INFO - Step 29600: lr=1.00E-05, loss= 1.1724 (max= 1.5392), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:17,786 - root - INFO - Step 29610: lr=1.00E-05, loss= 1.2070 (max= 1.5999), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:17,786 - root - INFO - Step 29610: lr=1.00E-05, loss= 1.2070 (max= 1.5999), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:17,787 - root - INFO - Step 29610: lr=1.00E-05, loss= 1.2070 (max= 1.5999), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:17,787 - root - INFO - Step 29610: lr=1.00E-05, loss= 1.2070 (max= 1.5999), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:17,787 - root - INFO - Step 29610: lr=1.00E-05, loss= 1.2070 (max= 1.5999), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:17,787 - root - INFO - Step 29610: lr=1.00E-05, loss= 1.2070 (max= 1.5999), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:17,787 - root - INFO - Step 29610: lr=1.00E-05, loss= 1.2070 (max= 1.5999), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:17,787 - root - INFO - Step 29610: lr=1.00E-05, loss= 1.2070 (max= 1.5999), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:33,730 - root - INFO - Step 29620: lr=1.00E-05, loss= 1.1693 (max= 1.6028), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:33,730 - root - INFO - Step 29620: lr=1.00E-05, loss= 1.1693 (max= 1.6028), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 23:42:33,730 - root - INFO - Step 29620: lr=1.00E-05, loss= 1.1693 (max= 1.6028), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:33,730 - root - INFO - Step 29620: lr=1.00E-05, loss= 1.1693 (max= 1.6028), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:33,730 - root - INFO - Step 29620: lr=1.00E-05, loss= 1.1693 (max= 1.6028), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:33,730 - root - INFO - Step 29620: lr=1.00E-05, loss= 1.1693 (max= 1.6028), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:33,730 - root - INFO - Step 29620: lr=1.00E-05, loss= 1.1693 (max= 1.6028), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:33,730 - root - INFO - Step 29620: lr=1.00E-05, loss= 1.1693 (max= 1.6028), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:42:49,662 - root - INFO - Step 29630: lr=1.00E-05, loss= 1.1832 (max= 1.6440), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:49,662 - root - INFO - Step 29630: lr=1.00E-05, loss= 1.1832 (max= 1.6440), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:49,662 - root - INFO - Step 29630: lr=1.00E-05, loss= 1.1832 (max= 1.6440), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:49,662 - root - INFO - Step 29630: lr=1.00E-05, loss= 1.1832 (max= 1.6440), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:49,662 - root - INFO - Step 29630: lr=1.00E-05, loss= 1.1832 (max= 1.6440), tps=20572, mfu=42.86%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:49,662 - root - INFO - Step 29630: lr=1.00E-05, loss= 1.1832 (max= 1.6440), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:49,662 - root - INFO - Step 29630: lr=1.00E-05, loss= 1.1832 (max= 1.6440), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:42:49,662 - root - INFO - Step 29630: lr=1.00E-05, loss= 1.1832 (max= 1.6440), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:05,578 - root - INFO - Step 29640: lr=1.00E-05, loss= 1.2012 (max= 1.6935), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:05,578 - root - INFO - Step 29640: lr=1.00E-05, loss= 1.2012 (max= 1.6935), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:05,578 - root - INFO - Step 29640: lr=1.00E-05, loss= 1.2012 (max= 1.6935), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:05,578 - root - INFO - Step 29640: lr=1.00E-05, loss= 1.2012 (max= 1.6935), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:05,578 - root - INFO - Step 29640: lr=1.00E-05, loss= 1.2012 (max= 1.6935), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:05,578 - root - INFO - Step 29640: lr=1.00E-05, loss= 1.2012 (max= 1.6935), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:05,578 - root - INFO - Step 29640: lr=1.00E-05, loss= 1.2012 (max= 1.6935), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:05,578 - root - INFO - Step 29640: lr=1.00E-05, 
loss= 1.2012 (max= 1.6935), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:21,511 - root - INFO - Step 29650: lr=1.00E-05, loss= 1.2020 (max= 1.5121), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:43:21,511 - root - INFO - Step 29650: lr=1.00E-05, loss= 1.2020 (max= 1.5121), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:43:21,511 - root - INFO - Step 29650: lr=1.00E-05, loss= 1.2020 (max= 1.5121), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:43:21,511 - root - INFO - Step 29650: lr=1.00E-05, loss= 1.2020 (max= 1.5121), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:43:21,511 - root - INFO - Step 29650: lr=1.00E-05, loss= 1.2020 (max= 1.5121), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:43:21,511 - root - INFO - Step 29650: lr=1.00E-05, loss= 1.2020 (max= 1.5121), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:43:21,511 - root - INFO - Step 29650: lr=1.00E-05, loss= 1.2020 (max= 1.5121), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:43:21,511 - root - INFO - Step 29650: lr=1.00E-05, loss= 1.2020 (max= 1.5121), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:43:37,461 - root - INFO - Step 29660: lr=1.00E-05, loss= 1.1886 (max= 1.6378), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:37,461 - root - INFO - Step 29660: lr=1.00E-05, loss= 1.1886 (max= 1.6378), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
23:43:37,461 - root - INFO - Step 29660: lr=1.00E-05, loss= 1.1886 (max= 1.6378), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:37,461 - root - INFO - Step 29660: lr=1.00E-05, loss= 1.1886 (max= 1.6378), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:37,461 - root - INFO - Step 29660: lr=1.00E-05, loss= 1.1886 (max= 1.6378), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:37,461 - root - INFO - Step 29660: lr=1.00E-05, loss= 1.1886 (max= 1.6378), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:37,461 - root - INFO - Step 29660: lr=1.00E-05, loss= 1.1886 (max= 1.6378), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:37,461 - root - INFO - Step 29660: lr=1.00E-05, loss= 1.1886 (max= 1.6378), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:53,395 - root - INFO - Step 29670: lr=1.00E-05, loss= 1.1548 (max= 1.7595), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:53,395 - root - INFO - Step 29670: lr=1.00E-05, loss= 1.1548 (max= 1.7595), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:53,395 - root - INFO - Step 29670: lr=1.00E-05, loss= 1.1548 (max= 1.7595), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:53,395 - root - INFO - Step 29670: lr=1.00E-05, loss= 1.1548 (max= 1.7595), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:53,395 - root - INFO - Step 29670: lr=1.00E-05, loss= 1.1548 (max= 1.7595), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:53,395 - root - INFO - Step 29670: lr=1.00E-05, loss= 1.1548 (max= 1.7595), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:53,395 - root - INFO - Step 29670: lr=1.00E-05, loss= 1.1548 (max= 1.7595), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:43:53,395 - root - INFO - Step 29670: lr=1.00E-05, loss= 1.1548 (max= 1.7595), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:09,286 - root - INFO - Step 29680: lr=1.00E-05, loss= 1.2084 (max= 1.5547), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:09,287 - root - INFO - Step 29680: lr=1.00E-05, loss= 1.2084 (max= 1.5547), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:09,287 - root - INFO - Step 29680: lr=1.00E-05, loss= 1.2084 (max= 1.5547), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:09,287 - root - INFO - Step 29680: lr=1.00E-05, loss= 1.2084 (max= 1.5547), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:09,287 - root - INFO - Step 29680: lr=1.00E-05, loss= 1.2084 (max= 1.5547), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:09,287 - root - INFO - Step 29680: lr=1.00E-05, loss= 1.2084 (max= 1.5547), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:09,287 - root - INFO - Step 29680: lr=1.00E-05, loss= 1.2084 (max= 1.5547), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:09,287 - root - INFO - Step 29680: lr=1.00E-05, loss= 1.2084 (max= 1.5547), 
tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:25,213 - root - INFO - Step 29690: lr=1.00E-05, loss= 1.1970 (max= 2.0933), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:25,213 - root - INFO - Step 29690: lr=1.00E-05, loss= 1.1970 (max= 2.0933), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:25,213 - root - INFO - Step 29690: lr=1.00E-05, loss= 1.1970 (max= 2.0933), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:25,213 - root - INFO - Step 29690: lr=1.00E-05, loss= 1.1970 (max= 2.0933), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:25,213 - root - INFO - Step 29690: lr=1.00E-05, loss= 1.1970 (max= 2.0933), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:25,213 - root - INFO - Step 29690: lr=1.00E-05, loss= 1.1970 (max= 2.0933), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:25,213 - root - INFO - Step 29690: lr=1.00E-05, loss= 1.1970 (max= 2.0933), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:25,213 - root - INFO - Step 29690: lr=1.00E-05, loss= 1.1970 (max= 2.0933), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:44:41,096 - root - INFO - Step 29700: lr=1.00E-05, loss= 1.1960 (max= 1.7453), tps=20634, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:41,096 - root - INFO - Step 29700: lr=1.00E-05, loss= 1.1960 (max= 1.7453), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:41,096 - root - INFO - Step 
29700: lr=1.00E-05, loss= 1.1960 (max= 1.7453), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:41,096 - root - INFO - Step 29700: lr=1.00E-05, loss= 1.1960 (max= 1.7453), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:41,097 - root - INFO - Step 29700: lr=1.00E-05, loss= 1.1960 (max= 1.7453), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:41,097 - root - INFO - Step 29700: lr=1.00E-05, loss= 1.1960 (max= 1.7453), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:41,097 - root - INFO - Step 29700: lr=1.00E-05, loss= 1.1960 (max= 1.7453), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:41,097 - root - INFO - Step 29700: lr=1.00E-05, loss= 1.1960 (max= 1.7453), tps=20634, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:57,019 - root - INFO - Step 29710: lr=1.00E-05, loss= 1.1753 (max= 1.4882), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:57,019 - root - INFO - Step 29710: lr=1.00E-05, loss= 1.1753 (max= 1.4882), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:57,020 - root - INFO - Step 29710: lr=1.00E-05, loss= 1.1753 (max= 1.4882), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:57,020 - root - INFO - Step 29710: lr=1.00E-05, loss= 1.1753 (max= 1.4882), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:57,020 - root - INFO - Step 29710: lr=1.00E-05, loss= 1.1753 (max= 1.4882), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 23:44:57,020 - root - INFO - Step 29710: lr=1.00E-05, loss= 1.1753 (max= 1.4882), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:57,020 - root - INFO - Step 29710: lr=1.00E-05, loss= 1.1753 (max= 1.4882), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:44:57,020 - root - INFO - Step 29710: lr=1.00E-05, loss= 1.1753 (max= 1.4882), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:12,970 - root - INFO - Step 29720: lr=1.00E-05, loss= 1.1957 (max= 1.5114), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:12,970 - root - INFO - Step 29720: lr=1.00E-05, loss= 1.1957 (max= 1.5114), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:12,970 - root - INFO - Step 29720: lr=1.00E-05, loss= 1.1957 (max= 1.5114), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:12,970 - root - INFO - Step 29720: lr=1.00E-05, loss= 1.1957 (max= 1.5114), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:12,970 - root - INFO - Step 29720: lr=1.00E-05, loss= 1.1957 (max= 1.5114), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:12,970 - root - INFO - Step 29720: lr=1.00E-05, loss= 1.1957 (max= 1.5114), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:12,970 - root - INFO - Step 29720: lr=1.00E-05, loss= 1.1957 (max= 1.5114), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:12,970 - root - INFO - Step 29720: lr=1.00E-05, loss= 1.1957 (max= 1.5114), tps=20548, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:28,891 - root - INFO - Step 29730: lr=1.00E-05, loss= 1.1754 (max= 1.5986), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:28,891 - root - INFO - Step 29730: lr=1.00E-05, loss= 1.1754 (max= 1.5986), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:28,891 - root - INFO - Step 29730: lr=1.00E-05, loss= 1.1754 (max= 1.5986), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:28,891 - root - INFO - Step 29730: lr=1.00E-05, loss= 1.1754 (max= 1.5986), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:28,891 - root - INFO - Step 29730: lr=1.00E-05, loss= 1.1754 (max= 1.5986), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:28,891 - root - INFO - Step 29730: lr=1.00E-05, loss= 1.1754 (max= 1.5986), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:28,891 - root - INFO - Step 29730: lr=1.00E-05, loss= 1.1754 (max= 1.5986), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:28,891 - root - INFO - Step 29730: lr=1.00E-05, loss= 1.1754 (max= 1.5986), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:45:44,871 - root - INFO - Step 29740: lr=1.00E-05, loss= 1.1926 (max= 1.8544), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:45:44,871 - root - INFO - Step 29740: lr=1.00E-05, loss= 1.1926 (max= 1.8544), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:45:44,872 - root - INFO - Step 29740: lr=1.00E-05, loss= 1.1926 
(max= 1.8544), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:45:44,872 - root - INFO - Step 29740: lr=1.00E-05, loss= 1.1926 (max= 1.8544), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:45:44,872 - root - INFO - Step 29740: lr=1.00E-05, loss= 1.1926 (max= 1.8544), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:45:44,872 - root - INFO - Step 29740: lr=1.00E-05, loss= 1.1926 (max= 1.8544), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:45:44,872 - root - INFO - Step 29740: lr=1.00E-05, loss= 1.1926 (max= 1.8544), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:45:44,872 - root - INFO - Step 29740: lr=1.00E-05, loss= 1.1926 (max= 1.8544), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:00,851 - root - INFO - Step 29750: lr=1.00E-05, loss= 1.1836 (max= 1.6816), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:00,851 - root - INFO - Step 29750: lr=1.00E-05, loss= 1.1836 (max= 1.6816), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:00,851 - root - INFO - Step 29750: lr=1.00E-05, loss= 1.1836 (max= 1.6816), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:00,851 - root - INFO - Step 29750: lr=1.00E-05, loss= 1.1836 (max= 1.6816), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:00,851 - root - INFO - Step 29750: lr=1.00E-05, loss= 1.1836 (max= 1.6816), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:00,851 - root 
- INFO - Step 29750: lr=1.00E-05, loss= 1.1836 (max= 1.6816), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:00,851 - root - INFO - Step 29750: lr=1.00E-05, loss= 1.1836 (max= 1.6816), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:00,851 - root - INFO - Step 29750: lr=1.00E-05, loss= 1.1836 (max= 1.6816), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:16,800 - root - INFO - Step 29760: lr=1.00E-05, loss= 1.2081 (max= 1.5341), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:46:16,800 - root - INFO - Step 29760: lr=1.00E-05, loss= 1.2081 (max= 1.5341), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:46:16,801 - root - INFO - Step 29760: lr=1.00E-05, loss= 1.2081 (max= 1.5341), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:46:16,801 - root - INFO - Step 29760: lr=1.00E-05, loss= 1.2081 (max= 1.5341), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:46:16,801 - root - INFO - Step 29760: lr=1.00E-05, loss= 1.2081 (max= 1.5341), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:46:16,801 - root - INFO - Step 29760: lr=1.00E-05, loss= 1.2081 (max= 1.5341), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:46:16,801 - root - INFO - Step 29760: lr=1.00E-05, loss= 1.2081 (max= 1.5341), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:46:16,801 - root - INFO - Step 29760: lr=1.00E-05, loss= 1.2081 (max= 1.5341), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-24 23:46:32,760 - root - INFO - Step 29770: lr=1.00E-05, loss= 1.2069 (max= 1.5218), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:32,760 - root - INFO - Step 29770: lr=1.00E-05, loss= 1.2069 (max= 1.5218), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:32,760 - root - INFO - Step 29770: lr=1.00E-05, loss= 1.2069 (max= 1.5218), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:32,760 - root - INFO - Step 29770: lr=1.00E-05, loss= 1.2069 (max= 1.5218), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:32,760 - root - INFO - Step 29770: lr=1.00E-05, loss= 1.2069 (max= 1.5218), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:32,760 - root - INFO - Step 29770: lr=1.00E-05, loss= 1.2069 (max= 1.5218), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:32,760 - root - INFO - Step 29770: lr=1.00E-05, loss= 1.2069 (max= 1.5218), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:32,760 - root - INFO - Step 29770: lr=1.00E-05, loss= 1.2069 (max= 1.5218), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:48,707 - root - INFO - Step 29780: lr=1.00E-05, loss= 1.1960 (max= 1.6105), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:48,707 - root - INFO - Step 29780: lr=1.00E-05, loss= 1.1960 (max= 1.6105), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:48,707 - root - INFO - Step 29780: lr=1.00E-05, loss= 1.1960 (max= 1.6105), tps=20553, mfu=42.82%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:48,707 - root - INFO - Step 29780: lr=1.00E-05, loss= 1.1960 (max= 1.6105), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:48,707 - root - INFO - Step 29780: lr=1.00E-05, loss= 1.1960 (max= 1.6105), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:48,707 - root - INFO - Step 29780: lr=1.00E-05, loss= 1.1960 (max= 1.6105), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:48,707 - root - INFO - Step 29780: lr=1.00E-05, loss= 1.1960 (max= 1.6105), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:46:48,707 - root - INFO - Step 29780: lr=1.00E-05, loss= 1.1960 (max= 1.6105), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:04,613 - root - INFO - Step 29790: lr=1.00E-05, loss= 1.1901 (max= 1.9288), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:04,613 - root - INFO - Step 29790: lr=1.00E-05, loss= 1.1901 (max= 1.9288), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:04,614 - root - INFO - Step 29790: lr=1.00E-05, loss= 1.1901 (max= 1.9288), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:04,614 - root - INFO - Step 29790: lr=1.00E-05, loss= 1.1901 (max= 1.9288), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:04,614 - root - INFO - Step 29790: lr=1.00E-05, loss= 1.1901 (max= 1.9288), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:04,614 - root - INFO - Step 29790: lr=1.00E-05, 
loss= 1.1901 (max= 1.9288), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:04,614 - root - INFO - Step 29790: lr=1.00E-05, loss= 1.1901 (max= 1.9288), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:04,614 - root - INFO - Step 29790: lr=1.00E-05, loss= 1.1901 (max= 1.9288), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:20,594 - root - INFO - Step 29800: lr=1.00E-05, loss= 1.1912 (max= 1.5453), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:20,594 - root - INFO - Step 29800: lr=1.00E-05, loss= 1.1912 (max= 1.5453), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:20,594 - root - INFO - Step 29800: lr=1.00E-05, loss= 1.1912 (max= 1.5453), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:20,594 - root - INFO - Step 29800: lr=1.00E-05, loss= 1.1912 (max= 1.5453), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:20,594 - root - INFO - Step 29800: lr=1.00E-05, loss= 1.1912 (max= 1.5453), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:20,594 - root - INFO - Step 29800: lr=1.00E-05, loss= 1.1912 (max= 1.5453), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:20,594 - root - INFO - Step 29800: lr=1.00E-05, loss= 1.1912 (max= 1.5453), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:20,594 - root - INFO - Step 29800: lr=1.00E-05, loss= 1.1912 (max= 1.5453), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
23:47:36,560 - root - INFO - Step 29810: lr=1.00E-05, loss= 1.2203 (max= 1.5394), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:36,560 - root - INFO - Step 29810: lr=1.00E-05, loss= 1.2203 (max= 1.5394), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:36,560 - root - INFO - Step 29810: lr=1.00E-05, loss= 1.2203 (max= 1.5394), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:36,560 - root - INFO - Step 29810: lr=1.00E-05, loss= 1.2203 (max= 1.5394), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:36,560 - root - INFO - Step 29810: lr=1.00E-05, loss= 1.2203 (max= 1.5394), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:36,560 - root - INFO - Step 29810: lr=1.00E-05, loss= 1.2203 (max= 1.5394), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:36,560 - root - INFO - Step 29810: lr=1.00E-05, loss= 1.2203 (max= 1.5394), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:36,560 - root - INFO - Step 29810: lr=1.00E-05, loss= 1.2203 (max= 1.5394), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:52,511 - root - INFO - Step 29820: lr=1.00E-05, loss= 1.1731 (max= 1.5236), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:52,511 - root - INFO - Step 29820: lr=1.00E-05, loss= 1.1731 (max= 1.5236), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:52,511 - root - INFO - Step 29820: lr=1.00E-05, loss= 1.1731 (max= 1.5236), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:52,511 - root - INFO - Step 29820: lr=1.00E-05, loss= 1.1731 (max= 1.5236), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:52,511 - root - INFO - Step 29820: lr=1.00E-05, loss= 1.1731 (max= 1.5236), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:52,511 - root - INFO - Step 29820: lr=1.00E-05, loss= 1.1731 (max= 1.5236), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:52,511 - root - INFO - Step 29820: lr=1.00E-05, loss= 1.1731 (max= 1.5236), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:47:52,511 - root - INFO - Step 29820: lr=1.00E-05, loss= 1.1731 (max= 1.5236), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:08,424 - root - INFO - Step 29830: lr=1.00E-05, loss= 1.1767 (max= 1.7306), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:08,424 - root - INFO - Step 29830: lr=1.00E-05, loss= 1.1767 (max= 1.7306), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:08,425 - root - INFO - Step 29830: lr=1.00E-05, loss= 1.1767 (max= 1.7306), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:08,425 - root - INFO - Step 29830: lr=1.00E-05, loss= 1.1767 (max= 1.7306), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:08,425 - root - INFO - Step 29830: lr=1.00E-05, loss= 1.1767 (max= 1.7306), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:08,425 - root - INFO - Step 29830: lr=1.00E-05, loss= 1.1767 (max= 1.7306), 
tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:08,425 - root - INFO - Step 29830: lr=1.00E-05, loss= 1.1767 (max= 1.7306), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:08,425 - root - INFO - Step 29830: lr=1.00E-05, loss= 1.1767 (max= 1.7306), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:24,421 - root - INFO - Step 29840: lr=1.00E-05, loss= 1.1999 (max= 1.8904), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:24,421 - root - INFO - Step 29840: lr=1.00E-05, loss= 1.1999 (max= 1.8904), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:24,421 - root - INFO - Step 29840: lr=1.00E-05, loss= 1.1999 (max= 1.8904), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:24,421 - root - INFO - Step 29840: lr=1.00E-05, loss= 1.1999 (max= 1.8904), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:24,421 - root - INFO - Step 29840: lr=1.00E-05, loss= 1.1999 (max= 1.8904), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:24,421 - root - INFO - Step 29840: lr=1.00E-05, loss= 1.1999 (max= 1.8904), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:24,421 - root - INFO - Step 29840: lr=1.00E-05, loss= 1.1999 (max= 1.8904), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:24,421 - root - INFO - Step 29840: lr=1.00E-05, loss= 1.1999 (max= 1.8904), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:48:40,341 - root - INFO - Step 
29850: lr=1.00E-05, loss= 1.1906 (max= 1.5796), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:40,341 - root - INFO - Step 29850: lr=1.00E-05, loss= 1.1906 (max= 1.5796), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:40,341 - root - INFO - Step 29850: lr=1.00E-05, loss= 1.1906 (max= 1.5796), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:40,341 - root - INFO - Step 29850: lr=1.00E-05, loss= 1.1906 (max= 1.5796), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:40,341 - root - INFO - Step 29850: lr=1.00E-05, loss= 1.1906 (max= 1.5796), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:40,341 - root - INFO - Step 29850: lr=1.00E-05, loss= 1.1906 (max= 1.5796), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:40,341 - root - INFO - Step 29850: lr=1.00E-05, loss= 1.1906 (max= 1.5796), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:40,341 - root - INFO - Step 29850: lr=1.00E-05, loss= 1.1906 (max= 1.5796), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:56,327 - root - INFO - Step 29860: lr=1.00E-05, loss= 1.1549 (max= 1.5392), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:56,327 - root - INFO - Step 29860: lr=1.00E-05, loss= 1.1549 (max= 1.5392), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:56,327 - root - INFO - Step 29860: lr=1.00E-05, loss= 1.1549 (max= 1.5392), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:48:56,327 - root - INFO - Step 29860: lr=1.00E-05, loss= 1.1549 (max= 1.5392), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:56,327 - root - INFO - Step 29860: lr=1.00E-05, loss= 1.1549 (max= 1.5392), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:56,327 - root - INFO - Step 29860: lr=1.00E-05, loss= 1.1549 (max= 1.5392), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:56,327 - root - INFO - Step 29860: lr=1.00E-05, loss= 1.1549 (max= 1.5392), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:48:56,327 - root - INFO - Step 29860: lr=1.00E-05, loss= 1.1549 (max= 1.5392), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:12,283 - root - INFO - Step 29870: lr=1.00E-05, loss= 1.1997 (max= 1.6115), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:12,283 - root - INFO - Step 29870: lr=1.00E-05, loss= 1.1997 (max= 1.6115), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:12,283 - root - INFO - Step 29870: lr=1.00E-05, loss= 1.1997 (max= 1.6115), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:12,283 - root - INFO - Step 29870: lr=1.00E-05, loss= 1.1997 (max= 1.6115), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:12,283 - root - INFO - Step 29870: lr=1.00E-05, loss= 1.1997 (max= 1.6115), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:12,283 - root - INFO - Step 29870: lr=1.00E-05, loss= 1.1997 (max= 1.6115), tps=20541, mfu=42.80%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:12,283 - root - INFO - Step 29870: lr=1.00E-05, loss= 1.1997 (max= 1.6115), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:12,283 - root - INFO - Step 29870: lr=1.00E-05, loss= 1.1997 (max= 1.6115), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:28,276 - root - INFO - Step 29880: lr=1.00E-05, loss= 1.2220 (max= 1.7162), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:49:28,276 - root - INFO - Step 29880: lr=1.00E-05, loss= 1.2220 (max= 1.7162), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:49:28,276 - root - INFO - Step 29880: lr=1.00E-05, loss= 1.2220 (max= 1.7162), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:49:28,276 - root - INFO - Step 29880: lr=1.00E-05, loss= 1.2220 (max= 1.7162), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:49:28,276 - root - INFO - Step 29880: lr=1.00E-05, loss= 1.2220 (max= 1.7162), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:49:28,276 - root - INFO - Step 29880: lr=1.00E-05, loss= 1.2220 (max= 1.7162), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:49:28,276 - root - INFO - Step 29880: lr=1.00E-05, loss= 1.2220 (max= 1.7162), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:49:28,276 - root - INFO - Step 29880: lr=1.00E-05, loss= 1.2220 (max= 1.7162), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:49:40,214 - root - WARNING - Empty document detected at 
/work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2629368 +2025-10-24 23:49:44,262 - root - INFO - Step 29890: lr=1.00E-05, loss= 1.1835 (max= 1.5935), tps=20501, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:44,262 - root - INFO - Step 29890: lr=1.00E-05, loss= 1.1835 (max= 1.5935), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:44,262 - root - INFO - Step 29890: lr=1.00E-05, loss= 1.1835 (max= 1.5935), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:44,262 - root - INFO - Step 29890: lr=1.00E-05, loss= 1.1835 (max= 1.5935), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:44,262 - root - INFO - Step 29890: lr=1.00E-05, loss= 1.1835 (max= 1.5935), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:44,262 - root - INFO - Step 29890: lr=1.00E-05, loss= 1.1835 (max= 1.5935), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:44,262 - root - INFO - Step 29890: lr=1.00E-05, loss= 1.1835 (max= 1.5935), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:49:44,262 - root - INFO - Step 29890: lr=1.00E-05, loss= 1.1835 (max= 1.5935), tps=20501, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:00,235 - root - INFO - Step 29900: lr=1.00E-05, loss= 1.2053 (max= 1.5587), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:00,235 - root - INFO - Step 29900: lr=1.00E-05, loss= 1.2053 (max= 1.5587), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 
23:50:00,235 - root - INFO - Step 29900: lr=1.00E-05, loss= 1.2053 (max= 1.5587), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:00,235 - root - INFO - Step 29900: lr=1.00E-05, loss= 1.2053 (max= 1.5587), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:00,235 - root - INFO - Step 29900: lr=1.00E-05, loss= 1.2053 (max= 1.5587), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:00,235 - root - INFO - Step 29900: lr=1.00E-05, loss= 1.2053 (max= 1.5587), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:00,236 - root - INFO - Step 29900: lr=1.00E-05, loss= 1.2053 (max= 1.5587), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:00,236 - root - INFO - Step 29900: lr=1.00E-05, loss= 1.2053 (max= 1.5587), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:16,143 - root - INFO - Step 29910: lr=1.00E-05, loss= 1.1765 (max= 1.5238), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:50:16,143 - root - INFO - Step 29910: lr=1.00E-05, loss= 1.1765 (max= 1.5238), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:50:16,143 - root - INFO - Step 29910: lr=1.00E-05, loss= 1.1765 (max= 1.5238), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:50:16,143 - root - INFO - Step 29910: lr=1.00E-05, loss= 1.1765 (max= 1.5238), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:50:16,143 - root - INFO - Step 29910: lr=1.00E-05, loss= 1.1765 (max= 1.5238), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:50:16,144 - root - INFO - Step 29910: lr=1.00E-05, loss= 1.1765 (max= 1.5238), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:50:16,144 - root - INFO - Step 29910: lr=1.00E-05, loss= 1.1765 (max= 1.5238), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:50:16,144 - root - INFO - Step 29910: lr=1.00E-05, loss= 1.1765 (max= 1.5238), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:50:32,104 - root - INFO - Step 29920: lr=1.00E-05, loss= 1.1984 (max= 1.4961), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:32,104 - root - INFO - Step 29920: lr=1.00E-05, loss= 1.1984 (max= 1.4961), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:32,105 - root - INFO - Step 29920: lr=1.00E-05, loss= 1.1984 (max= 1.4961), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:32,105 - root - INFO - Step 29920: lr=1.00E-05, loss= 1.1984 (max= 1.4961), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:32,105 - root - INFO - Step 29920: lr=1.00E-05, loss= 1.1984 (max= 1.4961), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:32,105 - root - INFO - Step 29920: lr=1.00E-05, loss= 1.1984 (max= 1.4961), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:32,105 - root - INFO - Step 29920: lr=1.00E-05, loss= 1.1984 (max= 1.4961), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:32,105 - root - INFO - Step 29920: lr=1.00E-05, loss= 1.1984 (max= 1.4961), 
tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:48,042 - root - INFO - Step 29930: lr=1.00E-05, loss= 1.1633 (max= 1.5143), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:48,042 - root - INFO - Step 29930: lr=1.00E-05, loss= 1.1633 (max= 1.5143), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:48,043 - root - INFO - Step 29930: lr=1.00E-05, loss= 1.1633 (max= 1.5143), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:48,043 - root - INFO - Step 29930: lr=1.00E-05, loss= 1.1633 (max= 1.5143), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:48,043 - root - INFO - Step 29930: lr=1.00E-05, loss= 1.1633 (max= 1.5143), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:48,043 - root - INFO - Step 29930: lr=1.00E-05, loss= 1.1633 (max= 1.5143), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:48,043 - root - INFO - Step 29930: lr=1.00E-05, loss= 1.1633 (max= 1.5143), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:50:48,043 - root - INFO - Step 29930: lr=1.00E-05, loss= 1.1633 (max= 1.5143), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:51:03,986 - root - INFO - Step 29940: lr=1.00E-05, loss= 1.2120 (max= 1.5437), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:03,986 - root - INFO - Step 29940: lr=1.00E-05, loss= 1.2120 (max= 1.5437), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:03,986 - root - INFO - Step 
29940: lr=1.00E-05, loss= 1.2120 (max= 1.5437), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:03,986 - root - INFO - Step 29940: lr=1.00E-05, loss= 1.2120 (max= 1.5437), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:03,986 - root - INFO - Step 29940: lr=1.00E-05, loss= 1.2120 (max= 1.5437), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:03,986 - root - INFO - Step 29940: lr=1.00E-05, loss= 1.2120 (max= 1.5437), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:03,986 - root - INFO - Step 29940: lr=1.00E-05, loss= 1.2120 (max= 1.5437), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:03,986 - root - INFO - Step 29940: lr=1.00E-05, loss= 1.2120 (max= 1.5437), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:19,905 - root - INFO - Step 29950: lr=1.00E-05, loss= 1.1714 (max= 1.5821), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:19,905 - root - INFO - Step 29950: lr=1.00E-05, loss= 1.1714 (max= 1.5821), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:19,905 - root - INFO - Step 29950: lr=1.00E-05, loss= 1.1714 (max= 1.5821), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:19,905 - root - INFO - Step 29950: lr=1.00E-05, loss= 1.1714 (max= 1.5821), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:19,905 - root - INFO - Step 29950: lr=1.00E-05, loss= 1.1714 (max= 1.5821), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 23:51:19,905 - root - INFO - Step 29950: lr=1.00E-05, loss= 1.1714 (max= 1.5821), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:19,905 - root - INFO - Step 29950: lr=1.00E-05, loss= 1.1714 (max= 1.5821), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:19,905 - root - INFO - Step 29950: lr=1.00E-05, loss= 1.1714 (max= 1.5821), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:35,812 - root - INFO - Step 29960: lr=1.00E-05, loss= 1.1888 (max= 1.6466), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:35,812 - root - INFO - Step 29960: lr=1.00E-05, loss= 1.1888 (max= 1.6466), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:35,812 - root - INFO - Step 29960: lr=1.00E-05, loss= 1.1888 (max= 1.6466), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:35,812 - root - INFO - Step 29960: lr=1.00E-05, loss= 1.1888 (max= 1.6466), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:35,813 - root - INFO - Step 29960: lr=1.00E-05, loss= 1.1888 (max= 1.6466), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:35,813 - root - INFO - Step 29960: lr=1.00E-05, loss= 1.1888 (max= 1.6466), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:35,813 - root - INFO - Step 29960: lr=1.00E-05, loss= 1.1888 (max= 1.6466), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:35,813 - root - INFO - Step 29960: lr=1.00E-05, loss= 1.1888 (max= 1.6466), tps=20603, mfu=42.93%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:51,760 - root - INFO - Step 29970: lr=1.00E-05, loss= 1.1793 (max= 1.5461), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:51,760 - root - INFO - Step 29970: lr=1.00E-05, loss= 1.1793 (max= 1.5461), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:51,760 - root - INFO - Step 29970: lr=1.00E-05, loss= 1.1793 (max= 1.5461), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:51,760 - root - INFO - Step 29970: lr=1.00E-05, loss= 1.1793 (max= 1.5461), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:51,760 - root - INFO - Step 29970: lr=1.00E-05, loss= 1.1793 (max= 1.5461), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:51,760 - root - INFO - Step 29970: lr=1.00E-05, loss= 1.1793 (max= 1.5461), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:51,760 - root - INFO - Step 29970: lr=1.00E-05, loss= 1.1793 (max= 1.5461), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:51:51,761 - root - INFO - Step 29970: lr=1.00E-05, loss= 1.1793 (max= 1.5461), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:52:07,722 - root - INFO - Step 29980: lr=1.00E-05, loss= 1.1958 (max= 1.6049), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:52:07,722 - root - INFO - Step 29980: lr=1.00E-05, loss= 1.1958 (max= 1.6049), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:52:07,722 - root - INFO - Step 29980: lr=1.00E-05, loss= 1.1958 
(max= 1.6049), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:52:07,723 - root - INFO - Step 29980: lr=1.00E-05, loss= 1.1958 (max= 1.6049), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:52:07,723 - root - INFO - Step 29980: lr=1.00E-05, loss= 1.1958 (max= 1.6049), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:52:07,723 - root - INFO - Step 29980: lr=1.00E-05, loss= 1.1958 (max= 1.6049), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:52:07,723 - root - INFO - Step 29980: lr=1.00E-05, loss= 1.1958 (max= 1.6049), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:52:07,723 - root - INFO - Step 29980: lr=1.00E-05, loss= 1.1958 (max= 1.6049), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:52:23,647 - root - INFO - Step 29990: lr=1.00E-05, loss= 1.2103 (max= 1.5382), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:23,647 - root - INFO - Step 29990: lr=1.00E-05, loss= 1.2103 (max= 1.5382), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:23,647 - root - INFO - Step 29990: lr=1.00E-05, loss= 1.2103 (max= 1.5382), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:23,647 - root - INFO - Step 29990: lr=1.00E-05, loss= 1.2103 (max= 1.5382), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:23,647 - root - INFO - Step 29990: lr=1.00E-05, loss= 1.2103 (max= 1.5382), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:23,647 - root 
- INFO - Step 29990: lr=1.00E-05, loss= 1.2103 (max= 1.5382), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:23,647 - root - INFO - Step 29990: lr=1.00E-05, loss= 1.2103 (max= 1.5382), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:23,647 - root - INFO - Step 29990: lr=1.00E-05, loss= 1.2103 (max= 1.5382), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-30000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-30000! Save time: 4.3807103633880615 +2025-10-24 23:52:39,534 - root - INFO - Step 30000: lr=1.00E-05, loss= 1.1852 (max= 1.5635), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:39,534 - root - INFO - Step 30000: lr=1.00E-05, loss= 1.1852 (max= 1.5635), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:39,534 - root - INFO - Saving a full checkpoint at step 30000 +2025-10-24 23:52:39,534 - root - INFO - Saving a full checkpoint at step 30000 +2025-10-24 23:52:39,534 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:52:39,534 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:52:39,534 - root - INFO - Step 30000: lr=1.00E-05, loss= 1.1852 (max= 1.5635), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:39,534 - root - INFO - Step 30000: lr=1.00E-05, loss= 1.1852 (max= 1.5635), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:39,534 - root - INFO - Step 30000: lr=1.00E-05, loss= 1.1852 (max= 
1.5635), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:39,534 - root - INFO - Saving a full checkpoint at step 30000 +2025-10-24 23:52:39,534 - root - INFO - Saving a full checkpoint at step 30000 +2025-10-24 23:52:39,534 - root - INFO - Saving a full checkpoint at step 30000 +2025-10-24 23:52:39,534 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:52:39,534 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:52:39,534 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:52:39,534 - root - INFO - Step 30000: lr=1.00E-05, loss= 1.1852 (max= 1.5635), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:39,534 - root - INFO - Step 30000: lr=1.00E-05, loss= 1.1852 (max= 1.5635), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:39,534 - root - INFO - Step 30000: lr=1.00E-05, loss= 1.1852 (max= 1.5635), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:52:39,534 - root - INFO - Saving a full checkpoint at step 30000 +2025-10-24 23:52:39,534 - root - INFO - Saving a full checkpoint at step 30000 +2025-10-24 23:52:39,534 - root - INFO - Saving a full checkpoint at step 30000 +2025-10-24 23:52:39,534 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:52:39,534 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-24 23:52:39,534 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 
+2025-10-24 23:52:54,050 - root - INFO - Finished saving the checkpoint in 14.52 seconds +2025-10-24 23:52:54,057 - root - INFO - Finished saving the checkpoint in 14.52 seconds +2025-10-24 23:52:54,057 - root - INFO - Finished saving the checkpoint in 14.52 seconds +2025-10-24 23:52:54,057 - root - INFO - Finished saving the checkpoint in 14.52 seconds +2025-10-24 23:52:54,057 - root - INFO - Finished saving the checkpoint in 14.52 seconds +2025-10-24 23:52:54,058 - root - INFO - Finished saving the checkpoint in 14.52 seconds +2025-10-24 23:52:54,058 - root - INFO - Finished saving the checkpoint in 14.52 seconds +2025-10-24 23:52:54,058 - root - INFO - Finished saving the checkpoint in 14.52 seconds +2025-10-24 23:53:09,904 - root - INFO - Step 30010: lr=1.00E-05, loss= 1.1995 (max= 1.6383), tps=10791, mfu=22.48%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:09,904 - root - INFO - Step 30010: lr=1.00E-05, loss= 1.1995 (max= 1.6383), tps=10791, mfu=22.48%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:09,904 - root - INFO - Step 30010: lr=1.00E-05, loss= 1.1995 (max= 1.6383), tps=10791, mfu=22.48%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:09,904 - root - INFO - Step 30010: lr=1.00E-05, loss= 1.1995 (max= 1.6383), tps=10791, mfu=22.48%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:09,905 - root - INFO - Step 30010: lr=1.00E-05, loss= 1.1995 (max= 1.6383), tps=10791, mfu=22.48%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:09,905 - root - INFO - Step 30010: lr=1.00E-05, loss= 1.1995 (max= 1.6383), tps=10791, mfu=22.48%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:09,905 - root - INFO - Step 30010: lr=1.00E-05, loss= 1.1995 (max= 1.6383), tps=10791, mfu=22.48%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 23:53:09,905 - root - INFO - Step 30010: lr=1.00E-05, loss= 1.1995 (max= 1.6383), tps=10791, mfu=22.48%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:25,845 - root - INFO - Step 30020: lr=1.00E-05, loss= 1.1938 (max= 1.7311), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:25,845 - root - INFO - Step 30020: lr=1.00E-05, loss= 1.1938 (max= 1.7311), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:25,845 - root - INFO - Step 30020: lr=1.00E-05, loss= 1.1938 (max= 1.7311), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:25,846 - root - INFO - Step 30020: lr=1.00E-05, loss= 1.1938 (max= 1.7311), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:25,846 - root - INFO - Step 30020: lr=1.00E-05, loss= 1.1938 (max= 1.7311), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:25,846 - root - INFO - Step 30020: lr=1.00E-05, loss= 1.1938 (max= 1.7311), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:25,846 - root - INFO - Step 30020: lr=1.00E-05, loss= 1.1938 (max= 1.7311), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:25,846 - root - INFO - Step 30020: lr=1.00E-05, loss= 1.1938 (max= 1.7311), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:41,812 - root - INFO - Step 30030: lr=1.00E-05, loss= 1.2013 (max= 1.9651), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:41,812 - root - INFO - Step 30030: lr=1.00E-05, loss= 1.2013 (max= 1.9651), tps=20527, mfu=42.77%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:41,812 - root - INFO - Step 30030: lr=1.00E-05, loss= 1.2013 (max= 1.9651), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:41,812 - root - INFO - Step 30030: lr=1.00E-05, loss= 1.2013 (max= 1.9651), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:41,812 - root - INFO - Step 30030: lr=1.00E-05, loss= 1.2013 (max= 1.9651), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:41,812 - root - INFO - Step 30030: lr=1.00E-05, loss= 1.2013 (max= 1.9651), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:41,812 - root - INFO - Step 30030: lr=1.00E-05, loss= 1.2013 (max= 1.9651), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:41,812 - root - INFO - Step 30030: lr=1.00E-05, loss= 1.2013 (max= 1.9651), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:53:57,744 - root - INFO - Step 30040: lr=1.00E-05, loss= 1.1722 (max= 1.5894), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:53:57,744 - root - INFO - Step 30040: lr=1.00E-05, loss= 1.1722 (max= 1.5894), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:53:57,744 - root - INFO - Step 30040: lr=1.00E-05, loss= 1.1722 (max= 1.5894), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:53:57,744 - root - INFO - Step 30040: lr=1.00E-05, loss= 1.1722 (max= 1.5894), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:53:57,744 - root - INFO - Step 30040: lr=1.00E-05, 
loss= 1.1722 (max= 1.5894), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:53:57,744 - root - INFO - Step 30040: lr=1.00E-05, loss= 1.1722 (max= 1.5894), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:53:57,744 - root - INFO - Step 30040: lr=1.00E-05, loss= 1.1722 (max= 1.5894), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:53:57,744 - root - INFO - Step 30040: lr=1.00E-05, loss= 1.1722 (max= 1.5894), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:13,699 - root - INFO - Step 30050: lr=1.00E-05, loss= 1.2175 (max= 1.9036), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:13,699 - root - INFO - Step 30050: lr=1.00E-05, loss= 1.2175 (max= 1.9036), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:13,699 - root - INFO - Step 30050: lr=1.00E-05, loss= 1.2175 (max= 1.9036), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:13,699 - root - INFO - Step 30050: lr=1.00E-05, loss= 1.2175 (max= 1.9036), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:13,699 - root - INFO - Step 30050: lr=1.00E-05, loss= 1.2175 (max= 1.9036), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:13,699 - root - INFO - Step 30050: lr=1.00E-05, loss= 1.2175 (max= 1.9036), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:13,699 - root - INFO - Step 30050: lr=1.00E-05, loss= 1.2175 (max= 1.9036), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
23:54:13,699 - root - INFO - Step 30050: lr=1.00E-05, loss= 1.2175 (max= 1.9036), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:29,643 - root - INFO - Step 30060: lr=1.00E-05, loss= 1.1859 (max= 1.7119), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:29,644 - root - INFO - Step 30060: lr=1.00E-05, loss= 1.1859 (max= 1.7119), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:29,644 - root - INFO - Step 30060: lr=1.00E-05, loss= 1.1859 (max= 1.7119), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:29,644 - root - INFO - Step 30060: lr=1.00E-05, loss= 1.1859 (max= 1.7119), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:29,644 - root - INFO - Step 30060: lr=1.00E-05, loss= 1.1859 (max= 1.7119), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:29,644 - root - INFO - Step 30060: lr=1.00E-05, loss= 1.1859 (max= 1.7119), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:29,644 - root - INFO - Step 30060: lr=1.00E-05, loss= 1.1859 (max= 1.7119), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:29,644 - root - INFO - Step 30060: lr=1.00E-05, loss= 1.1859 (max= 1.7119), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:54:45,566 - root - INFO - Step 30070: lr=1.00E-05, loss= 1.1600 (max= 1.5009), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:54:45,566 - root - INFO - Step 30070: lr=1.00E-05, loss= 1.1600 (max= 1.5009), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:54:45,566 - root - INFO - Step 30070: lr=1.00E-05, loss= 1.1600 (max= 1.5009), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:54:45,566 - root - INFO - Step 30070: lr=1.00E-05, loss= 1.1600 (max= 1.5009), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:54:45,566 - root - INFO - Step 30070: lr=1.00E-05, loss= 1.1600 (max= 1.5009), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:54:45,566 - root - INFO - Step 30070: lr=1.00E-05, loss= 1.1600 (max= 1.5009), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:54:45,566 - root - INFO - Step 30070: lr=1.00E-05, loss= 1.1600 (max= 1.5009), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:54:45,566 - root - INFO - Step 30070: lr=1.00E-05, loss= 1.1600 (max= 1.5009), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:55:01,518 - root - INFO - Step 30080: lr=1.00E-05, loss= 1.2081 (max= 1.6336), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:55:01,518 - root - INFO - Step 30080: lr=1.00E-05, loss= 1.2081 (max= 1.6336), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:55:01,518 - root - INFO - Step 30080: lr=1.00E-05, loss= 1.2081 (max= 1.6336), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:55:01,518 - root - INFO - Step 30080: lr=1.00E-05, loss= 1.2081 (max= 1.6336), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:55:01,518 - root - INFO - Step 30080: lr=1.00E-05, loss= 1.2081 (max= 1.6336), 
tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:55:01,518 - root - INFO - Step 30080: lr=1.00E-05, loss= 1.2081 (max= 1.6336), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:55:01,518 - root - INFO - Step 30080: lr=1.00E-05, loss= 1.2081 (max= 1.6336), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:55:01,518 - root - INFO - Step 30080: lr=1.00E-05, loss= 1.2081 (max= 1.6336), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:55:17,446 - root - INFO - Step 30090: lr=1.00E-05, loss= 1.1525 (max= 1.8896), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:17,446 - root - INFO - Step 30090: lr=1.00E-05, loss= 1.1525 (max= 1.8896), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:17,446 - root - INFO - Step 30090: lr=1.00E-05, loss= 1.1525 (max= 1.8896), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:17,446 - root - INFO - Step 30090: lr=1.00E-05, loss= 1.1525 (max= 1.8896), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:17,446 - root - INFO - Step 30090: lr=1.00E-05, loss= 1.1525 (max= 1.8896), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:17,446 - root - INFO - Step 30090: lr=1.00E-05, loss= 1.1525 (max= 1.8896), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:17,446 - root - INFO - Step 30090: lr=1.00E-05, loss= 1.1525 (max= 1.8896), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:17,446 - root - INFO - Step 
30090: lr=1.00E-05, loss= 1.1525 (max= 1.8896), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:33,406 - root - INFO - Step 30100: lr=1.00E-05, loss= 1.1470 (max= 1.5261), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:33,406 - root - INFO - Step 30100: lr=1.00E-05, loss= 1.1470 (max= 1.5261), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:33,406 - root - INFO - Step 30100: lr=1.00E-05, loss= 1.1470 (max= 1.5261), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:33,406 - root - INFO - Step 30100: lr=1.00E-05, loss= 1.1470 (max= 1.5261), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:33,406 - root - INFO - Step 30100: lr=1.00E-05, loss= 1.1470 (max= 1.5261), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:33,406 - root - INFO - Step 30100: lr=1.00E-05, loss= 1.1470 (max= 1.5261), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:33,407 - root - INFO - Step 30100: lr=1.00E-05, loss= 1.1470 (max= 1.5261), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:33,407 - root - INFO - Step 30100: lr=1.00E-05, loss= 1.1470 (max= 1.5261), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:49,322 - root - INFO - Step 30110: lr=1.00E-05, loss= 1.1634 (max= 1.8201), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:49,322 - root - INFO - Step 30110: lr=1.00E-05, loss= 1.1634 (max= 1.8201), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-24 23:55:49,322 - root - INFO - Step 30110: lr=1.00E-05, loss= 1.1634 (max= 1.8201), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:49,322 - root - INFO - Step 30110: lr=1.00E-05, loss= 1.1634 (max= 1.8201), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:49,322 - root - INFO - Step 30110: lr=1.00E-05, loss= 1.1634 (max= 1.8201), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:49,322 - root - INFO - Step 30110: lr=1.00E-05, loss= 1.1634 (max= 1.8201), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:49,322 - root - INFO - Step 30110: lr=1.00E-05, loss= 1.1634 (max= 1.8201), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:55:49,322 - root - INFO - Step 30110: lr=1.00E-05, loss= 1.1634 (max= 1.8201), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:05,242 - root - INFO - Step 30120: lr=1.00E-05, loss= 1.1730 (max= 1.7330), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:05,242 - root - INFO - Step 30120: lr=1.00E-05, loss= 1.1730 (max= 1.7330), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:05,242 - root - INFO - Step 30120: lr=1.00E-05, loss= 1.1730 (max= 1.7330), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:05,242 - root - INFO - Step 30120: lr=1.00E-05, loss= 1.1730 (max= 1.7330), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:05,242 - root - INFO - Step 30120: lr=1.00E-05, loss= 1.1730 (max= 1.7330), tps=20587, mfu=42.89%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:05,242 - root - INFO - Step 30120: lr=1.00E-05, loss= 1.1730 (max= 1.7330), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:05,242 - root - INFO - Step 30120: lr=1.00E-05, loss= 1.1730 (max= 1.7330), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:05,242 - root - INFO - Step 30120: lr=1.00E-05, loss= 1.1730 (max= 1.7330), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:21,174 - root - INFO - Step 30130: lr=1.00E-05, loss= 1.2089 (max= 1.6592), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:21,174 - root - INFO - Step 30130: lr=1.00E-05, loss= 1.2089 (max= 1.6592), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:21,174 - root - INFO - Step 30130: lr=1.00E-05, loss= 1.2089 (max= 1.6592), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:21,174 - root - INFO - Step 30130: lr=1.00E-05, loss= 1.2089 (max= 1.6592), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:21,174 - root - INFO - Step 30130: lr=1.00E-05, loss= 1.2089 (max= 1.6592), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:21,174 - root - INFO - Step 30130: lr=1.00E-05, loss= 1.2089 (max= 1.6592), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:21,174 - root - INFO - Step 30130: lr=1.00E-05, loss= 1.2089 (max= 1.6592), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:21,174 - root - INFO - Step 30130: lr=1.00E-05, loss= 1.2089 
(max= 1.6592), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:37,129 - root - INFO - Step 30140: lr=1.00E-05, loss= 1.1890 (max= 1.4728), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:37,129 - root - INFO - Step 30140: lr=1.00E-05, loss= 1.1890 (max= 1.4728), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:37,130 - root - INFO - Step 30140: lr=1.00E-05, loss= 1.1890 (max= 1.4728), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:37,130 - root - INFO - Step 30140: lr=1.00E-05, loss= 1.1890 (max= 1.4728), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:37,130 - root - INFO - Step 30140: lr=1.00E-05, loss= 1.1890 (max= 1.4728), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:37,130 - root - INFO - Step 30140: lr=1.00E-05, loss= 1.1890 (max= 1.4728), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:37,130 - root - INFO - Step 30140: lr=1.00E-05, loss= 1.1890 (max= 1.4728), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:37,130 - root - INFO - Step 30140: lr=1.00E-05, loss= 1.1890 (max= 1.4728), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:56:53,077 - root - INFO - Step 30150: lr=1.00E-05, loss= 1.2378 (max= 1.5610), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:56:53,077 - root - INFO - Step 30150: lr=1.00E-05, loss= 1.2378 (max= 1.5610), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:56:53,077 - root 
- INFO - Step 30150: lr=1.00E-05, loss= 1.2378 (max= 1.5610), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:56:53,077 - root - INFO - Step 30150: lr=1.00E-05, loss= 1.2378 (max= 1.5610), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:56:53,077 - root - INFO - Step 30150: lr=1.00E-05, loss= 1.2378 (max= 1.5610), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:56:53,077 - root - INFO - Step 30150: lr=1.00E-05, loss= 1.2378 (max= 1.5610), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:56:53,077 - root - INFO - Step 30150: lr=1.00E-05, loss= 1.2378 (max= 1.5610), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:56:53,077 - root - INFO - Step 30150: lr=1.00E-05, loss= 1.2378 (max= 1.5610), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:08,977 - root - INFO - Step 30160: lr=1.00E-05, loss= 1.1862 (max= 1.5801), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:08,977 - root - INFO - Step 30160: lr=1.00E-05, loss= 1.1862 (max= 1.5801), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:08,978 - root - INFO - Step 30160: lr=1.00E-05, loss= 1.1862 (max= 1.5801), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:08,978 - root - INFO - Step 30160: lr=1.00E-05, loss= 1.1862 (max= 1.5801), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:08,978 - root - INFO - Step 30160: lr=1.00E-05, loss= 1.1862 (max= 1.5801), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-24 23:57:08,978 - root - INFO - Step 30160: lr=1.00E-05, loss= 1.1862 (max= 1.5801), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:08,978 - root - INFO - Step 30160: lr=1.00E-05, loss= 1.1862 (max= 1.5801), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:08,978 - root - INFO - Step 30160: lr=1.00E-05, loss= 1.1862 (max= 1.5801), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:24,928 - root - INFO - Step 30170: lr=1.00E-05, loss= 1.2101 (max= 1.6017), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:24,928 - root - INFO - Step 30170: lr=1.00E-05, loss= 1.2101 (max= 1.6017), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:24,928 - root - INFO - Step 30170: lr=1.00E-05, loss= 1.2101 (max= 1.6017), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:24,928 - root - INFO - Step 30170: lr=1.00E-05, loss= 1.2101 (max= 1.6017), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:24,928 - root - INFO - Step 30170: lr=1.00E-05, loss= 1.2101 (max= 1.6017), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:24,928 - root - INFO - Step 30170: lr=1.00E-05, loss= 1.2101 (max= 1.6017), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:24,928 - root - INFO - Step 30170: lr=1.00E-05, loss= 1.2101 (max= 1.6017), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:24,928 - root - INFO - Step 30170: lr=1.00E-05, loss= 1.2101 (max= 1.6017), tps=20548, mfu=42.81%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:40,863 - root - INFO - Step 30180: lr=1.00E-05, loss= 1.1954 (max= 1.9208), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:40,863 - root - INFO - Step 30180: lr=1.00E-05, loss= 1.1954 (max= 1.9208), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:40,863 - root - INFO - Step 30180: lr=1.00E-05, loss= 1.1954 (max= 1.9208), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:40,863 - root - INFO - Step 30180: lr=1.00E-05, loss= 1.1954 (max= 1.9208), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:40,863 - root - INFO - Step 30180: lr=1.00E-05, loss= 1.1954 (max= 1.9208), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:40,864 - root - INFO - Step 30180: lr=1.00E-05, loss= 1.1954 (max= 1.9208), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:40,864 - root - INFO - Step 30180: lr=1.00E-05, loss= 1.1954 (max= 1.9208), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:40,864 - root - INFO - Step 30180: lr=1.00E-05, loss= 1.1954 (max= 1.9208), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:56,790 - root - INFO - Step 30190: lr=1.00E-05, loss= 1.1553 (max= 1.4641), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:56,790 - root - INFO - Step 30190: lr=1.00E-05, loss= 1.1553 (max= 1.4641), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:56,790 - root - INFO - Step 30190: lr=1.00E-05, 
loss= 1.1553 (max= 1.4641), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:56,790 - root - INFO - Step 30190: lr=1.00E-05, loss= 1.1553 (max= 1.4641), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:56,790 - root - INFO - Step 30190: lr=1.00E-05, loss= 1.1553 (max= 1.4641), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:56,790 - root - INFO - Step 30190: lr=1.00E-05, loss= 1.1553 (max= 1.4641), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:56,790 - root - INFO - Step 30190: lr=1.00E-05, loss= 1.1553 (max= 1.4641), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:57:56,790 - root - INFO - Step 30190: lr=1.00E-05, loss= 1.1553 (max= 1.4641), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:58:03,888 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:221766 +2025-10-24 23:58:12,750 - root - INFO - Step 30200: lr=1.00E-05, loss= 1.2120 (max= 1.7559), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:12,750 - root - INFO - Step 30200: lr=1.00E-05, loss= 1.2120 (max= 1.7559), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:12,750 - root - INFO - Step 30200: lr=1.00E-05, loss= 1.2120 (max= 1.7559), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:12,750 - root - INFO - Step 30200: lr=1.00E-05, loss= 1.2120 (max= 1.7559), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 
23:58:12,750 - root - INFO - Step 30200: lr=1.00E-05, loss= 1.2120 (max= 1.7559), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:12,750 - root - INFO - Step 30200: lr=1.00E-05, loss= 1.2120 (max= 1.7559), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:12,750 - root - INFO - Step 30200: lr=1.00E-05, loss= 1.2120 (max= 1.7559), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:12,750 - root - INFO - Step 30200: lr=1.00E-05, loss= 1.2120 (max= 1.7559), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:28,706 - root - INFO - Step 30210: lr=1.00E-05, loss= 1.1780 (max= 1.6276), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:58:28,706 - root - INFO - Step 30210: lr=1.00E-05, loss= 1.1780 (max= 1.6276), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:58:28,706 - root - INFO - Step 30210: lr=1.00E-05, loss= 1.1780 (max= 1.6276), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:58:28,706 - root - INFO - Step 30210: lr=1.00E-05, loss= 1.1780 (max= 1.6276), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:58:28,706 - root - INFO - Step 30210: lr=1.00E-05, loss= 1.1780 (max= 1.6276), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:58:28,706 - root - INFO - Step 30210: lr=1.00E-05, loss= 1.1780 (max= 1.6276), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:58:28,706 - root - INFO - Step 30210: lr=1.00E-05, loss= 1.1780 (max= 1.6276), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:58:28,706 - root - INFO - Step 30210: lr=1.00E-05, loss= 1.1780 (max= 1.6276), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:58:44,635 - root - INFO - Step 30220: lr=1.00E-05, loss= 1.1921 (max= 1.6716), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:44,635 - root - INFO - Step 30220: lr=1.00E-05, loss= 1.1921 (max= 1.6716), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:44,635 - root - INFO - Step 30220: lr=1.00E-05, loss= 1.1921 (max= 1.6716), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:44,636 - root - INFO - Step 30220: lr=1.00E-05, loss= 1.1921 (max= 1.6716), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:44,636 - root - INFO - Step 30220: lr=1.00E-05, loss= 1.1921 (max= 1.6716), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:44,636 - root - INFO - Step 30220: lr=1.00E-05, loss= 1.1921 (max= 1.6716), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:44,636 - root - INFO - Step 30220: lr=1.00E-05, loss= 1.1921 (max= 1.6716), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:58:44,636 - root - INFO - Step 30220: lr=1.00E-05, loss= 1.1921 (max= 1.6716), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:59:00,599 - root - INFO - Step 30230: lr=1.00E-05, loss= 1.2075 (max= 1.5754), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:00,599 - root - INFO - Step 30230: lr=1.00E-05, loss= 1.2075 (max= 1.5754), 
tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:00,599 - root - INFO - Step 30230: lr=1.00E-05, loss= 1.2075 (max= 1.5754), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:00,599 - root - INFO - Step 30230: lr=1.00E-05, loss= 1.2075 (max= 1.5754), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:00,599 - root - INFO - Step 30230: lr=1.00E-05, loss= 1.2075 (max= 1.5754), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:00,599 - root - INFO - Step 30230: lr=1.00E-05, loss= 1.2075 (max= 1.5754), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:00,599 - root - INFO - Step 30230: lr=1.00E-05, loss= 1.2075 (max= 1.5754), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:00,599 - root - INFO - Step 30230: lr=1.00E-05, loss= 1.2075 (max= 1.5754), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:16,532 - root - INFO - Step 30240: lr=1.00E-05, loss= 1.2129 (max= 1.4955), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:16,532 - root - INFO - Step 30240: lr=1.00E-05, loss= 1.2129 (max= 1.4955), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:16,532 - root - INFO - Step 30240: lr=1.00E-05, loss= 1.2129 (max= 1.4955), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:16,532 - root - INFO - Step 30240: lr=1.00E-05, loss= 1.2129 (max= 1.4955), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:16,532 - root - INFO - Step 
30240: lr=1.00E-05, loss= 1.2129 (max= 1.4955), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:16,532 - root - INFO - Step 30240: lr=1.00E-05, loss= 1.2129 (max= 1.4955), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:16,532 - root - INFO - Step 30240: lr=1.00E-05, loss= 1.2129 (max= 1.4955), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:16,532 - root - INFO - Step 30240: lr=1.00E-05, loss= 1.2129 (max= 1.4955), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:32,473 - root - INFO - Step 30250: lr=1.00E-05, loss= 1.1893 (max= 1.6545), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:32,473 - root - INFO - Step 30250: lr=1.00E-05, loss= 1.1893 (max= 1.6545), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:32,473 - root - INFO - Step 30250: lr=1.00E-05, loss= 1.1893 (max= 1.6545), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:32,473 - root - INFO - Step 30250: lr=1.00E-05, loss= 1.1893 (max= 1.6545), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:32,473 - root - INFO - Step 30250: lr=1.00E-05, loss= 1.1893 (max= 1.6545), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:32,473 - root - INFO - Step 30250: lr=1.00E-05, loss= 1.1893 (max= 1.6545), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:32,473 - root - INFO - Step 30250: lr=1.00E-05, loss= 1.1893 (max= 1.6545), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-24 23:59:32,473 - root - INFO - Step 30250: lr=1.00E-05, loss= 1.1893 (max= 1.6545), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-24 23:59:48,393 - root - INFO - Step 30260: lr=1.00E-05, loss= 1.1868 (max= 1.5518), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:59:48,393 - root - INFO - Step 30260: lr=1.00E-05, loss= 1.1868 (max= 1.5518), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:59:48,393 - root - INFO - Step 30260: lr=1.00E-05, loss= 1.1868 (max= 1.5518), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:59:48,393 - root - INFO - Step 30260: lr=1.00E-05, loss= 1.1868 (max= 1.5518), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:59:48,393 - root - INFO - Step 30260: lr=1.00E-05, loss= 1.1868 (max= 1.5518), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:59:48,393 - root - INFO - Step 30260: lr=1.00E-05, loss= 1.1868 (max= 1.5518), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:59:48,393 - root - INFO - Step 30260: lr=1.00E-05, loss= 1.1868 (max= 1.5518), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-24 23:59:48,393 - root - INFO - Step 30260: lr=1.00E-05, loss= 1.1868 (max= 1.5518), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:04,274 - root - INFO - Step 30270: lr=1.00E-05, loss= 1.2150 (max= 1.6332), tps=20637, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:00:04,274 - root - INFO - Step 30270: lr=1.00E-05, loss= 1.2150 (max= 1.6332), tps=20637, mfu=43.00%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:00:04,274 - root - INFO - Step 30270: lr=1.00E-05, loss= 1.2150 (max= 1.6332), tps=20638, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:00:04,274 - root - INFO - Step 30270: lr=1.00E-05, loss= 1.2150 (max= 1.6332), tps=20638, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:00:04,274 - root - INFO - Step 30270: lr=1.00E-05, loss= 1.2150 (max= 1.6332), tps=20638, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:00:04,274 - root - INFO - Step 30270: lr=1.00E-05, loss= 1.2150 (max= 1.6332), tps=20637, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:00:04,274 - root - INFO - Step 30270: lr=1.00E-05, loss= 1.2150 (max= 1.6332), tps=20638, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:00:04,274 - root - INFO - Step 30270: lr=1.00E-05, loss= 1.2150 (max= 1.6332), tps=20637, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:00:20,220 - root - INFO - Step 30280: lr=1.00E-05, loss= 1.1711 (max= 1.5578), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:20,220 - root - INFO - Step 30280: lr=1.00E-05, loss= 1.1711 (max= 1.5578), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:20,220 - root - INFO - Step 30280: lr=1.00E-05, loss= 1.1711 (max= 1.5578), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:20,220 - root - INFO - Step 30280: lr=1.00E-05, loss= 1.1711 (max= 1.5578), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:20,220 - root - INFO - Step 30280: lr=1.00E-05, loss= 1.1711 
(max= 1.5578), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:20,221 - root - INFO - Step 30280: lr=1.00E-05, loss= 1.1711 (max= 1.5578), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:20,221 - root - INFO - Step 30280: lr=1.00E-05, loss= 1.1711 (max= 1.5578), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:20,221 - root - INFO - Step 30280: lr=1.00E-05, loss= 1.1711 (max= 1.5578), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:36,173 - root - INFO - Step 30290: lr=1.00E-05, loss= 1.1905 (max= 1.5360), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:36,173 - root - INFO - Step 30290: lr=1.00E-05, loss= 1.1905 (max= 1.5360), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:36,173 - root - INFO - Step 30290: lr=1.00E-05, loss= 1.1905 (max= 1.5360), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:36,173 - root - INFO - Step 30290: lr=1.00E-05, loss= 1.1905 (max= 1.5360), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:36,173 - root - INFO - Step 30290: lr=1.00E-05, loss= 1.1905 (max= 1.5360), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:36,173 - root - INFO - Step 30290: lr=1.00E-05, loss= 1.1905 (max= 1.5360), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:36,173 - root - INFO - Step 30290: lr=1.00E-05, loss= 1.1905 (max= 1.5360), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:36,174 - root 
- INFO - Step 30290: lr=1.00E-05, loss= 1.1905 (max= 1.5360), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:52,164 - root - INFO - Step 30300: lr=1.00E-05, loss= 1.1594 (max= 1.6453), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:52,164 - root - INFO - Step 30300: lr=1.00E-05, loss= 1.1594 (max= 1.6453), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:52,164 - root - INFO - Step 30300: lr=1.00E-05, loss= 1.1594 (max= 1.6453), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:52,164 - root - INFO - Step 30300: lr=1.00E-05, loss= 1.1594 (max= 1.6453), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:52,164 - root - INFO - Step 30300: lr=1.00E-05, loss= 1.1594 (max= 1.6453), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:52,164 - root - INFO - Step 30300: lr=1.00E-05, loss= 1.1594 (max= 1.6453), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:52,164 - root - INFO - Step 30300: lr=1.00E-05, loss= 1.1594 (max= 1.6453), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:00:52,164 - root - INFO - Step 30300: lr=1.00E-05, loss= 1.1594 (max= 1.6453), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:08,123 - root - INFO - Step 30310: lr=1.00E-05, loss= 1.1904 (max= 1.5686), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:08,123 - root - INFO - Step 30310: lr=1.00E-05, loss= 1.1904 (max= 1.5686), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:01:08,123 - root - INFO - Step 30310: lr=1.00E-05, loss= 1.1904 (max= 1.5686), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:08,123 - root - INFO - Step 30310: lr=1.00E-05, loss= 1.1904 (max= 1.5686), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:08,123 - root - INFO - Step 30310: lr=1.00E-05, loss= 1.1904 (max= 1.5686), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:08,123 - root - INFO - Step 30310: lr=1.00E-05, loss= 1.1904 (max= 1.5686), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:08,123 - root - INFO - Step 30310: lr=1.00E-05, loss= 1.1904 (max= 1.5686), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:08,123 - root - INFO - Step 30310: lr=1.00E-05, loss= 1.1904 (max= 1.5686), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:24,029 - root - INFO - Step 30320: lr=1.00E-05, loss= 1.2074 (max= 1.7818), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:24,029 - root - INFO - Step 30320: lr=1.00E-05, loss= 1.2074 (max= 1.7818), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:24,029 - root - INFO - Step 30320: lr=1.00E-05, loss= 1.2074 (max= 1.7818), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:24,029 - root - INFO - Step 30320: lr=1.00E-05, loss= 1.2074 (max= 1.7818), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:24,029 - root - INFO - Step 30320: lr=1.00E-05, loss= 1.2074 (max= 1.7818), tps=20605, mfu=42.93%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:24,029 - root - INFO - Step 30320: lr=1.00E-05, loss= 1.2074 (max= 1.7818), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:24,029 - root - INFO - Step 30320: lr=1.00E-05, loss= 1.2074 (max= 1.7818), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:24,029 - root - INFO - Step 30320: lr=1.00E-05, loss= 1.2074 (max= 1.7818), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:01:39,976 - root - INFO - Step 30330: lr=1.00E-05, loss= 1.2012 (max= 1.7850), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:39,976 - root - INFO - Step 30330: lr=1.00E-05, loss= 1.2012 (max= 1.7850), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:39,976 - root - INFO - Step 30330: lr=1.00E-05, loss= 1.2012 (max= 1.7850), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:39,976 - root - INFO - Step 30330: lr=1.00E-05, loss= 1.2012 (max= 1.7850), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:39,976 - root - INFO - Step 30330: lr=1.00E-05, loss= 1.2012 (max= 1.7850), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:39,976 - root - INFO - Step 30330: lr=1.00E-05, loss= 1.2012 (max= 1.7850), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:39,976 - root - INFO - Step 30330: lr=1.00E-05, loss= 1.2012 (max= 1.7850), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:39,976 - root - INFO - Step 30330: lr=1.00E-05, 
loss= 1.2012 (max= 1.7850), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:55,944 - root - INFO - Step 30340: lr=1.00E-05, loss= 1.1675 (max= 1.5553), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:55,944 - root - INFO - Step 30340: lr=1.00E-05, loss= 1.1675 (max= 1.5553), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:55,944 - root - INFO - Step 30340: lr=1.00E-05, loss= 1.1675 (max= 1.5553), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:55,944 - root - INFO - Step 30340: lr=1.00E-05, loss= 1.1675 (max= 1.5553), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:55,944 - root - INFO - Step 30340: lr=1.00E-05, loss= 1.1675 (max= 1.5553), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:55,944 - root - INFO - Step 30340: lr=1.00E-05, loss= 1.1675 (max= 1.5553), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:55,944 - root - INFO - Step 30340: lr=1.00E-05, loss= 1.1675 (max= 1.5553), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:01:55,944 - root - INFO - Step 30340: lr=1.00E-05, loss= 1.1675 (max= 1.5553), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:11,949 - root - INFO - Step 30350: lr=1.00E-05, loss= 1.1577 (max= 1.6061), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:02:11,949 - root - INFO - Step 30350: lr=1.00E-05, loss= 1.1577 (max= 1.6061), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:02:11,949 - root - INFO - Step 30350: lr=1.00E-05, loss= 1.1577 (max= 1.6061), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:02:11,949 - root - INFO - Step 30350: lr=1.00E-05, loss= 1.1577 (max= 1.6061), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:02:11,949 - root - INFO - Step 30350: lr=1.00E-05, loss= 1.1577 (max= 1.6061), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:02:11,949 - root - INFO - Step 30350: lr=1.00E-05, loss= 1.1577 (max= 1.6061), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:02:11,949 - root - INFO - Step 30350: lr=1.00E-05, loss= 1.1577 (max= 1.6061), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:02:11,949 - root - INFO - Step 30350: lr=1.00E-05, loss= 1.1577 (max= 1.6061), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:02:27,881 - root - INFO - Step 30360: lr=1.00E-05, loss= 1.1860 (max= 1.5666), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:27,881 - root - INFO - Step 30360: lr=1.00E-05, loss= 1.1860 (max= 1.5666), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:27,882 - root - INFO - Step 30360: lr=1.00E-05, loss= 1.1860 (max= 1.5666), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:27,882 - root - INFO - Step 30360: lr=1.00E-05, loss= 1.1860 (max= 1.5666), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:27,882 - root - INFO - Step 30360: lr=1.00E-05, loss= 1.1860 (max= 1.5666), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:27,882 - root - INFO - Step 30360: lr=1.00E-05, loss= 1.1860 (max= 1.5666), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:27,882 - root - INFO - Step 30360: lr=1.00E-05, loss= 1.1860 (max= 1.5666), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:27,882 - root - INFO - Step 30360: lr=1.00E-05, loss= 1.1860 (max= 1.5666), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:43,857 - root - INFO - Step 30370: lr=1.00E-05, loss= 1.1942 (max= 1.5192), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:43,857 - root - INFO - Step 30370: lr=1.00E-05, loss= 1.1942 (max= 1.5192), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:43,857 - root - INFO - Step 30370: lr=1.00E-05, loss= 1.1942 (max= 1.5192), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:43,857 - root - INFO - Step 30370: lr=1.00E-05, loss= 1.1942 (max= 1.5192), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:43,857 - root - INFO - Step 30370: lr=1.00E-05, loss= 1.1942 (max= 1.5192), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:43,857 - root - INFO - Step 30370: lr=1.00E-05, loss= 1.1942 (max= 1.5192), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:43,857 - root - INFO - Step 30370: lr=1.00E-05, loss= 1.1942 (max= 1.5192), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:43,857 - root - INFO - Step 30370: lr=1.00E-05, loss= 1.1942 (max= 1.5192), 
tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:59,810 - root - INFO - Step 30380: lr=1.00E-05, loss= 1.1993 (max= 1.5737), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:59,810 - root - INFO - Step 30380: lr=1.00E-05, loss= 1.1993 (max= 1.5737), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:59,810 - root - INFO - Step 30380: lr=1.00E-05, loss= 1.1993 (max= 1.5737), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:59,810 - root - INFO - Step 30380: lr=1.00E-05, loss= 1.1993 (max= 1.5737), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:59,810 - root - INFO - Step 30380: lr=1.00E-05, loss= 1.1993 (max= 1.5737), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:59,810 - root - INFO - Step 30380: lr=1.00E-05, loss= 1.1993 (max= 1.5737), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:59,810 - root - INFO - Step 30380: lr=1.00E-05, loss= 1.1993 (max= 1.5737), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:02:59,811 - root - INFO - Step 30380: lr=1.00E-05, loss= 1.1993 (max= 1.5737), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:15,762 - root - INFO - Step 30390: lr=1.00E-05, loss= 1.1967 (max= 1.8641), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:15,762 - root - INFO - Step 30390: lr=1.00E-05, loss= 1.1967 (max= 1.8641), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:15,762 - root - INFO - Step 
30390: lr=1.00E-05, loss= 1.1967 (max= 1.8641), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:15,762 - root - INFO - Step 30390: lr=1.00E-05, loss= 1.1967 (max= 1.8641), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:15,762 - root - INFO - Step 30390: lr=1.00E-05, loss= 1.1967 (max= 1.8641), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:15,762 - root - INFO - Step 30390: lr=1.00E-05, loss= 1.1967 (max= 1.8641), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:15,762 - root - INFO - Step 30390: lr=1.00E-05, loss= 1.1967 (max= 1.8641), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:15,762 - root - INFO - Step 30390: lr=1.00E-05, loss= 1.1967 (max= 1.8641), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:31,673 - root - INFO - Step 30400: lr=1.00E-05, loss= 1.1945 (max= 1.4853), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:31,673 - root - INFO - Step 30400: lr=1.00E-05, loss= 1.1945 (max= 1.4853), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:31,674 - root - INFO - Step 30400: lr=1.00E-05, loss= 1.1945 (max= 1.4853), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:31,674 - root - INFO - Step 30400: lr=1.00E-05, loss= 1.1945 (max= 1.4853), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:31,674 - root - INFO - Step 30400: lr=1.00E-05, loss= 1.1945 (max= 1.4853), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 00:03:31,674 - root - INFO - Step 30400: lr=1.00E-05, loss= 1.1945 (max= 1.4853), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:31,674 - root - INFO - Step 30400: lr=1.00E-05, loss= 1.1945 (max= 1.4853), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:31,674 - root - INFO - Step 30400: lr=1.00E-05, loss= 1.1945 (max= 1.4853), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:47,623 - root - INFO - Step 30410: lr=1.00E-05, loss= 1.2006 (max= 1.6453), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:47,623 - root - INFO - Step 30410: lr=1.00E-05, loss= 1.2006 (max= 1.6453), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:47,623 - root - INFO - Step 30410: lr=1.00E-05, loss= 1.2006 (max= 1.6453), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:47,623 - root - INFO - Step 30410: lr=1.00E-05, loss= 1.2006 (max= 1.6453), tps=20549, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:47,623 - root - INFO - Step 30410: lr=1.00E-05, loss= 1.2006 (max= 1.6453), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:47,623 - root - INFO - Step 30410: lr=1.00E-05, loss= 1.2006 (max= 1.6453), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:47,623 - root - INFO - Step 30410: lr=1.00E-05, loss= 1.2006 (max= 1.6453), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:03:47,623 - root - INFO - Step 30410: lr=1.00E-05, loss= 1.2006 (max= 1.6453), tps=20549, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:03,544 - root - INFO - Step 30420: lr=1.00E-05, loss= 1.1525 (max= 1.5265), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:03,544 - root - INFO - Step 30420: lr=1.00E-05, loss= 1.1525 (max= 1.5265), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:03,545 - root - INFO - Step 30420: lr=1.00E-05, loss= 1.1525 (max= 1.5265), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:03,545 - root - INFO - Step 30420: lr=1.00E-05, loss= 1.1525 (max= 1.5265), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:03,545 - root - INFO - Step 30420: lr=1.00E-05, loss= 1.1525 (max= 1.5265), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:03,545 - root - INFO - Step 30420: lr=1.00E-05, loss= 1.1525 (max= 1.5265), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:03,545 - root - INFO - Step 30420: lr=1.00E-05, loss= 1.1525 (max= 1.5265), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:03,545 - root - INFO - Step 30420: lr=1.00E-05, loss= 1.1525 (max= 1.5265), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:19,457 - root - INFO - Step 30430: lr=1.00E-05, loss= 1.2065 (max= 1.7397), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:04:19,457 - root - INFO - Step 30430: lr=1.00E-05, loss= 1.2065 (max= 1.7397), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:04:19,457 - root - INFO - Step 30430: lr=1.00E-05, loss= 1.2065 
(max= 1.7397), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:04:19,458 - root - INFO - Step 30430: lr=1.00E-05, loss= 1.2065 (max= 1.7397), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:04:19,458 - root - INFO - Step 30430: lr=1.00E-05, loss= 1.2065 (max= 1.7397), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:04:19,458 - root - INFO - Step 30430: lr=1.00E-05, loss= 1.2065 (max= 1.7397), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:04:19,458 - root - INFO - Step 30430: lr=1.00E-05, loss= 1.2065 (max= 1.7397), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:04:19,458 - root - INFO - Step 30430: lr=1.00E-05, loss= 1.2065 (max= 1.7397), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:04:35,404 - root - INFO - Step 30440: lr=1.00E-05, loss= 1.1903 (max= 1.8736), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:35,404 - root - INFO - Step 30440: lr=1.00E-05, loss= 1.1903 (max= 1.8736), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:35,404 - root - INFO - Step 30440: lr=1.00E-05, loss= 1.1903 (max= 1.8736), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:35,404 - root - INFO - Step 30440: lr=1.00E-05, loss= 1.1903 (max= 1.8736), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:35,404 - root - INFO - Step 30440: lr=1.00E-05, loss= 1.1903 (max= 1.8736), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:35,404 - root 
- INFO - Step 30440: lr=1.00E-05, loss= 1.1903 (max= 1.8736), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:35,404 - root - INFO - Step 30440: lr=1.00E-05, loss= 1.1903 (max= 1.8736), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:35,404 - root - INFO - Step 30440: lr=1.00E-05, loss= 1.1903 (max= 1.8736), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:51,321 - root - INFO - Step 30450: lr=1.00E-05, loss= 1.1956 (max= 1.6532), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:51,321 - root - INFO - Step 30450: lr=1.00E-05, loss= 1.1956 (max= 1.6532), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:51,321 - root - INFO - Step 30450: lr=1.00E-05, loss= 1.1956 (max= 1.6532), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:51,321 - root - INFO - Step 30450: lr=1.00E-05, loss= 1.1956 (max= 1.6532), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:51,321 - root - INFO - Step 30450: lr=1.00E-05, loss= 1.1956 (max= 1.6532), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:51,321 - root - INFO - Step 30450: lr=1.00E-05, loss= 1.1956 (max= 1.6532), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:51,321 - root - INFO - Step 30450: lr=1.00E-05, loss= 1.1956 (max= 1.6532), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:04:51,321 - root - INFO - Step 30450: lr=1.00E-05, loss= 1.1956 (max= 1.6532), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 00:05:07,237 - root - INFO - Step 30460: lr=1.00E-05, loss= 1.1926 (max= 1.5452), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:07,237 - root - INFO - Step 30460: lr=1.00E-05, loss= 1.1926 (max= 1.5452), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:07,237 - root - INFO - Step 30460: lr=1.00E-05, loss= 1.1926 (max= 1.5452), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:07,237 - root - INFO - Step 30460: lr=1.00E-05, loss= 1.1926 (max= 1.5452), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:07,237 - root - INFO - Step 30460: lr=1.00E-05, loss= 1.1926 (max= 1.5452), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:07,237 - root - INFO - Step 30460: lr=1.00E-05, loss= 1.1926 (max= 1.5452), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:07,237 - root - INFO - Step 30460: lr=1.00E-05, loss= 1.1926 (max= 1.5452), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:07,237 - root - INFO - Step 30460: lr=1.00E-05, loss= 1.1926 (max= 1.5452), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:23,166 - root - INFO - Step 30470: lr=1.00E-05, loss= 1.1702 (max= 1.5118), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:23,166 - root - INFO - Step 30470: lr=1.00E-05, loss= 1.1702 (max= 1.5118), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:23,166 - root - INFO - Step 30470: lr=1.00E-05, loss= 1.1702 (max= 1.5118), tps=20575, mfu=42.87%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:23,166 - root - INFO - Step 30470: lr=1.00E-05, loss= 1.1702 (max= 1.5118), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:23,166 - root - INFO - Step 30470: lr=1.00E-05, loss= 1.1702 (max= 1.5118), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:23,166 - root - INFO - Step 30470: lr=1.00E-05, loss= 1.1702 (max= 1.5118), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:23,166 - root - INFO - Step 30470: lr=1.00E-05, loss= 1.1702 (max= 1.5118), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:23,166 - root - INFO - Step 30470: lr=1.00E-05, loss= 1.1702 (max= 1.5118), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:39,093 - root - INFO - Step 30480: lr=1.00E-05, loss= 1.1754 (max= 1.6864), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:39,093 - root - INFO - Step 30480: lr=1.00E-05, loss= 1.1754 (max= 1.6864), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:39,093 - root - INFO - Step 30480: lr=1.00E-05, loss= 1.1754 (max= 1.6864), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:39,093 - root - INFO - Step 30480: lr=1.00E-05, loss= 1.1754 (max= 1.6864), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:39,093 - root - INFO - Step 30480: lr=1.00E-05, loss= 1.1754 (max= 1.6864), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:39,093 - root - INFO - Step 30480: lr=1.00E-05, 
loss= 1.1754 (max= 1.6864), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:39,093 - root - INFO - Step 30480: lr=1.00E-05, loss= 1.1754 (max= 1.6864), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:39,093 - root - INFO - Step 30480: lr=1.00E-05, loss= 1.1754 (max= 1.6864), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:05:54,985 - root - INFO - Step 30490: lr=1.00E-05, loss= 1.2001 (max= 1.5794), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:05:54,985 - root - INFO - Step 30490: lr=1.00E-05, loss= 1.2001 (max= 1.5794), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:05:54,985 - root - INFO - Step 30490: lr=1.00E-05, loss= 1.2001 (max= 1.5794), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:05:54,985 - root - INFO - Step 30490: lr=1.00E-05, loss= 1.2001 (max= 1.5794), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:05:54,986 - root - INFO - Step 30490: lr=1.00E-05, loss= 1.2001 (max= 1.5794), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:05:54,986 - root - INFO - Step 30490: lr=1.00E-05, loss= 1.2001 (max= 1.5794), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:05:54,986 - root - INFO - Step 30490: lr=1.00E-05, loss= 1.2001 (max= 1.5794), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:05:54,986 - root - INFO - Step 30490: lr=1.00E-05, loss= 1.2001 (max= 1.5794), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:06:10,939 - root - INFO - Step 30500: lr=1.00E-05, loss= 1.1463 (max= 1.5287), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:10,939 - root - INFO - Step 30500: lr=1.00E-05, loss= 1.1463 (max= 1.5287), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:10,939 - root - INFO - Step 30500: lr=1.00E-05, loss= 1.1463 (max= 1.5287), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:10,939 - root - INFO - Step 30500: lr=1.00E-05, loss= 1.1463 (max= 1.5287), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:10,939 - root - INFO - Step 30500: lr=1.00E-05, loss= 1.1463 (max= 1.5287), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:10,939 - root - INFO - Step 30500: lr=1.00E-05, loss= 1.1463 (max= 1.5287), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:10,939 - root - INFO - Step 30500: lr=1.00E-05, loss= 1.1463 (max= 1.5287), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:10,939 - root - INFO - Step 30500: lr=1.00E-05, loss= 1.1463 (max= 1.5287), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:26,845 - root - INFO - Step 30510: lr=1.00E-05, loss= 1.2127 (max= 1.6246), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:06:26,845 - root - INFO - Step 30510: lr=1.00E-05, loss= 1.2127 (max= 1.6246), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:06:26,845 - root - INFO - Step 30510: lr=1.00E-05, loss= 1.2127 (max= 1.6246), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:06:26,845 - root - INFO - Step 30510: lr=1.00E-05, loss= 1.2127 (max= 1.6246), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:06:26,846 - root - INFO - Step 30510: lr=1.00E-05, loss= 1.2127 (max= 1.6246), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:06:26,846 - root - INFO - Step 30510: lr=1.00E-05, loss= 1.2127 (max= 1.6246), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:06:26,846 - root - INFO - Step 30510: lr=1.00E-05, loss= 1.2127 (max= 1.6246), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:06:26,846 - root - INFO - Step 30510: lr=1.00E-05, loss= 1.2127 (max= 1.6246), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:06:42,775 - root - INFO - Step 30520: lr=1.00E-05, loss= 1.2182 (max= 1.8058), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:42,775 - root - INFO - Step 30520: lr=1.00E-05, loss= 1.2182 (max= 1.8058), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:42,775 - root - INFO - Step 30520: lr=1.00E-05, loss= 1.2182 (max= 1.8058), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:42,775 - root - INFO - Step 30520: lr=1.00E-05, loss= 1.2182 (max= 1.8058), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:42,775 - root - INFO - Step 30520: lr=1.00E-05, loss= 1.2182 (max= 1.8058), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:42,775 - root - INFO - Step 30520: lr=1.00E-05, loss= 1.2182 (max= 1.8058), 
tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:42,775 - root - INFO - Step 30520: lr=1.00E-05, loss= 1.2182 (max= 1.8058), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:42,775 - root - INFO - Step 30520: lr=1.00E-05, loss= 1.2182 (max= 1.8058), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:58,741 - root - INFO - Step 30530: lr=1.00E-05, loss= 1.1666 (max= 1.6970), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:58,741 - root - INFO - Step 30530: lr=1.00E-05, loss= 1.1666 (max= 1.6970), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:58,741 - root - INFO - Step 30530: lr=1.00E-05, loss= 1.1666 (max= 1.6970), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:58,741 - root - INFO - Step 30530: lr=1.00E-05, loss= 1.1666 (max= 1.6970), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:58,742 - root - INFO - Step 30530: lr=1.00E-05, loss= 1.1666 (max= 1.6970), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:58,742 - root - INFO - Step 30530: lr=1.00E-05, loss= 1.1666 (max= 1.6970), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:58,742 - root - INFO - Step 30530: lr=1.00E-05, loss= 1.1666 (max= 1.6970), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:06:58,742 - root - INFO - Step 30530: lr=1.00E-05, loss= 1.1666 (max= 1.6970), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:14,711 - root - INFO - Step 
30540: lr=1.00E-05, loss= 1.1941 (max= 1.8799), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:14,711 - root - INFO - Step 30540: lr=1.00E-05, loss= 1.1941 (max= 1.8799), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:14,711 - root - INFO - Step 30540: lr=1.00E-05, loss= 1.1941 (max= 1.8799), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:14,711 - root - INFO - Step 30540: lr=1.00E-05, loss= 1.1941 (max= 1.8799), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:14,711 - root - INFO - Step 30540: lr=1.00E-05, loss= 1.1941 (max= 1.8799), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:14,711 - root - INFO - Step 30540: lr=1.00E-05, loss= 1.1941 (max= 1.8799), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:14,711 - root - INFO - Step 30540: lr=1.00E-05, loss= 1.1941 (max= 1.8799), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:14,711 - root - INFO - Step 30540: lr=1.00E-05, loss= 1.1941 (max= 1.8799), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:30,650 - root - INFO - Step 30550: lr=1.00E-05, loss= 1.2165 (max= 1.5854), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:30,650 - root - INFO - Step 30550: lr=1.00E-05, loss= 1.2165 (max= 1.5854), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:30,650 - root - INFO - Step 30550: lr=1.00E-05, loss= 1.2165 (max= 1.5854), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 00:07:30,651 - root - INFO - Step 30550: lr=1.00E-05, loss= 1.2165 (max= 1.5854), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:30,651 - root - INFO - Step 30550: lr=1.00E-05, loss= 1.2165 (max= 1.5854), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:30,651 - root - INFO - Step 30550: lr=1.00E-05, loss= 1.2165 (max= 1.5854), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:30,651 - root - INFO - Step 30550: lr=1.00E-05, loss= 1.2165 (max= 1.5854), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:30,651 - root - INFO - Step 30550: lr=1.00E-05, loss= 1.2165 (max= 1.5854), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:07:46,589 - root - INFO - Step 30560: lr=1.00E-05, loss= 1.1468 (max= 1.5151), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:07:46,589 - root - INFO - Step 30560: lr=1.00E-05, loss= 1.1468 (max= 1.5151), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:07:46,589 - root - INFO - Step 30560: lr=1.00E-05, loss= 1.1468 (max= 1.5151), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:07:46,589 - root - INFO - Step 30560: lr=1.00E-05, loss= 1.1468 (max= 1.5151), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:07:46,589 - root - INFO - Step 30560: lr=1.00E-05, loss= 1.1468 (max= 1.5151), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:07:46,589 - root - INFO - Step 30560: lr=1.00E-05, loss= 1.1468 (max= 1.5151), tps=20563, mfu=42.84%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:07:46,589 - root - INFO - Step 30560: lr=1.00E-05, loss= 1.1468 (max= 1.5151), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:07:46,589 - root - INFO - Step 30560: lr=1.00E-05, loss= 1.1468 (max= 1.5151), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:08:02,544 - root - INFO - Step 30570: lr=1.00E-05, loss= 1.1809 (max= 1.7279), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:02,544 - root - INFO - Step 30570: lr=1.00E-05, loss= 1.1809 (max= 1.7279), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:02,544 - root - INFO - Step 30570: lr=1.00E-05, loss= 1.1809 (max= 1.7279), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:02,544 - root - INFO - Step 30570: lr=1.00E-05, loss= 1.1809 (max= 1.7279), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:02,544 - root - INFO - Step 30570: lr=1.00E-05, loss= 1.1809 (max= 1.7279), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:02,544 - root - INFO - Step 30570: lr=1.00E-05, loss= 1.1809 (max= 1.7279), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:02,544 - root - INFO - Step 30570: lr=1.00E-05, loss= 1.1809 (max= 1.7279), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:02,544 - root - INFO - Step 30570: lr=1.00E-05, loss= 1.1809 (max= 1.7279), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:18,458 - root - INFO - Step 30580: lr=1.00E-05, loss= 1.2095 
(max= 1.6457), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:18,458 - root - INFO - Step 30580: lr=1.00E-05, loss= 1.2095 (max= 1.6457), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:18,458 - root - INFO - Step 30580: lr=1.00E-05, loss= 1.2095 (max= 1.6457), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:18,458 - root - INFO - Step 30580: lr=1.00E-05, loss= 1.2095 (max= 1.6457), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:18,458 - root - INFO - Step 30580: lr=1.00E-05, loss= 1.2095 (max= 1.6457), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:18,458 - root - INFO - Step 30580: lr=1.00E-05, loss= 1.2095 (max= 1.6457), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:18,458 - root - INFO - Step 30580: lr=1.00E-05, loss= 1.2095 (max= 1.6457), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:18,458 - root - INFO - Step 30580: lr=1.00E-05, loss= 1.2095 (max= 1.6457), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:34,404 - root - INFO - Step 30590: lr=1.00E-05, loss= 1.1889 (max= 1.6250), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:34,404 - root - INFO - Step 30590: lr=1.00E-05, loss= 1.1889 (max= 1.6250), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:34,404 - root - INFO - Step 30590: lr=1.00E-05, loss= 1.1889 (max= 1.6250), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:34,404 - root 
- INFO - Step 30590: lr=1.00E-05, loss= 1.1889 (max= 1.6250), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:34,404 - root - INFO - Step 30590: lr=1.00E-05, loss= 1.1889 (max= 1.6250), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:34,404 - root - INFO - Step 30590: lr=1.00E-05, loss= 1.1889 (max= 1.6250), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:34,404 - root - INFO - Step 30590: lr=1.00E-05, loss= 1.1889 (max= 1.6250), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:34,404 - root - INFO - Step 30590: lr=1.00E-05, loss= 1.1889 (max= 1.6250), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:50,389 - root - INFO - Step 30600: lr=1.00E-05, loss= 1.2073 (max= 1.6614), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:50,389 - root - INFO - Step 30600: lr=1.00E-05, loss= 1.2073 (max= 1.6614), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:50,390 - root - INFO - Step 30600: lr=1.00E-05, loss= 1.2073 (max= 1.6614), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:50,390 - root - INFO - Step 30600: lr=1.00E-05, loss= 1.2073 (max= 1.6614), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:50,390 - root - INFO - Step 30600: lr=1.00E-05, loss= 1.2073 (max= 1.6614), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:50,390 - root - INFO - Step 30600: lr=1.00E-05, loss= 1.2073 (max= 1.6614), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:08:50,390 - root - INFO - Step 30600: lr=1.00E-05, loss= 1.2073 (max= 1.6614), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:08:50,390 - root - INFO - Step 30600: lr=1.00E-05, loss= 1.2073 (max= 1.6614), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:06,341 - root - INFO - Step 30610: lr=1.00E-05, loss= 1.1778 (max= 1.5679), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:06,341 - root - INFO - Step 30610: lr=1.00E-05, loss= 1.1778 (max= 1.5679), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:06,341 - root - INFO - Step 30610: lr=1.00E-05, loss= 1.1778 (max= 1.5679), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:06,341 - root - INFO - Step 30610: lr=1.00E-05, loss= 1.1778 (max= 1.5679), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:06,341 - root - INFO - Step 30610: lr=1.00E-05, loss= 1.1778 (max= 1.5679), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:06,341 - root - INFO - Step 30610: lr=1.00E-05, loss= 1.1778 (max= 1.5679), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:06,341 - root - INFO - Step 30610: lr=1.00E-05, loss= 1.1778 (max= 1.5679), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:06,341 - root - INFO - Step 30610: lr=1.00E-05, loss= 1.1778 (max= 1.5679), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:22,258 - root - INFO - Step 30620: lr=1.00E-05, loss= 1.1678 (max= 1.6075), tps=20591, mfu=42.90%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:22,258 - root - INFO - Step 30620: lr=1.00E-05, loss= 1.1678 (max= 1.6075), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:22,258 - root - INFO - Step 30620: lr=1.00E-05, loss= 1.1678 (max= 1.6075), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:22,258 - root - INFO - Step 30620: lr=1.00E-05, loss= 1.1678 (max= 1.6075), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:22,258 - root - INFO - Step 30620: lr=1.00E-05, loss= 1.1678 (max= 1.6075), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:22,258 - root - INFO - Step 30620: lr=1.00E-05, loss= 1.1678 (max= 1.6075), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:22,258 - root - INFO - Step 30620: lr=1.00E-05, loss= 1.1678 (max= 1.6075), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:22,258 - root - INFO - Step 30620: lr=1.00E-05, loss= 1.1678 (max= 1.6075), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:38,215 - root - INFO - Step 30630: lr=1.00E-05, loss= 1.2181 (max= 1.7114), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:09:38,215 - root - INFO - Step 30630: lr=1.00E-05, loss= 1.2181 (max= 1.7114), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:09:38,215 - root - INFO - Step 30630: lr=1.00E-05, loss= 1.2181 (max= 1.7114), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:09:38,215 - root - INFO - Step 30630: lr=1.00E-05, 
loss= 1.2181 (max= 1.7114), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:09:38,215 - root - INFO - Step 30630: lr=1.00E-05, loss= 1.2181 (max= 1.7114), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:09:38,215 - root - INFO - Step 30630: lr=1.00E-05, loss= 1.2181 (max= 1.7114), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:09:38,215 - root - INFO - Step 30630: lr=1.00E-05, loss= 1.2181 (max= 1.7114), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:09:38,215 - root - INFO - Step 30630: lr=1.00E-05, loss= 1.2181 (max= 1.7114), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:09:54,159 - root - INFO - Step 30640: lr=1.00E-05, loss= 1.1761 (max= 1.5142), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:54,159 - root - INFO - Step 30640: lr=1.00E-05, loss= 1.1761 (max= 1.5142), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:54,159 - root - INFO - Step 30640: lr=1.00E-05, loss= 1.1761 (max= 1.5142), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:54,159 - root - INFO - Step 30640: lr=1.00E-05, loss= 1.1761 (max= 1.5142), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:54,159 - root - INFO - Step 30640: lr=1.00E-05, loss= 1.1761 (max= 1.5142), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:54,159 - root - INFO - Step 30640: lr=1.00E-05, loss= 1.1761 (max= 1.5142), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:09:54,159 - root - INFO - Step 30640: lr=1.00E-05, loss= 1.1761 (max= 1.5142), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:09:54,159 - root - INFO - Step 30640: lr=1.00E-05, loss= 1.1761 (max= 1.5142), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:10:10,121 - root - INFO - Step 30650: lr=1.00E-05, loss= 1.1587 (max= 1.6873), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:10,121 - root - INFO - Step 30650: lr=1.00E-05, loss= 1.1587 (max= 1.6873), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:10,121 - root - INFO - Step 30650: lr=1.00E-05, loss= 1.1587 (max= 1.6873), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:10,122 - root - INFO - Step 30650: lr=1.00E-05, loss= 1.1587 (max= 1.6873), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:10,122 - root - INFO - Step 30650: lr=1.00E-05, loss= 1.1587 (max= 1.6873), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:10,122 - root - INFO - Step 30650: lr=1.00E-05, loss= 1.1587 (max= 1.6873), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:10,122 - root - INFO - Step 30650: lr=1.00E-05, loss= 1.1587 (max= 1.6873), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:10,122 - root - INFO - Step 30650: lr=1.00E-05, loss= 1.1587 (max= 1.6873), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:26,065 - root - INFO - Step 30660: lr=1.00E-05, loss= 1.1899 (max= 1.5849), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:26,065 - root - INFO - Step 30660: lr=1.00E-05, loss= 1.1899 (max= 1.5849), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:26,065 - root - INFO - Step 30660: lr=1.00E-05, loss= 1.1899 (max= 1.5849), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:26,065 - root - INFO - Step 30660: lr=1.00E-05, loss= 1.1899 (max= 1.5849), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:26,065 - root - INFO - Step 30660: lr=1.00E-05, loss= 1.1899 (max= 1.5849), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:26,065 - root - INFO - Step 30660: lr=1.00E-05, loss= 1.1899 (max= 1.5849), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:26,065 - root - INFO - Step 30660: lr=1.00E-05, loss= 1.1899 (max= 1.5849), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:26,065 - root - INFO - Step 30660: lr=1.00E-05, loss= 1.1899 (max= 1.5849), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:41,992 - root - INFO - Step 30670: lr=1.00E-05, loss= 1.1484 (max= 1.5359), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:41,992 - root - INFO - Step 30670: lr=1.00E-05, loss= 1.1484 (max= 1.5359), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:41,992 - root - INFO - Step 30670: lr=1.00E-05, loss= 1.1484 (max= 1.5359), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:41,992 - root - INFO - Step 30670: lr=1.00E-05, loss= 1.1484 (max= 1.5359), 
tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:41,992 - root - INFO - Step 30670: lr=1.00E-05, loss= 1.1484 (max= 1.5359), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:41,992 - root - INFO - Step 30670: lr=1.00E-05, loss= 1.1484 (max= 1.5359), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:41,992 - root - INFO - Step 30670: lr=1.00E-05, loss= 1.1484 (max= 1.5359), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:41,992 - root - INFO - Step 30670: lr=1.00E-05, loss= 1.1484 (max= 1.5359), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:57,972 - root - INFO - Step 30680: lr=1.00E-05, loss= 1.1822 (max= 1.5109), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:57,972 - root - INFO - Step 30680: lr=1.00E-05, loss= 1.1822 (max= 1.5109), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:57,972 - root - INFO - Step 30680: lr=1.00E-05, loss= 1.1822 (max= 1.5109), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:57,972 - root - INFO - Step 30680: lr=1.00E-05, loss= 1.1822 (max= 1.5109), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:57,972 - root - INFO - Step 30680: lr=1.00E-05, loss= 1.1822 (max= 1.5109), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:57,972 - root - INFO - Step 30680: lr=1.00E-05, loss= 1.1822 (max= 1.5109), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:57,972 - root - INFO - Step 
30680: lr=1.00E-05, loss= 1.1822 (max= 1.5109), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:10:57,972 - root - INFO - Step 30680: lr=1.00E-05, loss= 1.1822 (max= 1.5109), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:13,867 - root - INFO - Step 30690: lr=1.00E-05, loss= 1.1480 (max= 1.5293), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:11:13,867 - root - INFO - Step 30690: lr=1.00E-05, loss= 1.1480 (max= 1.5293), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:11:13,867 - root - INFO - Step 30690: lr=1.00E-05, loss= 1.1480 (max= 1.5293), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:11:13,868 - root - INFO - Step 30690: lr=1.00E-05, loss= 1.1480 (max= 1.5293), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:11:13,868 - root - INFO - Step 30690: lr=1.00E-05, loss= 1.1480 (max= 1.5293), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:11:13,868 - root - INFO - Step 30690: lr=1.00E-05, loss= 1.1480 (max= 1.5293), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:11:13,868 - root - INFO - Step 30690: lr=1.00E-05, loss= 1.1480 (max= 1.5293), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:11:13,868 - root - INFO - Step 30690: lr=1.00E-05, loss= 1.1480 (max= 1.5293), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:11:29,803 - root - INFO - Step 30700: lr=1.00E-05, loss= 1.1647 (max= 1.5620), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 00:11:29,803 - root - INFO - Step 30700: lr=1.00E-05, loss= 1.1647 (max= 1.5620), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:29,803 - root - INFO - Step 30700: lr=1.00E-05, loss= 1.1647 (max= 1.5620), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:29,803 - root - INFO - Step 30700: lr=1.00E-05, loss= 1.1647 (max= 1.5620), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:29,803 - root - INFO - Step 30700: lr=1.00E-05, loss= 1.1647 (max= 1.5620), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:29,803 - root - INFO - Step 30700: lr=1.00E-05, loss= 1.1647 (max= 1.5620), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:29,803 - root - INFO - Step 30700: lr=1.00E-05, loss= 1.1647 (max= 1.5620), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:29,803 - root - INFO - Step 30700: lr=1.00E-05, loss= 1.1647 (max= 1.5620), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:45,775 - root - INFO - Step 30710: lr=1.00E-05, loss= 1.1750 (max= 1.6340), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:45,775 - root - INFO - Step 30710: lr=1.00E-05, loss= 1.1750 (max= 1.6340), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:45,775 - root - INFO - Step 30710: lr=1.00E-05, loss= 1.1750 (max= 1.6340), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:45,775 - root - INFO - Step 30710: lr=1.00E-05, loss= 1.1750 (max= 1.6340), tps=20520, mfu=42.75%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:45,775 - root - INFO - Step 30710: lr=1.00E-05, loss= 1.1750 (max= 1.6340), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:45,775 - root - INFO - Step 30710: lr=1.00E-05, loss= 1.1750 (max= 1.6340), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:45,775 - root - INFO - Step 30710: lr=1.00E-05, loss= 1.1750 (max= 1.6340), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:11:45,775 - root - INFO - Step 30710: lr=1.00E-05, loss= 1.1750 (max= 1.6340), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:12:01,649 - root - INFO - Step 30720: lr=1.00E-05, loss= 1.1651 (max= 1.5498), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:01,649 - root - INFO - Step 30720: lr=1.00E-05, loss= 1.1651 (max= 1.5498), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:01,649 - root - INFO - Step 30720: lr=1.00E-05, loss= 1.1651 (max= 1.5498), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:01,649 - root - INFO - Step 30720: lr=1.00E-05, loss= 1.1651 (max= 1.5498), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:01,649 - root - INFO - Step 30720: lr=1.00E-05, loss= 1.1651 (max= 1.5498), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:01,649 - root - INFO - Step 30720: lr=1.00E-05, loss= 1.1651 (max= 1.5498), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:01,649 - root - INFO - Step 30720: lr=1.00E-05, loss= 1.1651 
(max= 1.5498), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:01,649 - root - INFO - Step 30720: lr=1.00E-05, loss= 1.1651 (max= 1.5498), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:17,536 - root - INFO - Step 30730: lr=1.00E-05, loss= 1.1651 (max= 1.5869), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:12:17,536 - root - INFO - Step 30730: lr=1.00E-05, loss= 1.1651 (max= 1.5869), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:12:17,536 - root - INFO - Step 30730: lr=1.00E-05, loss= 1.1651 (max= 1.5869), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:12:17,536 - root - INFO - Step 30730: lr=1.00E-05, loss= 1.1651 (max= 1.5869), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:12:17,536 - root - INFO - Step 30730: lr=1.00E-05, loss= 1.1651 (max= 1.5869), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:12:17,536 - root - INFO - Step 30730: lr=1.00E-05, loss= 1.1651 (max= 1.5869), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:12:17,536 - root - INFO - Step 30730: lr=1.00E-05, loss= 1.1651 (max= 1.5869), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:12:17,536 - root - INFO - Step 30730: lr=1.00E-05, loss= 1.1651 (max= 1.5869), tps=20629, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:12:33,451 - root - INFO - Step 30740: lr=1.00E-05, loss= 1.2031 (max= 1.6947), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:33,451 - root 
- INFO - Step 30740: lr=1.00E-05, loss= 1.2031 (max= 1.6947), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:33,451 - root - INFO - Step 30740: lr=1.00E-05, loss= 1.2031 (max= 1.6947), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:33,451 - root - INFO - Step 30740: lr=1.00E-05, loss= 1.2031 (max= 1.6947), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:33,451 - root - INFO - Step 30740: lr=1.00E-05, loss= 1.2031 (max= 1.6947), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:33,451 - root - INFO - Step 30740: lr=1.00E-05, loss= 1.2031 (max= 1.6947), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:33,451 - root - INFO - Step 30740: lr=1.00E-05, loss= 1.2031 (max= 1.6947), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:33,452 - root - INFO - Step 30740: lr=1.00E-05, loss= 1.2031 (max= 1.6947), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:49,372 - root - INFO - Step 30750: lr=1.00E-05, loss= 1.1834 (max= 1.8154), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:49,372 - root - INFO - Step 30750: lr=1.00E-05, loss= 1.1834 (max= 1.8154), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:49,372 - root - INFO - Step 30750: lr=1.00E-05, loss= 1.1834 (max= 1.8154), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:49,372 - root - INFO - Step 30750: lr=1.00E-05, loss= 1.1834 (max= 1.8154), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:12:49,372 - root - INFO - Step 30750: lr=1.00E-05, loss= 1.1834 (max= 1.8154), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:49,372 - root - INFO - Step 30750: lr=1.00E-05, loss= 1.1834 (max= 1.8154), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:49,372 - root - INFO - Step 30750: lr=1.00E-05, loss= 1.1834 (max= 1.8154), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:12:49,372 - root - INFO - Step 30750: lr=1.00E-05, loss= 1.1834 (max= 1.8154), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:13:05,338 - root - INFO - Step 30760: lr=1.00E-05, loss= 1.1537 (max= 1.5771), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:05,338 - root - INFO - Step 30760: lr=1.00E-05, loss= 1.1537 (max= 1.5771), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:05,339 - root - INFO - Step 30760: lr=1.00E-05, loss= 1.1537 (max= 1.5771), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:05,339 - root - INFO - Step 30760: lr=1.00E-05, loss= 1.1537 (max= 1.5771), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:05,339 - root - INFO - Step 30760: lr=1.00E-05, loss= 1.1537 (max= 1.5771), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:05,339 - root - INFO - Step 30760: lr=1.00E-05, loss= 1.1537 (max= 1.5771), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:05,339 - root - INFO - Step 30760: lr=1.00E-05, loss= 1.1537 (max= 1.5771), tps=20527, mfu=42.77%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:05,339 - root - INFO - Step 30760: lr=1.00E-05, loss= 1.1537 (max= 1.5771), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:21,257 - root - INFO - Step 30770: lr=1.00E-05, loss= 1.1753 (max= 1.5338), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:21,257 - root - INFO - Step 30770: lr=1.00E-05, loss= 1.1753 (max= 1.5338), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:21,258 - root - INFO - Step 30770: lr=1.00E-05, loss= 1.1753 (max= 1.5338), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:21,258 - root - INFO - Step 30770: lr=1.00E-05, loss= 1.1753 (max= 1.5338), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:21,258 - root - INFO - Step 30770: lr=1.00E-05, loss= 1.1753 (max= 1.5338), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:21,258 - root - INFO - Step 30770: lr=1.00E-05, loss= 1.1753 (max= 1.5338), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:21,258 - root - INFO - Step 30770: lr=1.00E-05, loss= 1.1753 (max= 1.5338), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:21,258 - root - INFO - Step 30770: lr=1.00E-05, loss= 1.1753 (max= 1.5338), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:37,220 - root - INFO - Step 30780: lr=1.00E-05, loss= 1.1588 (max= 1.6216), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:37,220 - root - INFO - Step 30780: lr=1.00E-05, 
loss= 1.1588 (max= 1.6216), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:37,221 - root - INFO - Step 30780: lr=1.00E-05, loss= 1.1588 (max= 1.6216), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:37,221 - root - INFO - Step 30780: lr=1.00E-05, loss= 1.1588 (max= 1.6216), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:37,221 - root - INFO - Step 30780: lr=1.00E-05, loss= 1.1588 (max= 1.6216), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:37,221 - root - INFO - Step 30780: lr=1.00E-05, loss= 1.1588 (max= 1.6216), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:37,221 - root - INFO - Step 30780: lr=1.00E-05, loss= 1.1588 (max= 1.6216), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:37,221 - root - INFO - Step 30780: lr=1.00E-05, loss= 1.1588 (max= 1.6216), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:53,143 - root - INFO - Step 30790: lr=1.00E-05, loss= 1.1651 (max= 1.5005), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:53,143 - root - INFO - Step 30790: lr=1.00E-05, loss= 1.1651 (max= 1.5005), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:53,143 - root - INFO - Step 30790: lr=1.00E-05, loss= 1.1651 (max= 1.5005), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:53,143 - root - INFO - Step 30790: lr=1.00E-05, loss= 1.1651 (max= 1.5005), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
00:13:53,143 - root - INFO - Step 30790: lr=1.00E-05, loss= 1.1651 (max= 1.5005), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:53,143 - root - INFO - Step 30790: lr=1.00E-05, loss= 1.1651 (max= 1.5005), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:53,143 - root - INFO - Step 30790: lr=1.00E-05, loss= 1.1651 (max= 1.5005), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:13:53,143 - root - INFO - Step 30790: lr=1.00E-05, loss= 1.1651 (max= 1.5005), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:14:09,099 - root - INFO - Step 30800: lr=1.00E-05, loss= 1.1646 (max= 1.6361), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:14:09,099 - root - INFO - Step 30800: lr=1.00E-05, loss= 1.1646 (max= 1.6361), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:14:09,099 - root - INFO - Step 30800: lr=1.00E-05, loss= 1.1646 (max= 1.6361), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:14:09,099 - root - INFO - Step 30800: lr=1.00E-05, loss= 1.1646 (max= 1.6361), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:14:09,099 - root - INFO - Step 30800: lr=1.00E-05, loss= 1.1646 (max= 1.6361), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:14:09,099 - root - INFO - Step 30800: lr=1.00E-05, loss= 1.1646 (max= 1.6361), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:14:09,099 - root - INFO - Step 30800: lr=1.00E-05, loss= 1.1646 (max= 1.6361), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:14:09,099 - root - INFO - Step 30800: lr=1.00E-05, loss= 1.1646 (max= 1.6361), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:14:25,029 - root - INFO - Step 30810: lr=1.00E-05, loss= 1.2010 (max= 2.0835), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:25,029 - root - INFO - Step 30810: lr=1.00E-05, loss= 1.2010 (max= 2.0835), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:25,029 - root - INFO - Step 30810: lr=1.00E-05, loss= 1.2010 (max= 2.0835), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:25,029 - root - INFO - Step 30810: lr=1.00E-05, loss= 1.2010 (max= 2.0835), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:25,029 - root - INFO - Step 30810: lr=1.00E-05, loss= 1.2010 (max= 2.0835), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:25,029 - root - INFO - Step 30810: lr=1.00E-05, loss= 1.2010 (max= 2.0835), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:25,029 - root - INFO - Step 30810: lr=1.00E-05, loss= 1.2010 (max= 2.0835), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:25,029 - root - INFO - Step 30810: lr=1.00E-05, loss= 1.2010 (max= 2.0835), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:40,980 - root - INFO - Step 30820: lr=1.00E-05, loss= 1.1894 (max= 1.4738), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:40,981 - root - INFO - Step 30820: lr=1.00E-05, loss= 1.1894 (max= 1.4738), 
tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:40,981 - root - INFO - Step 30820: lr=1.00E-05, loss= 1.1894 (max= 1.4738), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:40,981 - root - INFO - Step 30820: lr=1.00E-05, loss= 1.1894 (max= 1.4738), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:40,981 - root - INFO - Step 30820: lr=1.00E-05, loss= 1.1894 (max= 1.4738), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:40,981 - root - INFO - Step 30820: lr=1.00E-05, loss= 1.1894 (max= 1.4738), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:40,981 - root - INFO - Step 30820: lr=1.00E-05, loss= 1.1894 (max= 1.4738), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:40,981 - root - INFO - Step 30820: lr=1.00E-05, loss= 1.1894 (max= 1.4738), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:56,936 - root - INFO - Step 30830: lr=1.00E-05, loss= 1.1684 (max= 1.7153), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:56,936 - root - INFO - Step 30830: lr=1.00E-05, loss= 1.1684 (max= 1.7153), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:56,936 - root - INFO - Step 30830: lr=1.00E-05, loss= 1.1684 (max= 1.7153), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:56,937 - root - INFO - Step 30830: lr=1.00E-05, loss= 1.1684 (max= 1.7153), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:56,937 - root - INFO - Step 
30830: lr=1.00E-05, loss= 1.1684 (max= 1.7153), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:56,937 - root - INFO - Step 30830: lr=1.00E-05, loss= 1.1684 (max= 1.7153), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:56,937 - root - INFO - Step 30830: lr=1.00E-05, loss= 1.1684 (max= 1.7153), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:14:56,937 - root - INFO - Step 30830: lr=1.00E-05, loss= 1.1684 (max= 1.7153), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:12,897 - root - INFO - Step 30840: lr=1.00E-05, loss= 1.1854 (max= 1.4818), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:12,897 - root - INFO - Step 30840: lr=1.00E-05, loss= 1.1854 (max= 1.4818), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:12,897 - root - INFO - Step 30840: lr=1.00E-05, loss= 1.1854 (max= 1.4818), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:12,897 - root - INFO - Step 30840: lr=1.00E-05, loss= 1.1854 (max= 1.4818), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:12,897 - root - INFO - Step 30840: lr=1.00E-05, loss= 1.1854 (max= 1.4818), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:12,897 - root - INFO - Step 30840: lr=1.00E-05, loss= 1.1854 (max= 1.4818), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:12,897 - root - INFO - Step 30840: lr=1.00E-05, loss= 1.1854 (max= 1.4818), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 00:15:12,897 - root - INFO - Step 30840: lr=1.00E-05, loss= 1.1854 (max= 1.4818), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:28,835 - root - INFO - Step 30850: lr=1.00E-05, loss= 1.1698 (max= 1.5834), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:28,835 - root - INFO - Step 30850: lr=1.00E-05, loss= 1.1698 (max= 1.5834), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:28,835 - root - INFO - Step 30850: lr=1.00E-05, loss= 1.1698 (max= 1.5834), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:28,835 - root - INFO - Step 30850: lr=1.00E-05, loss= 1.1698 (max= 1.5834), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:28,835 - root - INFO - Step 30850: lr=1.00E-05, loss= 1.1698 (max= 1.5834), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:28,835 - root - INFO - Step 30850: lr=1.00E-05, loss= 1.1698 (max= 1.5834), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:28,835 - root - INFO - Step 30850: lr=1.00E-05, loss= 1.1698 (max= 1.5834), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:28,835 - root - INFO - Step 30850: lr=1.00E-05, loss= 1.1698 (max= 1.5834), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:44,780 - root - INFO - Step 30860: lr=1.00E-05, loss= 1.1463 (max= 1.4546), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:44,780 - root - INFO - Step 30860: lr=1.00E-05, loss= 1.1463 (max= 1.4546), tps=20555, mfu=42.83%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:44,780 - root - INFO - Step 30860: lr=1.00E-05, loss= 1.1463 (max= 1.4546), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:44,780 - root - INFO - Step 30860: lr=1.00E-05, loss= 1.1463 (max= 1.4546), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:44,780 - root - INFO - Step 30860: lr=1.00E-05, loss= 1.1463 (max= 1.4546), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:44,780 - root - INFO - Step 30860: lr=1.00E-05, loss= 1.1463 (max= 1.4546), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:44,780 - root - INFO - Step 30860: lr=1.00E-05, loss= 1.1463 (max= 1.4546), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:15:44,780 - root - INFO - Step 30860: lr=1.00E-05, loss= 1.1463 (max= 1.4546), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:00,709 - root - INFO - Step 30870: lr=1.00E-05, loss= 1.1889 (max= 1.7933), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:16:00,710 - root - INFO - Step 30870: lr=1.00E-05, loss= 1.1889 (max= 1.7933), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:16:00,710 - root - INFO - Step 30870: lr=1.00E-05, loss= 1.1889 (max= 1.7933), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:16:00,710 - root - INFO - Step 30870: lr=1.00E-05, loss= 1.1889 (max= 1.7933), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:16:00,710 - root - INFO - Step 30870: lr=1.00E-05, loss= 1.1889 
(max= 1.7933), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:16:00,710 - root - INFO - Step 30870: lr=1.00E-05, loss= 1.1889 (max= 1.7933), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:16:00,710 - root - INFO - Step 30870: lr=1.00E-05, loss= 1.1889 (max= 1.7933), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:16:00,710 - root - INFO - Step 30870: lr=1.00E-05, loss= 1.1889 (max= 1.7933), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:16:16,657 - root - INFO - Step 30880: lr=1.00E-05, loss= 1.1424 (max= 1.4332), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:16,657 - root - INFO - Step 30880: lr=1.00E-05, loss= 1.1424 (max= 1.4332), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:16,657 - root - INFO - Step 30880: lr=1.00E-05, loss= 1.1424 (max= 1.4332), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:16,657 - root - INFO - Step 30880: lr=1.00E-05, loss= 1.1424 (max= 1.4332), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:16,657 - root - INFO - Step 30880: lr=1.00E-05, loss= 1.1424 (max= 1.4332), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:16,657 - root - INFO - Step 30880: lr=1.00E-05, loss= 1.1424 (max= 1.4332), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:16,657 - root - INFO - Step 30880: lr=1.00E-05, loss= 1.1424 (max= 1.4332), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:16,657 - root 
- INFO - Step 30880: lr=1.00E-05, loss= 1.1424 (max= 1.4332), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:32,577 - root - INFO - Step 30890: lr=1.00E-05, loss= 1.1696 (max= 1.5756), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:32,577 - root - INFO - Step 30890: lr=1.00E-05, loss= 1.1696 (max= 1.5756), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:32,577 - root - INFO - Step 30890: lr=1.00E-05, loss= 1.1696 (max= 1.5756), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:32,577 - root - INFO - Step 30890: lr=1.00E-05, loss= 1.1696 (max= 1.5756), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:32,577 - root - INFO - Step 30890: lr=1.00E-05, loss= 1.1696 (max= 1.5756), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:32,577 - root - INFO - Step 30890: lr=1.00E-05, loss= 1.1696 (max= 1.5756), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:32,577 - root - INFO - Step 30890: lr=1.00E-05, loss= 1.1696 (max= 1.5756), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:32,577 - root - INFO - Step 30890: lr=1.00E-05, loss= 1.1696 (max= 1.5756), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:48,445 - root - INFO - Step 30900: lr=1.00E-05, loss= 1.1945 (max= 1.9990), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:48,445 - root - INFO - Step 30900: lr=1.00E-05, loss= 1.1945 (max= 1.9990), tps=20654, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:16:48,445 - root - INFO - Step 30900: lr=1.00E-05, loss= 1.1945 (max= 1.9990), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:48,445 - root - INFO - Step 30900: lr=1.00E-05, loss= 1.1945 (max= 1.9990), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:48,445 - root - INFO - Step 30900: lr=1.00E-05, loss= 1.1945 (max= 1.9990), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:48,445 - root - INFO - Step 30900: lr=1.00E-05, loss= 1.1945 (max= 1.9990), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:48,445 - root - INFO - Step 30900: lr=1.00E-05, loss= 1.1945 (max= 1.9990), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:16:48,445 - root - INFO - Step 30900: lr=1.00E-05, loss= 1.1945 (max= 1.9990), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:04,401 - root - INFO - Step 30910: lr=1.00E-05, loss= 1.1731 (max= 1.5723), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:04,401 - root - INFO - Step 30910: lr=1.00E-05, loss= 1.1731 (max= 1.5723), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:04,401 - root - INFO - Step 30910: lr=1.00E-05, loss= 1.1731 (max= 1.5723), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:04,401 - root - INFO - Step 30910: lr=1.00E-05, loss= 1.1731 (max= 1.5723), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:04,401 - root - INFO - Step 30910: lr=1.00E-05, loss= 1.1731 (max= 1.5723), tps=20541, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:04,401 - root - INFO - Step 30910: lr=1.00E-05, loss= 1.1731 (max= 1.5723), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:04,401 - root - INFO - Step 30910: lr=1.00E-05, loss= 1.1731 (max= 1.5723), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:04,401 - root - INFO - Step 30910: lr=1.00E-05, loss= 1.1731 (max= 1.5723), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:20,393 - root - INFO - Step 30920: lr=1.00E-05, loss= 1.2044 (max= 1.7101), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:20,393 - root - INFO - Step 30920: lr=1.00E-05, loss= 1.2044 (max= 1.7101), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:20,393 - root - INFO - Step 30920: lr=1.00E-05, loss= 1.2044 (max= 1.7101), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:20,394 - root - INFO - Step 30920: lr=1.00E-05, loss= 1.2044 (max= 1.7101), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:20,394 - root - INFO - Step 30920: lr=1.00E-05, loss= 1.2044 (max= 1.7101), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:20,394 - root - INFO - Step 30920: lr=1.00E-05, loss= 1.2044 (max= 1.7101), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:20,394 - root - INFO - Step 30920: lr=1.00E-05, loss= 1.2044 (max= 1.7101), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:20,394 - root - INFO - Step 30920: lr=1.00E-05, 
loss= 1.2044 (max= 1.7101), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:36,374 - root - INFO - Step 30930: lr=1.00E-05, loss= 1.1690 (max= 1.5282), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:17:36,374 - root - INFO - Step 30930: lr=1.00E-05, loss= 1.1690 (max= 1.5282), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:17:36,374 - root - INFO - Step 30930: lr=1.00E-05, loss= 1.1690 (max= 1.5282), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:17:36,375 - root - INFO - Step 30930: lr=1.00E-05, loss= 1.1690 (max= 1.5282), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:17:36,375 - root - INFO - Step 30930: lr=1.00E-05, loss= 1.1690 (max= 1.5282), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:17:36,375 - root - INFO - Step 30930: lr=1.00E-05, loss= 1.1690 (max= 1.5282), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:17:36,375 - root - INFO - Step 30930: lr=1.00E-05, loss= 1.1690 (max= 1.5282), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:17:36,375 - root - INFO - Step 30930: lr=1.00E-05, loss= 1.1690 (max= 1.5282), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:17:52,371 - root - INFO - Step 30940: lr=1.00E-05, loss= 1.2184 (max= 1.7798), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:52,371 - root - INFO - Step 30940: lr=1.00E-05, loss= 1.2184 (max= 1.7798), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:17:52,371 - root - INFO - Step 30940: lr=1.00E-05, loss= 1.2184 (max= 1.7798), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:52,371 - root - INFO - Step 30940: lr=1.00E-05, loss= 1.2184 (max= 1.7798), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:52,371 - root - INFO - Step 30940: lr=1.00E-05, loss= 1.2184 (max= 1.7798), tps=20490, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:52,371 - root - INFO - Step 30940: lr=1.00E-05, loss= 1.2184 (max= 1.7798), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:52,371 - root - INFO - Step 30940: lr=1.00E-05, loss= 1.2184 (max= 1.7798), tps=20488, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:17:52,371 - root - INFO - Step 30940: lr=1.00E-05, loss= 1.2184 (max= 1.7798), tps=20489, mfu=42.69%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:18:08,315 - root - INFO - Step 30950: lr=1.00E-05, loss= 1.1714 (max= 1.8361), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:08,315 - root - INFO - Step 30950: lr=1.00E-05, loss= 1.1714 (max= 1.8361), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:08,315 - root - INFO - Step 30950: lr=1.00E-05, loss= 1.1714 (max= 1.8361), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:08,316 - root - INFO - Step 30950: lr=1.00E-05, loss= 1.1714 (max= 1.8361), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:08,316 - root - INFO - Step 30950: lr=1.00E-05, loss= 1.1714 (max= 1.8361), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:08,316 - root - INFO - Step 30950: lr=1.00E-05, loss= 1.1714 (max= 1.8361), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:08,316 - root - INFO - Step 30950: lr=1.00E-05, loss= 1.1714 (max= 1.8361), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:08,316 - root - INFO - Step 30950: lr=1.00E-05, loss= 1.1714 (max= 1.8361), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:24,211 - root - INFO - Step 30960: lr=1.00E-05, loss= 1.1704 (max= 1.6265), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:24,211 - root - INFO - Step 30960: lr=1.00E-05, loss= 1.1704 (max= 1.6265), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:24,211 - root - INFO - Step 30960: lr=1.00E-05, loss= 1.1704 (max= 1.6265), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:24,211 - root - INFO - Step 30960: lr=1.00E-05, loss= 1.1704 (max= 1.6265), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:24,211 - root - INFO - Step 30960: lr=1.00E-05, loss= 1.1704 (max= 1.6265), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:24,211 - root - INFO - Step 30960: lr=1.00E-05, loss= 1.1704 (max= 1.6265), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:24,211 - root - INFO - Step 30960: lr=1.00E-05, loss= 1.1704 (max= 1.6265), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:24,211 - root - INFO - Step 30960: lr=1.00E-05, loss= 1.1704 (max= 1.6265), 
tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:28,138 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4625435 +2025-10-25 00:18:40,129 - root - INFO - Step 30970: lr=1.00E-05, loss= 1.1875 (max= 1.5962), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:40,129 - root - INFO - Step 30970: lr=1.00E-05, loss= 1.1875 (max= 1.5962), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:40,129 - root - INFO - Step 30970: lr=1.00E-05, loss= 1.1875 (max= 1.5962), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:40,129 - root - INFO - Step 30970: lr=1.00E-05, loss= 1.1875 (max= 1.5962), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:40,129 - root - INFO - Step 30970: lr=1.00E-05, loss= 1.1875 (max= 1.5962), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:40,129 - root - INFO - Step 30970: lr=1.00E-05, loss= 1.1875 (max= 1.5962), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:40,129 - root - INFO - Step 30970: lr=1.00E-05, loss= 1.1875 (max= 1.5962), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:40,129 - root - INFO - Step 30970: lr=1.00E-05, loss= 1.1875 (max= 1.5962), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:56,067 - root - INFO - Step 30980: lr=1.00E-05, loss= 1.1785 (max= 1.4923), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:56,067 - root - INFO - 
Step 30980: lr=1.00E-05, loss= 1.1785 (max= 1.4923), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:56,067 - root - INFO - Step 30980: lr=1.00E-05, loss= 1.1785 (max= 1.4923), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:56,067 - root - INFO - Step 30980: lr=1.00E-05, loss= 1.1785 (max= 1.4923), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:56,067 - root - INFO - Step 30980: lr=1.00E-05, loss= 1.1785 (max= 1.4923), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:56,067 - root - INFO - Step 30980: lr=1.00E-05, loss= 1.1785 (max= 1.4923), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:56,067 - root - INFO - Step 30980: lr=1.00E-05, loss= 1.1785 (max= 1.4923), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:18:56,067 - root - INFO - Step 30980: lr=1.00E-05, loss= 1.1785 (max= 1.4923), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:11,995 - root - INFO - Step 30990: lr=1.00E-05, loss= 1.2103 (max= 1.7145), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:19:11,995 - root - INFO - Step 30990: lr=1.00E-05, loss= 1.2103 (max= 1.7145), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:19:11,995 - root - INFO - Step 30990: lr=1.00E-05, loss= 1.2103 (max= 1.7145), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:19:11,995 - root - INFO - Step 30990: lr=1.00E-05, loss= 1.2103 (max= 1.7145), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-25 00:19:11,995 - root - INFO - Step 30990: lr=1.00E-05, loss= 1.2103 (max= 1.7145), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:19:11,995 - root - INFO - Step 30990: lr=1.00E-05, loss= 1.2103 (max= 1.7145), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:19:11,995 - root - INFO - Step 30990: lr=1.00E-05, loss= 1.2103 (max= 1.7145), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:19:11,996 - root - INFO - Step 30990: lr=1.00E-05, loss= 1.2103 (max= 1.7145), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-31000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-31000! Save time: 4.423083066940308 +2025-10-25 00:19:27,897 - root - INFO - Step 31000: lr=1.00E-05, loss= 1.1741 (max= 1.7056), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:27,897 - root - INFO - Step 31000: lr=1.00E-05, loss= 1.1741 (max= 1.7056), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:27,897 - root - INFO - Step 31000: lr=1.00E-05, loss= 1.1741 (max= 1.7056), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:27,897 - root - INFO - Step 31000: lr=1.00E-05, loss= 1.1741 (max= 1.7056), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:27,897 - root - INFO - Saving a full checkpoint at step 31000 +2025-10-25 00:19:27,897 - root - INFO - Saving a full checkpoint at step 31000 +2025-10-25 00:19:27,897 - root - INFO - Saving a full checkpoint at step 31000 +2025-10-25 00:19:27,897 - root - INFO - CheckpointManager: 
State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:19:27,897 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:19:27,897 - root - INFO - Saving a full checkpoint at step 31000 +2025-10-25 00:19:27,897 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:19:27,897 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:19:27,897 - root - INFO - Step 31000: lr=1.00E-05, loss= 1.1741 (max= 1.7056), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:27,898 - root - INFO - Saving a full checkpoint at step 31000 +2025-10-25 00:19:27,898 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:19:27,897 - root - INFO - Step 31000: lr=1.00E-05, loss= 1.1741 (max= 1.7056), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:27,897 - root - INFO - Step 31000: lr=1.00E-05, loss= 1.1741 (max= 1.7056), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:27,898 - root - INFO - Step 31000: lr=1.00E-05, loss= 1.1741 (max= 1.7056), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:27,898 - root - INFO - Saving a full checkpoint at step 31000 +2025-10-25 00:19:27,898 - root - INFO - Saving a full checkpoint at step 31000 +2025-10-25 00:19:27,898 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:19:27,898 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 
+2025-10-25 00:19:27,898 - root - INFO - Saving a full checkpoint at step 31000 +2025-10-25 00:19:27,898 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:19:42,675 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-25 00:19:42,681 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-25 00:19:42,682 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-25 00:19:42,682 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-25 00:19:42,682 - root - INFO - Finished saving the checkpoint in 14.78 seconds +2025-10-25 00:19:42,683 - root - INFO - Finished saving the checkpoint in 14.79 seconds +2025-10-25 00:19:42,683 - root - INFO - Finished saving the checkpoint in 14.79 seconds +2025-10-25 00:19:42,683 - root - INFO - Finished saving the checkpoint in 14.79 seconds +2025-10-25 00:19:58,576 - root - INFO - Step 31010: lr=1.00E-05, loss= 1.1745 (max= 1.5225), tps=10682, mfu=22.26%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:58,576 - root - INFO - Step 31010: lr=1.00E-05, loss= 1.1745 (max= 1.5225), tps=10682, mfu=22.26%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:58,576 - root - INFO - Step 31010: lr=1.00E-05, loss= 1.1745 (max= 1.5225), tps=10682, mfu=22.26%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:58,576 - root - INFO - Step 31010: lr=1.00E-05, loss= 1.1745 (max= 1.5225), tps=10682, mfu=22.26%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:58,577 - root - INFO - Step 31010: lr=1.00E-05, loss= 1.1745 (max= 1.5225), tps=10682, mfu=22.26%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:58,577 - root - INFO - Step 31010: lr=1.00E-05, loss= 1.1745 (max= 1.5225), tps=10682, mfu=22.26%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:58,577 - root - INFO - Step 31010: lr=1.00E-05, loss= 1.1745 (max= 1.5225), tps=10682, mfu=22.26%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:19:58,577 - root - INFO - Step 31010: lr=1.00E-05, loss= 1.1745 (max= 1.5225), tps=10682, mfu=22.26%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:20:14,503 - root - INFO - Step 31020: lr=1.00E-05, loss= 1.1828 (max= 1.5316), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:14,503 - root - INFO - Step 31020: lr=1.00E-05, loss= 1.1828 (max= 1.5316), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:14,503 - root - INFO - Step 31020: lr=1.00E-05, loss= 1.1828 (max= 1.5316), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:14,503 - root - INFO - Step 31020: lr=1.00E-05, loss= 1.1828 (max= 1.5316), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:14,503 - root - INFO - Step 31020: lr=1.00E-05, loss= 1.1828 (max= 1.5316), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:14,503 - root - INFO - Step 31020: lr=1.00E-05, loss= 1.1828 (max= 1.5316), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:14,503 - root - INFO - Step 31020: lr=1.00E-05, loss= 1.1828 (max= 1.5316), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:14,503 - root - INFO - Step 31020: lr=1.00E-05, loss= 1.1828 (max= 1.5316), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:30,413 - root - INFO - Step 31030: lr=1.00E-05, loss= 1.1775 
(max= 1.7370), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:30,413 - root - INFO - Step 31030: lr=1.00E-05, loss= 1.1775 (max= 1.7370), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:30,413 - root - INFO - Step 31030: lr=1.00E-05, loss= 1.1775 (max= 1.7370), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:30,413 - root - INFO - Step 31030: lr=1.00E-05, loss= 1.1775 (max= 1.7370), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:30,413 - root - INFO - Step 31030: lr=1.00E-05, loss= 1.1775 (max= 1.7370), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:30,413 - root - INFO - Step 31030: lr=1.00E-05, loss= 1.1775 (max= 1.7370), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:30,413 - root - INFO - Step 31030: lr=1.00E-05, loss= 1.1775 (max= 1.7370), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:30,413 - root - INFO - Step 31030: lr=1.00E-05, loss= 1.1775 (max= 1.7370), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:20:46,328 - root - INFO - Step 31040: lr=1.00E-05, loss= 1.1630 (max= 1.4720), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:20:46,328 - root - INFO - Step 31040: lr=1.00E-05, loss= 1.1630 (max= 1.4720), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:20:46,328 - root - INFO - Step 31040: lr=1.00E-05, loss= 1.1630 (max= 1.4720), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:20:46,328 - root 
- INFO - Step 31040: lr=1.00E-05, loss= 1.1630 (max= 1.4720), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:20:46,328 - root - INFO - Step 31040: lr=1.00E-05, loss= 1.1630 (max= 1.4720), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:20:46,328 - root - INFO - Step 31040: lr=1.00E-05, loss= 1.1630 (max= 1.4720), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:20:46,329 - root - INFO - Step 31040: lr=1.00E-05, loss= 1.1630 (max= 1.4720), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:20:46,329 - root - INFO - Step 31040: lr=1.00E-05, loss= 1.1630 (max= 1.4720), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:21:02,214 - root - INFO - Step 31050: lr=1.00E-05, loss= 1.1286 (max= 1.4938), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:21:02,214 - root - INFO - Step 31050: lr=1.00E-05, loss= 1.1286 (max= 1.4938), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:21:02,214 - root - INFO - Step 31050: lr=1.00E-05, loss= 1.1286 (max= 1.4938), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:21:02,214 - root - INFO - Step 31050: lr=1.00E-05, loss= 1.1286 (max= 1.4938), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:21:02,214 - root - INFO - Step 31050: lr=1.00E-05, loss= 1.1286 (max= 1.4938), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:21:02,214 - root - INFO - Step 31050: lr=1.00E-05, loss= 1.1286 (max= 1.4938), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 00:21:02,214 - root - INFO - Step 31050: lr=1.00E-05, loss= 1.1286 (max= 1.4938), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:21:02,214 - root - INFO - Step 31050: lr=1.00E-05, loss= 1.1286 (max= 1.4938), tps=20631, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:21:18,164 - root - INFO - Step 31060: lr=1.00E-05, loss= 1.1898 (max= 1.5771), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:18,164 - root - INFO - Step 31060: lr=1.00E-05, loss= 1.1898 (max= 1.5771), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:18,164 - root - INFO - Step 31060: lr=1.00E-05, loss= 1.1898 (max= 1.5771), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:18,164 - root - INFO - Step 31060: lr=1.00E-05, loss= 1.1898 (max= 1.5771), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:18,164 - root - INFO - Step 31060: lr=1.00E-05, loss= 1.1898 (max= 1.5771), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:18,164 - root - INFO - Step 31060: lr=1.00E-05, loss= 1.1898 (max= 1.5771), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:18,165 - root - INFO - Step 31060: lr=1.00E-05, loss= 1.1898 (max= 1.5771), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:18,165 - root - INFO - Step 31060: lr=1.00E-05, loss= 1.1898 (max= 1.5771), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:34,091 - root - INFO - Step 31070: lr=1.00E-05, loss= 1.1670 (max= 1.5160), tps=20578, mfu=42.88%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:34,091 - root - INFO - Step 31070: lr=1.00E-05, loss= 1.1670 (max= 1.5160), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:34,091 - root - INFO - Step 31070: lr=1.00E-05, loss= 1.1670 (max= 1.5160), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:34,091 - root - INFO - Step 31070: lr=1.00E-05, loss= 1.1670 (max= 1.5160), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:34,091 - root - INFO - Step 31070: lr=1.00E-05, loss= 1.1670 (max= 1.5160), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:34,091 - root - INFO - Step 31070: lr=1.00E-05, loss= 1.1670 (max= 1.5160), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:34,091 - root - INFO - Step 31070: lr=1.00E-05, loss= 1.1670 (max= 1.5160), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:34,091 - root - INFO - Step 31070: lr=1.00E-05, loss= 1.1670 (max= 1.5160), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:49,974 - root - INFO - Step 31080: lr=1.00E-05, loss= 1.1555 (max= 1.5964), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:49,974 - root - INFO - Step 31080: lr=1.00E-05, loss= 1.1555 (max= 1.5964), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:49,974 - root - INFO - Step 31080: lr=1.00E-05, loss= 1.1555 (max= 1.5964), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:49,974 - root - INFO - Step 31080: lr=1.00E-05, 
loss= 1.1555 (max= 1.5964), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:49,974 - root - INFO - Step 31080: lr=1.00E-05, loss= 1.1555 (max= 1.5964), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:49,974 - root - INFO - Step 31080: lr=1.00E-05, loss= 1.1555 (max= 1.5964), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:49,974 - root - INFO - Step 31080: lr=1.00E-05, loss= 1.1555 (max= 1.5964), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:21:49,974 - root - INFO - Step 31080: lr=1.00E-05, loss= 1.1555 (max= 1.5964), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:05,853 - root - INFO - Step 31090: lr=1.00E-05, loss= 1.1669 (max= 1.4844), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:05,853 - root - INFO - Step 31090: lr=1.00E-05, loss= 1.1669 (max= 1.4844), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:05,853 - root - INFO - Step 31090: lr=1.00E-05, loss= 1.1669 (max= 1.4844), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:05,853 - root - INFO - Step 31090: lr=1.00E-05, loss= 1.1669 (max= 1.4844), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:05,853 - root - INFO - Step 31090: lr=1.00E-05, loss= 1.1669 (max= 1.4844), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:05,853 - root - INFO - Step 31090: lr=1.00E-05, loss= 1.1669 (max= 1.4844), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:22:05,853 - root - INFO - Step 31090: lr=1.00E-05, loss= 1.1669 (max= 1.4844), tps=20641, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:05,853 - root - INFO - Step 31090: lr=1.00E-05, loss= 1.1669 (max= 1.4844), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:21,792 - root - INFO - Step 31100: lr=1.00E-05, loss= 1.1577 (max= 1.6536), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:21,792 - root - INFO - Step 31100: lr=1.00E-05, loss= 1.1577 (max= 1.6536), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:21,792 - root - INFO - Step 31100: lr=1.00E-05, loss= 1.1577 (max= 1.6536), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:21,793 - root - INFO - Step 31100: lr=1.00E-05, loss= 1.1577 (max= 1.6536), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:21,793 - root - INFO - Step 31100: lr=1.00E-05, loss= 1.1577 (max= 1.6536), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:21,793 - root - INFO - Step 31100: lr=1.00E-05, loss= 1.1577 (max= 1.6536), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:21,793 - root - INFO - Step 31100: lr=1.00E-05, loss= 1.1577 (max= 1.6536), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:21,793 - root - INFO - Step 31100: lr=1.00E-05, loss= 1.1577 (max= 1.6536), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:37,739 - root - INFO - Step 31110: lr=1.00E-05, loss= 1.1918 (max= 1.6780), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:37,739 - root - INFO - Step 31110: lr=1.00E-05, loss= 1.1918 (max= 1.6780), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:37,739 - root - INFO - Step 31110: lr=1.00E-05, loss= 1.1918 (max= 1.6780), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:37,739 - root - INFO - Step 31110: lr=1.00E-05, loss= 1.1918 (max= 1.6780), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:37,739 - root - INFO - Step 31110: lr=1.00E-05, loss= 1.1918 (max= 1.6780), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:37,739 - root - INFO - Step 31110: lr=1.00E-05, loss= 1.1918 (max= 1.6780), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:37,739 - root - INFO - Step 31110: lr=1.00E-05, loss= 1.1918 (max= 1.6780), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:37,739 - root - INFO - Step 31110: lr=1.00E-05, loss= 1.1918 (max= 1.6780), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:53,685 - root - INFO - Step 31120: lr=1.00E-05, loss= 1.1901 (max= 1.5147), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:53,685 - root - INFO - Step 31120: lr=1.00E-05, loss= 1.1901 (max= 1.5147), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:53,685 - root - INFO - Step 31120: lr=1.00E-05, loss= 1.1901 (max= 1.5147), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:53,685 - root - INFO - Step 31120: lr=1.00E-05, loss= 1.1901 (max= 1.5147), 
tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:53,685 - root - INFO - Step 31120: lr=1.00E-05, loss= 1.1901 (max= 1.5147), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:53,685 - root - INFO - Step 31120: lr=1.00E-05, loss= 1.1901 (max= 1.5147), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:53,685 - root - INFO - Step 31120: lr=1.00E-05, loss= 1.1901 (max= 1.5147), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:22:53,685 - root - INFO - Step 31120: lr=1.00E-05, loss= 1.1901 (max= 1.5147), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:23:09,638 - root - INFO - Step 31130: lr=1.00E-05, loss= 1.1751 (max= 1.5936), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:09,638 - root - INFO - Step 31130: lr=1.00E-05, loss= 1.1751 (max= 1.5936), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:09,638 - root - INFO - Step 31130: lr=1.00E-05, loss= 1.1751 (max= 1.5936), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:09,638 - root - INFO - Step 31130: lr=1.00E-05, loss= 1.1751 (max= 1.5936), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:09,638 - root - INFO - Step 31130: lr=1.00E-05, loss= 1.1751 (max= 1.5936), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:09,638 - root - INFO - Step 31130: lr=1.00E-05, loss= 1.1751 (max= 1.5936), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:09,638 - root - INFO - Step 
31130: lr=1.00E-05, loss= 1.1751 (max= 1.5936), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:09,638 - root - INFO - Step 31130: lr=1.00E-05, loss= 1.1751 (max= 1.5936), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:25,579 - root - INFO - Step 31140: lr=1.00E-05, loss= 1.1945 (max= 1.6919), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:25,579 - root - INFO - Step 31140: lr=1.00E-05, loss= 1.1945 (max= 1.6919), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:25,579 - root - INFO - Step 31140: lr=1.00E-05, loss= 1.1945 (max= 1.6919), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:25,579 - root - INFO - Step 31140: lr=1.00E-05, loss= 1.1945 (max= 1.6919), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:25,579 - root - INFO - Step 31140: lr=1.00E-05, loss= 1.1945 (max= 1.6919), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:25,579 - root - INFO - Step 31140: lr=1.00E-05, loss= 1.1945 (max= 1.6919), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:25,579 - root - INFO - Step 31140: lr=1.00E-05, loss= 1.1945 (max= 1.6919), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:25,579 - root - INFO - Step 31140: lr=1.00E-05, loss= 1.1945 (max= 1.6919), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:41,517 - root - INFO - Step 31150: lr=1.00E-05, loss= 1.2292 (max= 1.6834), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 00:23:41,517 - root - INFO - Step 31150: lr=1.00E-05, loss= 1.2292 (max= 1.6834), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:41,517 - root - INFO - Step 31150: lr=1.00E-05, loss= 1.2292 (max= 1.6834), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:41,517 - root - INFO - Step 31150: lr=1.00E-05, loss= 1.2292 (max= 1.6834), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:41,517 - root - INFO - Step 31150: lr=1.00E-05, loss= 1.2292 (max= 1.6834), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:41,517 - root - INFO - Step 31150: lr=1.00E-05, loss= 1.2292 (max= 1.6834), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:41,517 - root - INFO - Step 31150: lr=1.00E-05, loss= 1.2292 (max= 1.6834), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:41,517 - root - INFO - Step 31150: lr=1.00E-05, loss= 1.2292 (max= 1.6834), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:23:57,502 - root - INFO - Step 31160: lr=1.00E-05, loss= 1.1536 (max= 1.6787), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:23:57,502 - root - INFO - Step 31160: lr=1.00E-05, loss= 1.1536 (max= 1.6787), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:23:57,502 - root - INFO - Step 31160: lr=1.00E-05, loss= 1.1536 (max= 1.6787), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:23:57,502 - root - INFO - Step 31160: lr=1.00E-05, loss= 1.1536 (max= 1.6787), tps=20503, mfu=42.72%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:23:57,502 - root - INFO - Step 31160: lr=1.00E-05, loss= 1.1536 (max= 1.6787), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:23:57,502 - root - INFO - Step 31160: lr=1.00E-05, loss= 1.1536 (max= 1.6787), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:23:57,502 - root - INFO - Step 31160: lr=1.00E-05, loss= 1.1536 (max= 1.6787), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:23:57,502 - root - INFO - Step 31160: lr=1.00E-05, loss= 1.1536 (max= 1.6787), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:13,413 - root - INFO - Step 31170: lr=1.00E-05, loss= 1.1574 (max= 1.6131), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:24:13,413 - root - INFO - Step 31170: lr=1.00E-05, loss= 1.1574 (max= 1.6131), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:24:13,413 - root - INFO - Step 31170: lr=1.00E-05, loss= 1.1574 (max= 1.6131), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:24:13,413 - root - INFO - Step 31170: lr=1.00E-05, loss= 1.1574 (max= 1.6131), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:24:13,413 - root - INFO - Step 31170: lr=1.00E-05, loss= 1.1574 (max= 1.6131), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:24:13,413 - root - INFO - Step 31170: lr=1.00E-05, loss= 1.1574 (max= 1.6131), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:24:13,413 - root - INFO - Step 31170: lr=1.00E-05, loss= 1.1574 
(max= 1.6131), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:24:13,413 - root - INFO - Step 31170: lr=1.00E-05, loss= 1.1574 (max= 1.6131), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:24:29,343 - root - INFO - Step 31180: lr=1.00E-05, loss= 1.2048 (max= 1.7235), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:29,343 - root - INFO - Step 31180: lr=1.00E-05, loss= 1.2048 (max= 1.7235), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:29,343 - root - INFO - Step 31180: lr=1.00E-05, loss= 1.2048 (max= 1.7235), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:29,343 - root - INFO - Step 31180: lr=1.00E-05, loss= 1.2048 (max= 1.7235), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:29,343 - root - INFO - Step 31180: lr=1.00E-05, loss= 1.2048 (max= 1.7235), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:29,343 - root - INFO - Step 31180: lr=1.00E-05, loss= 1.2048 (max= 1.7235), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:29,343 - root - INFO - Step 31180: lr=1.00E-05, loss= 1.2048 (max= 1.7235), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:29,343 - root - INFO - Step 31180: lr=1.00E-05, loss= 1.2048 (max= 1.7235), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:45,267 - root - INFO - Step 31190: lr=1.00E-05, loss= 1.1758 (max= 1.6308), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:45,267 - root 
- INFO - Step 31190: lr=1.00E-05, loss= 1.1758 (max= 1.6308), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:45,267 - root - INFO - Step 31190: lr=1.00E-05, loss= 1.1758 (max= 1.6308), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:45,267 - root - INFO - Step 31190: lr=1.00E-05, loss= 1.1758 (max= 1.6308), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:45,267 - root - INFO - Step 31190: lr=1.00E-05, loss= 1.1758 (max= 1.6308), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:45,267 - root - INFO - Step 31190: lr=1.00E-05, loss= 1.1758 (max= 1.6308), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:45,267 - root - INFO - Step 31190: lr=1.00E-05, loss= 1.1758 (max= 1.6308), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:24:45,267 - root - INFO - Step 31190: lr=1.00E-05, loss= 1.1758 (max= 1.6308), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:25:01,277 - root - INFO - Step 31200: lr=1.00E-05, loss= 1.1429 (max= 1.5938), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:01,277 - root - INFO - Step 31200: lr=1.00E-05, loss= 1.1429 (max= 1.5938), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:01,277 - root - INFO - Step 31200: lr=1.00E-05, loss= 1.1429 (max= 1.5938), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:01,278 - root - INFO - Step 31200: lr=1.00E-05, loss= 1.1429 (max= 1.5938), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 00:25:01,278 - root - INFO - Step 31200: lr=1.00E-05, loss= 1.1429 (max= 1.5938), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:01,278 - root - INFO - Step 31200: lr=1.00E-05, loss= 1.1429 (max= 1.5938), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:01,278 - root - INFO - Step 31200: lr=1.00E-05, loss= 1.1429 (max= 1.5938), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:01,278 - root - INFO - Step 31200: lr=1.00E-05, loss= 1.1429 (max= 1.5938), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:17,223 - root - INFO - Step 31210: lr=1.00E-05, loss= 1.1543 (max= 1.4846), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:17,224 - root - INFO - Step 31210: lr=1.00E-05, loss= 1.1543 (max= 1.4846), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:17,224 - root - INFO - Step 31210: lr=1.00E-05, loss= 1.1543 (max= 1.4846), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:17,224 - root - INFO - Step 31210: lr=1.00E-05, loss= 1.1543 (max= 1.4846), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:17,224 - root - INFO - Step 31210: lr=1.00E-05, loss= 1.1543 (max= 1.4846), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:17,224 - root - INFO - Step 31210: lr=1.00E-05, loss= 1.1543 (max= 1.4846), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:17,224 - root - INFO - Step 31210: lr=1.00E-05, loss= 1.1543 (max= 1.4846), tps=20553, mfu=42.82%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:17,224 - root - INFO - Step 31210: lr=1.00E-05, loss= 1.1543 (max= 1.4846), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:33,184 - root - INFO - Step 31220: lr=1.00E-05, loss= 1.1745 (max= 1.5777), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:25:33,184 - root - INFO - Step 31220: lr=1.00E-05, loss= 1.1745 (max= 1.5777), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:25:33,184 - root - INFO - Step 31220: lr=1.00E-05, loss= 1.1745 (max= 1.5777), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:25:33,184 - root - INFO - Step 31220: lr=1.00E-05, loss= 1.1745 (max= 1.5777), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:25:33,184 - root - INFO - Step 31220: lr=1.00E-05, loss= 1.1745 (max= 1.5777), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:25:33,184 - root - INFO - Step 31220: lr=1.00E-05, loss= 1.1745 (max= 1.5777), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:25:33,184 - root - INFO - Step 31220: lr=1.00E-05, loss= 1.1745 (max= 1.5777), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:25:33,184 - root - INFO - Step 31220: lr=1.00E-05, loss= 1.1745 (max= 1.5777), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:25:49,086 - root - INFO - Step 31230: lr=1.00E-05, loss= 1.1553 (max= 1.6160), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:49,086 - root - INFO - Step 31230: lr=1.00E-05, 
loss= 1.1553 (max= 1.6160), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:49,086 - root - INFO - Step 31230: lr=1.00E-05, loss= 1.1553 (max= 1.6160), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:49,086 - root - INFO - Step 31230: lr=1.00E-05, loss= 1.1553 (max= 1.6160), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:49,086 - root - INFO - Step 31230: lr=1.00E-05, loss= 1.1553 (max= 1.6160), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:49,086 - root - INFO - Step 31230: lr=1.00E-05, loss= 1.1553 (max= 1.6160), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:49,086 - root - INFO - Step 31230: lr=1.00E-05, loss= 1.1553 (max= 1.6160), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:25:49,086 - root - INFO - Step 31230: lr=1.00E-05, loss= 1.1553 (max= 1.6160), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:26:05,010 - root - INFO - Step 31240: lr=1.00E-05, loss= 1.1571 (max= 1.6582), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:05,010 - root - INFO - Step 31240: lr=1.00E-05, loss= 1.1571 (max= 1.6582), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:05,010 - root - INFO - Step 31240: lr=1.00E-05, loss= 1.1571 (max= 1.6582), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:05,010 - root - INFO - Step 31240: lr=1.00E-05, loss= 1.1571 (max= 1.6582), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:26:05,010 - root - INFO - Step 31240: lr=1.00E-05, loss= 1.1571 (max= 1.6582), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:05,010 - root - INFO - Step 31240: lr=1.00E-05, loss= 1.1571 (max= 1.6582), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:05,010 - root - INFO - Step 31240: lr=1.00E-05, loss= 1.1571 (max= 1.6582), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:05,010 - root - INFO - Step 31240: lr=1.00E-05, loss= 1.1571 (max= 1.6582), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:20,914 - root - INFO - Step 31250: lr=1.00E-05, loss= 1.1605 (max= 1.4898), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:26:20,914 - root - INFO - Step 31250: lr=1.00E-05, loss= 1.1605 (max= 1.4898), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:26:20,915 - root - INFO - Step 31250: lr=1.00E-05, loss= 1.1605 (max= 1.4898), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:26:20,915 - root - INFO - Step 31250: lr=1.00E-05, loss= 1.1605 (max= 1.4898), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:26:20,915 - root - INFO - Step 31250: lr=1.00E-05, loss= 1.1605 (max= 1.4898), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:26:20,915 - root - INFO - Step 31250: lr=1.00E-05, loss= 1.1605 (max= 1.4898), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:26:20,915 - root - INFO - Step 31250: lr=1.00E-05, loss= 1.1605 (max= 1.4898), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:26:20,915 - root - INFO - Step 31250: lr=1.00E-05, loss= 1.1605 (max= 1.4898), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:26:36,844 - root - INFO - Step 31260: lr=1.00E-05, loss= 1.1285 (max= 1.5726), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:36,844 - root - INFO - Step 31260: lr=1.00E-05, loss= 1.1285 (max= 1.5726), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:36,845 - root - INFO - Step 31260: lr=1.00E-05, loss= 1.1285 (max= 1.5726), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:36,845 - root - INFO - Step 31260: lr=1.00E-05, loss= 1.1285 (max= 1.5726), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:36,845 - root - INFO - Step 31260: lr=1.00E-05, loss= 1.1285 (max= 1.5726), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:36,845 - root - INFO - Step 31260: lr=1.00E-05, loss= 1.1285 (max= 1.5726), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:36,845 - root - INFO - Step 31260: lr=1.00E-05, loss= 1.1285 (max= 1.5726), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:36,845 - root - INFO - Step 31260: lr=1.00E-05, loss= 1.1285 (max= 1.5726), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:52,806 - root - INFO - Step 31270: lr=1.00E-05, loss= 1.1279 (max= 1.4669), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:52,806 - root - INFO - Step 31270: lr=1.00E-05, loss= 1.1279 (max= 1.4669), 
tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:52,807 - root - INFO - Step 31270: lr=1.00E-05, loss= 1.1279 (max= 1.4669), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:52,807 - root - INFO - Step 31270: lr=1.00E-05, loss= 1.1279 (max= 1.4669), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:52,807 - root - INFO - Step 31270: lr=1.00E-05, loss= 1.1279 (max= 1.4669), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:52,807 - root - INFO - Step 31270: lr=1.00E-05, loss= 1.1279 (max= 1.4669), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:52,807 - root - INFO - Step 31270: lr=1.00E-05, loss= 1.1279 (max= 1.4669), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:26:52,807 - root - INFO - Step 31270: lr=1.00E-05, loss= 1.1279 (max= 1.4669), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:08,782 - root - INFO - Step 31280: lr=1.00E-05, loss= 1.1477 (max= 1.7571), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:08,782 - root - INFO - Step 31280: lr=1.00E-05, loss= 1.1477 (max= 1.7571), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:08,782 - root - INFO - Step 31280: lr=1.00E-05, loss= 1.1477 (max= 1.7571), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:08,782 - root - INFO - Step 31280: lr=1.00E-05, loss= 1.1477 (max= 1.7571), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:08,782 - root - INFO - Step 
31280: lr=1.00E-05, loss= 1.1477 (max= 1.7571), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:08,782 - root - INFO - Step 31280: lr=1.00E-05, loss= 1.1477 (max= 1.7571), tps=20516, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:08,782 - root - INFO - Step 31280: lr=1.00E-05, loss= 1.1477 (max= 1.7571), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:08,782 - root - INFO - Step 31280: lr=1.00E-05, loss= 1.1477 (max= 1.7571), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:24,680 - root - INFO - Step 31290: lr=1.00E-05, loss= 1.1239 (max= 1.5726), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:24,680 - root - INFO - Step 31290: lr=1.00E-05, loss= 1.1239 (max= 1.5726), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:24,680 - root - INFO - Step 31290: lr=1.00E-05, loss= 1.1239 (max= 1.5726), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:24,680 - root - INFO - Step 31290: lr=1.00E-05, loss= 1.1239 (max= 1.5726), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:24,681 - root - INFO - Step 31290: lr=1.00E-05, loss= 1.1239 (max= 1.5726), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:24,681 - root - INFO - Step 31290: lr=1.00E-05, loss= 1.1239 (max= 1.5726), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:24,681 - root - INFO - Step 31290: lr=1.00E-05, loss= 1.1239 (max= 1.5726), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 00:27:24,681 - root - INFO - Step 31290: lr=1.00E-05, loss= 1.1239 (max= 1.5726), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:40,592 - root - INFO - Step 31300: lr=1.00E-05, loss= 1.1522 (max= 1.4681), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:40,592 - root - INFO - Step 31300: lr=1.00E-05, loss= 1.1522 (max= 1.4681), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:40,592 - root - INFO - Step 31300: lr=1.00E-05, loss= 1.1522 (max= 1.4681), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:40,592 - root - INFO - Step 31300: lr=1.00E-05, loss= 1.1522 (max= 1.4681), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:40,592 - root - INFO - Step 31300: lr=1.00E-05, loss= 1.1522 (max= 1.4681), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:40,592 - root - INFO - Step 31300: lr=1.00E-05, loss= 1.1522 (max= 1.4681), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:40,592 - root - INFO - Step 31300: lr=1.00E-05, loss= 1.1522 (max= 1.4681), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:40,593 - root - INFO - Step 31300: lr=1.00E-05, loss= 1.1522 (max= 1.4681), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:27:56,567 - root - INFO - Step 31310: lr=1.00E-05, loss= 1.1773 (max= 1.6300), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:56,567 - root - INFO - Step 31310: lr=1.00E-05, loss= 1.1773 (max= 1.6300), tps=20516, mfu=42.75%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:56,567 - root - INFO - Step 31310: lr=1.00E-05, loss= 1.1773 (max= 1.6300), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:56,567 - root - INFO - Step 31310: lr=1.00E-05, loss= 1.1773 (max= 1.6300), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:56,567 - root - INFO - Step 31310: lr=1.00E-05, loss= 1.1773 (max= 1.6300), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:56,567 - root - INFO - Step 31310: lr=1.00E-05, loss= 1.1773 (max= 1.6300), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:56,567 - root - INFO - Step 31310: lr=1.00E-05, loss= 1.1773 (max= 1.6300), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:27:56,567 - root - INFO - Step 31310: lr=1.00E-05, loss= 1.1773 (max= 1.6300), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:28:12,489 - root - INFO - Step 31320: lr=1.00E-05, loss= 1.1416 (max= 1.5698), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:28:12,489 - root - INFO - Step 31320: lr=1.00E-05, loss= 1.1416 (max= 1.5698), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:28:12,489 - root - INFO - Step 31320: lr=1.00E-05, loss= 1.1416 (max= 1.5698), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:28:12,489 - root - INFO - Step 31320: lr=1.00E-05, loss= 1.1416 (max= 1.5698), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:28:12,489 - root - INFO - Step 31320: lr=1.00E-05, loss= 1.1416 
(max= 1.5698), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:28:12,489 - root - INFO - Step 31320: lr=1.00E-05, loss= 1.1416 (max= 1.5698), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:28:12,489 - root - INFO - Step 31320: lr=1.00E-05, loss= 1.1416 (max= 1.5698), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:28:12,489 - root - INFO - Step 31320: lr=1.00E-05, loss= 1.1416 (max= 1.5698), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:28:28,412 - root - INFO - Step 31330: lr=1.00E-05, loss= 1.1329 (max= 1.9979), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:28,412 - root - INFO - Step 31330: lr=1.00E-05, loss= 1.1329 (max= 1.9979), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:28,412 - root - INFO - Step 31330: lr=1.00E-05, loss= 1.1329 (max= 1.9979), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:28,412 - root - INFO - Step 31330: lr=1.00E-05, loss= 1.1329 (max= 1.9979), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:28,412 - root - INFO - Step 31330: lr=1.00E-05, loss= 1.1329 (max= 1.9979), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:28,412 - root - INFO - Step 31330: lr=1.00E-05, loss= 1.1329 (max= 1.9979), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:28,412 - root - INFO - Step 31330: lr=1.00E-05, loss= 1.1329 (max= 1.9979), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:28,412 - root 
- INFO - Step 31330: lr=1.00E-05, loss= 1.1329 (max= 1.9979), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:44,351 - root - INFO - Step 31340: lr=1.00E-05, loss= 1.1384 (max= 1.5924), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:44,351 - root - INFO - Step 31340: lr=1.00E-05, loss= 1.1384 (max= 1.5924), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:44,351 - root - INFO - Step 31340: lr=1.00E-05, loss= 1.1384 (max= 1.5924), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:44,352 - root - INFO - Step 31340: lr=1.00E-05, loss= 1.1384 (max= 1.5924), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:44,352 - root - INFO - Step 31340: lr=1.00E-05, loss= 1.1384 (max= 1.5924), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:44,352 - root - INFO - Step 31340: lr=1.00E-05, loss= 1.1384 (max= 1.5924), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:44,352 - root - INFO - Step 31340: lr=1.00E-05, loss= 1.1384 (max= 1.5924), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:28:44,352 - root - INFO - Step 31340: lr=1.00E-05, loss= 1.1384 (max= 1.5924), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:00,298 - root - INFO - Step 31350: lr=1.00E-05, loss= 1.1724 (max= 1.4648), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:00,298 - root - INFO - Step 31350: lr=1.00E-05, loss= 1.1724 (max= 1.4648), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 00:29:00,298 - root - INFO - Step 31350: lr=1.00E-05, loss= 1.1724 (max= 1.4648), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:00,298 - root - INFO - Step 31350: lr=1.00E-05, loss= 1.1724 (max= 1.4648), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:00,298 - root - INFO - Step 31350: lr=1.00E-05, loss= 1.1724 (max= 1.4648), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:00,298 - root - INFO - Step 31350: lr=1.00E-05, loss= 1.1724 (max= 1.4648), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:00,298 - root - INFO - Step 31350: lr=1.00E-05, loss= 1.1724 (max= 1.4648), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:00,298 - root - INFO - Step 31350: lr=1.00E-05, loss= 1.1724 (max= 1.4648), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:16,258 - root - INFO - Step 31360: lr=1.00E-05, loss= 1.1291 (max= 1.5392), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:16,258 - root - INFO - Step 31360: lr=1.00E-05, loss= 1.1291 (max= 1.5392), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:16,258 - root - INFO - Step 31360: lr=1.00E-05, loss= 1.1291 (max= 1.5392), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:16,258 - root - INFO - Step 31360: lr=1.00E-05, loss= 1.1291 (max= 1.5392), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:16,258 - root - INFO - Step 31360: lr=1.00E-05, loss= 1.1291 (max= 1.5392), tps=20536, mfu=42.79%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:16,258 - root - INFO - Step 31360: lr=1.00E-05, loss= 1.1291 (max= 1.5392), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:16,258 - root - INFO - Step 31360: lr=1.00E-05, loss= 1.1291 (max= 1.5392), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:16,258 - root - INFO - Step 31360: lr=1.00E-05, loss= 1.1291 (max= 1.5392), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:32,186 - root - INFO - Step 31370: lr=1.00E-05, loss= 1.1619 (max= 1.6819), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:32,186 - root - INFO - Step 31370: lr=1.00E-05, loss= 1.1619 (max= 1.6819), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:32,186 - root - INFO - Step 31370: lr=1.00E-05, loss= 1.1619 (max= 1.6819), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:32,186 - root - INFO - Step 31370: lr=1.00E-05, loss= 1.1619 (max= 1.6819), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:32,186 - root - INFO - Step 31370: lr=1.00E-05, loss= 1.1619 (max= 1.6819), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:32,186 - root - INFO - Step 31370: lr=1.00E-05, loss= 1.1619 (max= 1.6819), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:32,186 - root - INFO - Step 31370: lr=1.00E-05, loss= 1.1619 (max= 1.6819), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:32,186 - root - INFO - Step 31370: lr=1.00E-05, 
loss= 1.1619 (max= 1.6819), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:29:48,088 - root - INFO - Step 31380: lr=1.00E-05, loss= 1.1325 (max= 1.5692), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:48,088 - root - INFO - Step 31380: lr=1.00E-05, loss= 1.1325 (max= 1.5692), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:48,088 - root - INFO - Step 31380: lr=1.00E-05, loss= 1.1325 (max= 1.5692), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:48,088 - root - INFO - Step 31380: lr=1.00E-05, loss= 1.1325 (max= 1.5692), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:48,088 - root - INFO - Step 31380: lr=1.00E-05, loss= 1.1325 (max= 1.5692), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:48,088 - root - INFO - Step 31380: lr=1.00E-05, loss= 1.1325 (max= 1.5692), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:48,088 - root - INFO - Step 31380: lr=1.00E-05, loss= 1.1325 (max= 1.5692), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:29:48,088 - root - INFO - Step 31380: lr=1.00E-05, loss= 1.1325 (max= 1.5692), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:03,976 - root - INFO - Step 31390: lr=1.00E-05, loss= 1.1508 (max= 1.7593), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:03,976 - root - INFO - Step 31390: lr=1.00E-05, loss= 1.1508 (max= 1.7593), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:30:03,976 - root - INFO - Step 31390: lr=1.00E-05, loss= 1.1508 (max= 1.7593), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:03,976 - root - INFO - Step 31390: lr=1.00E-05, loss= 1.1508 (max= 1.7593), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:03,976 - root - INFO - Step 31390: lr=1.00E-05, loss= 1.1508 (max= 1.7593), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:03,976 - root - INFO - Step 31390: lr=1.00E-05, loss= 1.1508 (max= 1.7593), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:03,976 - root - INFO - Step 31390: lr=1.00E-05, loss= 1.1508 (max= 1.7593), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:03,977 - root - INFO - Step 31390: lr=1.00E-05, loss= 1.1508 (max= 1.7593), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:19,931 - root - INFO - Step 31400: lr=1.00E-05, loss= 1.1235 (max= 1.4277), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:19,931 - root - INFO - Step 31400: lr=1.00E-05, loss= 1.1235 (max= 1.4277), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:19,931 - root - INFO - Step 31400: lr=1.00E-05, loss= 1.1235 (max= 1.4277), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:19,931 - root - INFO - Step 31400: lr=1.00E-05, loss= 1.1235 (max= 1.4277), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:19,931 - root - INFO - Step 31400: lr=1.00E-05, loss= 1.1235 (max= 1.4277), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:19,931 - root - INFO - Step 31400: lr=1.00E-05, loss= 1.1235 (max= 1.4277), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:19,931 - root - INFO - Step 31400: lr=1.00E-05, loss= 1.1235 (max= 1.4277), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:19,931 - root - INFO - Step 31400: lr=1.00E-05, loss= 1.1235 (max= 1.4277), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:35,871 - root - INFO - Step 31410: lr=1.00E-05, loss= 1.1263 (max= 1.5833), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:35,871 - root - INFO - Step 31410: lr=1.00E-05, loss= 1.1263 (max= 1.5833), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:35,871 - root - INFO - Step 31410: lr=1.00E-05, loss= 1.1263 (max= 1.5833), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:35,871 - root - INFO - Step 31410: lr=1.00E-05, loss= 1.1263 (max= 1.5833), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:35,872 - root - INFO - Step 31410: lr=1.00E-05, loss= 1.1263 (max= 1.5833), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:35,872 - root - INFO - Step 31410: lr=1.00E-05, loss= 1.1263 (max= 1.5833), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:35,872 - root - INFO - Step 31410: lr=1.00E-05, loss= 1.1263 (max= 1.5833), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:35,872 - root - INFO - Step 31410: lr=1.00E-05, loss= 1.1263 (max= 1.5833), 
tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:30:51,776 - root - INFO - Step 31420: lr=1.00E-05, loss= 1.1298 (max= 1.7728), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:51,776 - root - INFO - Step 31420: lr=1.00E-05, loss= 1.1298 (max= 1.7728), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:51,776 - root - INFO - Step 31420: lr=1.00E-05, loss= 1.1298 (max= 1.7728), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:51,776 - root - INFO - Step 31420: lr=1.00E-05, loss= 1.1298 (max= 1.7728), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:51,776 - root - INFO - Step 31420: lr=1.00E-05, loss= 1.1298 (max= 1.7728), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:51,776 - root - INFO - Step 31420: lr=1.00E-05, loss= 1.1298 (max= 1.7728), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:51,777 - root - INFO - Step 31420: lr=1.00E-05, loss= 1.1298 (max= 1.7728), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:30:51,777 - root - INFO - Step 31420: lr=1.00E-05, loss= 1.1298 (max= 1.7728), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:07,714 - root - INFO - Step 31430: lr=1.00E-05, loss= 1.1403 (max= 1.4564), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:07,714 - root - INFO - Step 31430: lr=1.00E-05, loss= 1.1403 (max= 1.4564), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:07,715 - root - INFO - Step 
31430: lr=1.00E-05, loss= 1.1403 (max= 1.4564), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:07,715 - root - INFO - Step 31430: lr=1.00E-05, loss= 1.1403 (max= 1.4564), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:07,715 - root - INFO - Step 31430: lr=1.00E-05, loss= 1.1403 (max= 1.4564), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:07,715 - root - INFO - Step 31430: lr=1.00E-05, loss= 1.1403 (max= 1.4564), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:07,715 - root - INFO - Step 31430: lr=1.00E-05, loss= 1.1403 (max= 1.4564), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:07,715 - root - INFO - Step 31430: lr=1.00E-05, loss= 1.1403 (max= 1.4564), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:13,259 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:7112798 +2025-10-25 00:31:23,684 - root - INFO - Step 31440: lr=1.00E-05, loss= 1.1586 (max= 1.5304), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:23,684 - root - INFO - Step 31440: lr=1.00E-05, loss= 1.1586 (max= 1.5304), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:23,684 - root - INFO - Step 31440: lr=1.00E-05, loss= 1.1586 (max= 1.5304), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:23,684 - root - INFO - Step 31440: lr=1.00E-05, loss= 1.1586 (max= 1.5304), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-25 00:31:23,684 - root - INFO - Step 31440: lr=1.00E-05, loss= 1.1586 (max= 1.5304), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:23,684 - root - INFO - Step 31440: lr=1.00E-05, loss= 1.1586 (max= 1.5304), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:23,684 - root - INFO - Step 31440: lr=1.00E-05, loss= 1.1586 (max= 1.5304), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:23,684 - root - INFO - Step 31440: lr=1.00E-05, loss= 1.1586 (max= 1.5304), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:31:39,614 - root - INFO - Step 31450: lr=1.00E-05, loss= 1.1481 (max= 1.5733), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:39,614 - root - INFO - Step 31450: lr=1.00E-05, loss= 1.1481 (max= 1.5733), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:39,614 - root - INFO - Step 31450: lr=1.00E-05, loss= 1.1481 (max= 1.5733), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:39,614 - root - INFO - Step 31450: lr=1.00E-05, loss= 1.1481 (max= 1.5733), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:39,615 - root - INFO - Step 31450: lr=1.00E-05, loss= 1.1481 (max= 1.5733), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:39,615 - root - INFO - Step 31450: lr=1.00E-05, loss= 1.1481 (max= 1.5733), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:39,615 - root - INFO - Step 31450: lr=1.00E-05, loss= 1.1481 (max= 1.5733), tps=20573, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:39,615 - root - INFO - Step 31450: lr=1.00E-05, loss= 1.1481 (max= 1.5733), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:55,541 - root - INFO - Step 31460: lr=1.00E-05, loss= 1.1576 (max= 1.4999), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:55,541 - root - INFO - Step 31460: lr=1.00E-05, loss= 1.1576 (max= 1.4999), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:55,541 - root - INFO - Step 31460: lr=1.00E-05, loss= 1.1576 (max= 1.4999), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:55,541 - root - INFO - Step 31460: lr=1.00E-05, loss= 1.1576 (max= 1.4999), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:55,541 - root - INFO - Step 31460: lr=1.00E-05, loss= 1.1576 (max= 1.4999), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:55,541 - root - INFO - Step 31460: lr=1.00E-05, loss= 1.1576 (max= 1.4999), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:55,541 - root - INFO - Step 31460: lr=1.00E-05, loss= 1.1576 (max= 1.4999), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:31:55,541 - root - INFO - Step 31460: lr=1.00E-05, loss= 1.1576 (max= 1.4999), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:11,473 - root - INFO - Step 31470: lr=1.00E-05, loss= 1.1388 (max= 1.6399), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:11,473 - root - INFO - Step 31470: lr=1.00E-05, loss= 1.1388 
(max= 1.6399), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:11,473 - root - INFO - Step 31470: lr=1.00E-05, loss= 1.1388 (max= 1.6399), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:11,473 - root - INFO - Step 31470: lr=1.00E-05, loss= 1.1388 (max= 1.6399), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:11,473 - root - INFO - Step 31470: lr=1.00E-05, loss= 1.1388 (max= 1.6399), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:11,473 - root - INFO - Step 31470: lr=1.00E-05, loss= 1.1388 (max= 1.6399), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:11,473 - root - INFO - Step 31470: lr=1.00E-05, loss= 1.1388 (max= 1.6399), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:11,473 - root - INFO - Step 31470: lr=1.00E-05, loss= 1.1388 (max= 1.6399), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:27,412 - root - INFO - Step 31480: lr=1.00E-05, loss= 1.1377 (max= 1.4719), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:27,412 - root - INFO - Step 31480: lr=1.00E-05, loss= 1.1377 (max= 1.4719), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:27,412 - root - INFO - Step 31480: lr=1.00E-05, loss= 1.1377 (max= 1.4719), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:27,412 - root - INFO - Step 31480: lr=1.00E-05, loss= 1.1377 (max= 1.4719), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:27,412 - root 
- INFO - Step 31480: lr=1.00E-05, loss= 1.1377 (max= 1.4719), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:27,412 - root - INFO - Step 31480: lr=1.00E-05, loss= 1.1377 (max= 1.4719), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:27,412 - root - INFO - Step 31480: lr=1.00E-05, loss= 1.1377 (max= 1.4719), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:27,412 - root - INFO - Step 31480: lr=1.00E-05, loss= 1.1377 (max= 1.4719), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:43,360 - root - INFO - Step 31490: lr=1.00E-05, loss= 1.1411 (max= 1.4955), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:43,361 - root - INFO - Step 31490: lr=1.00E-05, loss= 1.1411 (max= 1.4955), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:43,361 - root - INFO - Step 31490: lr=1.00E-05, loss= 1.1411 (max= 1.4955), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:43,361 - root - INFO - Step 31490: lr=1.00E-05, loss= 1.1411 (max= 1.4955), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:43,361 - root - INFO - Step 31490: lr=1.00E-05, loss= 1.1411 (max= 1.4955), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:43,361 - root - INFO - Step 31490: lr=1.00E-05, loss= 1.1411 (max= 1.4955), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:43,361 - root - INFO - Step 31490: lr=1.00E-05, loss= 1.1411 (max= 1.4955), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:32:43,361 - root - INFO - Step 31490: lr=1.00E-05, loss= 1.1411 (max= 1.4955), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:32:59,275 - root - INFO - Step 31500: lr=1.00E-05, loss= 1.1184 (max= 1.5054), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:59,275 - root - INFO - Step 31500: lr=1.00E-05, loss= 1.1184 (max= 1.5054), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:59,275 - root - INFO - Step 31500: lr=1.00E-05, loss= 1.1184 (max= 1.5054), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:59,275 - root - INFO - Step 31500: lr=1.00E-05, loss= 1.1184 (max= 1.5054), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:59,275 - root - INFO - Step 31500: lr=1.00E-05, loss= 1.1184 (max= 1.5054), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:59,275 - root - INFO - Step 31500: lr=1.00E-05, loss= 1.1184 (max= 1.5054), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:59,275 - root - INFO - Step 31500: lr=1.00E-05, loss= 1.1184 (max= 1.5054), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:32:59,275 - root - INFO - Step 31500: lr=1.00E-05, loss= 1.1184 (max= 1.5054), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:15,190 - root - INFO - Step 31510: lr=1.00E-05, loss= 1.1470 (max= 1.4836), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:15,190 - root - INFO - Step 31510: lr=1.00E-05, loss= 1.1470 (max= 1.4836), tps=20594, mfu=42.91%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:15,190 - root - INFO - Step 31510: lr=1.00E-05, loss= 1.1470 (max= 1.4836), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:15,190 - root - INFO - Step 31510: lr=1.00E-05, loss= 1.1470 (max= 1.4836), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:15,190 - root - INFO - Step 31510: lr=1.00E-05, loss= 1.1470 (max= 1.4836), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:15,190 - root - INFO - Step 31510: lr=1.00E-05, loss= 1.1470 (max= 1.4836), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:15,190 - root - INFO - Step 31510: lr=1.00E-05, loss= 1.1470 (max= 1.4836), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:15,190 - root - INFO - Step 31510: lr=1.00E-05, loss= 1.1470 (max= 1.4836), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:31,105 - root - INFO - Step 31520: lr=1.00E-05, loss= 1.1481 (max= 1.7117), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:31,105 - root - INFO - Step 31520: lr=1.00E-05, loss= 1.1481 (max= 1.7117), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:31,105 - root - INFO - Step 31520: lr=1.00E-05, loss= 1.1481 (max= 1.7117), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:31,105 - root - INFO - Step 31520: lr=1.00E-05, loss= 1.1481 (max= 1.7117), tps=20593, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:31,106 - root - INFO - Step 31520: lr=1.00E-05, 
loss= 1.1481 (max= 1.7117), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:31,106 - root - INFO - Step 31520: lr=1.00E-05, loss= 1.1481 (max= 1.7117), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:31,106 - root - INFO - Step 31520: lr=1.00E-05, loss= 1.1481 (max= 1.7117), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:31,106 - root - INFO - Step 31520: lr=1.00E-05, loss= 1.1481 (max= 1.7117), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:33:47,051 - root - INFO - Step 31530: lr=1.00E-05, loss= 1.1253 (max= 1.5870), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:33:47,051 - root - INFO - Step 31530: lr=1.00E-05, loss= 1.1253 (max= 1.5870), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:33:47,051 - root - INFO - Step 31530: lr=1.00E-05, loss= 1.1253 (max= 1.5870), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:33:47,051 - root - INFO - Step 31530: lr=1.00E-05, loss= 1.1253 (max= 1.5870), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:33:47,051 - root - INFO - Step 31530: lr=1.00E-05, loss= 1.1253 (max= 1.5870), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:33:47,051 - root - INFO - Step 31530: lr=1.00E-05, loss= 1.1253 (max= 1.5870), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:33:47,051 - root - INFO - Step 31530: lr=1.00E-05, loss= 1.1253 (max= 1.5870), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:33:47,051 - root - INFO - Step 31530: lr=1.00E-05, loss= 1.1253 (max= 1.5870), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:03,027 - root - INFO - Step 31540: lr=1.00E-05, loss= 1.1355 (max= 1.5720), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:03,027 - root - INFO - Step 31540: lr=1.00E-05, loss= 1.1355 (max= 1.5720), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:03,027 - root - INFO - Step 31540: lr=1.00E-05, loss= 1.1355 (max= 1.5720), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:03,027 - root - INFO - Step 31540: lr=1.00E-05, loss= 1.1355 (max= 1.5720), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:03,027 - root - INFO - Step 31540: lr=1.00E-05, loss= 1.1355 (max= 1.5720), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:03,027 - root - INFO - Step 31540: lr=1.00E-05, loss= 1.1355 (max= 1.5720), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:03,027 - root - INFO - Step 31540: lr=1.00E-05, loss= 1.1355 (max= 1.5720), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:03,027 - root - INFO - Step 31540: lr=1.00E-05, loss= 1.1355 (max= 1.5720), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:18,997 - root - INFO - Step 31550: lr=1.00E-05, loss= 1.1639 (max= 1.5540), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:18,997 - root - INFO - Step 31550: lr=1.00E-05, loss= 1.1639 (max= 1.5540), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:18,997 - root - INFO - Step 31550: lr=1.00E-05, loss= 1.1639 (max= 1.5540), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:18,997 - root - INFO - Step 31550: lr=1.00E-05, loss= 1.1639 (max= 1.5540), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:18,997 - root - INFO - Step 31550: lr=1.00E-05, loss= 1.1639 (max= 1.5540), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:18,997 - root - INFO - Step 31550: lr=1.00E-05, loss= 1.1639 (max= 1.5540), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:18,997 - root - INFO - Step 31550: lr=1.00E-05, loss= 1.1639 (max= 1.5540), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:18,997 - root - INFO - Step 31550: lr=1.00E-05, loss= 1.1639 (max= 1.5540), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:34:34,970 - root - INFO - Step 31560: lr=1.00E-05, loss= 1.1536 (max= 1.5829), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:34,970 - root - INFO - Step 31560: lr=1.00E-05, loss= 1.1536 (max= 1.5829), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:34,970 - root - INFO - Step 31560: lr=1.00E-05, loss= 1.1536 (max= 1.5829), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:34,970 - root - INFO - Step 31560: lr=1.00E-05, loss= 1.1536 (max= 1.5829), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:34,970 - root - INFO - Step 31560: lr=1.00E-05, loss= 1.1536 (max= 1.5829), 
tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:34,970 - root - INFO - Step 31560: lr=1.00E-05, loss= 1.1536 (max= 1.5829), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:34,970 - root - INFO - Step 31560: lr=1.00E-05, loss= 1.1536 (max= 1.5829), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:34,970 - root - INFO - Step 31560: lr=1.00E-05, loss= 1.1536 (max= 1.5829), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:50,916 - root - INFO - Step 31570: lr=1.00E-05, loss= 1.1529 (max= 1.5543), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:50,916 - root - INFO - Step 31570: lr=1.00E-05, loss= 1.1529 (max= 1.5543), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:50,916 - root - INFO - Step 31570: lr=1.00E-05, loss= 1.1529 (max= 1.5543), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:50,916 - root - INFO - Step 31570: lr=1.00E-05, loss= 1.1529 (max= 1.5543), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:50,917 - root - INFO - Step 31570: lr=1.00E-05, loss= 1.1529 (max= 1.5543), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:50,917 - root - INFO - Step 31570: lr=1.00E-05, loss= 1.1529 (max= 1.5543), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:50,917 - root - INFO - Step 31570: lr=1.00E-05, loss= 1.1529 (max= 1.5543), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:34:50,917 - root - INFO - Step 
31570: lr=1.00E-05, loss= 1.1529 (max= 1.5543), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:35:06,844 - root - INFO - Step 31580: lr=1.00E-05, loss= 1.1451 (max= 1.6311), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:06,844 - root - INFO - Step 31580: lr=1.00E-05, loss= 1.1451 (max= 1.6311), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:06,844 - root - INFO - Step 31580: lr=1.00E-05, loss= 1.1451 (max= 1.6311), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:06,844 - root - INFO - Step 31580: lr=1.00E-05, loss= 1.1451 (max= 1.6311), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:06,844 - root - INFO - Step 31580: lr=1.00E-05, loss= 1.1451 (max= 1.6311), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:06,845 - root - INFO - Step 31580: lr=1.00E-05, loss= 1.1451 (max= 1.6311), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:06,845 - root - INFO - Step 31580: lr=1.00E-05, loss= 1.1451 (max= 1.6311), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:06,845 - root - INFO - Step 31580: lr=1.00E-05, loss= 1.1451 (max= 1.6311), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:22,785 - root - INFO - Step 31590: lr=1.00E-05, loss= 1.1735 (max= 1.5731), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:35:22,785 - root - INFO - Step 31590: lr=1.00E-05, loss= 1.1735 (max= 1.5731), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 00:35:22,785 - root - INFO - Step 31590: lr=1.00E-05, loss= 1.1735 (max= 1.5731), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:35:22,785 - root - INFO - Step 31590: lr=1.00E-05, loss= 1.1735 (max= 1.5731), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:35:22,785 - root - INFO - Step 31590: lr=1.00E-05, loss= 1.1735 (max= 1.5731), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:35:22,785 - root - INFO - Step 31590: lr=1.00E-05, loss= 1.1735 (max= 1.5731), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:35:22,785 - root - INFO - Step 31590: lr=1.00E-05, loss= 1.1735 (max= 1.5731), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:35:22,785 - root - INFO - Step 31590: lr=1.00E-05, loss= 1.1735 (max= 1.5731), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:35:38,738 - root - INFO - Step 31600: lr=1.00E-05, loss= 1.1368 (max= 1.4426), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:38,738 - root - INFO - Step 31600: lr=1.00E-05, loss= 1.1368 (max= 1.4426), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:38,738 - root - INFO - Step 31600: lr=1.00E-05, loss= 1.1368 (max= 1.4426), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:38,738 - root - INFO - Step 31600: lr=1.00E-05, loss= 1.1368 (max= 1.4426), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:38,738 - root - INFO - Step 31600: lr=1.00E-05, loss= 1.1368 (max= 1.4426), tps=20545, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:38,738 - root - INFO - Step 31600: lr=1.00E-05, loss= 1.1368 (max= 1.4426), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:38,738 - root - INFO - Step 31600: lr=1.00E-05, loss= 1.1368 (max= 1.4426), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:38,738 - root - INFO - Step 31600: lr=1.00E-05, loss= 1.1368 (max= 1.4426), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:54,691 - root - INFO - Step 31610: lr=1.00E-05, loss= 1.1368 (max= 1.8227), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:54,691 - root - INFO - Step 31610: lr=1.00E-05, loss= 1.1368 (max= 1.8227), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:54,691 - root - INFO - Step 31610: lr=1.00E-05, loss= 1.1368 (max= 1.8227), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:54,691 - root - INFO - Step 31610: lr=1.00E-05, loss= 1.1368 (max= 1.8227), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:54,691 - root - INFO - Step 31610: lr=1.00E-05, loss= 1.1368 (max= 1.8227), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:54,691 - root - INFO - Step 31610: lr=1.00E-05, loss= 1.1368 (max= 1.8227), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:54,691 - root - INFO - Step 31610: lr=1.00E-05, loss= 1.1368 (max= 1.8227), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:35:54,691 - root - INFO - Step 31610: lr=1.00E-05, loss= 1.1368 
(max= 1.8227), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:10,632 - root - INFO - Step 31620: lr=1.00E-05, loss= 1.1779 (max= 1.5331), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:36:10,632 - root - INFO - Step 31620: lr=1.00E-05, loss= 1.1779 (max= 1.5331), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:36:10,632 - root - INFO - Step 31620: lr=1.00E-05, loss= 1.1779 (max= 1.5331), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:36:10,632 - root - INFO - Step 31620: lr=1.00E-05, loss= 1.1779 (max= 1.5331), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:36:10,632 - root - INFO - Step 31620: lr=1.00E-05, loss= 1.1779 (max= 1.5331), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:36:10,632 - root - INFO - Step 31620: lr=1.00E-05, loss= 1.1779 (max= 1.5331), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:36:10,632 - root - INFO - Step 31620: lr=1.00E-05, loss= 1.1779 (max= 1.5331), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:36:10,632 - root - INFO - Step 31620: lr=1.00E-05, loss= 1.1779 (max= 1.5331), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:36:26,597 - root - INFO - Step 31630: lr=1.00E-05, loss= 1.1681 (max= 1.5057), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:26,597 - root - INFO - Step 31630: lr=1.00E-05, loss= 1.1681 (max= 1.5057), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:26,597 - root 
- INFO - Step 31630: lr=1.00E-05, loss= 1.1681 (max= 1.5057), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:26,597 - root - INFO - Step 31630: lr=1.00E-05, loss= 1.1681 (max= 1.5057), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:26,598 - root - INFO - Step 31630: lr=1.00E-05, loss= 1.1681 (max= 1.5057), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:26,598 - root - INFO - Step 31630: lr=1.00E-05, loss= 1.1681 (max= 1.5057), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:26,598 - root - INFO - Step 31630: lr=1.00E-05, loss= 1.1681 (max= 1.5057), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:26,598 - root - INFO - Step 31630: lr=1.00E-05, loss= 1.1681 (max= 1.5057), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:42,575 - root - INFO - Step 31640: lr=1.00E-05, loss= 1.1525 (max= 1.5991), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:42,575 - root - INFO - Step 31640: lr=1.00E-05, loss= 1.1525 (max= 1.5991), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:42,575 - root - INFO - Step 31640: lr=1.00E-05, loss= 1.1525 (max= 1.5991), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:42,575 - root - INFO - Step 31640: lr=1.00E-05, loss= 1.1525 (max= 1.5991), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:42,575 - root - INFO - Step 31640: lr=1.00E-05, loss= 1.1525 (max= 1.5991), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:36:42,575 - root - INFO - Step 31640: lr=1.00E-05, loss= 1.1525 (max= 1.5991), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:42,575 - root - INFO - Step 31640: lr=1.00E-05, loss= 1.1525 (max= 1.5991), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:42,575 - root - INFO - Step 31640: lr=1.00E-05, loss= 1.1525 (max= 1.5991), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:43,328 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2924620 +2025-10-25 00:36:58,515 - root - INFO - Step 31650: lr=1.00E-05, loss= 1.1528 (max= 1.5326), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:58,515 - root - INFO - Step 31650: lr=1.00E-05, loss= 1.1528 (max= 1.5326), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:58,515 - root - INFO - Step 31650: lr=1.00E-05, loss= 1.1528 (max= 1.5326), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:58,515 - root - INFO - Step 31650: lr=1.00E-05, loss= 1.1528 (max= 1.5326), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:58,515 - root - INFO - Step 31650: lr=1.00E-05, loss= 1.1528 (max= 1.5326), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:58,515 - root - INFO - Step 31650: lr=1.00E-05, loss= 1.1528 (max= 1.5326), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:58,515 - root - INFO - Step 31650: lr=1.00E-05, loss= 1.1528 (max= 1.5326), tps=20561, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:36:58,515 - root - INFO - Step 31650: lr=1.00E-05, loss= 1.1528 (max= 1.5326), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:14,452 - root - INFO - Step 31660: lr=1.00E-05, loss= 1.1436 (max= 1.4902), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:14,452 - root - INFO - Step 31660: lr=1.00E-05, loss= 1.1436 (max= 1.4902), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:14,452 - root - INFO - Step 31660: lr=1.00E-05, loss= 1.1436 (max= 1.4902), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:14,452 - root - INFO - Step 31660: lr=1.00E-05, loss= 1.1436 (max= 1.4902), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:14,452 - root - INFO - Step 31660: lr=1.00E-05, loss= 1.1436 (max= 1.4902), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:14,452 - root - INFO - Step 31660: lr=1.00E-05, loss= 1.1436 (max= 1.4902), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:14,452 - root - INFO - Step 31660: lr=1.00E-05, loss= 1.1436 (max= 1.4902), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:14,452 - root - INFO - Step 31660: lr=1.00E-05, loss= 1.1436 (max= 1.4902), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:30,336 - root - INFO - Step 31670: lr=1.00E-05, loss= 1.1612 (max= 1.5245), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:30,336 - root - INFO - Step 31670: lr=1.00E-05, 
loss= 1.1612 (max= 1.5245), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:30,336 - root - INFO - Step 31670: lr=1.00E-05, loss= 1.1612 (max= 1.5245), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:30,336 - root - INFO - Step 31670: lr=1.00E-05, loss= 1.1612 (max= 1.5245), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:30,336 - root - INFO - Step 31670: lr=1.00E-05, loss= 1.1612 (max= 1.5245), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:30,336 - root - INFO - Step 31670: lr=1.00E-05, loss= 1.1612 (max= 1.5245), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:30,336 - root - INFO - Step 31670: lr=1.00E-05, loss= 1.1612 (max= 1.5245), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:30,336 - root - INFO - Step 31670: lr=1.00E-05, loss= 1.1612 (max= 1.5245), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:46,269 - root - INFO - Step 31680: lr=1.00E-05, loss= 1.1602 (max= 1.5966), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:46,269 - root - INFO - Step 31680: lr=1.00E-05, loss= 1.1602 (max= 1.5966), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:46,270 - root - INFO - Step 31680: lr=1.00E-05, loss= 1.1602 (max= 1.5966), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:46,270 - root - INFO - Step 31680: lr=1.00E-05, loss= 1.1602 (max= 1.5966), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:37:46,270 - root - INFO - Step 31680: lr=1.00E-05, loss= 1.1602 (max= 1.5966), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:46,270 - root - INFO - Step 31680: lr=1.00E-05, loss= 1.1602 (max= 1.5966), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:46,270 - root - INFO - Step 31680: lr=1.00E-05, loss= 1.1602 (max= 1.5966), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:37:46,270 - root - INFO - Step 31680: lr=1.00E-05, loss= 1.1602 (max= 1.5966), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:38:02,171 - root - INFO - Step 31690: lr=1.00E-05, loss= 1.1638 (max= 1.8468), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:02,171 - root - INFO - Step 31690: lr=1.00E-05, loss= 1.1638 (max= 1.8468), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:02,171 - root - INFO - Step 31690: lr=1.00E-05, loss= 1.1638 (max= 1.8468), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:02,171 - root - INFO - Step 31690: lr=1.00E-05, loss= 1.1638 (max= 1.8468), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:02,172 - root - INFO - Step 31690: lr=1.00E-05, loss= 1.1638 (max= 1.8468), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:02,172 - root - INFO - Step 31690: lr=1.00E-05, loss= 1.1638 (max= 1.8468), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:02,172 - root - INFO - Step 31690: lr=1.00E-05, loss= 1.1638 (max= 1.8468), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:02,172 - root - INFO - Step 31690: lr=1.00E-05, loss= 1.1638 (max= 1.8468), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:18,091 - root - INFO - Step 31700: lr=1.00E-05, loss= 1.1275 (max= 1.4765), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:18,091 - root - INFO - Step 31700: lr=1.00E-05, loss= 1.1275 (max= 1.4765), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:18,091 - root - INFO - Step 31700: lr=1.00E-05, loss= 1.1275 (max= 1.4765), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:18,091 - root - INFO - Step 31700: lr=1.00E-05, loss= 1.1275 (max= 1.4765), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:18,091 - root - INFO - Step 31700: lr=1.00E-05, loss= 1.1275 (max= 1.4765), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:18,091 - root - INFO - Step 31700: lr=1.00E-05, loss= 1.1275 (max= 1.4765), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:18,091 - root - INFO - Step 31700: lr=1.00E-05, loss= 1.1275 (max= 1.4765), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:18,091 - root - INFO - Step 31700: lr=1.00E-05, loss= 1.1275 (max= 1.4765), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:34,006 - root - INFO - Step 31710: lr=1.00E-05, loss= 1.1450 (max= 1.6625), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:38:34,006 - root - INFO - Step 31710: lr=1.00E-05, loss= 1.1450 (max= 1.6625), 
tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:38:34,006 - root - INFO - Step 31710: lr=1.00E-05, loss= 1.1450 (max= 1.6625), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:38:34,006 - root - INFO - Step 31710: lr=1.00E-05, loss= 1.1450 (max= 1.6625), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:38:34,006 - root - INFO - Step 31710: lr=1.00E-05, loss= 1.1450 (max= 1.6625), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:38:34,006 - root - INFO - Step 31710: lr=1.00E-05, loss= 1.1450 (max= 1.6625), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:38:34,006 - root - INFO - Step 31710: lr=1.00E-05, loss= 1.1450 (max= 1.6625), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:38:34,006 - root - INFO - Step 31710: lr=1.00E-05, loss= 1.1450 (max= 1.6625), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:38:49,994 - root - INFO - Step 31720: lr=1.00E-05, loss= 1.1572 (max= 1.5055), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:49,994 - root - INFO - Step 31720: lr=1.00E-05, loss= 1.1572 (max= 1.5055), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:49,994 - root - INFO - Step 31720: lr=1.00E-05, loss= 1.1572 (max= 1.5055), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:49,994 - root - INFO - Step 31720: lr=1.00E-05, loss= 1.1572 (max= 1.5055), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:49,994 - root - INFO - Step 
31720: lr=1.00E-05, loss= 1.1572 (max= 1.5055), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:49,994 - root - INFO - Step 31720: lr=1.00E-05, loss= 1.1572 (max= 1.5055), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:49,994 - root - INFO - Step 31720: lr=1.00E-05, loss= 1.1572 (max= 1.5055), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:38:49,994 - root - INFO - Step 31720: lr=1.00E-05, loss= 1.1572 (max= 1.5055), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:05,936 - root - INFO - Step 31730: lr=1.00E-05, loss= 1.1693 (max= 1.5771), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:05,937 - root - INFO - Step 31730: lr=1.00E-05, loss= 1.1693 (max= 1.5771), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:05,937 - root - INFO - Step 31730: lr=1.00E-05, loss= 1.1693 (max= 1.5771), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:05,937 - root - INFO - Step 31730: lr=1.00E-05, loss= 1.1693 (max= 1.5771), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:05,937 - root - INFO - Step 31730: lr=1.00E-05, loss= 1.1693 (max= 1.5771), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:05,937 - root - INFO - Step 31730: lr=1.00E-05, loss= 1.1693 (max= 1.5771), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:05,937 - root - INFO - Step 31730: lr=1.00E-05, loss= 1.1693 (max= 1.5771), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 00:39:05,937 - root - INFO - Step 31730: lr=1.00E-05, loss= 1.1693 (max= 1.5771), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:21,918 - root - INFO - Step 31740: lr=1.00E-05, loss= 1.1686 (max= 1.7597), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:21,918 - root - INFO - Step 31740: lr=1.00E-05, loss= 1.1686 (max= 1.7597), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:21,918 - root - INFO - Step 31740: lr=1.00E-05, loss= 1.1686 (max= 1.7597), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:21,918 - root - INFO - Step 31740: lr=1.00E-05, loss= 1.1686 (max= 1.7597), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:21,918 - root - INFO - Step 31740: lr=1.00E-05, loss= 1.1686 (max= 1.7597), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:21,918 - root - INFO - Step 31740: lr=1.00E-05, loss= 1.1686 (max= 1.7597), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:21,918 - root - INFO - Step 31740: lr=1.00E-05, loss= 1.1686 (max= 1.7597), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:21,918 - root - INFO - Step 31740: lr=1.00E-05, loss= 1.1686 (max= 1.7597), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:37,856 - root - INFO - Step 31750: lr=1.00E-05, loss= 1.1658 (max= 1.6043), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:37,856 - root - INFO - Step 31750: lr=1.00E-05, loss= 1.1658 (max= 1.6043), tps=20564, mfu=42.84%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:37,856 - root - INFO - Step 31750: lr=1.00E-05, loss= 1.1658 (max= 1.6043), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:37,856 - root - INFO - Step 31750: lr=1.00E-05, loss= 1.1658 (max= 1.6043), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:37,856 - root - INFO - Step 31750: lr=1.00E-05, loss= 1.1658 (max= 1.6043), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:37,856 - root - INFO - Step 31750: lr=1.00E-05, loss= 1.1658 (max= 1.6043), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:37,856 - root - INFO - Step 31750: lr=1.00E-05, loss= 1.1658 (max= 1.6043), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:37,856 - root - INFO - Step 31750: lr=1.00E-05, loss= 1.1658 (max= 1.6043), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:39:53,803 - root - INFO - Step 31760: lr=1.00E-05, loss= 1.1376 (max= 1.4727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:53,803 - root - INFO - Step 31760: lr=1.00E-05, loss= 1.1376 (max= 1.4727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:53,803 - root - INFO - Step 31760: lr=1.00E-05, loss= 1.1376 (max= 1.4727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:53,803 - root - INFO - Step 31760: lr=1.00E-05, loss= 1.1376 (max= 1.4727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:53,803 - root - INFO - Step 31760: lr=1.00E-05, loss= 1.1376 
(max= 1.4727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:53,803 - root - INFO - Step 31760: lr=1.00E-05, loss= 1.1376 (max= 1.4727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:53,803 - root - INFO - Step 31760: lr=1.00E-05, loss= 1.1376 (max= 1.4727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:39:53,803 - root - INFO - Step 31760: lr=1.00E-05, loss= 1.1376 (max= 1.4727), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:09,762 - root - INFO - Step 31770: lr=1.00E-05, loss= 1.1753 (max= 1.5876), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:09,762 - root - INFO - Step 31770: lr=1.00E-05, loss= 1.1753 (max= 1.5876), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:09,762 - root - INFO - Step 31770: lr=1.00E-05, loss= 1.1753 (max= 1.5876), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:09,762 - root - INFO - Step 31770: lr=1.00E-05, loss= 1.1753 (max= 1.5876), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:09,762 - root - INFO - Step 31770: lr=1.00E-05, loss= 1.1753 (max= 1.5876), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:09,762 - root - INFO - Step 31770: lr=1.00E-05, loss= 1.1753 (max= 1.5876), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:09,762 - root - INFO - Step 31770: lr=1.00E-05, loss= 1.1753 (max= 1.5876), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:09,762 - root 
- INFO - Step 31770: lr=1.00E-05, loss= 1.1753 (max= 1.5876), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:25,678 - root - INFO - Step 31780: lr=1.00E-05, loss= 1.1544 (max= 1.4585), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:25,678 - root - INFO - Step 31780: lr=1.00E-05, loss= 1.1544 (max= 1.4585), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:25,678 - root - INFO - Step 31780: lr=1.00E-05, loss= 1.1544 (max= 1.4585), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:25,678 - root - INFO - Step 31780: lr=1.00E-05, loss= 1.1544 (max= 1.4585), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:25,678 - root - INFO - Step 31780: lr=1.00E-05, loss= 1.1544 (max= 1.4585), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:25,678 - root - INFO - Step 31780: lr=1.00E-05, loss= 1.1544 (max= 1.4585), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:25,678 - root - INFO - Step 31780: lr=1.00E-05, loss= 1.1544 (max= 1.4585), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:25,678 - root - INFO - Step 31780: lr=1.00E-05, loss= 1.1544 (max= 1.4585), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:40:41,611 - root - INFO - Step 31790: lr=1.00E-05, loss= 1.1530 (max= 1.5516), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:41,611 - root - INFO - Step 31790: lr=1.00E-05, loss= 1.1530 (max= 1.5516), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:40:41,611 - root - INFO - Step 31790: lr=1.00E-05, loss= 1.1530 (max= 1.5516), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:41,611 - root - INFO - Step 31790: lr=1.00E-05, loss= 1.1530 (max= 1.5516), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:41,611 - root - INFO - Step 31790: lr=1.00E-05, loss= 1.1530 (max= 1.5516), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:41,611 - root - INFO - Step 31790: lr=1.00E-05, loss= 1.1530 (max= 1.5516), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:41,611 - root - INFO - Step 31790: lr=1.00E-05, loss= 1.1530 (max= 1.5516), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:41,611 - root - INFO - Step 31790: lr=1.00E-05, loss= 1.1530 (max= 1.5516), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:57,565 - root - INFO - Step 31800: lr=1.00E-05, loss= 1.1348 (max= 1.5428), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:57,566 - root - INFO - Step 31800: lr=1.00E-05, loss= 1.1348 (max= 1.5428), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:57,566 - root - INFO - Step 31800: lr=1.00E-05, loss= 1.1348 (max= 1.5428), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:57,566 - root - INFO - Step 31800: lr=1.00E-05, loss= 1.1348 (max= 1.5428), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:57,566 - root - INFO - Step 31800: lr=1.00E-05, loss= 1.1348 (max= 1.5428), tps=20543, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:57,566 - root - INFO - Step 31800: lr=1.00E-05, loss= 1.1348 (max= 1.5428), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:57,566 - root - INFO - Step 31800: lr=1.00E-05, loss= 1.1348 (max= 1.5428), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:40:57,566 - root - INFO - Step 31800: lr=1.00E-05, loss= 1.1348 (max= 1.5428), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:41:13,490 - root - INFO - Step 31810: lr=1.00E-05, loss= 1.1537 (max= 1.5832), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:13,490 - root - INFO - Step 31810: lr=1.00E-05, loss= 1.1537 (max= 1.5832), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:13,490 - root - INFO - Step 31810: lr=1.00E-05, loss= 1.1537 (max= 1.5832), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:13,490 - root - INFO - Step 31810: lr=1.00E-05, loss= 1.1537 (max= 1.5832), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:13,490 - root - INFO - Step 31810: lr=1.00E-05, loss= 1.1537 (max= 1.5832), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:13,490 - root - INFO - Step 31810: lr=1.00E-05, loss= 1.1537 (max= 1.5832), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:13,490 - root - INFO - Step 31810: lr=1.00E-05, loss= 1.1537 (max= 1.5832), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:13,490 - root - INFO - Step 31810: lr=1.00E-05, 
loss= 1.1537 (max= 1.5832), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:29,492 - root - INFO - Step 31820: lr=1.00E-05, loss= 1.1482 (max= 1.5727), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:29,492 - root - INFO - Step 31820: lr=1.00E-05, loss= 1.1482 (max= 1.5727), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:29,492 - root - INFO - Step 31820: lr=1.00E-05, loss= 1.1482 (max= 1.5727), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:29,492 - root - INFO - Step 31820: lr=1.00E-05, loss= 1.1482 (max= 1.5727), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:29,492 - root - INFO - Step 31820: lr=1.00E-05, loss= 1.1482 (max= 1.5727), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:29,492 - root - INFO - Step 31820: lr=1.00E-05, loss= 1.1482 (max= 1.5727), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:29,492 - root - INFO - Step 31820: lr=1.00E-05, loss= 1.1482 (max= 1.5727), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:29,492 - root - INFO - Step 31820: lr=1.00E-05, loss= 1.1482 (max= 1.5727), tps=20481, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:45,399 - root - INFO - Step 31830: lr=1.00E-05, loss= 1.1794 (max= 1.6553), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:45,399 - root - INFO - Step 31830: lr=1.00E-05, loss= 1.1794 (max= 1.6553), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
00:41:45,399 - root - INFO - Step 31830: lr=1.00E-05, loss= 1.1794 (max= 1.6553), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:45,399 - root - INFO - Step 31830: lr=1.00E-05, loss= 1.1794 (max= 1.6553), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:45,399 - root - INFO - Step 31830: lr=1.00E-05, loss= 1.1794 (max= 1.6553), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:45,399 - root - INFO - Step 31830: lr=1.00E-05, loss= 1.1794 (max= 1.6553), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:45,399 - root - INFO - Step 31830: lr=1.00E-05, loss= 1.1794 (max= 1.6553), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:41:45,399 - root - INFO - Step 31830: lr=1.00E-05, loss= 1.1794 (max= 1.6553), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:01,344 - root - INFO - Step 31840: lr=1.00E-05, loss= 1.1707 (max= 1.5322), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:42:01,344 - root - INFO - Step 31840: lr=1.00E-05, loss= 1.1707 (max= 1.5322), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:42:01,344 - root - INFO - Step 31840: lr=1.00E-05, loss= 1.1707 (max= 1.5322), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:42:01,344 - root - INFO - Step 31840: lr=1.00E-05, loss= 1.1707 (max= 1.5322), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:42:01,344 - root - INFO - Step 31840: lr=1.00E-05, loss= 1.1707 (max= 1.5322), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:42:01,344 - root - INFO - Step 31840: lr=1.00E-05, loss= 1.1707 (max= 1.5322), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:42:01,344 - root - INFO - Step 31840: lr=1.00E-05, loss= 1.1707 (max= 1.5322), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:42:01,344 - root - INFO - Step 31840: lr=1.00E-05, loss= 1.1707 (max= 1.5322), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:42:17,304 - root - INFO - Step 31850: lr=1.00E-05, loss= 1.1489 (max= 1.5000), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:17,304 - root - INFO - Step 31850: lr=1.00E-05, loss= 1.1489 (max= 1.5000), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:17,304 - root - INFO - Step 31850: lr=1.00E-05, loss= 1.1489 (max= 1.5000), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:17,304 - root - INFO - Step 31850: lr=1.00E-05, loss= 1.1489 (max= 1.5000), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:17,304 - root - INFO - Step 31850: lr=1.00E-05, loss= 1.1489 (max= 1.5000), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:17,304 - root - INFO - Step 31850: lr=1.00E-05, loss= 1.1489 (max= 1.5000), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:17,304 - root - INFO - Step 31850: lr=1.00E-05, loss= 1.1489 (max= 1.5000), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:17,304 - root - INFO - Step 31850: lr=1.00E-05, loss= 1.1489 (max= 1.5000), 
tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:33,147 - root - INFO - Step 31860: lr=1.00E-05, loss= 1.1437 (max= 1.5921), tps=20687, mfu=43.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:33,147 - root - INFO - Step 31860: lr=1.00E-05, loss= 1.1437 (max= 1.5921), tps=20687, mfu=43.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:33,147 - root - INFO - Step 31860: lr=1.00E-05, loss= 1.1437 (max= 1.5921), tps=20687, mfu=43.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:33,147 - root - INFO - Step 31860: lr=1.00E-05, loss= 1.1437 (max= 1.5921), tps=20687, mfu=43.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:33,147 - root - INFO - Step 31860: lr=1.00E-05, loss= 1.1437 (max= 1.5921), tps=20687, mfu=43.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:33,147 - root - INFO - Step 31860: lr=1.00E-05, loss= 1.1437 (max= 1.5921), tps=20687, mfu=43.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:33,147 - root - INFO - Step 31860: lr=1.00E-05, loss= 1.1437 (max= 1.5921), tps=20687, mfu=43.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:33,147 - root - INFO - Step 31860: lr=1.00E-05, loss= 1.1437 (max= 1.5921), tps=20687, mfu=43.10%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:49,093 - root - INFO - Step 31870: lr=1.00E-05, loss= 1.1552 (max= 1.5035), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:49,093 - root - INFO - Step 31870: lr=1.00E-05, loss= 1.1552 (max= 1.5035), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:49,093 - root - INFO - Step 
31870: lr=1.00E-05, loss= 1.1552 (max= 1.5035), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:49,093 - root - INFO - Step 31870: lr=1.00E-05, loss= 1.1552 (max= 1.5035), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:49,093 - root - INFO - Step 31870: lr=1.00E-05, loss= 1.1552 (max= 1.5035), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:49,093 - root - INFO - Step 31870: lr=1.00E-05, loss= 1.1552 (max= 1.5035), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:49,093 - root - INFO - Step 31870: lr=1.00E-05, loss= 1.1552 (max= 1.5035), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:42:49,093 - root - INFO - Step 31870: lr=1.00E-05, loss= 1.1552 (max= 1.5035), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:04,996 - root - INFO - Step 31880: lr=1.00E-05, loss= 1.1604 (max= 1.6923), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:04,996 - root - INFO - Step 31880: lr=1.00E-05, loss= 1.1604 (max= 1.6923), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:04,996 - root - INFO - Step 31880: lr=1.00E-05, loss= 1.1604 (max= 1.6923), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:04,996 - root - INFO - Step 31880: lr=1.00E-05, loss= 1.1604 (max= 1.6923), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:04,996 - root - INFO - Step 31880: lr=1.00E-05, loss= 1.1604 (max= 1.6923), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 00:43:04,996 - root - INFO - Step 31880: lr=1.00E-05, loss= 1.1604 (max= 1.6923), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:04,996 - root - INFO - Step 31880: lr=1.00E-05, loss= 1.1604 (max= 1.6923), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:04,996 - root - INFO - Step 31880: lr=1.00E-05, loss= 1.1604 (max= 1.6923), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:20,957 - root - INFO - Step 31890: lr=1.00E-05, loss= 1.1434 (max= 1.4317), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:20,957 - root - INFO - Step 31890: lr=1.00E-05, loss= 1.1434 (max= 1.4317), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:20,957 - root - INFO - Step 31890: lr=1.00E-05, loss= 1.1434 (max= 1.4317), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:20,957 - root - INFO - Step 31890: lr=1.00E-05, loss= 1.1434 (max= 1.4317), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:20,957 - root - INFO - Step 31890: lr=1.00E-05, loss= 1.1434 (max= 1.4317), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:20,957 - root - INFO - Step 31890: lr=1.00E-05, loss= 1.1434 (max= 1.4317), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:20,957 - root - INFO - Step 31890: lr=1.00E-05, loss= 1.1434 (max= 1.4317), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:20,958 - root - INFO - Step 31890: lr=1.00E-05, loss= 1.1434 (max= 1.4317), tps=20534, mfu=42.78%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:43:36,824 - root - INFO - Step 31900: lr=1.00E-05, loss= 1.1652 (max= 1.5429), tps=20656, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:36,824 - root - INFO - Step 31900: lr=1.00E-05, loss= 1.1652 (max= 1.5429), tps=20656, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:36,824 - root - INFO - Step 31900: lr=1.00E-05, loss= 1.1652 (max= 1.5429), tps=20656, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:36,824 - root - INFO - Step 31900: lr=1.00E-05, loss= 1.1652 (max= 1.5429), tps=20656, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:36,824 - root - INFO - Step 31900: lr=1.00E-05, loss= 1.1652 (max= 1.5429), tps=20656, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:36,824 - root - INFO - Step 31900: lr=1.00E-05, loss= 1.1652 (max= 1.5429), tps=20656, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:36,824 - root - INFO - Step 31900: lr=1.00E-05, loss= 1.1652 (max= 1.5429), tps=20656, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:36,824 - root - INFO - Step 31900: lr=1.00E-05, loss= 1.1652 (max= 1.5429), tps=20656, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:40,736 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:577857 +2025-10-25 00:43:52,687 - root - INFO - Step 31910: lr=1.00E-05, loss= 1.1399 (max= 1.5261), tps=20661, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:52,687 - root - INFO - Step 31910: lr=1.00E-05, loss= 
1.1399 (max= 1.5261), tps=20661, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:52,687 - root - INFO - Step 31910: lr=1.00E-05, loss= 1.1399 (max= 1.5261), tps=20661, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:52,687 - root - INFO - Step 31910: lr=1.00E-05, loss= 1.1399 (max= 1.5261), tps=20661, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:52,687 - root - INFO - Step 31910: lr=1.00E-05, loss= 1.1399 (max= 1.5261), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:52,687 - root - INFO - Step 31910: lr=1.00E-05, loss= 1.1399 (max= 1.5261), tps=20661, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:52,687 - root - INFO - Step 31910: lr=1.00E-05, loss= 1.1399 (max= 1.5261), tps=20662, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:43:52,687 - root - INFO - Step 31910: lr=1.00E-05, loss= 1.1399 (max= 1.5261), tps=20661, mfu=43.05%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:08,625 - root - INFO - Step 31920: lr=1.00E-05, loss= 1.1881 (max= 1.5934), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:08,626 - root - INFO - Step 31920: lr=1.00E-05, loss= 1.1881 (max= 1.5934), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:08,626 - root - INFO - Step 31920: lr=1.00E-05, loss= 1.1881 (max= 1.5934), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:08,626 - root - INFO - Step 31920: lr=1.00E-05, loss= 1.1881 (max= 1.5934), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:08,626 
- root - INFO - Step 31920: lr=1.00E-05, loss= 1.1881 (max= 1.5934), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:08,626 - root - INFO - Step 31920: lr=1.00E-05, loss= 1.1881 (max= 1.5934), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:08,626 - root - INFO - Step 31920: lr=1.00E-05, loss= 1.1881 (max= 1.5934), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:08,626 - root - INFO - Step 31920: lr=1.00E-05, loss= 1.1881 (max= 1.5934), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:15,717 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:3284251 +2025-10-25 00:44:24,545 - root - INFO - Step 31930: lr=1.00E-05, loss= 1.1848 (max= 1.5799), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:24,545 - root - INFO - Step 31930: lr=1.00E-05, loss= 1.1848 (max= 1.5799), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:24,545 - root - INFO - Step 31930: lr=1.00E-05, loss= 1.1848 (max= 1.5799), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:24,545 - root - INFO - Step 31930: lr=1.00E-05, loss= 1.1848 (max= 1.5799), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:24,545 - root - INFO - Step 31930: lr=1.00E-05, loss= 1.1848 (max= 1.5799), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:24,545 - root - INFO - Step 31930: lr=1.00E-05, loss= 1.1848 (max= 1.5799), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:24,545 - root - INFO - Step 31930: lr=1.00E-05, loss= 1.1848 (max= 1.5799), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:24,545 - root - INFO - Step 31930: lr=1.00E-05, loss= 1.1848 (max= 1.5799), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:40,449 - root - INFO - Step 31940: lr=1.00E-05, loss= 1.1382 (max= 1.5703), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:40,449 - root - INFO - Step 31940: lr=1.00E-05, loss= 1.1382 (max= 1.5703), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:40,449 - root - INFO - Step 31940: lr=1.00E-05, loss= 1.1382 (max= 1.5703), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:40,449 - root - INFO - Step 31940: lr=1.00E-05, loss= 1.1382 (max= 1.5703), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:40,449 - root - INFO - Step 31940: lr=1.00E-05, loss= 1.1382 (max= 1.5703), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:40,449 - root - INFO - Step 31940: lr=1.00E-05, loss= 1.1382 (max= 1.5703), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:40,449 - root - INFO - Step 31940: lr=1.00E-05, loss= 1.1382 (max= 1.5703), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:40,449 - root - INFO - Step 31940: lr=1.00E-05, loss= 1.1382 (max= 1.5703), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:44:56,387 - root - INFO - Step 31950: lr=1.00E-05, loss= 1.1680 (max= 1.6445), 
tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:44:56,387 - root - INFO - Step 31950: lr=1.00E-05, loss= 1.1680 (max= 1.6445), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:44:56,387 - root - INFO - Step 31950: lr=1.00E-05, loss= 1.1680 (max= 1.6445), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:44:56,387 - root - INFO - Step 31950: lr=1.00E-05, loss= 1.1680 (max= 1.6445), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:44:56,387 - root - INFO - Step 31950: lr=1.00E-05, loss= 1.1680 (max= 1.6445), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:44:56,387 - root - INFO - Step 31950: lr=1.00E-05, loss= 1.1680 (max= 1.6445), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:44:56,387 - root - INFO - Step 31950: lr=1.00E-05, loss= 1.1680 (max= 1.6445), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:44:56,387 - root - INFO - Step 31950: lr=1.00E-05, loss= 1.1680 (max= 1.6445), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:45:12,344 - root - INFO - Step 31960: lr=1.00E-05, loss= 1.1467 (max= 1.6359), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:45:12,344 - root - INFO - Step 31960: lr=1.00E-05, loss= 1.1467 (max= 1.6359), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:45:12,344 - root - INFO - Step 31960: lr=1.00E-05, loss= 1.1467 (max= 1.6359), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:45:12,344 - root - INFO - Step 
31960: lr=1.00E-05, loss= 1.1467 (max= 1.6359), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:45:12,344 - root - INFO - Step 31960: lr=1.00E-05, loss= 1.1467 (max= 1.6359), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:45:12,344 - root - INFO - Step 31960: lr=1.00E-05, loss= 1.1467 (max= 1.6359), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:45:12,344 - root - INFO - Step 31960: lr=1.00E-05, loss= 1.1467 (max= 1.6359), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:45:12,344 - root - INFO - Step 31960: lr=1.00E-05, loss= 1.1467 (max= 1.6359), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:45:16,261 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:563386 +2025-10-25 00:45:28,294 - root - INFO - Step 31970: lr=1.00E-05, loss= 1.1587 (max= 1.7806), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:28,294 - root - INFO - Step 31970: lr=1.00E-05, loss= 1.1587 (max= 1.7806), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:28,294 - root - INFO - Step 31970: lr=1.00E-05, loss= 1.1587 (max= 1.7806), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:28,294 - root - INFO - Step 31970: lr=1.00E-05, loss= 1.1587 (max= 1.7806), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:28,294 - root - INFO - Step 31970: lr=1.00E-05, loss= 1.1587 (max= 1.7806), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 
0.01%) +2025-10-25 00:45:28,294 - root - INFO - Step 31970: lr=1.00E-05, loss= 1.1587 (max= 1.7806), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:28,294 - root - INFO - Step 31970: lr=1.00E-05, loss= 1.1587 (max= 1.7806), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:28,295 - root - INFO - Step 31970: lr=1.00E-05, loss= 1.1587 (max= 1.7806), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:44,286 - root - INFO - Step 31980: lr=1.00E-05, loss= 1.1755 (max= 1.5373), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:44,286 - root - INFO - Step 31980: lr=1.00E-05, loss= 1.1755 (max= 1.5373), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:44,286 - root - INFO - Step 31980: lr=1.00E-05, loss= 1.1755 (max= 1.5373), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:44,286 - root - INFO - Step 31980: lr=1.00E-05, loss= 1.1755 (max= 1.5373), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:44,286 - root - INFO - Step 31980: lr=1.00E-05, loss= 1.1755 (max= 1.5373), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:44,286 - root - INFO - Step 31980: lr=1.00E-05, loss= 1.1755 (max= 1.5373), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:44,286 - root - INFO - Step 31980: lr=1.00E-05, loss= 1.1755 (max= 1.5373), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:45:44,286 - root - INFO - Step 31980: lr=1.00E-05, loss= 1.1755 (max= 1.5373), tps=20495, mfu=42.70%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:46:00,221 - root - INFO - Step 31990: lr=1.00E-05, loss= 1.1368 (max= 1.5335), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:00,221 - root - INFO - Step 31990: lr=1.00E-05, loss= 1.1368 (max= 1.5335), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:00,221 - root - INFO - Step 31990: lr=1.00E-05, loss= 1.1368 (max= 1.5335), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:00,221 - root - INFO - Step 31990: lr=1.00E-05, loss= 1.1368 (max= 1.5335), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:00,221 - root - INFO - Step 31990: lr=1.00E-05, loss= 1.1368 (max= 1.5335), tps=20569, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:00,221 - root - INFO - Step 31990: lr=1.00E-05, loss= 1.1368 (max= 1.5335), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:00,221 - root - INFO - Step 31990: lr=1.00E-05, loss= 1.1368 (max= 1.5335), tps=20569, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:00,221 - root - INFO - Step 31990: lr=1.00E-05, loss= 1.1368 (max= 1.5335), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-32000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-32000! 
Save time: 4.390704393386841 +2025-10-25 00:46:16,141 - root - INFO - Step 32000: lr=1.00E-05, loss= 1.1569 (max= 1.8030), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:46:16,141 - root - INFO - Step 32000: lr=1.00E-05, loss= 1.1569 (max= 1.8030), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:46:16,141 - root - INFO - Step 32000: lr=1.00E-05, loss= 1.1569 (max= 1.8030), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:46:16,141 - root - INFO - Step 32000: lr=1.00E-05, loss= 1.1569 (max= 1.8030), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:46:16,142 - root - INFO - Saving a full checkpoint at step 32000 +2025-10-25 00:46:16,142 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:46:16,142 - root - INFO - Saving a full checkpoint at step 32000 +2025-10-25 00:46:16,142 - root - INFO - Saving a full checkpoint at step 32000 +2025-10-25 00:46:16,142 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:46:16,142 - root - INFO - Saving a full checkpoint at step 32000 +2025-10-25 00:46:16,142 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:46:16,142 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:46:16,142 - root - INFO - Step 32000: lr=1.00E-05, loss= 1.1569 (max= 1.8030), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:46:16,142 - root - INFO - Step 32000: lr=1.00E-05, loss= 1.1569 (max= 1.8030), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:46:16,142 - root - INFO - Step 32000: lr=1.00E-05, loss= 1.1569 (max= 1.8030), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:46:16,142 - root - INFO - Saving a full checkpoint at step 32000 +2025-10-25 00:46:16,142 - root - INFO - Saving a full checkpoint at step 32000 +2025-10-25 00:46:16,142 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:46:16,142 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:46:16,142 - root - INFO - Step 32000: lr=1.00E-05, loss= 1.1569 (max= 1.8030), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:46:16,142 - root - INFO - Saving a full checkpoint at step 32000 +2025-10-25 00:46:16,142 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:46:16,142 - root - INFO - Saving a full checkpoint at step 32000 +2025-10-25 00:46:16,142 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 00:46:29,879 - root - INFO - Finished saving the checkpoint in 13.74 seconds +2025-10-25 00:46:29,885 - root - INFO - Finished saving the checkpoint in 13.74 seconds +2025-10-25 00:46:29,886 - root - INFO - Finished saving the checkpoint in 13.74 seconds +2025-10-25 00:46:29,886 - root - INFO - Finished saving the checkpoint in 13.74 seconds +2025-10-25 00:46:29,886 - root - INFO - Finished saving the checkpoint in 13.74 seconds +2025-10-25 00:46:29,886 - root - INFO - Finished saving the checkpoint in 13.74 seconds +2025-10-25 00:46:29,888 - root - INFO - Finished saving the checkpoint in 13.75 seconds +2025-10-25 00:46:29,888 - root - INFO - Finished saving the checkpoint 
in 13.75 seconds +2025-10-25 00:46:45,812 - root - INFO - Step 32010: lr=1.00E-05, loss= 1.1611 (max= 1.5474), tps=11045, mfu=23.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:45,812 - root - INFO - Step 32010: lr=1.00E-05, loss= 1.1611 (max= 1.5474), tps=11045, mfu=23.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:45,812 - root - INFO - Step 32010: lr=1.00E-05, loss= 1.1611 (max= 1.5474), tps=11045, mfu=23.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:45,812 - root - INFO - Step 32010: lr=1.00E-05, loss= 1.1611 (max= 1.5474), tps=11045, mfu=23.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:45,812 - root - INFO - Step 32010: lr=1.00E-05, loss= 1.1611 (max= 1.5474), tps=11045, mfu=23.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:45,812 - root - INFO - Step 32010: lr=1.00E-05, loss= 1.1611 (max= 1.5474), tps=11045, mfu=23.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:45,812 - root - INFO - Step 32010: lr=1.00E-05, loss= 1.1611 (max= 1.5474), tps=11045, mfu=23.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:46:45,812 - root - INFO - Step 32010: lr=1.00E-05, loss= 1.1611 (max= 1.5474), tps=11045, mfu=23.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:01,715 - root - INFO - Step 32020: lr=1.00E-05, loss= 1.1397 (max= 1.5858), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:01,715 - root - INFO - Step 32020: lr=1.00E-05, loss= 1.1397 (max= 1.5858), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:01,715 - root - INFO - Step 32020: lr=1.00E-05, loss= 1.1397 (max= 1.5858), tps=20610, mfu=42.94%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:01,715 - root - INFO - Step 32020: lr=1.00E-05, loss= 1.1397 (max= 1.5858), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:01,715 - root - INFO - Step 32020: lr=1.00E-05, loss= 1.1397 (max= 1.5858), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:01,715 - root - INFO - Step 32020: lr=1.00E-05, loss= 1.1397 (max= 1.5858), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:01,715 - root - INFO - Step 32020: lr=1.00E-05, loss= 1.1397 (max= 1.5858), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:01,715 - root - INFO - Step 32020: lr=1.00E-05, loss= 1.1397 (max= 1.5858), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:17,683 - root - INFO - Step 32030: lr=1.00E-05, loss= 1.1704 (max= 1.5940), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:17,683 - root - INFO - Step 32030: lr=1.00E-05, loss= 1.1704 (max= 1.5940), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:17,683 - root - INFO - Step 32030: lr=1.00E-05, loss= 1.1704 (max= 1.5940), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:17,683 - root - INFO - Step 32030: lr=1.00E-05, loss= 1.1704 (max= 1.5940), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:17,683 - root - INFO - Step 32030: lr=1.00E-05, loss= 1.1704 (max= 1.5940), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:17,683 - root - INFO - Step 32030: lr=1.00E-05, 
loss= 1.1704 (max= 1.5940), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:17,683 - root - INFO - Step 32030: lr=1.00E-05, loss= 1.1704 (max= 1.5940), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:17,683 - root - INFO - Step 32030: lr=1.00E-05, loss= 1.1704 (max= 1.5940), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:33,583 - root - INFO - Step 32040: lr=1.00E-05, loss= 1.1499 (max= 1.5956), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:47:33,583 - root - INFO - Step 32040: lr=1.00E-05, loss= 1.1499 (max= 1.5956), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:47:33,583 - root - INFO - Step 32040: lr=1.00E-05, loss= 1.1499 (max= 1.5956), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:47:33,583 - root - INFO - Step 32040: lr=1.00E-05, loss= 1.1499 (max= 1.5956), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:47:33,583 - root - INFO - Step 32040: lr=1.00E-05, loss= 1.1499 (max= 1.5956), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:47:33,583 - root - INFO - Step 32040: lr=1.00E-05, loss= 1.1499 (max= 1.5956), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:47:33,583 - root - INFO - Step 32040: lr=1.00E-05, loss= 1.1499 (max= 1.5956), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:47:33,583 - root - INFO - Step 32040: lr=1.00E-05, loss= 1.1499 (max= 1.5956), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:47:49,467 - root - INFO - Step 32050: lr=1.00E-05, loss= 1.1794 (max= 1.4840), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:49,467 - root - INFO - Step 32050: lr=1.00E-05, loss= 1.1794 (max= 1.4840), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:49,467 - root - INFO - Step 32050: lr=1.00E-05, loss= 1.1794 (max= 1.4840), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:49,467 - root - INFO - Step 32050: lr=1.00E-05, loss= 1.1794 (max= 1.4840), tps=20634, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:49,467 - root - INFO - Step 32050: lr=1.00E-05, loss= 1.1794 (max= 1.4840), tps=20634, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:49,467 - root - INFO - Step 32050: lr=1.00E-05, loss= 1.1794 (max= 1.4840), tps=20634, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:49,467 - root - INFO - Step 32050: lr=1.00E-05, loss= 1.1794 (max= 1.4840), tps=20634, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:47:49,467 - root - INFO - Step 32050: lr=1.00E-05, loss= 1.1794 (max= 1.4840), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:05,349 - root - INFO - Step 32060: lr=1.00E-05, loss= 1.1596 (max= 1.8454), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:05,349 - root - INFO - Step 32060: lr=1.00E-05, loss= 1.1596 (max= 1.8454), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:05,349 - root - INFO - Step 32060: lr=1.00E-05, loss= 1.1596 (max= 1.8454), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:05,349 - root - INFO - Step 32060: lr=1.00E-05, loss= 1.1596 (max= 1.8454), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:05,350 - root - INFO - Step 32060: lr=1.00E-05, loss= 1.1596 (max= 1.8454), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:05,350 - root - INFO - Step 32060: lr=1.00E-05, loss= 1.1596 (max= 1.8454), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:05,350 - root - INFO - Step 32060: lr=1.00E-05, loss= 1.1596 (max= 1.8454), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:05,350 - root - INFO - Step 32060: lr=1.00E-05, loss= 1.1596 (max= 1.8454), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:21,321 - root - INFO - Step 32070: lr=1.00E-05, loss= 1.1463 (max= 1.6851), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:21,321 - root - INFO - Step 32070: lr=1.00E-05, loss= 1.1463 (max= 1.6851), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:21,321 - root - INFO - Step 32070: lr=1.00E-05, loss= 1.1463 (max= 1.6851), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:21,321 - root - INFO - Step 32070: lr=1.00E-05, loss= 1.1463 (max= 1.6851), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:21,321 - root - INFO - Step 32070: lr=1.00E-05, loss= 1.1463 (max= 1.6851), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:21,321 - root - INFO - Step 32070: lr=1.00E-05, loss= 1.1463 (max= 1.6851), 
tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:21,321 - root - INFO - Step 32070: lr=1.00E-05, loss= 1.1463 (max= 1.6851), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:21,321 - root - INFO - Step 32070: lr=1.00E-05, loss= 1.1463 (max= 1.6851), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:48:37,236 - root - INFO - Step 32080: lr=1.00E-05, loss= 1.1807 (max= 1.5809), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:37,236 - root - INFO - Step 32080: lr=1.00E-05, loss= 1.1807 (max= 1.5809), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:37,236 - root - INFO - Step 32080: lr=1.00E-05, loss= 1.1807 (max= 1.5809), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:37,236 - root - INFO - Step 32080: lr=1.00E-05, loss= 1.1807 (max= 1.5809), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:37,236 - root - INFO - Step 32080: lr=1.00E-05, loss= 1.1807 (max= 1.5809), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:37,236 - root - INFO - Step 32080: lr=1.00E-05, loss= 1.1807 (max= 1.5809), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:37,236 - root - INFO - Step 32080: lr=1.00E-05, loss= 1.1807 (max= 1.5809), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:37,236 - root - INFO - Step 32080: lr=1.00E-05, loss= 1.1807 (max= 1.5809), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:53,149 - root - INFO - Step 
32090: lr=1.00E-05, loss= 1.1651 (max= 1.5083), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:53,149 - root - INFO - Step 32090: lr=1.00E-05, loss= 1.1651 (max= 1.5083), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:53,149 - root - INFO - Step 32090: lr=1.00E-05, loss= 1.1651 (max= 1.5083), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:53,149 - root - INFO - Step 32090: lr=1.00E-05, loss= 1.1651 (max= 1.5083), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:53,149 - root - INFO - Step 32090: lr=1.00E-05, loss= 1.1651 (max= 1.5083), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:53,149 - root - INFO - Step 32090: lr=1.00E-05, loss= 1.1651 (max= 1.5083), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:53,149 - root - INFO - Step 32090: lr=1.00E-05, loss= 1.1651 (max= 1.5083), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:48:53,149 - root - INFO - Step 32090: lr=1.00E-05, loss= 1.1651 (max= 1.5083), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:49:09,114 - root - INFO - Step 32100: lr=1.00E-05, loss= 1.1517 (max= 1.5682), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:09,114 - root - INFO - Step 32100: lr=1.00E-05, loss= 1.1517 (max= 1.5682), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:09,114 - root - INFO - Step 32100: lr=1.00E-05, loss= 1.1517 (max= 1.5682), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 00:49:09,114 - root - INFO - Step 32100: lr=1.00E-05, loss= 1.1517 (max= 1.5682), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:09,114 - root - INFO - Step 32100: lr=1.00E-05, loss= 1.1517 (max= 1.5682), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:09,114 - root - INFO - Step 32100: lr=1.00E-05, loss= 1.1517 (max= 1.5682), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:09,114 - root - INFO - Step 32100: lr=1.00E-05, loss= 1.1517 (max= 1.5682), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:09,115 - root - INFO - Step 32100: lr=1.00E-05, loss= 1.1517 (max= 1.5682), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:25,036 - root - INFO - Step 32110: lr=1.00E-05, loss= 1.1752 (max= 1.5796), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:25,036 - root - INFO - Step 32110: lr=1.00E-05, loss= 1.1752 (max= 1.5796), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:25,036 - root - INFO - Step 32110: lr=1.00E-05, loss= 1.1752 (max= 1.5796), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:25,036 - root - INFO - Step 32110: lr=1.00E-05, loss= 1.1752 (max= 1.5796), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:25,036 - root - INFO - Step 32110: lr=1.00E-05, loss= 1.1752 (max= 1.5796), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:25,036 - root - INFO - Step 32110: lr=1.00E-05, loss= 1.1752 (max= 1.5796), tps=20586, mfu=42.89%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:25,036 - root - INFO - Step 32110: lr=1.00E-05, loss= 1.1752 (max= 1.5796), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:25,037 - root - INFO - Step 32110: lr=1.00E-05, loss= 1.1752 (max= 1.5796), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:40,978 - root - INFO - Step 32120: lr=1.00E-05, loss= 1.1753 (max= 1.7254), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:40,978 - root - INFO - Step 32120: lr=1.00E-05, loss= 1.1753 (max= 1.7254), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:40,978 - root - INFO - Step 32120: lr=1.00E-05, loss= 1.1753 (max= 1.7254), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:40,978 - root - INFO - Step 32120: lr=1.00E-05, loss= 1.1753 (max= 1.7254), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:40,978 - root - INFO - Step 32120: lr=1.00E-05, loss= 1.1753 (max= 1.7254), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:40,978 - root - INFO - Step 32120: lr=1.00E-05, loss= 1.1753 (max= 1.7254), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:40,978 - root - INFO - Step 32120: lr=1.00E-05, loss= 1.1753 (max= 1.7254), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:40,978 - root - INFO - Step 32120: lr=1.00E-05, loss= 1.1753 (max= 1.7254), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:49:56,942 - root - INFO - Step 32130: lr=1.00E-05, loss= 1.1570 
(max= 1.5235), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:49:56,942 - root - INFO - Step 32130: lr=1.00E-05, loss= 1.1570 (max= 1.5235), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:49:56,942 - root - INFO - Step 32130: lr=1.00E-05, loss= 1.1570 (max= 1.5235), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:49:56,942 - root - INFO - Step 32130: lr=1.00E-05, loss= 1.1570 (max= 1.5235), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:49:56,942 - root - INFO - Step 32130: lr=1.00E-05, loss= 1.1570 (max= 1.5235), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:49:56,942 - root - INFO - Step 32130: lr=1.00E-05, loss= 1.1570 (max= 1.5235), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:49:56,942 - root - INFO - Step 32130: lr=1.00E-05, loss= 1.1570 (max= 1.5235), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:49:56,943 - root - INFO - Step 32130: lr=1.00E-05, loss= 1.1570 (max= 1.5235), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:12,842 - root - INFO - Step 32140: lr=1.00E-05, loss= 1.1757 (max= 1.6850), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:12,842 - root - INFO - Step 32140: lr=1.00E-05, loss= 1.1757 (max= 1.6850), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:12,842 - root - INFO - Step 32140: lr=1.00E-05, loss= 1.1757 (max= 1.6850), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:12,842 - root 
- INFO - Step 32140: lr=1.00E-05, loss= 1.1757 (max= 1.6850), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:12,843 - root - INFO - Step 32140: lr=1.00E-05, loss= 1.1757 (max= 1.6850), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:12,843 - root - INFO - Step 32140: lr=1.00E-05, loss= 1.1757 (max= 1.6850), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:12,843 - root - INFO - Step 32140: lr=1.00E-05, loss= 1.1757 (max= 1.6850), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:12,843 - root - INFO - Step 32140: lr=1.00E-05, loss= 1.1757 (max= 1.6850), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:28,789 - root - INFO - Step 32150: lr=1.00E-05, loss= 1.1705 (max= 1.5497), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:28,789 - root - INFO - Step 32150: lr=1.00E-05, loss= 1.1705 (max= 1.5497), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:28,789 - root - INFO - Step 32150: lr=1.00E-05, loss= 1.1705 (max= 1.5497), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:28,789 - root - INFO - Step 32150: lr=1.00E-05, loss= 1.1705 (max= 1.5497), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:28,789 - root - INFO - Step 32150: lr=1.00E-05, loss= 1.1705 (max= 1.5497), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:28,789 - root - INFO - Step 32150: lr=1.00E-05, loss= 1.1705 (max= 1.5497), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:50:28,789 - root - INFO - Step 32150: lr=1.00E-05, loss= 1.1705 (max= 1.5497), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:28,789 - root - INFO - Step 32150: lr=1.00E-05, loss= 1.1705 (max= 1.5497), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:44,758 - root - INFO - Step 32160: lr=1.00E-05, loss= 1.1637 (max= 1.5331), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:44,758 - root - INFO - Step 32160: lr=1.00E-05, loss= 1.1637 (max= 1.5331), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:44,758 - root - INFO - Step 32160: lr=1.00E-05, loss= 1.1637 (max= 1.5331), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:44,758 - root - INFO - Step 32160: lr=1.00E-05, loss= 1.1637 (max= 1.5331), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:44,758 - root - INFO - Step 32160: lr=1.00E-05, loss= 1.1637 (max= 1.5331), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:44,758 - root - INFO - Step 32160: lr=1.00E-05, loss= 1.1637 (max= 1.5331), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:44,758 - root - INFO - Step 32160: lr=1.00E-05, loss= 1.1637 (max= 1.5331), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:50:44,759 - root - INFO - Step 32160: lr=1.00E-05, loss= 1.1637 (max= 1.5331), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:00,769 - root - INFO - Step 32170: lr=1.00E-05, loss= 1.1484 (max= 1.6235), tps=20471, mfu=42.65%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:00,769 - root - INFO - Step 32170: lr=1.00E-05, loss= 1.1484 (max= 1.6235), tps=20470, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:00,769 - root - INFO - Step 32170: lr=1.00E-05, loss= 1.1484 (max= 1.6235), tps=20470, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:00,769 - root - INFO - Step 32170: lr=1.00E-05, loss= 1.1484 (max= 1.6235), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:00,769 - root - INFO - Step 32170: lr=1.00E-05, loss= 1.1484 (max= 1.6235), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:00,769 - root - INFO - Step 32170: lr=1.00E-05, loss= 1.1484 (max= 1.6235), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:00,769 - root - INFO - Step 32170: lr=1.00E-05, loss= 1.1484 (max= 1.6235), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:00,769 - root - INFO - Step 32170: lr=1.00E-05, loss= 1.1484 (max= 1.6235), tps=20471, mfu=42.65%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:16,738 - root - INFO - Step 32180: lr=1.00E-05, loss= 1.1792 (max= 1.5864), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:16,738 - root - INFO - Step 32180: lr=1.00E-05, loss= 1.1792 (max= 1.5864), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:16,738 - root - INFO - Step 32180: lr=1.00E-05, loss= 1.1792 (max= 1.5864), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:16,738 - root - INFO - Step 32180: lr=1.00E-05, 
loss= 1.1792 (max= 1.5864), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:16,738 - root - INFO - Step 32180: lr=1.00E-05, loss= 1.1792 (max= 1.5864), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:16,738 - root - INFO - Step 32180: lr=1.00E-05, loss= 1.1792 (max= 1.5864), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:16,738 - root - INFO - Step 32180: lr=1.00E-05, loss= 1.1792 (max= 1.5864), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:16,738 - root - INFO - Step 32180: lr=1.00E-05, loss= 1.1792 (max= 1.5864), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:32,691 - root - INFO - Step 32190: lr=1.00E-05, loss= 1.1582 (max= 1.6324), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:51:32,691 - root - INFO - Step 32190: lr=1.00E-05, loss= 1.1582 (max= 1.6324), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:51:32,691 - root - INFO - Step 32190: lr=1.00E-05, loss= 1.1582 (max= 1.6324), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:51:32,691 - root - INFO - Step 32190: lr=1.00E-05, loss= 1.1582 (max= 1.6324), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:51:32,691 - root - INFO - Step 32190: lr=1.00E-05, loss= 1.1582 (max= 1.6324), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:51:32,691 - root - INFO - Step 32190: lr=1.00E-05, loss= 1.1582 (max= 1.6324), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
00:51:32,691 - root - INFO - Step 32190: lr=1.00E-05, loss= 1.1582 (max= 1.6324), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:51:32,691 - root - INFO - Step 32190: lr=1.00E-05, loss= 1.1582 (max= 1.6324), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:51:48,609 - root - INFO - Step 32200: lr=1.00E-05, loss= 1.1755 (max= 1.5495), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:48,609 - root - INFO - Step 32200: lr=1.00E-05, loss= 1.1755 (max= 1.5495), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:48,609 - root - INFO - Step 32200: lr=1.00E-05, loss= 1.1755 (max= 1.5495), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:48,609 - root - INFO - Step 32200: lr=1.00E-05, loss= 1.1755 (max= 1.5495), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:48,609 - root - INFO - Step 32200: lr=1.00E-05, loss= 1.1755 (max= 1.5495), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:48,609 - root - INFO - Step 32200: lr=1.00E-05, loss= 1.1755 (max= 1.5495), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:48,609 - root - INFO - Step 32200: lr=1.00E-05, loss= 1.1755 (max= 1.5495), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:51:48,610 - root - INFO - Step 32200: lr=1.00E-05, loss= 1.1755 (max= 1.5495), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:04,552 - root - INFO - Step 32210: lr=1.00E-05, loss= 1.1716 (max= 1.6977), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:04,552 - root - INFO - Step 32210: lr=1.00E-05, loss= 1.1716 (max= 1.6977), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:04,552 - root - INFO - Step 32210: lr=1.00E-05, loss= 1.1716 (max= 1.6977), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:04,552 - root - INFO - Step 32210: lr=1.00E-05, loss= 1.1716 (max= 1.6977), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:04,552 - root - INFO - Step 32210: lr=1.00E-05, loss= 1.1716 (max= 1.6977), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:04,552 - root - INFO - Step 32210: lr=1.00E-05, loss= 1.1716 (max= 1.6977), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:04,553 - root - INFO - Step 32210: lr=1.00E-05, loss= 1.1716 (max= 1.6977), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:04,553 - root - INFO - Step 32210: lr=1.00E-05, loss= 1.1716 (max= 1.6977), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:20,544 - root - INFO - Step 32220: lr=1.00E-05, loss= 1.1702 (max= 1.5156), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:20,544 - root - INFO - Step 32220: lr=1.00E-05, loss= 1.1702 (max= 1.5156), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:20,544 - root - INFO - Step 32220: lr=1.00E-05, loss= 1.1702 (max= 1.5156), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:20,544 - root - INFO - Step 32220: lr=1.00E-05, loss= 1.1702 (max= 1.5156), 
tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:20,544 - root - INFO - Step 32220: lr=1.00E-05, loss= 1.1702 (max= 1.5156), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:20,544 - root - INFO - Step 32220: lr=1.00E-05, loss= 1.1702 (max= 1.5156), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:20,544 - root - INFO - Step 32220: lr=1.00E-05, loss= 1.1702 (max= 1.5156), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:20,544 - root - INFO - Step 32220: lr=1.00E-05, loss= 1.1702 (max= 1.5156), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:52:36,481 - root - INFO - Step 32230: lr=1.00E-05, loss= 1.1343 (max= 1.6877), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:36,481 - root - INFO - Step 32230: lr=1.00E-05, loss= 1.1343 (max= 1.6877), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:36,481 - root - INFO - Step 32230: lr=1.00E-05, loss= 1.1343 (max= 1.6877), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:36,481 - root - INFO - Step 32230: lr=1.00E-05, loss= 1.1343 (max= 1.6877), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:36,481 - root - INFO - Step 32230: lr=1.00E-05, loss= 1.1343 (max= 1.6877), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:36,481 - root - INFO - Step 32230: lr=1.00E-05, loss= 1.1343 (max= 1.6877), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:36,481 - root - INFO - Step 
32230: lr=1.00E-05, loss= 1.1343 (max= 1.6877), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:36,481 - root - INFO - Step 32230: lr=1.00E-05, loss= 1.1343 (max= 1.6877), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:52,409 - root - INFO - Step 32240: lr=1.00E-05, loss= 1.1449 (max= 1.5653), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:52,409 - root - INFO - Step 32240: lr=1.00E-05, loss= 1.1449 (max= 1.5653), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:52,409 - root - INFO - Step 32240: lr=1.00E-05, loss= 1.1449 (max= 1.5653), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:52,409 - root - INFO - Step 32240: lr=1.00E-05, loss= 1.1449 (max= 1.5653), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:52,409 - root - INFO - Step 32240: lr=1.00E-05, loss= 1.1449 (max= 1.5653), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:52,409 - root - INFO - Step 32240: lr=1.00E-05, loss= 1.1449 (max= 1.5653), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:52,409 - root - INFO - Step 32240: lr=1.00E-05, loss= 1.1449 (max= 1.5653), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:52:52,409 - root - INFO - Step 32240: lr=1.00E-05, loss= 1.1449 (max= 1.5653), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:08,353 - root - INFO - Step 32250: lr=1.00E-05, loss= 1.1470 (max= 1.6658), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 00:53:08,353 - root - INFO - Step 32250: lr=1.00E-05, loss= 1.1470 (max= 1.6658), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:08,353 - root - INFO - Step 32250: lr=1.00E-05, loss= 1.1470 (max= 1.6658), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:08,353 - root - INFO - Step 32250: lr=1.00E-05, loss= 1.1470 (max= 1.6658), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:08,353 - root - INFO - Step 32250: lr=1.00E-05, loss= 1.1470 (max= 1.6658), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:08,354 - root - INFO - Step 32250: lr=1.00E-05, loss= 1.1470 (max= 1.6658), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:08,354 - root - INFO - Step 32250: lr=1.00E-05, loss= 1.1470 (max= 1.6658), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:08,354 - root - INFO - Step 32250: lr=1.00E-05, loss= 1.1470 (max= 1.6658), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:24,296 - root - INFO - Step 32260: lr=1.00E-05, loss= 1.1727 (max= 1.6070), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:24,296 - root - INFO - Step 32260: lr=1.00E-05, loss= 1.1727 (max= 1.6070), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:24,296 - root - INFO - Step 32260: lr=1.00E-05, loss= 1.1727 (max= 1.6070), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:24,296 - root - INFO - Step 32260: lr=1.00E-05, loss= 1.1727 (max= 1.6070), tps=20558, mfu=42.83%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:24,296 - root - INFO - Step 32260: lr=1.00E-05, loss= 1.1727 (max= 1.6070), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:24,296 - root - INFO - Step 32260: lr=1.00E-05, loss= 1.1727 (max= 1.6070), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:24,296 - root - INFO - Step 32260: lr=1.00E-05, loss= 1.1727 (max= 1.6070), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:24,296 - root - INFO - Step 32260: lr=1.00E-05, loss= 1.1727 (max= 1.6070), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:40,230 - root - INFO - Step 32270: lr=1.00E-05, loss= 1.1486 (max= 1.5105), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:40,230 - root - INFO - Step 32270: lr=1.00E-05, loss= 1.1486 (max= 1.5105), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:40,230 - root - INFO - Step 32270: lr=1.00E-05, loss= 1.1486 (max= 1.5105), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:40,230 - root - INFO - Step 32270: lr=1.00E-05, loss= 1.1486 (max= 1.5105), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:40,230 - root - INFO - Step 32270: lr=1.00E-05, loss= 1.1486 (max= 1.5105), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:40,230 - root - INFO - Step 32270: lr=1.00E-05, loss= 1.1486 (max= 1.5105), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:40,230 - root - INFO - Step 32270: lr=1.00E-05, loss= 1.1486 
(max= 1.5105), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:40,230 - root - INFO - Step 32270: lr=1.00E-05, loss= 1.1486 (max= 1.5105), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:53:56,182 - root - INFO - Step 32280: lr=1.00E-05, loss= 1.1353 (max= 1.5175), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:56,182 - root - INFO - Step 32280: lr=1.00E-05, loss= 1.1353 (max= 1.5175), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:56,182 - root - INFO - Step 32280: lr=1.00E-05, loss= 1.1353 (max= 1.5175), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:56,182 - root - INFO - Step 32280: lr=1.00E-05, loss= 1.1353 (max= 1.5175), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:56,182 - root - INFO - Step 32280: lr=1.00E-05, loss= 1.1353 (max= 1.5175), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:56,183 - root - INFO - Step 32280: lr=1.00E-05, loss= 1.1353 (max= 1.5175), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:56,183 - root - INFO - Step 32280: lr=1.00E-05, loss= 1.1353 (max= 1.5175), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:53:56,183 - root - INFO - Step 32280: lr=1.00E-05, loss= 1.1353 (max= 1.5175), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:12,134 - root - INFO - Step 32290: lr=1.00E-05, loss= 1.1471 (max= 1.5573), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:12,134 - root 
- INFO - Step 32290: lr=1.00E-05, loss= 1.1471 (max= 1.5573), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:12,134 - root - INFO - Step 32290: lr=1.00E-05, loss= 1.1471 (max= 1.5573), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:12,134 - root - INFO - Step 32290: lr=1.00E-05, loss= 1.1471 (max= 1.5573), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:12,134 - root - INFO - Step 32290: lr=1.00E-05, loss= 1.1471 (max= 1.5573), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:12,135 - root - INFO - Step 32290: lr=1.00E-05, loss= 1.1471 (max= 1.5573), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:12,135 - root - INFO - Step 32290: lr=1.00E-05, loss= 1.1471 (max= 1.5573), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:12,135 - root - INFO - Step 32290: lr=1.00E-05, loss= 1.1471 (max= 1.5573), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:28,067 - root - INFO - Step 32300: lr=1.00E-05, loss= 1.1513 (max= 1.5437), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:28,067 - root - INFO - Step 32300: lr=1.00E-05, loss= 1.1513 (max= 1.5437), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:28,067 - root - INFO - Step 32300: lr=1.00E-05, loss= 1.1513 (max= 1.5437), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:28,067 - root - INFO - Step 32300: lr=1.00E-05, loss= 1.1513 (max= 1.5437), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:54:28,067 - root - INFO - Step 32300: lr=1.00E-05, loss= 1.1513 (max= 1.5437), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:28,067 - root - INFO - Step 32300: lr=1.00E-05, loss= 1.1513 (max= 1.5437), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:28,067 - root - INFO - Step 32300: lr=1.00E-05, loss= 1.1513 (max= 1.5437), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:28,067 - root - INFO - Step 32300: lr=1.00E-05, loss= 1.1513 (max= 1.5437), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:44,021 - root - INFO - Step 32310: lr=1.00E-05, loss= 1.1844 (max= 1.6398), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:44,021 - root - INFO - Step 32310: lr=1.00E-05, loss= 1.1844 (max= 1.6398), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:44,021 - root - INFO - Step 32310: lr=1.00E-05, loss= 1.1844 (max= 1.6398), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:44,021 - root - INFO - Step 32310: lr=1.00E-05, loss= 1.1844 (max= 1.6398), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:44,021 - root - INFO - Step 32310: lr=1.00E-05, loss= 1.1844 (max= 1.6398), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:44,021 - root - INFO - Step 32310: lr=1.00E-05, loss= 1.1844 (max= 1.6398), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:44,022 - root - INFO - Step 32310: lr=1.00E-05, loss= 1.1844 (max= 1.6398), tps=20543, mfu=42.80%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:44,022 - root - INFO - Step 32310: lr=1.00E-05, loss= 1.1844 (max= 1.6398), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:54:59,998 - root - INFO - Step 32320: lr=1.00E-05, loss= 1.1340 (max= 1.5157), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:59,998 - root - INFO - Step 32320: lr=1.00E-05, loss= 1.1340 (max= 1.5157), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:59,998 - root - INFO - Step 32320: lr=1.00E-05, loss= 1.1340 (max= 1.5157), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:59,998 - root - INFO - Step 32320: lr=1.00E-05, loss= 1.1340 (max= 1.5157), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:59,999 - root - INFO - Step 32320: lr=1.00E-05, loss= 1.1340 (max= 1.5157), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:59,999 - root - INFO - Step 32320: lr=1.00E-05, loss= 1.1340 (max= 1.5157), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:59,999 - root - INFO - Step 32320: lr=1.00E-05, loss= 1.1340 (max= 1.5157), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:54:59,999 - root - INFO - Step 32320: lr=1.00E-05, loss= 1.1340 (max= 1.5157), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:15,972 - root - INFO - Step 32330: lr=1.00E-05, loss= 1.1495 (max= 1.6498), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:15,972 - root - INFO - Step 32330: lr=1.00E-05, 
loss= 1.1495 (max= 1.6498), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:15,972 - root - INFO - Step 32330: lr=1.00E-05, loss= 1.1495 (max= 1.6498), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:15,972 - root - INFO - Step 32330: lr=1.00E-05, loss= 1.1495 (max= 1.6498), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:15,972 - root - INFO - Step 32330: lr=1.00E-05, loss= 1.1495 (max= 1.6498), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:15,972 - root - INFO - Step 32330: lr=1.00E-05, loss= 1.1495 (max= 1.6498), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:15,972 - root - INFO - Step 32330: lr=1.00E-05, loss= 1.1495 (max= 1.6498), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:15,972 - root - INFO - Step 32330: lr=1.00E-05, loss= 1.1495 (max= 1.6498), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:31,936 - root - INFO - Step 32340: lr=1.00E-05, loss= 1.1989 (max= 1.6671), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:55:31,936 - root - INFO - Step 32340: lr=1.00E-05, loss= 1.1989 (max= 1.6671), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:55:31,936 - root - INFO - Step 32340: lr=1.00E-05, loss= 1.1989 (max= 1.6671), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:55:31,936 - root - INFO - Step 32340: lr=1.00E-05, loss= 1.1989 (max= 1.6671), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:55:31,936 - root - INFO - Step 32340: lr=1.00E-05, loss= 1.1989 (max= 1.6671), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:55:31,937 - root - INFO - Step 32340: lr=1.00E-05, loss= 1.1989 (max= 1.6671), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:55:31,937 - root - INFO - Step 32340: lr=1.00E-05, loss= 1.1989 (max= 1.6671), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:55:31,937 - root - INFO - Step 32340: lr=1.00E-05, loss= 1.1989 (max= 1.6671), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:55:47,880 - root - INFO - Step 32350: lr=1.00E-05, loss= 1.1540 (max= 1.5552), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:47,880 - root - INFO - Step 32350: lr=1.00E-05, loss= 1.1540 (max= 1.5552), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:47,880 - root - INFO - Step 32350: lr=1.00E-05, loss= 1.1540 (max= 1.5552), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:47,880 - root - INFO - Step 32350: lr=1.00E-05, loss= 1.1540 (max= 1.5552), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:47,880 - root - INFO - Step 32350: lr=1.00E-05, loss= 1.1540 (max= 1.5552), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:47,880 - root - INFO - Step 32350: lr=1.00E-05, loss= 1.1540 (max= 1.5552), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:47,880 - root - INFO - Step 32350: lr=1.00E-05, loss= 1.1540 (max= 1.5552), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:55:47,880 - root - INFO - Step 32350: lr=1.00E-05, loss= 1.1540 (max= 1.5552), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:56:03,811 - root - INFO - Step 32360: lr=1.00E-05, loss= 1.1690 (max= 1.5253), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:03,811 - root - INFO - Step 32360: lr=1.00E-05, loss= 1.1690 (max= 1.5253), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:03,811 - root - INFO - Step 32360: lr=1.00E-05, loss= 1.1690 (max= 1.5253), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:03,811 - root - INFO - Step 32360: lr=1.00E-05, loss= 1.1690 (max= 1.5253), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:03,811 - root - INFO - Step 32360: lr=1.00E-05, loss= 1.1690 (max= 1.5253), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:03,811 - root - INFO - Step 32360: lr=1.00E-05, loss= 1.1690 (max= 1.5253), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:03,811 - root - INFO - Step 32360: lr=1.00E-05, loss= 1.1690 (max= 1.5253), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:03,811 - root - INFO - Step 32360: lr=1.00E-05, loss= 1.1690 (max= 1.5253), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:19,756 - root - INFO - Step 32370: lr=1.00E-05, loss= 1.1784 (max= 1.6813), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:56:19,756 - root - INFO - Step 32370: lr=1.00E-05, loss= 1.1784 (max= 1.6813), 
tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:56:19,756 - root - INFO - Step 32370: lr=1.00E-05, loss= 1.1784 (max= 1.6813), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:56:19,756 - root - INFO - Step 32370: lr=1.00E-05, loss= 1.1784 (max= 1.6813), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:56:19,756 - root - INFO - Step 32370: lr=1.00E-05, loss= 1.1784 (max= 1.6813), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:56:19,756 - root - INFO - Step 32370: lr=1.00E-05, loss= 1.1784 (max= 1.6813), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:56:19,756 - root - INFO - Step 32370: lr=1.00E-05, loss= 1.1784 (max= 1.6813), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:56:19,756 - root - INFO - Step 32370: lr=1.00E-05, loss= 1.1784 (max= 1.6813), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:56:35,642 - root - INFO - Step 32380: lr=1.00E-05, loss= 1.1525 (max= 1.5909), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:35,642 - root - INFO - Step 32380: lr=1.00E-05, loss= 1.1525 (max= 1.5909), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:35,642 - root - INFO - Step 32380: lr=1.00E-05, loss= 1.1525 (max= 1.5909), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:35,642 - root - INFO - Step 32380: lr=1.00E-05, loss= 1.1525 (max= 1.5909), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:35,642 - root - INFO - Step 
32380: lr=1.00E-05, loss= 1.1525 (max= 1.5909), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:35,642 - root - INFO - Step 32380: lr=1.00E-05, loss= 1.1525 (max= 1.5909), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:35,642 - root - INFO - Step 32380: lr=1.00E-05, loss= 1.1525 (max= 1.5909), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:35,642 - root - INFO - Step 32380: lr=1.00E-05, loss= 1.1525 (max= 1.5909), tps=20632, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:51,591 - root - INFO - Step 32390: lr=1.00E-05, loss= 1.1336 (max= 1.5676), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:51,591 - root - INFO - Step 32390: lr=1.00E-05, loss= 1.1336 (max= 1.5676), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:51,591 - root - INFO - Step 32390: lr=1.00E-05, loss= 1.1336 (max= 1.5676), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:51,591 - root - INFO - Step 32390: lr=1.00E-05, loss= 1.1336 (max= 1.5676), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:51,591 - root - INFO - Step 32390: lr=1.00E-05, loss= 1.1336 (max= 1.5676), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:51,592 - root - INFO - Step 32390: lr=1.00E-05, loss= 1.1336 (max= 1.5676), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:56:51,592 - root - INFO - Step 32390: lr=1.00E-05, loss= 1.1336 (max= 1.5676), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 00:56:51,592 - root - INFO - Step 32390: lr=1.00E-05, loss= 1.1336 (max= 1.5676), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:07,499 - root - INFO - Step 32400: lr=1.00E-05, loss= 1.1624 (max= 1.4911), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:07,499 - root - INFO - Step 32400: lr=1.00E-05, loss= 1.1624 (max= 1.4911), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:07,499 - root - INFO - Step 32400: lr=1.00E-05, loss= 1.1624 (max= 1.4911), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:07,499 - root - INFO - Step 32400: lr=1.00E-05, loss= 1.1624 (max= 1.4911), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:07,499 - root - INFO - Step 32400: lr=1.00E-05, loss= 1.1624 (max= 1.4911), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:07,499 - root - INFO - Step 32400: lr=1.00E-05, loss= 1.1624 (max= 1.4911), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:07,499 - root - INFO - Step 32400: lr=1.00E-05, loss= 1.1624 (max= 1.4911), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:07,499 - root - INFO - Step 32400: lr=1.00E-05, loss= 1.1624 (max= 1.4911), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:23,453 - root - INFO - Step 32410: lr=1.00E-05, loss= 1.1554 (max= 1.7501), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:23,453 - root - INFO - Step 32410: lr=1.00E-05, loss= 1.1554 (max= 1.7501), tps=20543, mfu=42.80%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:23,453 - root - INFO - Step 32410: lr=1.00E-05, loss= 1.1554 (max= 1.7501), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:23,453 - root - INFO - Step 32410: lr=1.00E-05, loss= 1.1554 (max= 1.7501), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:23,453 - root - INFO - Step 32410: lr=1.00E-05, loss= 1.1554 (max= 1.7501), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:23,453 - root - INFO - Step 32410: lr=1.00E-05, loss= 1.1554 (max= 1.7501), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:23,453 - root - INFO - Step 32410: lr=1.00E-05, loss= 1.1554 (max= 1.7501), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:23,453 - root - INFO - Step 32410: lr=1.00E-05, loss= 1.1554 (max= 1.7501), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:39,399 - root - INFO - Step 32420: lr=1.00E-05, loss= 1.1596 (max= 1.7050), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:39,399 - root - INFO - Step 32420: lr=1.00E-05, loss= 1.1596 (max= 1.7050), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:39,399 - root - INFO - Step 32420: lr=1.00E-05, loss= 1.1596 (max= 1.7050), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:39,399 - root - INFO - Step 32420: lr=1.00E-05, loss= 1.1596 (max= 1.7050), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:39,399 - root - INFO - Step 32420: lr=1.00E-05, loss= 1.1596 
(max= 1.7050), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:39,399 - root - INFO - Step 32420: lr=1.00E-05, loss= 1.1596 (max= 1.7050), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:39,399 - root - INFO - Step 32420: lr=1.00E-05, loss= 1.1596 (max= 1.7050), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:39,399 - root - INFO - Step 32420: lr=1.00E-05, loss= 1.1596 (max= 1.7050), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:55,314 - root - INFO - Step 32430: lr=1.00E-05, loss= 1.1397 (max= 1.4834), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:55,314 - root - INFO - Step 32430: lr=1.00E-05, loss= 1.1397 (max= 1.4834), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:55,314 - root - INFO - Step 32430: lr=1.00E-05, loss= 1.1397 (max= 1.4834), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:55,314 - root - INFO - Step 32430: lr=1.00E-05, loss= 1.1397 (max= 1.4834), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:55,314 - root - INFO - Step 32430: lr=1.00E-05, loss= 1.1397 (max= 1.4834), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:55,314 - root - INFO - Step 32430: lr=1.00E-05, loss= 1.1397 (max= 1.4834), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:55,314 - root - INFO - Step 32430: lr=1.00E-05, loss= 1.1397 (max= 1.4834), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:57:55,314 - root 
- INFO - Step 32430: lr=1.00E-05, loss= 1.1397 (max= 1.4834), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:11,255 - root - INFO - Step 32440: lr=1.00E-05, loss= 1.1689 (max= 1.8264), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:58:11,255 - root - INFO - Step 32440: lr=1.00E-05, loss= 1.1689 (max= 1.8264), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:58:11,255 - root - INFO - Step 32440: lr=1.00E-05, loss= 1.1689 (max= 1.8264), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:58:11,255 - root - INFO - Step 32440: lr=1.00E-05, loss= 1.1689 (max= 1.8264), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:58:11,255 - root - INFO - Step 32440: lr=1.00E-05, loss= 1.1689 (max= 1.8264), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:58:11,255 - root - INFO - Step 32440: lr=1.00E-05, loss= 1.1689 (max= 1.8264), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:58:11,255 - root - INFO - Step 32440: lr=1.00E-05, loss= 1.1689 (max= 1.8264), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:58:11,255 - root - INFO - Step 32440: lr=1.00E-05, loss= 1.1689 (max= 1.8264), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:58:27,234 - root - INFO - Step 32450: lr=1.00E-05, loss= 1.1838 (max= 1.5081), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:27,234 - root - INFO - Step 32450: lr=1.00E-05, loss= 1.1838 (max= 1.5081), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 00:58:27,234 - root - INFO - Step 32450: lr=1.00E-05, loss= 1.1838 (max= 1.5081), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:27,234 - root - INFO - Step 32450: lr=1.00E-05, loss= 1.1838 (max= 1.5081), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:27,234 - root - INFO - Step 32450: lr=1.00E-05, loss= 1.1838 (max= 1.5081), tps=20511, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:27,234 - root - INFO - Step 32450: lr=1.00E-05, loss= 1.1838 (max= 1.5081), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:27,234 - root - INFO - Step 32450: lr=1.00E-05, loss= 1.1838 (max= 1.5081), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:27,234 - root - INFO - Step 32450: lr=1.00E-05, loss= 1.1838 (max= 1.5081), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:43,200 - root - INFO - Step 32460: lr=1.00E-05, loss= 1.1642 (max= 1.4874), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:43,200 - root - INFO - Step 32460: lr=1.00E-05, loss= 1.1642 (max= 1.4874), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:43,200 - root - INFO - Step 32460: lr=1.00E-05, loss= 1.1642 (max= 1.4874), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:43,200 - root - INFO - Step 32460: lr=1.00E-05, loss= 1.1642 (max= 1.4874), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:43,200 - root - INFO - Step 32460: lr=1.00E-05, loss= 1.1642 (max= 1.4874), tps=20527, mfu=42.77%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:43,200 - root - INFO - Step 32460: lr=1.00E-05, loss= 1.1642 (max= 1.4874), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:43,200 - root - INFO - Step 32460: lr=1.00E-05, loss= 1.1642 (max= 1.4874), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:43,200 - root - INFO - Step 32460: lr=1.00E-05, loss= 1.1642 (max= 1.4874), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:59,160 - root - INFO - Step 32470: lr=1.00E-05, loss= 1.1497 (max= 1.4912), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:59,160 - root - INFO - Step 32470: lr=1.00E-05, loss= 1.1497 (max= 1.4912), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:59,160 - root - INFO - Step 32470: lr=1.00E-05, loss= 1.1497 (max= 1.4912), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:59,160 - root - INFO - Step 32470: lr=1.00E-05, loss= 1.1497 (max= 1.4912), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:59,160 - root - INFO - Step 32470: lr=1.00E-05, loss= 1.1497 (max= 1.4912), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:59,160 - root - INFO - Step 32470: lr=1.00E-05, loss= 1.1497 (max= 1.4912), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:59,160 - root - INFO - Step 32470: lr=1.00E-05, loss= 1.1497 (max= 1.4912), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:58:59,160 - root - INFO - Step 32470: lr=1.00E-05, 
loss= 1.1497 (max= 1.4912), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:15,138 - root - INFO - Step 32480: lr=1.00E-05, loss= 1.1827 (max= 1.8251), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:15,138 - root - INFO - Step 32480: lr=1.00E-05, loss= 1.1827 (max= 1.8251), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:15,138 - root - INFO - Step 32480: lr=1.00E-05, loss= 1.1827 (max= 1.8251), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:15,138 - root - INFO - Step 32480: lr=1.00E-05, loss= 1.1827 (max= 1.8251), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:15,138 - root - INFO - Step 32480: lr=1.00E-05, loss= 1.1827 (max= 1.8251), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:15,138 - root - INFO - Step 32480: lr=1.00E-05, loss= 1.1827 (max= 1.8251), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:15,138 - root - INFO - Step 32480: lr=1.00E-05, loss= 1.1827 (max= 1.8251), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:15,139 - root - INFO - Step 32480: lr=1.00E-05, loss= 1.1827 (max= 1.8251), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:31,076 - root - INFO - Step 32490: lr=1.00E-05, loss= 1.1585 (max= 1.5114), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:31,076 - root - INFO - Step 32490: lr=1.00E-05, loss= 1.1585 (max= 1.5114), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
00:59:31,076 - root - INFO - Step 32490: lr=1.00E-05, loss= 1.1585 (max= 1.5114), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:31,076 - root - INFO - Step 32490: lr=1.00E-05, loss= 1.1585 (max= 1.5114), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:31,076 - root - INFO - Step 32490: lr=1.00E-05, loss= 1.1585 (max= 1.5114), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:31,076 - root - INFO - Step 32490: lr=1.00E-05, loss= 1.1585 (max= 1.5114), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:31,076 - root - INFO - Step 32490: lr=1.00E-05, loss= 1.1585 (max= 1.5114), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:31,076 - root - INFO - Step 32490: lr=1.00E-05, loss= 1.1585 (max= 1.5114), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 00:59:47,010 - root - INFO - Step 32500: lr=1.00E-05, loss= 1.1662 (max= 1.6302), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:59:47,010 - root - INFO - Step 32500: lr=1.00E-05, loss= 1.1662 (max= 1.6302), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:59:47,010 - root - INFO - Step 32500: lr=1.00E-05, loss= 1.1662 (max= 1.6302), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:59:47,010 - root - INFO - Step 32500: lr=1.00E-05, loss= 1.1662 (max= 1.6302), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:59:47,010 - root - INFO - Step 32500: lr=1.00E-05, loss= 1.1662 (max= 1.6302), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:59:47,010 - root - INFO - Step 32500: lr=1.00E-05, loss= 1.1662 (max= 1.6302), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:59:47,010 - root - INFO - Step 32500: lr=1.00E-05, loss= 1.1662 (max= 1.6302), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:59:47,010 - root - INFO - Step 32500: lr=1.00E-05, loss= 1.1662 (max= 1.6302), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 00:59:52,511 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4170997 +2025-10-25 01:00:02,914 - root - INFO - Step 32510: lr=1.00E-05, loss= 1.1637 (max= 1.5303), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:02,914 - root - INFO - Step 32510: lr=1.00E-05, loss= 1.1637 (max= 1.5303), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:02,914 - root - INFO - Step 32510: lr=1.00E-05, loss= 1.1637 (max= 1.5303), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:02,914 - root - INFO - Step 32510: lr=1.00E-05, loss= 1.1637 (max= 1.5303), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:02,914 - root - INFO - Step 32510: lr=1.00E-05, loss= 1.1637 (max= 1.5303), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:02,914 - root - INFO - Step 32510: lr=1.00E-05, loss= 1.1637 (max= 1.5303), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:02,914 - root - INFO - Step 32510: lr=1.00E-05, loss= 1.1637 (max= 1.5303), 
tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:02,914 - root - INFO - Step 32510: lr=1.00E-05, loss= 1.1637 (max= 1.5303), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:18,828 - root - INFO - Step 32520: lr=1.00E-05, loss= 1.1168 (max= 1.4848), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:18,828 - root - INFO - Step 32520: lr=1.00E-05, loss= 1.1168 (max= 1.4848), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:18,828 - root - INFO - Step 32520: lr=1.00E-05, loss= 1.1168 (max= 1.4848), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:18,828 - root - INFO - Step 32520: lr=1.00E-05, loss= 1.1168 (max= 1.4848), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:18,828 - root - INFO - Step 32520: lr=1.00E-05, loss= 1.1168 (max= 1.4848), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:18,828 - root - INFO - Step 32520: lr=1.00E-05, loss= 1.1168 (max= 1.4848), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:18,828 - root - INFO - Step 32520: lr=1.00E-05, loss= 1.1168 (max= 1.4848), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:18,828 - root - INFO - Step 32520: lr=1.00E-05, loss= 1.1168 (max= 1.4848), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:00:34,785 - root - INFO - Step 32530: lr=1.00E-05, loss= 1.1478 (max= 1.6069), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:34,785 - root - INFO - Step 
32530: lr=1.00E-05, loss= 1.1478 (max= 1.6069), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:34,785 - root - INFO - Step 32530: lr=1.00E-05, loss= 1.1478 (max= 1.6069), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:34,785 - root - INFO - Step 32530: lr=1.00E-05, loss= 1.1478 (max= 1.6069), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:34,785 - root - INFO - Step 32530: lr=1.00E-05, loss= 1.1478 (max= 1.6069), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:34,785 - root - INFO - Step 32530: lr=1.00E-05, loss= 1.1478 (max= 1.6069), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:34,785 - root - INFO - Step 32530: lr=1.00E-05, loss= 1.1478 (max= 1.6069), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:34,785 - root - INFO - Step 32530: lr=1.00E-05, loss= 1.1478 (max= 1.6069), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:50,657 - root - INFO - Step 32540: lr=1.00E-05, loss= 1.1441 (max= 1.4430), tps=20650, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:50,657 - root - INFO - Step 32540: lr=1.00E-05, loss= 1.1441 (max= 1.4430), tps=20650, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:50,657 - root - INFO - Step 32540: lr=1.00E-05, loss= 1.1441 (max= 1.4430), tps=20650, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:50,657 - root - INFO - Step 32540: lr=1.00E-05, loss= 1.1441 (max= 1.4430), tps=20650, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:00:50,657 - root - INFO - Step 32540: lr=1.00E-05, loss= 1.1441 (max= 1.4430), tps=20650, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:50,657 - root - INFO - Step 32540: lr=1.00E-05, loss= 1.1441 (max= 1.4430), tps=20650, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:50,657 - root - INFO - Step 32540: lr=1.00E-05, loss= 1.1441 (max= 1.4430), tps=20650, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:00:50,658 - root - INFO - Step 32540: lr=1.00E-05, loss= 1.1441 (max= 1.4430), tps=20649, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:06,612 - root - INFO - Step 32550: lr=1.00E-05, loss= 1.1722 (max= 1.5188), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:06,612 - root - INFO - Step 32550: lr=1.00E-05, loss= 1.1722 (max= 1.5188), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:06,612 - root - INFO - Step 32550: lr=1.00E-05, loss= 1.1722 (max= 1.5188), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:06,612 - root - INFO - Step 32550: lr=1.00E-05, loss= 1.1722 (max= 1.5188), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:06,612 - root - INFO - Step 32550: lr=1.00E-05, loss= 1.1722 (max= 1.5188), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:06,612 - root - INFO - Step 32550: lr=1.00E-05, loss= 1.1722 (max= 1.5188), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:06,612 - root - INFO - Step 32550: lr=1.00E-05, loss= 1.1722 (max= 1.5188), tps=20542, mfu=42.80%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:06,612 - root - INFO - Step 32550: lr=1.00E-05, loss= 1.1722 (max= 1.5188), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:22,548 - root - INFO - Step 32560: lr=1.00E-05, loss= 1.1767 (max= 1.6059), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:22,548 - root - INFO - Step 32560: lr=1.00E-05, loss= 1.1767 (max= 1.6059), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:22,548 - root - INFO - Step 32560: lr=1.00E-05, loss= 1.1767 (max= 1.6059), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:22,548 - root - INFO - Step 32560: lr=1.00E-05, loss= 1.1767 (max= 1.6059), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:22,548 - root - INFO - Step 32560: lr=1.00E-05, loss= 1.1767 (max= 1.6059), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:22,548 - root - INFO - Step 32560: lr=1.00E-05, loss= 1.1767 (max= 1.6059), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:22,549 - root - INFO - Step 32560: lr=1.00E-05, loss= 1.1767 (max= 1.6059), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:22,549 - root - INFO - Step 32560: lr=1.00E-05, loss= 1.1767 (max= 1.6059), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:38,474 - root - INFO - Step 32570: lr=1.00E-05, loss= 1.1428 (max= 1.5182), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:38,474 - root - INFO - Step 32570: lr=1.00E-05, loss= 1.1428 
(max= 1.5182), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:38,474 - root - INFO - Step 32570: lr=1.00E-05, loss= 1.1428 (max= 1.5182), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:38,474 - root - INFO - Step 32570: lr=1.00E-05, loss= 1.1428 (max= 1.5182), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:38,474 - root - INFO - Step 32570: lr=1.00E-05, loss= 1.1428 (max= 1.5182), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:38,474 - root - INFO - Step 32570: lr=1.00E-05, loss= 1.1428 (max= 1.5182), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:38,474 - root - INFO - Step 32570: lr=1.00E-05, loss= 1.1428 (max= 1.5182), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:38,474 - root - INFO - Step 32570: lr=1.00E-05, loss= 1.1428 (max= 1.5182), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:01:54,391 - root - INFO - Step 32580: lr=1.00E-05, loss= 1.1582 (max= 1.5053), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:01:54,391 - root - INFO - Step 32580: lr=1.00E-05, loss= 1.1582 (max= 1.5053), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:01:54,391 - root - INFO - Step 32580: lr=1.00E-05, loss= 1.1582 (max= 1.5053), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:01:54,391 - root - INFO - Step 32580: lr=1.00E-05, loss= 1.1582 (max= 1.5053), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:01:54,391 - root 
- INFO - Step 32580: lr=1.00E-05, loss= 1.1582 (max= 1.5053), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:01:54,391 - root - INFO - Step 32580: lr=1.00E-05, loss= 1.1582 (max= 1.5053), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:01:54,392 - root - INFO - Step 32580: lr=1.00E-05, loss= 1.1582 (max= 1.5053), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:01:54,392 - root - INFO - Step 32580: lr=1.00E-05, loss= 1.1582 (max= 1.5053), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:02:10,341 - root - INFO - Step 32590: lr=1.00E-05, loss= 1.1735 (max= 1.5169), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:02:10,341 - root - INFO - Step 32590: lr=1.00E-05, loss= 1.1735 (max= 1.5169), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:02:10,341 - root - INFO - Step 32590: lr=1.00E-05, loss= 1.1735 (max= 1.5169), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:02:10,341 - root - INFO - Step 32590: lr=1.00E-05, loss= 1.1735 (max= 1.5169), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:02:10,341 - root - INFO - Step 32590: lr=1.00E-05, loss= 1.1735 (max= 1.5169), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:02:10,341 - root - INFO - Step 32590: lr=1.00E-05, loss= 1.1735 (max= 1.5169), tps=20549, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:02:10,341 - root - INFO - Step 32590: lr=1.00E-05, loss= 1.1735 (max= 1.5169), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:02:10,341 - root - INFO - Step 32590: lr=1.00E-05, loss= 1.1735 (max= 1.5169), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:02:26,266 - root - INFO - Step 32600: lr=1.00E-05, loss= 1.1104 (max= 1.6327), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:26,266 - root - INFO - Step 32600: lr=1.00E-05, loss= 1.1104 (max= 1.6327), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:26,266 - root - INFO - Step 32600: lr=1.00E-05, loss= 1.1104 (max= 1.6327), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:26,266 - root - INFO - Step 32600: lr=1.00E-05, loss= 1.1104 (max= 1.6327), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:26,266 - root - INFO - Step 32600: lr=1.00E-05, loss= 1.1104 (max= 1.6327), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:26,266 - root - INFO - Step 32600: lr=1.00E-05, loss= 1.1104 (max= 1.6327), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:26,266 - root - INFO - Step 32600: lr=1.00E-05, loss= 1.1104 (max= 1.6327), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:26,266 - root - INFO - Step 32600: lr=1.00E-05, loss= 1.1104 (max= 1.6327), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:42,251 - root - INFO - Step 32610: lr=1.00E-05, loss= 1.1647 (max= 1.5078), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:42,251 - root - INFO - Step 32610: lr=1.00E-05, loss= 1.1647 (max= 1.5078), tps=20504, mfu=42.72%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:42,251 - root - INFO - Step 32610: lr=1.00E-05, loss= 1.1647 (max= 1.5078), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:42,251 - root - INFO - Step 32610: lr=1.00E-05, loss= 1.1647 (max= 1.5078), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:42,251 - root - INFO - Step 32610: lr=1.00E-05, loss= 1.1647 (max= 1.5078), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:42,251 - root - INFO - Step 32610: lr=1.00E-05, loss= 1.1647 (max= 1.5078), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:42,251 - root - INFO - Step 32610: lr=1.00E-05, loss= 1.1647 (max= 1.5078), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:42,251 - root - INFO - Step 32610: lr=1.00E-05, loss= 1.1647 (max= 1.5078), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:58,129 - root - INFO - Step 32620: lr=1.00E-05, loss= 1.1876 (max= 1.6014), tps=20641, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:58,129 - root - INFO - Step 32620: lr=1.00E-05, loss= 1.1876 (max= 1.6014), tps=20641, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:58,129 - root - INFO - Step 32620: lr=1.00E-05, loss= 1.1876 (max= 1.6014), tps=20641, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:58,129 - root - INFO - Step 32620: lr=1.00E-05, loss= 1.1876 (max= 1.6014), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:58,129 - root - INFO - Step 32620: lr=1.00E-05, 
loss= 1.1876 (max= 1.6014), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:58,129 - root - INFO - Step 32620: lr=1.00E-05, loss= 1.1876 (max= 1.6014), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:58,129 - root - INFO - Step 32620: lr=1.00E-05, loss= 1.1876 (max= 1.6014), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:02:58,129 - root - INFO - Step 32620: lr=1.00E-05, loss= 1.1876 (max= 1.6014), tps=20642, mfu=43.01%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:14,087 - root - INFO - Step 32630: lr=1.00E-05, loss= 1.1392 (max= 1.5333), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:14,087 - root - INFO - Step 32630: lr=1.00E-05, loss= 1.1392 (max= 1.5333), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:14,087 - root - INFO - Step 32630: lr=1.00E-05, loss= 1.1392 (max= 1.5333), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:14,087 - root - INFO - Step 32630: lr=1.00E-05, loss= 1.1392 (max= 1.5333), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:14,087 - root - INFO - Step 32630: lr=1.00E-05, loss= 1.1392 (max= 1.5333), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:14,087 - root - INFO - Step 32630: lr=1.00E-05, loss= 1.1392 (max= 1.5333), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:14,088 - root - INFO - Step 32630: lr=1.00E-05, loss= 1.1392 (max= 1.5333), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
01:03:14,088 - root - INFO - Step 32630: lr=1.00E-05, loss= 1.1392 (max= 1.5333), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:30,064 - root - INFO - Step 32640: lr=1.00E-05, loss= 1.1346 (max= 1.6396), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:03:30,064 - root - INFO - Step 32640: lr=1.00E-05, loss= 1.1346 (max= 1.6396), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:03:30,064 - root - INFO - Step 32640: lr=1.00E-05, loss= 1.1346 (max= 1.6396), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:03:30,064 - root - INFO - Step 32640: lr=1.00E-05, loss= 1.1346 (max= 1.6396), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:03:30,065 - root - INFO - Step 32640: lr=1.00E-05, loss= 1.1346 (max= 1.6396), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:03:30,065 - root - INFO - Step 32640: lr=1.00E-05, loss= 1.1346 (max= 1.6396), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:03:30,065 - root - INFO - Step 32640: lr=1.00E-05, loss= 1.1346 (max= 1.6396), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:03:30,065 - root - INFO - Step 32640: lr=1.00E-05, loss= 1.1346 (max= 1.6396), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:03:45,994 - root - INFO - Step 32650: lr=1.00E-05, loss= 1.1398 (max= 1.5959), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:45,994 - root - INFO - Step 32650: lr=1.00E-05, loss= 1.1398 (max= 1.5959), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:45,994 - root - INFO - Step 32650: lr=1.00E-05, loss= 1.1398 (max= 1.5959), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:45,994 - root - INFO - Step 32650: lr=1.00E-05, loss= 1.1398 (max= 1.5959), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:45,994 - root - INFO - Step 32650: lr=1.00E-05, loss= 1.1398 (max= 1.5959), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:45,994 - root - INFO - Step 32650: lr=1.00E-05, loss= 1.1398 (max= 1.5959), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:45,994 - root - INFO - Step 32650: lr=1.00E-05, loss= 1.1398 (max= 1.5959), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:03:45,994 - root - INFO - Step 32650: lr=1.00E-05, loss= 1.1398 (max= 1.5959), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:01,961 - root - INFO - Step 32660: lr=1.00E-05, loss= 1.1758 (max= 1.5810), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:01,961 - root - INFO - Step 32660: lr=1.00E-05, loss= 1.1758 (max= 1.5810), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:01,961 - root - INFO - Step 32660: lr=1.00E-05, loss= 1.1758 (max= 1.5810), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:01,961 - root - INFO - Step 32660: lr=1.00E-05, loss= 1.1758 (max= 1.5810), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:01,961 - root - INFO - Step 32660: lr=1.00E-05, loss= 1.1758 (max= 1.5810), 
tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:01,962 - root - INFO - Step 32660: lr=1.00E-05, loss= 1.1758 (max= 1.5810), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:01,962 - root - INFO - Step 32660: lr=1.00E-05, loss= 1.1758 (max= 1.5810), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:01,962 - root - INFO - Step 32660: lr=1.00E-05, loss= 1.1758 (max= 1.5810), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:17,978 - root - INFO - Step 32670: lr=1.00E-05, loss= 1.1645 (max= 1.5442), tps=20463, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:17,978 - root - INFO - Step 32670: lr=1.00E-05, loss= 1.1645 (max= 1.5442), tps=20463, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:17,978 - root - INFO - Step 32670: lr=1.00E-05, loss= 1.1645 (max= 1.5442), tps=20463, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:17,978 - root - INFO - Step 32670: lr=1.00E-05, loss= 1.1645 (max= 1.5442), tps=20463, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:17,978 - root - INFO - Step 32670: lr=1.00E-05, loss= 1.1645 (max= 1.5442), tps=20463, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:17,978 - root - INFO - Step 32670: lr=1.00E-05, loss= 1.1645 (max= 1.5442), tps=20463, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:17,978 - root - INFO - Step 32670: lr=1.00E-05, loss= 1.1645 (max= 1.5442), tps=20463, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:17,978 - root - INFO - Step 
32670: lr=1.00E-05, loss= 1.1645 (max= 1.5442), tps=20463, mfu=42.63%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:33,942 - root - INFO - Step 32680: lr=1.00E-05, loss= 1.1337 (max= 1.6546), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:04:33,942 - root - INFO - Step 32680: lr=1.00E-05, loss= 1.1337 (max= 1.6546), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:04:33,942 - root - INFO - Step 32680: lr=1.00E-05, loss= 1.1337 (max= 1.6546), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:04:33,942 - root - INFO - Step 32680: lr=1.00E-05, loss= 1.1337 (max= 1.6546), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:04:33,942 - root - INFO - Step 32680: lr=1.00E-05, loss= 1.1337 (max= 1.6546), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:04:33,942 - root - INFO - Step 32680: lr=1.00E-05, loss= 1.1337 (max= 1.6546), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:04:33,943 - root - INFO - Step 32680: lr=1.00E-05, loss= 1.1337 (max= 1.6546), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:04:33,943 - root - INFO - Step 32680: lr=1.00E-05, loss= 1.1337 (max= 1.6546), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:04:49,913 - root - INFO - Step 32690: lr=1.00E-05, loss= 1.1797 (max= 1.6240), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:49,913 - root - INFO - Step 32690: lr=1.00E-05, loss= 1.1797 (max= 1.6240), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:04:49,913 - root - INFO - Step 32690: lr=1.00E-05, loss= 1.1797 (max= 1.6240), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:49,913 - root - INFO - Step 32690: lr=1.00E-05, loss= 1.1797 (max= 1.6240), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:49,913 - root - INFO - Step 32690: lr=1.00E-05, loss= 1.1797 (max= 1.6240), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:49,913 - root - INFO - Step 32690: lr=1.00E-05, loss= 1.1797 (max= 1.6240), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:49,913 - root - INFO - Step 32690: lr=1.00E-05, loss= 1.1797 (max= 1.6240), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:04:49,913 - root - INFO - Step 32690: lr=1.00E-05, loss= 1.1797 (max= 1.6240), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:05,876 - root - INFO - Step 32700: lr=1.00E-05, loss= 1.1573 (max= 1.6114), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:05,876 - root - INFO - Step 32700: lr=1.00E-05, loss= 1.1573 (max= 1.6114), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:05,876 - root - INFO - Step 32700: lr=1.00E-05, loss= 1.1573 (max= 1.6114), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:05,877 - root - INFO - Step 32700: lr=1.00E-05, loss= 1.1573 (max= 1.6114), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:05,877 - root - INFO - Step 32700: lr=1.00E-05, loss= 1.1573 (max= 1.6114), tps=20531, mfu=42.78%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:05,877 - root - INFO - Step 32700: lr=1.00E-05, loss= 1.1573 (max= 1.6114), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:05,877 - root - INFO - Step 32700: lr=1.00E-05, loss= 1.1573 (max= 1.6114), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:05,877 - root - INFO - Step 32700: lr=1.00E-05, loss= 1.1573 (max= 1.6114), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:21,795 - root - INFO - Step 32710: lr=1.00E-05, loss= 1.1933 (max= 1.6289), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:21,795 - root - INFO - Step 32710: lr=1.00E-05, loss= 1.1933 (max= 1.6289), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:21,795 - root - INFO - Step 32710: lr=1.00E-05, loss= 1.1933 (max= 1.6289), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:21,795 - root - INFO - Step 32710: lr=1.00E-05, loss= 1.1933 (max= 1.6289), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:21,795 - root - INFO - Step 32710: lr=1.00E-05, loss= 1.1933 (max= 1.6289), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:21,795 - root - INFO - Step 32710: lr=1.00E-05, loss= 1.1933 (max= 1.6289), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:21,795 - root - INFO - Step 32710: lr=1.00E-05, loss= 1.1933 (max= 1.6289), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:21,795 - root - INFO - Step 32710: lr=1.00E-05, loss= 1.1933 
(max= 1.6289), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:37,776 - root - INFO - Step 32720: lr=1.00E-05, loss= 1.1555 (max= 1.5748), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:37,776 - root - INFO - Step 32720: lr=1.00E-05, loss= 1.1555 (max= 1.5748), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:37,776 - root - INFO - Step 32720: lr=1.00E-05, loss= 1.1555 (max= 1.5748), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:37,776 - root - INFO - Step 32720: lr=1.00E-05, loss= 1.1555 (max= 1.5748), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:37,776 - root - INFO - Step 32720: lr=1.00E-05, loss= 1.1555 (max= 1.5748), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:37,776 - root - INFO - Step 32720: lr=1.00E-05, loss= 1.1555 (max= 1.5748), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:37,776 - root - INFO - Step 32720: lr=1.00E-05, loss= 1.1555 (max= 1.5748), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:37,776 - root - INFO - Step 32720: lr=1.00E-05, loss= 1.1555 (max= 1.5748), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:05:53,750 - root - INFO - Step 32730: lr=1.00E-05, loss= 1.1582 (max= 1.5970), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:05:53,750 - root - INFO - Step 32730: lr=1.00E-05, loss= 1.1582 (max= 1.5970), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:05:53,750 - root 
- INFO - Step 32730: lr=1.00E-05, loss= 1.1582 (max= 1.5970), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:05:53,750 - root - INFO - Step 32730: lr=1.00E-05, loss= 1.1582 (max= 1.5970), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:05:53,750 - root - INFO - Step 32730: lr=1.00E-05, loss= 1.1582 (max= 1.5970), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:05:53,750 - root - INFO - Step 32730: lr=1.00E-05, loss= 1.1582 (max= 1.5970), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:05:53,750 - root - INFO - Step 32730: lr=1.00E-05, loss= 1.1582 (max= 1.5970), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:05:53,750 - root - INFO - Step 32730: lr=1.00E-05, loss= 1.1582 (max= 1.5970), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:09,709 - root - INFO - Step 32740: lr=1.00E-05, loss= 1.1510 (max= 1.5996), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:09,709 - root - INFO - Step 32740: lr=1.00E-05, loss= 1.1510 (max= 1.5996), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:09,709 - root - INFO - Step 32740: lr=1.00E-05, loss= 1.1510 (max= 1.5996), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:09,709 - root - INFO - Step 32740: lr=1.00E-05, loss= 1.1510 (max= 1.5996), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:09,709 - root - INFO - Step 32740: lr=1.00E-05, loss= 1.1510 (max= 1.5996), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:06:09,709 - root - INFO - Step 32740: lr=1.00E-05, loss= 1.1510 (max= 1.5996), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:09,709 - root - INFO - Step 32740: lr=1.00E-05, loss= 1.1510 (max= 1.5996), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:09,709 - root - INFO - Step 32740: lr=1.00E-05, loss= 1.1510 (max= 1.5996), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:25,629 - root - INFO - Step 32750: lr=1.00E-05, loss= 1.1933 (max= 1.5757), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:25,629 - root - INFO - Step 32750: lr=1.00E-05, loss= 1.1933 (max= 1.5757), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:25,629 - root - INFO - Step 32750: lr=1.00E-05, loss= 1.1933 (max= 1.5757), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:25,629 - root - INFO - Step 32750: lr=1.00E-05, loss= 1.1933 (max= 1.5757), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:25,629 - root - INFO - Step 32750: lr=1.00E-05, loss= 1.1933 (max= 1.5757), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:25,629 - root - INFO - Step 32750: lr=1.00E-05, loss= 1.1933 (max= 1.5757), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:25,629 - root - INFO - Step 32750: lr=1.00E-05, loss= 1.1933 (max= 1.5757), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:25,629 - root - INFO - Step 32750: lr=1.00E-05, loss= 1.1933 (max= 1.5757), tps=20587, mfu=42.89%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:41,569 - root - INFO - Step 32760: lr=1.00E-05, loss= 1.1521 (max= 1.7773), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:41,569 - root - INFO - Step 32760: lr=1.00E-05, loss= 1.1521 (max= 1.7773), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:41,569 - root - INFO - Step 32760: lr=1.00E-05, loss= 1.1521 (max= 1.7773), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:41,569 - root - INFO - Step 32760: lr=1.00E-05, loss= 1.1521 (max= 1.7773), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:41,569 - root - INFO - Step 32760: lr=1.00E-05, loss= 1.1521 (max= 1.7773), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:41,569 - root - INFO - Step 32760: lr=1.00E-05, loss= 1.1521 (max= 1.7773), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:41,569 - root - INFO - Step 32760: lr=1.00E-05, loss= 1.1521 (max= 1.7773), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:41,569 - root - INFO - Step 32760: lr=1.00E-05, loss= 1.1521 (max= 1.7773), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:57,465 - root - INFO - Step 32770: lr=1.00E-05, loss= 1.1595 (max= 1.6521), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:57,465 - root - INFO - Step 32770: lr=1.00E-05, loss= 1.1595 (max= 1.6521), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:57,465 - root - INFO - Step 32770: lr=1.00E-05, 
loss= 1.1595 (max= 1.6521), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:57,465 - root - INFO - Step 32770: lr=1.00E-05, loss= 1.1595 (max= 1.6521), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:57,465 - root - INFO - Step 32770: lr=1.00E-05, loss= 1.1595 (max= 1.6521), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:57,465 - root - INFO - Step 32770: lr=1.00E-05, loss= 1.1595 (max= 1.6521), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:57,465 - root - INFO - Step 32770: lr=1.00E-05, loss= 1.1595 (max= 1.6521), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:06:57,465 - root - INFO - Step 32770: lr=1.00E-05, loss= 1.1595 (max= 1.6521), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:13,390 - root - INFO - Step 32780: lr=1.00E-05, loss= 1.1463 (max= 1.5094), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:13,390 - root - INFO - Step 32780: lr=1.00E-05, loss= 1.1463 (max= 1.5094), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:13,390 - root - INFO - Step 32780: lr=1.00E-05, loss= 1.1463 (max= 1.5094), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:13,390 - root - INFO - Step 32780: lr=1.00E-05, loss= 1.1463 (max= 1.5094), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:13,390 - root - INFO - Step 32780: lr=1.00E-05, loss= 1.1463 (max= 1.5094), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
01:07:13,390 - root - INFO - Step 32780: lr=1.00E-05, loss= 1.1463 (max= 1.5094), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:13,390 - root - INFO - Step 32780: lr=1.00E-05, loss= 1.1463 (max= 1.5094), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:13,390 - root - INFO - Step 32780: lr=1.00E-05, loss= 1.1463 (max= 1.5094), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:29,396 - root - INFO - Step 32790: lr=1.00E-05, loss= 1.1554 (max= 1.6237), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:29,396 - root - INFO - Step 32790: lr=1.00E-05, loss= 1.1554 (max= 1.6237), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:29,396 - root - INFO - Step 32790: lr=1.00E-05, loss= 1.1554 (max= 1.6237), tps=20476, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:29,396 - root - INFO - Step 32790: lr=1.00E-05, loss= 1.1554 (max= 1.6237), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:29,396 - root - INFO - Step 32790: lr=1.00E-05, loss= 1.1554 (max= 1.6237), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:29,396 - root - INFO - Step 32790: lr=1.00E-05, loss= 1.1554 (max= 1.6237), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:29,396 - root - INFO - Step 32790: lr=1.00E-05, loss= 1.1554 (max= 1.6237), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:29,396 - root - INFO - Step 32790: lr=1.00E-05, loss= 1.1554 (max= 1.6237), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:45,363 - root - INFO - Step 32800: lr=1.00E-05, loss= 1.2031 (max= 1.6623), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:45,363 - root - INFO - Step 32800: lr=1.00E-05, loss= 1.2031 (max= 1.6623), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:45,363 - root - INFO - Step 32800: lr=1.00E-05, loss= 1.2031 (max= 1.6623), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:45,363 - root - INFO - Step 32800: lr=1.00E-05, loss= 1.2031 (max= 1.6623), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:45,363 - root - INFO - Step 32800: lr=1.00E-05, loss= 1.2031 (max= 1.6623), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:45,363 - root - INFO - Step 32800: lr=1.00E-05, loss= 1.2031 (max= 1.6623), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:45,363 - root - INFO - Step 32800: lr=1.00E-05, loss= 1.2031 (max= 1.6623), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:07:45,363 - root - INFO - Step 32800: lr=1.00E-05, loss= 1.2031 (max= 1.6623), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:01,349 - root - INFO - Step 32810: lr=1.00E-05, loss= 1.1839 (max= 1.6369), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:01,349 - root - INFO - Step 32810: lr=1.00E-05, loss= 1.1839 (max= 1.6369), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:01,349 - root - INFO - Step 32810: lr=1.00E-05, loss= 1.1839 (max= 1.6369), 
tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:01,349 - root - INFO - Step 32810: lr=1.00E-05, loss= 1.1839 (max= 1.6369), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:01,349 - root - INFO - Step 32810: lr=1.00E-05, loss= 1.1839 (max= 1.6369), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:01,349 - root - INFO - Step 32810: lr=1.00E-05, loss= 1.1839 (max= 1.6369), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:01,349 - root - INFO - Step 32810: lr=1.00E-05, loss= 1.1839 (max= 1.6369), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:01,349 - root - INFO - Step 32810: lr=1.00E-05, loss= 1.1839 (max= 1.6369), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:17,279 - root - INFO - Step 32820: lr=1.00E-05, loss= 1.1589 (max= 1.7570), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:17,280 - root - INFO - Step 32820: lr=1.00E-05, loss= 1.1589 (max= 1.7570), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:17,280 - root - INFO - Step 32820: lr=1.00E-05, loss= 1.1589 (max= 1.7570), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:17,280 - root - INFO - Step 32820: lr=1.00E-05, loss= 1.1589 (max= 1.7570), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:17,280 - root - INFO - Step 32820: lr=1.00E-05, loss= 1.1589 (max= 1.7570), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:17,280 - root - INFO - Step 
32820: lr=1.00E-05, loss= 1.1589 (max= 1.7570), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:17,280 - root - INFO - Step 32820: lr=1.00E-05, loss= 1.1589 (max= 1.7570), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:17,280 - root - INFO - Step 32820: lr=1.00E-05, loss= 1.1589 (max= 1.7570), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:33,197 - root - INFO - Step 32830: lr=1.00E-05, loss= 1.1588 (max= 1.4917), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:08:33,197 - root - INFO - Step 32830: lr=1.00E-05, loss= 1.1588 (max= 1.4917), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:08:33,197 - root - INFO - Step 32830: lr=1.00E-05, loss= 1.1588 (max= 1.4917), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:08:33,197 - root - INFO - Step 32830: lr=1.00E-05, loss= 1.1588 (max= 1.4917), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:08:33,197 - root - INFO - Step 32830: lr=1.00E-05, loss= 1.1588 (max= 1.4917), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:08:33,197 - root - INFO - Step 32830: lr=1.00E-05, loss= 1.1588 (max= 1.4917), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:08:33,197 - root - INFO - Step 32830: lr=1.00E-05, loss= 1.1588 (max= 1.4917), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:08:33,197 - root - INFO - Step 32830: lr=1.00E-05, loss= 1.1588 (max= 1.4917), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:08:49,155 - root - INFO - Step 32840: lr=1.00E-05, loss= 1.1396 (max= 1.5199), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:49,155 - root - INFO - Step 32840: lr=1.00E-05, loss= 1.1396 (max= 1.5199), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:49,155 - root - INFO - Step 32840: lr=1.00E-05, loss= 1.1396 (max= 1.5199), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:49,155 - root - INFO - Step 32840: lr=1.00E-05, loss= 1.1396 (max= 1.5199), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:49,155 - root - INFO - Step 32840: lr=1.00E-05, loss= 1.1396 (max= 1.5199), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:49,155 - root - INFO - Step 32840: lr=1.00E-05, loss= 1.1396 (max= 1.5199), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:49,155 - root - INFO - Step 32840: lr=1.00E-05, loss= 1.1396 (max= 1.5199), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:08:49,155 - root - INFO - Step 32840: lr=1.00E-05, loss= 1.1396 (max= 1.5199), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:09:05,108 - root - INFO - Step 32850: lr=1.00E-05, loss= 1.1550 (max= 1.6877), tps=20545, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:05,108 - root - INFO - Step 32850: lr=1.00E-05, loss= 1.1550 (max= 1.6877), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:05,108 - root - INFO - Step 32850: lr=1.00E-05, loss= 1.1550 (max= 1.6877), tps=20544, mfu=42.80%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:05,108 - root - INFO - Step 32850: lr=1.00E-05, loss= 1.1550 (max= 1.6877), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:05,108 - root - INFO - Step 32850: lr=1.00E-05, loss= 1.1550 (max= 1.6877), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:05,108 - root - INFO - Step 32850: lr=1.00E-05, loss= 1.1550 (max= 1.6877), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:05,108 - root - INFO - Step 32850: lr=1.00E-05, loss= 1.1550 (max= 1.6877), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:05,108 - root - INFO - Step 32850: lr=1.00E-05, loss= 1.1550 (max= 1.6877), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:21,024 - root - INFO - Step 32860: lr=1.00E-05, loss= 1.1689 (max= 1.5987), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:21,024 - root - INFO - Step 32860: lr=1.00E-05, loss= 1.1689 (max= 1.5987), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:21,024 - root - INFO - Step 32860: lr=1.00E-05, loss= 1.1689 (max= 1.5987), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:21,025 - root - INFO - Step 32860: lr=1.00E-05, loss= 1.1689 (max= 1.5987), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:21,025 - root - INFO - Step 32860: lr=1.00E-05, loss= 1.1689 (max= 1.5987), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:21,025 - root - INFO - Step 32860: lr=1.00E-05, loss= 1.1689 
(max= 1.5987), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:21,025 - root - INFO - Step 32860: lr=1.00E-05, loss= 1.1689 (max= 1.5987), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:21,025 - root - INFO - Step 32860: lr=1.00E-05, loss= 1.1689 (max= 1.5987), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:36,929 - root - INFO - Step 32870: lr=1.00E-05, loss= 1.1591 (max= 1.6016), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:09:36,929 - root - INFO - Step 32870: lr=1.00E-05, loss= 1.1591 (max= 1.6016), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:09:36,929 - root - INFO - Step 32870: lr=1.00E-05, loss= 1.1591 (max= 1.6016), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:09:36,929 - root - INFO - Step 32870: lr=1.00E-05, loss= 1.1591 (max= 1.6016), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:09:36,929 - root - INFO - Step 32870: lr=1.00E-05, loss= 1.1591 (max= 1.6016), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:09:36,929 - root - INFO - Step 32870: lr=1.00E-05, loss= 1.1591 (max= 1.6016), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:09:36,929 - root - INFO - Step 32870: lr=1.00E-05, loss= 1.1591 (max= 1.6016), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:09:36,929 - root - INFO - Step 32870: lr=1.00E-05, loss= 1.1591 (max= 1.6016), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:09:52,893 - root 
- INFO - Step 32880: lr=1.00E-05, loss= 1.1464 (max= 1.4481), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:52,893 - root - INFO - Step 32880: lr=1.00E-05, loss= 1.1464 (max= 1.4481), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:52,893 - root - INFO - Step 32880: lr=1.00E-05, loss= 1.1464 (max= 1.4481), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:52,893 - root - INFO - Step 32880: lr=1.00E-05, loss= 1.1464 (max= 1.4481), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:52,893 - root - INFO - Step 32880: lr=1.00E-05, loss= 1.1464 (max= 1.4481), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:52,893 - root - INFO - Step 32880: lr=1.00E-05, loss= 1.1464 (max= 1.4481), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:52,893 - root - INFO - Step 32880: lr=1.00E-05, loss= 1.1464 (max= 1.4481), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:09:52,894 - root - INFO - Step 32880: lr=1.00E-05, loss= 1.1464 (max= 1.4481), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:10:08,859 - root - INFO - Step 32890: lr=1.00E-05, loss= 1.1711 (max= 1.5808), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:08,859 - root - INFO - Step 32890: lr=1.00E-05, loss= 1.1711 (max= 1.5808), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:08,859 - root - INFO - Step 32890: lr=1.00E-05, loss= 1.1711 (max= 1.5808), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:10:08,859 - root - INFO - Step 32890: lr=1.00E-05, loss= 1.1711 (max= 1.5808), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:08,859 - root - INFO - Step 32890: lr=1.00E-05, loss= 1.1711 (max= 1.5808), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:08,859 - root - INFO - Step 32890: lr=1.00E-05, loss= 1.1711 (max= 1.5808), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:08,859 - root - INFO - Step 32890: lr=1.00E-05, loss= 1.1711 (max= 1.5808), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:08,859 - root - INFO - Step 32890: lr=1.00E-05, loss= 1.1711 (max= 1.5808), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:24,851 - root - INFO - Step 32900: lr=1.00E-05, loss= 1.1688 (max= 1.5294), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:24,851 - root - INFO - Step 32900: lr=1.00E-05, loss= 1.1688 (max= 1.5294), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:24,851 - root - INFO - Step 32900: lr=1.00E-05, loss= 1.1688 (max= 1.5294), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:24,851 - root - INFO - Step 32900: lr=1.00E-05, loss= 1.1688 (max= 1.5294), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:24,851 - root - INFO - Step 32900: lr=1.00E-05, loss= 1.1688 (max= 1.5294), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:24,851 - root - INFO - Step 32900: lr=1.00E-05, loss= 1.1688 (max= 1.5294), tps=20495, mfu=42.70%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:24,851 - root - INFO - Step 32900: lr=1.00E-05, loss= 1.1688 (max= 1.5294), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:24,851 - root - INFO - Step 32900: lr=1.00E-05, loss= 1.1688 (max= 1.5294), tps=20495, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:40,768 - root - INFO - Step 32910: lr=1.00E-05, loss= 1.1595 (max= 1.5534), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:40,768 - root - INFO - Step 32910: lr=1.00E-05, loss= 1.1595 (max= 1.5534), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:40,768 - root - INFO - Step 32910: lr=1.00E-05, loss= 1.1595 (max= 1.5534), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:40,769 - root - INFO - Step 32910: lr=1.00E-05, loss= 1.1595 (max= 1.5534), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:40,769 - root - INFO - Step 32910: lr=1.00E-05, loss= 1.1595 (max= 1.5534), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:40,769 - root - INFO - Step 32910: lr=1.00E-05, loss= 1.1595 (max= 1.5534), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:40,769 - root - INFO - Step 32910: lr=1.00E-05, loss= 1.1595 (max= 1.5534), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:40,769 - root - INFO - Step 32910: lr=1.00E-05, loss= 1.1595 (max= 1.5534), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:56,697 - root - INFO - Step 32920: lr=1.00E-05, 
loss= 1.1459 (max= 1.4891), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:56,697 - root - INFO - Step 32920: lr=1.00E-05, loss= 1.1459 (max= 1.4891), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:56,697 - root - INFO - Step 32920: lr=1.00E-05, loss= 1.1459 (max= 1.4891), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:56,697 - root - INFO - Step 32920: lr=1.00E-05, loss= 1.1459 (max= 1.4891), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:56,697 - root - INFO - Step 32920: lr=1.00E-05, loss= 1.1459 (max= 1.4891), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:56,697 - root - INFO - Step 32920: lr=1.00E-05, loss= 1.1459 (max= 1.4891), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:56,697 - root - INFO - Step 32920: lr=1.00E-05, loss= 1.1459 (max= 1.4891), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:10:56,697 - root - INFO - Step 32920: lr=1.00E-05, loss= 1.1459 (max= 1.4891), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:12,622 - root - INFO - Step 32930: lr=1.00E-05, loss= 1.1776 (max= 1.5649), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:12,622 - root - INFO - Step 32930: lr=1.00E-05, loss= 1.1776 (max= 1.5649), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:12,622 - root - INFO - Step 32930: lr=1.00E-05, loss= 1.1776 (max= 1.5649), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
01:11:12,622 - root - INFO - Step 32930: lr=1.00E-05, loss= 1.1776 (max= 1.5649), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:12,622 - root - INFO - Step 32930: lr=1.00E-05, loss= 1.1776 (max= 1.5649), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:12,622 - root - INFO - Step 32930: lr=1.00E-05, loss= 1.1776 (max= 1.5649), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:12,622 - root - INFO - Step 32930: lr=1.00E-05, loss= 1.1776 (max= 1.5649), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:12,622 - root - INFO - Step 32930: lr=1.00E-05, loss= 1.1776 (max= 1.5649), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:28,580 - root - INFO - Step 32940: lr=1.00E-05, loss= 1.1622 (max= 1.4665), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:28,580 - root - INFO - Step 32940: lr=1.00E-05, loss= 1.1622 (max= 1.4665), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:28,580 - root - INFO - Step 32940: lr=1.00E-05, loss= 1.1622 (max= 1.4665), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:28,580 - root - INFO - Step 32940: lr=1.00E-05, loss= 1.1622 (max= 1.4665), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:28,580 - root - INFO - Step 32940: lr=1.00E-05, loss= 1.1622 (max= 1.4665), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:28,580 - root - INFO - Step 32940: lr=1.00E-05, loss= 1.1622 (max= 1.4665), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:28,580 - root - INFO - Step 32940: lr=1.00E-05, loss= 1.1622 (max= 1.4665), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:28,580 - root - INFO - Step 32940: lr=1.00E-05, loss= 1.1622 (max= 1.4665), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:11:44,498 - root - INFO - Step 32950: lr=1.00E-05, loss= 1.1649 (max= 1.6257), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:11:44,498 - root - INFO - Step 32950: lr=1.00E-05, loss= 1.1649 (max= 1.6257), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:11:44,498 - root - INFO - Step 32950: lr=1.00E-05, loss= 1.1649 (max= 1.6257), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:11:44,498 - root - INFO - Step 32950: lr=1.00E-05, loss= 1.1649 (max= 1.6257), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:11:44,498 - root - INFO - Step 32950: lr=1.00E-05, loss= 1.1649 (max= 1.6257), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:11:44,498 - root - INFO - Step 32950: lr=1.00E-05, loss= 1.1649 (max= 1.6257), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:11:44,498 - root - INFO - Step 32950: lr=1.00E-05, loss= 1.1649 (max= 1.6257), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:11:44,498 - root - INFO - Step 32950: lr=1.00E-05, loss= 1.1649 (max= 1.6257), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:12:00,454 - root - INFO - Step 32960: lr=1.00E-05, loss= 1.1754 (max= 1.5231), 
tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:00,454 - root - INFO - Step 32960: lr=1.00E-05, loss= 1.1754 (max= 1.5231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:00,454 - root - INFO - Step 32960: lr=1.00E-05, loss= 1.1754 (max= 1.5231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:00,454 - root - INFO - Step 32960: lr=1.00E-05, loss= 1.1754 (max= 1.5231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:00,454 - root - INFO - Step 32960: lr=1.00E-05, loss= 1.1754 (max= 1.5231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:00,454 - root - INFO - Step 32960: lr=1.00E-05, loss= 1.1754 (max= 1.5231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:00,454 - root - INFO - Step 32960: lr=1.00E-05, loss= 1.1754 (max= 1.5231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:00,454 - root - INFO - Step 32960: lr=1.00E-05, loss= 1.1754 (max= 1.5231), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:16,424 - root - INFO - Step 32970: lr=1.00E-05, loss= 1.1693 (max= 1.8019), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:16,424 - root - INFO - Step 32970: lr=1.00E-05, loss= 1.1693 (max= 1.8019), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:16,424 - root - INFO - Step 32970: lr=1.00E-05, loss= 1.1693 (max= 1.8019), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:16,424 - root - INFO - Step 
32970: lr=1.00E-05, loss= 1.1693 (max= 1.8019), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:16,424 - root - INFO - Step 32970: lr=1.00E-05, loss= 1.1693 (max= 1.8019), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:16,424 - root - INFO - Step 32970: lr=1.00E-05, loss= 1.1693 (max= 1.8019), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:16,424 - root - INFO - Step 32970: lr=1.00E-05, loss= 1.1693 (max= 1.8019), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:16,424 - root - INFO - Step 32970: lr=1.00E-05, loss= 1.1693 (max= 1.8019), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:32,354 - root - INFO - Step 32980: lr=1.00E-05, loss= 1.1834 (max= 1.5908), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:12:32,354 - root - INFO - Step 32980: lr=1.00E-05, loss= 1.1834 (max= 1.5908), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:12:32,354 - root - INFO - Step 32980: lr=1.00E-05, loss= 1.1834 (max= 1.5908), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:12:32,355 - root - INFO - Step 32980: lr=1.00E-05, loss= 1.1834 (max= 1.5908), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:12:32,355 - root - INFO - Step 32980: lr=1.00E-05, loss= 1.1834 (max= 1.5908), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:12:32,355 - root - INFO - Step 32980: lr=1.00E-05, loss= 1.1834 (max= 1.5908), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:12:32,355 - root - INFO - Step 32980: lr=1.00E-05, loss= 1.1834 (max= 1.5908), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:12:32,355 - root - INFO - Step 32980: lr=1.00E-05, loss= 1.1834 (max= 1.5908), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:12:48,294 - root - INFO - Step 32990: lr=1.00E-05, loss= 1.1311 (max= 1.5494), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:48,294 - root - INFO - Step 32990: lr=1.00E-05, loss= 1.1311 (max= 1.5494), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:48,295 - root - INFO - Step 32990: lr=1.00E-05, loss= 1.1311 (max= 1.5494), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:48,295 - root - INFO - Step 32990: lr=1.00E-05, loss= 1.1311 (max= 1.5494), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:48,295 - root - INFO - Step 32990: lr=1.00E-05, loss= 1.1311 (max= 1.5494), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:48,295 - root - INFO - Step 32990: lr=1.00E-05, loss= 1.1311 (max= 1.5494), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:48,295 - root - INFO - Step 32990: lr=1.00E-05, loss= 1.1311 (max= 1.5494), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:12:48,295 - root - INFO - Step 32990: lr=1.00E-05, loss= 1.1311 (max= 1.5494), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-33000 +Dataset successfully saved to 
jobs/munin-7b-open-stage1/checkpoints/dataloader/step-33000! Save time: 4.470308542251587 +2025-10-25 01:13:04,215 - root - INFO - Step 33000: lr=1.00E-05, loss= 1.1383 (max= 1.6451), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:13:04,215 - root - INFO - Step 33000: lr=1.00E-05, loss= 1.1383 (max= 1.6451), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:13:04,215 - root - INFO - Saving a full checkpoint at step 33000 +2025-10-25 01:13:04,215 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:13:04,215 - root - INFO - Step 33000: lr=1.00E-05, loss= 1.1383 (max= 1.6451), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:13:04,215 - root - INFO - Saving a full checkpoint at step 33000 +2025-10-25 01:13:04,215 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:13:04,215 - root - INFO - Saving a full checkpoint at step 33000 +2025-10-25 01:13:04,216 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:13:04,215 - root - INFO - Step 33000: lr=1.00E-05, loss= 1.1383 (max= 1.6451), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:13:04,215 - root - INFO - Step 33000: lr=1.00E-05, loss= 1.1383 (max= 1.6451), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:13:04,216 - root - INFO - Saving a full checkpoint at step 33000 +2025-10-25 01:13:04,216 - root - INFO - Saving a full checkpoint at step 33000 +2025-10-25 01:13:04,216 - root - INFO - Step 33000: lr=1.00E-05, loss= 1.1383 (max= 1.6451), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:13:04,216 - root - INFO - Step 33000: lr=1.00E-05, loss= 1.1383 (max= 1.6451), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:13:04,216 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:13:04,216 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:13:04,216 - root - INFO - Saving a full checkpoint at step 33000 +2025-10-25 01:13:04,216 - root - INFO - Saving a full checkpoint at step 33000 +2025-10-25 01:13:04,216 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:13:04,216 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:13:04,216 - root - INFO - Step 33000: lr=1.00E-05, loss= 1.1383 (max= 1.6451), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:13:04,216 - root - INFO - Saving a full checkpoint at step 33000 +2025-10-25 01:13:04,216 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:13:19,013 - root - INFO - Finished saving the checkpoint in 14.80 seconds +2025-10-25 01:13:19,018 - root - INFO - Finished saving the checkpoint in 14.80 seconds +2025-10-25 01:13:19,019 - root - INFO - Finished saving the checkpoint in 14.80 seconds +2025-10-25 01:13:19,019 - root - INFO - Finished saving the checkpoint in 14.80 seconds +2025-10-25 01:13:19,019 - root - INFO - Finished saving the checkpoint in 14.80 seconds +2025-10-25 01:13:19,019 - root - INFO - Finished saving the checkpoint in 14.80 seconds +2025-10-25 01:13:19,019 - root - INFO - Finished saving the checkpoint in 14.80 seconds +2025-10-25 
01:13:19,020 - root - INFO - Finished saving the checkpoint in 14.80 seconds +2025-10-25 01:13:34,869 - root - INFO - Step 33010: lr=1.00E-05, loss= 1.1161 (max= 1.7805), tps=10691, mfu=22.27%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:34,869 - root - INFO - Step 33010: lr=1.00E-05, loss= 1.1161 (max= 1.7805), tps=10691, mfu=22.27%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:34,869 - root - INFO - Step 33010: lr=1.00E-05, loss= 1.1161 (max= 1.7805), tps=10691, mfu=22.27%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:34,869 - root - INFO - Step 33010: lr=1.00E-05, loss= 1.1161 (max= 1.7805), tps=10691, mfu=22.27%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:34,869 - root - INFO - Step 33010: lr=1.00E-05, loss= 1.1161 (max= 1.7805), tps=10691, mfu=22.27%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:34,869 - root - INFO - Step 33010: lr=1.00E-05, loss= 1.1161 (max= 1.7805), tps=10691, mfu=22.27%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:34,869 - root - INFO - Step 33010: lr=1.00E-05, loss= 1.1161 (max= 1.7805), tps=10691, mfu=22.27%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:34,869 - root - INFO - Step 33010: lr=1.00E-05, loss= 1.1161 (max= 1.7805), tps=10691, mfu=22.27%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:50,817 - root - INFO - Step 33020: lr=1.00E-05, loss= 1.1627 (max= 1.7159), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:50,817 - root - INFO - Step 33020: lr=1.00E-05, loss= 1.1627 (max= 1.7159), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:50,817 - root - INFO - Step 33020: 
lr=1.00E-05, loss= 1.1627 (max= 1.7159), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:50,817 - root - INFO - Step 33020: lr=1.00E-05, loss= 1.1627 (max= 1.7159), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:50,817 - root - INFO - Step 33020: lr=1.00E-05, loss= 1.1627 (max= 1.7159), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:50,817 - root - INFO - Step 33020: lr=1.00E-05, loss= 1.1627 (max= 1.7159), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:50,817 - root - INFO - Step 33020: lr=1.00E-05, loss= 1.1627 (max= 1.7159), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:13:50,817 - root - INFO - Step 33020: lr=1.00E-05, loss= 1.1627 (max= 1.7159), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:06,771 - root - INFO - Step 33030: lr=1.00E-05, loss= 1.1770 (max= 1.6370), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:06,771 - root - INFO - Step 33030: lr=1.00E-05, loss= 1.1770 (max= 1.6370), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:06,771 - root - INFO - Step 33030: lr=1.00E-05, loss= 1.1770 (max= 1.6370), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:06,771 - root - INFO - Step 33030: lr=1.00E-05, loss= 1.1770 (max= 1.6370), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:06,771 - root - INFO - Step 33030: lr=1.00E-05, loss= 1.1770 (max= 1.6370), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 01:14:06,771 - root - INFO - Step 33030: lr=1.00E-05, loss= 1.1770 (max= 1.6370), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:06,771 - root - INFO - Step 33030: lr=1.00E-05, loss= 1.1770 (max= 1.6370), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:06,771 - root - INFO - Step 33030: lr=1.00E-05, loss= 1.1770 (max= 1.6370), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:22,674 - root - INFO - Step 33040: lr=1.00E-05, loss= 1.1183 (max= 1.5159), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:22,675 - root - INFO - Step 33040: lr=1.00E-05, loss= 1.1183 (max= 1.5159), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:22,675 - root - INFO - Step 33040: lr=1.00E-05, loss= 1.1183 (max= 1.5159), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:22,675 - root - INFO - Step 33040: lr=1.00E-05, loss= 1.1183 (max= 1.5159), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:22,675 - root - INFO - Step 33040: lr=1.00E-05, loss= 1.1183 (max= 1.5159), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:22,675 - root - INFO - Step 33040: lr=1.00E-05, loss= 1.1183 (max= 1.5159), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:22,675 - root - INFO - Step 33040: lr=1.00E-05, loss= 1.1183 (max= 1.5159), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:22,675 - root - INFO - Step 33040: lr=1.00E-05, loss= 1.1183 (max= 1.5159), tps=20608, mfu=42.94%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:38,585 - root - INFO - Step 33050: lr=1.00E-05, loss= 1.1666 (max= 1.6281), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:38,585 - root - INFO - Step 33050: lr=1.00E-05, loss= 1.1666 (max= 1.6281), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:38,585 - root - INFO - Step 33050: lr=1.00E-05, loss= 1.1666 (max= 1.6281), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:38,585 - root - INFO - Step 33050: lr=1.00E-05, loss= 1.1666 (max= 1.6281), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:38,585 - root - INFO - Step 33050: lr=1.00E-05, loss= 1.1666 (max= 1.6281), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:38,585 - root - INFO - Step 33050: lr=1.00E-05, loss= 1.1666 (max= 1.6281), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:38,585 - root - INFO - Step 33050: lr=1.00E-05, loss= 1.1666 (max= 1.6281), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:38,585 - root - INFO - Step 33050: lr=1.00E-05, loss= 1.1666 (max= 1.6281), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:14:54,538 - root - INFO - Step 33060: lr=1.00E-05, loss= 1.1624 (max= 1.5120), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:54,538 - root - INFO - Step 33060: lr=1.00E-05, loss= 1.1624 (max= 1.5120), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:54,538 - root - INFO - Step 33060: lr=1.00E-05, loss= 1.1624 
(max= 1.5120), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:54,538 - root - INFO - Step 33060: lr=1.00E-05, loss= 1.1624 (max= 1.5120), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:54,538 - root - INFO - Step 33060: lr=1.00E-05, loss= 1.1624 (max= 1.5120), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:54,538 - root - INFO - Step 33060: lr=1.00E-05, loss= 1.1624 (max= 1.5120), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:54,538 - root - INFO - Step 33060: lr=1.00E-05, loss= 1.1624 (max= 1.5120), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:14:54,538 - root - INFO - Step 33060: lr=1.00E-05, loss= 1.1624 (max= 1.5120), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:10,487 - root - INFO - Step 33070: lr=1.00E-05, loss= 1.1484 (max= 1.5865), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:10,487 - root - INFO - Step 33070: lr=1.00E-05, loss= 1.1484 (max= 1.5865), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:10,487 - root - INFO - Step 33070: lr=1.00E-05, loss= 1.1484 (max= 1.5865), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:10,487 - root - INFO - Step 33070: lr=1.00E-05, loss= 1.1484 (max= 1.5865), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:10,487 - root - INFO - Step 33070: lr=1.00E-05, loss= 1.1484 (max= 1.5865), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:10,487 - root 
- INFO - Step 33070: lr=1.00E-05, loss= 1.1484 (max= 1.5865), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:10,487 - root - INFO - Step 33070: lr=1.00E-05, loss= 1.1484 (max= 1.5865), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:10,487 - root - INFO - Step 33070: lr=1.00E-05, loss= 1.1484 (max= 1.5865), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:26,435 - root - INFO - Step 33080: lr=1.00E-05, loss= 1.1642 (max= 1.5403), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:26,435 - root - INFO - Step 33080: lr=1.00E-05, loss= 1.1642 (max= 1.5403), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:26,435 - root - INFO - Step 33080: lr=1.00E-05, loss= 1.1642 (max= 1.5403), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:26,435 - root - INFO - Step 33080: lr=1.00E-05, loss= 1.1642 (max= 1.5403), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:26,435 - root - INFO - Step 33080: lr=1.00E-05, loss= 1.1642 (max= 1.5403), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:26,435 - root - INFO - Step 33080: lr=1.00E-05, loss= 1.1642 (max= 1.5403), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:26,435 - root - INFO - Step 33080: lr=1.00E-05, loss= 1.1642 (max= 1.5403), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:26,435 - root - INFO - Step 33080: lr=1.00E-05, loss= 1.1642 (max= 1.5403), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 01:15:42,406 - root - INFO - Step 33090: lr=1.00E-05, loss= 1.1523 (max= 1.5318), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:42,406 - root - INFO - Step 33090: lr=1.00E-05, loss= 1.1523 (max= 1.5318), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:42,406 - root - INFO - Step 33090: lr=1.00E-05, loss= 1.1523 (max= 1.5318), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:42,406 - root - INFO - Step 33090: lr=1.00E-05, loss= 1.1523 (max= 1.5318), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:42,406 - root - INFO - Step 33090: lr=1.00E-05, loss= 1.1523 (max= 1.5318), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:42,406 - root - INFO - Step 33090: lr=1.00E-05, loss= 1.1523 (max= 1.5318), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:42,406 - root - INFO - Step 33090: lr=1.00E-05, loss= 1.1523 (max= 1.5318), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:42,406 - root - INFO - Step 33090: lr=1.00E-05, loss= 1.1523 (max= 1.5318), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:15:58,303 - root - INFO - Step 33100: lr=1.00E-05, loss= 1.1416 (max= 1.5291), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:58,303 - root - INFO - Step 33100: lr=1.00E-05, loss= 1.1416 (max= 1.5291), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:58,304 - root - INFO - Step 33100: lr=1.00E-05, loss= 1.1416 (max= 1.5291), tps=20616, mfu=42.95%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:58,304 - root - INFO - Step 33100: lr=1.00E-05, loss= 1.1416 (max= 1.5291), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:58,304 - root - INFO - Step 33100: lr=1.00E-05, loss= 1.1416 (max= 1.5291), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:58,304 - root - INFO - Step 33100: lr=1.00E-05, loss= 1.1416 (max= 1.5291), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:58,304 - root - INFO - Step 33100: lr=1.00E-05, loss= 1.1416 (max= 1.5291), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:15:58,304 - root - INFO - Step 33100: lr=1.00E-05, loss= 1.1416 (max= 1.5291), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:14,286 - root - INFO - Step 33110: lr=1.00E-05, loss= 1.1980 (max= 1.5758), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:16:14,286 - root - INFO - Step 33110: lr=1.00E-05, loss= 1.1980 (max= 1.5758), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:16:14,286 - root - INFO - Step 33110: lr=1.00E-05, loss= 1.1980 (max= 1.5758), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:16:14,286 - root - INFO - Step 33110: lr=1.00E-05, loss= 1.1980 (max= 1.5758), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:16:14,286 - root - INFO - Step 33110: lr=1.00E-05, loss= 1.1980 (max= 1.5758), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:16:14,286 - root - INFO - Step 33110: lr=1.00E-05, 
loss= 1.1980 (max= 1.5758), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:16:14,286 - root - INFO - Step 33110: lr=1.00E-05, loss= 1.1980 (max= 1.5758), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:16:14,286 - root - INFO - Step 33110: lr=1.00E-05, loss= 1.1980 (max= 1.5758), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:16:30,161 - root - INFO - Step 33120: lr=1.00E-05, loss= 1.1742 (max= 1.4996), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:30,161 - root - INFO - Step 33120: lr=1.00E-05, loss= 1.1742 (max= 1.4996), tps=20645, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:30,161 - root - INFO - Step 33120: lr=1.00E-05, loss= 1.1742 (max= 1.4996), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:30,161 - root - INFO - Step 33120: lr=1.00E-05, loss= 1.1742 (max= 1.4996), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:30,161 - root - INFO - Step 33120: lr=1.00E-05, loss= 1.1742 (max= 1.4996), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:30,161 - root - INFO - Step 33120: lr=1.00E-05, loss= 1.1742 (max= 1.4996), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:30,161 - root - INFO - Step 33120: lr=1.00E-05, loss= 1.1742 (max= 1.4996), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:30,161 - root - INFO - Step 33120: lr=1.00E-05, loss= 1.1742 (max= 1.4996), tps=20645, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
01:16:46,116 - root - INFO - Step 33130: lr=1.00E-05, loss= 1.1393 (max= 1.5928), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:46,116 - root - INFO - Step 33130: lr=1.00E-05, loss= 1.1393 (max= 1.5928), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:46,116 - root - INFO - Step 33130: lr=1.00E-05, loss= 1.1393 (max= 1.5928), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:46,116 - root - INFO - Step 33130: lr=1.00E-05, loss= 1.1393 (max= 1.5928), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:46,116 - root - INFO - Step 33130: lr=1.00E-05, loss= 1.1393 (max= 1.5928), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:46,116 - root - INFO - Step 33130: lr=1.00E-05, loss= 1.1393 (max= 1.5928), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:46,117 - root - INFO - Step 33130: lr=1.00E-05, loss= 1.1393 (max= 1.5928), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:16:46,117 - root - INFO - Step 33130: lr=1.00E-05, loss= 1.1393 (max= 1.5928), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:02,077 - root - INFO - Step 33140: lr=1.00E-05, loss= 1.1543 (max= 1.4891), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:02,077 - root - INFO - Step 33140: lr=1.00E-05, loss= 1.1543 (max= 1.4891), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:02,077 - root - INFO - Step 33140: lr=1.00E-05, loss= 1.1543 (max= 1.4891), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:02,078 - root - INFO - Step 33140: lr=1.00E-05, loss= 1.1543 (max= 1.4891), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:02,078 - root - INFO - Step 33140: lr=1.00E-05, loss= 1.1543 (max= 1.4891), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:02,078 - root - INFO - Step 33140: lr=1.00E-05, loss= 1.1543 (max= 1.4891), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:02,078 - root - INFO - Step 33140: lr=1.00E-05, loss= 1.1543 (max= 1.4891), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:02,078 - root - INFO - Step 33140: lr=1.00E-05, loss= 1.1543 (max= 1.4891), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:17,983 - root - INFO - Step 33150: lr=1.00E-05, loss= 1.1496 (max= 1.6174), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:17,983 - root - INFO - Step 33150: lr=1.00E-05, loss= 1.1496 (max= 1.6174), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:17,984 - root - INFO - Step 33150: lr=1.00E-05, loss= 1.1496 (max= 1.6174), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:17,984 - root - INFO - Step 33150: lr=1.00E-05, loss= 1.1496 (max= 1.6174), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:17,984 - root - INFO - Step 33150: lr=1.00E-05, loss= 1.1496 (max= 1.6174), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:17,984 - root - INFO - Step 33150: lr=1.00E-05, loss= 1.1496 (max= 1.6174), 
tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:17,984 - root - INFO - Step 33150: lr=1.00E-05, loss= 1.1496 (max= 1.6174), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:17,984 - root - INFO - Step 33150: lr=1.00E-05, loss= 1.1496 (max= 1.6174), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:33,929 - root - INFO - Step 33160: lr=1.00E-05, loss= 1.1901 (max= 1.5636), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:17:33,929 - root - INFO - Step 33160: lr=1.00E-05, loss= 1.1901 (max= 1.5636), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:17:33,930 - root - INFO - Step 33160: lr=1.00E-05, loss= 1.1901 (max= 1.5636), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:17:33,930 - root - INFO - Step 33160: lr=1.00E-05, loss= 1.1901 (max= 1.5636), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:17:33,930 - root - INFO - Step 33160: lr=1.00E-05, loss= 1.1901 (max= 1.5636), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:17:33,930 - root - INFO - Step 33160: lr=1.00E-05, loss= 1.1901 (max= 1.5636), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:17:33,930 - root - INFO - Step 33160: lr=1.00E-05, loss= 1.1901 (max= 1.5636), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:17:33,930 - root - INFO - Step 33160: lr=1.00E-05, loss= 1.1901 (max= 1.5636), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:17:49,841 - root - INFO - Step 
33170: lr=1.00E-05, loss= 1.1890 (max= 1.6067), tps=20597, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:49,841 - root - INFO - Step 33170: lr=1.00E-05, loss= 1.1890 (max= 1.6067), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:49,841 - root - INFO - Step 33170: lr=1.00E-05, loss= 1.1890 (max= 1.6067), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:49,841 - root - INFO - Step 33170: lr=1.00E-05, loss= 1.1890 (max= 1.6067), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:49,841 - root - INFO - Step 33170: lr=1.00E-05, loss= 1.1890 (max= 1.6067), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:49,841 - root - INFO - Step 33170: lr=1.00E-05, loss= 1.1890 (max= 1.6067), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:49,842 - root - INFO - Step 33170: lr=1.00E-05, loss= 1.1890 (max= 1.6067), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:17:49,842 - root - INFO - Step 33170: lr=1.00E-05, loss= 1.1890 (max= 1.6067), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:05,783 - root - INFO - Step 33180: lr=1.00E-05, loss= 1.1698 (max= 1.6339), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:05,784 - root - INFO - Step 33180: lr=1.00E-05, loss= 1.1698 (max= 1.6339), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:05,784 - root - INFO - Step 33180: lr=1.00E-05, loss= 1.1698 (max= 1.6339), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:18:05,784 - root - INFO - Step 33180: lr=1.00E-05, loss= 1.1698 (max= 1.6339), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:05,784 - root - INFO - Step 33180: lr=1.00E-05, loss= 1.1698 (max= 1.6339), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:05,784 - root - INFO - Step 33180: lr=1.00E-05, loss= 1.1698 (max= 1.6339), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:05,784 - root - INFO - Step 33180: lr=1.00E-05, loss= 1.1698 (max= 1.6339), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:05,784 - root - INFO - Step 33180: lr=1.00E-05, loss= 1.1698 (max= 1.6339), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:21,712 - root - INFO - Step 33190: lr=1.00E-05, loss= 1.1655 (max= 1.6438), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:18:21,713 - root - INFO - Step 33190: lr=1.00E-05, loss= 1.1655 (max= 1.6438), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:18:21,713 - root - INFO - Step 33190: lr=1.00E-05, loss= 1.1655 (max= 1.6438), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:18:21,713 - root - INFO - Step 33190: lr=1.00E-05, loss= 1.1655 (max= 1.6438), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:18:21,713 - root - INFO - Step 33190: lr=1.00E-05, loss= 1.1655 (max= 1.6438), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:18:21,713 - root - INFO - Step 33190: lr=1.00E-05, loss= 1.1655 (max= 1.6438), tps=20576, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:18:21,713 - root - INFO - Step 33190: lr=1.00E-05, loss= 1.1655 (max= 1.6438), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:18:21,713 - root - INFO - Step 33190: lr=1.00E-05, loss= 1.1655 (max= 1.6438), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:18:37,606 - root - INFO - Step 33200: lr=1.00E-05, loss= 1.1274 (max= 1.6630), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:37,606 - root - INFO - Step 33200: lr=1.00E-05, loss= 1.1274 (max= 1.6630), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:37,606 - root - INFO - Step 33200: lr=1.00E-05, loss= 1.1274 (max= 1.6630), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:37,606 - root - INFO - Step 33200: lr=1.00E-05, loss= 1.1274 (max= 1.6630), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:37,606 - root - INFO - Step 33200: lr=1.00E-05, loss= 1.1274 (max= 1.6630), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:37,606 - root - INFO - Step 33200: lr=1.00E-05, loss= 1.1274 (max= 1.6630), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:37,606 - root - INFO - Step 33200: lr=1.00E-05, loss= 1.1274 (max= 1.6630), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:37,606 - root - INFO - Step 33200: lr=1.00E-05, loss= 1.1274 (max= 1.6630), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:53,503 - root - INFO - Step 33210: lr=1.00E-05, loss= 1.1728 
(max= 1.6688), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:53,503 - root - INFO - Step 33210: lr=1.00E-05, loss= 1.1728 (max= 1.6688), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:53,503 - root - INFO - Step 33210: lr=1.00E-05, loss= 1.1728 (max= 1.6688), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:53,503 - root - INFO - Step 33210: lr=1.00E-05, loss= 1.1728 (max= 1.6688), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:53,503 - root - INFO - Step 33210: lr=1.00E-05, loss= 1.1728 (max= 1.6688), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:53,503 - root - INFO - Step 33210: lr=1.00E-05, loss= 1.1728 (max= 1.6688), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:53,503 - root - INFO - Step 33210: lr=1.00E-05, loss= 1.1728 (max= 1.6688), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:18:53,503 - root - INFO - Step 33210: lr=1.00E-05, loss= 1.1728 (max= 1.6688), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:09,431 - root - INFO - Step 33220: lr=1.00E-05, loss= 1.1651 (max= 1.5438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:09,431 - root - INFO - Step 33220: lr=1.00E-05, loss= 1.1651 (max= 1.5438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:09,431 - root - INFO - Step 33220: lr=1.00E-05, loss= 1.1651 (max= 1.5438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:09,431 - root 
- INFO - Step 33220: lr=1.00E-05, loss= 1.1651 (max= 1.5438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:09,431 - root - INFO - Step 33220: lr=1.00E-05, loss= 1.1651 (max= 1.5438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:09,431 - root - INFO - Step 33220: lr=1.00E-05, loss= 1.1651 (max= 1.5438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:09,431 - root - INFO - Step 33220: lr=1.00E-05, loss= 1.1651 (max= 1.5438), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:09,431 - root - INFO - Step 33220: lr=1.00E-05, loss= 1.1651 (max= 1.5438), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:25,413 - root - INFO - Step 33230: lr=1.00E-05, loss= 1.1576 (max= 1.4878), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:19:25,413 - root - INFO - Step 33230: lr=1.00E-05, loss= 1.1576 (max= 1.4878), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:19:25,413 - root - INFO - Step 33230: lr=1.00E-05, loss= 1.1576 (max= 1.4878), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:19:25,414 - root - INFO - Step 33230: lr=1.00E-05, loss= 1.1576 (max= 1.4878), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:19:25,414 - root - INFO - Step 33230: lr=1.00E-05, loss= 1.1576 (max= 1.4878), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:19:25,414 - root - INFO - Step 33230: lr=1.00E-05, loss= 1.1576 (max= 1.4878), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:19:25,414 - root - INFO - Step 33230: lr=1.00E-05, loss= 1.1576 (max= 1.4878), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:19:25,414 - root - INFO - Step 33230: lr=1.00E-05, loss= 1.1576 (max= 1.4878), tps=20506, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:19:41,345 - root - INFO - Step 33240: lr=1.00E-05, loss= 1.1433 (max= 1.5919), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:41,345 - root - INFO - Step 33240: lr=1.00E-05, loss= 1.1433 (max= 1.5919), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:41,345 - root - INFO - Step 33240: lr=1.00E-05, loss= 1.1433 (max= 1.5919), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:41,345 - root - INFO - Step 33240: lr=1.00E-05, loss= 1.1433 (max= 1.5919), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:41,345 - root - INFO - Step 33240: lr=1.00E-05, loss= 1.1433 (max= 1.5919), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:41,345 - root - INFO - Step 33240: lr=1.00E-05, loss= 1.1433 (max= 1.5919), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:41,345 - root - INFO - Step 33240: lr=1.00E-05, loss= 1.1433 (max= 1.5919), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:41,346 - root - INFO - Step 33240: lr=1.00E-05, loss= 1.1433 (max= 1.5919), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:57,287 - root - INFO - Step 33250: lr=1.00E-05, loss= 1.1738 (max= 1.5518), tps=20559, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:57,287 - root - INFO - Step 33250: lr=1.00E-05, loss= 1.1738 (max= 1.5518), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:57,287 - root - INFO - Step 33250: lr=1.00E-05, loss= 1.1738 (max= 1.5518), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:57,287 - root - INFO - Step 33250: lr=1.00E-05, loss= 1.1738 (max= 1.5518), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:57,287 - root - INFO - Step 33250: lr=1.00E-05, loss= 1.1738 (max= 1.5518), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:57,287 - root - INFO - Step 33250: lr=1.00E-05, loss= 1.1738 (max= 1.5518), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:57,287 - root - INFO - Step 33250: lr=1.00E-05, loss= 1.1738 (max= 1.5518), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:19:57,287 - root - INFO - Step 33250: lr=1.00E-05, loss= 1.1738 (max= 1.5518), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:13,277 - root - INFO - Step 33260: lr=1.00E-05, loss= 1.1876 (max= 1.4428), tps=20497, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:13,277 - root - INFO - Step 33260: lr=1.00E-05, loss= 1.1876 (max= 1.4428), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:13,277 - root - INFO - Step 33260: lr=1.00E-05, loss= 1.1876 (max= 1.4428), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:13,277 - root - INFO - Step 33260: lr=1.00E-05, 
loss= 1.1876 (max= 1.4428), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:13,277 - root - INFO - Step 33260: lr=1.00E-05, loss= 1.1876 (max= 1.4428), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:13,277 - root - INFO - Step 33260: lr=1.00E-05, loss= 1.1876 (max= 1.4428), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:13,277 - root - INFO - Step 33260: lr=1.00E-05, loss= 1.1876 (max= 1.4428), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:13,277 - root - INFO - Step 33260: lr=1.00E-05, loss= 1.1876 (max= 1.4428), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:29,184 - root - INFO - Step 33270: lr=1.00E-05, loss= 1.1735 (max= 1.6395), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:29,184 - root - INFO - Step 33270: lr=1.00E-05, loss= 1.1735 (max= 1.6395), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:29,184 - root - INFO - Step 33270: lr=1.00E-05, loss= 1.1735 (max= 1.6395), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:29,184 - root - INFO - Step 33270: lr=1.00E-05, loss= 1.1735 (max= 1.6395), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:29,184 - root - INFO - Step 33270: lr=1.00E-05, loss= 1.1735 (max= 1.6395), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:29,184 - root - INFO - Step 33270: lr=1.00E-05, loss= 1.1735 (max= 1.6395), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
01:20:29,184 - root - INFO - Step 33270: lr=1.00E-05, loss= 1.1735 (max= 1.6395), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:29,184 - root - INFO - Step 33270: lr=1.00E-05, loss= 1.1735 (max= 1.6395), tps=20604, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:45,104 - root - INFO - Step 33280: lr=1.00E-05, loss= 1.1226 (max= 1.5803), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:45,104 - root - INFO - Step 33280: lr=1.00E-05, loss= 1.1226 (max= 1.5803), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:45,104 - root - INFO - Step 33280: lr=1.00E-05, loss= 1.1226 (max= 1.5803), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:45,104 - root - INFO - Step 33280: lr=1.00E-05, loss= 1.1226 (max= 1.5803), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:45,104 - root - INFO - Step 33280: lr=1.00E-05, loss= 1.1226 (max= 1.5803), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:45,104 - root - INFO - Step 33280: lr=1.00E-05, loss= 1.1226 (max= 1.5803), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:45,104 - root - INFO - Step 33280: lr=1.00E-05, loss= 1.1226 (max= 1.5803), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:20:45,105 - root - INFO - Step 33280: lr=1.00E-05, loss= 1.1226 (max= 1.5803), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:01,071 - root - INFO - Step 33290: lr=1.00E-05, loss= 1.1647 (max= 1.5027), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:01,071 - root - INFO - Step 33290: lr=1.00E-05, loss= 1.1647 (max= 1.5027), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:01,071 - root - INFO - Step 33290: lr=1.00E-05, loss= 1.1647 (max= 1.5027), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:01,072 - root - INFO - Step 33290: lr=1.00E-05, loss= 1.1647 (max= 1.5027), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:01,072 - root - INFO - Step 33290: lr=1.00E-05, loss= 1.1647 (max= 1.5027), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:01,072 - root - INFO - Step 33290: lr=1.00E-05, loss= 1.1647 (max= 1.5027), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:01,072 - root - INFO - Step 33290: lr=1.00E-05, loss= 1.1647 (max= 1.5027), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:01,072 - root - INFO - Step 33290: lr=1.00E-05, loss= 1.1647 (max= 1.5027), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:08,200 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:7378977 +2025-10-25 01:21:17,057 - root - INFO - Step 33300: lr=1.00E-05, loss= 1.1503 (max= 1.5240), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:17,057 - root - INFO - Step 33300: lr=1.00E-05, loss= 1.1503 (max= 1.5240), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:17,057 - root - INFO - Step 33300: lr=1.00E-05, loss= 1.1503 (max= 1.5240), 
tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:17,057 - root - INFO - Step 33300: lr=1.00E-05, loss= 1.1503 (max= 1.5240), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:17,057 - root - INFO - Step 33300: lr=1.00E-05, loss= 1.1503 (max= 1.5240), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:17,057 - root - INFO - Step 33300: lr=1.00E-05, loss= 1.1503 (max= 1.5240), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:17,057 - root - INFO - Step 33300: lr=1.00E-05, loss= 1.1503 (max= 1.5240), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:17,057 - root - INFO - Step 33300: lr=1.00E-05, loss= 1.1503 (max= 1.5240), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:32,992 - root - INFO - Step 33310: lr=1.00E-05, loss= 1.1443 (max= 1.5378), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:32,992 - root - INFO - Step 33310: lr=1.00E-05, loss= 1.1443 (max= 1.5378), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:32,992 - root - INFO - Step 33310: lr=1.00E-05, loss= 1.1443 (max= 1.5378), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:32,992 - root - INFO - Step 33310: lr=1.00E-05, loss= 1.1443 (max= 1.5378), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:32,992 - root - INFO - Step 33310: lr=1.00E-05, loss= 1.1443 (max= 1.5378), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:32,992 - root - INFO - Step 
33310: lr=1.00E-05, loss= 1.1443 (max= 1.5378), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:32,993 - root - INFO - Step 33310: lr=1.00E-05, loss= 1.1443 (max= 1.5378), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:32,993 - root - INFO - Step 33310: lr=1.00E-05, loss= 1.1443 (max= 1.5378), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:48,970 - root - INFO - Step 33320: lr=1.00E-05, loss= 1.1383 (max= 1.5597), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:48,970 - root - INFO - Step 33320: lr=1.00E-05, loss= 1.1383 (max= 1.5597), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:48,970 - root - INFO - Step 33320: lr=1.00E-05, loss= 1.1383 (max= 1.5597), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:48,970 - root - INFO - Step 33320: lr=1.00E-05, loss= 1.1383 (max= 1.5597), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:48,970 - root - INFO - Step 33320: lr=1.00E-05, loss= 1.1383 (max= 1.5597), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:48,970 - root - INFO - Step 33320: lr=1.00E-05, loss= 1.1383 (max= 1.5597), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:48,970 - root - INFO - Step 33320: lr=1.00E-05, loss= 1.1383 (max= 1.5597), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:21:48,970 - root - INFO - Step 33320: lr=1.00E-05, loss= 1.1383 (max= 1.5597), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:22:04,899 - root - INFO - Step 33330: lr=1.00E-05, loss= 1.1833 (max= 1.7866), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:04,899 - root - INFO - Step 33330: lr=1.00E-05, loss= 1.1833 (max= 1.7866), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:04,899 - root - INFO - Step 33330: lr=1.00E-05, loss= 1.1833 (max= 1.7866), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:04,899 - root - INFO - Step 33330: lr=1.00E-05, loss= 1.1833 (max= 1.7866), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:04,899 - root - INFO - Step 33330: lr=1.00E-05, loss= 1.1833 (max= 1.7866), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:04,899 - root - INFO - Step 33330: lr=1.00E-05, loss= 1.1833 (max= 1.7866), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:04,899 - root - INFO - Step 33330: lr=1.00E-05, loss= 1.1833 (max= 1.7866), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:04,899 - root - INFO - Step 33330: lr=1.00E-05, loss= 1.1833 (max= 1.7866), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:20,846 - root - INFO - Step 33340: lr=1.00E-05, loss= 1.1546 (max= 1.5462), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:20,846 - root - INFO - Step 33340: lr=1.00E-05, loss= 1.1546 (max= 1.5462), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:20,847 - root - INFO - Step 33340: lr=1.00E-05, loss= 1.1546 (max= 1.5462), tps=20552, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:20,847 - root - INFO - Step 33340: lr=1.00E-05, loss= 1.1546 (max= 1.5462), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:20,847 - root - INFO - Step 33340: lr=1.00E-05, loss= 1.1546 (max= 1.5462), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:20,847 - root - INFO - Step 33340: lr=1.00E-05, loss= 1.1546 (max= 1.5462), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:20,847 - root - INFO - Step 33340: lr=1.00E-05, loss= 1.1546 (max= 1.5462), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:20,847 - root - INFO - Step 33340: lr=1.00E-05, loss= 1.1546 (max= 1.5462), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:22:36,757 - root - INFO - Step 33350: lr=1.00E-05, loss= 1.1610 (max= 1.6360), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:36,757 - root - INFO - Step 33350: lr=1.00E-05, loss= 1.1610 (max= 1.6360), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:36,757 - root - INFO - Step 33350: lr=1.00E-05, loss= 1.1610 (max= 1.6360), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:36,757 - root - INFO - Step 33350: lr=1.00E-05, loss= 1.1610 (max= 1.6360), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:36,757 - root - INFO - Step 33350: lr=1.00E-05, loss= 1.1610 (max= 1.6360), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:36,757 - root - INFO - Step 33350: lr=1.00E-05, loss= 1.1610 
(max= 1.6360), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:36,757 - root - INFO - Step 33350: lr=1.00E-05, loss= 1.1610 (max= 1.6360), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:36,757 - root - INFO - Step 33350: lr=1.00E-05, loss= 1.1610 (max= 1.6360), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:52,714 - root - INFO - Step 33360: lr=1.00E-05, loss= 1.1718 (max= 1.6716), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:52,715 - root - INFO - Step 33360: lr=1.00E-05, loss= 1.1718 (max= 1.6716), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:52,715 - root - INFO - Step 33360: lr=1.00E-05, loss= 1.1718 (max= 1.6716), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:52,715 - root - INFO - Step 33360: lr=1.00E-05, loss= 1.1718 (max= 1.6716), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:52,715 - root - INFO - Step 33360: lr=1.00E-05, loss= 1.1718 (max= 1.6716), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:52,715 - root - INFO - Step 33360: lr=1.00E-05, loss= 1.1718 (max= 1.6716), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:52,715 - root - INFO - Step 33360: lr=1.00E-05, loss= 1.1718 (max= 1.6716), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:22:52,715 - root - INFO - Step 33360: lr=1.00E-05, loss= 1.1718 (max= 1.6716), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:08,670 - root 
- INFO - Step 33370: lr=1.00E-05, loss= 1.1774 (max= 1.6040), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:23:08,670 - root - INFO - Step 33370: lr=1.00E-05, loss= 1.1774 (max= 1.6040), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:23:08,670 - root - INFO - Step 33370: lr=1.00E-05, loss= 1.1774 (max= 1.6040), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:23:08,670 - root - INFO - Step 33370: lr=1.00E-05, loss= 1.1774 (max= 1.6040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:23:08,670 - root - INFO - Step 33370: lr=1.00E-05, loss= 1.1774 (max= 1.6040), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:23:08,670 - root - INFO - Step 33370: lr=1.00E-05, loss= 1.1774 (max= 1.6040), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:23:08,670 - root - INFO - Step 33370: lr=1.00E-05, loss= 1.1774 (max= 1.6040), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:23:08,670 - root - INFO - Step 33370: lr=1.00E-05, loss= 1.1774 (max= 1.6040), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:23:24,611 - root - INFO - Step 33380: lr=1.00E-05, loss= 1.1596 (max= 1.5239), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:24,611 - root - INFO - Step 33380: lr=1.00E-05, loss= 1.1596 (max= 1.5239), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:24,611 - root - INFO - Step 33380: lr=1.00E-05, loss= 1.1596 (max= 1.5239), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:23:24,611 - root - INFO - Step 33380: lr=1.00E-05, loss= 1.1596 (max= 1.5239), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:24,611 - root - INFO - Step 33380: lr=1.00E-05, loss= 1.1596 (max= 1.5239), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:24,611 - root - INFO - Step 33380: lr=1.00E-05, loss= 1.1596 (max= 1.5239), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:24,611 - root - INFO - Step 33380: lr=1.00E-05, loss= 1.1596 (max= 1.5239), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:24,611 - root - INFO - Step 33380: lr=1.00E-05, loss= 1.1596 (max= 1.5239), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:40,568 - root - INFO - Step 33390: lr=1.00E-05, loss= 1.1423 (max= 1.4523), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:40,568 - root - INFO - Step 33390: lr=1.00E-05, loss= 1.1423 (max= 1.4523), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:40,568 - root - INFO - Step 33390: lr=1.00E-05, loss= 1.1423 (max= 1.4523), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:40,568 - root - INFO - Step 33390: lr=1.00E-05, loss= 1.1423 (max= 1.4523), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:40,568 - root - INFO - Step 33390: lr=1.00E-05, loss= 1.1423 (max= 1.4523), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:40,568 - root - INFO - Step 33390: lr=1.00E-05, loss= 1.1423 (max= 1.4523), tps=20540, mfu=42.79%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:40,568 - root - INFO - Step 33390: lr=1.00E-05, loss= 1.1423 (max= 1.4523), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:40,569 - root - INFO - Step 33390: lr=1.00E-05, loss= 1.1423 (max= 1.4523), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:56,544 - root - INFO - Step 33400: lr=1.00E-05, loss= 1.1254 (max= 1.5132), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:56,545 - root - INFO - Step 33400: lr=1.00E-05, loss= 1.1254 (max= 1.5132), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:56,545 - root - INFO - Step 33400: lr=1.00E-05, loss= 1.1254 (max= 1.5132), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:56,545 - root - INFO - Step 33400: lr=1.00E-05, loss= 1.1254 (max= 1.5132), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:56,545 - root - INFO - Step 33400: lr=1.00E-05, loss= 1.1254 (max= 1.5132), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:56,545 - root - INFO - Step 33400: lr=1.00E-05, loss= 1.1254 (max= 1.5132), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:56,545 - root - INFO - Step 33400: lr=1.00E-05, loss= 1.1254 (max= 1.5132), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:23:56,545 - root - INFO - Step 33400: lr=1.00E-05, loss= 1.1254 (max= 1.5132), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:12,481 - root - INFO - Step 33410: lr=1.00E-05, 
loss= 1.1503 (max= 1.5142), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:12,481 - root - INFO - Step 33410: lr=1.00E-05, loss= 1.1503 (max= 1.5142), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:12,482 - root - INFO - Step 33410: lr=1.00E-05, loss= 1.1503 (max= 1.5142), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:12,482 - root - INFO - Step 33410: lr=1.00E-05, loss= 1.1503 (max= 1.5142), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:12,482 - root - INFO - Step 33410: lr=1.00E-05, loss= 1.1503 (max= 1.5142), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:12,482 - root - INFO - Step 33410: lr=1.00E-05, loss= 1.1503 (max= 1.5142), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:12,482 - root - INFO - Step 33410: lr=1.00E-05, loss= 1.1503 (max= 1.5142), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:12,482 - root - INFO - Step 33410: lr=1.00E-05, loss= 1.1503 (max= 1.5142), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:28,397 - root - INFO - Step 33420: lr=1.00E-05, loss= 1.1788 (max= 1.5783), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:28,397 - root - INFO - Step 33420: lr=1.00E-05, loss= 1.1788 (max= 1.5783), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:28,397 - root - INFO - Step 33420: lr=1.00E-05, loss= 1.1788 (max= 1.5783), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
01:24:28,398 - root - INFO - Step 33420: lr=1.00E-05, loss= 1.1788 (max= 1.5783), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:28,398 - root - INFO - Step 33420: lr=1.00E-05, loss= 1.1788 (max= 1.5783), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:28,398 - root - INFO - Step 33420: lr=1.00E-05, loss= 1.1788 (max= 1.5783), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:28,398 - root - INFO - Step 33420: lr=1.00E-05, loss= 1.1788 (max= 1.5783), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:28,398 - root - INFO - Step 33420: lr=1.00E-05, loss= 1.1788 (max= 1.5783), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:44,346 - root - INFO - Step 33430: lr=1.00E-05, loss= 1.1478 (max= 1.5608), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:44,346 - root - INFO - Step 33430: lr=1.00E-05, loss= 1.1478 (max= 1.5608), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:44,346 - root - INFO - Step 33430: lr=1.00E-05, loss= 1.1478 (max= 1.5608), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:44,346 - root - INFO - Step 33430: lr=1.00E-05, loss= 1.1478 (max= 1.5608), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:44,346 - root - INFO - Step 33430: lr=1.00E-05, loss= 1.1478 (max= 1.5608), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:44,346 - root - INFO - Step 33430: lr=1.00E-05, loss= 1.1478 (max= 1.5608), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:44,346 - root - INFO - Step 33430: lr=1.00E-05, loss= 1.1478 (max= 1.5608), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:24:44,346 - root - INFO - Step 33430: lr=1.00E-05, loss= 1.1478 (max= 1.5608), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:00,330 - root - INFO - Step 33440: lr=1.00E-05, loss= 1.1372 (max= 1.5513), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:00,330 - root - INFO - Step 33440: lr=1.00E-05, loss= 1.1372 (max= 1.5513), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:00,331 - root - INFO - Step 33440: lr=1.00E-05, loss= 1.1372 (max= 1.5513), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:00,331 - root - INFO - Step 33440: lr=1.00E-05, loss= 1.1372 (max= 1.5513), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:00,331 - root - INFO - Step 33440: lr=1.00E-05, loss= 1.1372 (max= 1.5513), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:00,331 - root - INFO - Step 33440: lr=1.00E-05, loss= 1.1372 (max= 1.5513), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:00,331 - root - INFO - Step 33440: lr=1.00E-05, loss= 1.1372 (max= 1.5513), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:00,331 - root - INFO - Step 33440: lr=1.00E-05, loss= 1.1372 (max= 1.5513), tps=20504, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:16,263 - root - INFO - Step 33450: lr=1.00E-05, loss= 1.1467 (max= 1.7269), 
tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:16,263 - root - INFO - Step 33450: lr=1.00E-05, loss= 1.1467 (max= 1.7269), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:16,263 - root - INFO - Step 33450: lr=1.00E-05, loss= 1.1467 (max= 1.7269), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:16,263 - root - INFO - Step 33450: lr=1.00E-05, loss= 1.1467 (max= 1.7269), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:16,263 - root - INFO - Step 33450: lr=1.00E-05, loss= 1.1467 (max= 1.7269), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:16,263 - root - INFO - Step 33450: lr=1.00E-05, loss= 1.1467 (max= 1.7269), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:16,263 - root - INFO - Step 33450: lr=1.00E-05, loss= 1.1467 (max= 1.7269), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:16,263 - root - INFO - Step 33450: lr=1.00E-05, loss= 1.1467 (max= 1.7269), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:32,166 - root - INFO - Step 33460: lr=1.00E-05, loss= 1.1586 (max= 1.5348), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:32,166 - root - INFO - Step 33460: lr=1.00E-05, loss= 1.1586 (max= 1.5348), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:32,166 - root - INFO - Step 33460: lr=1.00E-05, loss= 1.1586 (max= 1.5348), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:32,166 - root - INFO - Step 
33460: lr=1.00E-05, loss= 1.1586 (max= 1.5348), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:32,166 - root - INFO - Step 33460: lr=1.00E-05, loss= 1.1586 (max= 1.5348), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:32,166 - root - INFO - Step 33460: lr=1.00E-05, loss= 1.1586 (max= 1.5348), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:32,166 - root - INFO - Step 33460: lr=1.00E-05, loss= 1.1586 (max= 1.5348), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:32,166 - root - INFO - Step 33460: lr=1.00E-05, loss= 1.1586 (max= 1.5348), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:48,129 - root - INFO - Step 33470: lr=1.00E-05, loss= 1.1495 (max= 1.5658), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:48,129 - root - INFO - Step 33470: lr=1.00E-05, loss= 1.1495 (max= 1.5658), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:48,129 - root - INFO - Step 33470: lr=1.00E-05, loss= 1.1495 (max= 1.5658), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:48,129 - root - INFO - Step 33470: lr=1.00E-05, loss= 1.1495 (max= 1.5658), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:48,129 - root - INFO - Step 33470: lr=1.00E-05, loss= 1.1495 (max= 1.5658), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:48,129 - root - INFO - Step 33470: lr=1.00E-05, loss= 1.1495 (max= 1.5658), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 01:25:48,129 - root - INFO - Step 33470: lr=1.00E-05, loss= 1.1495 (max= 1.5658), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:25:48,129 - root - INFO - Step 33470: lr=1.00E-05, loss= 1.1495 (max= 1.5658), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:04,046 - root - INFO - Step 33480: lr=1.00E-05, loss= 1.1867 (max= 1.5943), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:04,046 - root - INFO - Step 33480: lr=1.00E-05, loss= 1.1867 (max= 1.5943), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:04,046 - root - INFO - Step 33480: lr=1.00E-05, loss= 1.1867 (max= 1.5943), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:04,046 - root - INFO - Step 33480: lr=1.00E-05, loss= 1.1867 (max= 1.5943), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:04,046 - root - INFO - Step 33480: lr=1.00E-05, loss= 1.1867 (max= 1.5943), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:04,046 - root - INFO - Step 33480: lr=1.00E-05, loss= 1.1867 (max= 1.5943), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:04,046 - root - INFO - Step 33480: lr=1.00E-05, loss= 1.1867 (max= 1.5943), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:04,046 - root - INFO - Step 33480: lr=1.00E-05, loss= 1.1867 (max= 1.5943), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:19,985 - root - INFO - Step 33490: lr=1.00E-05, loss= 1.1439 (max= 1.4865), tps=20563, mfu=42.84%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:19,985 - root - INFO - Step 33490: lr=1.00E-05, loss= 1.1439 (max= 1.4865), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:19,985 - root - INFO - Step 33490: lr=1.00E-05, loss= 1.1439 (max= 1.4865), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:19,985 - root - INFO - Step 33490: lr=1.00E-05, loss= 1.1439 (max= 1.4865), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:19,985 - root - INFO - Step 33490: lr=1.00E-05, loss= 1.1439 (max= 1.4865), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:19,985 - root - INFO - Step 33490: lr=1.00E-05, loss= 1.1439 (max= 1.4865), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:19,985 - root - INFO - Step 33490: lr=1.00E-05, loss= 1.1439 (max= 1.4865), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:19,985 - root - INFO - Step 33490: lr=1.00E-05, loss= 1.1439 (max= 1.4865), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:35,939 - root - INFO - Step 33500: lr=1.00E-05, loss= 1.1634 (max= 1.5049), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:35,939 - root - INFO - Step 33500: lr=1.00E-05, loss= 1.1634 (max= 1.5049), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:35,939 - root - INFO - Step 33500: lr=1.00E-05, loss= 1.1634 (max= 1.5049), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:35,939 - root - INFO - Step 33500: lr=1.00E-05, loss= 1.1634 
(max= 1.5049), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:35,939 - root - INFO - Step 33500: lr=1.00E-05, loss= 1.1634 (max= 1.5049), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:35,939 - root - INFO - Step 33500: lr=1.00E-05, loss= 1.1634 (max= 1.5049), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:35,939 - root - INFO - Step 33500: lr=1.00E-05, loss= 1.1634 (max= 1.5049), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:35,939 - root - INFO - Step 33500: lr=1.00E-05, loss= 1.1634 (max= 1.5049), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:51,884 - root - INFO - Step 33510: lr=1.00E-05, loss= 1.1328 (max= 1.5825), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:51,884 - root - INFO - Step 33510: lr=1.00E-05, loss= 1.1328 (max= 1.5825), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:51,884 - root - INFO - Step 33510: lr=1.00E-05, loss= 1.1328 (max= 1.5825), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:51,885 - root - INFO - Step 33510: lr=1.00E-05, loss= 1.1328 (max= 1.5825), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:51,885 - root - INFO - Step 33510: lr=1.00E-05, loss= 1.1328 (max= 1.5825), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:51,885 - root - INFO - Step 33510: lr=1.00E-05, loss= 1.1328 (max= 1.5825), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:51,885 - root 
- INFO - Step 33510: lr=1.00E-05, loss= 1.1328 (max= 1.5825), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:26:51,885 - root - INFO - Step 33510: lr=1.00E-05, loss= 1.1328 (max= 1.5825), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:07,845 - root - INFO - Step 33520: lr=1.00E-05, loss= 1.1442 (max= 1.5419), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:07,845 - root - INFO - Step 33520: lr=1.00E-05, loss= 1.1442 (max= 1.5419), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:07,845 - root - INFO - Step 33520: lr=1.00E-05, loss= 1.1442 (max= 1.5419), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:07,845 - root - INFO - Step 33520: lr=1.00E-05, loss= 1.1442 (max= 1.5419), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:07,845 - root - INFO - Step 33520: lr=1.00E-05, loss= 1.1442 (max= 1.5419), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:07,845 - root - INFO - Step 33520: lr=1.00E-05, loss= 1.1442 (max= 1.5419), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:07,845 - root - INFO - Step 33520: lr=1.00E-05, loss= 1.1442 (max= 1.5419), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:07,845 - root - INFO - Step 33520: lr=1.00E-05, loss= 1.1442 (max= 1.5419), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:23,750 - root - INFO - Step 33530: lr=1.00E-05, loss= 1.1311 (max= 1.6054), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:27:23,750 - root - INFO - Step 33530: lr=1.00E-05, loss= 1.1311 (max= 1.6054), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:23,750 - root - INFO - Step 33530: lr=1.00E-05, loss= 1.1311 (max= 1.6054), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:23,750 - root - INFO - Step 33530: lr=1.00E-05, loss= 1.1311 (max= 1.6054), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:23,750 - root - INFO - Step 33530: lr=1.00E-05, loss= 1.1311 (max= 1.6054), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:23,750 - root - INFO - Step 33530: lr=1.00E-05, loss= 1.1311 (max= 1.6054), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:23,750 - root - INFO - Step 33530: lr=1.00E-05, loss= 1.1311 (max= 1.6054), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:23,750 - root - INFO - Step 33530: lr=1.00E-05, loss= 1.1311 (max= 1.6054), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:32,452 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6643717 +2025-10-25 01:27:39,703 - root - INFO - Step 33540: lr=1.00E-05, loss= 1.1636 (max= 1.7946), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:39,703 - root - INFO - Step 33540: lr=1.00E-05, loss= 1.1636 (max= 1.7946), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:39,703 - root - INFO - Step 33540: lr=1.00E-05, loss= 1.1636 (max= 1.7946), tps=20546, mfu=42.81%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:39,703 - root - INFO - Step 33540: lr=1.00E-05, loss= 1.1636 (max= 1.7946), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:39,703 - root - INFO - Step 33540: lr=1.00E-05, loss= 1.1636 (max= 1.7946), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:39,703 - root - INFO - Step 33540: lr=1.00E-05, loss= 1.1636 (max= 1.7946), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:39,703 - root - INFO - Step 33540: lr=1.00E-05, loss= 1.1636 (max= 1.7946), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:39,703 - root - INFO - Step 33540: lr=1.00E-05, loss= 1.1636 (max= 1.7946), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:27:55,676 - root - INFO - Step 33550: lr=1.00E-05, loss= 1.2085 (max= 1.7613), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:27:55,676 - root - INFO - Step 33550: lr=1.00E-05, loss= 1.2085 (max= 1.7613), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:27:55,676 - root - INFO - Step 33550: lr=1.00E-05, loss= 1.2085 (max= 1.7613), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:27:55,676 - root - INFO - Step 33550: lr=1.00E-05, loss= 1.2085 (max= 1.7613), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:27:55,676 - root - INFO - Step 33550: lr=1.00E-05, loss= 1.2085 (max= 1.7613), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:27:55,676 - root - INFO - Step 33550: lr=1.00E-05, 
loss= 1.2085 (max= 1.7613), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:27:55,676 - root - INFO - Step 33550: lr=1.00E-05, loss= 1.2085 (max= 1.7613), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:27:55,676 - root - INFO - Step 33550: lr=1.00E-05, loss= 1.2085 (max= 1.7613), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:11,575 - root - INFO - Step 33560: lr=1.00E-05, loss= 1.1701 (max= 1.6580), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:11,575 - root - INFO - Step 33560: lr=1.00E-05, loss= 1.1701 (max= 1.6580), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:11,575 - root - INFO - Step 33560: lr=1.00E-05, loss= 1.1701 (max= 1.6580), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:11,575 - root - INFO - Step 33560: lr=1.00E-05, loss= 1.1701 (max= 1.6580), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:11,575 - root - INFO - Step 33560: lr=1.00E-05, loss= 1.1701 (max= 1.6580), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:11,575 - root - INFO - Step 33560: lr=1.00E-05, loss= 1.1701 (max= 1.6580), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:11,575 - root - INFO - Step 33560: lr=1.00E-05, loss= 1.1701 (max= 1.6580), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:11,575 - root - INFO - Step 33560: lr=1.00E-05, loss= 1.1701 (max= 1.6580), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
01:28:27,498 - root - INFO - Step 33570: lr=1.00E-05, loss= 1.1283 (max= 1.5418), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:27,498 - root - INFO - Step 33570: lr=1.00E-05, loss= 1.1283 (max= 1.5418), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:27,498 - root - INFO - Step 33570: lr=1.00E-05, loss= 1.1283 (max= 1.5418), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:27,499 - root - INFO - Step 33570: lr=1.00E-05, loss= 1.1283 (max= 1.5418), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:27,499 - root - INFO - Step 33570: lr=1.00E-05, loss= 1.1283 (max= 1.5418), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:27,499 - root - INFO - Step 33570: lr=1.00E-05, loss= 1.1283 (max= 1.5418), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:27,499 - root - INFO - Step 33570: lr=1.00E-05, loss= 1.1283 (max= 1.5418), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:27,499 - root - INFO - Step 33570: lr=1.00E-05, loss= 1.1283 (max= 1.5418), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:43,441 - root - INFO - Step 33580: lr=1.00E-05, loss= 1.1652 (max= 1.4811), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:43,441 - root - INFO - Step 33580: lr=1.00E-05, loss= 1.1652 (max= 1.4811), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:43,441 - root - INFO - Step 33580: lr=1.00E-05, loss= 1.1652 (max= 1.4811), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:43,441 - root - INFO - Step 33580: lr=1.00E-05, loss= 1.1652 (max= 1.4811), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:43,441 - root - INFO - Step 33580: lr=1.00E-05, loss= 1.1652 (max= 1.4811), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:43,441 - root - INFO - Step 33580: lr=1.00E-05, loss= 1.1652 (max= 1.4811), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:43,441 - root - INFO - Step 33580: lr=1.00E-05, loss= 1.1652 (max= 1.4811), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:43,441 - root - INFO - Step 33580: lr=1.00E-05, loss= 1.1652 (max= 1.4811), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:28:59,345 - root - INFO - Step 33590: lr=1.00E-05, loss= 1.1600 (max= 1.5743), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:59,345 - root - INFO - Step 33590: lr=1.00E-05, loss= 1.1600 (max= 1.5743), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:59,345 - root - INFO - Step 33590: lr=1.00E-05, loss= 1.1600 (max= 1.5743), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:59,345 - root - INFO - Step 33590: lr=1.00E-05, loss= 1.1600 (max= 1.5743), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:59,345 - root - INFO - Step 33590: lr=1.00E-05, loss= 1.1600 (max= 1.5743), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:59,345 - root - INFO - Step 33590: lr=1.00E-05, loss= 1.1600 (max= 1.5743), 
tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:59,345 - root - INFO - Step 33590: lr=1.00E-05, loss= 1.1600 (max= 1.5743), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:28:59,345 - root - INFO - Step 33590: lr=1.00E-05, loss= 1.1600 (max= 1.5743), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:15,284 - root - INFO - Step 33600: lr=1.00E-05, loss= 1.1292 (max= 1.6365), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:29:15,284 - root - INFO - Step 33600: lr=1.00E-05, loss= 1.1292 (max= 1.6365), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:29:15,284 - root - INFO - Step 33600: lr=1.00E-05, loss= 1.1292 (max= 1.6365), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:29:15,285 - root - INFO - Step 33600: lr=1.00E-05, loss= 1.1292 (max= 1.6365), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:29:15,285 - root - INFO - Step 33600: lr=1.00E-05, loss= 1.1292 (max= 1.6365), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:29:15,285 - root - INFO - Step 33600: lr=1.00E-05, loss= 1.1292 (max= 1.6365), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:29:15,285 - root - INFO - Step 33600: lr=1.00E-05, loss= 1.1292 (max= 1.6365), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:29:15,285 - root - INFO - Step 33600: lr=1.00E-05, loss= 1.1292 (max= 1.6365), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:29:31,216 - root - INFO - Step 
33610: lr=1.00E-05, loss= 1.1350 (max= 1.5602), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:31,216 - root - INFO - Step 33610: lr=1.00E-05, loss= 1.1350 (max= 1.5602), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:31,216 - root - INFO - Step 33610: lr=1.00E-05, loss= 1.1350 (max= 1.5602), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:31,216 - root - INFO - Step 33610: lr=1.00E-05, loss= 1.1350 (max= 1.5602), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:31,216 - root - INFO - Step 33610: lr=1.00E-05, loss= 1.1350 (max= 1.5602), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:31,216 - root - INFO - Step 33610: lr=1.00E-05, loss= 1.1350 (max= 1.5602), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:31,216 - root - INFO - Step 33610: lr=1.00E-05, loss= 1.1350 (max= 1.5602), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:31,216 - root - INFO - Step 33610: lr=1.00E-05, loss= 1.1350 (max= 1.5602), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:47,133 - root - INFO - Step 33620: lr=1.00E-05, loss= 1.1597 (max= 1.5378), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:47,133 - root - INFO - Step 33620: lr=1.00E-05, loss= 1.1597 (max= 1.5378), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:47,133 - root - INFO - Step 33620: lr=1.00E-05, loss= 1.1597 (max= 1.5378), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 01:29:47,133 - root - INFO - Step 33620: lr=1.00E-05, loss= 1.1597 (max= 1.5378), tps=20593, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:47,133 - root - INFO - Step 33620: lr=1.00E-05, loss= 1.1597 (max= 1.5378), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:47,133 - root - INFO - Step 33620: lr=1.00E-05, loss= 1.1597 (max= 1.5378), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:47,133 - root - INFO - Step 33620: lr=1.00E-05, loss= 1.1597 (max= 1.5378), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:29:47,133 - root - INFO - Step 33620: lr=1.00E-05, loss= 1.1597 (max= 1.5378), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:03,138 - root - INFO - Step 33630: lr=1.00E-05, loss= 1.1807 (max= 1.5757), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:03,138 - root - INFO - Step 33630: lr=1.00E-05, loss= 1.1807 (max= 1.5757), tps=20477, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:03,138 - root - INFO - Step 33630: lr=1.00E-05, loss= 1.1807 (max= 1.5757), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:03,138 - root - INFO - Step 33630: lr=1.00E-05, loss= 1.1807 (max= 1.5757), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:03,138 - root - INFO - Step 33630: lr=1.00E-05, loss= 1.1807 (max= 1.5757), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:03,138 - root - INFO - Step 33630: lr=1.00E-05, loss= 1.1807 (max= 1.5757), tps=20478, mfu=42.67%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:03,138 - root - INFO - Step 33630: lr=1.00E-05, loss= 1.1807 (max= 1.5757), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:03,138 - root - INFO - Step 33630: lr=1.00E-05, loss= 1.1807 (max= 1.5757), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:19,048 - root - INFO - Step 33640: lr=1.00E-05, loss= 1.1635 (max= 1.6071), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:19,048 - root - INFO - Step 33640: lr=1.00E-05, loss= 1.1635 (max= 1.6071), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:19,048 - root - INFO - Step 33640: lr=1.00E-05, loss= 1.1635 (max= 1.6071), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:19,048 - root - INFO - Step 33640: lr=1.00E-05, loss= 1.1635 (max= 1.6071), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:19,048 - root - INFO - Step 33640: lr=1.00E-05, loss= 1.1635 (max= 1.6071), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:19,048 - root - INFO - Step 33640: lr=1.00E-05, loss= 1.1635 (max= 1.6071), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:19,049 - root - INFO - Step 33640: lr=1.00E-05, loss= 1.1635 (max= 1.6071), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:19,049 - root - INFO - Step 33640: lr=1.00E-05, loss= 1.1635 (max= 1.6071), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:35,021 - root - INFO - Step 33650: lr=1.00E-05, loss= 1.1407 
(max= 1.5174), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:35,021 - root - INFO - Step 33650: lr=1.00E-05, loss= 1.1407 (max= 1.5174), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:35,021 - root - INFO - Step 33650: lr=1.00E-05, loss= 1.1407 (max= 1.5174), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:35,021 - root - INFO - Step 33650: lr=1.00E-05, loss= 1.1407 (max= 1.5174), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:35,021 - root - INFO - Step 33650: lr=1.00E-05, loss= 1.1407 (max= 1.5174), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:35,021 - root - INFO - Step 33650: lr=1.00E-05, loss= 1.1407 (max= 1.5174), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:35,021 - root - INFO - Step 33650: lr=1.00E-05, loss= 1.1407 (max= 1.5174), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:35,021 - root - INFO - Step 33650: lr=1.00E-05, loss= 1.1407 (max= 1.5174), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:30:50,973 - root - INFO - Step 33660: lr=1.00E-05, loss= 1.1673 (max= 1.6754), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:30:50,973 - root - INFO - Step 33660: lr=1.00E-05, loss= 1.1673 (max= 1.6754), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:30:50,973 - root - INFO - Step 33660: lr=1.00E-05, loss= 1.1673 (max= 1.6754), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:30:50,973 - root 
- INFO - Step 33660: lr=1.00E-05, loss= 1.1673 (max= 1.6754), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:30:50,973 - root - INFO - Step 33660: lr=1.00E-05, loss= 1.1673 (max= 1.6754), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:30:50,973 - root - INFO - Step 33660: lr=1.00E-05, loss= 1.1673 (max= 1.6754), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:30:50,973 - root - INFO - Step 33660: lr=1.00E-05, loss= 1.1673 (max= 1.6754), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:30:50,973 - root - INFO - Step 33660: lr=1.00E-05, loss= 1.1673 (max= 1.6754), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:06,962 - root - INFO - Step 33670: lr=1.00E-05, loss= 1.1271 (max= 1.5281), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:06,962 - root - INFO - Step 33670: lr=1.00E-05, loss= 1.1271 (max= 1.5281), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:06,962 - root - INFO - Step 33670: lr=1.00E-05, loss= 1.1271 (max= 1.5281), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:06,962 - root - INFO - Step 33670: lr=1.00E-05, loss= 1.1271 (max= 1.5281), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:06,962 - root - INFO - Step 33670: lr=1.00E-05, loss= 1.1271 (max= 1.5281), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:06,962 - root - INFO - Step 33670: lr=1.00E-05, loss= 1.1271 (max= 1.5281), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 01:31:06,962 - root - INFO - Step 33670: lr=1.00E-05, loss= 1.1271 (max= 1.5281), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:06,962 - root - INFO - Step 33670: lr=1.00E-05, loss= 1.1271 (max= 1.5281), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:22,889 - root - INFO - Step 33680: lr=1.00E-05, loss= 1.1638 (max= 1.4882), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:22,889 - root - INFO - Step 33680: lr=1.00E-05, loss= 1.1638 (max= 1.4882), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:22,889 - root - INFO - Step 33680: lr=1.00E-05, loss= 1.1638 (max= 1.4882), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:22,889 - root - INFO - Step 33680: lr=1.00E-05, loss= 1.1638 (max= 1.4882), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:22,889 - root - INFO - Step 33680: lr=1.00E-05, loss= 1.1638 (max= 1.4882), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:22,889 - root - INFO - Step 33680: lr=1.00E-05, loss= 1.1638 (max= 1.4882), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:22,889 - root - INFO - Step 33680: lr=1.00E-05, loss= 1.1638 (max= 1.4882), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:22,889 - root - INFO - Step 33680: lr=1.00E-05, loss= 1.1638 (max= 1.4882), tps=20578, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:38,863 - root - INFO - Step 33690: lr=1.00E-05, loss= 1.1884 (max= 1.6151), tps=20517, mfu=42.75%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:38,863 - root - INFO - Step 33690: lr=1.00E-05, loss= 1.1884 (max= 1.6151), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:38,863 - root - INFO - Step 33690: lr=1.00E-05, loss= 1.1884 (max= 1.6151), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:38,863 - root - INFO - Step 33690: lr=1.00E-05, loss= 1.1884 (max= 1.6151), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:38,863 - root - INFO - Step 33690: lr=1.00E-05, loss= 1.1884 (max= 1.6151), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:38,863 - root - INFO - Step 33690: lr=1.00E-05, loss= 1.1884 (max= 1.6151), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:38,863 - root - INFO - Step 33690: lr=1.00E-05, loss= 1.1884 (max= 1.6151), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:38,863 - root - INFO - Step 33690: lr=1.00E-05, loss= 1.1884 (max= 1.6151), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:54,847 - root - INFO - Step 33700: lr=1.00E-05, loss= 1.1369 (max= 1.5661), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:54,847 - root - INFO - Step 33700: lr=1.00E-05, loss= 1.1369 (max= 1.5661), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:54,847 - root - INFO - Step 33700: lr=1.00E-05, loss= 1.1369 (max= 1.5661), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:54,847 - root - INFO - Step 33700: lr=1.00E-05, 
loss= 1.1369 (max= 1.5661), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:54,847 - root - INFO - Step 33700: lr=1.00E-05, loss= 1.1369 (max= 1.5661), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:54,847 - root - INFO - Step 33700: lr=1.00E-05, loss= 1.1369 (max= 1.5661), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:54,847 - root - INFO - Step 33700: lr=1.00E-05, loss= 1.1369 (max= 1.5661), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:31:54,847 - root - INFO - Step 33700: lr=1.00E-05, loss= 1.1369 (max= 1.5661), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:10,727 - root - INFO - Step 33710: lr=1.00E-05, loss= 1.1791 (max= 1.5510), tps=20639, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:10,727 - root - INFO - Step 33710: lr=1.00E-05, loss= 1.1791 (max= 1.5510), tps=20639, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:10,727 - root - INFO - Step 33710: lr=1.00E-05, loss= 1.1791 (max= 1.5510), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:10,727 - root - INFO - Step 33710: lr=1.00E-05, loss= 1.1791 (max= 1.5510), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:10,727 - root - INFO - Step 33710: lr=1.00E-05, loss= 1.1791 (max= 1.5510), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:10,727 - root - INFO - Step 33710: lr=1.00E-05, loss= 1.1791 (max= 1.5510), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
01:32:10,727 - root - INFO - Step 33710: lr=1.00E-05, loss= 1.1791 (max= 1.5510), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:10,727 - root - INFO - Step 33710: lr=1.00E-05, loss= 1.1791 (max= 1.5510), tps=20639, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:26,671 - root - INFO - Step 33720: lr=1.00E-05, loss= 1.1737 (max= 1.4784), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:26,671 - root - INFO - Step 33720: lr=1.00E-05, loss= 1.1737 (max= 1.4784), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:26,672 - root - INFO - Step 33720: lr=1.00E-05, loss= 1.1737 (max= 1.4784), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:26,672 - root - INFO - Step 33720: lr=1.00E-05, loss= 1.1737 (max= 1.4784), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:26,672 - root - INFO - Step 33720: lr=1.00E-05, loss= 1.1737 (max= 1.4784), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:26,672 - root - INFO - Step 33720: lr=1.00E-05, loss= 1.1737 (max= 1.4784), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:26,672 - root - INFO - Step 33720: lr=1.00E-05, loss= 1.1737 (max= 1.4784), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:26,672 - root - INFO - Step 33720: lr=1.00E-05, loss= 1.1737 (max= 1.4784), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:42,564 - root - INFO - Step 33730: lr=1.00E-05, loss= 1.1860 (max= 1.8388), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:32:42,564 - root - INFO - Step 33730: lr=1.00E-05, loss= 1.1860 (max= 1.8388), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:32:42,564 - root - INFO - Step 33730: lr=1.00E-05, loss= 1.1860 (max= 1.8388), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:32:42,564 - root - INFO - Step 33730: lr=1.00E-05, loss= 1.1860 (max= 1.8388), tps=20623, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:32:42,564 - root - INFO - Step 33730: lr=1.00E-05, loss= 1.1860 (max= 1.8388), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:32:42,565 - root - INFO - Step 33730: lr=1.00E-05, loss= 1.1860 (max= 1.8388), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:32:42,565 - root - INFO - Step 33730: lr=1.00E-05, loss= 1.1860 (max= 1.8388), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:32:42,565 - root - INFO - Step 33730: lr=1.00E-05, loss= 1.1860 (max= 1.8388), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:32:58,474 - root - INFO - Step 33740: lr=1.00E-05, loss= 1.1785 (max= 1.7321), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:58,474 - root - INFO - Step 33740: lr=1.00E-05, loss= 1.1785 (max= 1.7321), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:58,475 - root - INFO - Step 33740: lr=1.00E-05, loss= 1.1785 (max= 1.7321), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:58,475 - root - INFO - Step 33740: lr=1.00E-05, loss= 1.1785 (max= 1.7321), 
tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:58,475 - root - INFO - Step 33740: lr=1.00E-05, loss= 1.1785 (max= 1.7321), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:58,475 - root - INFO - Step 33740: lr=1.00E-05, loss= 1.1785 (max= 1.7321), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:58,475 - root - INFO - Step 33740: lr=1.00E-05, loss= 1.1785 (max= 1.7321), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:32:58,475 - root - INFO - Step 33740: lr=1.00E-05, loss= 1.1785 (max= 1.7321), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:14,464 - root - INFO - Step 33750: lr=1.00E-05, loss= 1.1914 (max= 1.4631), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:33:14,464 - root - INFO - Step 33750: lr=1.00E-05, loss= 1.1914 (max= 1.4631), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:33:14,464 - root - INFO - Step 33750: lr=1.00E-05, loss= 1.1914 (max= 1.4631), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:33:14,464 - root - INFO - Step 33750: lr=1.00E-05, loss= 1.1914 (max= 1.4631), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:33:14,464 - root - INFO - Step 33750: lr=1.00E-05, loss= 1.1914 (max= 1.4631), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:33:14,464 - root - INFO - Step 33750: lr=1.00E-05, loss= 1.1914 (max= 1.4631), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:33:14,464 - root - INFO - Step 
33750: lr=1.00E-05, loss= 1.1914 (max= 1.4631), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:33:14,464 - root - INFO - Step 33750: lr=1.00E-05, loss= 1.1914 (max= 1.4631), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:33:30,402 - root - INFO - Step 33760: lr=1.00E-05, loss= 1.1719 (max= 1.5657), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:30,402 - root - INFO - Step 33760: lr=1.00E-05, loss= 1.1719 (max= 1.5657), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:30,402 - root - INFO - Step 33760: lr=1.00E-05, loss= 1.1719 (max= 1.5657), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:30,402 - root - INFO - Step 33760: lr=1.00E-05, loss= 1.1719 (max= 1.5657), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:30,402 - root - INFO - Step 33760: lr=1.00E-05, loss= 1.1719 (max= 1.5657), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:30,402 - root - INFO - Step 33760: lr=1.00E-05, loss= 1.1719 (max= 1.5657), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:30,402 - root - INFO - Step 33760: lr=1.00E-05, loss= 1.1719 (max= 1.5657), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:30,402 - root - INFO - Step 33760: lr=1.00E-05, loss= 1.1719 (max= 1.5657), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:46,297 - root - INFO - Step 33770: lr=1.00E-05, loss= 1.1701 (max= 1.5898), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:33:46,297 - root - INFO - Step 33770: lr=1.00E-05, loss= 1.1701 (max= 1.5898), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:46,297 - root - INFO - Step 33770: lr=1.00E-05, loss= 1.1701 (max= 1.5898), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:46,297 - root - INFO - Step 33770: lr=1.00E-05, loss= 1.1701 (max= 1.5898), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:46,297 - root - INFO - Step 33770: lr=1.00E-05, loss= 1.1701 (max= 1.5898), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:46,297 - root - INFO - Step 33770: lr=1.00E-05, loss= 1.1701 (max= 1.5898), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:46,297 - root - INFO - Step 33770: lr=1.00E-05, loss= 1.1701 (max= 1.5898), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:33:46,297 - root - INFO - Step 33770: lr=1.00E-05, loss= 1.1701 (max= 1.5898), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:02,218 - root - INFO - Step 33780: lr=1.00E-05, loss= 1.1408 (max= 1.5742), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:02,218 - root - INFO - Step 33780: lr=1.00E-05, loss= 1.1408 (max= 1.5742), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:02,219 - root - INFO - Step 33780: lr=1.00E-05, loss= 1.1408 (max= 1.5742), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:02,219 - root - INFO - Step 33780: lr=1.00E-05, loss= 1.1408 (max= 1.5742), tps=20585, mfu=42.89%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:02,219 - root - INFO - Step 33780: lr=1.00E-05, loss= 1.1408 (max= 1.5742), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:02,219 - root - INFO - Step 33780: lr=1.00E-05, loss= 1.1408 (max= 1.5742), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:02,219 - root - INFO - Step 33780: lr=1.00E-05, loss= 1.1408 (max= 1.5742), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:02,219 - root - INFO - Step 33780: lr=1.00E-05, loss= 1.1408 (max= 1.5742), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:18,186 - root - INFO - Step 33790: lr=1.00E-05, loss= 1.1721 (max= 1.5497), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:18,186 - root - INFO - Step 33790: lr=1.00E-05, loss= 1.1721 (max= 1.5497), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:18,186 - root - INFO - Step 33790: lr=1.00E-05, loss= 1.1721 (max= 1.5497), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:18,186 - root - INFO - Step 33790: lr=1.00E-05, loss= 1.1721 (max= 1.5497), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:18,186 - root - INFO - Step 33790: lr=1.00E-05, loss= 1.1721 (max= 1.5497), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:18,186 - root - INFO - Step 33790: lr=1.00E-05, loss= 1.1721 (max= 1.5497), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:18,186 - root - INFO - Step 33790: lr=1.00E-05, loss= 1.1721 
(max= 1.5497), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:18,187 - root - INFO - Step 33790: lr=1.00E-05, loss= 1.1721 (max= 1.5497), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:34:34,118 - root - INFO - Step 33800: lr=1.00E-05, loss= 1.1169 (max= 1.6731), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:34,118 - root - INFO - Step 33800: lr=1.00E-05, loss= 1.1169 (max= 1.6731), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:34,119 - root - INFO - Step 33800: lr=1.00E-05, loss= 1.1169 (max= 1.6731), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:34,119 - root - INFO - Step 33800: lr=1.00E-05, loss= 1.1169 (max= 1.6731), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:34,119 - root - INFO - Step 33800: lr=1.00E-05, loss= 1.1169 (max= 1.6731), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:34,119 - root - INFO - Step 33800: lr=1.00E-05, loss= 1.1169 (max= 1.6731), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:34,119 - root - INFO - Step 33800: lr=1.00E-05, loss= 1.1169 (max= 1.6731), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:34,119 - root - INFO - Step 33800: lr=1.00E-05, loss= 1.1169 (max= 1.6731), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:50,076 - root - INFO - Step 33810: lr=1.00E-05, loss= 1.1271 (max= 1.5524), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:50,076 - root 
- INFO - Step 33810: lr=1.00E-05, loss= 1.1271 (max= 1.5524), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:50,076 - root - INFO - Step 33810: lr=1.00E-05, loss= 1.1271 (max= 1.5524), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:50,076 - root - INFO - Step 33810: lr=1.00E-05, loss= 1.1271 (max= 1.5524), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:50,076 - root - INFO - Step 33810: lr=1.00E-05, loss= 1.1271 (max= 1.5524), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:50,076 - root - INFO - Step 33810: lr=1.00E-05, loss= 1.1271 (max= 1.5524), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:50,076 - root - INFO - Step 33810: lr=1.00E-05, loss= 1.1271 (max= 1.5524), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:34:50,076 - root - INFO - Step 33810: lr=1.00E-05, loss= 1.1271 (max= 1.5524), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:35:06,024 - root - INFO - Step 33820: lr=1.00E-05, loss= 1.1435 (max= 1.5007), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:06,024 - root - INFO - Step 33820: lr=1.00E-05, loss= 1.1435 (max= 1.5007), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:06,024 - root - INFO - Step 33820: lr=1.00E-05, loss= 1.1435 (max= 1.5007), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:06,024 - root - INFO - Step 33820: lr=1.00E-05, loss= 1.1435 (max= 1.5007), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 01:35:06,024 - root - INFO - Step 33820: lr=1.00E-05, loss= 1.1435 (max= 1.5007), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:06,024 - root - INFO - Step 33820: lr=1.00E-05, loss= 1.1435 (max= 1.5007), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:06,025 - root - INFO - Step 33820: lr=1.00E-05, loss= 1.1435 (max= 1.5007), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:06,025 - root - INFO - Step 33820: lr=1.00E-05, loss= 1.1435 (max= 1.5007), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:22,004 - root - INFO - Step 33830: lr=1.00E-05, loss= 1.1623 (max= 1.5127), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:22,004 - root - INFO - Step 33830: lr=1.00E-05, loss= 1.1623 (max= 1.5127), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:22,004 - root - INFO - Step 33830: lr=1.00E-05, loss= 1.1623 (max= 1.5127), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:22,004 - root - INFO - Step 33830: lr=1.00E-05, loss= 1.1623 (max= 1.5127), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:22,004 - root - INFO - Step 33830: lr=1.00E-05, loss= 1.1623 (max= 1.5127), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:22,004 - root - INFO - Step 33830: lr=1.00E-05, loss= 1.1623 (max= 1.5127), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:22,004 - root - INFO - Step 33830: lr=1.00E-05, loss= 1.1623 (max= 1.5127), tps=20510, mfu=42.73%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:22,005 - root - INFO - Step 33830: lr=1.00E-05, loss= 1.1623 (max= 1.5127), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:37,969 - root - INFO - Step 33840: lr=1.00E-05, loss= 1.1507 (max= 1.4767), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:37,969 - root - INFO - Step 33840: lr=1.00E-05, loss= 1.1507 (max= 1.4767), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:37,969 - root - INFO - Step 33840: lr=1.00E-05, loss= 1.1507 (max= 1.4767), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:37,969 - root - INFO - Step 33840: lr=1.00E-05, loss= 1.1507 (max= 1.4767), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:37,969 - root - INFO - Step 33840: lr=1.00E-05, loss= 1.1507 (max= 1.4767), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:37,969 - root - INFO - Step 33840: lr=1.00E-05, loss= 1.1507 (max= 1.4767), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:37,969 - root - INFO - Step 33840: lr=1.00E-05, loss= 1.1507 (max= 1.4767), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:37,969 - root - INFO - Step 33840: lr=1.00E-05, loss= 1.1507 (max= 1.4767), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:35:53,879 - root - INFO - Step 33850: lr=1.00E-05, loss= 1.1908 (max= 1.5151), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:35:53,879 - root - INFO - Step 33850: lr=1.00E-05, 
loss= 1.1908 (max= 1.5151), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:35:53,879 - root - INFO - Step 33850: lr=1.00E-05, loss= 1.1908 (max= 1.5151), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:35:53,879 - root - INFO - Step 33850: lr=1.00E-05, loss= 1.1908 (max= 1.5151), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:35:53,879 - root - INFO - Step 33850: lr=1.00E-05, loss= 1.1908 (max= 1.5151), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:35:53,879 - root - INFO - Step 33850: lr=1.00E-05, loss= 1.1908 (max= 1.5151), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:35:53,879 - root - INFO - Step 33850: lr=1.00E-05, loss= 1.1908 (max= 1.5151), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:35:53,880 - root - INFO - Step 33850: lr=1.00E-05, loss= 1.1908 (max= 1.5151), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:36:05,766 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4907478 +2025-10-25 01:36:09,839 - root - INFO - Step 33860: lr=1.00E-05, loss= 1.1766 (max= 1.5671), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:36:09,839 - root - INFO - Step 33860: lr=1.00E-05, loss= 1.1766 (max= 1.5671), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:36:09,839 - root - INFO - Step 33860: lr=1.00E-05, loss= 1.1766 (max= 1.5671), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
01:36:09,839 - root - INFO - Step 33860: lr=1.00E-05, loss= 1.1766 (max= 1.5671), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:36:09,839 - root - INFO - Step 33860: lr=1.00E-05, loss= 1.1766 (max= 1.5671), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:36:09,839 - root - INFO - Step 33860: lr=1.00E-05, loss= 1.1766 (max= 1.5671), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:36:09,839 - root - INFO - Step 33860: lr=1.00E-05, loss= 1.1766 (max= 1.5671), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:36:09,839 - root - INFO - Step 33860: lr=1.00E-05, loss= 1.1766 (max= 1.5671), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:36:25,811 - root - INFO - Step 33870: lr=1.00E-05, loss= 1.1651 (max= 1.6160), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:25,811 - root - INFO - Step 33870: lr=1.00E-05, loss= 1.1651 (max= 1.6160), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:25,811 - root - INFO - Step 33870: lr=1.00E-05, loss= 1.1651 (max= 1.6160), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:25,811 - root - INFO - Step 33870: lr=1.00E-05, loss= 1.1651 (max= 1.6160), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:25,811 - root - INFO - Step 33870: lr=1.00E-05, loss= 1.1651 (max= 1.6160), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:25,811 - root - INFO - Step 33870: lr=1.00E-05, loss= 1.1651 (max= 1.6160), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:25,811 - root - INFO - Step 33870: lr=1.00E-05, loss= 1.1651 (max= 1.6160), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:25,811 - root - INFO - Step 33870: lr=1.00E-05, loss= 1.1651 (max= 1.6160), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:41,786 - root - INFO - Step 33880: lr=1.00E-05, loss= 1.1677 (max= 1.5438), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:41,786 - root - INFO - Step 33880: lr=1.00E-05, loss= 1.1677 (max= 1.5438), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:41,786 - root - INFO - Step 33880: lr=1.00E-05, loss= 1.1677 (max= 1.5438), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:41,787 - root - INFO - Step 33880: lr=1.00E-05, loss= 1.1677 (max= 1.5438), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:41,787 - root - INFO - Step 33880: lr=1.00E-05, loss= 1.1677 (max= 1.5438), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:41,787 - root - INFO - Step 33880: lr=1.00E-05, loss= 1.1677 (max= 1.5438), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:41,787 - root - INFO - Step 33880: lr=1.00E-05, loss= 1.1677 (max= 1.5438), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:41,787 - root - INFO - Step 33880: lr=1.00E-05, loss= 1.1677 (max= 1.5438), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:57,697 - root - INFO - Step 33890: lr=1.00E-05, loss= 1.1707 (max= 1.6086), 
tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:57,697 - root - INFO - Step 33890: lr=1.00E-05, loss= 1.1707 (max= 1.6086), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:57,697 - root - INFO - Step 33890: lr=1.00E-05, loss= 1.1707 (max= 1.6086), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:57,697 - root - INFO - Step 33890: lr=1.00E-05, loss= 1.1707 (max= 1.6086), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:57,697 - root - INFO - Step 33890: lr=1.00E-05, loss= 1.1707 (max= 1.6086), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:57,697 - root - INFO - Step 33890: lr=1.00E-05, loss= 1.1707 (max= 1.6086), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:57,697 - root - INFO - Step 33890: lr=1.00E-05, loss= 1.1707 (max= 1.6086), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:36:57,697 - root - INFO - Step 33890: lr=1.00E-05, loss= 1.1707 (max= 1.6086), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:13,684 - root - INFO - Step 33900: lr=1.00E-05, loss= 1.1545 (max= 1.4583), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:13,684 - root - INFO - Step 33900: lr=1.00E-05, loss= 1.1545 (max= 1.4583), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:13,684 - root - INFO - Step 33900: lr=1.00E-05, loss= 1.1545 (max= 1.4583), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:13,684 - root - INFO - Step 
33900: lr=1.00E-05, loss= 1.1545 (max= 1.4583), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:13,684 - root - INFO - Step 33900: lr=1.00E-05, loss= 1.1545 (max= 1.4583), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:13,684 - root - INFO - Step 33900: lr=1.00E-05, loss= 1.1545 (max= 1.4583), tps=20501, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:13,684 - root - INFO - Step 33900: lr=1.00E-05, loss= 1.1545 (max= 1.4583), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:13,685 - root - INFO - Step 33900: lr=1.00E-05, loss= 1.1545 (max= 1.4583), tps=20500, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:29,590 - root - INFO - Step 33910: lr=1.00E-05, loss= 1.1528 (max= 1.4764), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:29,590 - root - INFO - Step 33910: lr=1.00E-05, loss= 1.1528 (max= 1.4764), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:29,590 - root - INFO - Step 33910: lr=1.00E-05, loss= 1.1528 (max= 1.4764), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:29,590 - root - INFO - Step 33910: lr=1.00E-05, loss= 1.1528 (max= 1.4764), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:29,590 - root - INFO - Step 33910: lr=1.00E-05, loss= 1.1528 (max= 1.4764), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:29,590 - root - INFO - Step 33910: lr=1.00E-05, loss= 1.1528 (max= 1.4764), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:37:29,590 - root - INFO - Step 33910: lr=1.00E-05, loss= 1.1528 (max= 1.4764), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:29,590 - root - INFO - Step 33910: lr=1.00E-05, loss= 1.1528 (max= 1.4764), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:45,581 - root - INFO - Step 33920: lr=1.00E-05, loss= 1.1266 (max= 1.5376), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:45,581 - root - INFO - Step 33920: lr=1.00E-05, loss= 1.1266 (max= 1.5376), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:45,581 - root - INFO - Step 33920: lr=1.00E-05, loss= 1.1266 (max= 1.5376), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:45,581 - root - INFO - Step 33920: lr=1.00E-05, loss= 1.1266 (max= 1.5376), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:45,581 - root - INFO - Step 33920: lr=1.00E-05, loss= 1.1266 (max= 1.5376), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:45,581 - root - INFO - Step 33920: lr=1.00E-05, loss= 1.1266 (max= 1.5376), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:45,581 - root - INFO - Step 33920: lr=1.00E-05, loss= 1.1266 (max= 1.5376), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:37:45,581 - root - INFO - Step 33920: lr=1.00E-05, loss= 1.1266 (max= 1.5376), tps=20496, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:01,550 - root - INFO - Step 33930: lr=1.00E-05, loss= 1.1650 (max= 1.5762), tps=20524, mfu=42.76%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:01,550 - root - INFO - Step 33930: lr=1.00E-05, loss= 1.1650 (max= 1.5762), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:01,550 - root - INFO - Step 33930: lr=1.00E-05, loss= 1.1650 (max= 1.5762), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:01,550 - root - INFO - Step 33930: lr=1.00E-05, loss= 1.1650 (max= 1.5762), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:01,550 - root - INFO - Step 33930: lr=1.00E-05, loss= 1.1650 (max= 1.5762), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:01,550 - root - INFO - Step 33930: lr=1.00E-05, loss= 1.1650 (max= 1.5762), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:01,550 - root - INFO - Step 33930: lr=1.00E-05, loss= 1.1650 (max= 1.5762), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:01,550 - root - INFO - Step 33930: lr=1.00E-05, loss= 1.1650 (max= 1.5762), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:17,519 - root - INFO - Step 33940: lr=1.00E-05, loss= 1.1746 (max= 1.6097), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:17,520 - root - INFO - Step 33940: lr=1.00E-05, loss= 1.1746 (max= 1.6097), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:17,520 - root - INFO - Step 33940: lr=1.00E-05, loss= 1.1746 (max= 1.6097), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:17,520 - root - INFO - Step 33940: lr=1.00E-05, loss= 1.1746 
(max= 1.6097), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:17,520 - root - INFO - Step 33940: lr=1.00E-05, loss= 1.1746 (max= 1.6097), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:17,520 - root - INFO - Step 33940: lr=1.00E-05, loss= 1.1746 (max= 1.6097), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:17,520 - root - INFO - Step 33940: lr=1.00E-05, loss= 1.1746 (max= 1.6097), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:17,520 - root - INFO - Step 33940: lr=1.00E-05, loss= 1.1746 (max= 1.6097), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:33,451 - root - INFO - Step 33950: lr=1.00E-05, loss= 1.1365 (max= 1.5396), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:33,451 - root - INFO - Step 33950: lr=1.00E-05, loss= 1.1365 (max= 1.5396), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:33,451 - root - INFO - Step 33950: lr=1.00E-05, loss= 1.1365 (max= 1.5396), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:33,451 - root - INFO - Step 33950: lr=1.00E-05, loss= 1.1365 (max= 1.5396), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:33,451 - root - INFO - Step 33950: lr=1.00E-05, loss= 1.1365 (max= 1.5396), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:33,451 - root - INFO - Step 33950: lr=1.00E-05, loss= 1.1365 (max= 1.5396), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:33,451 - root 
- INFO - Step 33950: lr=1.00E-05, loss= 1.1365 (max= 1.5396), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:33,451 - root - INFO - Step 33950: lr=1.00E-05, loss= 1.1365 (max= 1.5396), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:38:49,382 - root - INFO - Step 33960: lr=1.00E-05, loss= 1.1771 (max= 1.4759), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:38:49,382 - root - INFO - Step 33960: lr=1.00E-05, loss= 1.1771 (max= 1.4759), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:38:49,382 - root - INFO - Step 33960: lr=1.00E-05, loss= 1.1771 (max= 1.4759), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:38:49,382 - root - INFO - Step 33960: lr=1.00E-05, loss= 1.1771 (max= 1.4759), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:38:49,382 - root - INFO - Step 33960: lr=1.00E-05, loss= 1.1771 (max= 1.4759), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:38:49,382 - root - INFO - Step 33960: lr=1.00E-05, loss= 1.1771 (max= 1.4759), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:38:49,382 - root - INFO - Step 33960: lr=1.00E-05, loss= 1.1771 (max= 1.4759), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:38:49,382 - root - INFO - Step 33960: lr=1.00E-05, loss= 1.1771 (max= 1.4759), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:05,338 - root - INFO - Step 33970: lr=1.00E-05, loss= 1.1218 (max= 1.4387), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:39:05,338 - root - INFO - Step 33970: lr=1.00E-05, loss= 1.1218 (max= 1.4387), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:05,338 - root - INFO - Step 33970: lr=1.00E-05, loss= 1.1218 (max= 1.4387), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:05,338 - root - INFO - Step 33970: lr=1.00E-05, loss= 1.1218 (max= 1.4387), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:05,339 - root - INFO - Step 33970: lr=1.00E-05, loss= 1.1218 (max= 1.4387), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:05,339 - root - INFO - Step 33970: lr=1.00E-05, loss= 1.1218 (max= 1.4387), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:05,339 - root - INFO - Step 33970: lr=1.00E-05, loss= 1.1218 (max= 1.4387), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:05,339 - root - INFO - Step 33970: lr=1.00E-05, loss= 1.1218 (max= 1.4387), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:21,275 - root - INFO - Step 33980: lr=1.00E-05, loss= 1.1630 (max= 1.6427), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:39:21,275 - root - INFO - Step 33980: lr=1.00E-05, loss= 1.1630 (max= 1.6427), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:39:21,275 - root - INFO - Step 33980: lr=1.00E-05, loss= 1.1630 (max= 1.6427), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:39:21,275 - root - INFO - Step 33980: lr=1.00E-05, loss= 1.1630 (max= 1.6427), tps=20566, mfu=42.85%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:39:21,275 - root - INFO - Step 33980: lr=1.00E-05, loss= 1.1630 (max= 1.6427), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:39:21,275 - root - INFO - Step 33980: lr=1.00E-05, loss= 1.1630 (max= 1.6427), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:39:21,275 - root - INFO - Step 33980: lr=1.00E-05, loss= 1.1630 (max= 1.6427), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:39:21,275 - root - INFO - Step 33980: lr=1.00E-05, loss= 1.1630 (max= 1.6427), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:39:37,162 - root - INFO - Step 33990: lr=1.00E-05, loss= 1.1644 (max= 1.5548), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:37,162 - root - INFO - Step 33990: lr=1.00E-05, loss= 1.1644 (max= 1.5548), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:37,162 - root - INFO - Step 33990: lr=1.00E-05, loss= 1.1644 (max= 1.5548), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:37,162 - root - INFO - Step 33990: lr=1.00E-05, loss= 1.1644 (max= 1.5548), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:37,162 - root - INFO - Step 33990: lr=1.00E-05, loss= 1.1644 (max= 1.5548), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:37,162 - root - INFO - Step 33990: lr=1.00E-05, loss= 1.1644 (max= 1.5548), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:37,162 - root - INFO - Step 33990: lr=1.00E-05, 
loss= 1.1644 (max= 1.5548), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:37,163 - root - INFO - Step 33990: lr=1.00E-05, loss= 1.1644 (max= 1.5548), tps=20630, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-34000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-34000! Save time: 4.335393190383911 +2025-10-25 01:39:53,075 - root - INFO - Step 34000: lr=1.00E-05, loss= 1.1717 (max= 1.5848), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:53,075 - root - INFO - Step 34000: lr=1.00E-05, loss= 1.1717 (max= 1.5848), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:53,075 - root - INFO - Saving a full checkpoint at step 34000 +2025-10-25 01:39:53,075 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:39:53,075 - root - INFO - Saving a full checkpoint at step 34000 +2025-10-25 01:39:53,075 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:39:53,075 - root - INFO - Step 34000: lr=1.00E-05, loss= 1.1717 (max= 1.5848), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:53,075 - root - INFO - Saving a full checkpoint at step 34000 +2025-10-25 01:39:53,075 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:39:53,075 - root - INFO - Step 34000: lr=1.00E-05, loss= 1.1717 (max= 1.5848), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:53,075 - root - INFO - Step 34000: lr=1.00E-05, loss= 1.1717 
(max= 1.5848), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:53,075 - root - INFO - Saving a full checkpoint at step 34000 +2025-10-25 01:39:53,075 - root - INFO - Saving a full checkpoint at step 34000 +2025-10-25 01:39:53,075 - root - INFO - Step 34000: lr=1.00E-05, loss= 1.1717 (max= 1.5848), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:53,075 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:39:53,075 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:39:53,075 - root - INFO - Step 34000: lr=1.00E-05, loss= 1.1717 (max= 1.5848), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:53,075 - root - INFO - Saving a full checkpoint at step 34000 +2025-10-25 01:39:53,075 - root - INFO - Saving a full checkpoint at step 34000 +2025-10-25 01:39:53,075 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:39:53,076 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:39:53,075 - root - INFO - Step 34000: lr=1.00E-05, loss= 1.1717 (max= 1.5848), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:39:53,076 - root - INFO - Saving a full checkpoint at step 34000 +2025-10-25 01:39:53,076 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 01:40:07,073 - root - INFO - Finished saving the checkpoint in 14.00 seconds +2025-10-25 01:40:07,080 - root - INFO - Finished saving the checkpoint in 14.00 seconds +2025-10-25 01:40:07,080 - root - INFO - 
Finished saving the checkpoint in 14.00 seconds +2025-10-25 01:40:07,080 - root - INFO - Finished saving the checkpoint in 14.00 seconds +2025-10-25 01:40:07,080 - root - INFO - Finished saving the checkpoint in 14.00 seconds +2025-10-25 01:40:07,080 - root - INFO - Finished saving the checkpoint in 14.00 seconds +2025-10-25 01:40:07,081 - root - INFO - Finished saving the checkpoint in 14.01 seconds +2025-10-25 01:40:07,081 - root - INFO - Finished saving the checkpoint in 14.01 seconds +2025-10-25 01:40:22,976 - root - INFO - Step 34010: lr=1.00E-05, loss= 1.1667 (max= 1.5512), tps=10960, mfu=22.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:22,976 - root - INFO - Step 34010: lr=1.00E-05, loss= 1.1667 (max= 1.5512), tps=10960, mfu=22.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:22,976 - root - INFO - Step 34010: lr=1.00E-05, loss= 1.1667 (max= 1.5512), tps=10960, mfu=22.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:22,977 - root - INFO - Step 34010: lr=1.00E-05, loss= 1.1667 (max= 1.5512), tps=10960, mfu=22.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:22,977 - root - INFO - Step 34010: lr=1.00E-05, loss= 1.1667 (max= 1.5512), tps=10960, mfu=22.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:22,977 - root - INFO - Step 34010: lr=1.00E-05, loss= 1.1667 (max= 1.5512), tps=10960, mfu=22.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:22,977 - root - INFO - Step 34010: lr=1.00E-05, loss= 1.1667 (max= 1.5512), tps=10960, mfu=22.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:22,977 - root - INFO - Step 34010: lr=1.00E-05, loss= 1.1667 (max= 1.5512), tps=10960, mfu=22.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
01:40:38,909 - root - INFO - Step 34020: lr=1.00E-05, loss= 1.1429 (max= 1.6079), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:38,909 - root - INFO - Step 34020: lr=1.00E-05, loss= 1.1429 (max= 1.6079), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:38,909 - root - INFO - Step 34020: lr=1.00E-05, loss= 1.1429 (max= 1.6079), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:38,910 - root - INFO - Step 34020: lr=1.00E-05, loss= 1.1429 (max= 1.6079), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:38,910 - root - INFO - Step 34020: lr=1.00E-05, loss= 1.1429 (max= 1.6079), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:38,910 - root - INFO - Step 34020: lr=1.00E-05, loss= 1.1429 (max= 1.6079), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:38,910 - root - INFO - Step 34020: lr=1.00E-05, loss= 1.1429 (max= 1.6079), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:38,910 - root - INFO - Step 34020: lr=1.00E-05, loss= 1.1429 (max= 1.6079), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:54,798 - root - INFO - Step 34030: lr=1.00E-05, loss= 1.1386 (max= 1.4995), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:54,798 - root - INFO - Step 34030: lr=1.00E-05, loss= 1.1386 (max= 1.4995), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:54,798 - root - INFO - Step 34030: lr=1.00E-05, loss= 1.1386 (max= 1.4995), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:54,798 - root - INFO - Step 34030: lr=1.00E-05, loss= 1.1386 (max= 1.4995), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:54,798 - root - INFO - Step 34030: lr=1.00E-05, loss= 1.1386 (max= 1.4995), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:54,798 - root - INFO - Step 34030: lr=1.00E-05, loss= 1.1386 (max= 1.4995), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:54,798 - root - INFO - Step 34030: lr=1.00E-05, loss= 1.1386 (max= 1.4995), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:40:54,798 - root - INFO - Step 34030: lr=1.00E-05, loss= 1.1386 (max= 1.4995), tps=20628, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:10,744 - root - INFO - Step 34040: lr=1.00E-05, loss= 1.1536 (max= 1.5122), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:10,744 - root - INFO - Step 34040: lr=1.00E-05, loss= 1.1536 (max= 1.5122), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:10,744 - root - INFO - Step 34040: lr=1.00E-05, loss= 1.1536 (max= 1.5122), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:10,744 - root - INFO - Step 34040: lr=1.00E-05, loss= 1.1536 (max= 1.5122), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:10,745 - root - INFO - Step 34040: lr=1.00E-05, loss= 1.1536 (max= 1.5122), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:10,745 - root - INFO - Step 34040: lr=1.00E-05, loss= 1.1536 (max= 1.5122), 
tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:10,745 - root - INFO - Step 34040: lr=1.00E-05, loss= 1.1536 (max= 1.5122), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:10,745 - root - INFO - Step 34040: lr=1.00E-05, loss= 1.1536 (max= 1.5122), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:26,695 - root - INFO - Step 34050: lr=1.00E-05, loss= 1.1688 (max= 1.6135), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:26,695 - root - INFO - Step 34050: lr=1.00E-05, loss= 1.1688 (max= 1.6135), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:26,695 - root - INFO - Step 34050: lr=1.00E-05, loss= 1.1688 (max= 1.6135), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:26,695 - root - INFO - Step 34050: lr=1.00E-05, loss= 1.1688 (max= 1.6135), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:26,695 - root - INFO - Step 34050: lr=1.00E-05, loss= 1.1688 (max= 1.6135), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:26,695 - root - INFO - Step 34050: lr=1.00E-05, loss= 1.1688 (max= 1.6135), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:26,695 - root - INFO - Step 34050: lr=1.00E-05, loss= 1.1688 (max= 1.6135), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:26,695 - root - INFO - Step 34050: lr=1.00E-05, loss= 1.1688 (max= 1.6135), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:42,650 - root - INFO - Step 
34060: lr=1.00E-05, loss= 1.1787 (max= 1.6159), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:42,651 - root - INFO - Step 34060: lr=1.00E-05, loss= 1.1787 (max= 1.6159), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:42,651 - root - INFO - Step 34060: lr=1.00E-05, loss= 1.1787 (max= 1.6159), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:42,651 - root - INFO - Step 34060: lr=1.00E-05, loss= 1.1787 (max= 1.6159), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:42,651 - root - INFO - Step 34060: lr=1.00E-05, loss= 1.1787 (max= 1.6159), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:42,651 - root - INFO - Step 34060: lr=1.00E-05, loss= 1.1787 (max= 1.6159), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:42,651 - root - INFO - Step 34060: lr=1.00E-05, loss= 1.1787 (max= 1.6159), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:42,651 - root - INFO - Step 34060: lr=1.00E-05, loss= 1.1787 (max= 1.6159), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:58,615 - root - INFO - Step 34070: lr=1.00E-05, loss= 1.1589 (max= 1.4454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:58,616 - root - INFO - Step 34070: lr=1.00E-05, loss= 1.1589 (max= 1.4454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:58,616 - root - INFO - Step 34070: lr=1.00E-05, loss= 1.1589 (max= 1.4454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:41:58,616 - root - INFO - Step 34070: lr=1.00E-05, loss= 1.1589 (max= 1.4454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:58,616 - root - INFO - Step 34070: lr=1.00E-05, loss= 1.1589 (max= 1.4454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:58,616 - root - INFO - Step 34070: lr=1.00E-05, loss= 1.1589 (max= 1.4454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:58,616 - root - INFO - Step 34070: lr=1.00E-05, loss= 1.1589 (max= 1.4454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:41:58,616 - root - INFO - Step 34070: lr=1.00E-05, loss= 1.1589 (max= 1.4454), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:14,587 - root - INFO - Step 34080: lr=1.00E-05, loss= 1.1660 (max= 1.7129), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:42:14,587 - root - INFO - Step 34080: lr=1.00E-05, loss= 1.1660 (max= 1.7129), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:42:14,587 - root - INFO - Step 34080: lr=1.00E-05, loss= 1.1660 (max= 1.7129), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:42:14,587 - root - INFO - Step 34080: lr=1.00E-05, loss= 1.1660 (max= 1.7129), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:42:14,587 - root - INFO - Step 34080: lr=1.00E-05, loss= 1.1660 (max= 1.7129), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:42:14,587 - root - INFO - Step 34080: lr=1.00E-05, loss= 1.1660 (max= 1.7129), tps=20521, mfu=42.76%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:42:14,587 - root - INFO - Step 34080: lr=1.00E-05, loss= 1.1660 (max= 1.7129), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:42:14,587 - root - INFO - Step 34080: lr=1.00E-05, loss= 1.1660 (max= 1.7129), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:42:30,547 - root - INFO - Step 34090: lr=1.00E-05, loss= 1.1686 (max= 1.5757), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:30,547 - root - INFO - Step 34090: lr=1.00E-05, loss= 1.1686 (max= 1.5757), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:30,547 - root - INFO - Step 34090: lr=1.00E-05, loss= 1.1686 (max= 1.5757), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:30,548 - root - INFO - Step 34090: lr=1.00E-05, loss= 1.1686 (max= 1.5757), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:30,548 - root - INFO - Step 34090: lr=1.00E-05, loss= 1.1686 (max= 1.5757), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:30,548 - root - INFO - Step 34090: lr=1.00E-05, loss= 1.1686 (max= 1.5757), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:30,548 - root - INFO - Step 34090: lr=1.00E-05, loss= 1.1686 (max= 1.5757), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:30,548 - root - INFO - Step 34090: lr=1.00E-05, loss= 1.1686 (max= 1.5757), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:46,505 - root - INFO - Step 34100: lr=1.00E-05, loss= 1.1608 
(max= 1.4995), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:46,505 - root - INFO - Step 34100: lr=1.00E-05, loss= 1.1608 (max= 1.4995), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:46,505 - root - INFO - Step 34100: lr=1.00E-05, loss= 1.1608 (max= 1.4995), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:46,505 - root - INFO - Step 34100: lr=1.00E-05, loss= 1.1608 (max= 1.4995), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:46,505 - root - INFO - Step 34100: lr=1.00E-05, loss= 1.1608 (max= 1.4995), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:46,505 - root - INFO - Step 34100: lr=1.00E-05, loss= 1.1608 (max= 1.4995), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:46,505 - root - INFO - Step 34100: lr=1.00E-05, loss= 1.1608 (max= 1.4995), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:42:46,505 - root - INFO - Step 34100: lr=1.00E-05, loss= 1.1608 (max= 1.4995), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:02,423 - root - INFO - Step 34110: lr=1.00E-05, loss= 1.1692 (max= 1.5735), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:02,423 - root - INFO - Step 34110: lr=1.00E-05, loss= 1.1692 (max= 1.5735), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:02,423 - root - INFO - Step 34110: lr=1.00E-05, loss= 1.1692 (max= 1.5735), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:02,423 - root 
- INFO - Step 34110: lr=1.00E-05, loss= 1.1692 (max= 1.5735), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:02,423 - root - INFO - Step 34110: lr=1.00E-05, loss= 1.1692 (max= 1.5735), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:02,423 - root - INFO - Step 34110: lr=1.00E-05, loss= 1.1692 (max= 1.5735), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:02,423 - root - INFO - Step 34110: lr=1.00E-05, loss= 1.1692 (max= 1.5735), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:02,423 - root - INFO - Step 34110: lr=1.00E-05, loss= 1.1692 (max= 1.5735), tps=20589, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:18,370 - root - INFO - Step 34120: lr=1.00E-05, loss= 1.1295 (max= 1.5544), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:18,370 - root - INFO - Step 34120: lr=1.00E-05, loss= 1.1295 (max= 1.5544), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:18,370 - root - INFO - Step 34120: lr=1.00E-05, loss= 1.1295 (max= 1.5544), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:18,370 - root - INFO - Step 34120: lr=1.00E-05, loss= 1.1295 (max= 1.5544), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:18,370 - root - INFO - Step 34120: lr=1.00E-05, loss= 1.1295 (max= 1.5544), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:18,370 - root - INFO - Step 34120: lr=1.00E-05, loss= 1.1295 (max= 1.5544), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 01:43:18,370 - root - INFO - Step 34120: lr=1.00E-05, loss= 1.1295 (max= 1.5544), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:18,370 - root - INFO - Step 34120: lr=1.00E-05, loss= 1.1295 (max= 1.5544), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:34,269 - root - INFO - Step 34130: lr=1.00E-05, loss= 1.1805 (max= 1.6434), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:34,269 - root - INFO - Step 34130: lr=1.00E-05, loss= 1.1805 (max= 1.6434), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:34,269 - root - INFO - Step 34130: lr=1.00E-05, loss= 1.1805 (max= 1.6434), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:34,269 - root - INFO - Step 34130: lr=1.00E-05, loss= 1.1805 (max= 1.6434), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:34,269 - root - INFO - Step 34130: lr=1.00E-05, loss= 1.1805 (max= 1.6434), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:34,269 - root - INFO - Step 34130: lr=1.00E-05, loss= 1.1805 (max= 1.6434), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:34,269 - root - INFO - Step 34130: lr=1.00E-05, loss= 1.1805 (max= 1.6434), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:34,269 - root - INFO - Step 34130: lr=1.00E-05, loss= 1.1805 (max= 1.6434), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:50,249 - root - INFO - Step 34140: lr=1.00E-05, loss= 1.1605 (max= 1.6043), tps=20510, mfu=42.73%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:50,249 - root - INFO - Step 34140: lr=1.00E-05, loss= 1.1605 (max= 1.6043), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:50,249 - root - INFO - Step 34140: lr=1.00E-05, loss= 1.1605 (max= 1.6043), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:50,249 - root - INFO - Step 34140: lr=1.00E-05, loss= 1.1605 (max= 1.6043), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:50,249 - root - INFO - Step 34140: lr=1.00E-05, loss= 1.1605 (max= 1.6043), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:50,249 - root - INFO - Step 34140: lr=1.00E-05, loss= 1.1605 (max= 1.6043), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:50,249 - root - INFO - Step 34140: lr=1.00E-05, loss= 1.1605 (max= 1.6043), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:43:50,249 - root - INFO - Step 34140: lr=1.00E-05, loss= 1.1605 (max= 1.6043), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:06,166 - root - INFO - Step 34150: lr=1.00E-05, loss= 1.1457 (max= 1.6554), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:06,166 - root - INFO - Step 34150: lr=1.00E-05, loss= 1.1457 (max= 1.6554), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:06,166 - root - INFO - Step 34150: lr=1.00E-05, loss= 1.1457 (max= 1.6554), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:06,167 - root - INFO - Step 34150: lr=1.00E-05, 
loss= 1.1457 (max= 1.6554), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:06,167 - root - INFO - Step 34150: lr=1.00E-05, loss= 1.1457 (max= 1.6554), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:06,167 - root - INFO - Step 34150: lr=1.00E-05, loss= 1.1457 (max= 1.6554), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:06,167 - root - INFO - Step 34150: lr=1.00E-05, loss= 1.1457 (max= 1.6554), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:06,167 - root - INFO - Step 34150: lr=1.00E-05, loss= 1.1457 (max= 1.6554), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:22,087 - root - INFO - Step 34160: lr=1.00E-05, loss= 1.1698 (max= 1.5743), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:22,087 - root - INFO - Step 34160: lr=1.00E-05, loss= 1.1698 (max= 1.5743), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:22,087 - root - INFO - Step 34160: lr=1.00E-05, loss= 1.1698 (max= 1.5743), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:22,087 - root - INFO - Step 34160: lr=1.00E-05, loss= 1.1698 (max= 1.5743), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:22,087 - root - INFO - Step 34160: lr=1.00E-05, loss= 1.1698 (max= 1.5743), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:22,087 - root - INFO - Step 34160: lr=1.00E-05, loss= 1.1698 (max= 1.5743), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
01:44:22,087 - root - INFO - Step 34160: lr=1.00E-05, loss= 1.1698 (max= 1.5743), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:22,087 - root - INFO - Step 34160: lr=1.00E-05, loss= 1.1698 (max= 1.5743), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:44:37,995 - root - INFO - Step 34170: lr=1.00E-05, loss= 1.1555 (max= 1.4996), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:37,995 - root - INFO - Step 34170: lr=1.00E-05, loss= 1.1555 (max= 1.4996), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:37,996 - root - INFO - Step 34170: lr=1.00E-05, loss= 1.1555 (max= 1.4996), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:37,996 - root - INFO - Step 34170: lr=1.00E-05, loss= 1.1555 (max= 1.4996), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:37,996 - root - INFO - Step 34170: lr=1.00E-05, loss= 1.1555 (max= 1.4996), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:37,996 - root - INFO - Step 34170: lr=1.00E-05, loss= 1.1555 (max= 1.4996), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:37,996 - root - INFO - Step 34170: lr=1.00E-05, loss= 1.1555 (max= 1.4996), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:37,996 - root - INFO - Step 34170: lr=1.00E-05, loss= 1.1555 (max= 1.4996), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:53,913 - root - INFO - Step 34180: lr=1.00E-05, loss= 1.1596 (max= 1.5536), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:53,913 - root - INFO - Step 34180: lr=1.00E-05, loss= 1.1596 (max= 1.5536), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:53,913 - root - INFO - Step 34180: lr=1.00E-05, loss= 1.1596 (max= 1.5536), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:53,913 - root - INFO - Step 34180: lr=1.00E-05, loss= 1.1596 (max= 1.5536), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:53,913 - root - INFO - Step 34180: lr=1.00E-05, loss= 1.1596 (max= 1.5536), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:53,913 - root - INFO - Step 34180: lr=1.00E-05, loss= 1.1596 (max= 1.5536), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:53,913 - root - INFO - Step 34180: lr=1.00E-05, loss= 1.1596 (max= 1.5536), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:44:53,913 - root - INFO - Step 34180: lr=1.00E-05, loss= 1.1596 (max= 1.5536), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:09,835 - root - INFO - Step 34190: lr=1.00E-05, loss= 1.1638 (max= 1.7221), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:09,835 - root - INFO - Step 34190: lr=1.00E-05, loss= 1.1638 (max= 1.7221), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:09,835 - root - INFO - Step 34190: lr=1.00E-05, loss= 1.1638 (max= 1.7221), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:09,835 - root - INFO - Step 34190: lr=1.00E-05, loss= 1.1638 (max= 1.7221), 
tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:09,835 - root - INFO - Step 34190: lr=1.00E-05, loss= 1.1638 (max= 1.7221), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:09,835 - root - INFO - Step 34190: lr=1.00E-05, loss= 1.1638 (max= 1.7221), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:09,835 - root - INFO - Step 34190: lr=1.00E-05, loss= 1.1638 (max= 1.7221), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:09,835 - root - INFO - Step 34190: lr=1.00E-05, loss= 1.1638 (max= 1.7221), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:25,766 - root - INFO - Step 34200: lr=1.00E-05, loss= 1.1495 (max= 1.5462), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:25,766 - root - INFO - Step 34200: lr=1.00E-05, loss= 1.1495 (max= 1.5462), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:25,766 - root - INFO - Step 34200: lr=1.00E-05, loss= 1.1495 (max= 1.5462), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:25,766 - root - INFO - Step 34200: lr=1.00E-05, loss= 1.1495 (max= 1.5462), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:25,766 - root - INFO - Step 34200: lr=1.00E-05, loss= 1.1495 (max= 1.5462), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:25,766 - root - INFO - Step 34200: lr=1.00E-05, loss= 1.1495 (max= 1.5462), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:25,766 - root - INFO - Step 
34200: lr=1.00E-05, loss= 1.1495 (max= 1.5462), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:25,766 - root - INFO - Step 34200: lr=1.00E-05, loss= 1.1495 (max= 1.5462), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:41,699 - root - INFO - Step 34210: lr=1.00E-05, loss= 1.2070 (max= 1.7056), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:41,699 - root - INFO - Step 34210: lr=1.00E-05, loss= 1.2070 (max= 1.7056), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:41,699 - root - INFO - Step 34210: lr=1.00E-05, loss= 1.2070 (max= 1.7056), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:41,699 - root - INFO - Step 34210: lr=1.00E-05, loss= 1.2070 (max= 1.7056), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:41,699 - root - INFO - Step 34210: lr=1.00E-05, loss= 1.2070 (max= 1.7056), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:41,699 - root - INFO - Step 34210: lr=1.00E-05, loss= 1.2070 (max= 1.7056), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:41,699 - root - INFO - Step 34210: lr=1.00E-05, loss= 1.2070 (max= 1.7056), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:41,699 - root - INFO - Step 34210: lr=1.00E-05, loss= 1.2070 (max= 1.7056), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:57,667 - root - INFO - Step 34220: lr=1.00E-05, loss= 1.1838 (max= 1.7583), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 01:45:57,667 - root - INFO - Step 34220: lr=1.00E-05, loss= 1.1838 (max= 1.7583), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:57,668 - root - INFO - Step 34220: lr=1.00E-05, loss= 1.1838 (max= 1.7583), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:57,668 - root - INFO - Step 34220: lr=1.00E-05, loss= 1.1838 (max= 1.7583), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:57,668 - root - INFO - Step 34220: lr=1.00E-05, loss= 1.1838 (max= 1.7583), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:57,668 - root - INFO - Step 34220: lr=1.00E-05, loss= 1.1838 (max= 1.7583), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:57,668 - root - INFO - Step 34220: lr=1.00E-05, loss= 1.1838 (max= 1.7583), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:45:57,668 - root - INFO - Step 34220: lr=1.00E-05, loss= 1.1838 (max= 1.7583), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:13,621 - root - INFO - Step 34230: lr=1.00E-05, loss= 1.1799 (max= 1.6392), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:13,621 - root - INFO - Step 34230: lr=1.00E-05, loss= 1.1799 (max= 1.6392), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:13,621 - root - INFO - Step 34230: lr=1.00E-05, loss= 1.1799 (max= 1.6392), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:13,621 - root - INFO - Step 34230: lr=1.00E-05, loss= 1.1799 (max= 1.6392), tps=20545, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:13,621 - root - INFO - Step 34230: lr=1.00E-05, loss= 1.1799 (max= 1.6392), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:13,621 - root - INFO - Step 34230: lr=1.00E-05, loss= 1.1799 (max= 1.6392), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:13,621 - root - INFO - Step 34230: lr=1.00E-05, loss= 1.1799 (max= 1.6392), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:13,621 - root - INFO - Step 34230: lr=1.00E-05, loss= 1.1799 (max= 1.6392), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:29,537 - root - INFO - Step 34240: lr=1.00E-05, loss= 1.1076 (max= 1.5273), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:29,537 - root - INFO - Step 34240: lr=1.00E-05, loss= 1.1076 (max= 1.5273), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:29,537 - root - INFO - Step 34240: lr=1.00E-05, loss= 1.1076 (max= 1.5273), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:29,537 - root - INFO - Step 34240: lr=1.00E-05, loss= 1.1076 (max= 1.5273), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:29,537 - root - INFO - Step 34240: lr=1.00E-05, loss= 1.1076 (max= 1.5273), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:29,537 - root - INFO - Step 34240: lr=1.00E-05, loss= 1.1076 (max= 1.5273), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:29,537 - root - INFO - Step 34240: lr=1.00E-05, loss= 1.1076 
(max= 1.5273), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:29,537 - root - INFO - Step 34240: lr=1.00E-05, loss= 1.1076 (max= 1.5273), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:45,497 - root - INFO - Step 34250: lr=1.00E-05, loss= 1.1805 (max= 1.6454), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:45,497 - root - INFO - Step 34250: lr=1.00E-05, loss= 1.1805 (max= 1.6454), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:45,497 - root - INFO - Step 34250: lr=1.00E-05, loss= 1.1805 (max= 1.6454), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:45,497 - root - INFO - Step 34250: lr=1.00E-05, loss= 1.1805 (max= 1.6454), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:45,497 - root - INFO - Step 34250: lr=1.00E-05, loss= 1.1805 (max= 1.6454), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:45,497 - root - INFO - Step 34250: lr=1.00E-05, loss= 1.1805 (max= 1.6454), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:45,497 - root - INFO - Step 34250: lr=1.00E-05, loss= 1.1805 (max= 1.6454), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:46:45,497 - root - INFO - Step 34250: lr=1.00E-05, loss= 1.1805 (max= 1.6454), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:01,476 - root - INFO - Step 34260: lr=1.00E-05, loss= 1.1608 (max= 1.4916), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:01,476 - root 
- INFO - Step 34260: lr=1.00E-05, loss= 1.1608 (max= 1.4916), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:01,476 - root - INFO - Step 34260: lr=1.00E-05, loss= 1.1608 (max= 1.4916), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:01,476 - root - INFO - Step 34260: lr=1.00E-05, loss= 1.1608 (max= 1.4916), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:01,476 - root - INFO - Step 34260: lr=1.00E-05, loss= 1.1608 (max= 1.4916), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:01,476 - root - INFO - Step 34260: lr=1.00E-05, loss= 1.1608 (max= 1.4916), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:01,476 - root - INFO - Step 34260: lr=1.00E-05, loss= 1.1608 (max= 1.4916), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:01,476 - root - INFO - Step 34260: lr=1.00E-05, loss= 1.1608 (max= 1.4916), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:17,428 - root - INFO - Step 34270: lr=1.00E-05, loss= 1.1783 (max= 1.7073), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:17,428 - root - INFO - Step 34270: lr=1.00E-05, loss= 1.1783 (max= 1.7073), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:17,429 - root - INFO - Step 34270: lr=1.00E-05, loss= 1.1783 (max= 1.7073), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:17,429 - root - INFO - Step 34270: lr=1.00E-05, loss= 1.1783 (max= 1.7073), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:47:17,429 - root - INFO - Step 34270: lr=1.00E-05, loss= 1.1783 (max= 1.7073), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:17,429 - root - INFO - Step 34270: lr=1.00E-05, loss= 1.1783 (max= 1.7073), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:17,429 - root - INFO - Step 34270: lr=1.00E-05, loss= 1.1783 (max= 1.7073), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:17,429 - root - INFO - Step 34270: lr=1.00E-05, loss= 1.1783 (max= 1.7073), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:33,336 - root - INFO - Step 34280: lr=1.00E-05, loss= 1.1773 (max= 1.4862), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:33,336 - root - INFO - Step 34280: lr=1.00E-05, loss= 1.1773 (max= 1.4862), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:33,336 - root - INFO - Step 34280: lr=1.00E-05, loss= 1.1773 (max= 1.4862), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:33,336 - root - INFO - Step 34280: lr=1.00E-05, loss= 1.1773 (max= 1.4862), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:33,336 - root - INFO - Step 34280: lr=1.00E-05, loss= 1.1773 (max= 1.4862), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:33,337 - root - INFO - Step 34280: lr=1.00E-05, loss= 1.1773 (max= 1.4862), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:33,337 - root - INFO - Step 34280: lr=1.00E-05, loss= 1.1773 (max= 1.4862), tps=20603, mfu=42.93%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:33,337 - root - INFO - Step 34280: lr=1.00E-05, loss= 1.1773 (max= 1.4862), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:49,240 - root - INFO - Step 34290: lr=1.00E-05, loss= 1.1479 (max= 1.6131), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:49,240 - root - INFO - Step 34290: lr=1.00E-05, loss= 1.1479 (max= 1.6131), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:49,240 - root - INFO - Step 34290: lr=1.00E-05, loss= 1.1479 (max= 1.6131), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:49,240 - root - INFO - Step 34290: lr=1.00E-05, loss= 1.1479 (max= 1.6131), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:49,240 - root - INFO - Step 34290: lr=1.00E-05, loss= 1.1479 (max= 1.6131), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:49,240 - root - INFO - Step 34290: lr=1.00E-05, loss= 1.1479 (max= 1.6131), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:49,241 - root - INFO - Step 34290: lr=1.00E-05, loss= 1.1479 (max= 1.6131), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:47:49,241 - root - INFO - Step 34290: lr=1.00E-05, loss= 1.1479 (max= 1.6131), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:48:05,203 - root - INFO - Step 34300: lr=1.00E-05, loss= 1.1939 (max= 1.6214), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:05,203 - root - INFO - Step 34300: lr=1.00E-05, 
loss= 1.1939 (max= 1.6214), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:05,203 - root - INFO - Step 34300: lr=1.00E-05, loss= 1.1939 (max= 1.6214), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:05,203 - root - INFO - Step 34300: lr=1.00E-05, loss= 1.1939 (max= 1.6214), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:05,203 - root - INFO - Step 34300: lr=1.00E-05, loss= 1.1939 (max= 1.6214), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:05,203 - root - INFO - Step 34300: lr=1.00E-05, loss= 1.1939 (max= 1.6214), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:05,203 - root - INFO - Step 34300: lr=1.00E-05, loss= 1.1939 (max= 1.6214), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:05,203 - root - INFO - Step 34300: lr=1.00E-05, loss= 1.1939 (max= 1.6214), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:21,122 - root - INFO - Step 34310: lr=1.00E-05, loss= 1.1681 (max= 1.5729), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:48:21,123 - root - INFO - Step 34310: lr=1.00E-05, loss= 1.1681 (max= 1.5729), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:48:21,123 - root - INFO - Step 34310: lr=1.00E-05, loss= 1.1681 (max= 1.5729), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:48:21,123 - root - INFO - Step 34310: lr=1.00E-05, loss= 1.1681 (max= 1.5729), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
01:48:21,123 - root - INFO - Step 34310: lr=1.00E-05, loss= 1.1681 (max= 1.5729), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:48:21,123 - root - INFO - Step 34310: lr=1.00E-05, loss= 1.1681 (max= 1.5729), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:48:21,123 - root - INFO - Step 34310: lr=1.00E-05, loss= 1.1681 (max= 1.5729), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:48:21,123 - root - INFO - Step 34310: lr=1.00E-05, loss= 1.1681 (max= 1.5729), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:48:37,067 - root - INFO - Step 34320: lr=1.00E-05, loss= 1.1944 (max= 1.5363), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:37,067 - root - INFO - Step 34320: lr=1.00E-05, loss= 1.1944 (max= 1.5363), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:37,068 - root - INFO - Step 34320: lr=1.00E-05, loss= 1.1944 (max= 1.5363), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:37,068 - root - INFO - Step 34320: lr=1.00E-05, loss= 1.1944 (max= 1.5363), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:37,068 - root - INFO - Step 34320: lr=1.00E-05, loss= 1.1944 (max= 1.5363), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:37,068 - root - INFO - Step 34320: lr=1.00E-05, loss= 1.1944 (max= 1.5363), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:37,068 - root - INFO - Step 34320: lr=1.00E-05, loss= 1.1944 (max= 1.5363), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:37,068 - root - INFO - Step 34320: lr=1.00E-05, loss= 1.1944 (max= 1.5363), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:53,037 - root - INFO - Step 34330: lr=1.00E-05, loss= 1.1778 (max= 1.6321), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:53,037 - root - INFO - Step 34330: lr=1.00E-05, loss= 1.1778 (max= 1.6321), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:53,037 - root - INFO - Step 34330: lr=1.00E-05, loss= 1.1778 (max= 1.6321), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:53,038 - root - INFO - Step 34330: lr=1.00E-05, loss= 1.1778 (max= 1.6321), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:53,038 - root - INFO - Step 34330: lr=1.00E-05, loss= 1.1778 (max= 1.6321), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:53,038 - root - INFO - Step 34330: lr=1.00E-05, loss= 1.1778 (max= 1.6321), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:53,038 - root - INFO - Step 34330: lr=1.00E-05, loss= 1.1778 (max= 1.6321), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:48:53,038 - root - INFO - Step 34330: lr=1.00E-05, loss= 1.1778 (max= 1.6321), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:49:08,957 - root - INFO - Step 34340: lr=1.00E-05, loss= 1.1368 (max= 1.4509), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:08,957 - root - INFO - Step 34340: lr=1.00E-05, loss= 1.1368 (max= 1.4509), 
tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:08,957 - root - INFO - Step 34340: lr=1.00E-05, loss= 1.1368 (max= 1.4509), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:08,957 - root - INFO - Step 34340: lr=1.00E-05, loss= 1.1368 (max= 1.4509), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:08,957 - root - INFO - Step 34340: lr=1.00E-05, loss= 1.1368 (max= 1.4509), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:08,957 - root - INFO - Step 34340: lr=1.00E-05, loss= 1.1368 (max= 1.4509), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:08,957 - root - INFO - Step 34340: lr=1.00E-05, loss= 1.1368 (max= 1.4509), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:08,957 - root - INFO - Step 34340: lr=1.00E-05, loss= 1.1368 (max= 1.4509), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:24,871 - root - INFO - Step 34350: lr=1.00E-05, loss= 1.1586 (max= 1.5626), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:49:24,872 - root - INFO - Step 34350: lr=1.00E-05, loss= 1.1586 (max= 1.5626), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:49:24,872 - root - INFO - Step 34350: lr=1.00E-05, loss= 1.1586 (max= 1.5626), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:49:24,872 - root - INFO - Step 34350: lr=1.00E-05, loss= 1.1586 (max= 1.5626), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:49:24,872 - root - INFO - Step 
34350: lr=1.00E-05, loss= 1.1586 (max= 1.5626), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:49:24,872 - root - INFO - Step 34350: lr=1.00E-05, loss= 1.1586 (max= 1.5626), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:49:24,872 - root - INFO - Step 34350: lr=1.00E-05, loss= 1.1586 (max= 1.5626), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:49:24,872 - root - INFO - Step 34350: lr=1.00E-05, loss= 1.1586 (max= 1.5626), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:49:40,789 - root - INFO - Step 34360: lr=1.00E-05, loss= 1.1968 (max= 1.5081), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:40,789 - root - INFO - Step 34360: lr=1.00E-05, loss= 1.1968 (max= 1.5081), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:40,789 - root - INFO - Step 34360: lr=1.00E-05, loss= 1.1968 (max= 1.5081), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:40,790 - root - INFO - Step 34360: lr=1.00E-05, loss= 1.1968 (max= 1.5081), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:40,790 - root - INFO - Step 34360: lr=1.00E-05, loss= 1.1968 (max= 1.5081), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:40,790 - root - INFO - Step 34360: lr=1.00E-05, loss= 1.1968 (max= 1.5081), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:40,790 - root - INFO - Step 34360: lr=1.00E-05, loss= 1.1968 (max= 1.5081), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 01:49:40,790 - root - INFO - Step 34360: lr=1.00E-05, loss= 1.1968 (max= 1.5081), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:56,754 - root - INFO - Step 34370: lr=1.00E-05, loss= 1.1426 (max= 1.4864), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:56,755 - root - INFO - Step 34370: lr=1.00E-05, loss= 1.1426 (max= 1.4864), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:56,755 - root - INFO - Step 34370: lr=1.00E-05, loss= 1.1426 (max= 1.4864), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:56,755 - root - INFO - Step 34370: lr=1.00E-05, loss= 1.1426 (max= 1.4864), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:56,755 - root - INFO - Step 34370: lr=1.00E-05, loss= 1.1426 (max= 1.4864), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:56,755 - root - INFO - Step 34370: lr=1.00E-05, loss= 1.1426 (max= 1.4864), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:56,755 - root - INFO - Step 34370: lr=1.00E-05, loss= 1.1426 (max= 1.4864), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:49:56,755 - root - INFO - Step 34370: lr=1.00E-05, loss= 1.1426 (max= 1.4864), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:12,658 - root - INFO - Step 34380: lr=1.00E-05, loss= 1.1768 (max= 1.5634), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:12,658 - root - INFO - Step 34380: lr=1.00E-05, loss= 1.1768 (max= 1.5634), tps=20609, mfu=42.94%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:12,658 - root - INFO - Step 34380: lr=1.00E-05, loss= 1.1768 (max= 1.5634), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:12,658 - root - INFO - Step 34380: lr=1.00E-05, loss= 1.1768 (max= 1.5634), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:12,658 - root - INFO - Step 34380: lr=1.00E-05, loss= 1.1768 (max= 1.5634), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:12,658 - root - INFO - Step 34380: lr=1.00E-05, loss= 1.1768 (max= 1.5634), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:12,658 - root - INFO - Step 34380: lr=1.00E-05, loss= 1.1768 (max= 1.5634), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:12,658 - root - INFO - Step 34380: lr=1.00E-05, loss= 1.1768 (max= 1.5634), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:28,627 - root - INFO - Step 34390: lr=1.00E-05, loss= 1.1769 (max= 1.6206), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:28,627 - root - INFO - Step 34390: lr=1.00E-05, loss= 1.1769 (max= 1.6206), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:28,627 - root - INFO - Step 34390: lr=1.00E-05, loss= 1.1769 (max= 1.6206), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:28,627 - root - INFO - Step 34390: lr=1.00E-05, loss= 1.1769 (max= 1.6206), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:28,627 - root - INFO - Step 34390: lr=1.00E-05, loss= 1.1769 
(max= 1.6206), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:28,627 - root - INFO - Step 34390: lr=1.00E-05, loss= 1.1769 (max= 1.6206), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:28,627 - root - INFO - Step 34390: lr=1.00E-05, loss= 1.1769 (max= 1.6206), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:28,627 - root - INFO - Step 34390: lr=1.00E-05, loss= 1.1769 (max= 1.6206), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:44,516 - root - INFO - Step 34400: lr=1.00E-05, loss= 1.1858 (max= 1.7771), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:44,516 - root - INFO - Step 34400: lr=1.00E-05, loss= 1.1858 (max= 1.7771), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:44,516 - root - INFO - Step 34400: lr=1.00E-05, loss= 1.1858 (max= 1.7771), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:44,516 - root - INFO - Step 34400: lr=1.00E-05, loss= 1.1858 (max= 1.7771), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:44,516 - root - INFO - Step 34400: lr=1.00E-05, loss= 1.1858 (max= 1.7771), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:44,516 - root - INFO - Step 34400: lr=1.00E-05, loss= 1.1858 (max= 1.7771), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:44,517 - root - INFO - Step 34400: lr=1.00E-05, loss= 1.1858 (max= 1.7771), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:50:44,517 - root 
- INFO - Step 34400: lr=1.00E-05, loss= 1.1858 (max= 1.7771), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:00,455 - root - INFO - Step 34410: lr=1.00E-05, loss= 1.1875 (max= 1.6001), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:00,456 - root - INFO - Step 34410: lr=1.00E-05, loss= 1.1875 (max= 1.6001), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:00,456 - root - INFO - Step 34410: lr=1.00E-05, loss= 1.1875 (max= 1.6001), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:00,456 - root - INFO - Step 34410: lr=1.00E-05, loss= 1.1875 (max= 1.6001), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:00,456 - root - INFO - Step 34410: lr=1.00E-05, loss= 1.1875 (max= 1.6001), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:00,456 - root - INFO - Step 34410: lr=1.00E-05, loss= 1.1875 (max= 1.6001), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:00,456 - root - INFO - Step 34410: lr=1.00E-05, loss= 1.1875 (max= 1.6001), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:00,456 - root - INFO - Step 34410: lr=1.00E-05, loss= 1.1875 (max= 1.6001), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:16,338 - root - INFO - Step 34420: lr=1.00E-05, loss= 1.1633 (max= 1.6467), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:16,338 - root - INFO - Step 34420: lr=1.00E-05, loss= 1.1633 (max= 1.6467), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:51:16,338 - root - INFO - Step 34420: lr=1.00E-05, loss= 1.1633 (max= 1.6467), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:16,338 - root - INFO - Step 34420: lr=1.00E-05, loss= 1.1633 (max= 1.6467), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:16,338 - root - INFO - Step 34420: lr=1.00E-05, loss= 1.1633 (max= 1.6467), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:16,338 - root - INFO - Step 34420: lr=1.00E-05, loss= 1.1633 (max= 1.6467), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:16,338 - root - INFO - Step 34420: lr=1.00E-05, loss= 1.1633 (max= 1.6467), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:16,338 - root - INFO - Step 34420: lr=1.00E-05, loss= 1.1633 (max= 1.6467), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:32,286 - root - INFO - Step 34430: lr=1.00E-05, loss= 1.1698 (max= 1.5349), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:32,286 - root - INFO - Step 34430: lr=1.00E-05, loss= 1.1698 (max= 1.5349), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:32,286 - root - INFO - Step 34430: lr=1.00E-05, loss= 1.1698 (max= 1.5349), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:32,286 - root - INFO - Step 34430: lr=1.00E-05, loss= 1.1698 (max= 1.5349), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:32,286 - root - INFO - Step 34430: lr=1.00E-05, loss= 1.1698 (max= 1.5349), tps=20551, mfu=42.82%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:32,286 - root - INFO - Step 34430: lr=1.00E-05, loss= 1.1698 (max= 1.5349), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:32,286 - root - INFO - Step 34430: lr=1.00E-05, loss= 1.1698 (max= 1.5349), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:32,286 - root - INFO - Step 34430: lr=1.00E-05, loss= 1.1698 (max= 1.5349), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:48,256 - root - INFO - Step 34440: lr=1.00E-05, loss= 1.1542 (max= 1.5115), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:48,256 - root - INFO - Step 34440: lr=1.00E-05, loss= 1.1542 (max= 1.5115), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:48,257 - root - INFO - Step 34440: lr=1.00E-05, loss= 1.1542 (max= 1.5115), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:48,257 - root - INFO - Step 34440: lr=1.00E-05, loss= 1.1542 (max= 1.5115), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:48,257 - root - INFO - Step 34440: lr=1.00E-05, loss= 1.1542 (max= 1.5115), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:48,257 - root - INFO - Step 34440: lr=1.00E-05, loss= 1.1542 (max= 1.5115), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:48,257 - root - INFO - Step 34440: lr=1.00E-05, loss= 1.1542 (max= 1.5115), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:51:48,257 - root - INFO - Step 34440: lr=1.00E-05, 
loss= 1.1542 (max= 1.5115), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:04,221 - root - INFO - Step 34450: lr=1.00E-05, loss= 1.1834 (max= 1.5041), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:04,221 - root - INFO - Step 34450: lr=1.00E-05, loss= 1.1834 (max= 1.5041), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:04,221 - root - INFO - Step 34450: lr=1.00E-05, loss= 1.1834 (max= 1.5041), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:04,221 - root - INFO - Step 34450: lr=1.00E-05, loss= 1.1834 (max= 1.5041), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:04,221 - root - INFO - Step 34450: lr=1.00E-05, loss= 1.1834 (max= 1.5041), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:04,221 - root - INFO - Step 34450: lr=1.00E-05, loss= 1.1834 (max= 1.5041), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:04,221 - root - INFO - Step 34450: lr=1.00E-05, loss= 1.1834 (max= 1.5041), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:04,221 - root - INFO - Step 34450: lr=1.00E-05, loss= 1.1834 (max= 1.5041), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:20,153 - root - INFO - Step 34460: lr=1.00E-05, loss= 1.1507 (max= 1.5101), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:20,153 - root - INFO - Step 34460: lr=1.00E-05, loss= 1.1507 (max= 1.5101), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
01:52:20,153 - root - INFO - Step 34460: lr=1.00E-05, loss= 1.1507 (max= 1.5101), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:20,153 - root - INFO - Step 34460: lr=1.00E-05, loss= 1.1507 (max= 1.5101), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:20,153 - root - INFO - Step 34460: lr=1.00E-05, loss= 1.1507 (max= 1.5101), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:20,153 - root - INFO - Step 34460: lr=1.00E-05, loss= 1.1507 (max= 1.5101), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:20,153 - root - INFO - Step 34460: lr=1.00E-05, loss= 1.1507 (max= 1.5101), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:20,153 - root - INFO - Step 34460: lr=1.00E-05, loss= 1.1507 (max= 1.5101), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:36,066 - root - INFO - Step 34470: lr=1.00E-05, loss= 1.1322 (max= 1.5380), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:36,066 - root - INFO - Step 34470: lr=1.00E-05, loss= 1.1322 (max= 1.5380), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:36,066 - root - INFO - Step 34470: lr=1.00E-05, loss= 1.1322 (max= 1.5380), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:36,066 - root - INFO - Step 34470: lr=1.00E-05, loss= 1.1322 (max= 1.5380), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:36,066 - root - INFO - Step 34470: lr=1.00E-05, loss= 1.1322 (max= 1.5380), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:36,066 - root - INFO - Step 34470: lr=1.00E-05, loss= 1.1322 (max= 1.5380), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:36,066 - root - INFO - Step 34470: lr=1.00E-05, loss= 1.1322 (max= 1.5380), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:36,067 - root - INFO - Step 34470: lr=1.00E-05, loss= 1.1322 (max= 1.5380), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:51,969 - root - INFO - Step 34480: lr=1.00E-05, loss= 1.2144 (max= 1.5842), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:51,969 - root - INFO - Step 34480: lr=1.00E-05, loss= 1.2144 (max= 1.5842), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:51,969 - root - INFO - Step 34480: lr=1.00E-05, loss= 1.2144 (max= 1.5842), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:51,969 - root - INFO - Step 34480: lr=1.00E-05, loss= 1.2144 (max= 1.5842), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:51,969 - root - INFO - Step 34480: lr=1.00E-05, loss= 1.2144 (max= 1.5842), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:51,969 - root - INFO - Step 34480: lr=1.00E-05, loss= 1.2144 (max= 1.5842), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:51,969 - root - INFO - Step 34480: lr=1.00E-05, loss= 1.2144 (max= 1.5842), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:52:51,969 - root - INFO - Step 34480: lr=1.00E-05, loss= 1.2144 (max= 1.5842), 
tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:07,917 - root - INFO - Step 34490: lr=1.00E-05, loss= 1.1740 (max= 1.8009), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:07,917 - root - INFO - Step 34490: lr=1.00E-05, loss= 1.1740 (max= 1.8009), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:07,917 - root - INFO - Step 34490: lr=1.00E-05, loss= 1.1740 (max= 1.8009), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:07,917 - root - INFO - Step 34490: lr=1.00E-05, loss= 1.1740 (max= 1.8009), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:07,917 - root - INFO - Step 34490: lr=1.00E-05, loss= 1.1740 (max= 1.8009), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:07,917 - root - INFO - Step 34490: lr=1.00E-05, loss= 1.1740 (max= 1.8009), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:07,917 - root - INFO - Step 34490: lr=1.00E-05, loss= 1.1740 (max= 1.8009), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:07,917 - root - INFO - Step 34490: lr=1.00E-05, loss= 1.1740 (max= 1.8009), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:23,871 - root - INFO - Step 34500: lr=1.00E-05, loss= 1.1690 (max= 1.6155), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:23,871 - root - INFO - Step 34500: lr=1.00E-05, loss= 1.1690 (max= 1.6155), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:23,871 - root - INFO - Step 
34500: lr=1.00E-05, loss= 1.1690 (max= 1.6155), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:23,871 - root - INFO - Step 34500: lr=1.00E-05, loss= 1.1690 (max= 1.6155), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:23,871 - root - INFO - Step 34500: lr=1.00E-05, loss= 1.1690 (max= 1.6155), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:23,871 - root - INFO - Step 34500: lr=1.00E-05, loss= 1.1690 (max= 1.6155), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:23,872 - root - INFO - Step 34500: lr=1.00E-05, loss= 1.1690 (max= 1.6155), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:23,872 - root - INFO - Step 34500: lr=1.00E-05, loss= 1.1690 (max= 1.6155), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:39,820 - root - INFO - Step 34510: lr=1.00E-05, loss= 1.1619 (max= 1.4542), tps=20549, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:53:39,821 - root - INFO - Step 34510: lr=1.00E-05, loss= 1.1619 (max= 1.4542), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:53:39,821 - root - INFO - Step 34510: lr=1.00E-05, loss= 1.1619 (max= 1.4542), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:53:39,821 - root - INFO - Step 34510: lr=1.00E-05, loss= 1.1619 (max= 1.4542), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:53:39,821 - root - INFO - Step 34510: lr=1.00E-05, loss= 1.1619 (max= 1.4542), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 01:53:39,821 - root - INFO - Step 34510: lr=1.00E-05, loss= 1.1619 (max= 1.4542), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:53:39,821 - root - INFO - Step 34510: lr=1.00E-05, loss= 1.1619 (max= 1.4542), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:53:39,821 - root - INFO - Step 34510: lr=1.00E-05, loss= 1.1619 (max= 1.4542), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:53:55,791 - root - INFO - Step 34520: lr=1.00E-05, loss= 1.1483 (max= 1.4867), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:55,792 - root - INFO - Step 34520: lr=1.00E-05, loss= 1.1483 (max= 1.4867), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:55,792 - root - INFO - Step 34520: lr=1.00E-05, loss= 1.1483 (max= 1.4867), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:55,792 - root - INFO - Step 34520: lr=1.00E-05, loss= 1.1483 (max= 1.4867), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:55,792 - root - INFO - Step 34520: lr=1.00E-05, loss= 1.1483 (max= 1.4867), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:55,792 - root - INFO - Step 34520: lr=1.00E-05, loss= 1.1483 (max= 1.4867), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:55,792 - root - INFO - Step 34520: lr=1.00E-05, loss= 1.1483 (max= 1.4867), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:53:55,792 - root - INFO - Step 34520: lr=1.00E-05, loss= 1.1483 (max= 1.4867), tps=20521, mfu=42.76%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:11,749 - root - INFO - Step 34530: lr=1.00E-05, loss= 1.2005 (max= 1.6881), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:11,749 - root - INFO - Step 34530: lr=1.00E-05, loss= 1.2005 (max= 1.6881), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:11,749 - root - INFO - Step 34530: lr=1.00E-05, loss= 1.2005 (max= 1.6881), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:11,749 - root - INFO - Step 34530: lr=1.00E-05, loss= 1.2005 (max= 1.6881), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:11,749 - root - INFO - Step 34530: lr=1.00E-05, loss= 1.2005 (max= 1.6881), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:11,749 - root - INFO - Step 34530: lr=1.00E-05, loss= 1.2005 (max= 1.6881), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:11,749 - root - INFO - Step 34530: lr=1.00E-05, loss= 1.2005 (max= 1.6881), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:11,749 - root - INFO - Step 34530: lr=1.00E-05, loss= 1.2005 (max= 1.6881), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:27,646 - root - INFO - Step 34540: lr=1.00E-05, loss= 1.1648 (max= 1.4878), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:27,646 - root - INFO - Step 34540: lr=1.00E-05, loss= 1.1648 (max= 1.4878), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:27,646 - root - INFO - Step 34540: lr=1.00E-05, loss= 1.1648 
(max= 1.4878), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:27,646 - root - INFO - Step 34540: lr=1.00E-05, loss= 1.1648 (max= 1.4878), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:27,646 - root - INFO - Step 34540: lr=1.00E-05, loss= 1.1648 (max= 1.4878), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:27,646 - root - INFO - Step 34540: lr=1.00E-05, loss= 1.1648 (max= 1.4878), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:27,646 - root - INFO - Step 34540: lr=1.00E-05, loss= 1.1648 (max= 1.4878), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:27,646 - root - INFO - Step 34540: lr=1.00E-05, loss= 1.1648 (max= 1.4878), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:43,607 - root - INFO - Step 34550: lr=1.00E-05, loss= 1.1914 (max= 1.5449), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:54:43,607 - root - INFO - Step 34550: lr=1.00E-05, loss= 1.1914 (max= 1.5449), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:54:43,607 - root - INFO - Step 34550: lr=1.00E-05, loss= 1.1914 (max= 1.5449), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:54:43,607 - root - INFO - Step 34550: lr=1.00E-05, loss= 1.1914 (max= 1.5449), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:54:43,607 - root - INFO - Step 34550: lr=1.00E-05, loss= 1.1914 (max= 1.5449), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:54:43,607 - root 
- INFO - Step 34550: lr=1.00E-05, loss= 1.1914 (max= 1.5449), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:54:43,608 - root - INFO - Step 34550: lr=1.00E-05, loss= 1.1914 (max= 1.5449), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:54:43,608 - root - INFO - Step 34550: lr=1.00E-05, loss= 1.1914 (max= 1.5449), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:54:59,545 - root - INFO - Step 34560: lr=1.00E-05, loss= 1.1458 (max= 1.5645), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:59,545 - root - INFO - Step 34560: lr=1.00E-05, loss= 1.1458 (max= 1.5645), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:59,545 - root - INFO - Step 34560: lr=1.00E-05, loss= 1.1458 (max= 1.5645), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:59,545 - root - INFO - Step 34560: lr=1.00E-05, loss= 1.1458 (max= 1.5645), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:59,545 - root - INFO - Step 34560: lr=1.00E-05, loss= 1.1458 (max= 1.5645), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:59,545 - root - INFO - Step 34560: lr=1.00E-05, loss= 1.1458 (max= 1.5645), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:59,545 - root - INFO - Step 34560: lr=1.00E-05, loss= 1.1458 (max= 1.5645), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:54:59,545 - root - INFO - Step 34560: lr=1.00E-05, loss= 1.1458 (max= 1.5645), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 01:55:15,496 - root - INFO - Step 34570: lr=1.00E-05, loss= 1.1586 (max= 1.5673), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:15,497 - root - INFO - Step 34570: lr=1.00E-05, loss= 1.1586 (max= 1.5673), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:15,497 - root - INFO - Step 34570: lr=1.00E-05, loss= 1.1586 (max= 1.5673), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:15,497 - root - INFO - Step 34570: lr=1.00E-05, loss= 1.1586 (max= 1.5673), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:15,497 - root - INFO - Step 34570: lr=1.00E-05, loss= 1.1586 (max= 1.5673), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:15,497 - root - INFO - Step 34570: lr=1.00E-05, loss= 1.1586 (max= 1.5673), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:15,497 - root - INFO - Step 34570: lr=1.00E-05, loss= 1.1586 (max= 1.5673), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:15,497 - root - INFO - Step 34570: lr=1.00E-05, loss= 1.1586 (max= 1.5673), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:31,432 - root - INFO - Step 34580: lr=1.00E-05, loss= 1.1457 (max= 1.5506), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:31,432 - root - INFO - Step 34580: lr=1.00E-05, loss= 1.1457 (max= 1.5506), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:31,432 - root - INFO - Step 34580: lr=1.00E-05, loss= 1.1457 (max= 1.5506), tps=20567, mfu=42.85%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:31,432 - root - INFO - Step 34580: lr=1.00E-05, loss= 1.1457 (max= 1.5506), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:31,432 - root - INFO - Step 34580: lr=1.00E-05, loss= 1.1457 (max= 1.5506), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:31,432 - root - INFO - Step 34580: lr=1.00E-05, loss= 1.1457 (max= 1.5506), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:31,432 - root - INFO - Step 34580: lr=1.00E-05, loss= 1.1457 (max= 1.5506), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:31,432 - root - INFO - Step 34580: lr=1.00E-05, loss= 1.1457 (max= 1.5506), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:47,358 - root - INFO - Step 34590: lr=1.00E-05, loss= 1.1345 (max= 1.4715), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:47,358 - root - INFO - Step 34590: lr=1.00E-05, loss= 1.1345 (max= 1.4715), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:47,358 - root - INFO - Step 34590: lr=1.00E-05, loss= 1.1345 (max= 1.4715), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:47,358 - root - INFO - Step 34590: lr=1.00E-05, loss= 1.1345 (max= 1.4715), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:47,358 - root - INFO - Step 34590: lr=1.00E-05, loss= 1.1345 (max= 1.4715), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:47,358 - root - INFO - Step 34590: lr=1.00E-05, 
loss= 1.1345 (max= 1.4715), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:47,358 - root - INFO - Step 34590: lr=1.00E-05, loss= 1.1345 (max= 1.4715), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:55:47,359 - root - INFO - Step 34590: lr=1.00E-05, loss= 1.1345 (max= 1.4715), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:03,304 - root - INFO - Step 34600: lr=1.00E-05, loss= 1.1364 (max= 1.5200), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:03,304 - root - INFO - Step 34600: lr=1.00E-05, loss= 1.1364 (max= 1.5200), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:03,304 - root - INFO - Step 34600: lr=1.00E-05, loss= 1.1364 (max= 1.5200), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:03,304 - root - INFO - Step 34600: lr=1.00E-05, loss= 1.1364 (max= 1.5200), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:03,304 - root - INFO - Step 34600: lr=1.00E-05, loss= 1.1364 (max= 1.5200), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:03,304 - root - INFO - Step 34600: lr=1.00E-05, loss= 1.1364 (max= 1.5200), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:03,304 - root - INFO - Step 34600: lr=1.00E-05, loss= 1.1364 (max= 1.5200), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:03,304 - root - INFO - Step 34600: lr=1.00E-05, loss= 1.1364 (max= 1.5200), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
01:56:19,239 - root - INFO - Step 34610: lr=1.00E-05, loss= 1.1570 (max= 1.4502), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:56:19,239 - root - INFO - Step 34610: lr=1.00E-05, loss= 1.1570 (max= 1.4502), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:56:19,239 - root - INFO - Step 34610: lr=1.00E-05, loss= 1.1570 (max= 1.4502), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:56:19,239 - root - INFO - Step 34610: lr=1.00E-05, loss= 1.1570 (max= 1.4502), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:56:19,239 - root - INFO - Step 34610: lr=1.00E-05, loss= 1.1570 (max= 1.4502), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:56:19,239 - root - INFO - Step 34610: lr=1.00E-05, loss= 1.1570 (max= 1.4502), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:56:19,239 - root - INFO - Step 34610: lr=1.00E-05, loss= 1.1570 (max= 1.4502), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:56:19,239 - root - INFO - Step 34610: lr=1.00E-05, loss= 1.1570 (max= 1.4502), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:56:35,201 - root - INFO - Step 34620: lr=1.00E-05, loss= 1.1963 (max= 1.7454), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:35,201 - root - INFO - Step 34620: lr=1.00E-05, loss= 1.1963 (max= 1.7454), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:35,201 - root - INFO - Step 34620: lr=1.00E-05, loss= 1.1963 (max= 1.7454), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:35,201 - root - INFO - Step 34620: lr=1.00E-05, loss= 1.1963 (max= 1.7454), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:35,201 - root - INFO - Step 34620: lr=1.00E-05, loss= 1.1963 (max= 1.7454), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:35,201 - root - INFO - Step 34620: lr=1.00E-05, loss= 1.1963 (max= 1.7454), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:35,201 - root - INFO - Step 34620: lr=1.00E-05, loss= 1.1963 (max= 1.7454), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:35,201 - root - INFO - Step 34620: lr=1.00E-05, loss= 1.1963 (max= 1.7454), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:51,162 - root - INFO - Step 34630: lr=1.00E-05, loss= 1.1441 (max= 1.4634), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:51,162 - root - INFO - Step 34630: lr=1.00E-05, loss= 1.1441 (max= 1.4634), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:51,162 - root - INFO - Step 34630: lr=1.00E-05, loss= 1.1441 (max= 1.4634), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:51,162 - root - INFO - Step 34630: lr=1.00E-05, loss= 1.1441 (max= 1.4634), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:51,162 - root - INFO - Step 34630: lr=1.00E-05, loss= 1.1441 (max= 1.4634), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:51,162 - root - INFO - Step 34630: lr=1.00E-05, loss= 1.1441 (max= 1.4634), 
tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:51,162 - root - INFO - Step 34630: lr=1.00E-05, loss= 1.1441 (max= 1.4634), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:56:51,162 - root - INFO - Step 34630: lr=1.00E-05, loss= 1.1441 (max= 1.4634), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:07,105 - root - INFO - Step 34640: lr=1.00E-05, loss= 1.1735 (max= 1.6431), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:57:07,105 - root - INFO - Step 34640: lr=1.00E-05, loss= 1.1735 (max= 1.6431), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:57:07,105 - root - INFO - Step 34640: lr=1.00E-05, loss= 1.1735 (max= 1.6431), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:57:07,105 - root - INFO - Step 34640: lr=1.00E-05, loss= 1.1735 (max= 1.6431), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:57:07,105 - root - INFO - Step 34640: lr=1.00E-05, loss= 1.1735 (max= 1.6431), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:57:07,105 - root - INFO - Step 34640: lr=1.00E-05, loss= 1.1735 (max= 1.6431), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:57:07,105 - root - INFO - Step 34640: lr=1.00E-05, loss= 1.1735 (max= 1.6431), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:57:07,105 - root - INFO - Step 34640: lr=1.00E-05, loss= 1.1735 (max= 1.6431), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:57:23,092 - root - INFO - Step 
34650: lr=1.00E-05, loss= 1.1792 (max= 1.6154), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:23,092 - root - INFO - Step 34650: lr=1.00E-05, loss= 1.1792 (max= 1.6154), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:23,092 - root - INFO - Step 34650: lr=1.00E-05, loss= 1.1792 (max= 1.6154), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:23,092 - root - INFO - Step 34650: lr=1.00E-05, loss= 1.1792 (max= 1.6154), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:23,092 - root - INFO - Step 34650: lr=1.00E-05, loss= 1.1792 (max= 1.6154), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:23,092 - root - INFO - Step 34650: lr=1.00E-05, loss= 1.1792 (max= 1.6154), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:23,092 - root - INFO - Step 34650: lr=1.00E-05, loss= 1.1792 (max= 1.6154), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:23,092 - root - INFO - Step 34650: lr=1.00E-05, loss= 1.1792 (max= 1.6154), tps=20501, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:39,023 - root - INFO - Step 34660: lr=1.00E-05, loss= 1.1686 (max= 1.6115), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:39,023 - root - INFO - Step 34660: lr=1.00E-05, loss= 1.1686 (max= 1.6115), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:39,023 - root - INFO - Step 34660: lr=1.00E-05, loss= 1.1686 (max= 1.6115), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 01:57:39,023 - root - INFO - Step 34660: lr=1.00E-05, loss= 1.1686 (max= 1.6115), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:39,023 - root - INFO - Step 34660: lr=1.00E-05, loss= 1.1686 (max= 1.6115), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:39,023 - root - INFO - Step 34660: lr=1.00E-05, loss= 1.1686 (max= 1.6115), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:39,023 - root - INFO - Step 34660: lr=1.00E-05, loss= 1.1686 (max= 1.6115), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:39,023 - root - INFO - Step 34660: lr=1.00E-05, loss= 1.1686 (max= 1.6115), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:54,969 - root - INFO - Step 34670: lr=1.00E-05, loss= 1.1815 (max= 1.5658), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:54,969 - root - INFO - Step 34670: lr=1.00E-05, loss= 1.1815 (max= 1.5658), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:54,969 - root - INFO - Step 34670: lr=1.00E-05, loss= 1.1815 (max= 1.5658), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:54,970 - root - INFO - Step 34670: lr=1.00E-05, loss= 1.1815 (max= 1.5658), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:54,970 - root - INFO - Step 34670: lr=1.00E-05, loss= 1.1815 (max= 1.5658), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:54,970 - root - INFO - Step 34670: lr=1.00E-05, loss= 1.1815 (max= 1.5658), tps=20553, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:54,970 - root - INFO - Step 34670: lr=1.00E-05, loss= 1.1815 (max= 1.5658), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:57:54,970 - root - INFO - Step 34670: lr=1.00E-05, loss= 1.1815 (max= 1.5658), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:10,905 - root - INFO - Step 34680: lr=1.00E-05, loss= 1.1574 (max= 1.6882), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:10,905 - root - INFO - Step 34680: lr=1.00E-05, loss= 1.1574 (max= 1.6882), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:10,905 - root - INFO - Step 34680: lr=1.00E-05, loss= 1.1574 (max= 1.6882), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:10,906 - root - INFO - Step 34680: lr=1.00E-05, loss= 1.1574 (max= 1.6882), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:10,906 - root - INFO - Step 34680: lr=1.00E-05, loss= 1.1574 (max= 1.6882), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:10,906 - root - INFO - Step 34680: lr=1.00E-05, loss= 1.1574 (max= 1.6882), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:10,906 - root - INFO - Step 34680: lr=1.00E-05, loss= 1.1574 (max= 1.6882), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:10,906 - root - INFO - Step 34680: lr=1.00E-05, loss= 1.1574 (max= 1.6882), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:26,843 - root - INFO - Step 34690: lr=1.00E-05, loss= 1.1705 
(max= 1.6376), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:26,843 - root - INFO - Step 34690: lr=1.00E-05, loss= 1.1705 (max= 1.6376), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:26,843 - root - INFO - Step 34690: lr=1.00E-05, loss= 1.1705 (max= 1.6376), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:26,843 - root - INFO - Step 34690: lr=1.00E-05, loss= 1.1705 (max= 1.6376), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:26,843 - root - INFO - Step 34690: lr=1.00E-05, loss= 1.1705 (max= 1.6376), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:26,843 - root - INFO - Step 34690: lr=1.00E-05, loss= 1.1705 (max= 1.6376), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:26,843 - root - INFO - Step 34690: lr=1.00E-05, loss= 1.1705 (max= 1.6376), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:26,843 - root - INFO - Step 34690: lr=1.00E-05, loss= 1.1705 (max= 1.6376), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:58:42,748 - root - INFO - Step 34700: lr=1.00E-05, loss= 1.1312 (max= 1.5586), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:42,748 - root - INFO - Step 34700: lr=1.00E-05, loss= 1.1312 (max= 1.5586), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:42,748 - root - INFO - Step 34700: lr=1.00E-05, loss= 1.1312 (max= 1.5586), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:42,748 - root 
- INFO - Step 34700: lr=1.00E-05, loss= 1.1312 (max= 1.5586), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:42,749 - root - INFO - Step 34700: lr=1.00E-05, loss= 1.1312 (max= 1.5586), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:42,749 - root - INFO - Step 34700: lr=1.00E-05, loss= 1.1312 (max= 1.5586), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:42,749 - root - INFO - Step 34700: lr=1.00E-05, loss= 1.1312 (max= 1.5586), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:42,749 - root - INFO - Step 34700: lr=1.00E-05, loss= 1.1312 (max= 1.5586), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:58,689 - root - INFO - Step 34710: lr=1.00E-05, loss= 1.1745 (max= 1.7806), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:58,689 - root - INFO - Step 34710: lr=1.00E-05, loss= 1.1745 (max= 1.7806), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:58,689 - root - INFO - Step 34710: lr=1.00E-05, loss= 1.1745 (max= 1.7806), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:58,690 - root - INFO - Step 34710: lr=1.00E-05, loss= 1.1745 (max= 1.7806), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:58,690 - root - INFO - Step 34710: lr=1.00E-05, loss= 1.1745 (max= 1.7806), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:58,690 - root - INFO - Step 34710: lr=1.00E-05, loss= 1.1745 (max= 1.7806), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 01:58:58,690 - root - INFO - Step 34710: lr=1.00E-05, loss= 1.1745 (max= 1.7806), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:58:58,690 - root - INFO - Step 34710: lr=1.00E-05, loss= 1.1745 (max= 1.7806), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:14,588 - root - INFO - Step 34720: lr=1.00E-05, loss= 1.1964 (max= 1.7109), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:14,588 - root - INFO - Step 34720: lr=1.00E-05, loss= 1.1964 (max= 1.7109), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:14,588 - root - INFO - Step 34720: lr=1.00E-05, loss= 1.1964 (max= 1.7109), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:14,588 - root - INFO - Step 34720: lr=1.00E-05, loss= 1.1964 (max= 1.7109), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:14,588 - root - INFO - Step 34720: lr=1.00E-05, loss= 1.1964 (max= 1.7109), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:14,588 - root - INFO - Step 34720: lr=1.00E-05, loss= 1.1964 (max= 1.7109), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:14,588 - root - INFO - Step 34720: lr=1.00E-05, loss= 1.1964 (max= 1.7109), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:14,588 - root - INFO - Step 34720: lr=1.00E-05, loss= 1.1964 (max= 1.7109), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:30,520 - root - INFO - Step 34730: lr=1.00E-05, loss= 1.1639 (max= 1.6132), tps=20571, mfu=42.86%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:59:30,520 - root - INFO - Step 34730: lr=1.00E-05, loss= 1.1639 (max= 1.6132), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:59:30,520 - root - INFO - Step 34730: lr=1.00E-05, loss= 1.1639 (max= 1.6132), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:59:30,521 - root - INFO - Step 34730: lr=1.00E-05, loss= 1.1639 (max= 1.6132), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:59:30,521 - root - INFO - Step 34730: lr=1.00E-05, loss= 1.1639 (max= 1.6132), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:59:30,521 - root - INFO - Step 34730: lr=1.00E-05, loss= 1.1639 (max= 1.6132), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:59:30,521 - root - INFO - Step 34730: lr=1.00E-05, loss= 1.1639 (max= 1.6132), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:59:30,521 - root - INFO - Step 34730: lr=1.00E-05, loss= 1.1639 (max= 1.6132), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 01:59:46,451 - root - INFO - Step 34740: lr=1.00E-05, loss= 1.1812 (max= 1.6708), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:46,451 - root - INFO - Step 34740: lr=1.00E-05, loss= 1.1812 (max= 1.6708), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:46,451 - root - INFO - Step 34740: lr=1.00E-05, loss= 1.1812 (max= 1.6708), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:46,451 - root - INFO - Step 34740: lr=1.00E-05, 
loss= 1.1812 (max= 1.6708), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:46,451 - root - INFO - Step 34740: lr=1.00E-05, loss= 1.1812 (max= 1.6708), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:46,451 - root - INFO - Step 34740: lr=1.00E-05, loss= 1.1812 (max= 1.6708), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:46,451 - root - INFO - Step 34740: lr=1.00E-05, loss= 1.1812 (max= 1.6708), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 01:59:46,451 - root - INFO - Step 34740: lr=1.00E-05, loss= 1.1812 (max= 1.6708), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:02,389 - root - INFO - Step 34750: lr=1.00E-05, loss= 1.1647 (max= 1.6473), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:02,389 - root - INFO - Step 34750: lr=1.00E-05, loss= 1.1647 (max= 1.6473), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:02,389 - root - INFO - Step 34750: lr=1.00E-05, loss= 1.1647 (max= 1.6473), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:02,389 - root - INFO - Step 34750: lr=1.00E-05, loss= 1.1647 (max= 1.6473), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:02,389 - root - INFO - Step 34750: lr=1.00E-05, loss= 1.1647 (max= 1.6473), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:02,389 - root - INFO - Step 34750: lr=1.00E-05, loss= 1.1647 (max= 1.6473), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
02:00:02,389 - root - INFO - Step 34750: lr=1.00E-05, loss= 1.1647 (max= 1.6473), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:02,389 - root - INFO - Step 34750: lr=1.00E-05, loss= 1.1647 (max= 1.6473), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:18,351 - root - INFO - Step 34760: lr=1.00E-05, loss= 1.1301 (max= 1.5904), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:18,351 - root - INFO - Step 34760: lr=1.00E-05, loss= 1.1301 (max= 1.5904), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:18,351 - root - INFO - Step 34760: lr=1.00E-05, loss= 1.1301 (max= 1.5904), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:18,351 - root - INFO - Step 34760: lr=1.00E-05, loss= 1.1301 (max= 1.5904), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:18,351 - root - INFO - Step 34760: lr=1.00E-05, loss= 1.1301 (max= 1.5904), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:18,351 - root - INFO - Step 34760: lr=1.00E-05, loss= 1.1301 (max= 1.5904), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:18,351 - root - INFO - Step 34760: lr=1.00E-05, loss= 1.1301 (max= 1.5904), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:18,351 - root - INFO - Step 34760: lr=1.00E-05, loss= 1.1301 (max= 1.5904), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:34,225 - root - INFO - Step 34770: lr=1.00E-05, loss= 1.1657 (max= 1.5532), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:34,225 - root - INFO - Step 34770: lr=1.00E-05, loss= 1.1657 (max= 1.5532), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:34,225 - root - INFO - Step 34770: lr=1.00E-05, loss= 1.1657 (max= 1.5532), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:34,225 - root - INFO - Step 34770: lr=1.00E-05, loss= 1.1657 (max= 1.5532), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:34,225 - root - INFO - Step 34770: lr=1.00E-05, loss= 1.1657 (max= 1.5532), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:34,225 - root - INFO - Step 34770: lr=1.00E-05, loss= 1.1657 (max= 1.5532), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:34,225 - root - INFO - Step 34770: lr=1.00E-05, loss= 1.1657 (max= 1.5532), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:34,225 - root - INFO - Step 34770: lr=1.00E-05, loss= 1.1657 (max= 1.5532), tps=20647, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:50,130 - root - INFO - Step 34780: lr=1.00E-05, loss= 1.1649 (max= 1.5128), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:50,130 - root - INFO - Step 34780: lr=1.00E-05, loss= 1.1649 (max= 1.5128), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:50,130 - root - INFO - Step 34780: lr=1.00E-05, loss= 1.1649 (max= 1.5128), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:50,130 - root - INFO - Step 34780: lr=1.00E-05, loss= 1.1649 (max= 1.5128), 
tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:50,130 - root - INFO - Step 34780: lr=1.00E-05, loss= 1.1649 (max= 1.5128), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:50,130 - root - INFO - Step 34780: lr=1.00E-05, loss= 1.1649 (max= 1.5128), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:50,130 - root - INFO - Step 34780: lr=1.00E-05, loss= 1.1649 (max= 1.5128), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:00:50,130 - root - INFO - Step 34780: lr=1.00E-05, loss= 1.1649 (max= 1.5128), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:06,055 - root - INFO - Step 34790: lr=1.00E-05, loss= 1.1608 (max= 1.6432), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:06,055 - root - INFO - Step 34790: lr=1.00E-05, loss= 1.1608 (max= 1.6432), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:06,055 - root - INFO - Step 34790: lr=1.00E-05, loss= 1.1608 (max= 1.6432), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:06,055 - root - INFO - Step 34790: lr=1.00E-05, loss= 1.1608 (max= 1.6432), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:06,055 - root - INFO - Step 34790: lr=1.00E-05, loss= 1.1608 (max= 1.6432), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:06,055 - root - INFO - Step 34790: lr=1.00E-05, loss= 1.1608 (max= 1.6432), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:06,055 - root - INFO - Step 
34790: lr=1.00E-05, loss= 1.1608 (max= 1.6432), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:06,055 - root - INFO - Step 34790: lr=1.00E-05, loss= 1.1608 (max= 1.6432), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:22,010 - root - INFO - Step 34800: lr=1.00E-05, loss= 1.1616 (max= 1.5639), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:22,010 - root - INFO - Step 34800: lr=1.00E-05, loss= 1.1616 (max= 1.5639), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:22,010 - root - INFO - Step 34800: lr=1.00E-05, loss= 1.1616 (max= 1.5639), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:22,010 - root - INFO - Step 34800: lr=1.00E-05, loss= 1.1616 (max= 1.5639), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:22,010 - root - INFO - Step 34800: lr=1.00E-05, loss= 1.1616 (max= 1.5639), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:22,010 - root - INFO - Step 34800: lr=1.00E-05, loss= 1.1616 (max= 1.5639), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:22,010 - root - INFO - Step 34800: lr=1.00E-05, loss= 1.1616 (max= 1.5639), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:22,010 - root - INFO - Step 34800: lr=1.00E-05, loss= 1.1616 (max= 1.5639), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:01:37,905 - root - INFO - Step 34810: lr=1.00E-05, loss= 1.1744 (max= 1.5550), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 02:01:37,905 - root - INFO - Step 34810: lr=1.00E-05, loss= 1.1744 (max= 1.5550), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:37,905 - root - INFO - Step 34810: lr=1.00E-05, loss= 1.1744 (max= 1.5550), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:37,905 - root - INFO - Step 34810: lr=1.00E-05, loss= 1.1744 (max= 1.5550), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:37,905 - root - INFO - Step 34810: lr=1.00E-05, loss= 1.1744 (max= 1.5550), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:37,905 - root - INFO - Step 34810: lr=1.00E-05, loss= 1.1744 (max= 1.5550), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:37,905 - root - INFO - Step 34810: lr=1.00E-05, loss= 1.1744 (max= 1.5550), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:37,905 - root - INFO - Step 34810: lr=1.00E-05, loss= 1.1744 (max= 1.5550), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:53,834 - root - INFO - Step 34820: lr=1.00E-05, loss= 1.1849 (max= 1.6964), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:53,834 - root - INFO - Step 34820: lr=1.00E-05, loss= 1.1849 (max= 1.6964), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:53,834 - root - INFO - Step 34820: lr=1.00E-05, loss= 1.1849 (max= 1.6964), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:53,834 - root - INFO - Step 34820: lr=1.00E-05, loss= 1.1849 (max= 1.6964), tps=20576, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:53,834 - root - INFO - Step 34820: lr=1.00E-05, loss= 1.1849 (max= 1.6964), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:53,834 - root - INFO - Step 34820: lr=1.00E-05, loss= 1.1849 (max= 1.6964), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:53,834 - root - INFO - Step 34820: lr=1.00E-05, loss= 1.1849 (max= 1.6964), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:01:53,834 - root - INFO - Step 34820: lr=1.00E-05, loss= 1.1849 (max= 1.6964), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:02:09,764 - root - INFO - Step 34830: lr=1.00E-05, loss= 1.1193 (max= 1.5022), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:09,764 - root - INFO - Step 34830: lr=1.00E-05, loss= 1.1193 (max= 1.5022), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:09,764 - root - INFO - Step 34830: lr=1.00E-05, loss= 1.1193 (max= 1.5022), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:09,764 - root - INFO - Step 34830: lr=1.00E-05, loss= 1.1193 (max= 1.5022), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:09,764 - root - INFO - Step 34830: lr=1.00E-05, loss= 1.1193 (max= 1.5022), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:09,764 - root - INFO - Step 34830: lr=1.00E-05, loss= 1.1193 (max= 1.5022), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:09,764 - root - INFO - Step 34830: lr=1.00E-05, loss= 1.1193 
(max= 1.5022), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:09,764 - root - INFO - Step 34830: lr=1.00E-05, loss= 1.1193 (max= 1.5022), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:15,283 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:2474251 +2025-10-25 02:02:18,458 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:3615088 +2025-10-25 02:02:25,691 - root - INFO - Step 34840: lr=1.00E-05, loss= 1.1836 (max= 1.5014), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:25,691 - root - INFO - Step 34840: lr=1.00E-05, loss= 1.1836 (max= 1.5014), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:25,691 - root - INFO - Step 34840: lr=1.00E-05, loss= 1.1836 (max= 1.5014), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:25,692 - root - INFO - Step 34840: lr=1.00E-05, loss= 1.1836 (max= 1.5014), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:25,692 - root - INFO - Step 34840: lr=1.00E-05, loss= 1.1836 (max= 1.5014), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:25,692 - root - INFO - Step 34840: lr=1.00E-05, loss= 1.1836 (max= 1.5014), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:25,692 - root - INFO - Step 34840: lr=1.00E-05, loss= 1.1836 (max= 1.5014), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:25,692 - 
root - INFO - Step 34840: lr=1.00E-05, loss= 1.1836 (max= 1.5014), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:41,638 - root - INFO - Step 34850: lr=1.00E-05, loss= 1.1561 (max= 1.5609), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:02:41,638 - root - INFO - Step 34850: lr=1.00E-05, loss= 1.1561 (max= 1.5609), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:02:41,638 - root - INFO - Step 34850: lr=1.00E-05, loss= 1.1561 (max= 1.5609), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:02:41,638 - root - INFO - Step 34850: lr=1.00E-05, loss= 1.1561 (max= 1.5609), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:02:41,638 - root - INFO - Step 34850: lr=1.00E-05, loss= 1.1561 (max= 1.5609), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:02:41,638 - root - INFO - Step 34850: lr=1.00E-05, loss= 1.1561 (max= 1.5609), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:02:41,638 - root - INFO - Step 34850: lr=1.00E-05, loss= 1.1561 (max= 1.5609), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:02:41,638 - root - INFO - Step 34850: lr=1.00E-05, loss= 1.1561 (max= 1.5609), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:02:57,602 - root - INFO - Step 34860: lr=1.00E-05, loss= 1.1671 (max= 1.7057), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:57,602 - root - INFO - Step 34860: lr=1.00E-05, loss= 1.1671 (max= 1.7057), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 02:02:57,602 - root - INFO - Step 34860: lr=1.00E-05, loss= 1.1671 (max= 1.7057), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:57,602 - root - INFO - Step 34860: lr=1.00E-05, loss= 1.1671 (max= 1.7057), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:57,602 - root - INFO - Step 34860: lr=1.00E-05, loss= 1.1671 (max= 1.7057), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:57,602 - root - INFO - Step 34860: lr=1.00E-05, loss= 1.1671 (max= 1.7057), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:57,602 - root - INFO - Step 34860: lr=1.00E-05, loss= 1.1671 (max= 1.7057), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:02:57,602 - root - INFO - Step 34860: lr=1.00E-05, loss= 1.1671 (max= 1.7057), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:03:13,576 - root - INFO - Step 34870: lr=1.00E-05, loss= 1.1739 (max= 1.5577), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:13,576 - root - INFO - Step 34870: lr=1.00E-05, loss= 1.1739 (max= 1.5577), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:13,576 - root - INFO - Step 34870: lr=1.00E-05, loss= 1.1739 (max= 1.5577), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:13,576 - root - INFO - Step 34870: lr=1.00E-05, loss= 1.1739 (max= 1.5577), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:13,576 - root - INFO - Step 34870: lr=1.00E-05, loss= 1.1739 (max= 1.5577), tps=20519, mfu=42.75%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:13,576 - root - INFO - Step 34870: lr=1.00E-05, loss= 1.1739 (max= 1.5577), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:13,576 - root - INFO - Step 34870: lr=1.00E-05, loss= 1.1739 (max= 1.5577), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:13,576 - root - INFO - Step 34870: lr=1.00E-05, loss= 1.1739 (max= 1.5577), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:29,547 - root - INFO - Step 34880: lr=1.00E-05, loss= 1.1915 (max= 1.5565), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:03:29,547 - root - INFO - Step 34880: lr=1.00E-05, loss= 1.1915 (max= 1.5565), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:03:29,548 - root - INFO - Step 34880: lr=1.00E-05, loss= 1.1915 (max= 1.5565), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:03:29,548 - root - INFO - Step 34880: lr=1.00E-05, loss= 1.1915 (max= 1.5565), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:03:29,548 - root - INFO - Step 34880: lr=1.00E-05, loss= 1.1915 (max= 1.5565), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:03:29,548 - root - INFO - Step 34880: lr=1.00E-05, loss= 1.1915 (max= 1.5565), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:03:29,548 - root - INFO - Step 34880: lr=1.00E-05, loss= 1.1915 (max= 1.5565), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:03:29,548 - root - INFO - Step 34880: lr=1.00E-05, 
loss= 1.1915 (max= 1.5565), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:03:45,477 - root - INFO - Step 34890: lr=1.00E-05, loss= 1.1399 (max= 1.6076), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:45,477 - root - INFO - Step 34890: lr=1.00E-05, loss= 1.1399 (max= 1.6076), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:45,477 - root - INFO - Step 34890: lr=1.00E-05, loss= 1.1399 (max= 1.6076), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:45,477 - root - INFO - Step 34890: lr=1.00E-05, loss= 1.1399 (max= 1.6076), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:45,477 - root - INFO - Step 34890: lr=1.00E-05, loss= 1.1399 (max= 1.6076), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:45,477 - root - INFO - Step 34890: lr=1.00E-05, loss= 1.1399 (max= 1.6076), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:45,477 - root - INFO - Step 34890: lr=1.00E-05, loss= 1.1399 (max= 1.6076), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:03:45,477 - root - INFO - Step 34890: lr=1.00E-05, loss= 1.1399 (max= 1.6076), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:04:01,403 - root - INFO - Step 34900: lr=1.00E-05, loss= 1.1420 (max= 1.7666), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:04:01,403 - root - INFO - Step 34900: lr=1.00E-05, loss= 1.1420 (max= 1.7666), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
02:04:01,403 - root - INFO - Step 34900: lr=1.00E-05, loss= 1.1420 (max= 1.7666), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:04:01,403 - root - INFO - Step 34900: lr=1.00E-05, loss= 1.1420 (max= 1.7666), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:04:01,403 - root - INFO - Step 34900: lr=1.00E-05, loss= 1.1420 (max= 1.7666), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:04:01,403 - root - INFO - Step 34900: lr=1.00E-05, loss= 1.1420 (max= 1.7666), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:04:01,403 - root - INFO - Step 34900: lr=1.00E-05, loss= 1.1420 (max= 1.7666), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:04:01,403 - root - INFO - Step 34900: lr=1.00E-05, loss= 1.1420 (max= 1.7666), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:04:17,355 - root - INFO - Step 34910: lr=1.00E-05, loss= 1.1466 (max= 2.0549), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:17,355 - root - INFO - Step 34910: lr=1.00E-05, loss= 1.1466 (max= 2.0549), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:17,355 - root - INFO - Step 34910: lr=1.00E-05, loss= 1.1466 (max= 2.0549), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:17,355 - root - INFO - Step 34910: lr=1.00E-05, loss= 1.1466 (max= 2.0549), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:17,355 - root - INFO - Step 34910: lr=1.00E-05, loss= 1.1466 (max= 2.0549), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:17,355 - root - INFO - Step 34910: lr=1.00E-05, loss= 1.1466 (max= 2.0549), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:17,355 - root - INFO - Step 34910: lr=1.00E-05, loss= 1.1466 (max= 2.0549), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:17,355 - root - INFO - Step 34910: lr=1.00E-05, loss= 1.1466 (max= 2.0549), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:33,315 - root - INFO - Step 34920: lr=1.00E-05, loss= 1.1555 (max= 1.4892), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:33,315 - root - INFO - Step 34920: lr=1.00E-05, loss= 1.1555 (max= 1.4892), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:33,315 - root - INFO - Step 34920: lr=1.00E-05, loss= 1.1555 (max= 1.4892), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:33,315 - root - INFO - Step 34920: lr=1.00E-05, loss= 1.1555 (max= 1.4892), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:33,315 - root - INFO - Step 34920: lr=1.00E-05, loss= 1.1555 (max= 1.4892), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:33,315 - root - INFO - Step 34920: lr=1.00E-05, loss= 1.1555 (max= 1.4892), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:33,315 - root - INFO - Step 34920: lr=1.00E-05, loss= 1.1555 (max= 1.4892), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:33,315 - root - INFO - Step 34920: lr=1.00E-05, loss= 1.1555 (max= 1.4892), 
tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:49,251 - root - INFO - Step 34930: lr=1.00E-05, loss= 1.1593 (max= 1.5612), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:49,251 - root - INFO - Step 34930: lr=1.00E-05, loss= 1.1593 (max= 1.5612), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:49,251 - root - INFO - Step 34930: lr=1.00E-05, loss= 1.1593 (max= 1.5612), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:49,251 - root - INFO - Step 34930: lr=1.00E-05, loss= 1.1593 (max= 1.5612), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:49,251 - root - INFO - Step 34930: lr=1.00E-05, loss= 1.1593 (max= 1.5612), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:49,251 - root - INFO - Step 34930: lr=1.00E-05, loss= 1.1593 (max= 1.5612), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:49,251 - root - INFO - Step 34930: lr=1.00E-05, loss= 1.1593 (max= 1.5612), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:04:49,251 - root - INFO - Step 34930: lr=1.00E-05, loss= 1.1593 (max= 1.5612), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:05,210 - root - INFO - Step 34940: lr=1.00E-05, loss= 1.2079 (max= 1.5333), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:05,210 - root - INFO - Step 34940: lr=1.00E-05, loss= 1.2079 (max= 1.5333), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:05,210 - root - INFO - Step 
34940: lr=1.00E-05, loss= 1.2079 (max= 1.5333), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:05,210 - root - INFO - Step 34940: lr=1.00E-05, loss= 1.2079 (max= 1.5333), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:05,210 - root - INFO - Step 34940: lr=1.00E-05, loss= 1.2079 (max= 1.5333), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:05,210 - root - INFO - Step 34940: lr=1.00E-05, loss= 1.2079 (max= 1.5333), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:05,210 - root - INFO - Step 34940: lr=1.00E-05, loss= 1.2079 (max= 1.5333), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:05,210 - root - INFO - Step 34940: lr=1.00E-05, loss= 1.2079 (max= 1.5333), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:21,178 - root - INFO - Step 34950: lr=1.00E-05, loss= 1.1609 (max= 1.5843), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:05:21,178 - root - INFO - Step 34950: lr=1.00E-05, loss= 1.1609 (max= 1.5843), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:05:21,178 - root - INFO - Step 34950: lr=1.00E-05, loss= 1.1609 (max= 1.5843), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:05:21,179 - root - INFO - Step 34950: lr=1.00E-05, loss= 1.1609 (max= 1.5843), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:05:21,179 - root - INFO - Step 34950: lr=1.00E-05, loss= 1.1609 (max= 1.5843), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 02:05:21,179 - root - INFO - Step 34950: lr=1.00E-05, loss= 1.1609 (max= 1.5843), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:05:21,179 - root - INFO - Step 34950: lr=1.00E-05, loss= 1.1609 (max= 1.5843), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:05:21,179 - root - INFO - Step 34950: lr=1.00E-05, loss= 1.1609 (max= 1.5843), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:05:37,120 - root - INFO - Step 34960: lr=1.00E-05, loss= 1.1639 (max= 1.5542), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:37,120 - root - INFO - Step 34960: lr=1.00E-05, loss= 1.1639 (max= 1.5542), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:37,120 - root - INFO - Step 34960: lr=1.00E-05, loss= 1.1639 (max= 1.5542), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:37,120 - root - INFO - Step 34960: lr=1.00E-05, loss= 1.1639 (max= 1.5542), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:37,120 - root - INFO - Step 34960: lr=1.00E-05, loss= 1.1639 (max= 1.5542), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:37,120 - root - INFO - Step 34960: lr=1.00E-05, loss= 1.1639 (max= 1.5542), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:37,120 - root - INFO - Step 34960: lr=1.00E-05, loss= 1.1639 (max= 1.5542), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:37,120 - root - INFO - Step 34960: lr=1.00E-05, loss= 1.1639 (max= 1.5542), tps=20560, mfu=42.84%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:53,041 - root - INFO - Step 34970: lr=1.00E-05, loss= 1.1767 (max= 1.6819), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:53,041 - root - INFO - Step 34970: lr=1.00E-05, loss= 1.1767 (max= 1.6819), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:53,041 - root - INFO - Step 34970: lr=1.00E-05, loss= 1.1767 (max= 1.6819), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:53,041 - root - INFO - Step 34970: lr=1.00E-05, loss= 1.1767 (max= 1.6819), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:53,041 - root - INFO - Step 34970: lr=1.00E-05, loss= 1.1767 (max= 1.6819), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:53,041 - root - INFO - Step 34970: lr=1.00E-05, loss= 1.1767 (max= 1.6819), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:53,041 - root - INFO - Step 34970: lr=1.00E-05, loss= 1.1767 (max= 1.6819), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:05:53,041 - root - INFO - Step 34970: lr=1.00E-05, loss= 1.1767 (max= 1.6819), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:09,045 - root - INFO - Step 34980: lr=1.00E-05, loss= 1.1853 (max= 1.6844), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:09,045 - root - INFO - Step 34980: lr=1.00E-05, loss= 1.1853 (max= 1.6844), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:09,046 - root - INFO - Step 34980: lr=1.00E-05, loss= 1.1853 
(max= 1.6844), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:09,046 - root - INFO - Step 34980: lr=1.00E-05, loss= 1.1853 (max= 1.6844), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:09,046 - root - INFO - Step 34980: lr=1.00E-05, loss= 1.1853 (max= 1.6844), tps=20479, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:09,046 - root - INFO - Step 34980: lr=1.00E-05, loss= 1.1853 (max= 1.6844), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:09,046 - root - INFO - Step 34980: lr=1.00E-05, loss= 1.1853 (max= 1.6844), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:09,046 - root - INFO - Step 34980: lr=1.00E-05, loss= 1.1853 (max= 1.6844), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:25,034 - root - INFO - Step 34990: lr=1.00E-05, loss= 1.1902 (max= 1.6648), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:25,034 - root - INFO - Step 34990: lr=1.00E-05, loss= 1.1902 (max= 1.6648), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:25,034 - root - INFO - Step 34990: lr=1.00E-05, loss= 1.1902 (max= 1.6648), tps=20498, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:25,034 - root - INFO - Step 34990: lr=1.00E-05, loss= 1.1902 (max= 1.6648), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:25,034 - root - INFO - Step 34990: lr=1.00E-05, loss= 1.1902 (max= 1.6648), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:25,034 - root 
- INFO - Step 34990: lr=1.00E-05, loss= 1.1902 (max= 1.6648), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:25,034 - root - INFO - Step 34990: lr=1.00E-05, loss= 1.1902 (max= 1.6648), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:25,034 - root - INFO - Step 34990: lr=1.00E-05, loss= 1.1902 (max= 1.6648), tps=20499, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-35000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-35000! Save time: 4.5056092739105225 +2025-10-25 02:06:40,970 - root - INFO - Step 35000: lr=1.00E-05, loss= 1.1958 (max= 1.6177), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:40,970 - root - INFO - Step 35000: lr=1.00E-05, loss= 1.1958 (max= 1.6177), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:40,971 - root - INFO - Saving a full checkpoint at step 35000 +2025-10-25 02:06:40,971 - root - INFO - Saving a full checkpoint at step 35000 +2025-10-25 02:06:40,971 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:06:40,971 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:06:40,971 - root - INFO - Step 35000: lr=1.00E-05, loss= 1.1958 (max= 1.6177), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:40,971 - root - INFO - Step 35000: lr=1.00E-05, loss= 1.1958 (max= 1.6177), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:40,971 - root - INFO - Step 35000: lr=1.00E-05, loss= 1.1958 (max= 
1.6177), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:40,971 - root - INFO - Step 35000: lr=1.00E-05, loss= 1.1958 (max= 1.6177), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:40,971 - root - INFO - Saving a full checkpoint at step 35000 +2025-10-25 02:06:40,971 - root - INFO - Saving a full checkpoint at step 35000 +2025-10-25 02:06:40,971 - root - INFO - Saving a full checkpoint at step 35000 +2025-10-25 02:06:40,971 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:06:40,971 - root - INFO - Step 35000: lr=1.00E-05, loss= 1.1958 (max= 1.6177), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:40,971 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:06:40,971 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:06:40,971 - root - INFO - Saving a full checkpoint at step 35000 +2025-10-25 02:06:40,971 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:06:40,971 - root - INFO - Saving a full checkpoint at step 35000 +2025-10-25 02:06:40,971 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:06:40,971 - root - INFO - Step 35000: lr=1.00E-05, loss= 1.1958 (max= 1.6177), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:06:40,971 - root - INFO - Saving a full checkpoint at step 35000 +2025-10-25 02:06:40,971 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 
+2025-10-25 02:06:54,692 - root - INFO - Finished saving the checkpoint in 13.72 seconds +2025-10-25 02:06:54,698 - root - INFO - Finished saving the checkpoint in 13.73 seconds +2025-10-25 02:06:54,698 - root - INFO - Finished saving the checkpoint in 13.73 seconds +2025-10-25 02:06:54,698 - root - INFO - Finished saving the checkpoint in 13.73 seconds +2025-10-25 02:06:54,699 - root - INFO - Finished saving the checkpoint in 13.73 seconds +2025-10-25 02:06:54,699 - root - INFO - Finished saving the checkpoint in 13.73 seconds +2025-10-25 02:06:54,699 - root - INFO - Finished saving the checkpoint in 13.73 seconds +2025-10-25 02:06:54,701 - root - INFO - Finished saving the checkpoint in 13.73 seconds +2025-10-25 02:07:10,570 - root - INFO - Step 35010: lr=1.00E-05, loss= 1.1551 (max= 1.5020), tps=11072, mfu=23.07%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:07:10,570 - root - INFO - Step 35010: lr=1.00E-05, loss= 1.1551 (max= 1.5020), tps=11072, mfu=23.07%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:07:10,570 - root - INFO - Step 35010: lr=1.00E-05, loss= 1.1551 (max= 1.5020), tps=11072, mfu=23.07%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:07:10,570 - root - INFO - Step 35010: lr=1.00E-05, loss= 1.1551 (max= 1.5020), tps=11072, mfu=23.07%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:07:10,570 - root - INFO - Step 35010: lr=1.00E-05, loss= 1.1551 (max= 1.5020), tps=11072, mfu=23.07%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:07:10,570 - root - INFO - Step 35010: lr=1.00E-05, loss= 1.1551 (max= 1.5020), tps=11072, mfu=23.07%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:07:10,571 - root - INFO - Step 35010: lr=1.00E-05, loss= 1.1551 (max= 1.5020), tps=11072, mfu=23.07%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 02:07:10,571 - root - INFO - Step 35010: lr=1.00E-05, loss= 1.1551 (max= 1.5020), tps=11072, mfu=23.07%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:07:26,505 - root - INFO - Step 35020: lr=1.00E-05, loss= 1.2118 (max= 1.8322), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:26,505 - root - INFO - Step 35020: lr=1.00E-05, loss= 1.2118 (max= 1.8322), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:26,505 - root - INFO - Step 35020: lr=1.00E-05, loss= 1.2118 (max= 1.8322), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:26,505 - root - INFO - Step 35020: lr=1.00E-05, loss= 1.2118 (max= 1.8322), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:26,505 - root - INFO - Step 35020: lr=1.00E-05, loss= 1.2118 (max= 1.8322), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:26,505 - root - INFO - Step 35020: lr=1.00E-05, loss= 1.2118 (max= 1.8322), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:26,505 - root - INFO - Step 35020: lr=1.00E-05, loss= 1.2118 (max= 1.8322), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:26,505 - root - INFO - Step 35020: lr=1.00E-05, loss= 1.2118 (max= 1.8322), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:42,430 - root - INFO - Step 35030: lr=1.00E-05, loss= 1.2296 (max= 1.6323), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:42,430 - root - INFO - Step 35030: lr=1.00E-05, loss= 1.2296 (max= 1.6323), tps=20580, mfu=42.88%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:42,431 - root - INFO - Step 35030: lr=1.00E-05, loss= 1.2296 (max= 1.6323), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:42,431 - root - INFO - Step 35030: lr=1.00E-05, loss= 1.2296 (max= 1.6323), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:42,431 - root - INFO - Step 35030: lr=1.00E-05, loss= 1.2296 (max= 1.6323), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:42,431 - root - INFO - Step 35030: lr=1.00E-05, loss= 1.2296 (max= 1.6323), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:42,431 - root - INFO - Step 35030: lr=1.00E-05, loss= 1.2296 (max= 1.6323), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:42,431 - root - INFO - Step 35030: lr=1.00E-05, loss= 1.2296 (max= 1.6323), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:58,354 - root - INFO - Step 35040: lr=1.00E-05, loss= 1.1488 (max= 1.5092), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:58,354 - root - INFO - Step 35040: lr=1.00E-05, loss= 1.1488 (max= 1.5092), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:58,354 - root - INFO - Step 35040: lr=1.00E-05, loss= 1.1488 (max= 1.5092), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:58,354 - root - INFO - Step 35040: lr=1.00E-05, loss= 1.1488 (max= 1.5092), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:58,354 - root - INFO - Step 35040: lr=1.00E-05, 
loss= 1.1488 (max= 1.5092), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:58,354 - root - INFO - Step 35040: lr=1.00E-05, loss= 1.1488 (max= 1.5092), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:58,354 - root - INFO - Step 35040: lr=1.00E-05, loss= 1.1488 (max= 1.5092), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:07:58,354 - root - INFO - Step 35040: lr=1.00E-05, loss= 1.1488 (max= 1.5092), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:14,252 - root - INFO - Step 35050: lr=1.00E-05, loss= 1.1730 (max= 1.5371), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:14,252 - root - INFO - Step 35050: lr=1.00E-05, loss= 1.1730 (max= 1.5371), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:14,252 - root - INFO - Step 35050: lr=1.00E-05, loss= 1.1730 (max= 1.5371), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:14,252 - root - INFO - Step 35050: lr=1.00E-05, loss= 1.1730 (max= 1.5371), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:14,252 - root - INFO - Step 35050: lr=1.00E-05, loss= 1.1730 (max= 1.5371), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:14,252 - root - INFO - Step 35050: lr=1.00E-05, loss= 1.1730 (max= 1.5371), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:14,252 - root - INFO - Step 35050: lr=1.00E-05, loss= 1.1730 (max= 1.5371), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
02:08:14,252 - root - INFO - Step 35050: lr=1.00E-05, loss= 1.1730 (max= 1.5371), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:22,983 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6553744 +2025-10-25 02:08:30,213 - root - INFO - Step 35060: lr=1.00E-05, loss= 1.1632 (max= 1.6664), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:30,213 - root - INFO - Step 35060: lr=1.00E-05, loss= 1.1632 (max= 1.6664), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:30,213 - root - INFO - Step 35060: lr=1.00E-05, loss= 1.1632 (max= 1.6664), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:30,213 - root - INFO - Step 35060: lr=1.00E-05, loss= 1.1632 (max= 1.6664), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:30,213 - root - INFO - Step 35060: lr=1.00E-05, loss= 1.1632 (max= 1.6664), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:30,213 - root - INFO - Step 35060: lr=1.00E-05, loss= 1.1632 (max= 1.6664), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:30,213 - root - INFO - Step 35060: lr=1.00E-05, loss= 1.1632 (max= 1.6664), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:30,213 - root - INFO - Step 35060: lr=1.00E-05, loss= 1.1632 (max= 1.6664), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:46,192 - root - INFO - Step 35070: lr=1.00E-05, loss= 1.1624 (max= 1.5041), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:46,192 - root - INFO - Step 35070: lr=1.00E-05, loss= 1.1624 (max= 1.5041), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:46,193 - root - INFO - Step 35070: lr=1.00E-05, loss= 1.1624 (max= 1.5041), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:46,193 - root - INFO - Step 35070: lr=1.00E-05, loss= 1.1624 (max= 1.5041), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:46,193 - root - INFO - Step 35070: lr=1.00E-05, loss= 1.1624 (max= 1.5041), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:46,193 - root - INFO - Step 35070: lr=1.00E-05, loss= 1.1624 (max= 1.5041), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:46,193 - root - INFO - Step 35070: lr=1.00E-05, loss= 1.1624 (max= 1.5041), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:08:46,193 - root - INFO - Step 35070: lr=1.00E-05, loss= 1.1624 (max= 1.5041), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:02,159 - root - INFO - Step 35080: lr=1.00E-05, loss= 1.1588 (max= 1.6411), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:02,159 - root - INFO - Step 35080: lr=1.00E-05, loss= 1.1588 (max= 1.6411), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:02,159 - root - INFO - Step 35080: lr=1.00E-05, loss= 1.1588 (max= 1.6411), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:02,159 - root - INFO - Step 35080: lr=1.00E-05, loss= 1.1588 (max= 1.6411), 
tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:02,159 - root - INFO - Step 35080: lr=1.00E-05, loss= 1.1588 (max= 1.6411), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:02,159 - root - INFO - Step 35080: lr=1.00E-05, loss= 1.1588 (max= 1.6411), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:02,159 - root - INFO - Step 35080: lr=1.00E-05, loss= 1.1588 (max= 1.6411), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:02,159 - root - INFO - Step 35080: lr=1.00E-05, loss= 1.1588 (max= 1.6411), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:18,113 - root - INFO - Step 35090: lr=1.00E-05, loss= 1.1567 (max= 1.8480), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:18,113 - root - INFO - Step 35090: lr=1.00E-05, loss= 1.1567 (max= 1.8480), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:18,114 - root - INFO - Step 35090: lr=1.00E-05, loss= 1.1567 (max= 1.8480), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:18,114 - root - INFO - Step 35090: lr=1.00E-05, loss= 1.1567 (max= 1.8480), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:18,114 - root - INFO - Step 35090: lr=1.00E-05, loss= 1.1567 (max= 1.8480), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:18,114 - root - INFO - Step 35090: lr=1.00E-05, loss= 1.1567 (max= 1.8480), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:18,114 - root - INFO - Step 
35090: lr=1.00E-05, loss= 1.1567 (max= 1.8480), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:18,114 - root - INFO - Step 35090: lr=1.00E-05, loss= 1.1567 (max= 1.8480), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:34,052 - root - INFO - Step 35100: lr=1.00E-05, loss= 1.1568 (max= 1.4689), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:34,052 - root - INFO - Step 35100: lr=1.00E-05, loss= 1.1568 (max= 1.4689), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:34,052 - root - INFO - Step 35100: lr=1.00E-05, loss= 1.1568 (max= 1.4689), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:34,052 - root - INFO - Step 35100: lr=1.00E-05, loss= 1.1568 (max= 1.4689), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:34,052 - root - INFO - Step 35100: lr=1.00E-05, loss= 1.1568 (max= 1.4689), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:34,052 - root - INFO - Step 35100: lr=1.00E-05, loss= 1.1568 (max= 1.4689), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:34,052 - root - INFO - Step 35100: lr=1.00E-05, loss= 1.1568 (max= 1.4689), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:34,053 - root - INFO - Step 35100: lr=1.00E-05, loss= 1.1568 (max= 1.4689), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:49,994 - root - INFO - Step 35110: lr=1.00E-05, loss= 1.1589 (max= 1.8586), tps=20559, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 02:09:49,994 - root - INFO - Step 35110: lr=1.00E-05, loss= 1.1589 (max= 1.8586), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:49,994 - root - INFO - Step 35110: lr=1.00E-05, loss= 1.1589 (max= 1.8586), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:49,994 - root - INFO - Step 35110: lr=1.00E-05, loss= 1.1589 (max= 1.8586), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:49,994 - root - INFO - Step 35110: lr=1.00E-05, loss= 1.1589 (max= 1.8586), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:49,994 - root - INFO - Step 35110: lr=1.00E-05, loss= 1.1589 (max= 1.8586), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:49,994 - root - INFO - Step 35110: lr=1.00E-05, loss= 1.1589 (max= 1.8586), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:09:49,994 - root - INFO - Step 35110: lr=1.00E-05, loss= 1.1589 (max= 1.8586), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:05,939 - root - INFO - Step 35120: lr=1.00E-05, loss= 1.1184 (max= 1.4885), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:05,939 - root - INFO - Step 35120: lr=1.00E-05, loss= 1.1184 (max= 1.4885), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:05,939 - root - INFO - Step 35120: lr=1.00E-05, loss= 1.1184 (max= 1.4885), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:05,939 - root - INFO - Step 35120: lr=1.00E-05, loss= 1.1184 (max= 1.4885), tps=20556, mfu=42.83%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:05,939 - root - INFO - Step 35120: lr=1.00E-05, loss= 1.1184 (max= 1.4885), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:05,939 - root - INFO - Step 35120: lr=1.00E-05, loss= 1.1184 (max= 1.4885), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:05,939 - root - INFO - Step 35120: lr=1.00E-05, loss= 1.1184 (max= 1.4885), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:05,939 - root - INFO - Step 35120: lr=1.00E-05, loss= 1.1184 (max= 1.4885), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:21,891 - root - INFO - Step 35130: lr=1.00E-05, loss= 1.1852 (max= 1.6224), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:21,891 - root - INFO - Step 35130: lr=1.00E-05, loss= 1.1852 (max= 1.6224), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:21,891 - root - INFO - Step 35130: lr=1.00E-05, loss= 1.1852 (max= 1.6224), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:21,891 - root - INFO - Step 35130: lr=1.00E-05, loss= 1.1852 (max= 1.6224), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:21,891 - root - INFO - Step 35130: lr=1.00E-05, loss= 1.1852 (max= 1.6224), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:21,891 - root - INFO - Step 35130: lr=1.00E-05, loss= 1.1852 (max= 1.6224), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:21,891 - root - INFO - Step 35130: lr=1.00E-05, loss= 1.1852 
(max= 1.6224), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:21,891 - root - INFO - Step 35130: lr=1.00E-05, loss= 1.1852 (max= 1.6224), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:37,783 - root - INFO - Step 35140: lr=1.00E-05, loss= 1.1735 (max= 1.6954), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:37,783 - root - INFO - Step 35140: lr=1.00E-05, loss= 1.1735 (max= 1.6954), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:37,783 - root - INFO - Step 35140: lr=1.00E-05, loss= 1.1735 (max= 1.6954), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:37,783 - root - INFO - Step 35140: lr=1.00E-05, loss= 1.1735 (max= 1.6954), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:37,783 - root - INFO - Step 35140: lr=1.00E-05, loss= 1.1735 (max= 1.6954), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:37,783 - root - INFO - Step 35140: lr=1.00E-05, loss= 1.1735 (max= 1.6954), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:37,783 - root - INFO - Step 35140: lr=1.00E-05, loss= 1.1735 (max= 1.6954), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:37,783 - root - INFO - Step 35140: lr=1.00E-05, loss= 1.1735 (max= 1.6954), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:53,755 - root - INFO - Step 35150: lr=1.00E-05, loss= 1.1426 (max= 1.5353), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:53,755 - root 
- INFO - Step 35150: lr=1.00E-05, loss= 1.1426 (max= 1.5353), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:53,755 - root - INFO - Step 35150: lr=1.00E-05, loss= 1.1426 (max= 1.5353), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:53,755 - root - INFO - Step 35150: lr=1.00E-05, loss= 1.1426 (max= 1.5353), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:53,755 - root - INFO - Step 35150: lr=1.00E-05, loss= 1.1426 (max= 1.5353), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:53,755 - root - INFO - Step 35150: lr=1.00E-05, loss= 1.1426 (max= 1.5353), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:53,755 - root - INFO - Step 35150: lr=1.00E-05, loss= 1.1426 (max= 1.5353), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:10:53,755 - root - INFO - Step 35150: lr=1.00E-05, loss= 1.1426 (max= 1.5353), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:09,723 - root - INFO - Step 35160: lr=1.00E-05, loss= 1.1447 (max= 1.5722), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:11:09,724 - root - INFO - Step 35160: lr=1.00E-05, loss= 1.1447 (max= 1.5722), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:11:09,724 - root - INFO - Step 35160: lr=1.00E-05, loss= 1.1447 (max= 1.5722), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:11:09,724 - root - INFO - Step 35160: lr=1.00E-05, loss= 1.1447 (max= 1.5722), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 02:11:09,724 - root - INFO - Step 35160: lr=1.00E-05, loss= 1.1447 (max= 1.5722), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:11:09,724 - root - INFO - Step 35160: lr=1.00E-05, loss= 1.1447 (max= 1.5722), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:11:09,724 - root - INFO - Step 35160: lr=1.00E-05, loss= 1.1447 (max= 1.5722), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:11:09,724 - root - INFO - Step 35160: lr=1.00E-05, loss= 1.1447 (max= 1.5722), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:11:25,663 - root - INFO - Step 35170: lr=1.00E-05, loss= 1.1722 (max= 1.6074), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:25,663 - root - INFO - Step 35170: lr=1.00E-05, loss= 1.1722 (max= 1.6074), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:25,663 - root - INFO - Step 35170: lr=1.00E-05, loss= 1.1722 (max= 1.6074), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:25,663 - root - INFO - Step 35170: lr=1.00E-05, loss= 1.1722 (max= 1.6074), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:25,663 - root - INFO - Step 35170: lr=1.00E-05, loss= 1.1722 (max= 1.6074), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:25,663 - root - INFO - Step 35170: lr=1.00E-05, loss= 1.1722 (max= 1.6074), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:25,663 - root - INFO - Step 35170: lr=1.00E-05, loss= 1.1722 (max= 1.6074), tps=20563, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:25,663 - root - INFO - Step 35170: lr=1.00E-05, loss= 1.1722 (max= 1.6074), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:41,616 - root - INFO - Step 35180: lr=1.00E-05, loss= 1.1806 (max= 1.7710), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:41,616 - root - INFO - Step 35180: lr=1.00E-05, loss= 1.1806 (max= 1.7710), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:41,616 - root - INFO - Step 35180: lr=1.00E-05, loss= 1.1806 (max= 1.7710), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:41,616 - root - INFO - Step 35180: lr=1.00E-05, loss= 1.1806 (max= 1.7710), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:41,616 - root - INFO - Step 35180: lr=1.00E-05, loss= 1.1806 (max= 1.7710), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:41,616 - root - INFO - Step 35180: lr=1.00E-05, loss= 1.1806 (max= 1.7710), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:41,616 - root - INFO - Step 35180: lr=1.00E-05, loss= 1.1806 (max= 1.7710), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:41,616 - root - INFO - Step 35180: lr=1.00E-05, loss= 1.1806 (max= 1.7710), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:57,578 - root - INFO - Step 35190: lr=1.00E-05, loss= 1.1606 (max= 1.6598), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:57,579 - root - INFO - Step 35190: lr=1.00E-05, 
loss= 1.1606 (max= 1.6598), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:57,579 - root - INFO - Step 35190: lr=1.00E-05, loss= 1.1606 (max= 1.6598), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:57,579 - root - INFO - Step 35190: lr=1.00E-05, loss= 1.1606 (max= 1.6598), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:57,579 - root - INFO - Step 35190: lr=1.00E-05, loss= 1.1606 (max= 1.6598), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:57,579 - root - INFO - Step 35190: lr=1.00E-05, loss= 1.1606 (max= 1.6598), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:57,579 - root - INFO - Step 35190: lr=1.00E-05, loss= 1.1606 (max= 1.6598), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:11:57,579 - root - INFO - Step 35190: lr=1.00E-05, loss= 1.1606 (max= 1.6598), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:12:13,478 - root - INFO - Step 35200: lr=1.00E-05, loss= 1.1416 (max= 1.5642), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:12:13,478 - root - INFO - Step 35200: lr=1.00E-05, loss= 1.1416 (max= 1.5642), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:12:13,478 - root - INFO - Step 35200: lr=1.00E-05, loss= 1.1416 (max= 1.5642), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:12:13,478 - root - INFO - Step 35200: lr=1.00E-05, loss= 1.1416 (max= 1.5642), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
02:12:13,478 - root - INFO - Step 35200: lr=1.00E-05, loss= 1.1416 (max= 1.5642), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:12:13,478 - root - INFO - Step 35200: lr=1.00E-05, loss= 1.1416 (max= 1.5642), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:12:13,479 - root - INFO - Step 35200: lr=1.00E-05, loss= 1.1416 (max= 1.5642), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:12:13,479 - root - INFO - Step 35200: lr=1.00E-05, loss= 1.1416 (max= 1.5642), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:12:29,399 - root - INFO - Step 35210: lr=1.00E-05, loss= 1.1322 (max= 1.5619), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:29,399 - root - INFO - Step 35210: lr=1.00E-05, loss= 1.1322 (max= 1.5619), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:29,399 - root - INFO - Step 35210: lr=1.00E-05, loss= 1.1322 (max= 1.5619), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:29,399 - root - INFO - Step 35210: lr=1.00E-05, loss= 1.1322 (max= 1.5619), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:29,399 - root - INFO - Step 35210: lr=1.00E-05, loss= 1.1322 (max= 1.5619), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:29,399 - root - INFO - Step 35210: lr=1.00E-05, loss= 1.1322 (max= 1.5619), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:29,399 - root - INFO - Step 35210: lr=1.00E-05, loss= 1.1322 (max= 1.5619), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:29,400 - root - INFO - Step 35210: lr=1.00E-05, loss= 1.1322 (max= 1.5619), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:45,309 - root - INFO - Step 35220: lr=1.00E-05, loss= 1.1851 (max= 1.5572), tps=20600, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:45,309 - root - INFO - Step 35220: lr=1.00E-05, loss= 1.1851 (max= 1.5572), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:45,309 - root - INFO - Step 35220: lr=1.00E-05, loss= 1.1851 (max= 1.5572), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:45,309 - root - INFO - Step 35220: lr=1.00E-05, loss= 1.1851 (max= 1.5572), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:45,309 - root - INFO - Step 35220: lr=1.00E-05, loss= 1.1851 (max= 1.5572), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:45,309 - root - INFO - Step 35220: lr=1.00E-05, loss= 1.1851 (max= 1.5572), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:45,309 - root - INFO - Step 35220: lr=1.00E-05, loss= 1.1851 (max= 1.5572), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:12:45,309 - root - INFO - Step 35220: lr=1.00E-05, loss= 1.1851 (max= 1.5572), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:01,273 - root - INFO - Step 35230: lr=1.00E-05, loss= 1.1637 (max= 1.8208), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:01,273 - root - INFO - Step 35230: lr=1.00E-05, loss= 1.1637 (max= 1.8208), 
tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:01,273 - root - INFO - Step 35230: lr=1.00E-05, loss= 1.1637 (max= 1.8208), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:01,273 - root - INFO - Step 35230: lr=1.00E-05, loss= 1.1637 (max= 1.8208), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:01,273 - root - INFO - Step 35230: lr=1.00E-05, loss= 1.1637 (max= 1.8208), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:01,273 - root - INFO - Step 35230: lr=1.00E-05, loss= 1.1637 (max= 1.8208), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:01,274 - root - INFO - Step 35230: lr=1.00E-05, loss= 1.1637 (max= 1.8208), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:01,274 - root - INFO - Step 35230: lr=1.00E-05, loss= 1.1637 (max= 1.8208), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:17,211 - root - INFO - Step 35240: lr=1.00E-05, loss= 1.1700 (max= 1.4457), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:13:17,211 - root - INFO - Step 35240: lr=1.00E-05, loss= 1.1700 (max= 1.4457), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:13:17,211 - root - INFO - Step 35240: lr=1.00E-05, loss= 1.1700 (max= 1.4457), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:13:17,211 - root - INFO - Step 35240: lr=1.00E-05, loss= 1.1700 (max= 1.4457), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:13:17,211 - root - INFO - Step 
35240: lr=1.00E-05, loss= 1.1700 (max= 1.4457), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:13:17,211 - root - INFO - Step 35240: lr=1.00E-05, loss= 1.1700 (max= 1.4457), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:13:17,211 - root - INFO - Step 35240: lr=1.00E-05, loss= 1.1700 (max= 1.4457), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:13:17,211 - root - INFO - Step 35240: lr=1.00E-05, loss= 1.1700 (max= 1.4457), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:13:33,147 - root - INFO - Step 35250: lr=1.00E-05, loss= 1.1453 (max= 1.4691), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:33,147 - root - INFO - Step 35250: lr=1.00E-05, loss= 1.1453 (max= 1.4691), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:33,147 - root - INFO - Step 35250: lr=1.00E-05, loss= 1.1453 (max= 1.4691), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:33,147 - root - INFO - Step 35250: lr=1.00E-05, loss= 1.1453 (max= 1.4691), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:33,147 - root - INFO - Step 35250: lr=1.00E-05, loss= 1.1453 (max= 1.4691), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:33,147 - root - INFO - Step 35250: lr=1.00E-05, loss= 1.1453 (max= 1.4691), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:33,147 - root - INFO - Step 35250: lr=1.00E-05, loss= 1.1453 (max= 1.4691), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 02:13:33,147 - root - INFO - Step 35250: lr=1.00E-05, loss= 1.1453 (max= 1.4691), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:49,115 - root - INFO - Step 35260: lr=1.00E-05, loss= 1.1622 (max= 1.4093), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:49,115 - root - INFO - Step 35260: lr=1.00E-05, loss= 1.1622 (max= 1.4093), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:49,115 - root - INFO - Step 35260: lr=1.00E-05, loss= 1.1622 (max= 1.4093), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:49,115 - root - INFO - Step 35260: lr=1.00E-05, loss= 1.1622 (max= 1.4093), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:49,115 - root - INFO - Step 35260: lr=1.00E-05, loss= 1.1622 (max= 1.4093), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:49,115 - root - INFO - Step 35260: lr=1.00E-05, loss= 1.1622 (max= 1.4093), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:49,115 - root - INFO - Step 35260: lr=1.00E-05, loss= 1.1622 (max= 1.4093), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:13:49,115 - root - INFO - Step 35260: lr=1.00E-05, loss= 1.1622 (max= 1.4093), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:05,047 - root - INFO - Step 35270: lr=1.00E-05, loss= 1.1559 (max= 1.5656), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:05,047 - root - INFO - Step 35270: lr=1.00E-05, loss= 1.1559 (max= 1.5656), tps=20572, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:05,047 - root - INFO - Step 35270: lr=1.00E-05, loss= 1.1559 (max= 1.5656), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:05,047 - root - INFO - Step 35270: lr=1.00E-05, loss= 1.1559 (max= 1.5656), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:05,047 - root - INFO - Step 35270: lr=1.00E-05, loss= 1.1559 (max= 1.5656), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:05,047 - root - INFO - Step 35270: lr=1.00E-05, loss= 1.1559 (max= 1.5656), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:05,047 - root - INFO - Step 35270: lr=1.00E-05, loss= 1.1559 (max= 1.5656), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:05,047 - root - INFO - Step 35270: lr=1.00E-05, loss= 1.1559 (max= 1.5656), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:20,989 - root - INFO - Step 35280: lr=1.00E-05, loss= 1.1724 (max= 1.7425), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:14:20,989 - root - INFO - Step 35280: lr=1.00E-05, loss= 1.1724 (max= 1.7425), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:14:20,989 - root - INFO - Step 35280: lr=1.00E-05, loss= 1.1724 (max= 1.7425), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:14:20,989 - root - INFO - Step 35280: lr=1.00E-05, loss= 1.1724 (max= 1.7425), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:14:20,989 - root - INFO - Step 35280: lr=1.00E-05, loss= 1.1724 
(max= 1.7425), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:14:20,989 - root - INFO - Step 35280: lr=1.00E-05, loss= 1.1724 (max= 1.7425), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:14:20,989 - root - INFO - Step 35280: lr=1.00E-05, loss= 1.1724 (max= 1.7425), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:14:20,989 - root - INFO - Step 35280: lr=1.00E-05, loss= 1.1724 (max= 1.7425), tps=20559, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:14:36,927 - root - INFO - Step 35290: lr=1.00E-05, loss= 1.1703 (max= 1.7628), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:36,927 - root - INFO - Step 35290: lr=1.00E-05, loss= 1.1703 (max= 1.7628), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:36,927 - root - INFO - Step 35290: lr=1.00E-05, loss= 1.1703 (max= 1.7628), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:36,927 - root - INFO - Step 35290: lr=1.00E-05, loss= 1.1703 (max= 1.7628), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:36,927 - root - INFO - Step 35290: lr=1.00E-05, loss= 1.1703 (max= 1.7628), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:36,927 - root - INFO - Step 35290: lr=1.00E-05, loss= 1.1703 (max= 1.7628), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:36,927 - root - INFO - Step 35290: lr=1.00E-05, loss= 1.1703 (max= 1.7628), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:36,928 - root 
- INFO - Step 35290: lr=1.00E-05, loss= 1.1703 (max= 1.7628), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:52,832 - root - INFO - Step 35300: lr=1.00E-05, loss= 1.1633 (max= 1.5513), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:52,833 - root - INFO - Step 35300: lr=1.00E-05, loss= 1.1633 (max= 1.5513), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:52,833 - root - INFO - Step 35300: lr=1.00E-05, loss= 1.1633 (max= 1.5513), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:52,833 - root - INFO - Step 35300: lr=1.00E-05, loss= 1.1633 (max= 1.5513), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:52,833 - root - INFO - Step 35300: lr=1.00E-05, loss= 1.1633 (max= 1.5513), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:52,833 - root - INFO - Step 35300: lr=1.00E-05, loss= 1.1633 (max= 1.5513), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:52,833 - root - INFO - Step 35300: lr=1.00E-05, loss= 1.1633 (max= 1.5513), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:14:52,833 - root - INFO - Step 35300: lr=1.00E-05, loss= 1.1633 (max= 1.5513), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:08,766 - root - INFO - Step 35310: lr=1.00E-05, loss= 1.1542 (max= 1.5352), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:08,766 - root - INFO - Step 35310: lr=1.00E-05, loss= 1.1542 (max= 1.5352), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 02:15:08,766 - root - INFO - Step 35310: lr=1.00E-05, loss= 1.1542 (max= 1.5352), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:08,766 - root - INFO - Step 35310: lr=1.00E-05, loss= 1.1542 (max= 1.5352), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:08,766 - root - INFO - Step 35310: lr=1.00E-05, loss= 1.1542 (max= 1.5352), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:08,766 - root - INFO - Step 35310: lr=1.00E-05, loss= 1.1542 (max= 1.5352), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:08,766 - root - INFO - Step 35310: lr=1.00E-05, loss= 1.1542 (max= 1.5352), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:08,767 - root - INFO - Step 35310: lr=1.00E-05, loss= 1.1542 (max= 1.5352), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:24,691 - root - INFO - Step 35320: lr=1.00E-05, loss= 1.1488 (max= 1.5196), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:15:24,691 - root - INFO - Step 35320: lr=1.00E-05, loss= 1.1488 (max= 1.5196), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:15:24,691 - root - INFO - Step 35320: lr=1.00E-05, loss= 1.1488 (max= 1.5196), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:15:24,691 - root - INFO - Step 35320: lr=1.00E-05, loss= 1.1488 (max= 1.5196), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:15:24,691 - root - INFO - Step 35320: lr=1.00E-05, loss= 1.1488 (max= 1.5196), tps=20581, mfu=42.88%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:15:24,691 - root - INFO - Step 35320: lr=1.00E-05, loss= 1.1488 (max= 1.5196), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:15:24,691 - root - INFO - Step 35320: lr=1.00E-05, loss= 1.1488 (max= 1.5196), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:15:24,691 - root - INFO - Step 35320: lr=1.00E-05, loss= 1.1488 (max= 1.5196), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:15:40,594 - root - INFO - Step 35330: lr=1.00E-05, loss= 1.1497 (max= 1.6026), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:40,594 - root - INFO - Step 35330: lr=1.00E-05, loss= 1.1497 (max= 1.6026), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:40,594 - root - INFO - Step 35330: lr=1.00E-05, loss= 1.1497 (max= 1.6026), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:40,594 - root - INFO - Step 35330: lr=1.00E-05, loss= 1.1497 (max= 1.6026), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:40,594 - root - INFO - Step 35330: lr=1.00E-05, loss= 1.1497 (max= 1.6026), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:40,594 - root - INFO - Step 35330: lr=1.00E-05, loss= 1.1497 (max= 1.6026), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:40,594 - root - INFO - Step 35330: lr=1.00E-05, loss= 1.1497 (max= 1.6026), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:40,594 - root - INFO - Step 35330: lr=1.00E-05, 
loss= 1.1497 (max= 1.6026), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:56,533 - root - INFO - Step 35340: lr=1.00E-05, loss= 1.1783 (max= 1.6115), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:56,533 - root - INFO - Step 35340: lr=1.00E-05, loss= 1.1783 (max= 1.6115), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:56,533 - root - INFO - Step 35340: lr=1.00E-05, loss= 1.1783 (max= 1.6115), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:56,533 - root - INFO - Step 35340: lr=1.00E-05, loss= 1.1783 (max= 1.6115), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:56,533 - root - INFO - Step 35340: lr=1.00E-05, loss= 1.1783 (max= 1.6115), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:56,533 - root - INFO - Step 35340: lr=1.00E-05, loss= 1.1783 (max= 1.6115), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:56,533 - root - INFO - Step 35340: lr=1.00E-05, loss= 1.1783 (max= 1.6115), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:15:56,533 - root - INFO - Step 35340: lr=1.00E-05, loss= 1.1783 (max= 1.6115), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:12,480 - root - INFO - Step 35350: lr=1.00E-05, loss= 1.1795 (max= 1.5394), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:12,480 - root - INFO - Step 35350: lr=1.00E-05, loss= 1.1795 (max= 1.5394), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
02:16:12,480 - root - INFO - Step 35350: lr=1.00E-05, loss= 1.1795 (max= 1.5394), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:12,480 - root - INFO - Step 35350: lr=1.00E-05, loss= 1.1795 (max= 1.5394), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:12,480 - root - INFO - Step 35350: lr=1.00E-05, loss= 1.1795 (max= 1.5394), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:12,480 - root - INFO - Step 35350: lr=1.00E-05, loss= 1.1795 (max= 1.5394), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:12,480 - root - INFO - Step 35350: lr=1.00E-05, loss= 1.1795 (max= 1.5394), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:12,481 - root - INFO - Step 35350: lr=1.00E-05, loss= 1.1795 (max= 1.5394), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:28,417 - root - INFO - Step 35360: lr=1.00E-05, loss= 1.1460 (max= 1.4805), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:28,417 - root - INFO - Step 35360: lr=1.00E-05, loss= 1.1460 (max= 1.4805), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:28,417 - root - INFO - Step 35360: lr=1.00E-05, loss= 1.1460 (max= 1.4805), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:28,417 - root - INFO - Step 35360: lr=1.00E-05, loss= 1.1460 (max= 1.4805), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:28,417 - root - INFO - Step 35360: lr=1.00E-05, loss= 1.1460 (max= 1.4805), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:28,417 - root - INFO - Step 35360: lr=1.00E-05, loss= 1.1460 (max= 1.4805), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:28,417 - root - INFO - Step 35360: lr=1.00E-05, loss= 1.1460 (max= 1.4805), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:28,417 - root - INFO - Step 35360: lr=1.00E-05, loss= 1.1460 (max= 1.4805), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:44,331 - root - INFO - Step 35370: lr=1.00E-05, loss= 1.1736 (max= 1.5381), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:44,332 - root - INFO - Step 35370: lr=1.00E-05, loss= 1.1736 (max= 1.5381), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:44,332 - root - INFO - Step 35370: lr=1.00E-05, loss= 1.1736 (max= 1.5381), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:44,332 - root - INFO - Step 35370: lr=1.00E-05, loss= 1.1736 (max= 1.5381), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:44,332 - root - INFO - Step 35370: lr=1.00E-05, loss= 1.1736 (max= 1.5381), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:44,332 - root - INFO - Step 35370: lr=1.00E-05, loss= 1.1736 (max= 1.5381), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:44,332 - root - INFO - Step 35370: lr=1.00E-05, loss= 1.1736 (max= 1.5381), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:16:44,332 - root - INFO - Step 35370: lr=1.00E-05, loss= 1.1736 (max= 1.5381), 
tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:00,246 - root - INFO - Step 35380: lr=1.00E-05, loss= 1.1538 (max= 1.4659), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:00,246 - root - INFO - Step 35380: lr=1.00E-05, loss= 1.1538 (max= 1.4659), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:00,246 - root - INFO - Step 35380: lr=1.00E-05, loss= 1.1538 (max= 1.4659), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:00,246 - root - INFO - Step 35380: lr=1.00E-05, loss= 1.1538 (max= 1.4659), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:00,246 - root - INFO - Step 35380: lr=1.00E-05, loss= 1.1538 (max= 1.4659), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:00,246 - root - INFO - Step 35380: lr=1.00E-05, loss= 1.1538 (max= 1.4659), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:00,246 - root - INFO - Step 35380: lr=1.00E-05, loss= 1.1538 (max= 1.4659), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:00,246 - root - INFO - Step 35380: lr=1.00E-05, loss= 1.1538 (max= 1.4659), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:16,193 - root - INFO - Step 35390: lr=1.00E-05, loss= 1.1736 (max= 1.6977), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:16,193 - root - INFO - Step 35390: lr=1.00E-05, loss= 1.1736 (max= 1.6977), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:16,193 - root - INFO - Step 
35390: lr=1.00E-05, loss= 1.1736 (max= 1.6977), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:16,193 - root - INFO - Step 35390: lr=1.00E-05, loss= 1.1736 (max= 1.6977), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:16,193 - root - INFO - Step 35390: lr=1.00E-05, loss= 1.1736 (max= 1.6977), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:16,193 - root - INFO - Step 35390: lr=1.00E-05, loss= 1.1736 (max= 1.6977), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:16,193 - root - INFO - Step 35390: lr=1.00E-05, loss= 1.1736 (max= 1.6977), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:16,193 - root - INFO - Step 35390: lr=1.00E-05, loss= 1.1736 (max= 1.6977), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:32,136 - root - INFO - Step 35400: lr=1.00E-05, loss= 1.1801 (max= 1.5654), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:32,136 - root - INFO - Step 35400: lr=1.00E-05, loss= 1.1801 (max= 1.5654), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:32,136 - root - INFO - Step 35400: lr=1.00E-05, loss= 1.1801 (max= 1.5654), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:32,136 - root - INFO - Step 35400: lr=1.00E-05, loss= 1.1801 (max= 1.5654), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:32,136 - root - INFO - Step 35400: lr=1.00E-05, loss= 1.1801 (max= 1.5654), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 02:17:32,136 - root - INFO - Step 35400: lr=1.00E-05, loss= 1.1801 (max= 1.5654), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:32,136 - root - INFO - Step 35400: lr=1.00E-05, loss= 1.1801 (max= 1.5654), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:32,136 - root - INFO - Step 35400: lr=1.00E-05, loss= 1.1801 (max= 1.5654), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:48,063 - root - INFO - Step 35410: lr=1.00E-05, loss= 1.1209 (max= 1.6290), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:48,063 - root - INFO - Step 35410: lr=1.00E-05, loss= 1.1209 (max= 1.6290), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:48,063 - root - INFO - Step 35410: lr=1.00E-05, loss= 1.1209 (max= 1.6290), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:48,063 - root - INFO - Step 35410: lr=1.00E-05, loss= 1.1209 (max= 1.6290), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:48,063 - root - INFO - Step 35410: lr=1.00E-05, loss= 1.1209 (max= 1.6290), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:48,063 - root - INFO - Step 35410: lr=1.00E-05, loss= 1.1209 (max= 1.6290), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:48,063 - root - INFO - Step 35410: lr=1.00E-05, loss= 1.1209 (max= 1.6290), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:17:48,063 - root - INFO - Step 35410: lr=1.00E-05, loss= 1.1209 (max= 1.6290), tps=20578, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:03,945 - root - INFO - Step 35420: lr=1.00E-05, loss= 1.1702 (max= 1.5864), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:03,945 - root - INFO - Step 35420: lr=1.00E-05, loss= 1.1702 (max= 1.5864), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:03,945 - root - INFO - Step 35420: lr=1.00E-05, loss= 1.1702 (max= 1.5864), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:03,945 - root - INFO - Step 35420: lr=1.00E-05, loss= 1.1702 (max= 1.5864), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:03,945 - root - INFO - Step 35420: lr=1.00E-05, loss= 1.1702 (max= 1.5864), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:03,945 - root - INFO - Step 35420: lr=1.00E-05, loss= 1.1702 (max= 1.5864), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:03,945 - root - INFO - Step 35420: lr=1.00E-05, loss= 1.1702 (max= 1.5864), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:03,945 - root - INFO - Step 35420: lr=1.00E-05, loss= 1.1702 (max= 1.5864), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:19,872 - root - INFO - Step 35430: lr=1.00E-05, loss= 1.1267 (max= 1.4758), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:19,872 - root - INFO - Step 35430: lr=1.00E-05, loss= 1.1267 (max= 1.4758), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:19,872 - root - INFO - Step 35430: lr=1.00E-05, loss= 1.1267 
(max= 1.4758), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:19,872 - root - INFO - Step 35430: lr=1.00E-05, loss= 1.1267 (max= 1.4758), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:19,872 - root - INFO - Step 35430: lr=1.00E-05, loss= 1.1267 (max= 1.4758), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:19,872 - root - INFO - Step 35430: lr=1.00E-05, loss= 1.1267 (max= 1.4758), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:19,872 - root - INFO - Step 35430: lr=1.00E-05, loss= 1.1267 (max= 1.4758), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:19,872 - root - INFO - Step 35430: lr=1.00E-05, loss= 1.1267 (max= 1.4758), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:35,838 - root - INFO - Step 35440: lr=1.00E-05, loss= 1.1794 (max= 1.7615), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:35,838 - root - INFO - Step 35440: lr=1.00E-05, loss= 1.1794 (max= 1.7615), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:35,838 - root - INFO - Step 35440: lr=1.00E-05, loss= 1.1794 (max= 1.7615), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:35,838 - root - INFO - Step 35440: lr=1.00E-05, loss= 1.1794 (max= 1.7615), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:35,838 - root - INFO - Step 35440: lr=1.00E-05, loss= 1.1794 (max= 1.7615), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:35,838 - root 
- INFO - Step 35440: lr=1.00E-05, loss= 1.1794 (max= 1.7615), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:35,838 - root - INFO - Step 35440: lr=1.00E-05, loss= 1.1794 (max= 1.7615), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:35,838 - root - INFO - Step 35440: lr=1.00E-05, loss= 1.1794 (max= 1.7615), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:18:51,762 - root - INFO - Step 35450: lr=1.00E-05, loss= 1.1973 (max= 1.8785), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:18:51,762 - root - INFO - Step 35450: lr=1.00E-05, loss= 1.1973 (max= 1.8785), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:18:51,762 - root - INFO - Step 35450: lr=1.00E-05, loss= 1.1973 (max= 1.8785), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:18:51,762 - root - INFO - Step 35450: lr=1.00E-05, loss= 1.1973 (max= 1.8785), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:18:51,762 - root - INFO - Step 35450: lr=1.00E-05, loss= 1.1973 (max= 1.8785), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:18:51,762 - root - INFO - Step 35450: lr=1.00E-05, loss= 1.1973 (max= 1.8785), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:18:51,762 - root - INFO - Step 35450: lr=1.00E-05, loss= 1.1973 (max= 1.8785), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:18:51,762 - root - INFO - Step 35450: lr=1.00E-05, loss= 1.1973 (max= 1.8785), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 02:19:07,734 - root - INFO - Step 35460: lr=1.00E-05, loss= 1.1858 (max= 1.5250), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:07,734 - root - INFO - Step 35460: lr=1.00E-05, loss= 1.1858 (max= 1.5250), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:07,734 - root - INFO - Step 35460: lr=1.00E-05, loss= 1.1858 (max= 1.5250), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:07,734 - root - INFO - Step 35460: lr=1.00E-05, loss= 1.1858 (max= 1.5250), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:07,734 - root - INFO - Step 35460: lr=1.00E-05, loss= 1.1858 (max= 1.5250), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:07,734 - root - INFO - Step 35460: lr=1.00E-05, loss= 1.1858 (max= 1.5250), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:07,734 - root - INFO - Step 35460: lr=1.00E-05, loss= 1.1858 (max= 1.5250), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:07,734 - root - INFO - Step 35460: lr=1.00E-05, loss= 1.1858 (max= 1.5250), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:23,671 - root - INFO - Step 35470: lr=1.00E-05, loss= 1.1347 (max= 1.5987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:23,671 - root - INFO - Step 35470: lr=1.00E-05, loss= 1.1347 (max= 1.5987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:23,671 - root - INFO - Step 35470: lr=1.00E-05, loss= 1.1347 (max= 1.5987), tps=20565, mfu=42.85%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:23,671 - root - INFO - Step 35470: lr=1.00E-05, loss= 1.1347 (max= 1.5987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:23,671 - root - INFO - Step 35470: lr=1.00E-05, loss= 1.1347 (max= 1.5987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:23,671 - root - INFO - Step 35470: lr=1.00E-05, loss= 1.1347 (max= 1.5987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:23,671 - root - INFO - Step 35470: lr=1.00E-05, loss= 1.1347 (max= 1.5987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:23,671 - root - INFO - Step 35470: lr=1.00E-05, loss= 1.1347 (max= 1.5987), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:39,648 - root - INFO - Step 35480: lr=1.00E-05, loss= 1.1160 (max= 1.4514), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:39,648 - root - INFO - Step 35480: lr=1.00E-05, loss= 1.1160 (max= 1.4514), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:39,648 - root - INFO - Step 35480: lr=1.00E-05, loss= 1.1160 (max= 1.4514), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:39,648 - root - INFO - Step 35480: lr=1.00E-05, loss= 1.1160 (max= 1.4514), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:39,648 - root - INFO - Step 35480: lr=1.00E-05, loss= 1.1160 (max= 1.4514), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:39,648 - root - INFO - Step 35480: lr=1.00E-05, 
loss= 1.1160 (max= 1.4514), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:39,648 - root - INFO - Step 35480: lr=1.00E-05, loss= 1.1160 (max= 1.4514), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:39,648 - root - INFO - Step 35480: lr=1.00E-05, loss= 1.1160 (max= 1.4514), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:19:55,616 - root - INFO - Step 35490: lr=1.00E-05, loss= 1.1468 (max= 1.6668), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:55,616 - root - INFO - Step 35490: lr=1.00E-05, loss= 1.1468 (max= 1.6668), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:55,616 - root - INFO - Step 35490: lr=1.00E-05, loss= 1.1468 (max= 1.6668), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:55,616 - root - INFO - Step 35490: lr=1.00E-05, loss= 1.1468 (max= 1.6668), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:55,616 - root - INFO - Step 35490: lr=1.00E-05, loss= 1.1468 (max= 1.6668), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:55,616 - root - INFO - Step 35490: lr=1.00E-05, loss= 1.1468 (max= 1.6668), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:55,616 - root - INFO - Step 35490: lr=1.00E-05, loss= 1.1468 (max= 1.6668), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:19:55,616 - root - INFO - Step 35490: lr=1.00E-05, loss= 1.1468 (max= 1.6668), tps=20525, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
02:20:11,579 - root - INFO - Step 35500: lr=1.00E-05, loss= 1.1696 (max= 1.6107), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:11,579 - root - INFO - Step 35500: lr=1.00E-05, loss= 1.1696 (max= 1.6107), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:11,579 - root - INFO - Step 35500: lr=1.00E-05, loss= 1.1696 (max= 1.6107), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:11,579 - root - INFO - Step 35500: lr=1.00E-05, loss= 1.1696 (max= 1.6107), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:11,579 - root - INFO - Step 35500: lr=1.00E-05, loss= 1.1696 (max= 1.6107), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:11,579 - root - INFO - Step 35500: lr=1.00E-05, loss= 1.1696 (max= 1.6107), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:11,579 - root - INFO - Step 35500: lr=1.00E-05, loss= 1.1696 (max= 1.6107), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:11,579 - root - INFO - Step 35500: lr=1.00E-05, loss= 1.1696 (max= 1.6107), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:27,557 - root - INFO - Step 35510: lr=1.00E-05, loss= 1.1767 (max= 1.9178), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:27,557 - root - INFO - Step 35510: lr=1.00E-05, loss= 1.1767 (max= 1.9178), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:27,557 - root - INFO - Step 35510: lr=1.00E-05, loss= 1.1767 (max= 1.9178), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:27,557 - root - INFO - Step 35510: lr=1.00E-05, loss= 1.1767 (max= 1.9178), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:27,557 - root - INFO - Step 35510: lr=1.00E-05, loss= 1.1767 (max= 1.9178), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:27,557 - root - INFO - Step 35510: lr=1.00E-05, loss= 1.1767 (max= 1.9178), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:27,557 - root - INFO - Step 35510: lr=1.00E-05, loss= 1.1767 (max= 1.9178), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:27,557 - root - INFO - Step 35510: lr=1.00E-05, loss= 1.1767 (max= 1.9178), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:20:43,515 - root - INFO - Step 35520: lr=1.00E-05, loss= 1.1378 (max= 1.4645), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:43,515 - root - INFO - Step 35520: lr=1.00E-05, loss= 1.1378 (max= 1.4645), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:43,515 - root - INFO - Step 35520: lr=1.00E-05, loss= 1.1378 (max= 1.4645), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:43,515 - root - INFO - Step 35520: lr=1.00E-05, loss= 1.1378 (max= 1.4645), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:43,515 - root - INFO - Step 35520: lr=1.00E-05, loss= 1.1378 (max= 1.4645), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:43,515 - root - INFO - Step 35520: lr=1.00E-05, loss= 1.1378 (max= 1.4645), 
tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:43,515 - root - INFO - Step 35520: lr=1.00E-05, loss= 1.1378 (max= 1.4645), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:43,515 - root - INFO - Step 35520: lr=1.00E-05, loss= 1.1378 (max= 1.4645), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:59,466 - root - INFO - Step 35530: lr=1.00E-05, loss= 1.1159 (max= 1.6069), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:59,466 - root - INFO - Step 35530: lr=1.00E-05, loss= 1.1159 (max= 1.6069), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:59,466 - root - INFO - Step 35530: lr=1.00E-05, loss= 1.1159 (max= 1.6069), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:59,466 - root - INFO - Step 35530: lr=1.00E-05, loss= 1.1159 (max= 1.6069), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:59,466 - root - INFO - Step 35530: lr=1.00E-05, loss= 1.1159 (max= 1.6069), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:59,466 - root - INFO - Step 35530: lr=1.00E-05, loss= 1.1159 (max= 1.6069), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:59,466 - root - INFO - Step 35530: lr=1.00E-05, loss= 1.1159 (max= 1.6069), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:20:59,466 - root - INFO - Step 35530: lr=1.00E-05, loss= 1.1159 (max= 1.6069), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:15,426 - root - INFO - Step 
35540: lr=1.00E-05, loss= 1.1586 (max= 1.6378), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:21:15,426 - root - INFO - Step 35540: lr=1.00E-05, loss= 1.1586 (max= 1.6378), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:21:15,426 - root - INFO - Step 35540: lr=1.00E-05, loss= 1.1586 (max= 1.6378), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:21:15,426 - root - INFO - Step 35540: lr=1.00E-05, loss= 1.1586 (max= 1.6378), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:21:15,426 - root - INFO - Step 35540: lr=1.00E-05, loss= 1.1586 (max= 1.6378), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:21:15,426 - root - INFO - Step 35540: lr=1.00E-05, loss= 1.1586 (max= 1.6378), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:21:15,426 - root - INFO - Step 35540: lr=1.00E-05, loss= 1.1586 (max= 1.6378), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:21:15,426 - root - INFO - Step 35540: lr=1.00E-05, loss= 1.1586 (max= 1.6378), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:21:31,401 - root - INFO - Step 35550: lr=1.00E-05, loss= 1.1516 (max= 1.4922), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:31,401 - root - INFO - Step 35550: lr=1.00E-05, loss= 1.1516 (max= 1.4922), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:31,401 - root - INFO - Step 35550: lr=1.00E-05, loss= 1.1516 (max= 1.4922), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 02:21:31,401 - root - INFO - Step 35550: lr=1.00E-05, loss= 1.1516 (max= 1.4922), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:31,401 - root - INFO - Step 35550: lr=1.00E-05, loss= 1.1516 (max= 1.4922), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:31,401 - root - INFO - Step 35550: lr=1.00E-05, loss= 1.1516 (max= 1.4922), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:31,401 - root - INFO - Step 35550: lr=1.00E-05, loss= 1.1516 (max= 1.4922), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:31,401 - root - INFO - Step 35550: lr=1.00E-05, loss= 1.1516 (max= 1.4922), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:47,366 - root - INFO - Step 35560: lr=1.00E-05, loss= 1.1631 (max= 1.7285), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:47,366 - root - INFO - Step 35560: lr=1.00E-05, loss= 1.1631 (max= 1.7285), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:47,366 - root - INFO - Step 35560: lr=1.00E-05, loss= 1.1631 (max= 1.7285), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:47,366 - root - INFO - Step 35560: lr=1.00E-05, loss= 1.1631 (max= 1.7285), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:47,366 - root - INFO - Step 35560: lr=1.00E-05, loss= 1.1631 (max= 1.7285), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:47,366 - root - INFO - Step 35560: lr=1.00E-05, loss= 1.1631 (max= 1.7285), tps=20529, mfu=42.77%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:47,366 - root - INFO - Step 35560: lr=1.00E-05, loss= 1.1631 (max= 1.7285), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:21:47,366 - root - INFO - Step 35560: lr=1.00E-05, loss= 1.1631 (max= 1.7285), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:22:03,302 - root - INFO - Step 35570: lr=1.00E-05, loss= 1.1339 (max= 1.8631), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:03,302 - root - INFO - Step 35570: lr=1.00E-05, loss= 1.1339 (max= 1.8631), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:03,302 - root - INFO - Step 35570: lr=1.00E-05, loss= 1.1339 (max= 1.8631), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:03,302 - root - INFO - Step 35570: lr=1.00E-05, loss= 1.1339 (max= 1.8631), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:03,302 - root - INFO - Step 35570: lr=1.00E-05, loss= 1.1339 (max= 1.8631), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:03,302 - root - INFO - Step 35570: lr=1.00E-05, loss= 1.1339 (max= 1.8631), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:03,302 - root - INFO - Step 35570: lr=1.00E-05, loss= 1.1339 (max= 1.8631), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:03,302 - root - INFO - Step 35570: lr=1.00E-05, loss= 1.1339 (max= 1.8631), tps=20566, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:19,248 - root - INFO - Step 35580: lr=1.00E-05, loss= 1.0993 
(max= 1.4785), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:22:19,248 - root - INFO - Step 35580: lr=1.00E-05, loss= 1.0993 (max= 1.4785), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:22:19,248 - root - INFO - Step 35580: lr=1.00E-05, loss= 1.0993 (max= 1.4785), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:22:19,248 - root - INFO - Step 35580: lr=1.00E-05, loss= 1.0993 (max= 1.4785), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:22:19,248 - root - INFO - Step 35580: lr=1.00E-05, loss= 1.0993 (max= 1.4785), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:22:19,248 - root - INFO - Step 35580: lr=1.00E-05, loss= 1.0993 (max= 1.4785), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:22:19,248 - root - INFO - Step 35580: lr=1.00E-05, loss= 1.0993 (max= 1.4785), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:22:19,248 - root - INFO - Step 35580: lr=1.00E-05, loss= 1.0993 (max= 1.4785), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:22:35,207 - root - INFO - Step 35590: lr=1.00E-05, loss= 1.1837 (max= 1.5915), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:35,207 - root - INFO - Step 35590: lr=1.00E-05, loss= 1.1837 (max= 1.5915), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:35,207 - root - INFO - Step 35590: lr=1.00E-05, loss= 1.1837 (max= 1.5915), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:35,207 - root 
- INFO - Step 35590: lr=1.00E-05, loss= 1.1837 (max= 1.5915), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:35,207 - root - INFO - Step 35590: lr=1.00E-05, loss= 1.1837 (max= 1.5915), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:35,207 - root - INFO - Step 35590: lr=1.00E-05, loss= 1.1837 (max= 1.5915), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:35,207 - root - INFO - Step 35590: lr=1.00E-05, loss= 1.1837 (max= 1.5915), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:35,207 - root - INFO - Step 35590: lr=1.00E-05, loss= 1.1837 (max= 1.5915), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:51,176 - root - INFO - Step 35600: lr=1.00E-05, loss= 1.1667 (max= 1.7312), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:51,176 - root - INFO - Step 35600: lr=1.00E-05, loss= 1.1667 (max= 1.7312), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:51,176 - root - INFO - Step 35600: lr=1.00E-05, loss= 1.1667 (max= 1.7312), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:51,176 - root - INFO - Step 35600: lr=1.00E-05, loss= 1.1667 (max= 1.7312), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:51,176 - root - INFO - Step 35600: lr=1.00E-05, loss= 1.1667 (max= 1.7312), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:51,176 - root - INFO - Step 35600: lr=1.00E-05, loss= 1.1667 (max= 1.7312), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 02:22:51,176 - root - INFO - Step 35600: lr=1.00E-05, loss= 1.1667 (max= 1.7312), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:22:51,176 - root - INFO - Step 35600: lr=1.00E-05, loss= 1.1667 (max= 1.7312), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:07,074 - root - INFO - Step 35610: lr=1.00E-05, loss= 1.1776 (max= 1.6744), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:07,074 - root - INFO - Step 35610: lr=1.00E-05, loss= 1.1776 (max= 1.6744), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:07,074 - root - INFO - Step 35610: lr=1.00E-05, loss= 1.1776 (max= 1.6744), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:07,074 - root - INFO - Step 35610: lr=1.00E-05, loss= 1.1776 (max= 1.6744), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:07,074 - root - INFO - Step 35610: lr=1.00E-05, loss= 1.1776 (max= 1.6744), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:07,074 - root - INFO - Step 35610: lr=1.00E-05, loss= 1.1776 (max= 1.6744), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:07,074 - root - INFO - Step 35610: lr=1.00E-05, loss= 1.1776 (max= 1.6744), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:07,074 - root - INFO - Step 35610: lr=1.00E-05, loss= 1.1776 (max= 1.6744), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:22,992 - root - INFO - Step 35620: lr=1.00E-05, loss= 1.1463 (max= 1.6427), tps=20591, mfu=42.90%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:22,992 - root - INFO - Step 35620: lr=1.00E-05, loss= 1.1463 (max= 1.6427), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:22,992 - root - INFO - Step 35620: lr=1.00E-05, loss= 1.1463 (max= 1.6427), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:22,992 - root - INFO - Step 35620: lr=1.00E-05, loss= 1.1463 (max= 1.6427), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:22,992 - root - INFO - Step 35620: lr=1.00E-05, loss= 1.1463 (max= 1.6427), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:22,992 - root - INFO - Step 35620: lr=1.00E-05, loss= 1.1463 (max= 1.6427), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:22,992 - root - INFO - Step 35620: lr=1.00E-05, loss= 1.1463 (max= 1.6427), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:22,992 - root - INFO - Step 35620: lr=1.00E-05, loss= 1.1463 (max= 1.6427), tps=20590, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:38,885 - root - INFO - Step 35630: lr=1.00E-05, loss= 1.1362 (max= 1.5509), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:38,885 - root - INFO - Step 35630: lr=1.00E-05, loss= 1.1362 (max= 1.5509), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:38,885 - root - INFO - Step 35630: lr=1.00E-05, loss= 1.1362 (max= 1.5509), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:38,885 - root - INFO - Step 35630: lr=1.00E-05, 
loss= 1.1362 (max= 1.5509), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:38,885 - root - INFO - Step 35630: lr=1.00E-05, loss= 1.1362 (max= 1.5509), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:38,885 - root - INFO - Step 35630: lr=1.00E-05, loss= 1.1362 (max= 1.5509), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:38,885 - root - INFO - Step 35630: lr=1.00E-05, loss= 1.1362 (max= 1.5509), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:38,885 - root - INFO - Step 35630: lr=1.00E-05, loss= 1.1362 (max= 1.5509), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:23:54,838 - root - INFO - Step 35640: lr=1.00E-05, loss= 1.1309 (max= 1.6695), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:54,838 - root - INFO - Step 35640: lr=1.00E-05, loss= 1.1309 (max= 1.6695), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:54,838 - root - INFO - Step 35640: lr=1.00E-05, loss= 1.1309 (max= 1.6695), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:54,838 - root - INFO - Step 35640: lr=1.00E-05, loss= 1.1309 (max= 1.6695), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:54,838 - root - INFO - Step 35640: lr=1.00E-05, loss= 1.1309 (max= 1.6695), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:54,838 - root - INFO - Step 35640: lr=1.00E-05, loss= 1.1309 (max= 1.6695), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
02:23:54,838 - root - INFO - Step 35640: lr=1.00E-05, loss= 1.1309 (max= 1.6695), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:23:54,839 - root - INFO - Step 35640: lr=1.00E-05, loss= 1.1309 (max= 1.6695), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:10,769 - root - INFO - Step 35650: lr=1.00E-05, loss= 1.1983 (max= 1.9586), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:10,769 - root - INFO - Step 35650: lr=1.00E-05, loss= 1.1983 (max= 1.9586), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:10,769 - root - INFO - Step 35650: lr=1.00E-05, loss= 1.1983 (max= 1.9586), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:10,769 - root - INFO - Step 35650: lr=1.00E-05, loss= 1.1983 (max= 1.9586), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:10,769 - root - INFO - Step 35650: lr=1.00E-05, loss= 1.1983 (max= 1.9586), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:10,769 - root - INFO - Step 35650: lr=1.00E-05, loss= 1.1983 (max= 1.9586), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:10,769 - root - INFO - Step 35650: lr=1.00E-05, loss= 1.1983 (max= 1.9586), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:10,769 - root - INFO - Step 35650: lr=1.00E-05, loss= 1.1983 (max= 1.9586), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:26,716 - root - INFO - Step 35660: lr=1.00E-05, loss= 1.1522 (max= 1.6133), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:26,716 - root - INFO - Step 35660: lr=1.00E-05, loss= 1.1522 (max= 1.6133), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:26,717 - root - INFO - Step 35660: lr=1.00E-05, loss= 1.1522 (max= 1.6133), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:26,717 - root - INFO - Step 35660: lr=1.00E-05, loss= 1.1522 (max= 1.6133), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:26,717 - root - INFO - Step 35660: lr=1.00E-05, loss= 1.1522 (max= 1.6133), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:26,717 - root - INFO - Step 35660: lr=1.00E-05, loss= 1.1522 (max= 1.6133), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:26,717 - root - INFO - Step 35660: lr=1.00E-05, loss= 1.1522 (max= 1.6133), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:26,717 - root - INFO - Step 35660: lr=1.00E-05, loss= 1.1522 (max= 1.6133), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:42,619 - root - INFO - Step 35670: lr=1.00E-05, loss= 1.1502 (max= 1.6328), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:42,619 - root - INFO - Step 35670: lr=1.00E-05, loss= 1.1502 (max= 1.6328), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:42,619 - root - INFO - Step 35670: lr=1.00E-05, loss= 1.1502 (max= 1.6328), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:42,619 - root - INFO - Step 35670: lr=1.00E-05, loss= 1.1502 (max= 1.6328), 
tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:42,619 - root - INFO - Step 35670: lr=1.00E-05, loss= 1.1502 (max= 1.6328), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:42,619 - root - INFO - Step 35670: lr=1.00E-05, loss= 1.1502 (max= 1.6328), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:42,619 - root - INFO - Step 35670: lr=1.00E-05, loss= 1.1502 (max= 1.6328), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:42,619 - root - INFO - Step 35670: lr=1.00E-05, loss= 1.1502 (max= 1.6328), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:58,567 - root - INFO - Step 35680: lr=1.00E-05, loss= 1.1484 (max= 1.7057), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:58,567 - root - INFO - Step 35680: lr=1.00E-05, loss= 1.1484 (max= 1.7057), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:58,567 - root - INFO - Step 35680: lr=1.00E-05, loss= 1.1484 (max= 1.7057), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:58,567 - root - INFO - Step 35680: lr=1.00E-05, loss= 1.1484 (max= 1.7057), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:58,567 - root - INFO - Step 35680: lr=1.00E-05, loss= 1.1484 (max= 1.7057), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:58,567 - root - INFO - Step 35680: lr=1.00E-05, loss= 1.1484 (max= 1.7057), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:58,567 - root - INFO - Step 
35680: lr=1.00E-05, loss= 1.1484 (max= 1.7057), tps=20552, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:24:58,567 - root - INFO - Step 35680: lr=1.00E-05, loss= 1.1484 (max= 1.7057), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:14,479 - root - INFO - Step 35690: lr=1.00E-05, loss= 1.1686 (max= 1.6720), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:14,479 - root - INFO - Step 35690: lr=1.00E-05, loss= 1.1686 (max= 1.6720), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:14,479 - root - INFO - Step 35690: lr=1.00E-05, loss= 1.1686 (max= 1.6720), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:14,479 - root - INFO - Step 35690: lr=1.00E-05, loss= 1.1686 (max= 1.6720), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:14,479 - root - INFO - Step 35690: lr=1.00E-05, loss= 1.1686 (max= 1.6720), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:14,479 - root - INFO - Step 35690: lr=1.00E-05, loss= 1.1686 (max= 1.6720), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:14,479 - root - INFO - Step 35690: lr=1.00E-05, loss= 1.1686 (max= 1.6720), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:14,479 - root - INFO - Step 35690: lr=1.00E-05, loss= 1.1686 (max= 1.6720), tps=20597, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:30,444 - root - INFO - Step 35700: lr=1.00E-05, loss= 1.1103 (max= 1.4224), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 02:25:30,444 - root - INFO - Step 35700: lr=1.00E-05, loss= 1.1103 (max= 1.4224), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:30,444 - root - INFO - Step 35700: lr=1.00E-05, loss= 1.1103 (max= 1.4224), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:30,444 - root - INFO - Step 35700: lr=1.00E-05, loss= 1.1103 (max= 1.4224), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:30,444 - root - INFO - Step 35700: lr=1.00E-05, loss= 1.1103 (max= 1.4224), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:30,444 - root - INFO - Step 35700: lr=1.00E-05, loss= 1.1103 (max= 1.4224), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:30,444 - root - INFO - Step 35700: lr=1.00E-05, loss= 1.1103 (max= 1.4224), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:30,444 - root - INFO - Step 35700: lr=1.00E-05, loss= 1.1103 (max= 1.4224), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:46,393 - root - INFO - Step 35710: lr=1.00E-05, loss= 1.1445 (max= 1.6539), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:46,393 - root - INFO - Step 35710: lr=1.00E-05, loss= 1.1445 (max= 1.6539), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:46,393 - root - INFO - Step 35710: lr=1.00E-05, loss= 1.1445 (max= 1.6539), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:46,393 - root - INFO - Step 35710: lr=1.00E-05, loss= 1.1445 (max= 1.6539), tps=20550, mfu=42.82%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:46,393 - root - INFO - Step 35710: lr=1.00E-05, loss= 1.1445 (max= 1.6539), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:46,393 - root - INFO - Step 35710: lr=1.00E-05, loss= 1.1445 (max= 1.6539), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:46,393 - root - INFO - Step 35710: lr=1.00E-05, loss= 1.1445 (max= 1.6539), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:25:46,393 - root - INFO - Step 35710: lr=1.00E-05, loss= 1.1445 (max= 1.6539), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:02,289 - root - INFO - Step 35720: lr=1.00E-05, loss= 1.1679 (max= 1.5198), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:02,289 - root - INFO - Step 35720: lr=1.00E-05, loss= 1.1679 (max= 1.5198), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:02,289 - root - INFO - Step 35720: lr=1.00E-05, loss= 1.1679 (max= 1.5198), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:02,289 - root - INFO - Step 35720: lr=1.00E-05, loss= 1.1679 (max= 1.5198), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:02,289 - root - INFO - Step 35720: lr=1.00E-05, loss= 1.1679 (max= 1.5198), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:02,289 - root - INFO - Step 35720: lr=1.00E-05, loss= 1.1679 (max= 1.5198), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:02,289 - root - INFO - Step 35720: lr=1.00E-05, loss= 1.1679 
(max= 1.5198), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:02,289 - root - INFO - Step 35720: lr=1.00E-05, loss= 1.1679 (max= 1.5198), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:18,212 - root - INFO - Step 35730: lr=1.00E-05, loss= 1.1491 (max= 1.5480), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:18,212 - root - INFO - Step 35730: lr=1.00E-05, loss= 1.1491 (max= 1.5480), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:18,212 - root - INFO - Step 35730: lr=1.00E-05, loss= 1.1491 (max= 1.5480), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:18,212 - root - INFO - Step 35730: lr=1.00E-05, loss= 1.1491 (max= 1.5480), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:18,212 - root - INFO - Step 35730: lr=1.00E-05, loss= 1.1491 (max= 1.5480), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:18,212 - root - INFO - Step 35730: lr=1.00E-05, loss= 1.1491 (max= 1.5480), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:18,212 - root - INFO - Step 35730: lr=1.00E-05, loss= 1.1491 (max= 1.5480), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:18,212 - root - INFO - Step 35730: lr=1.00E-05, loss= 1.1491 (max= 1.5480), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:34,121 - root - INFO - Step 35740: lr=1.00E-05, loss= 1.1520 (max= 1.5332), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:34,121 - root 
- INFO - Step 35740: lr=1.00E-05, loss= 1.1520 (max= 1.5332), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:34,121 - root - INFO - Step 35740: lr=1.00E-05, loss= 1.1520 (max= 1.5332), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:34,121 - root - INFO - Step 35740: lr=1.00E-05, loss= 1.1520 (max= 1.5332), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:34,121 - root - INFO - Step 35740: lr=1.00E-05, loss= 1.1520 (max= 1.5332), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:34,121 - root - INFO - Step 35740: lr=1.00E-05, loss= 1.1520 (max= 1.5332), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:34,121 - root - INFO - Step 35740: lr=1.00E-05, loss= 1.1520 (max= 1.5332), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:34,121 - root - INFO - Step 35740: lr=1.00E-05, loss= 1.1520 (max= 1.5332), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:50,006 - root - INFO - Step 35750: lr=1.00E-05, loss= 1.1494 (max= 1.8219), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:50,006 - root - INFO - Step 35750: lr=1.00E-05, loss= 1.1494 (max= 1.8219), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:50,006 - root - INFO - Step 35750: lr=1.00E-05, loss= 1.1494 (max= 1.8219), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:50,006 - root - INFO - Step 35750: lr=1.00E-05, loss= 1.1494 (max= 1.8219), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 02:26:50,006 - root - INFO - Step 35750: lr=1.00E-05, loss= 1.1494 (max= 1.8219), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:50,006 - root - INFO - Step 35750: lr=1.00E-05, loss= 1.1494 (max= 1.8219), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:50,006 - root - INFO - Step 35750: lr=1.00E-05, loss= 1.1494 (max= 1.8219), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:26:50,006 - root - INFO - Step 35750: lr=1.00E-05, loss= 1.1494 (max= 1.8219), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:27:05,964 - root - INFO - Step 35760: lr=1.00E-05, loss= 1.1383 (max= 1.5530), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:05,965 - root - INFO - Step 35760: lr=1.00E-05, loss= 1.1383 (max= 1.5530), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:05,965 - root - INFO - Step 35760: lr=1.00E-05, loss= 1.1383 (max= 1.5530), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:05,965 - root - INFO - Step 35760: lr=1.00E-05, loss= 1.1383 (max= 1.5530), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:05,965 - root - INFO - Step 35760: lr=1.00E-05, loss= 1.1383 (max= 1.5530), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:05,965 - root - INFO - Step 35760: lr=1.00E-05, loss= 1.1383 (max= 1.5530), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:05,965 - root - INFO - Step 35760: lr=1.00E-05, loss= 1.1383 (max= 1.5530), tps=20537, mfu=42.79%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:05,965 - root - INFO - Step 35760: lr=1.00E-05, loss= 1.1383 (max= 1.5530), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:21,935 - root - INFO - Step 35770: lr=1.00E-05, loss= 1.1592 (max= 1.5122), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:21,935 - root - INFO - Step 35770: lr=1.00E-05, loss= 1.1592 (max= 1.5122), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:21,936 - root - INFO - Step 35770: lr=1.00E-05, loss= 1.1592 (max= 1.5122), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:21,936 - root - INFO - Step 35770: lr=1.00E-05, loss= 1.1592 (max= 1.5122), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:21,936 - root - INFO - Step 35770: lr=1.00E-05, loss= 1.1592 (max= 1.5122), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:21,936 - root - INFO - Step 35770: lr=1.00E-05, loss= 1.1592 (max= 1.5122), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:21,936 - root - INFO - Step 35770: lr=1.00E-05, loss= 1.1592 (max= 1.5122), tps=20522, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:21,936 - root - INFO - Step 35770: lr=1.00E-05, loss= 1.1592 (max= 1.5122), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:37,897 - root - INFO - Step 35780: lr=1.00E-05, loss= 1.1617 (max= 1.5329), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:37,897 - root - INFO - Step 35780: lr=1.00E-05, 
loss= 1.1617 (max= 1.5329), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:37,897 - root - INFO - Step 35780: lr=1.00E-05, loss= 1.1617 (max= 1.5329), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:37,897 - root - INFO - Step 35780: lr=1.00E-05, loss= 1.1617 (max= 1.5329), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:37,897 - root - INFO - Step 35780: lr=1.00E-05, loss= 1.1617 (max= 1.5329), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:37,897 - root - INFO - Step 35780: lr=1.00E-05, loss= 1.1617 (max= 1.5329), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:37,897 - root - INFO - Step 35780: lr=1.00E-05, loss= 1.1617 (max= 1.5329), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:37,897 - root - INFO - Step 35780: lr=1.00E-05, loss= 1.1617 (max= 1.5329), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:53,805 - root - INFO - Step 35790: lr=1.00E-05, loss= 1.1940 (max= 1.6389), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:53,805 - root - INFO - Step 35790: lr=1.00E-05, loss= 1.1940 (max= 1.6389), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:53,805 - root - INFO - Step 35790: lr=1.00E-05, loss= 1.1940 (max= 1.6389), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:53,805 - root - INFO - Step 35790: lr=1.00E-05, loss= 1.1940 (max= 1.6389), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
02:27:53,805 - root - INFO - Step 35790: lr=1.00E-05, loss= 1.1940 (max= 1.6389), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:53,805 - root - INFO - Step 35790: lr=1.00E-05, loss= 1.1940 (max= 1.6389), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:53,805 - root - INFO - Step 35790: lr=1.00E-05, loss= 1.1940 (max= 1.6389), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:27:53,805 - root - INFO - Step 35790: lr=1.00E-05, loss= 1.1940 (max= 1.6389), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:09,761 - root - INFO - Step 35800: lr=1.00E-05, loss= 1.1393 (max= 1.6593), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:09,761 - root - INFO - Step 35800: lr=1.00E-05, loss= 1.1393 (max= 1.6593), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:09,761 - root - INFO - Step 35800: lr=1.00E-05, loss= 1.1393 (max= 1.6593), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:09,761 - root - INFO - Step 35800: lr=1.00E-05, loss= 1.1393 (max= 1.6593), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:09,761 - root - INFO - Step 35800: lr=1.00E-05, loss= 1.1393 (max= 1.6593), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:09,761 - root - INFO - Step 35800: lr=1.00E-05, loss= 1.1393 (max= 1.6593), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:09,761 - root - INFO - Step 35800: lr=1.00E-05, loss= 1.1393 (max= 1.6593), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:09,761 - root - INFO - Step 35800: lr=1.00E-05, loss= 1.1393 (max= 1.6593), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:25,719 - root - INFO - Step 35810: lr=1.00E-05, loss= 1.1648 (max= 1.6370), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:25,719 - root - INFO - Step 35810: lr=1.00E-05, loss= 1.1648 (max= 1.6370), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:25,720 - root - INFO - Step 35810: lr=1.00E-05, loss= 1.1648 (max= 1.6370), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:25,720 - root - INFO - Step 35810: lr=1.00E-05, loss= 1.1648 (max= 1.6370), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:25,720 - root - INFO - Step 35810: lr=1.00E-05, loss= 1.1648 (max= 1.6370), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:25,720 - root - INFO - Step 35810: lr=1.00E-05, loss= 1.1648 (max= 1.6370), tps=20537, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:25,720 - root - INFO - Step 35810: lr=1.00E-05, loss= 1.1648 (max= 1.6370), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:25,720 - root - INFO - Step 35810: lr=1.00E-05, loss= 1.1648 (max= 1.6370), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:41,635 - root - INFO - Step 35820: lr=1.00E-05, loss= 1.1425 (max= 1.5515), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:41,635 - root - INFO - Step 35820: lr=1.00E-05, loss= 1.1425 (max= 1.5515), 
tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:41,635 - root - INFO - Step 35820: lr=1.00E-05, loss= 1.1425 (max= 1.5515), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:41,635 - root - INFO - Step 35820: lr=1.00E-05, loss= 1.1425 (max= 1.5515), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:41,635 - root - INFO - Step 35820: lr=1.00E-05, loss= 1.1425 (max= 1.5515), tps=20593, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:41,635 - root - INFO - Step 35820: lr=1.00E-05, loss= 1.1425 (max= 1.5515), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:41,635 - root - INFO - Step 35820: lr=1.00E-05, loss= 1.1425 (max= 1.5515), tps=20593, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:41,635 - root - INFO - Step 35820: lr=1.00E-05, loss= 1.1425 (max= 1.5515), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:28:57,563 - root - INFO - Step 35830: lr=1.00E-05, loss= 1.1427 (max= 1.5943), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:28:57,563 - root - INFO - Step 35830: lr=1.00E-05, loss= 1.1427 (max= 1.5943), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:28:57,564 - root - INFO - Step 35830: lr=1.00E-05, loss= 1.1427 (max= 1.5943), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:28:57,564 - root - INFO - Step 35830: lr=1.00E-05, loss= 1.1427 (max= 1.5943), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:28:57,564 - root - INFO - Step 
35830: lr=1.00E-05, loss= 1.1427 (max= 1.5943), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:28:57,564 - root - INFO - Step 35830: lr=1.00E-05, loss= 1.1427 (max= 1.5943), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:28:57,564 - root - INFO - Step 35830: lr=1.00E-05, loss= 1.1427 (max= 1.5943), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:28:57,564 - root - INFO - Step 35830: lr=1.00E-05, loss= 1.1427 (max= 1.5943), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:13,507 - root - INFO - Step 35840: lr=1.00E-05, loss= 1.1639 (max= 1.7113), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:13,507 - root - INFO - Step 35840: lr=1.00E-05, loss= 1.1639 (max= 1.7113), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:13,507 - root - INFO - Step 35840: lr=1.00E-05, loss= 1.1639 (max= 1.7113), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:13,507 - root - INFO - Step 35840: lr=1.00E-05, loss= 1.1639 (max= 1.7113), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:13,507 - root - INFO - Step 35840: lr=1.00E-05, loss= 1.1639 (max= 1.7113), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:13,507 - root - INFO - Step 35840: lr=1.00E-05, loss= 1.1639 (max= 1.7113), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:13,507 - root - INFO - Step 35840: lr=1.00E-05, loss= 1.1639 (max= 1.7113), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 02:29:13,507 - root - INFO - Step 35840: lr=1.00E-05, loss= 1.1639 (max= 1.7113), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:29,469 - root - INFO - Step 35850: lr=1.00E-05, loss= 1.1661 (max= 1.4513), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:29,469 - root - INFO - Step 35850: lr=1.00E-05, loss= 1.1661 (max= 1.4513), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:29,469 - root - INFO - Step 35850: lr=1.00E-05, loss= 1.1661 (max= 1.4513), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:29,470 - root - INFO - Step 35850: lr=1.00E-05, loss= 1.1661 (max= 1.4513), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:29,470 - root - INFO - Step 35850: lr=1.00E-05, loss= 1.1661 (max= 1.4513), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:29,470 - root - INFO - Step 35850: lr=1.00E-05, loss= 1.1661 (max= 1.4513), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:29,470 - root - INFO - Step 35850: lr=1.00E-05, loss= 1.1661 (max= 1.4513), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:29,470 - root - INFO - Step 35850: lr=1.00E-05, loss= 1.1661 (max= 1.4513), tps=20533, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:45,410 - root - INFO - Step 35860: lr=1.00E-05, loss= 1.1312 (max= 1.4561), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:45,410 - root - INFO - Step 35860: lr=1.00E-05, loss= 1.1312 (max= 1.4561), tps=20560, mfu=42.84%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:45,410 - root - INFO - Step 35860: lr=1.00E-05, loss= 1.1312 (max= 1.4561), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:45,410 - root - INFO - Step 35860: lr=1.00E-05, loss= 1.1312 (max= 1.4561), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:45,410 - root - INFO - Step 35860: lr=1.00E-05, loss= 1.1312 (max= 1.4561), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:45,410 - root - INFO - Step 35860: lr=1.00E-05, loss= 1.1312 (max= 1.4561), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:45,410 - root - INFO - Step 35860: lr=1.00E-05, loss= 1.1312 (max= 1.4561), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:29:45,410 - root - INFO - Step 35860: lr=1.00E-05, loss= 1.1312 (max= 1.4561), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:30:01,295 - root - INFO - Step 35870: lr=1.00E-05, loss= 1.1191 (max= 1.4659), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:01,295 - root - INFO - Step 35870: lr=1.00E-05, loss= 1.1191 (max= 1.4659), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:01,295 - root - INFO - Step 35870: lr=1.00E-05, loss= 1.1191 (max= 1.4659), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:01,295 - root - INFO - Step 35870: lr=1.00E-05, loss= 1.1191 (max= 1.4659), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:01,295 - root - INFO - Step 35870: lr=1.00E-05, loss= 1.1191 
(max= 1.4659), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:01,295 - root - INFO - Step 35870: lr=1.00E-05, loss= 1.1191 (max= 1.4659), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:01,295 - root - INFO - Step 35870: lr=1.00E-05, loss= 1.1191 (max= 1.4659), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:01,295 - root - INFO - Step 35870: lr=1.00E-05, loss= 1.1191 (max= 1.4659), tps=20633, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:17,195 - root - INFO - Step 35880: lr=1.00E-05, loss= 1.1829 (max= 1.4793), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:17,195 - root - INFO - Step 35880: lr=1.00E-05, loss= 1.1829 (max= 1.4793), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:17,195 - root - INFO - Step 35880: lr=1.00E-05, loss= 1.1829 (max= 1.4793), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:17,195 - root - INFO - Step 35880: lr=1.00E-05, loss= 1.1829 (max= 1.4793), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:17,195 - root - INFO - Step 35880: lr=1.00E-05, loss= 1.1829 (max= 1.4793), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:17,195 - root - INFO - Step 35880: lr=1.00E-05, loss= 1.1829 (max= 1.4793), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:17,195 - root - INFO - Step 35880: lr=1.00E-05, loss= 1.1829 (max= 1.4793), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:17,195 - root 
- INFO - Step 35880: lr=1.00E-05, loss= 1.1829 (max= 1.4793), tps=20612, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:33,149 - root - INFO - Step 35890: lr=1.00E-05, loss= 1.1618 (max= 1.6215), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:30:33,149 - root - INFO - Step 35890: lr=1.00E-05, loss= 1.1618 (max= 1.6215), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:30:33,149 - root - INFO - Step 35890: lr=1.00E-05, loss= 1.1618 (max= 1.6215), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:30:33,149 - root - INFO - Step 35890: lr=1.00E-05, loss= 1.1618 (max= 1.6215), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:30:33,150 - root - INFO - Step 35890: lr=1.00E-05, loss= 1.1618 (max= 1.6215), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:30:33,150 - root - INFO - Step 35890: lr=1.00E-05, loss= 1.1618 (max= 1.6215), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:30:33,150 - root - INFO - Step 35890: lr=1.00E-05, loss= 1.1618 (max= 1.6215), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:30:33,150 - root - INFO - Step 35890: lr=1.00E-05, loss= 1.1618 (max= 1.6215), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:30:49,058 - root - INFO - Step 35900: lr=1.00E-05, loss= 1.1556 (max= 1.5995), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:49,058 - root - INFO - Step 35900: lr=1.00E-05, loss= 1.1556 (max= 1.5995), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 02:30:49,058 - root - INFO - Step 35900: lr=1.00E-05, loss= 1.1556 (max= 1.5995), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:49,058 - root - INFO - Step 35900: lr=1.00E-05, loss= 1.1556 (max= 1.5995), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:49,058 - root - INFO - Step 35900: lr=1.00E-05, loss= 1.1556 (max= 1.5995), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:49,058 - root - INFO - Step 35900: lr=1.00E-05, loss= 1.1556 (max= 1.5995), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:49,058 - root - INFO - Step 35900: lr=1.00E-05, loss= 1.1556 (max= 1.5995), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:30:49,058 - root - INFO - Step 35900: lr=1.00E-05, loss= 1.1556 (max= 1.5995), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:04,941 - root - INFO - Step 35910: lr=1.00E-05, loss= 1.1512 (max= 1.6078), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:31:04,941 - root - INFO - Step 35910: lr=1.00E-05, loss= 1.1512 (max= 1.6078), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:31:04,941 - root - INFO - Step 35910: lr=1.00E-05, loss= 1.1512 (max= 1.6078), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:31:04,941 - root - INFO - Step 35910: lr=1.00E-05, loss= 1.1512 (max= 1.6078), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:31:04,941 - root - INFO - Step 35910: lr=1.00E-05, loss= 1.1512 (max= 1.6078), tps=20635, mfu=42.99%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:31:04,941 - root - INFO - Step 35910: lr=1.00E-05, loss= 1.1512 (max= 1.6078), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:31:04,941 - root - INFO - Step 35910: lr=1.00E-05, loss= 1.1512 (max= 1.6078), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:31:04,941 - root - INFO - Step 35910: lr=1.00E-05, loss= 1.1512 (max= 1.6078), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:31:20,857 - root - INFO - Step 35920: lr=1.00E-05, loss= 1.1330 (max= 1.6190), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:20,857 - root - INFO - Step 35920: lr=1.00E-05, loss= 1.1330 (max= 1.6190), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:20,857 - root - INFO - Step 35920: lr=1.00E-05, loss= 1.1330 (max= 1.6190), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:20,857 - root - INFO - Step 35920: lr=1.00E-05, loss= 1.1330 (max= 1.6190), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:20,858 - root - INFO - Step 35920: lr=1.00E-05, loss= 1.1330 (max= 1.6190), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:20,858 - root - INFO - Step 35920: lr=1.00E-05, loss= 1.1330 (max= 1.6190), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:20,858 - root - INFO - Step 35920: lr=1.00E-05, loss= 1.1330 (max= 1.6190), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:20,858 - root - INFO - Step 35920: lr=1.00E-05, 
loss= 1.1330 (max= 1.6190), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:36,752 - root - INFO - Step 35930: lr=1.00E-05, loss= 1.1466 (max= 1.4972), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:36,752 - root - INFO - Step 35930: lr=1.00E-05, loss= 1.1466 (max= 1.4972), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:36,752 - root - INFO - Step 35930: lr=1.00E-05, loss= 1.1466 (max= 1.4972), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:36,752 - root - INFO - Step 35930: lr=1.00E-05, loss= 1.1466 (max= 1.4972), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:36,752 - root - INFO - Step 35930: lr=1.00E-05, loss= 1.1466 (max= 1.4972), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:36,752 - root - INFO - Step 35930: lr=1.00E-05, loss= 1.1466 (max= 1.4972), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:36,752 - root - INFO - Step 35930: lr=1.00E-05, loss= 1.1466 (max= 1.4972), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:36,752 - root - INFO - Step 35930: lr=1.00E-05, loss= 1.1466 (max= 1.4972), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:52,723 - root - INFO - Step 35940: lr=1.00E-05, loss= 1.1327 (max= 1.4974), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:52,723 - root - INFO - Step 35940: lr=1.00E-05, loss= 1.1327 (max= 1.4974), tps=20521, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
02:31:52,723 - root - INFO - Step 35940: lr=1.00E-05, loss= 1.1327 (max= 1.4974), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:52,723 - root - INFO - Step 35940: lr=1.00E-05, loss= 1.1327 (max= 1.4974), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:52,724 - root - INFO - Step 35940: lr=1.00E-05, loss= 1.1327 (max= 1.4974), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:52,724 - root - INFO - Step 35940: lr=1.00E-05, loss= 1.1327 (max= 1.4974), tps=20521, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:52,724 - root - INFO - Step 35940: lr=1.00E-05, loss= 1.1327 (max= 1.4974), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:31:52,724 - root - INFO - Step 35940: lr=1.00E-05, loss= 1.1327 (max= 1.4974), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:08,648 - root - INFO - Step 35950: lr=1.00E-05, loss= 1.1574 (max= 1.5594), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:08,648 - root - INFO - Step 35950: lr=1.00E-05, loss= 1.1574 (max= 1.5594), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:08,648 - root - INFO - Step 35950: lr=1.00E-05, loss= 1.1574 (max= 1.5594), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:08,648 - root - INFO - Step 35950: lr=1.00E-05, loss= 1.1574 (max= 1.5594), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:08,649 - root - INFO - Step 35950: lr=1.00E-05, loss= 1.1574 (max= 1.5594), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:08,649 - root - INFO - Step 35950: lr=1.00E-05, loss= 1.1574 (max= 1.5594), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:08,649 - root - INFO - Step 35950: lr=1.00E-05, loss= 1.1574 (max= 1.5594), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:08,649 - root - INFO - Step 35950: lr=1.00E-05, loss= 1.1574 (max= 1.5594), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:24,559 - root - INFO - Step 35960: lr=1.00E-05, loss= 1.1708 (max= 1.6749), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:24,559 - root - INFO - Step 35960: lr=1.00E-05, loss= 1.1708 (max= 1.6749), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:24,559 - root - INFO - Step 35960: lr=1.00E-05, loss= 1.1708 (max= 1.6749), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:24,559 - root - INFO - Step 35960: lr=1.00E-05, loss= 1.1708 (max= 1.6749), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:24,559 - root - INFO - Step 35960: lr=1.00E-05, loss= 1.1708 (max= 1.6749), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:24,559 - root - INFO - Step 35960: lr=1.00E-05, loss= 1.1708 (max= 1.6749), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:24,559 - root - INFO - Step 35960: lr=1.00E-05, loss= 1.1708 (max= 1.6749), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:24,559 - root - INFO - Step 35960: lr=1.00E-05, loss= 1.1708 (max= 1.6749), 
tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:40,514 - root - INFO - Step 35970: lr=1.00E-05, loss= 1.1133 (max= 1.5396), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:40,514 - root - INFO - Step 35970: lr=1.00E-05, loss= 1.1133 (max= 1.5396), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:40,514 - root - INFO - Step 35970: lr=1.00E-05, loss= 1.1133 (max= 1.5396), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:40,514 - root - INFO - Step 35970: lr=1.00E-05, loss= 1.1133 (max= 1.5396), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:40,514 - root - INFO - Step 35970: lr=1.00E-05, loss= 1.1133 (max= 1.5396), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:40,514 - root - INFO - Step 35970: lr=1.00E-05, loss= 1.1133 (max= 1.5396), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:40,514 - root - INFO - Step 35970: lr=1.00E-05, loss= 1.1133 (max= 1.5396), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:40,514 - root - INFO - Step 35970: lr=1.00E-05, loss= 1.1133 (max= 1.5396), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:32:56,474 - root - INFO - Step 35980: lr=1.00E-05, loss= 1.1359 (max= 1.7968), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:56,474 - root - INFO - Step 35980: lr=1.00E-05, loss= 1.1359 (max= 1.7968), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:56,474 - root - INFO - Step 
35980: lr=1.00E-05, loss= 1.1359 (max= 1.7968), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:56,474 - root - INFO - Step 35980: lr=1.00E-05, loss= 1.1359 (max= 1.7968), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:56,474 - root - INFO - Step 35980: lr=1.00E-05, loss= 1.1359 (max= 1.7968), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:56,474 - root - INFO - Step 35980: lr=1.00E-05, loss= 1.1359 (max= 1.7968), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:56,474 - root - INFO - Step 35980: lr=1.00E-05, loss= 1.1359 (max= 1.7968), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:32:56,474 - root - INFO - Step 35980: lr=1.00E-05, loss= 1.1359 (max= 1.7968), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:12,431 - root - INFO - Step 35990: lr=1.00E-05, loss= 1.1357 (max= 1.4739), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:12,431 - root - INFO - Step 35990: lr=1.00E-05, loss= 1.1357 (max= 1.4739), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:12,431 - root - INFO - Step 35990: lr=1.00E-05, loss= 1.1357 (max= 1.4739), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:12,432 - root - INFO - Step 35990: lr=1.00E-05, loss= 1.1357 (max= 1.4739), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:12,432 - root - INFO - Step 35990: lr=1.00E-05, loss= 1.1357 (max= 1.4739), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 02:33:12,432 - root - INFO - Step 35990: lr=1.00E-05, loss= 1.1357 (max= 1.4739), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:12,432 - root - INFO - Step 35990: lr=1.00E-05, loss= 1.1357 (max= 1.4739), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:12,432 - root - INFO - Step 35990: lr=1.00E-05, loss= 1.1357 (max= 1.4739), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-36000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-36000! Save time: 4.358523845672607 +2025-10-25 02:33:28,361 - root - INFO - Step 36000: lr=1.00E-05, loss= 1.1832 (max= 1.6383), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:28,361 - root - INFO - Step 36000: lr=1.00E-05, loss= 1.1832 (max= 1.6383), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:28,361 - root - INFO - Saving a full checkpoint at step 36000 +2025-10-25 02:33:28,361 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:33:28,361 - root - INFO - Step 36000: lr=1.00E-05, loss= 1.1832 (max= 1.6383), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:28,361 - root - INFO - Saving a full checkpoint at step 36000 +2025-10-25 02:33:28,361 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:33:28,361 - root - INFO - Saving a full checkpoint at step 36000 +2025-10-25 02:33:28,361 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 
02:33:28,361 - root - INFO - Step 36000: lr=1.00E-05, loss= 1.1832 (max= 1.6383), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:28,361 - root - INFO - Saving a full checkpoint at step 36000 +2025-10-25 02:33:28,362 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:33:28,361 - root - INFO - Step 36000: lr=1.00E-05, loss= 1.1832 (max= 1.6383), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:28,361 - root - INFO - Step 36000: lr=1.00E-05, loss= 1.1832 (max= 1.6383), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:28,361 - root - INFO - Step 36000: lr=1.00E-05, loss= 1.1832 (max= 1.6383), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:28,361 - root - INFO - Step 36000: lr=1.00E-05, loss= 1.1832 (max= 1.6383), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:33:28,362 - root - INFO - Saving a full checkpoint at step 36000 +2025-10-25 02:33:28,362 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:33:28,362 - root - INFO - Saving a full checkpoint at step 36000 +2025-10-25 02:33:28,362 - root - INFO - Saving a full checkpoint at step 36000 +2025-10-25 02:33:28,362 - root - INFO - Saving a full checkpoint at step 36000 +2025-10-25 02:33:28,362 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:33:28,362 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 02:33:28,362 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 
'optimizer', 'lr_scheduler']) +2025-10-25 02:33:42,520 - root - INFO - Finished saving the checkpoint in 14.16 seconds +2025-10-25 02:33:42,526 - root - INFO - Finished saving the checkpoint in 14.16 seconds +2025-10-25 02:33:42,526 - root - INFO - Finished saving the checkpoint in 14.16 seconds +2025-10-25 02:33:42,527 - root - INFO - Finished saving the checkpoint in 14.17 seconds +2025-10-25 02:33:42,527 - root - INFO - Finished saving the checkpoint in 14.17 seconds +2025-10-25 02:33:42,527 - root - INFO - Finished saving the checkpoint in 14.17 seconds +2025-10-25 02:33:42,528 - root - INFO - Finished saving the checkpoint in 14.17 seconds +2025-10-25 02:33:42,528 - root - INFO - Finished saving the checkpoint in 14.17 seconds +2025-10-25 02:33:58,383 - root - INFO - Step 36010: lr=1.00E-05, loss= 1.1898 (max= 1.6289), tps=10916, mfu=22.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:33:58,383 - root - INFO - Step 36010: lr=1.00E-05, loss= 1.1898 (max= 1.6289), tps=10916, mfu=22.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:33:58,383 - root - INFO - Step 36010: lr=1.00E-05, loss= 1.1898 (max= 1.6289), tps=10916, mfu=22.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:33:58,383 - root - INFO - Step 36010: lr=1.00E-05, loss= 1.1898 (max= 1.6289), tps=10916, mfu=22.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:33:58,383 - root - INFO - Step 36010: lr=1.00E-05, loss= 1.1898 (max= 1.6289), tps=10916, mfu=22.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:33:58,383 - root - INFO - Step 36010: lr=1.00E-05, loss= 1.1898 (max= 1.6289), tps=10916, mfu=22.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:33:58,383 - root - INFO - Step 36010: lr=1.00E-05, loss= 1.1898 (max= 1.6289), tps=10916, mfu=22.74%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:33:58,383 - root - INFO - Step 36010: lr=1.00E-05, loss= 1.1898 (max= 1.6289), tps=10916, mfu=22.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:34:14,294 - root - INFO - Step 36020: lr=1.00E-05, loss= 1.1623 (max= 1.8199), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:14,295 - root - INFO - Step 36020: lr=1.00E-05, loss= 1.1623 (max= 1.8199), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:14,295 - root - INFO - Step 36020: lr=1.00E-05, loss= 1.1623 (max= 1.8199), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:14,295 - root - INFO - Step 36020: lr=1.00E-05, loss= 1.1623 (max= 1.8199), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:14,295 - root - INFO - Step 36020: lr=1.00E-05, loss= 1.1623 (max= 1.8199), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:14,295 - root - INFO - Step 36020: lr=1.00E-05, loss= 1.1623 (max= 1.8199), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:14,295 - root - INFO - Step 36020: lr=1.00E-05, loss= 1.1623 (max= 1.8199), tps=20599, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:14,295 - root - INFO - Step 36020: lr=1.00E-05, loss= 1.1623 (max= 1.8199), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:30,233 - root - INFO - Step 36030: lr=1.00E-05, loss= 1.1532 (max= 1.7139), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:30,233 - root - INFO - Step 36030: lr=1.00E-05, loss= 1.1532 
(max= 1.7139), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:30,233 - root - INFO - Step 36030: lr=1.00E-05, loss= 1.1532 (max= 1.7139), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:30,233 - root - INFO - Step 36030: lr=1.00E-05, loss= 1.1532 (max= 1.7139), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:30,233 - root - INFO - Step 36030: lr=1.00E-05, loss= 1.1532 (max= 1.7139), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:30,233 - root - INFO - Step 36030: lr=1.00E-05, loss= 1.1532 (max= 1.7139), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:30,233 - root - INFO - Step 36030: lr=1.00E-05, loss= 1.1532 (max= 1.7139), tps=20564, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:30,233 - root - INFO - Step 36030: lr=1.00E-05, loss= 1.1532 (max= 1.7139), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:46,214 - root - INFO - Step 36040: lr=1.00E-05, loss= 1.1413 (max= 1.6102), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:46,215 - root - INFO - Step 36040: lr=1.00E-05, loss= 1.1413 (max= 1.6102), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:46,215 - root - INFO - Step 36040: lr=1.00E-05, loss= 1.1413 (max= 1.6102), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:46,215 - root - INFO - Step 36040: lr=1.00E-05, loss= 1.1413 (max= 1.6102), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:46,215 - root 
- INFO - Step 36040: lr=1.00E-05, loss= 1.1413 (max= 1.6102), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:46,215 - root - INFO - Step 36040: lr=1.00E-05, loss= 1.1413 (max= 1.6102), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:46,215 - root - INFO - Step 36040: lr=1.00E-05, loss= 1.1413 (max= 1.6102), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:34:46,215 - root - INFO - Step 36040: lr=1.00E-05, loss= 1.1413 (max= 1.6102), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:02,118 - root - INFO - Step 36050: lr=1.00E-05, loss= 1.1426 (max= 1.6142), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:02,119 - root - INFO - Step 36050: lr=1.00E-05, loss= 1.1426 (max= 1.6142), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:02,119 - root - INFO - Step 36050: lr=1.00E-05, loss= 1.1426 (max= 1.6142), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:02,119 - root - INFO - Step 36050: lr=1.00E-05, loss= 1.1426 (max= 1.6142), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:02,119 - root - INFO - Step 36050: lr=1.00E-05, loss= 1.1426 (max= 1.6142), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:02,119 - root - INFO - Step 36050: lr=1.00E-05, loss= 1.1426 (max= 1.6142), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:02,119 - root - INFO - Step 36050: lr=1.00E-05, loss= 1.1426 (max= 1.6142), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 02:35:02,119 - root - INFO - Step 36050: lr=1.00E-05, loss= 1.1426 (max= 1.6142), tps=20608, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:18,082 - root - INFO - Step 36060: lr=1.00E-05, loss= 1.1320 (max= 1.5476), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:18,083 - root - INFO - Step 36060: lr=1.00E-05, loss= 1.1320 (max= 1.5476), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:18,083 - root - INFO - Step 36060: lr=1.00E-05, loss= 1.1320 (max= 1.5476), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:18,083 - root - INFO - Step 36060: lr=1.00E-05, loss= 1.1320 (max= 1.5476), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:18,083 - root - INFO - Step 36060: lr=1.00E-05, loss= 1.1320 (max= 1.5476), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:18,083 - root - INFO - Step 36060: lr=1.00E-05, loss= 1.1320 (max= 1.5476), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:18,083 - root - INFO - Step 36060: lr=1.00E-05, loss= 1.1320 (max= 1.5476), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:18,083 - root - INFO - Step 36060: lr=1.00E-05, loss= 1.1320 (max= 1.5476), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:34,022 - root - INFO - Step 36070: lr=1.00E-05, loss= 1.1181 (max= 1.5507), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:34,022 - root - INFO - Step 36070: lr=1.00E-05, loss= 1.1181 (max= 1.5507), tps=20562, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:34,022 - root - INFO - Step 36070: lr=1.00E-05, loss= 1.1181 (max= 1.5507), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:34,022 - root - INFO - Step 36070: lr=1.00E-05, loss= 1.1181 (max= 1.5507), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:34,022 - root - INFO - Step 36070: lr=1.00E-05, loss= 1.1181 (max= 1.5507), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:34,022 - root - INFO - Step 36070: lr=1.00E-05, loss= 1.1181 (max= 1.5507), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:34,022 - root - INFO - Step 36070: lr=1.00E-05, loss= 1.1181 (max= 1.5507), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:34,022 - root - INFO - Step 36070: lr=1.00E-05, loss= 1.1181 (max= 1.5507), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:35:49,966 - root - INFO - Step 36080: lr=1.00E-05, loss= 1.1579 (max= 1.7986), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:35:49,966 - root - INFO - Step 36080: lr=1.00E-05, loss= 1.1579 (max= 1.7986), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:35:49,966 - root - INFO - Step 36080: lr=1.00E-05, loss= 1.1579 (max= 1.7986), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:35:49,966 - root - INFO - Step 36080: lr=1.00E-05, loss= 1.1579 (max= 1.7986), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:35:49,966 - root - INFO - Step 36080: lr=1.00E-05, 
loss= 1.1579 (max= 1.7986), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:35:49,966 - root - INFO - Step 36080: lr=1.00E-05, loss= 1.1579 (max= 1.7986), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:35:49,966 - root - INFO - Step 36080: lr=1.00E-05, loss= 1.1579 (max= 1.7986), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:35:49,966 - root - INFO - Step 36080: lr=1.00E-05, loss= 1.1579 (max= 1.7986), tps=20556, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:05,888 - root - INFO - Step 36090: lr=1.00E-05, loss= 1.1636 (max= 1.5913), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:05,888 - root - INFO - Step 36090: lr=1.00E-05, loss= 1.1636 (max= 1.5913), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:05,888 - root - INFO - Step 36090: lr=1.00E-05, loss= 1.1636 (max= 1.5913), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:05,889 - root - INFO - Step 36090: lr=1.00E-05, loss= 1.1636 (max= 1.5913), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:05,889 - root - INFO - Step 36090: lr=1.00E-05, loss= 1.1636 (max= 1.5913), tps=20585, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:05,889 - root - INFO - Step 36090: lr=1.00E-05, loss= 1.1636 (max= 1.5913), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:05,889 - root - INFO - Step 36090: lr=1.00E-05, loss= 1.1636 (max= 1.5913), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
02:36:05,889 - root - INFO - Step 36090: lr=1.00E-05, loss= 1.1636 (max= 1.5913), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:21,840 - root - INFO - Step 36100: lr=1.00E-05, loss= 1.1532 (max= 1.7029), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:21,840 - root - INFO - Step 36100: lr=1.00E-05, loss= 1.1532 (max= 1.7029), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:21,840 - root - INFO - Step 36100: lr=1.00E-05, loss= 1.1532 (max= 1.7029), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:21,840 - root - INFO - Step 36100: lr=1.00E-05, loss= 1.1532 (max= 1.7029), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:21,840 - root - INFO - Step 36100: lr=1.00E-05, loss= 1.1532 (max= 1.7029), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:21,840 - root - INFO - Step 36100: lr=1.00E-05, loss= 1.1532 (max= 1.7029), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:21,840 - root - INFO - Step 36100: lr=1.00E-05, loss= 1.1532 (max= 1.7029), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:21,840 - root - INFO - Step 36100: lr=1.00E-05, loss= 1.1532 (max= 1.7029), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:37,818 - root - INFO - Step 36110: lr=1.00E-05, loss= 1.1790 (max= 1.5349), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:36:37,818 - root - INFO - Step 36110: lr=1.00E-05, loss= 1.1790 (max= 1.5349), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:36:37,818 - root - INFO - Step 36110: lr=1.00E-05, loss= 1.1790 (max= 1.5349), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:36:37,818 - root - INFO - Step 36110: lr=1.00E-05, loss= 1.1790 (max= 1.5349), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:36:37,818 - root - INFO - Step 36110: lr=1.00E-05, loss= 1.1790 (max= 1.5349), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:36:37,818 - root - INFO - Step 36110: lr=1.00E-05, loss= 1.1790 (max= 1.5349), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:36:37,818 - root - INFO - Step 36110: lr=1.00E-05, loss= 1.1790 (max= 1.5349), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:36:37,818 - root - INFO - Step 36110: lr=1.00E-05, loss= 1.1790 (max= 1.5349), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:36:53,812 - root - INFO - Step 36120: lr=1.00E-05, loss= 1.1262 (max= 1.5917), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:53,812 - root - INFO - Step 36120: lr=1.00E-05, loss= 1.1262 (max= 1.5917), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:53,812 - root - INFO - Step 36120: lr=1.00E-05, loss= 1.1262 (max= 1.5917), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:53,812 - root - INFO - Step 36120: lr=1.00E-05, loss= 1.1262 (max= 1.5917), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:53,812 - root - INFO - Step 36120: lr=1.00E-05, loss= 1.1262 (max= 1.5917), 
tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:53,812 - root - INFO - Step 36120: lr=1.00E-05, loss= 1.1262 (max= 1.5917), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:53,812 - root - INFO - Step 36120: lr=1.00E-05, loss= 1.1262 (max= 1.5917), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:36:53,812 - root - INFO - Step 36120: lr=1.00E-05, loss= 1.1262 (max= 1.5917), tps=20492, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:09,726 - root - INFO - Step 36130: lr=1.00E-05, loss= 1.1618 (max= 1.6796), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:09,726 - root - INFO - Step 36130: lr=1.00E-05, loss= 1.1618 (max= 1.6796), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:09,726 - root - INFO - Step 36130: lr=1.00E-05, loss= 1.1618 (max= 1.6796), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:09,726 - root - INFO - Step 36130: lr=1.00E-05, loss= 1.1618 (max= 1.6796), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:09,727 - root - INFO - Step 36130: lr=1.00E-05, loss= 1.1618 (max= 1.6796), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:09,727 - root - INFO - Step 36130: lr=1.00E-05, loss= 1.1618 (max= 1.6796), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:09,727 - root - INFO - Step 36130: lr=1.00E-05, loss= 1.1618 (max= 1.6796), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:09,727 - root - INFO - Step 
36130: lr=1.00E-05, loss= 1.1618 (max= 1.6796), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:25,641 - root - INFO - Step 36140: lr=1.00E-05, loss= 1.1252 (max= 1.6731), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:25,641 - root - INFO - Step 36140: lr=1.00E-05, loss= 1.1252 (max= 1.6731), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:25,641 - root - INFO - Step 36140: lr=1.00E-05, loss= 1.1252 (max= 1.6731), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:25,642 - root - INFO - Step 36140: lr=1.00E-05, loss= 1.1252 (max= 1.6731), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:25,642 - root - INFO - Step 36140: lr=1.00E-05, loss= 1.1252 (max= 1.6731), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:25,642 - root - INFO - Step 36140: lr=1.00E-05, loss= 1.1252 (max= 1.6731), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:25,642 - root - INFO - Step 36140: lr=1.00E-05, loss= 1.1252 (max= 1.6731), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:25,642 - root - INFO - Step 36140: lr=1.00E-05, loss= 1.1252 (max= 1.6731), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:41,557 - root - INFO - Step 36150: lr=1.00E-05, loss= 1.1604 (max= 1.5509), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:41,557 - root - INFO - Step 36150: lr=1.00E-05, loss= 1.1604 (max= 1.5509), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 02:37:41,557 - root - INFO - Step 36150: lr=1.00E-05, loss= 1.1604 (max= 1.5509), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:41,557 - root - INFO - Step 36150: lr=1.00E-05, loss= 1.1604 (max= 1.5509), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:41,557 - root - INFO - Step 36150: lr=1.00E-05, loss= 1.1604 (max= 1.5509), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:41,557 - root - INFO - Step 36150: lr=1.00E-05, loss= 1.1604 (max= 1.5509), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:41,557 - root - INFO - Step 36150: lr=1.00E-05, loss= 1.1604 (max= 1.5509), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:41,557 - root - INFO - Step 36150: lr=1.00E-05, loss= 1.1604 (max= 1.5509), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:57,486 - root - INFO - Step 36160: lr=1.00E-05, loss= 1.1545 (max= 1.4050), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:57,486 - root - INFO - Step 36160: lr=1.00E-05, loss= 1.1545 (max= 1.4050), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:57,486 - root - INFO - Step 36160: lr=1.00E-05, loss= 1.1545 (max= 1.4050), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:57,486 - root - INFO - Step 36160: lr=1.00E-05, loss= 1.1545 (max= 1.4050), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:57,486 - root - INFO - Step 36160: lr=1.00E-05, loss= 1.1545 (max= 1.4050), tps=20575, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:57,486 - root - INFO - Step 36160: lr=1.00E-05, loss= 1.1545 (max= 1.4050), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:57,486 - root - INFO - Step 36160: lr=1.00E-05, loss= 1.1545 (max= 1.4050), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:37:57,486 - root - INFO - Step 36160: lr=1.00E-05, loss= 1.1545 (max= 1.4050), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:13,418 - root - INFO - Step 36170: lr=1.00E-05, loss= 1.1776 (max= 1.5791), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:38:13,418 - root - INFO - Step 36170: lr=1.00E-05, loss= 1.1776 (max= 1.5791), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:38:13,418 - root - INFO - Step 36170: lr=1.00E-05, loss= 1.1776 (max= 1.5791), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:38:13,418 - root - INFO - Step 36170: lr=1.00E-05, loss= 1.1776 (max= 1.5791), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:38:13,418 - root - INFO - Step 36170: lr=1.00E-05, loss= 1.1776 (max= 1.5791), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:38:13,418 - root - INFO - Step 36170: lr=1.00E-05, loss= 1.1776 (max= 1.5791), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:38:13,418 - root - INFO - Step 36170: lr=1.00E-05, loss= 1.1776 (max= 1.5791), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:38:13,419 - root - INFO - Step 36170: lr=1.00E-05, loss= 1.1776 
(max= 1.5791), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:38:29,320 - root - INFO - Step 36180: lr=1.00E-05, loss= 1.1420 (max= 1.4893), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:29,321 - root - INFO - Step 36180: lr=1.00E-05, loss= 1.1420 (max= 1.4893), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:29,321 - root - INFO - Step 36180: lr=1.00E-05, loss= 1.1420 (max= 1.4893), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:29,321 - root - INFO - Step 36180: lr=1.00E-05, loss= 1.1420 (max= 1.4893), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:29,321 - root - INFO - Step 36180: lr=1.00E-05, loss= 1.1420 (max= 1.4893), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:29,321 - root - INFO - Step 36180: lr=1.00E-05, loss= 1.1420 (max= 1.4893), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:29,321 - root - INFO - Step 36180: lr=1.00E-05, loss= 1.1420 (max= 1.4893), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:29,321 - root - INFO - Step 36180: lr=1.00E-05, loss= 1.1420 (max= 1.4893), tps=20611, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:45,285 - root - INFO - Step 36190: lr=1.00E-05, loss= 1.1722 (max= 1.6174), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:45,285 - root - INFO - Step 36190: lr=1.00E-05, loss= 1.1722 (max= 1.6174), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:45,285 - root 
- INFO - Step 36190: lr=1.00E-05, loss= 1.1722 (max= 1.6174), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:45,285 - root - INFO - Step 36190: lr=1.00E-05, loss= 1.1722 (max= 1.6174), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:45,285 - root - INFO - Step 36190: lr=1.00E-05, loss= 1.1722 (max= 1.6174), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:45,285 - root - INFO - Step 36190: lr=1.00E-05, loss= 1.1722 (max= 1.6174), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:45,285 - root - INFO - Step 36190: lr=1.00E-05, loss= 1.1722 (max= 1.6174), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:38:45,285 - root - INFO - Step 36190: lr=1.00E-05, loss= 1.1722 (max= 1.6174), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:01,224 - root - INFO - Step 36200: lr=1.00E-05, loss= 1.1773 (max= 1.5175), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:01,224 - root - INFO - Step 36200: lr=1.00E-05, loss= 1.1773 (max= 1.5175), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:01,224 - root - INFO - Step 36200: lr=1.00E-05, loss= 1.1773 (max= 1.5175), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:01,224 - root - INFO - Step 36200: lr=1.00E-05, loss= 1.1773 (max= 1.5175), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:01,224 - root - INFO - Step 36200: lr=1.00E-05, loss= 1.1773 (max= 1.5175), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 02:39:01,224 - root - INFO - Step 36200: lr=1.00E-05, loss= 1.1773 (max= 1.5175), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:01,225 - root - INFO - Step 36200: lr=1.00E-05, loss= 1.1773 (max= 1.5175), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:01,225 - root - INFO - Step 36200: lr=1.00E-05, loss= 1.1773 (max= 1.5175), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:17,151 - root - INFO - Step 36210: lr=1.00E-05, loss= 1.1788 (max= 1.5735), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:17,151 - root - INFO - Step 36210: lr=1.00E-05, loss= 1.1788 (max= 1.5735), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:17,151 - root - INFO - Step 36210: lr=1.00E-05, loss= 1.1788 (max= 1.5735), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:17,151 - root - INFO - Step 36210: lr=1.00E-05, loss= 1.1788 (max= 1.5735), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:17,151 - root - INFO - Step 36210: lr=1.00E-05, loss= 1.1788 (max= 1.5735), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:17,151 - root - INFO - Step 36210: lr=1.00E-05, loss= 1.1788 (max= 1.5735), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:17,151 - root - INFO - Step 36210: lr=1.00E-05, loss= 1.1788 (max= 1.5735), tps=20578, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:17,152 - root - INFO - Step 36210: lr=1.00E-05, loss= 1.1788 (max= 1.5735), tps=20578, mfu=42.88%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:33,106 - root - INFO - Step 36220: lr=1.00E-05, loss= 1.1502 (max= 1.5597), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:33,106 - root - INFO - Step 36220: lr=1.00E-05, loss= 1.1502 (max= 1.5597), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:33,107 - root - INFO - Step 36220: lr=1.00E-05, loss= 1.1502 (max= 1.5597), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:33,107 - root - INFO - Step 36220: lr=1.00E-05, loss= 1.1502 (max= 1.5597), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:33,107 - root - INFO - Step 36220: lr=1.00E-05, loss= 1.1502 (max= 1.5597), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:33,107 - root - INFO - Step 36220: lr=1.00E-05, loss= 1.1502 (max= 1.5597), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:33,107 - root - INFO - Step 36220: lr=1.00E-05, loss= 1.1502 (max= 1.5597), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:33,107 - root - INFO - Step 36220: lr=1.00E-05, loss= 1.1502 (max= 1.5597), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:49,070 - root - INFO - Step 36230: lr=1.00E-05, loss= 1.1762 (max= 1.6475), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:49,070 - root - INFO - Step 36230: lr=1.00E-05, loss= 1.1762 (max= 1.6475), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:49,070 - root - INFO - Step 36230: lr=1.00E-05, 
loss= 1.1762 (max= 1.6475), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:49,070 - root - INFO - Step 36230: lr=1.00E-05, loss= 1.1762 (max= 1.6475), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:49,070 - root - INFO - Step 36230: lr=1.00E-05, loss= 1.1762 (max= 1.6475), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:49,070 - root - INFO - Step 36230: lr=1.00E-05, loss= 1.1762 (max= 1.6475), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:49,070 - root - INFO - Step 36230: lr=1.00E-05, loss= 1.1762 (max= 1.6475), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:39:49,070 - root - INFO - Step 36230: lr=1.00E-05, loss= 1.1762 (max= 1.6475), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:05,016 - root - INFO - Step 36240: lr=1.00E-05, loss= 1.1498 (max= 1.6152), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:05,016 - root - INFO - Step 36240: lr=1.00E-05, loss= 1.1498 (max= 1.6152), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:05,016 - root - INFO - Step 36240: lr=1.00E-05, loss= 1.1498 (max= 1.6152), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:05,017 - root - INFO - Step 36240: lr=1.00E-05, loss= 1.1498 (max= 1.6152), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:05,017 - root - INFO - Step 36240: lr=1.00E-05, loss= 1.1498 (max= 1.6152), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
02:40:05,017 - root - INFO - Step 36240: lr=1.00E-05, loss= 1.1498 (max= 1.6152), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:05,017 - root - INFO - Step 36240: lr=1.00E-05, loss= 1.1498 (max= 1.6152), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:05,017 - root - INFO - Step 36240: lr=1.00E-05, loss= 1.1498 (max= 1.6152), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:20,996 - root - INFO - Step 36250: lr=1.00E-05, loss= 1.1687 (max= 1.7218), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:20,996 - root - INFO - Step 36250: lr=1.00E-05, loss= 1.1687 (max= 1.7218), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:20,996 - root - INFO - Step 36250: lr=1.00E-05, loss= 1.1687 (max= 1.7218), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:20,997 - root - INFO - Step 36250: lr=1.00E-05, loss= 1.1687 (max= 1.7218), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:20,997 - root - INFO - Step 36250: lr=1.00E-05, loss= 1.1687 (max= 1.7218), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:20,997 - root - INFO - Step 36250: lr=1.00E-05, loss= 1.1687 (max= 1.7218), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:20,997 - root - INFO - Step 36250: lr=1.00E-05, loss= 1.1687 (max= 1.7218), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:20,997 - root - INFO - Step 36250: lr=1.00E-05, loss= 1.1687 (max= 1.7218), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:36,932 - root - INFO - Step 36260: lr=1.00E-05, loss= 1.1593 (max= 1.7846), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:36,932 - root - INFO - Step 36260: lr=1.00E-05, loss= 1.1593 (max= 1.7846), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:36,932 - root - INFO - Step 36260: lr=1.00E-05, loss= 1.1593 (max= 1.7846), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:36,932 - root - INFO - Step 36260: lr=1.00E-05, loss= 1.1593 (max= 1.7846), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:36,932 - root - INFO - Step 36260: lr=1.00E-05, loss= 1.1593 (max= 1.7846), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:36,932 - root - INFO - Step 36260: lr=1.00E-05, loss= 1.1593 (max= 1.7846), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:36,932 - root - INFO - Step 36260: lr=1.00E-05, loss= 1.1593 (max= 1.7846), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:36,932 - root - INFO - Step 36260: lr=1.00E-05, loss= 1.1593 (max= 1.7846), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:52,892 - root - INFO - Step 36270: lr=1.00E-05, loss= 1.1714 (max= 1.9842), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:52,892 - root - INFO - Step 36270: lr=1.00E-05, loss= 1.1714 (max= 1.9842), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:52,892 - root - INFO - Step 36270: lr=1.00E-05, loss= 1.1714 (max= 1.9842), 
tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:52,893 - root - INFO - Step 36270: lr=1.00E-05, loss= 1.1714 (max= 1.9842), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:52,893 - root - INFO - Step 36270: lr=1.00E-05, loss= 1.1714 (max= 1.9842), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:52,893 - root - INFO - Step 36270: lr=1.00E-05, loss= 1.1714 (max= 1.9842), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:52,893 - root - INFO - Step 36270: lr=1.00E-05, loss= 1.1714 (max= 1.9842), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:40:52,893 - root - INFO - Step 36270: lr=1.00E-05, loss= 1.1714 (max= 1.9842), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:08,809 - root - INFO - Step 36280: lr=1.00E-05, loss= 1.1520 (max= 1.5251), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:08,809 - root - INFO - Step 36280: lr=1.00E-05, loss= 1.1520 (max= 1.5251), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:08,809 - root - INFO - Step 36280: lr=1.00E-05, loss= 1.1520 (max= 1.5251), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:08,809 - root - INFO - Step 36280: lr=1.00E-05, loss= 1.1520 (max= 1.5251), tps=20593, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:08,809 - root - INFO - Step 36280: lr=1.00E-05, loss= 1.1520 (max= 1.5251), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:08,809 - root - INFO - Step 
36280: lr=1.00E-05, loss= 1.1520 (max= 1.5251), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:08,809 - root - INFO - Step 36280: lr=1.00E-05, loss= 1.1520 (max= 1.5251), tps=20593, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:08,809 - root - INFO - Step 36280: lr=1.00E-05, loss= 1.1520 (max= 1.5251), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:24,711 - root - INFO - Step 36290: lr=1.00E-05, loss= 1.1572 (max= 1.6369), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:24,711 - root - INFO - Step 36290: lr=1.00E-05, loss= 1.1572 (max= 1.6369), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:24,711 - root - INFO - Step 36290: lr=1.00E-05, loss= 1.1572 (max= 1.6369), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:24,711 - root - INFO - Step 36290: lr=1.00E-05, loss= 1.1572 (max= 1.6369), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:24,712 - root - INFO - Step 36290: lr=1.00E-05, loss= 1.1572 (max= 1.6369), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:24,712 - root - INFO - Step 36290: lr=1.00E-05, loss= 1.1572 (max= 1.6369), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:24,712 - root - INFO - Step 36290: lr=1.00E-05, loss= 1.1572 (max= 1.6369), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:24,712 - root - INFO - Step 36290: lr=1.00E-05, loss= 1.1572 (max= 1.6369), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 02:41:40,646 - root - INFO - Step 36300: lr=1.00E-05, loss= 1.1305 (max= 1.4728), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:41:40,646 - root - INFO - Step 36300: lr=1.00E-05, loss= 1.1305 (max= 1.4728), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:41:40,647 - root - INFO - Step 36300: lr=1.00E-05, loss= 1.1305 (max= 1.4728), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:41:40,647 - root - INFO - Step 36300: lr=1.00E-05, loss= 1.1305 (max= 1.4728), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:41:40,647 - root - INFO - Step 36300: lr=1.00E-05, loss= 1.1305 (max= 1.4728), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:41:40,647 - root - INFO - Step 36300: lr=1.00E-05, loss= 1.1305 (max= 1.4728), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:41:40,647 - root - INFO - Step 36300: lr=1.00E-05, loss= 1.1305 (max= 1.4728), tps=20569, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:41:40,647 - root - INFO - Step 36300: lr=1.00E-05, loss= 1.1305 (max= 1.4728), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:41:56,575 - root - INFO - Step 36310: lr=1.00E-05, loss= 1.1216 (max= 1.4908), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:56,575 - root - INFO - Step 36310: lr=1.00E-05, loss= 1.1216 (max= 1.4908), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:56,576 - root - INFO - Step 36310: lr=1.00E-05, loss= 1.1216 (max= 1.4908), tps=20575, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:56,576 - root - INFO - Step 36310: lr=1.00E-05, loss= 1.1216 (max= 1.4908), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:56,576 - root - INFO - Step 36310: lr=1.00E-05, loss= 1.1216 (max= 1.4908), tps=20575, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:56,576 - root - INFO - Step 36310: lr=1.00E-05, loss= 1.1216 (max= 1.4908), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:56,576 - root - INFO - Step 36310: lr=1.00E-05, loss= 1.1216 (max= 1.4908), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:41:56,576 - root - INFO - Step 36310: lr=1.00E-05, loss= 1.1216 (max= 1.4908), tps=20576, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:12,500 - root - INFO - Step 36320: lr=1.00E-05, loss= 1.1907 (max= 1.7472), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:12,500 - root - INFO - Step 36320: lr=1.00E-05, loss= 1.1907 (max= 1.7472), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:12,500 - root - INFO - Step 36320: lr=1.00E-05, loss= 1.1907 (max= 1.7472), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:12,500 - root - INFO - Step 36320: lr=1.00E-05, loss= 1.1907 (max= 1.7472), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:12,500 - root - INFO - Step 36320: lr=1.00E-05, loss= 1.1907 (max= 1.7472), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:12,500 - root - INFO - Step 36320: lr=1.00E-05, loss= 1.1907 
(max= 1.7472), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:12,500 - root - INFO - Step 36320: lr=1.00E-05, loss= 1.1907 (max= 1.7472), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:12,500 - root - INFO - Step 36320: lr=1.00E-05, loss= 1.1907 (max= 1.7472), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:28,414 - root - INFO - Step 36330: lr=1.00E-05, loss= 1.1569 (max= 1.6216), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:28,414 - root - INFO - Step 36330: lr=1.00E-05, loss= 1.1569 (max= 1.6216), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:28,415 - root - INFO - Step 36330: lr=1.00E-05, loss= 1.1569 (max= 1.6216), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:28,415 - root - INFO - Step 36330: lr=1.00E-05, loss= 1.1569 (max= 1.6216), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:28,415 - root - INFO - Step 36330: lr=1.00E-05, loss= 1.1569 (max= 1.6216), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:28,415 - root - INFO - Step 36330: lr=1.00E-05, loss= 1.1569 (max= 1.6216), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:28,415 - root - INFO - Step 36330: lr=1.00E-05, loss= 1.1569 (max= 1.6216), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:28,415 - root - INFO - Step 36330: lr=1.00E-05, loss= 1.1569 (max= 1.6216), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:44,390 - root 
- INFO - Step 36340: lr=1.00E-05, loss= 1.1462 (max= 1.5406), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:44,390 - root - INFO - Step 36340: lr=1.00E-05, loss= 1.1462 (max= 1.5406), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:44,390 - root - INFO - Step 36340: lr=1.00E-05, loss= 1.1462 (max= 1.5406), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:44,390 - root - INFO - Step 36340: lr=1.00E-05, loss= 1.1462 (max= 1.5406), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:44,390 - root - INFO - Step 36340: lr=1.00E-05, loss= 1.1462 (max= 1.5406), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:44,390 - root - INFO - Step 36340: lr=1.00E-05, loss= 1.1462 (max= 1.5406), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:44,390 - root - INFO - Step 36340: lr=1.00E-05, loss= 1.1462 (max= 1.5406), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:42:44,390 - root - INFO - Step 36340: lr=1.00E-05, loss= 1.1462 (max= 1.5406), tps=20516, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:00,357 - root - INFO - Step 36350: lr=1.00E-05, loss= 1.1178 (max= 1.5390), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:00,357 - root - INFO - Step 36350: lr=1.00E-05, loss= 1.1178 (max= 1.5390), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:00,357 - root - INFO - Step 36350: lr=1.00E-05, loss= 1.1178 (max= 1.5390), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 02:43:00,357 - root - INFO - Step 36350: lr=1.00E-05, loss= 1.1178 (max= 1.5390), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:00,357 - root - INFO - Step 36350: lr=1.00E-05, loss= 1.1178 (max= 1.5390), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:00,357 - root - INFO - Step 36350: lr=1.00E-05, loss= 1.1178 (max= 1.5390), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:00,357 - root - INFO - Step 36350: lr=1.00E-05, loss= 1.1178 (max= 1.5390), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:00,357 - root - INFO - Step 36350: lr=1.00E-05, loss= 1.1178 (max= 1.5390), tps=20527, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:16,277 - root - INFO - Step 36360: lr=1.00E-05, loss= 1.1348 (max= 1.5766), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:16,277 - root - INFO - Step 36360: lr=1.00E-05, loss= 1.1348 (max= 1.5766), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:16,277 - root - INFO - Step 36360: lr=1.00E-05, loss= 1.1348 (max= 1.5766), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:16,278 - root - INFO - Step 36360: lr=1.00E-05, loss= 1.1348 (max= 1.5766), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:16,278 - root - INFO - Step 36360: lr=1.00E-05, loss= 1.1348 (max= 1.5766), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:16,278 - root - INFO - Step 36360: lr=1.00E-05, loss= 1.1348 (max= 1.5766), tps=20586, mfu=42.89%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:16,278 - root - INFO - Step 36360: lr=1.00E-05, loss= 1.1348 (max= 1.5766), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:16,278 - root - INFO - Step 36360: lr=1.00E-05, loss= 1.1348 (max= 1.5766), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:32,175 - root - INFO - Step 36370: lr=1.00E-05, loss= 1.1278 (max= 1.5153), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:32,175 - root - INFO - Step 36370: lr=1.00E-05, loss= 1.1278 (max= 1.5153), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:32,175 - root - INFO - Step 36370: lr=1.00E-05, loss= 1.1278 (max= 1.5153), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:32,176 - root - INFO - Step 36370: lr=1.00E-05, loss= 1.1278 (max= 1.5153), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:32,176 - root - INFO - Step 36370: lr=1.00E-05, loss= 1.1278 (max= 1.5153), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:32,176 - root - INFO - Step 36370: lr=1.00E-05, loss= 1.1278 (max= 1.5153), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:32,176 - root - INFO - Step 36370: lr=1.00E-05, loss= 1.1278 (max= 1.5153), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:32,176 - root - INFO - Step 36370: lr=1.00E-05, loss= 1.1278 (max= 1.5153), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:43:48,119 - root - INFO - Step 36380: lr=1.00E-05, 
loss= 1.1793 (max= 1.5953), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:48,119 - root - INFO - Step 36380: lr=1.00E-05, loss= 1.1793 (max= 1.5953), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:48,119 - root - INFO - Step 36380: lr=1.00E-05, loss= 1.1793 (max= 1.5953), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:48,119 - root - INFO - Step 36380: lr=1.00E-05, loss= 1.1793 (max= 1.5953), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:48,119 - root - INFO - Step 36380: lr=1.00E-05, loss= 1.1793 (max= 1.5953), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:48,119 - root - INFO - Step 36380: lr=1.00E-05, loss= 1.1793 (max= 1.5953), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:48,119 - root - INFO - Step 36380: lr=1.00E-05, loss= 1.1793 (max= 1.5953), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:43:48,119 - root - INFO - Step 36380: lr=1.00E-05, loss= 1.1793 (max= 1.5953), tps=20557, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:04,038 - root - INFO - Step 36390: lr=1.00E-05, loss= 1.1408 (max= 1.4717), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:04,038 - root - INFO - Step 36390: lr=1.00E-05, loss= 1.1408 (max= 1.4717), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:04,038 - root - INFO - Step 36390: lr=1.00E-05, loss= 1.1408 (max= 1.4717), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
02:44:04,038 - root - INFO - Step 36390: lr=1.00E-05, loss= 1.1408 (max= 1.4717), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:04,038 - root - INFO - Step 36390: lr=1.00E-05, loss= 1.1408 (max= 1.4717), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:04,038 - root - INFO - Step 36390: lr=1.00E-05, loss= 1.1408 (max= 1.4717), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:04,038 - root - INFO - Step 36390: lr=1.00E-05, loss= 1.1408 (max= 1.4717), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:04,038 - root - INFO - Step 36390: lr=1.00E-05, loss= 1.1408 (max= 1.4717), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:11,110 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:6602672 +2025-10-25 02:44:19,978 - root - INFO - Step 36400: lr=1.00E-05, loss= 1.1330 (max= 1.5791), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:44:19,978 - root - INFO - Step 36400: lr=1.00E-05, loss= 1.1330 (max= 1.5791), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:44:19,978 - root - INFO - Step 36400: lr=1.00E-05, loss= 1.1330 (max= 1.5791), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:44:19,978 - root - INFO - Step 36400: lr=1.00E-05, loss= 1.1330 (max= 1.5791), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:44:19,978 - root - INFO - Step 36400: lr=1.00E-05, loss= 1.1330 (max= 1.5791), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:44:19,978 - root - INFO - Step 36400: lr=1.00E-05, loss= 1.1330 (max= 1.5791), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:44:19,978 - root - INFO - Step 36400: lr=1.00E-05, loss= 1.1330 (max= 1.5791), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:44:19,978 - root - INFO - Step 36400: lr=1.00E-05, loss= 1.1330 (max= 1.5791), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:44:35,927 - root - INFO - Step 36410: lr=1.00E-05, loss= 1.1640 (max= 1.5753), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:35,927 - root - INFO - Step 36410: lr=1.00E-05, loss= 1.1640 (max= 1.5753), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:35,927 - root - INFO - Step 36410: lr=1.00E-05, loss= 1.1640 (max= 1.5753), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:35,927 - root - INFO - Step 36410: lr=1.00E-05, loss= 1.1640 (max= 1.5753), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:35,927 - root - INFO - Step 36410: lr=1.00E-05, loss= 1.1640 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:35,927 - root - INFO - Step 36410: lr=1.00E-05, loss= 1.1640 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:35,927 - root - INFO - Step 36410: lr=1.00E-05, loss= 1.1640 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:35,927 - root - INFO - Step 36410: lr=1.00E-05, loss= 1.1640 (max= 1.5753), 
tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:51,887 - root - INFO - Step 36420: lr=1.00E-05, loss= 1.1755 (max= 1.5026), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:51,887 - root - INFO - Step 36420: lr=1.00E-05, loss= 1.1755 (max= 1.5026), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:51,887 - root - INFO - Step 36420: lr=1.00E-05, loss= 1.1755 (max= 1.5026), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:51,887 - root - INFO - Step 36420: lr=1.00E-05, loss= 1.1755 (max= 1.5026), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:51,887 - root - INFO - Step 36420: lr=1.00E-05, loss= 1.1755 (max= 1.5026), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:51,887 - root - INFO - Step 36420: lr=1.00E-05, loss= 1.1755 (max= 1.5026), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:51,887 - root - INFO - Step 36420: lr=1.00E-05, loss= 1.1755 (max= 1.5026), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:44:51,888 - root - INFO - Step 36420: lr=1.00E-05, loss= 1.1755 (max= 1.5026), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:45:07,843 - root - INFO - Step 36430: lr=1.00E-05, loss= 1.1698 (max= 1.6913), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:07,843 - root - INFO - Step 36430: lr=1.00E-05, loss= 1.1698 (max= 1.6913), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:07,843 - root - INFO - Step 
36430: lr=1.00E-05, loss= 1.1698 (max= 1.6913), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:07,843 - root - INFO - Step 36430: lr=1.00E-05, loss= 1.1698 (max= 1.6913), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:07,844 - root - INFO - Step 36430: lr=1.00E-05, loss= 1.1698 (max= 1.6913), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:07,844 - root - INFO - Step 36430: lr=1.00E-05, loss= 1.1698 (max= 1.6913), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:07,844 - root - INFO - Step 36430: lr=1.00E-05, loss= 1.1698 (max= 1.6913), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:07,844 - root - INFO - Step 36430: lr=1.00E-05, loss= 1.1698 (max= 1.6913), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:23,797 - root - INFO - Step 36440: lr=1.00E-05, loss= 1.1803 (max= 1.5374), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:45:23,797 - root - INFO - Step 36440: lr=1.00E-05, loss= 1.1803 (max= 1.5374), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:45:23,797 - root - INFO - Step 36440: lr=1.00E-05, loss= 1.1803 (max= 1.5374), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:45:23,797 - root - INFO - Step 36440: lr=1.00E-05, loss= 1.1803 (max= 1.5374), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:45:23,797 - root - INFO - Step 36440: lr=1.00E-05, loss= 1.1803 (max= 1.5374), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 02:45:23,797 - root - INFO - Step 36440: lr=1.00E-05, loss= 1.1803 (max= 1.5374), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:45:23,797 - root - INFO - Step 36440: lr=1.00E-05, loss= 1.1803 (max= 1.5374), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:45:23,797 - root - INFO - Step 36440: lr=1.00E-05, loss= 1.1803 (max= 1.5374), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:45:39,730 - root - INFO - Step 36450: lr=1.00E-05, loss= 1.1534 (max= 1.6803), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:39,730 - root - INFO - Step 36450: lr=1.00E-05, loss= 1.1534 (max= 1.6803), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:39,730 - root - INFO - Step 36450: lr=1.00E-05, loss= 1.1534 (max= 1.6803), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:39,730 - root - INFO - Step 36450: lr=1.00E-05, loss= 1.1534 (max= 1.6803), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:39,731 - root - INFO - Step 36450: lr=1.00E-05, loss= 1.1534 (max= 1.6803), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:39,731 - root - INFO - Step 36450: lr=1.00E-05, loss= 1.1534 (max= 1.6803), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:39,731 - root - INFO - Step 36450: lr=1.00E-05, loss= 1.1534 (max= 1.6803), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:39,731 - root - INFO - Step 36450: lr=1.00E-05, loss= 1.1534 (max= 1.6803), tps=20569, mfu=42.86%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:55,685 - root - INFO - Step 36460: lr=1.00E-05, loss= 1.1526 (max= 1.4967), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:55,685 - root - INFO - Step 36460: lr=1.00E-05, loss= 1.1526 (max= 1.4967), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:55,685 - root - INFO - Step 36460: lr=1.00E-05, loss= 1.1526 (max= 1.4967), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:55,685 - root - INFO - Step 36460: lr=1.00E-05, loss= 1.1526 (max= 1.4967), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:55,685 - root - INFO - Step 36460: lr=1.00E-05, loss= 1.1526 (max= 1.4967), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:55,685 - root - INFO - Step 36460: lr=1.00E-05, loss= 1.1526 (max= 1.4967), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:55,685 - root - INFO - Step 36460: lr=1.00E-05, loss= 1.1526 (max= 1.4967), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:45:55,685 - root - INFO - Step 36460: lr=1.00E-05, loss= 1.1526 (max= 1.4967), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:11,549 - root - INFO - Step 36470: lr=1.00E-05, loss= 1.1434 (max= 1.4487), tps=20659, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:46:11,549 - root - INFO - Step 36470: lr=1.00E-05, loss= 1.1434 (max= 1.4487), tps=20659, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:46:11,549 - root - INFO - Step 36470: lr=1.00E-05, loss= 1.1434 
(max= 1.4487), tps=20659, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:46:11,549 - root - INFO - Step 36470: lr=1.00E-05, loss= 1.1434 (max= 1.4487), tps=20659, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:46:11,549 - root - INFO - Step 36470: lr=1.00E-05, loss= 1.1434 (max= 1.4487), tps=20659, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:46:11,549 - root - INFO - Step 36470: lr=1.00E-05, loss= 1.1434 (max= 1.4487), tps=20659, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:46:11,549 - root - INFO - Step 36470: lr=1.00E-05, loss= 1.1434 (max= 1.4487), tps=20659, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:46:11,549 - root - INFO - Step 36470: lr=1.00E-05, loss= 1.1434 (max= 1.4487), tps=20659, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:46:27,522 - root - INFO - Step 36480: lr=1.00E-05, loss= 1.1822 (max= 1.5301), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:27,522 - root - INFO - Step 36480: lr=1.00E-05, loss= 1.1822 (max= 1.5301), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:27,522 - root - INFO - Step 36480: lr=1.00E-05, loss= 1.1822 (max= 1.5301), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:27,522 - root - INFO - Step 36480: lr=1.00E-05, loss= 1.1822 (max= 1.5301), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:27,522 - root - INFO - Step 36480: lr=1.00E-05, loss= 1.1822 (max= 1.5301), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:27,522 - root 
- INFO - Step 36480: lr=1.00E-05, loss= 1.1822 (max= 1.5301), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:27,522 - root - INFO - Step 36480: lr=1.00E-05, loss= 1.1822 (max= 1.5301), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:27,522 - root - INFO - Step 36480: lr=1.00E-05, loss= 1.1822 (max= 1.5301), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:43,430 - root - INFO - Step 36490: lr=1.00E-05, loss= 1.1486 (max= 1.6279), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:43,430 - root - INFO - Step 36490: lr=1.00E-05, loss= 1.1486 (max= 1.6279), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:43,430 - root - INFO - Step 36490: lr=1.00E-05, loss= 1.1486 (max= 1.6279), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:43,430 - root - INFO - Step 36490: lr=1.00E-05, loss= 1.1486 (max= 1.6279), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:43,430 - root - INFO - Step 36490: lr=1.00E-05, loss= 1.1486 (max= 1.6279), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:43,430 - root - INFO - Step 36490: lr=1.00E-05, loss= 1.1486 (max= 1.6279), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:43,430 - root - INFO - Step 36490: lr=1.00E-05, loss= 1.1486 (max= 1.6279), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:43,430 - root - INFO - Step 36490: lr=1.00E-05, loss= 1.1486 (max= 1.6279), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 02:46:59,381 - root - INFO - Step 36500: lr=1.00E-05, loss= 1.1637 (max= 1.5905), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:59,381 - root - INFO - Step 36500: lr=1.00E-05, loss= 1.1637 (max= 1.5905), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:59,381 - root - INFO - Step 36500: lr=1.00E-05, loss= 1.1637 (max= 1.5905), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:59,381 - root - INFO - Step 36500: lr=1.00E-05, loss= 1.1637 (max= 1.5905), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:59,381 - root - INFO - Step 36500: lr=1.00E-05, loss= 1.1637 (max= 1.5905), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:59,381 - root - INFO - Step 36500: lr=1.00E-05, loss= 1.1637 (max= 1.5905), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:59,381 - root - INFO - Step 36500: lr=1.00E-05, loss= 1.1637 (max= 1.5905), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:46:59,381 - root - INFO - Step 36500: lr=1.00E-05, loss= 1.1637 (max= 1.5905), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:15,320 - root - INFO - Step 36510: lr=1.00E-05, loss= 1.1285 (max= 1.5692), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:15,320 - root - INFO - Step 36510: lr=1.00E-05, loss= 1.1285 (max= 1.5692), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:15,320 - root - INFO - Step 36510: lr=1.00E-05, loss= 1.1285 (max= 1.5692), tps=20562, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:15,320 - root - INFO - Step 36510: lr=1.00E-05, loss= 1.1285 (max= 1.5692), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:15,320 - root - INFO - Step 36510: lr=1.00E-05, loss= 1.1285 (max= 1.5692), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:15,320 - root - INFO - Step 36510: lr=1.00E-05, loss= 1.1285 (max= 1.5692), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:15,320 - root - INFO - Step 36510: lr=1.00E-05, loss= 1.1285 (max= 1.5692), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:15,320 - root - INFO - Step 36510: lr=1.00E-05, loss= 1.1285 (max= 1.5692), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:31,252 - root - INFO - Step 36520: lr=1.00E-05, loss= 1.1896 (max= 1.6811), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:31,252 - root - INFO - Step 36520: lr=1.00E-05, loss= 1.1896 (max= 1.6811), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:31,252 - root - INFO - Step 36520: lr=1.00E-05, loss= 1.1896 (max= 1.6811), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:31,252 - root - INFO - Step 36520: lr=1.00E-05, loss= 1.1896 (max= 1.6811), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:31,253 - root - INFO - Step 36520: lr=1.00E-05, loss= 1.1896 (max= 1.6811), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:31,253 - root - INFO - Step 36520: lr=1.00E-05, 
loss= 1.1896 (max= 1.6811), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:31,253 - root - INFO - Step 36520: lr=1.00E-05, loss= 1.1896 (max= 1.6811), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:31,253 - root - INFO - Step 36520: lr=1.00E-05, loss= 1.1896 (max= 1.6811), tps=20571, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:47,205 - root - INFO - Step 36530: lr=1.00E-05, loss= 1.1601 (max= 1.5532), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:47,205 - root - INFO - Step 36530: lr=1.00E-05, loss= 1.1601 (max= 1.5532), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:47,205 - root - INFO - Step 36530: lr=1.00E-05, loss= 1.1601 (max= 1.5532), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:47,205 - root - INFO - Step 36530: lr=1.00E-05, loss= 1.1601 (max= 1.5532), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:47,205 - root - INFO - Step 36530: lr=1.00E-05, loss= 1.1601 (max= 1.5532), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:47,205 - root - INFO - Step 36530: lr=1.00E-05, loss= 1.1601 (max= 1.5532), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:47,205 - root - INFO - Step 36530: lr=1.00E-05, loss= 1.1601 (max= 1.5532), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:47:47,206 - root - INFO - Step 36530: lr=1.00E-05, loss= 1.1601 (max= 1.5532), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
02:48:03,111 - root - INFO - Step 36540: lr=1.00E-05, loss= 1.1524 (max= 1.6032), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:03,111 - root - INFO - Step 36540: lr=1.00E-05, loss= 1.1524 (max= 1.6032), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:03,111 - root - INFO - Step 36540: lr=1.00E-05, loss= 1.1524 (max= 1.6032), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:03,111 - root - INFO - Step 36540: lr=1.00E-05, loss= 1.1524 (max= 1.6032), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:03,111 - root - INFO - Step 36540: lr=1.00E-05, loss= 1.1524 (max= 1.6032), tps=20605, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:03,111 - root - INFO - Step 36540: lr=1.00E-05, loss= 1.1524 (max= 1.6032), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:03,111 - root - INFO - Step 36540: lr=1.00E-05, loss= 1.1524 (max= 1.6032), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:03,111 - root - INFO - Step 36540: lr=1.00E-05, loss= 1.1524 (max= 1.6032), tps=20606, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:19,009 - root - INFO - Step 36550: lr=1.00E-05, loss= 1.1479 (max= 1.5923), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:48:19,009 - root - INFO - Step 36550: lr=1.00E-05, loss= 1.1479 (max= 1.5923), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:48:19,009 - root - INFO - Step 36550: lr=1.00E-05, loss= 1.1479 (max= 1.5923), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:48:19,009 - root - INFO - Step 36550: lr=1.00E-05, loss= 1.1479 (max= 1.5923), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:48:19,009 - root - INFO - Step 36550: lr=1.00E-05, loss= 1.1479 (max= 1.5923), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:48:19,009 - root - INFO - Step 36550: lr=1.00E-05, loss= 1.1479 (max= 1.5923), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:48:19,009 - root - INFO - Step 36550: lr=1.00E-05, loss= 1.1479 (max= 1.5923), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:48:19,009 - root - INFO - Step 36550: lr=1.00E-05, loss= 1.1479 (max= 1.5923), tps=20616, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:48:35,007 - root - INFO - Step 36560: lr=1.00E-05, loss= 1.1556 (max= 1.4633), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:35,007 - root - INFO - Step 36560: lr=1.00E-05, loss= 1.1556 (max= 1.4633), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:35,007 - root - INFO - Step 36560: lr=1.00E-05, loss= 1.1556 (max= 1.4633), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:35,007 - root - INFO - Step 36560: lr=1.00E-05, loss= 1.1556 (max= 1.4633), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:35,008 - root - INFO - Step 36560: lr=1.00E-05, loss= 1.1556 (max= 1.4633), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:35,008 - root - INFO - Step 36560: lr=1.00E-05, loss= 1.1556 (max= 1.4633), 
tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:35,008 - root - INFO - Step 36560: lr=1.00E-05, loss= 1.1556 (max= 1.4633), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:35,008 - root - INFO - Step 36560: lr=1.00E-05, loss= 1.1556 (max= 1.4633), tps=20486, mfu=42.68%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:50,933 - root - INFO - Step 36570: lr=1.00E-05, loss= 1.1791 (max= 1.8269), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:50,933 - root - INFO - Step 36570: lr=1.00E-05, loss= 1.1791 (max= 1.8269), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:50,933 - root - INFO - Step 36570: lr=1.00E-05, loss= 1.1791 (max= 1.8269), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:50,933 - root - INFO - Step 36570: lr=1.00E-05, loss= 1.1791 (max= 1.8269), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:50,933 - root - INFO - Step 36570: lr=1.00E-05, loss= 1.1791 (max= 1.8269), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:50,933 - root - INFO - Step 36570: lr=1.00E-05, loss= 1.1791 (max= 1.8269), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:50,933 - root - INFO - Step 36570: lr=1.00E-05, loss= 1.1791 (max= 1.8269), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:48:50,933 - root - INFO - Step 36570: lr=1.00E-05, loss= 1.1791 (max= 1.8269), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:06,870 - root - INFO - Step 
36580: lr=1.00E-05, loss= 1.1677 (max= 1.5429), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:06,870 - root - INFO - Step 36580: lr=1.00E-05, loss= 1.1677 (max= 1.5429), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:06,870 - root - INFO - Step 36580: lr=1.00E-05, loss= 1.1677 (max= 1.5429), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:06,870 - root - INFO - Step 36580: lr=1.00E-05, loss= 1.1677 (max= 1.5429), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:06,870 - root - INFO - Step 36580: lr=1.00E-05, loss= 1.1677 (max= 1.5429), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:06,870 - root - INFO - Step 36580: lr=1.00E-05, loss= 1.1677 (max= 1.5429), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:06,870 - root - INFO - Step 36580: lr=1.00E-05, loss= 1.1677 (max= 1.5429), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:06,870 - root - INFO - Step 36580: lr=1.00E-05, loss= 1.1677 (max= 1.5429), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:22,760 - root - INFO - Step 36590: lr=1.00E-05, loss= 1.1610 (max= 1.4821), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:49:22,760 - root - INFO - Step 36590: lr=1.00E-05, loss= 1.1610 (max= 1.4821), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:49:22,760 - root - INFO - Step 36590: lr=1.00E-05, loss= 1.1610 (max= 1.4821), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 02:49:22,760 - root - INFO - Step 36590: lr=1.00E-05, loss= 1.1610 (max= 1.4821), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:49:22,760 - root - INFO - Step 36590: lr=1.00E-05, loss= 1.1610 (max= 1.4821), tps=20626, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:49:22,760 - root - INFO - Step 36590: lr=1.00E-05, loss= 1.1610 (max= 1.4821), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:49:22,760 - root - INFO - Step 36590: lr=1.00E-05, loss= 1.1610 (max= 1.4821), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:49:22,760 - root - INFO - Step 36590: lr=1.00E-05, loss= 1.1610 (max= 1.4821), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:49:38,713 - root - INFO - Step 36600: lr=1.00E-05, loss= 1.1208 (max= 1.5032), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:38,713 - root - INFO - Step 36600: lr=1.00E-05, loss= 1.1208 (max= 1.5032), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:38,713 - root - INFO - Step 36600: lr=1.00E-05, loss= 1.1208 (max= 1.5032), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:38,713 - root - INFO - Step 36600: lr=1.00E-05, loss= 1.1208 (max= 1.5032), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:38,713 - root - INFO - Step 36600: lr=1.00E-05, loss= 1.1208 (max= 1.5032), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:38,713 - root - INFO - Step 36600: lr=1.00E-05, loss= 1.1208 (max= 1.5032), tps=20545, mfu=42.81%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:38,713 - root - INFO - Step 36600: lr=1.00E-05, loss= 1.1208 (max= 1.5032), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:38,713 - root - INFO - Step 36600: lr=1.00E-05, loss= 1.1208 (max= 1.5032), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:54,694 - root - INFO - Step 36610: lr=1.00E-05, loss= 1.1783 (max= 1.7678), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:54,694 - root - INFO - Step 36610: lr=1.00E-05, loss= 1.1783 (max= 1.7678), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:54,694 - root - INFO - Step 36610: lr=1.00E-05, loss= 1.1783 (max= 1.7678), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:54,694 - root - INFO - Step 36610: lr=1.00E-05, loss= 1.1783 (max= 1.7678), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:54,694 - root - INFO - Step 36610: lr=1.00E-05, loss= 1.1783 (max= 1.7678), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:54,694 - root - INFO - Step 36610: lr=1.00E-05, loss= 1.1783 (max= 1.7678), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:54,694 - root - INFO - Step 36610: lr=1.00E-05, loss= 1.1783 (max= 1.7678), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:49:54,694 - root - INFO - Step 36610: lr=1.00E-05, loss= 1.1783 (max= 1.7678), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:10,629 - root - INFO - Step 36620: lr=1.00E-05, loss= 1.2150 
(max= 1.7774), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:10,629 - root - INFO - Step 36620: lr=1.00E-05, loss= 1.2150 (max= 1.7774), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:10,630 - root - INFO - Step 36620: lr=1.00E-05, loss= 1.2150 (max= 1.7774), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:10,630 - root - INFO - Step 36620: lr=1.00E-05, loss= 1.2150 (max= 1.7774), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:10,630 - root - INFO - Step 36620: lr=1.00E-05, loss= 1.2150 (max= 1.7774), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:10,630 - root - INFO - Step 36620: lr=1.00E-05, loss= 1.2150 (max= 1.7774), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:10,630 - root - INFO - Step 36620: lr=1.00E-05, loss= 1.2150 (max= 1.7774), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:10,630 - root - INFO - Step 36620: lr=1.00E-05, loss= 1.2150 (max= 1.7774), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:26,594 - root - INFO - Step 36630: lr=1.00E-05, loss= 1.1912 (max= 1.6218), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:26,594 - root - INFO - Step 36630: lr=1.00E-05, loss= 1.1912 (max= 1.6218), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:26,594 - root - INFO - Step 36630: lr=1.00E-05, loss= 1.1912 (max= 1.6218), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:26,594 - root 
- INFO - Step 36630: lr=1.00E-05, loss= 1.1912 (max= 1.6218), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:26,594 - root - INFO - Step 36630: lr=1.00E-05, loss= 1.1912 (max= 1.6218), tps=20530, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:26,594 - root - INFO - Step 36630: lr=1.00E-05, loss= 1.1912 (max= 1.6218), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:26,594 - root - INFO - Step 36630: lr=1.00E-05, loss= 1.1912 (max= 1.6218), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:26,594 - root - INFO - Step 36630: lr=1.00E-05, loss= 1.1912 (max= 1.6218), tps=20530, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:42,476 - root - INFO - Step 36640: lr=1.00E-05, loss= 1.1664 (max= 1.5083), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:42,476 - root - INFO - Step 36640: lr=1.00E-05, loss= 1.1664 (max= 1.5083), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:42,476 - root - INFO - Step 36640: lr=1.00E-05, loss= 1.1664 (max= 1.5083), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:42,476 - root - INFO - Step 36640: lr=1.00E-05, loss= 1.1664 (max= 1.5083), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:42,476 - root - INFO - Step 36640: lr=1.00E-05, loss= 1.1664 (max= 1.5083), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:42,476 - root - INFO - Step 36640: lr=1.00E-05, loss= 1.1664 (max= 1.5083), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 02:50:42,476 - root - INFO - Step 36640: lr=1.00E-05, loss= 1.1664 (max= 1.5083), tps=20636, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:42,476 - root - INFO - Step 36640: lr=1.00E-05, loss= 1.1664 (max= 1.5083), tps=20637, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:50:58,402 - root - INFO - Step 36650: lr=1.00E-05, loss= 1.1560 (max= 1.5225), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:58,402 - root - INFO - Step 36650: lr=1.00E-05, loss= 1.1560 (max= 1.5225), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:58,402 - root - INFO - Step 36650: lr=1.00E-05, loss= 1.1560 (max= 1.5225), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:58,402 - root - INFO - Step 36650: lr=1.00E-05, loss= 1.1560 (max= 1.5225), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:58,402 - root - INFO - Step 36650: lr=1.00E-05, loss= 1.1560 (max= 1.5225), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:58,402 - root - INFO - Step 36650: lr=1.00E-05, loss= 1.1560 (max= 1.5225), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:58,402 - root - INFO - Step 36650: lr=1.00E-05, loss= 1.1560 (max= 1.5225), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:50:58,402 - root - INFO - Step 36650: lr=1.00E-05, loss= 1.1560 (max= 1.5225), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:14,346 - root - INFO - Step 36660: lr=1.00E-05, loss= 1.1317 (max= 1.5059), tps=20555, mfu=42.83%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:51:14,347 - root - INFO - Step 36660: lr=1.00E-05, loss= 1.1317 (max= 1.5059), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:51:14,347 - root - INFO - Step 36660: lr=1.00E-05, loss= 1.1317 (max= 1.5059), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:51:14,347 - root - INFO - Step 36660: lr=1.00E-05, loss= 1.1317 (max= 1.5059), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:51:14,347 - root - INFO - Step 36660: lr=1.00E-05, loss= 1.1317 (max= 1.5059), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:51:14,347 - root - INFO - Step 36660: lr=1.00E-05, loss= 1.1317 (max= 1.5059), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:51:14,347 - root - INFO - Step 36660: lr=1.00E-05, loss= 1.1317 (max= 1.5059), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:51:14,347 - root - INFO - Step 36660: lr=1.00E-05, loss= 1.1317 (max= 1.5059), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:51:30,258 - root - INFO - Step 36670: lr=1.00E-05, loss= 1.1727 (max= 1.7227), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:30,258 - root - INFO - Step 36670: lr=1.00E-05, loss= 1.1727 (max= 1.7227), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:30,258 - root - INFO - Step 36670: lr=1.00E-05, loss= 1.1727 (max= 1.7227), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:30,258 - root - INFO - Step 36670: lr=1.00E-05, 
loss= 1.1727 (max= 1.7227), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:30,258 - root - INFO - Step 36670: lr=1.00E-05, loss= 1.1727 (max= 1.7227), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:30,258 - root - INFO - Step 36670: lr=1.00E-05, loss= 1.1727 (max= 1.7227), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:30,258 - root - INFO - Step 36670: lr=1.00E-05, loss= 1.1727 (max= 1.7227), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:30,258 - root - INFO - Step 36670: lr=1.00E-05, loss= 1.1727 (max= 1.7227), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:46,234 - root - INFO - Step 36680: lr=1.00E-05, loss= 1.1735 (max= 1.5702), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:46,234 - root - INFO - Step 36680: lr=1.00E-05, loss= 1.1735 (max= 1.5702), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:46,234 - root - INFO - Step 36680: lr=1.00E-05, loss= 1.1735 (max= 1.5702), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:46,234 - root - INFO - Step 36680: lr=1.00E-05, loss= 1.1735 (max= 1.5702), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:46,234 - root - INFO - Step 36680: lr=1.00E-05, loss= 1.1735 (max= 1.5702), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:46,235 - root - INFO - Step 36680: lr=1.00E-05, loss= 1.1735 (max= 1.5702), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
02:51:46,235 - root - INFO - Step 36680: lr=1.00E-05, loss= 1.1735 (max= 1.5702), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:51:46,235 - root - INFO - Step 36680: lr=1.00E-05, loss= 1.1735 (max= 1.5702), tps=20515, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:02,165 - root - INFO - Step 36690: lr=1.00E-05, loss= 1.1570 (max= 1.6303), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:02,165 - root - INFO - Step 36690: lr=1.00E-05, loss= 1.1570 (max= 1.6303), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:02,165 - root - INFO - Step 36690: lr=1.00E-05, loss= 1.1570 (max= 1.6303), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:02,165 - root - INFO - Step 36690: lr=1.00E-05, loss= 1.1570 (max= 1.6303), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:02,165 - root - INFO - Step 36690: lr=1.00E-05, loss= 1.1570 (max= 1.6303), tps=20573, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:02,165 - root - INFO - Step 36690: lr=1.00E-05, loss= 1.1570 (max= 1.6303), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:02,165 - root - INFO - Step 36690: lr=1.00E-05, loss= 1.1570 (max= 1.6303), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:02,165 - root - INFO - Step 36690: lr=1.00E-05, loss= 1.1570 (max= 1.6303), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:18,151 - root - INFO - Step 36700: lr=1.00E-05, loss= 1.1678 (max= 1.6521), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:18,151 - root - INFO - Step 36700: lr=1.00E-05, loss= 1.1678 (max= 1.6521), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:18,151 - root - INFO - Step 36700: lr=1.00E-05, loss= 1.1678 (max= 1.6521), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:18,151 - root - INFO - Step 36700: lr=1.00E-05, loss= 1.1678 (max= 1.6521), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:18,151 - root - INFO - Step 36700: lr=1.00E-05, loss= 1.1678 (max= 1.6521), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:18,151 - root - INFO - Step 36700: lr=1.00E-05, loss= 1.1678 (max= 1.6521), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:18,151 - root - INFO - Step 36700: lr=1.00E-05, loss= 1.1678 (max= 1.6521), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:18,151 - root - INFO - Step 36700: lr=1.00E-05, loss= 1.1678 (max= 1.6521), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:52:34,047 - root - INFO - Step 36710: lr=1.00E-05, loss= 1.1779 (max= 1.6508), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:34,047 - root - INFO - Step 36710: lr=1.00E-05, loss= 1.1779 (max= 1.6508), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:34,047 - root - INFO - Step 36710: lr=1.00E-05, loss= 1.1779 (max= 1.6508), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:34,047 - root - INFO - Step 36710: lr=1.00E-05, loss= 1.1779 (max= 1.6508), 
tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:34,047 - root - INFO - Step 36710: lr=1.00E-05, loss= 1.1779 (max= 1.6508), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:34,047 - root - INFO - Step 36710: lr=1.00E-05, loss= 1.1779 (max= 1.6508), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:34,047 - root - INFO - Step 36710: lr=1.00E-05, loss= 1.1779 (max= 1.6508), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:34,047 - root - INFO - Step 36710: lr=1.00E-05, loss= 1.1779 (max= 1.6508), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:50,001 - root - INFO - Step 36720: lr=1.00E-05, loss= 1.1733 (max= 1.4974), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:50,001 - root - INFO - Step 36720: lr=1.00E-05, loss= 1.1733 (max= 1.4974), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:50,001 - root - INFO - Step 36720: lr=1.00E-05, loss= 1.1733 (max= 1.4974), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:50,001 - root - INFO - Step 36720: lr=1.00E-05, loss= 1.1733 (max= 1.4974), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:50,002 - root - INFO - Step 36720: lr=1.00E-05, loss= 1.1733 (max= 1.4974), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:50,002 - root - INFO - Step 36720: lr=1.00E-05, loss= 1.1733 (max= 1.4974), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:50,002 - root - INFO - Step 
36720: lr=1.00E-05, loss= 1.1733 (max= 1.4974), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:52:50,002 - root - INFO - Step 36720: lr=1.00E-05, loss= 1.1733 (max= 1.4974), tps=20543, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:05,910 - root - INFO - Step 36730: lr=1.00E-05, loss= 1.1520 (max= 1.5621), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:05,910 - root - INFO - Step 36730: lr=1.00E-05, loss= 1.1520 (max= 1.5621), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:05,910 - root - INFO - Step 36730: lr=1.00E-05, loss= 1.1520 (max= 1.5621), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:05,910 - root - INFO - Step 36730: lr=1.00E-05, loss= 1.1520 (max= 1.5621), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:05,910 - root - INFO - Step 36730: lr=1.00E-05, loss= 1.1520 (max= 1.5621), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:05,910 - root - INFO - Step 36730: lr=1.00E-05, loss= 1.1520 (max= 1.5621), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:05,910 - root - INFO - Step 36730: lr=1.00E-05, loss= 1.1520 (max= 1.5621), tps=20602, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:05,910 - root - INFO - Step 36730: lr=1.00E-05, loss= 1.1520 (max= 1.5621), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:21,806 - root - INFO - Step 36740: lr=1.00E-05, loss= 1.1825 (max= 1.6079), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 02:53:21,806 - root - INFO - Step 36740: lr=1.00E-05, loss= 1.1825 (max= 1.6079), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:53:21,806 - root - INFO - Step 36740: lr=1.00E-05, loss= 1.1825 (max= 1.6079), tps=20618, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:53:21,807 - root - INFO - Step 36740: lr=1.00E-05, loss= 1.1825 (max= 1.6079), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:53:21,807 - root - INFO - Step 36740: lr=1.00E-05, loss= 1.1825 (max= 1.6079), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:53:21,807 - root - INFO - Step 36740: lr=1.00E-05, loss= 1.1825 (max= 1.6079), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:53:21,807 - root - INFO - Step 36740: lr=1.00E-05, loss= 1.1825 (max= 1.6079), tps=20619, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:53:21,807 - root - INFO - Step 36740: lr=1.00E-05, loss= 1.1825 (max= 1.6079), tps=20617, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:53:37,726 - root - INFO - Step 36750: lr=1.00E-05, loss= 1.1656 (max= 1.4890), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:37,726 - root - INFO - Step 36750: lr=1.00E-05, loss= 1.1656 (max= 1.4890), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:37,726 - root - INFO - Step 36750: lr=1.00E-05, loss= 1.1656 (max= 1.4890), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:37,726 - root - INFO - Step 36750: lr=1.00E-05, loss= 1.1656 (max= 1.4890), tps=20589, mfu=42.90%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:37,726 - root - INFO - Step 36750: lr=1.00E-05, loss= 1.1656 (max= 1.4890), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:37,726 - root - INFO - Step 36750: lr=1.00E-05, loss= 1.1656 (max= 1.4890), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:37,726 - root - INFO - Step 36750: lr=1.00E-05, loss= 1.1656 (max= 1.4890), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:37,726 - root - INFO - Step 36750: lr=1.00E-05, loss= 1.1656 (max= 1.4890), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:53,710 - root - INFO - Step 36760: lr=1.00E-05, loss= 1.1575 (max= 1.5162), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:53,710 - root - INFO - Step 36760: lr=1.00E-05, loss= 1.1575 (max= 1.5162), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:53,710 - root - INFO - Step 36760: lr=1.00E-05, loss= 1.1575 (max= 1.5162), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:53,710 - root - INFO - Step 36760: lr=1.00E-05, loss= 1.1575 (max= 1.5162), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:53,710 - root - INFO - Step 36760: lr=1.00E-05, loss= 1.1575 (max= 1.5162), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:53,710 - root - INFO - Step 36760: lr=1.00E-05, loss= 1.1575 (max= 1.5162), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:53,710 - root - INFO - Step 36760: lr=1.00E-05, loss= 1.1575 
(max= 1.5162), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:53:53,710 - root - INFO - Step 36760: lr=1.00E-05, loss= 1.1575 (max= 1.5162), tps=20505, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:09,618 - root - INFO - Step 36770: lr=1.00E-05, loss= 1.1696 (max= 1.6254), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:09,618 - root - INFO - Step 36770: lr=1.00E-05, loss= 1.1696 (max= 1.6254), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:09,618 - root - INFO - Step 36770: lr=1.00E-05, loss= 1.1696 (max= 1.6254), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:09,618 - root - INFO - Step 36770: lr=1.00E-05, loss= 1.1696 (max= 1.6254), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:09,618 - root - INFO - Step 36770: lr=1.00E-05, loss= 1.1696 (max= 1.6254), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:09,618 - root - INFO - Step 36770: lr=1.00E-05, loss= 1.1696 (max= 1.6254), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:09,618 - root - INFO - Step 36770: lr=1.00E-05, loss= 1.1696 (max= 1.6254), tps=20602, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:09,618 - root - INFO - Step 36770: lr=1.00E-05, loss= 1.1696 (max= 1.6254), tps=20603, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:25,577 - root - INFO - Step 36780: lr=1.00E-05, loss= 1.1628 (max= 1.5754), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:25,578 - root 
- INFO - Step 36780: lr=1.00E-05, loss= 1.1628 (max= 1.5754), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:25,578 - root - INFO - Step 36780: lr=1.00E-05, loss= 1.1628 (max= 1.5754), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:25,578 - root - INFO - Step 36780: lr=1.00E-05, loss= 1.1628 (max= 1.5754), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:25,578 - root - INFO - Step 36780: lr=1.00E-05, loss= 1.1628 (max= 1.5754), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:25,578 - root - INFO - Step 36780: lr=1.00E-05, loss= 1.1628 (max= 1.5754), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:25,578 - root - INFO - Step 36780: lr=1.00E-05, loss= 1.1628 (max= 1.5754), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:25,578 - root - INFO - Step 36780: lr=1.00E-05, loss= 1.1628 (max= 1.5754), tps=20536, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:41,501 - root - INFO - Step 36790: lr=1.00E-05, loss= 1.1693 (max= 1.5673), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:41,501 - root - INFO - Step 36790: lr=1.00E-05, loss= 1.1693 (max= 1.5673), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:41,501 - root - INFO - Step 36790: lr=1.00E-05, loss= 1.1693 (max= 1.5673), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:41,501 - root - INFO - Step 36790: lr=1.00E-05, loss= 1.1693 (max= 1.5673), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 02:54:41,501 - root - INFO - Step 36790: lr=1.00E-05, loss= 1.1693 (max= 1.5673), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:41,501 - root - INFO - Step 36790: lr=1.00E-05, loss= 1.1693 (max= 1.5673), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:41,501 - root - INFO - Step 36790: lr=1.00E-05, loss= 1.1693 (max= 1.5673), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:41,501 - root - INFO - Step 36790: lr=1.00E-05, loss= 1.1693 (max= 1.5673), tps=20583, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:57,441 - root - INFO - Step 36800: lr=1.00E-05, loss= 1.1238 (max= 1.5977), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:57,441 - root - INFO - Step 36800: lr=1.00E-05, loss= 1.1238 (max= 1.5977), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:57,441 - root - INFO - Step 36800: lr=1.00E-05, loss= 1.1238 (max= 1.5977), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:57,441 - root - INFO - Step 36800: lr=1.00E-05, loss= 1.1238 (max= 1.5977), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:57,441 - root - INFO - Step 36800: lr=1.00E-05, loss= 1.1238 (max= 1.5977), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:57,441 - root - INFO - Step 36800: lr=1.00E-05, loss= 1.1238 (max= 1.5977), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:57,441 - root - INFO - Step 36800: lr=1.00E-05, loss= 1.1238 (max= 1.5977), tps=20562, mfu=42.84%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:54:57,441 - root - INFO - Step 36800: lr=1.00E-05, loss= 1.1238 (max= 1.5977), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:13,316 - root - INFO - Step 36810: lr=1.00E-05, loss= 1.1617 (max= 1.6522), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:13,316 - root - INFO - Step 36810: lr=1.00E-05, loss= 1.1617 (max= 1.6522), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:13,316 - root - INFO - Step 36810: lr=1.00E-05, loss= 1.1617 (max= 1.6522), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:13,316 - root - INFO - Step 36810: lr=1.00E-05, loss= 1.1617 (max= 1.6522), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:13,316 - root - INFO - Step 36810: lr=1.00E-05, loss= 1.1617 (max= 1.6522), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:13,316 - root - INFO - Step 36810: lr=1.00E-05, loss= 1.1617 (max= 1.6522), tps=20645, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:13,316 - root - INFO - Step 36810: lr=1.00E-05, loss= 1.1617 (max= 1.6522), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:13,316 - root - INFO - Step 36810: lr=1.00E-05, loss= 1.1617 (max= 1.6522), tps=20646, mfu=43.02%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:29,264 - root - INFO - Step 36820: lr=1.00E-05, loss= 1.1678 (max= 1.8856), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:29,264 - root - INFO - Step 36820: lr=1.00E-05, 
loss= 1.1678 (max= 1.8856), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:29,264 - root - INFO - Step 36820: lr=1.00E-05, loss= 1.1678 (max= 1.8856), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:29,264 - root - INFO - Step 36820: lr=1.00E-05, loss= 1.1678 (max= 1.8856), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:29,264 - root - INFO - Step 36820: lr=1.00E-05, loss= 1.1678 (max= 1.8856), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:29,264 - root - INFO - Step 36820: lr=1.00E-05, loss= 1.1678 (max= 1.8856), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:29,264 - root - INFO - Step 36820: lr=1.00E-05, loss= 1.1678 (max= 1.8856), tps=20550, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:29,264 - root - INFO - Step 36820: lr=1.00E-05, loss= 1.1678 (max= 1.8856), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:55:45,216 - root - INFO - Step 36830: lr=1.00E-05, loss= 1.1246 (max= 1.8221), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:55:45,216 - root - INFO - Step 36830: lr=1.00E-05, loss= 1.1246 (max= 1.8221), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:55:45,216 - root - INFO - Step 36830: lr=1.00E-05, loss= 1.1246 (max= 1.8221), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:55:45,216 - root - INFO - Step 36830: lr=1.00E-05, loss= 1.1246 (max= 1.8221), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
02:55:45,216 - root - INFO - Step 36830: lr=1.00E-05, loss= 1.1246 (max= 1.8221), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:55:45,216 - root - INFO - Step 36830: lr=1.00E-05, loss= 1.1246 (max= 1.8221), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:55:45,216 - root - INFO - Step 36830: lr=1.00E-05, loss= 1.1246 (max= 1.8221), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:55:45,216 - root - INFO - Step 36830: lr=1.00E-05, loss= 1.1246 (max= 1.8221), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:56:01,142 - root - INFO - Step 36840: lr=1.00E-05, loss= 1.1826 (max= 1.6037), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:01,142 - root - INFO - Step 36840: lr=1.00E-05, loss= 1.1826 (max= 1.6037), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:01,142 - root - INFO - Step 36840: lr=1.00E-05, loss= 1.1826 (max= 1.6037), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:01,142 - root - INFO - Step 36840: lr=1.00E-05, loss= 1.1826 (max= 1.6037), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:01,142 - root - INFO - Step 36840: lr=1.00E-05, loss= 1.1826 (max= 1.6037), tps=20579, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:01,142 - root - INFO - Step 36840: lr=1.00E-05, loss= 1.1826 (max= 1.6037), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:01,142 - root - INFO - Step 36840: lr=1.00E-05, loss= 1.1826 (max= 1.6037), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:01,142 - root - INFO - Step 36840: lr=1.00E-05, loss= 1.1826 (max= 1.6037), tps=20580, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:17,073 - root - INFO - Step 36850: lr=1.00E-05, loss= 1.1685 (max= 1.5161), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:56:17,073 - root - INFO - Step 36850: lr=1.00E-05, loss= 1.1685 (max= 1.5161), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:56:17,073 - root - INFO - Step 36850: lr=1.00E-05, loss= 1.1685 (max= 1.5161), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:56:17,073 - root - INFO - Step 36850: lr=1.00E-05, loss= 1.1685 (max= 1.5161), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:56:17,073 - root - INFO - Step 36850: lr=1.00E-05, loss= 1.1685 (max= 1.5161), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:56:17,073 - root - INFO - Step 36850: lr=1.00E-05, loss= 1.1685 (max= 1.5161), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:56:17,073 - root - INFO - Step 36850: lr=1.00E-05, loss= 1.1685 (max= 1.5161), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:56:17,073 - root - INFO - Step 36850: lr=1.00E-05, loss= 1.1685 (max= 1.5161), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:56:33,024 - root - INFO - Step 36860: lr=9.11E-06, loss= 1.1299 (max= 1.6426), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:33,024 - root - INFO - Step 36860: lr=9.11E-06, loss= 1.1299 (max= 1.6426), 
tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:33,024 - root - INFO - Step 36860: lr=9.11E-06, loss= 1.1299 (max= 1.6426), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:33,024 - root - INFO - Step 36860: lr=9.11E-06, loss= 1.1299 (max= 1.6426), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:33,024 - root - INFO - Step 36860: lr=9.11E-06, loss= 1.1299 (max= 1.6426), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:33,025 - root - INFO - Step 36860: lr=9.11E-06, loss= 1.1299 (max= 1.6426), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:33,025 - root - INFO - Step 36860: lr=9.11E-06, loss= 1.1299 (max= 1.6426), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:33,025 - root - INFO - Step 36860: lr=9.11E-06, loss= 1.1299 (max= 1.6426), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:48,924 - root - INFO - Step 36870: lr=8.66E-06, loss= 1.1670 (max= 1.7058), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:48,924 - root - INFO - Step 36870: lr=8.66E-06, loss= 1.1670 (max= 1.7058), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:48,924 - root - INFO - Step 36870: lr=8.66E-06, loss= 1.1670 (max= 1.7058), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:48,924 - root - INFO - Step 36870: lr=8.66E-06, loss= 1.1670 (max= 1.7058), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:48,924 - root - INFO - Step 
36870: lr=8.66E-06, loss= 1.1670 (max= 1.7058), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:48,924 - root - INFO - Step 36870: lr=8.66E-06, loss= 1.1670 (max= 1.7058), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:48,924 - root - INFO - Step 36870: lr=8.66E-06, loss= 1.1670 (max= 1.7058), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:56:48,924 - root - INFO - Step 36870: lr=8.66E-06, loss= 1.1670 (max= 1.7058), tps=20613, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:04,901 - root - INFO - Step 36880: lr=8.33E-06, loss= 1.1634 (max= 1.5880), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:04,901 - root - INFO - Step 36880: lr=8.33E-06, loss= 1.1634 (max= 1.5880), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:04,901 - root - INFO - Step 36880: lr=8.33E-06, loss= 1.1634 (max= 1.5880), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:04,901 - root - INFO - Step 36880: lr=8.33E-06, loss= 1.1634 (max= 1.5880), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:04,901 - root - INFO - Step 36880: lr=8.33E-06, loss= 1.1634 (max= 1.5880), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:04,901 - root - INFO - Step 36880: lr=8.33E-06, loss= 1.1634 (max= 1.5880), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:04,901 - root - INFO - Step 36880: lr=8.33E-06, loss= 1.1634 (max= 1.5880), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 02:57:04,901 - root - INFO - Step 36880: lr=8.33E-06, loss= 1.1634 (max= 1.5880), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:20,792 - root - INFO - Step 36890: lr=8.05E-06, loss= 1.1521 (max= 1.5846), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:57:20,792 - root - INFO - Step 36890: lr=8.05E-06, loss= 1.1521 (max= 1.5846), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:57:20,792 - root - INFO - Step 36890: lr=8.05E-06, loss= 1.1521 (max= 1.5846), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:57:20,792 - root - INFO - Step 36890: lr=8.05E-06, loss= 1.1521 (max= 1.5846), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:57:20,792 - root - INFO - Step 36890: lr=8.05E-06, loss= 1.1521 (max= 1.5846), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:57:20,792 - root - INFO - Step 36890: lr=8.05E-06, loss= 1.1521 (max= 1.5846), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:57:20,793 - root - INFO - Step 36890: lr=8.05E-06, loss= 1.1521 (max= 1.5846), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:57:20,793 - root - INFO - Step 36890: lr=8.05E-06, loss= 1.1521 (max= 1.5846), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:57:36,773 - root - INFO - Step 36900: lr=7.81E-06, loss= 1.1484 (max= 1.6004), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:36,773 - root - INFO - Step 36900: lr=7.81E-06, loss= 1.1484 (max= 1.6004), tps=20509, mfu=42.73%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:36,773 - root - INFO - Step 36900: lr=7.81E-06, loss= 1.1484 (max= 1.6004), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:36,773 - root - INFO - Step 36900: lr=7.81E-06, loss= 1.1484 (max= 1.6004), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:36,773 - root - INFO - Step 36900: lr=7.81E-06, loss= 1.1484 (max= 1.6004), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:36,773 - root - INFO - Step 36900: lr=7.81E-06, loss= 1.1484 (max= 1.6004), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:36,773 - root - INFO - Step 36900: lr=7.81E-06, loss= 1.1484 (max= 1.6004), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:36,773 - root - INFO - Step 36900: lr=7.81E-06, loss= 1.1484 (max= 1.6004), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:52,726 - root - INFO - Step 36910: lr=7.59E-06, loss= 1.1500 (max= 1.5244), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:52,726 - root - INFO - Step 36910: lr=7.59E-06, loss= 1.1500 (max= 1.5244), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:52,726 - root - INFO - Step 36910: lr=7.59E-06, loss= 1.1500 (max= 1.5244), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:52,726 - root - INFO - Step 36910: lr=7.59E-06, loss= 1.1500 (max= 1.5244), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:52,726 - root - INFO - Step 36910: lr=7.59E-06, loss= 1.1500 
(max= 1.5244), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:52,726 - root - INFO - Step 36910: lr=7.59E-06, loss= 1.1500 (max= 1.5244), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:52,726 - root - INFO - Step 36910: lr=7.59E-06, loss= 1.1500 (max= 1.5244), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:57:52,726 - root - INFO - Step 36910: lr=7.59E-06, loss= 1.1500 (max= 1.5244), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:08,661 - root - INFO - Step 36920: lr=7.39E-06, loss= 1.1158 (max= 1.7264), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:08,661 - root - INFO - Step 36920: lr=7.39E-06, loss= 1.1158 (max= 1.7264), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:08,661 - root - INFO - Step 36920: lr=7.39E-06, loss= 1.1158 (max= 1.7264), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:08,661 - root - INFO - Step 36920: lr=7.39E-06, loss= 1.1158 (max= 1.7264), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:08,661 - root - INFO - Step 36920: lr=7.39E-06, loss= 1.1158 (max= 1.7264), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:08,661 - root - INFO - Step 36920: lr=7.39E-06, loss= 1.1158 (max= 1.7264), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:08,661 - root - INFO - Step 36920: lr=7.39E-06, loss= 1.1158 (max= 1.7264), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:08,661 - root 
- INFO - Step 36920: lr=7.39E-06, loss= 1.1158 (max= 1.7264), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:24,618 - root - INFO - Step 36930: lr=7.21E-06, loss= 1.1090 (max= 1.6801), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:24,618 - root - INFO - Step 36930: lr=7.21E-06, loss= 1.1090 (max= 1.6801), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:24,618 - root - INFO - Step 36930: lr=7.21E-06, loss= 1.1090 (max= 1.6801), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:24,618 - root - INFO - Step 36930: lr=7.21E-06, loss= 1.1090 (max= 1.6801), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:24,618 - root - INFO - Step 36930: lr=7.21E-06, loss= 1.1090 (max= 1.6801), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:24,618 - root - INFO - Step 36930: lr=7.21E-06, loss= 1.1090 (max= 1.6801), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:24,618 - root - INFO - Step 36930: lr=7.21E-06, loss= 1.1090 (max= 1.6801), tps=20540, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:24,618 - root - INFO - Step 36930: lr=7.21E-06, loss= 1.1090 (max= 1.6801), tps=20540, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:40,509 - root - INFO - Step 36940: lr=7.03E-06, loss= 1.1544 (max= 1.8531), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:40,509 - root - INFO - Step 36940: lr=7.03E-06, loss= 1.1544 (max= 1.8531), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 02:58:40,509 - root - INFO - Step 36940: lr=7.03E-06, loss= 1.1544 (max= 1.8531), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:40,509 - root - INFO - Step 36940: lr=7.03E-06, loss= 1.1544 (max= 1.8531), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:40,509 - root - INFO - Step 36940: lr=7.03E-06, loss= 1.1544 (max= 1.8531), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:40,509 - root - INFO - Step 36940: lr=7.03E-06, loss= 1.1544 (max= 1.8531), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:40,509 - root - INFO - Step 36940: lr=7.03E-06, loss= 1.1544 (max= 1.8531), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:40,509 - root - INFO - Step 36940: lr=7.03E-06, loss= 1.1544 (max= 1.8531), tps=20624, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:58:56,460 - root - INFO - Step 36950: lr=6.87E-06, loss= 1.1508 (max= 1.4699), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:56,460 - root - INFO - Step 36950: lr=6.87E-06, loss= 1.1508 (max= 1.4699), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:56,460 - root - INFO - Step 36950: lr=6.87E-06, loss= 1.1508 (max= 1.4699), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:56,460 - root - INFO - Step 36950: lr=6.87E-06, loss= 1.1508 (max= 1.4699), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:56,460 - root - INFO - Step 36950: lr=6.87E-06, loss= 1.1508 (max= 1.4699), tps=20548, mfu=42.81%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:56,460 - root - INFO - Step 36950: lr=6.87E-06, loss= 1.1508 (max= 1.4699), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:56,460 - root - INFO - Step 36950: lr=6.87E-06, loss= 1.1508 (max= 1.4699), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:58:56,460 - root - INFO - Step 36950: lr=6.87E-06, loss= 1.1508 (max= 1.4699), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 02:59:12,405 - root - INFO - Step 36960: lr=6.71E-06, loss= 1.1498 (max= 1.5056), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:12,405 - root - INFO - Step 36960: lr=6.71E-06, loss= 1.1498 (max= 1.5056), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:12,405 - root - INFO - Step 36960: lr=6.71E-06, loss= 1.1498 (max= 1.5056), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:12,405 - root - INFO - Step 36960: lr=6.71E-06, loss= 1.1498 (max= 1.5056), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:12,405 - root - INFO - Step 36960: lr=6.71E-06, loss= 1.1498 (max= 1.5056), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:12,405 - root - INFO - Step 36960: lr=6.71E-06, loss= 1.1498 (max= 1.5056), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:12,405 - root - INFO - Step 36960: lr=6.71E-06, loss= 1.1498 (max= 1.5056), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:12,405 - root - INFO - Step 36960: lr=6.71E-06, 
loss= 1.1498 (max= 1.5056), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:21,125 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:4017313 +2025-10-25 02:59:28,328 - root - INFO - Step 36970: lr=6.56E-06, loss= 1.1122 (max= 1.4914), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:28,328 - root - INFO - Step 36970: lr=6.56E-06, loss= 1.1122 (max= 1.4914), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:28,328 - root - INFO - Step 36970: lr=6.56E-06, loss= 1.1122 (max= 1.4914), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:28,329 - root - INFO - Step 36970: lr=6.56E-06, loss= 1.1122 (max= 1.4914), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:28,329 - root - INFO - Step 36970: lr=6.56E-06, loss= 1.1122 (max= 1.4914), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:28,329 - root - INFO - Step 36970: lr=6.56E-06, loss= 1.1122 (max= 1.4914), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:28,329 - root - INFO - Step 36970: lr=6.56E-06, loss= 1.1122 (max= 1.4914), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:28,329 - root - INFO - Step 36970: lr=6.56E-06, loss= 1.1122 (max= 1.4914), tps=20583, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:44,244 - root - INFO - Step 36980: lr=6.42E-06, loss= 1.1483 (max= 1.7950), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
02:59:44,244 - root - INFO - Step 36980: lr=6.42E-06, loss= 1.1483 (max= 1.7950), tps=20593, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:44,244 - root - INFO - Step 36980: lr=6.42E-06, loss= 1.1483 (max= 1.7950), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:44,244 - root - INFO - Step 36980: lr=6.42E-06, loss= 1.1483 (max= 1.7950), tps=20592, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:44,244 - root - INFO - Step 36980: lr=6.42E-06, loss= 1.1483 (max= 1.7950), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:44,244 - root - INFO - Step 36980: lr=6.42E-06, loss= 1.1483 (max= 1.7950), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:44,244 - root - INFO - Step 36980: lr=6.42E-06, loss= 1.1483 (max= 1.7950), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 02:59:44,245 - root - INFO - Step 36980: lr=6.42E-06, loss= 1.1483 (max= 1.7950), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:00,199 - root - INFO - Step 36990: lr=6.29E-06, loss= 1.1406 (max= 1.6053), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:00,199 - root - INFO - Step 36990: lr=6.29E-06, loss= 1.1406 (max= 1.6053), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:00,200 - root - INFO - Step 36990: lr=6.29E-06, loss= 1.1406 (max= 1.6053), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:00,200 - root - INFO - Step 36990: lr=6.29E-06, loss= 1.1406 (max= 1.6053), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:00,200 - root - INFO - Step 36990: lr=6.29E-06, loss= 1.1406 (max= 1.6053), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:00,200 - root - INFO - Step 36990: lr=6.29E-06, loss= 1.1406 (max= 1.6053), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:00,200 - root - INFO - Step 36990: lr=6.29E-06, loss= 1.1406 (max= 1.6053), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:00,200 - root - INFO - Step 36990: lr=6.29E-06, loss= 1.1406 (max= 1.6053), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +Saving dataset to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-37000 +Dataset successfully saved to jobs/munin-7b-open-stage1/checkpoints/dataloader/step-37000! Save time: 4.506304979324341 +2025-10-25 03:00:16,119 - root - INFO - Step 37000: lr=6.15E-06, loss= 1.1053 (max= 1.4003), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:16,119 - root - INFO - Step 37000: lr=6.15E-06, loss= 1.1053 (max= 1.4003), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:16,119 - root - INFO - Saving a full checkpoint at step 37000 +2025-10-25 03:00:16,119 - root - INFO - Step 37000: lr=6.15E-06, loss= 1.1053 (max= 1.4003), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:16,119 - root - INFO - Saving a full checkpoint at step 37000 +2025-10-25 03:00:16,119 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 03:00:16,119 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) 
+2025-10-25 03:00:16,119 - root - INFO - Saving a full checkpoint at step 37000 +2025-10-25 03:00:16,119 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 03:00:16,119 - root - INFO - Step 37000: lr=6.15E-06, loss= 1.1053 (max= 1.4003), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:16,119 - root - INFO - Saving a full checkpoint at step 37000 +2025-10-25 03:00:16,119 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 03:00:16,119 - root - INFO - Step 37000: lr=6.15E-06, loss= 1.1053 (max= 1.4003), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:16,119 - root - INFO - Step 37000: lr=6.15E-06, loss= 1.1053 (max= 1.4003), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:16,119 - root - INFO - Step 37000: lr=6.15E-06, loss= 1.1053 (max= 1.4003), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:16,119 - root - INFO - Saving a full checkpoint at step 37000 +2025-10-25 03:00:16,119 - root - INFO - Step 37000: lr=6.15E-06, loss= 1.1053 (max= 1.4003), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:00:16,119 - root - INFO - Saving a full checkpoint at step 37000 +2025-10-25 03:00:16,119 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 03:00:16,119 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 03:00:16,119 - root - INFO - Saving a full checkpoint at step 37000 +2025-10-25 03:00:16,119 - root - INFO - Saving a full checkpoint at step 37000 +2025-10-25 
03:00:16,120 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 03:00:16,120 - root - INFO - CheckpointManager: State dict keys: dict_keys(['train_state', 'model', 'optimizer', 'lr_scheduler']) +2025-10-25 03:00:30,547 - root - INFO - Finished saving the checkpoint in 14.43 seconds +2025-10-25 03:00:30,554 - root - INFO - Finished saving the checkpoint in 14.43 seconds +2025-10-25 03:00:30,554 - root - INFO - Finished saving the checkpoint in 14.43 seconds +2025-10-25 03:00:30,554 - root - INFO - Finished saving the checkpoint in 14.43 seconds +2025-10-25 03:00:30,555 - root - INFO - Finished saving the checkpoint in 14.44 seconds +2025-10-25 03:00:30,556 - root - INFO - Finished saving the checkpoint in 14.44 seconds +2025-10-25 03:00:30,556 - root - INFO - Finished saving the checkpoint in 14.44 seconds +2025-10-25 03:00:30,556 - root - INFO - Finished saving the checkpoint in 14.44 seconds +2025-10-25 03:00:46,419 - root - INFO - Step 37010: lr=6.03E-06, loss= 1.1356 (max= 1.7499), tps=10816, mfu=22.54%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:00:46,419 - root - INFO - Step 37010: lr=6.03E-06, loss= 1.1356 (max= 1.7499), tps=10816, mfu=22.54%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:00:46,419 - root - INFO - Step 37010: lr=6.03E-06, loss= 1.1356 (max= 1.7499), tps=10816, mfu=22.53%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:00:46,419 - root - INFO - Step 37010: lr=6.03E-06, loss= 1.1356 (max= 1.7499), tps=10816, mfu=22.53%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:00:46,419 - root - INFO - Step 37010: lr=6.03E-06, loss= 1.1356 (max= 1.7499), tps=10816, mfu=22.54%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:00:46,419 - root - INFO - Step 37010: lr=6.03E-06, loss= 1.1356 (max= 
1.7499), tps=10816, mfu=22.54%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:00:46,419 - root - INFO - Step 37010: lr=6.03E-06, loss= 1.1356 (max= 1.7499), tps=10816, mfu=22.54%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:00:46,419 - root - INFO - Step 37010: lr=6.03E-06, loss= 1.1356 (max= 1.7499), tps=10816, mfu=22.54%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:02,366 - root - INFO - Step 37020: lr=5.90E-06, loss= 1.1399 (max= 1.4989), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:02,366 - root - INFO - Step 37020: lr=5.90E-06, loss= 1.1399 (max= 1.4989), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:02,366 - root - INFO - Step 37020: lr=5.90E-06, loss= 1.1399 (max= 1.4989), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:02,366 - root - INFO - Step 37020: lr=5.90E-06, loss= 1.1399 (max= 1.4989), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:02,367 - root - INFO - Step 37020: lr=5.90E-06, loss= 1.1399 (max= 1.4989), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:02,367 - root - INFO - Step 37020: lr=5.90E-06, loss= 1.1399 (max= 1.4989), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:02,367 - root - INFO - Step 37020: lr=5.90E-06, loss= 1.1399 (max= 1.4989), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:02,367 - root - INFO - Step 37020: lr=5.90E-06, loss= 1.1399 (max= 1.4989), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:18,309 - root - INFO 
- Step 37030: lr=5.78E-06, loss= 1.1399 (max= 1.5219), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:01:18,309 - root - INFO - Step 37030: lr=5.78E-06, loss= 1.1399 (max= 1.5219), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:01:18,309 - root - INFO - Step 37030: lr=5.78E-06, loss= 1.1399 (max= 1.5219), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:01:18,309 - root - INFO - Step 37030: lr=5.78E-06, loss= 1.1399 (max= 1.5219), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:01:18,309 - root - INFO - Step 37030: lr=5.78E-06, loss= 1.1399 (max= 1.5219), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:01:18,309 - root - INFO - Step 37030: lr=5.78E-06, loss= 1.1399 (max= 1.5219), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:01:18,309 - root - INFO - Step 37030: lr=5.78E-06, loss= 1.1399 (max= 1.5219), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:01:18,309 - root - INFO - Step 37030: lr=5.78E-06, loss= 1.1399 (max= 1.5219), tps=20558, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:01:34,249 - root - INFO - Step 37040: lr=5.66E-06, loss= 1.1828 (max= 2.0336), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:34,249 - root - INFO - Step 37040: lr=5.66E-06, loss= 1.1828 (max= 2.0336), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:34,249 - root - INFO - Step 37040: lr=5.66E-06, loss= 1.1828 (max= 2.0336), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 
0.00%) +2025-10-25 03:01:34,249 - root - INFO - Step 37040: lr=5.66E-06, loss= 1.1828 (max= 2.0336), tps=20560, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:34,249 - root - INFO - Step 37040: lr=5.66E-06, loss= 1.1828 (max= 2.0336), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:34,249 - root - INFO - Step 37040: lr=5.66E-06, loss= 1.1828 (max= 2.0336), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:34,249 - root - INFO - Step 37040: lr=5.66E-06, loss= 1.1828 (max= 2.0336), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:34,249 - root - INFO - Step 37040: lr=5.66E-06, loss= 1.1828 (max= 2.0336), tps=20561, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:50,213 - root - INFO - Step 37050: lr=5.55E-06, loss= 1.1441 (max= 1.6727), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:50,213 - root - INFO - Step 37050: lr=5.55E-06, loss= 1.1441 (max= 1.6727), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:50,213 - root - INFO - Step 37050: lr=5.55E-06, loss= 1.1441 (max= 1.6727), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:50,213 - root - INFO - Step 37050: lr=5.55E-06, loss= 1.1441 (max= 1.6727), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:50,213 - root - INFO - Step 37050: lr=5.55E-06, loss= 1.1441 (max= 1.6727), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:50,213 - root - INFO - Step 37050: lr=5.55E-06, loss= 1.1441 (max= 1.6727), tps=20531, mfu=42.78%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:50,213 - root - INFO - Step 37050: lr=5.55E-06, loss= 1.1441 (max= 1.6727), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:01:50,213 - root - INFO - Step 37050: lr=5.55E-06, loss= 1.1441 (max= 1.6727), tps=20531, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:02:06,150 - root - INFO - Step 37060: lr=5.44E-06, loss= 1.2005 (max= 1.6384), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:02:06,150 - root - INFO - Step 37060: lr=5.44E-06, loss= 1.2005 (max= 1.6384), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:02:06,150 - root - INFO - Step 37060: lr=5.44E-06, loss= 1.2005 (max= 1.6384), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:02:06,151 - root - INFO - Step 37060: lr=5.44E-06, loss= 1.2005 (max= 1.6384), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:02:06,151 - root - INFO - Step 37060: lr=5.44E-06, loss= 1.2005 (max= 1.6384), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:02:06,151 - root - INFO - Step 37060: lr=5.44E-06, loss= 1.2005 (max= 1.6384), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:02:06,151 - root - INFO - Step 37060: lr=5.44E-06, loss= 1.2005 (max= 1.6384), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:02:06,151 - root - INFO - Step 37060: lr=5.44E-06, loss= 1.2005 (max= 1.6384), tps=20564, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:02:22,132 - root - INFO - Step 37070: lr=5.33E-06, loss= 1.1799 
(max= 1.4905), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:22,132 - root - INFO - Step 37070: lr=5.33E-06, loss= 1.1799 (max= 1.4905), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:22,132 - root - INFO - Step 37070: lr=5.33E-06, loss= 1.1799 (max= 1.4905), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:22,132 - root - INFO - Step 37070: lr=5.33E-06, loss= 1.1799 (max= 1.4905), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:22,132 - root - INFO - Step 37070: lr=5.33E-06, loss= 1.1799 (max= 1.4905), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:22,132 - root - INFO - Step 37070: lr=5.33E-06, loss= 1.1799 (max= 1.4905), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:22,132 - root - INFO - Step 37070: lr=5.33E-06, loss= 1.1799 (max= 1.4905), tps=20509, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:22,132 - root - INFO - Step 37070: lr=5.33E-06, loss= 1.1799 (max= 1.4905), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:38,092 - root - INFO - Step 37080: lr=5.23E-06, loss= 1.1536 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:38,092 - root - INFO - Step 37080: lr=5.23E-06, loss= 1.1536 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:38,092 - root - INFO - Step 37080: lr=5.23E-06, loss= 1.1536 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:38,092 - root 
- INFO - Step 37080: lr=5.23E-06, loss= 1.1536 (max= 1.5490), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:38,092 - root - INFO - Step 37080: lr=5.23E-06, loss= 1.1536 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:38,092 - root - INFO - Step 37080: lr=5.23E-06, loss= 1.1536 (max= 1.5490), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:38,092 - root - INFO - Step 37080: lr=5.23E-06, loss= 1.1536 (max= 1.5490), tps=20535, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:38,092 - root - INFO - Step 37080: lr=5.23E-06, loss= 1.1536 (max= 1.5490), tps=20535, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:54,071 - root - INFO - Step 37090: lr=5.12E-06, loss= 1.1782 (max= 1.6475), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:54,071 - root - INFO - Step 37090: lr=5.12E-06, loss= 1.1782 (max= 1.6475), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:54,071 - root - INFO - Step 37090: lr=5.12E-06, loss= 1.1782 (max= 1.6475), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:54,071 - root - INFO - Step 37090: lr=5.12E-06, loss= 1.1782 (max= 1.6475), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:54,071 - root - INFO - Step 37090: lr=5.12E-06, loss= 1.1782 (max= 1.6475), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:54,071 - root - INFO - Step 37090: lr=5.12E-06, loss= 1.1782 (max= 1.6475), tps=20513, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 03:02:54,071 - root - INFO - Step 37090: lr=5.12E-06, loss= 1.1782 (max= 1.6475), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:02:54,071 - root - INFO - Step 37090: lr=5.12E-06, loss= 1.1782 (max= 1.6475), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:10,004 - root - INFO - Step 37100: lr=5.02E-06, loss= 1.1410 (max= 1.4786), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:10,004 - root - INFO - Step 37100: lr=5.02E-06, loss= 1.1410 (max= 1.4786), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:10,005 - root - INFO - Step 37100: lr=5.02E-06, loss= 1.1410 (max= 1.4786), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:10,005 - root - INFO - Step 37100: lr=5.02E-06, loss= 1.1410 (max= 1.4786), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:10,005 - root - INFO - Step 37100: lr=5.02E-06, loss= 1.1410 (max= 1.4786), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:10,005 - root - INFO - Step 37100: lr=5.02E-06, loss= 1.1410 (max= 1.4786), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:10,005 - root - INFO - Step 37100: lr=5.02E-06, loss= 1.1410 (max= 1.4786), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:10,005 - root - INFO - Step 37100: lr=5.02E-06, loss= 1.1410 (max= 1.4786), tps=20570, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:25,979 - root - INFO - Step 37110: lr=4.92E-06, loss= 1.1894 (max= 1.6251), tps=20518, mfu=42.75%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:25,979 - root - INFO - Step 37110: lr=4.92E-06, loss= 1.1894 (max= 1.6251), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:25,979 - root - INFO - Step 37110: lr=4.92E-06, loss= 1.1894 (max= 1.6251), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:25,979 - root - INFO - Step 37110: lr=4.92E-06, loss= 1.1894 (max= 1.6251), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:25,979 - root - INFO - Step 37110: lr=4.92E-06, loss= 1.1894 (max= 1.6251), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:25,979 - root - INFO - Step 37110: lr=4.92E-06, loss= 1.1894 (max= 1.6251), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:25,979 - root - INFO - Step 37110: lr=4.92E-06, loss= 1.1894 (max= 1.6251), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:25,979 - root - INFO - Step 37110: lr=4.92E-06, loss= 1.1894 (max= 1.6251), tps=20517, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:41,916 - root - INFO - Step 37120: lr=4.82E-06, loss= 1.1687 (max= 1.5018), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:41,916 - root - INFO - Step 37120: lr=4.82E-06, loss= 1.1687 (max= 1.5018), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:41,916 - root - INFO - Step 37120: lr=4.82E-06, loss= 1.1687 (max= 1.5018), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:41,916 - root - INFO - Step 37120: lr=4.82E-06, 
loss= 1.1687 (max= 1.5018), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:41,916 - root - INFO - Step 37120: lr=4.82E-06, loss= 1.1687 (max= 1.5018), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:41,916 - root - INFO - Step 37120: lr=4.82E-06, loss= 1.1687 (max= 1.5018), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:41,916 - root - INFO - Step 37120: lr=4.82E-06, loss= 1.1687 (max= 1.5018), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:41,916 - root - INFO - Step 37120: lr=4.82E-06, loss= 1.1687 (max= 1.5018), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:03:57,837 - root - INFO - Step 37130: lr=4.73E-06, loss= 1.1707 (max= 1.9839), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:57,837 - root - INFO - Step 37130: lr=4.73E-06, loss= 1.1707 (max= 1.9839), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:57,837 - root - INFO - Step 37130: lr=4.73E-06, loss= 1.1707 (max= 1.9839), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:57,837 - root - INFO - Step 37130: lr=4.73E-06, loss= 1.1707 (max= 1.9839), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:57,837 - root - INFO - Step 37130: lr=4.73E-06, loss= 1.1707 (max= 1.9839), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:57,837 - root - INFO - Step 37130: lr=4.73E-06, loss= 1.1707 (max= 1.9839), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
03:03:57,837 - root - INFO - Step 37130: lr=4.73E-06, loss= 1.1707 (max= 1.9839), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:03:57,837 - root - INFO - Step 37130: lr=4.73E-06, loss= 1.1707 (max= 1.9839), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:13,720 - root - INFO - Step 37140: lr=4.63E-06, loss= 1.1686 (max= 1.5848), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:13,720 - root - INFO - Step 37140: lr=4.63E-06, loss= 1.1686 (max= 1.5848), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:13,720 - root - INFO - Step 37140: lr=4.63E-06, loss= 1.1686 (max= 1.5848), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:13,720 - root - INFO - Step 37140: lr=4.63E-06, loss= 1.1686 (max= 1.5848), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:13,720 - root - INFO - Step 37140: lr=4.63E-06, loss= 1.1686 (max= 1.5848), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:13,720 - root - INFO - Step 37140: lr=4.63E-06, loss= 1.1686 (max= 1.5848), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:13,721 - root - INFO - Step 37140: lr=4.63E-06, loss= 1.1686 (max= 1.5848), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:13,721 - root - INFO - Step 37140: lr=4.63E-06, loss= 1.1686 (max= 1.5848), tps=20635, mfu=42.99%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:29,693 - root - INFO - Step 37150: lr=4.54E-06, loss= 1.1023 (max= 1.4087), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:04:29,693 - root - INFO - Step 37150: lr=4.54E-06, loss= 1.1023 (max= 1.4087), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:04:29,693 - root - INFO - Step 37150: lr=4.54E-06, loss= 1.1023 (max= 1.4087), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:04:29,693 - root - INFO - Step 37150: lr=4.54E-06, loss= 1.1023 (max= 1.4087), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:04:29,693 - root - INFO - Step 37150: lr=4.54E-06, loss= 1.1023 (max= 1.4087), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:04:29,693 - root - INFO - Step 37150: lr=4.54E-06, loss= 1.1023 (max= 1.4087), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:04:29,693 - root - INFO - Step 37150: lr=4.54E-06, loss= 1.1023 (max= 1.4087), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:04:29,693 - root - INFO - Step 37150: lr=4.54E-06, loss= 1.1023 (max= 1.4087), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:04:45,638 - root - INFO - Step 37160: lr=4.45E-06, loss= 1.1808 (max= 1.5763), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:45,638 - root - INFO - Step 37160: lr=4.45E-06, loss= 1.1808 (max= 1.5763), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:45,638 - root - INFO - Step 37160: lr=4.45E-06, loss= 1.1808 (max= 1.5763), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:45,638 - root - INFO - Step 37160: lr=4.45E-06, loss= 1.1808 (max= 1.5763), 
tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:45,638 - root - INFO - Step 37160: lr=4.45E-06, loss= 1.1808 (max= 1.5763), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:45,638 - root - INFO - Step 37160: lr=4.45E-06, loss= 1.1808 (max= 1.5763), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:45,638 - root - INFO - Step 37160: lr=4.45E-06, loss= 1.1808 (max= 1.5763), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:04:45,638 - root - INFO - Step 37160: lr=4.45E-06, loss= 1.1808 (max= 1.5763), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:05:01,566 - root - INFO - Step 37170: lr=4.36E-06, loss= 1.1523 (max= 1.6139), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:01,566 - root - INFO - Step 37170: lr=4.36E-06, loss= 1.1523 (max= 1.6139), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:01,566 - root - INFO - Step 37170: lr=4.36E-06, loss= 1.1523 (max= 1.6139), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:01,566 - root - INFO - Step 37170: lr=4.36E-06, loss= 1.1523 (max= 1.6139), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:01,566 - root - INFO - Step 37170: lr=4.36E-06, loss= 1.1523 (max= 1.6139), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:01,566 - root - INFO - Step 37170: lr=4.36E-06, loss= 1.1523 (max= 1.6139), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:01,567 - root - INFO - Step 
37170: lr=4.36E-06, loss= 1.1523 (max= 1.6139), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:01,567 - root - INFO - Step 37170: lr=4.36E-06, loss= 1.1523 (max= 1.6139), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:17,434 - root - INFO - Step 37180: lr=4.27E-06, loss= 1.1349 (max= 1.4843), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:17,434 - root - INFO - Step 37180: lr=4.27E-06, loss= 1.1349 (max= 1.4843), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:17,434 - root - INFO - Step 37180: lr=4.27E-06, loss= 1.1349 (max= 1.4843), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:17,434 - root - INFO - Step 37180: lr=4.27E-06, loss= 1.1349 (max= 1.4843), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:17,434 - root - INFO - Step 37180: lr=4.27E-06, loss= 1.1349 (max= 1.4843), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:17,434 - root - INFO - Step 37180: lr=4.27E-06, loss= 1.1349 (max= 1.4843), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:17,434 - root - INFO - Step 37180: lr=4.27E-06, loss= 1.1349 (max= 1.4843), tps=20655, mfu=43.03%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:17,434 - root - INFO - Step 37180: lr=4.27E-06, loss= 1.1349 (max= 1.4843), tps=20655, mfu=43.04%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:33,407 - root - INFO - Step 37190: lr=4.19E-06, loss= 1.1488 (max= 1.5738), tps=20518, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 03:05:33,407 - root - INFO - Step 37190: lr=4.19E-06, loss= 1.1488 (max= 1.5738), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:05:33,407 - root - INFO - Step 37190: lr=4.19E-06, loss= 1.1488 (max= 1.5738), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:05:33,407 - root - INFO - Step 37190: lr=4.19E-06, loss= 1.1488 (max= 1.5738), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:05:33,407 - root - INFO - Step 37190: lr=4.19E-06, loss= 1.1488 (max= 1.5738), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:05:33,407 - root - INFO - Step 37190: lr=4.19E-06, loss= 1.1488 (max= 1.5738), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:05:33,407 - root - INFO - Step 37190: lr=4.19E-06, loss= 1.1488 (max= 1.5738), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:05:33,407 - root - INFO - Step 37190: lr=4.19E-06, loss= 1.1488 (max= 1.5738), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:05:49,300 - root - INFO - Step 37200: lr=4.10E-06, loss= 1.1652 (max= 1.6609), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:49,300 - root - INFO - Step 37200: lr=4.10E-06, loss= 1.1652 (max= 1.6609), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:49,300 - root - INFO - Step 37200: lr=4.10E-06, loss= 1.1652 (max= 1.6609), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:49,300 - root - INFO - Step 37200: lr=4.10E-06, loss= 1.1652 (max= 1.6609), tps=20622, mfu=42.97%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:49,300 - root - INFO - Step 37200: lr=4.10E-06, loss= 1.1652 (max= 1.6609), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:49,300 - root - INFO - Step 37200: lr=4.10E-06, loss= 1.1652 (max= 1.6609), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:49,300 - root - INFO - Step 37200: lr=4.10E-06, loss= 1.1652 (max= 1.6609), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:05:49,301 - root - INFO - Step 37200: lr=4.10E-06, loss= 1.1652 (max= 1.6609), tps=20622, mfu=42.97%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:06:05,221 - root - INFO - Step 37210: lr=4.02E-06, loss= 1.1547 (max= 1.5780), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:05,221 - root - INFO - Step 37210: lr=4.02E-06, loss= 1.1547 (max= 1.5780), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:05,221 - root - INFO - Step 37210: lr=4.02E-06, loss= 1.1547 (max= 1.5780), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:05,221 - root - INFO - Step 37210: lr=4.02E-06, loss= 1.1547 (max= 1.5780), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:05,221 - root - INFO - Step 37210: lr=4.02E-06, loss= 1.1547 (max= 1.5780), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:05,221 - root - INFO - Step 37210: lr=4.02E-06, loss= 1.1547 (max= 1.5780), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:05,221 - root - INFO - Step 37210: lr=4.02E-06, loss= 1.1547 
(max= 1.5780), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:05,221 - root - INFO - Step 37210: lr=4.02E-06, loss= 1.1547 (max= 1.5780), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:21,214 - root - INFO - Step 37220: lr=3.93E-06, loss= 1.1803 (max= 1.5328), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:21,214 - root - INFO - Step 37220: lr=3.93E-06, loss= 1.1803 (max= 1.5328), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:21,214 - root - INFO - Step 37220: lr=3.93E-06, loss= 1.1803 (max= 1.5328), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:21,214 - root - INFO - Step 37220: lr=3.93E-06, loss= 1.1803 (max= 1.5328), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:21,214 - root - INFO - Step 37220: lr=3.93E-06, loss= 1.1803 (max= 1.5328), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:21,214 - root - INFO - Step 37220: lr=3.93E-06, loss= 1.1803 (max= 1.5328), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:21,214 - root - INFO - Step 37220: lr=3.93E-06, loss= 1.1803 (max= 1.5328), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:21,214 - root - INFO - Step 37220: lr=3.93E-06, loss= 1.1803 (max= 1.5328), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:37,134 - root - INFO - Step 37230: lr=3.85E-06, loss= 1.1528 (max= 1.5126), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:37,134 - root 
- INFO - Step 37230: lr=3.85E-06, loss= 1.1528 (max= 1.5126), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:37,134 - root - INFO - Step 37230: lr=3.85E-06, loss= 1.1528 (max= 1.5126), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:37,134 - root - INFO - Step 37230: lr=3.85E-06, loss= 1.1528 (max= 1.5126), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:37,134 - root - INFO - Step 37230: lr=3.85E-06, loss= 1.1528 (max= 1.5126), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:37,134 - root - INFO - Step 37230: lr=3.85E-06, loss= 1.1528 (max= 1.5126), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:37,134 - root - INFO - Step 37230: lr=3.85E-06, loss= 1.1528 (max= 1.5126), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:37,134 - root - INFO - Step 37230: lr=3.85E-06, loss= 1.1528 (max= 1.5126), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:53,071 - root - INFO - Step 37240: lr=3.77E-06, loss= 1.1516 (max= 1.5175), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:53,071 - root - INFO - Step 37240: lr=3.77E-06, loss= 1.1516 (max= 1.5175), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:53,071 - root - INFO - Step 37240: lr=3.77E-06, loss= 1.1516 (max= 1.5175), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:53,071 - root - INFO - Step 37240: lr=3.77E-06, loss= 1.1516 (max= 1.5175), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 03:06:53,072 - root - INFO - Step 37240: lr=3.77E-06, loss= 1.1516 (max= 1.5175), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:53,072 - root - INFO - Step 37240: lr=3.77E-06, loss= 1.1516 (max= 1.5175), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:53,072 - root - INFO - Step 37240: lr=3.77E-06, loss= 1.1516 (max= 1.5175), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:06:53,072 - root - INFO - Step 37240: lr=3.77E-06, loss= 1.1516 (max= 1.5175), tps=20565, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:08,985 - root - INFO - Step 37250: lr=3.69E-06, loss= 1.1190 (max= 1.5857), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:08,985 - root - INFO - Step 37250: lr=3.69E-06, loss= 1.1190 (max= 1.5857), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:08,985 - root - INFO - Step 37250: lr=3.69E-06, loss= 1.1190 (max= 1.5857), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:08,985 - root - INFO - Step 37250: lr=3.69E-06, loss= 1.1190 (max= 1.5857), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:08,985 - root - INFO - Step 37250: lr=3.69E-06, loss= 1.1190 (max= 1.5857), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:08,985 - root - INFO - Step 37250: lr=3.69E-06, loss= 1.1190 (max= 1.5857), tps=20596, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:08,985 - root - INFO - Step 37250: lr=3.69E-06, loss= 1.1190 (max= 1.5857), tps=20596, mfu=42.91%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:08,985 - root - INFO - Step 37250: lr=3.69E-06, loss= 1.1190 (max= 1.5857), tps=20595, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:24,970 - root - INFO - Step 37260: lr=3.61E-06, loss= 1.1771 (max= 1.5048), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:24,970 - root - INFO - Step 37260: lr=3.61E-06, loss= 1.1771 (max= 1.5048), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:24,970 - root - INFO - Step 37260: lr=3.61E-06, loss= 1.1771 (max= 1.5048), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:24,970 - root - INFO - Step 37260: lr=3.61E-06, loss= 1.1771 (max= 1.5048), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:24,970 - root - INFO - Step 37260: lr=3.61E-06, loss= 1.1771 (max= 1.5048), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:24,970 - root - INFO - Step 37260: lr=3.61E-06, loss= 1.1771 (max= 1.5048), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:24,970 - root - INFO - Step 37260: lr=3.61E-06, loss= 1.1771 (max= 1.5048), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:24,971 - root - INFO - Step 37260: lr=3.61E-06, loss= 1.1771 (max= 1.5048), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:40,885 - root - INFO - Step 37270: lr=3.53E-06, loss= 1.1628 (max= 1.5899), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:40,885 - root - INFO - Step 37270: lr=3.53E-06, 
loss= 1.1628 (max= 1.5899), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:40,885 - root - INFO - Step 37270: lr=3.53E-06, loss= 1.1628 (max= 1.5899), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:40,885 - root - INFO - Step 37270: lr=3.53E-06, loss= 1.1628 (max= 1.5899), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:40,885 - root - INFO - Step 37270: lr=3.53E-06, loss= 1.1628 (max= 1.5899), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:40,885 - root - INFO - Step 37270: lr=3.53E-06, loss= 1.1628 (max= 1.5899), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:40,885 - root - INFO - Step 37270: lr=3.53E-06, loss= 1.1628 (max= 1.5899), tps=20594, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:40,885 - root - INFO - Step 37270: lr=3.53E-06, loss= 1.1628 (max= 1.5899), tps=20593, mfu=42.91%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:07:56,865 - root - INFO - Step 37280: lr=3.46E-06, loss= 1.1492 (max= 1.5332), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:07:56,865 - root - INFO - Step 37280: lr=3.46E-06, loss= 1.1492 (max= 1.5332), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:07:56,865 - root - INFO - Step 37280: lr=3.46E-06, loss= 1.1492 (max= 1.5332), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:07:56,865 - root - INFO - Step 37280: lr=3.46E-06, loss= 1.1492 (max= 1.5332), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
03:07:56,865 - root - INFO - Step 37280: lr=3.46E-06, loss= 1.1492 (max= 1.5332), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:07:56,865 - root - INFO - Step 37280: lr=3.46E-06, loss= 1.1492 (max= 1.5332), tps=20512, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:07:56,865 - root - INFO - Step 37280: lr=3.46E-06, loss= 1.1492 (max= 1.5332), tps=20511, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:07:56,865 - root - INFO - Step 37280: lr=3.46E-06, loss= 1.1492 (max= 1.5332), tps=20510, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:08:12,768 - root - INFO - Step 37290: lr=3.38E-06, loss= 1.1404 (max= 1.4365), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:12,768 - root - INFO - Step 37290: lr=3.38E-06, loss= 1.1404 (max= 1.4365), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:12,768 - root - INFO - Step 37290: lr=3.38E-06, loss= 1.1404 (max= 1.4365), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:12,768 - root - INFO - Step 37290: lr=3.38E-06, loss= 1.1404 (max= 1.4365), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:12,768 - root - INFO - Step 37290: lr=3.38E-06, loss= 1.1404 (max= 1.4365), tps=20610, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:12,768 - root - INFO - Step 37290: lr=3.38E-06, loss= 1.1404 (max= 1.4365), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:12,768 - root - INFO - Step 37290: lr=3.38E-06, loss= 1.1404 (max= 1.4365), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:12,768 - root - INFO - Step 37290: lr=3.38E-06, loss= 1.1404 (max= 1.4365), tps=20609, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:28,713 - root - INFO - Step 37300: lr=3.31E-06, loss= 1.1161 (max= 1.4457), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:28,713 - root - INFO - Step 37300: lr=3.31E-06, loss= 1.1161 (max= 1.4457), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:28,713 - root - INFO - Step 37300: lr=3.31E-06, loss= 1.1161 (max= 1.4457), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:28,713 - root - INFO - Step 37300: lr=3.31E-06, loss= 1.1161 (max= 1.4457), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:28,713 - root - INFO - Step 37300: lr=3.31E-06, loss= 1.1161 (max= 1.4457), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:28,713 - root - INFO - Step 37300: lr=3.31E-06, loss= 1.1161 (max= 1.4457), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:28,713 - root - INFO - Step 37300: lr=3.31E-06, loss= 1.1161 (max= 1.4457), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:28,713 - root - INFO - Step 37300: lr=3.31E-06, loss= 1.1161 (max= 1.4457), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:08:44,647 - root - INFO - Step 37310: lr=3.23E-06, loss= 1.1308 (max= 1.5094), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:08:44,647 - root - INFO - Step 37310: lr=3.23E-06, loss= 1.1308 (max= 1.5094), 
tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:08:44,647 - root - INFO - Step 37310: lr=3.23E-06, loss= 1.1308 (max= 1.5094), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:08:44,647 - root - INFO - Step 37310: lr=3.23E-06, loss= 1.1308 (max= 1.5094), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:08:44,647 - root - INFO - Step 37310: lr=3.23E-06, loss= 1.1308 (max= 1.5094), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:08:44,647 - root - INFO - Step 37310: lr=3.23E-06, loss= 1.1308 (max= 1.5094), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:08:44,647 - root - INFO - Step 37310: lr=3.23E-06, loss= 1.1308 (max= 1.5094), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:08:44,647 - root - INFO - Step 37310: lr=3.23E-06, loss= 1.1308 (max= 1.5094), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:00,571 - root - INFO - Step 37320: lr=3.16E-06, loss= 1.1460 (max= 1.5423), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:00,571 - root - INFO - Step 37320: lr=3.16E-06, loss= 1.1460 (max= 1.5423), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:00,571 - root - INFO - Step 37320: lr=3.16E-06, loss= 1.1460 (max= 1.5423), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:00,571 - root - INFO - Step 37320: lr=3.16E-06, loss= 1.1460 (max= 1.5423), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:00,571 - root - INFO - Step 
37320: lr=3.16E-06, loss= 1.1460 (max= 1.5423), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:00,572 - root - INFO - Step 37320: lr=3.16E-06, loss= 1.1460 (max= 1.5423), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:00,572 - root - INFO - Step 37320: lr=3.16E-06, loss= 1.1460 (max= 1.5423), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:00,572 - root - INFO - Step 37320: lr=3.16E-06, loss= 1.1460 (max= 1.5423), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:16,491 - root - INFO - Step 37330: lr=3.09E-06, loss= 1.1668 (max= 1.4621), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:16,492 - root - INFO - Step 37330: lr=3.09E-06, loss= 1.1668 (max= 1.4621), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:16,492 - root - INFO - Step 37330: lr=3.09E-06, loss= 1.1668 (max= 1.4621), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:16,492 - root - INFO - Step 37330: lr=3.09E-06, loss= 1.1668 (max= 1.4621), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:16,492 - root - INFO - Step 37330: lr=3.09E-06, loss= 1.1668 (max= 1.4621), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:16,492 - root - INFO - Step 37330: lr=3.09E-06, loss= 1.1668 (max= 1.4621), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:16,492 - root - INFO - Step 37330: lr=3.09E-06, loss= 1.1668 (max= 1.4621), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) 
+2025-10-25 03:09:16,492 - root - INFO - Step 37330: lr=3.09E-06, loss= 1.1668 (max= 1.4621), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:32,427 - root - INFO - Step 37340: lr=3.01E-06, loss= 1.1271 (max= 1.5993), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:32,427 - root - INFO - Step 37340: lr=3.01E-06, loss= 1.1271 (max= 1.5993), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:32,427 - root - INFO - Step 37340: lr=3.01E-06, loss= 1.1271 (max= 1.5993), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:32,427 - root - INFO - Step 37340: lr=3.01E-06, loss= 1.1271 (max= 1.5993), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:32,427 - root - INFO - Step 37340: lr=3.01E-06, loss= 1.1271 (max= 1.5993), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:32,427 - root - INFO - Step 37340: lr=3.01E-06, loss= 1.1271 (max= 1.5993), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:32,427 - root - INFO - Step 37340: lr=3.01E-06, loss= 1.1271 (max= 1.5993), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:32,427 - root - INFO - Step 37340: lr=3.01E-06, loss= 1.1271 (max= 1.5993), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:09:48,354 - root - INFO - Step 37350: lr=2.94E-06, loss= 1.1694 (max= 1.6840), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:48,355 - root - INFO - Step 37350: lr=2.94E-06, loss= 1.1694 (max= 1.6840), tps=20577, mfu=42.87%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:48,355 - root - INFO - Step 37350: lr=2.94E-06, loss= 1.1694 (max= 1.6840), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:48,355 - root - INFO - Step 37350: lr=2.94E-06, loss= 1.1694 (max= 1.6840), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:48,355 - root - INFO - Step 37350: lr=2.94E-06, loss= 1.1694 (max= 1.6840), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:48,355 - root - INFO - Step 37350: lr=2.94E-06, loss= 1.1694 (max= 1.6840), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:48,355 - root - INFO - Step 37350: lr=2.94E-06, loss= 1.1694 (max= 1.6840), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:09:48,355 - root - INFO - Step 37350: lr=2.94E-06, loss= 1.1694 (max= 1.6840), tps=20577, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:10:04,279 - root - INFO - Step 37360: lr=2.87E-06, loss= 1.1416 (max= 1.4537), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:04,279 - root - INFO - Step 37360: lr=2.87E-06, loss= 1.1416 (max= 1.4537), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:04,279 - root - INFO - Step 37360: lr=2.87E-06, loss= 1.1416 (max= 1.4537), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:04,279 - root - INFO - Step 37360: lr=2.87E-06, loss= 1.1416 (max= 1.4537), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:04,279 - root - INFO - Step 37360: lr=2.87E-06, loss= 1.1416 
(max= 1.4537), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:04,279 - root - INFO - Step 37360: lr=2.87E-06, loss= 1.1416 (max= 1.4537), tps=20582, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:04,279 - root - INFO - Step 37360: lr=2.87E-06, loss= 1.1416 (max= 1.4537), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:04,280 - root - INFO - Step 37360: lr=2.87E-06, loss= 1.1416 (max= 1.4537), tps=20581, mfu=42.88%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:20,191 - root - INFO - Step 37370: lr=2.80E-06, loss= 1.1613 (max= 1.5263), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:20,191 - root - INFO - Step 37370: lr=2.80E-06, loss= 1.1613 (max= 1.5263), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:20,191 - root - INFO - Step 37370: lr=2.80E-06, loss= 1.1613 (max= 1.5263), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:20,191 - root - INFO - Step 37370: lr=2.80E-06, loss= 1.1613 (max= 1.5263), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:20,191 - root - INFO - Step 37370: lr=2.80E-06, loss= 1.1613 (max= 1.5263), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:20,191 - root - INFO - Step 37370: lr=2.80E-06, loss= 1.1613 (max= 1.5263), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:20,191 - root - INFO - Step 37370: lr=2.80E-06, loss= 1.1613 (max= 1.5263), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:20,191 - root 
- INFO - Step 37370: lr=2.80E-06, loss= 1.1613 (max= 1.5263), tps=20598, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:36,173 - root - INFO - Step 37380: lr=2.73E-06, loss= 1.1312 (max= 1.5643), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:10:36,173 - root - INFO - Step 37380: lr=2.73E-06, loss= 1.1312 (max= 1.5643), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:10:36,173 - root - INFO - Step 37380: lr=2.73E-06, loss= 1.1312 (max= 1.5643), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:10:36,173 - root - INFO - Step 37380: lr=2.73E-06, loss= 1.1312 (max= 1.5643), tps=20508, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:10:36,173 - root - INFO - Step 37380: lr=2.73E-06, loss= 1.1312 (max= 1.5643), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:10:36,173 - root - INFO - Step 37380: lr=2.73E-06, loss= 1.1312 (max= 1.5643), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:10:36,173 - root - INFO - Step 37380: lr=2.73E-06, loss= 1.1312 (max= 1.5643), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:10:36,173 - root - INFO - Step 37380: lr=2.73E-06, loss= 1.1312 (max= 1.5643), tps=20507, mfu=42.73%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:10:52,078 - root - INFO - Step 37390: lr=2.67E-06, loss= 1.1441 (max= 1.6660), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:52,078 - root - INFO - Step 37390: lr=2.67E-06, loss= 1.1441 (max= 1.6660), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 03:10:52,078 - root - INFO - Step 37390: lr=2.67E-06, loss= 1.1441 (max= 1.6660), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:52,078 - root - INFO - Step 37390: lr=2.67E-06, loss= 1.1441 (max= 1.6660), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:52,078 - root - INFO - Step 37390: lr=2.67E-06, loss= 1.1441 (max= 1.6660), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:52,078 - root - INFO - Step 37390: lr=2.67E-06, loss= 1.1441 (max= 1.6660), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:52,078 - root - INFO - Step 37390: lr=2.67E-06, loss= 1.1441 (max= 1.6660), tps=20607, mfu=42.93%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:10:52,078 - root - INFO - Step 37390: lr=2.67E-06, loss= 1.1441 (max= 1.6660), tps=20607, mfu=42.94%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:11:07,967 - root - INFO - Step 37400: lr=2.60E-06, loss= 1.1537 (max= 1.4403), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:07,967 - root - INFO - Step 37400: lr=2.60E-06, loss= 1.1537 (max= 1.4403), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:07,967 - root - INFO - Step 37400: lr=2.60E-06, loss= 1.1537 (max= 1.4403), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:07,967 - root - INFO - Step 37400: lr=2.60E-06, loss= 1.1537 (max= 1.4403), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:07,967 - root - INFO - Step 37400: lr=2.60E-06, loss= 1.1537 (max= 1.4403), tps=20627, mfu=42.98%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:07,967 - root - INFO - Step 37400: lr=2.60E-06, loss= 1.1537 (max= 1.4403), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:07,967 - root - INFO - Step 37400: lr=2.60E-06, loss= 1.1537 (max= 1.4403), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:07,967 - root - INFO - Step 37400: lr=2.60E-06, loss= 1.1537 (max= 1.4403), tps=20627, mfu=42.98%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:23,897 - root - INFO - Step 37410: lr=2.53E-06, loss= 1.1593 (max= 1.5834), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:11:23,897 - root - INFO - Step 37410: lr=2.53E-06, loss= 1.1593 (max= 1.5834), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:11:23,897 - root - INFO - Step 37410: lr=2.53E-06, loss= 1.1593 (max= 1.5834), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:11:23,897 - root - INFO - Step 37410: lr=2.53E-06, loss= 1.1593 (max= 1.5834), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:11:23,897 - root - INFO - Step 37410: lr=2.53E-06, loss= 1.1593 (max= 1.5834), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:11:23,897 - root - INFO - Step 37410: lr=2.53E-06, loss= 1.1593 (max= 1.5834), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:11:23,897 - root - INFO - Step 37410: lr=2.53E-06, loss= 1.1593 (max= 1.5834), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:11:23,897 - root - INFO - Step 37410: lr=2.53E-06, 
loss= 1.1593 (max= 1.5834), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:11:39,792 - root - INFO - Step 37420: lr=2.46E-06, loss= 1.1679 (max= 1.6378), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:39,792 - root - INFO - Step 37420: lr=2.46E-06, loss= 1.1679 (max= 1.6378), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:39,792 - root - INFO - Step 37420: lr=2.46E-06, loss= 1.1679 (max= 1.6378), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:39,792 - root - INFO - Step 37420: lr=2.46E-06, loss= 1.1679 (max= 1.6378), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:39,792 - root - INFO - Step 37420: lr=2.46E-06, loss= 1.1679 (max= 1.6378), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:39,792 - root - INFO - Step 37420: lr=2.46E-06, loss= 1.1679 (max= 1.6378), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:39,792 - root - INFO - Step 37420: lr=2.46E-06, loss= 1.1679 (max= 1.6378), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:39,792 - root - INFO - Step 37420: lr=2.46E-06, loss= 1.1679 (max= 1.6378), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:55,723 - root - INFO - Step 37430: lr=2.40E-06, loss= 1.1720 (max= 1.5610), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:55,723 - root - INFO - Step 37430: lr=2.40E-06, loss= 1.1720 (max= 1.5610), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 
03:11:55,723 - root - INFO - Step 37430: lr=2.40E-06, loss= 1.1720 (max= 1.5610), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:55,723 - root - INFO - Step 37430: lr=2.40E-06, loss= 1.1720 (max= 1.5610), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:55,723 - root - INFO - Step 37430: lr=2.40E-06, loss= 1.1720 (max= 1.5610), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:55,723 - root - INFO - Step 37430: lr=2.40E-06, loss= 1.1720 (max= 1.5610), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:55,723 - root - INFO - Step 37430: lr=2.40E-06, loss= 1.1720 (max= 1.5610), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:11:55,723 - root - INFO - Step 37430: lr=2.40E-06, loss= 1.1720 (max= 1.5610), tps=20573, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:12:11,654 - root - INFO - Step 37440: lr=2.33E-06, loss= 1.1098 (max= 1.6009), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:11,654 - root - INFO - Step 37440: lr=2.33E-06, loss= 1.1098 (max= 1.6009), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:11,654 - root - INFO - Step 37440: lr=2.33E-06, loss= 1.1098 (max= 1.6009), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:11,654 - root - INFO - Step 37440: lr=2.33E-06, loss= 1.1098 (max= 1.6009), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:11,655 - root - INFO - Step 37440: lr=2.33E-06, loss= 1.1098 (max= 1.6009), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:11,655 - root - INFO - Step 37440: lr=2.33E-06, loss= 1.1098 (max= 1.6009), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:11,655 - root - INFO - Step 37440: lr=2.33E-06, loss= 1.1098 (max= 1.6009), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:11,655 - root - INFO - Step 37440: lr=2.33E-06, loss= 1.1098 (max= 1.6009), tps=20572, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:27,640 - root - INFO - Step 37450: lr=2.27E-06, loss= 1.1316 (max= 1.4936), tps=20502, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:27,640 - root - INFO - Step 37450: lr=2.27E-06, loss= 1.1316 (max= 1.4936), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:27,640 - root - INFO - Step 37450: lr=2.27E-06, loss= 1.1316 (max= 1.4936), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:27,640 - root - INFO - Step 37450: lr=2.27E-06, loss= 1.1316 (max= 1.4936), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:27,640 - root - INFO - Step 37450: lr=2.27E-06, loss= 1.1316 (max= 1.4936), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:27,640 - root - INFO - Step 37450: lr=2.27E-06, loss= 1.1316 (max= 1.4936), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:27,640 - root - INFO - Step 37450: lr=2.27E-06, loss= 1.1316 (max= 1.4936), tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:27,640 - root - INFO - Step 37450: lr=2.27E-06, loss= 1.1316 (max= 1.4936), 
tps=20503, mfu=42.72%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:43,607 - root - INFO - Step 37460: lr=2.20E-06, loss= 1.1543 (max= 1.7041), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:12:43,607 - root - INFO - Step 37460: lr=2.20E-06, loss= 1.1543 (max= 1.7041), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:12:43,607 - root - INFO - Step 37460: lr=2.20E-06, loss= 1.1543 (max= 1.7041), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:12:43,607 - root - INFO - Step 37460: lr=2.20E-06, loss= 1.1543 (max= 1.7041), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:12:43,607 - root - INFO - Step 37460: lr=2.20E-06, loss= 1.1543 (max= 1.7041), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:12:43,607 - root - INFO - Step 37460: lr=2.20E-06, loss= 1.1543 (max= 1.7041), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:12:43,607 - root - INFO - Step 37460: lr=2.20E-06, loss= 1.1543 (max= 1.7041), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:12:43,608 - root - INFO - Step 37460: lr=2.20E-06, loss= 1.1543 (max= 1.7041), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:12:59,573 - root - INFO - Step 37470: lr=2.14E-06, loss= 1.1561 (max= 1.4460), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:59,573 - root - INFO - Step 37470: lr=2.14E-06, loss= 1.1561 (max= 1.4460), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:59,573 - root - INFO - Step 
37470: lr=2.14E-06, loss= 1.1561 (max= 1.4460), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:59,573 - root - INFO - Step 37470: lr=2.14E-06, loss= 1.1561 (max= 1.4460), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:59,573 - root - INFO - Step 37470: lr=2.14E-06, loss= 1.1561 (max= 1.4460), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:59,573 - root - INFO - Step 37470: lr=2.14E-06, loss= 1.1561 (max= 1.4460), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:59,573 - root - INFO - Step 37470: lr=2.14E-06, loss= 1.1561 (max= 1.4460), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:12:59,574 - root - INFO - Step 37470: lr=2.14E-06, loss= 1.1561 (max= 1.4460), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:15,536 - root - INFO - Step 37480: lr=2.08E-06, loss= 1.1484 (max= 1.6272), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:15,536 - root - INFO - Step 37480: lr=2.08E-06, loss= 1.1484 (max= 1.6272), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:15,536 - root - INFO - Step 37480: lr=2.08E-06, loss= 1.1484 (max= 1.6272), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:15,536 - root - INFO - Step 37480: lr=2.08E-06, loss= 1.1484 (max= 1.6272), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:15,536 - root - INFO - Step 37480: lr=2.08E-06, loss= 1.1484 (max= 1.6272), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 03:13:15,536 - root - INFO - Step 37480: lr=2.08E-06, loss= 1.1484 (max= 1.6272), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:15,536 - root - INFO - Step 37480: lr=2.08E-06, loss= 1.1484 (max= 1.6272), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:15,536 - root - INFO - Step 37480: lr=2.08E-06, loss= 1.1484 (max= 1.6272), tps=20532, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:31,501 - root - INFO - Step 37490: lr=2.01E-06, loss= 1.1744 (max= 1.5561), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:31,501 - root - INFO - Step 37490: lr=2.01E-06, loss= 1.1744 (max= 1.5561), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:31,501 - root - INFO - Step 37490: lr=2.01E-06, loss= 1.1744 (max= 1.5561), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:31,501 - root - INFO - Step 37490: lr=2.01E-06, loss= 1.1744 (max= 1.5561), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:31,501 - root - INFO - Step 37490: lr=2.01E-06, loss= 1.1744 (max= 1.5561), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:31,501 - root - INFO - Step 37490: lr=2.01E-06, loss= 1.1744 (max= 1.5561), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:31,501 - root - INFO - Step 37490: lr=2.01E-06, loss= 1.1744 (max= 1.5561), tps=20529, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:31,502 - root - INFO - Step 37490: lr=2.01E-06, loss= 1.1744 (max= 1.5561), tps=20529, mfu=42.77%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:47,437 - root - INFO - Step 37500: lr=1.95E-06, loss= 1.1838 (max= 1.8517), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:47,437 - root - INFO - Step 37500: lr=1.95E-06, loss= 1.1838 (max= 1.8517), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:47,437 - root - INFO - Step 37500: lr=1.95E-06, loss= 1.1838 (max= 1.8517), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:47,437 - root - INFO - Step 37500: lr=1.95E-06, loss= 1.1838 (max= 1.8517), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:47,437 - root - INFO - Step 37500: lr=1.95E-06, loss= 1.1838 (max= 1.8517), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:47,437 - root - INFO - Step 37500: lr=1.95E-06, loss= 1.1838 (max= 1.8517), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:47,437 - root - INFO - Step 37500: lr=1.95E-06, loss= 1.1838 (max= 1.8517), tps=20568, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:13:47,437 - root - INFO - Step 37500: lr=1.95E-06, loss= 1.1838 (max= 1.8517), tps=20567, mfu=42.85%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:03,392 - root - INFO - Step 37510: lr=1.89E-06, loss= 1.1282 (max= 1.5380), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:03,392 - root - INFO - Step 37510: lr=1.89E-06, loss= 1.1282 (max= 1.5380), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:03,392 - root - INFO - Step 37510: lr=1.89E-06, loss= 1.1282 
(max= 1.5380), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:03,392 - root - INFO - Step 37510: lr=1.89E-06, loss= 1.1282 (max= 1.5380), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:03,392 - root - INFO - Step 37510: lr=1.89E-06, loss= 1.1282 (max= 1.5380), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:03,392 - root - INFO - Step 37510: lr=1.89E-06, loss= 1.1282 (max= 1.5380), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:03,392 - root - INFO - Step 37510: lr=1.89E-06, loss= 1.1282 (max= 1.5380), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:03,392 - root - INFO - Step 37510: lr=1.89E-06, loss= 1.1282 (max= 1.5380), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:19,344 - root - INFO - Step 37520: lr=1.83E-06, loss= 1.1446 (max= 1.6118), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:19,344 - root - INFO - Step 37520: lr=1.83E-06, loss= 1.1446 (max= 1.6118), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:19,344 - root - INFO - Step 37520: lr=1.83E-06, loss= 1.1446 (max= 1.6118), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:19,344 - root - INFO - Step 37520: lr=1.83E-06, loss= 1.1446 (max= 1.6118), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:19,344 - root - INFO - Step 37520: lr=1.83E-06, loss= 1.1446 (max= 1.6118), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:19,344 - root 
- INFO - Step 37520: lr=1.83E-06, loss= 1.1446 (max= 1.6118), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:19,344 - root - INFO - Step 37520: lr=1.83E-06, loss= 1.1446 (max= 1.6118), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:19,345 - root - INFO - Step 37520: lr=1.83E-06, loss= 1.1446 (max= 1.6118), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:35,349 - root - INFO - Step 37530: lr=1.77E-06, loss= 1.1379 (max= 1.4990), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:35,350 - root - INFO - Step 37530: lr=1.77E-06, loss= 1.1379 (max= 1.4990), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:35,350 - root - INFO - Step 37530: lr=1.77E-06, loss= 1.1379 (max= 1.4990), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:35,350 - root - INFO - Step 37530: lr=1.77E-06, loss= 1.1379 (max= 1.4990), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:35,350 - root - INFO - Step 37530: lr=1.77E-06, loss= 1.1379 (max= 1.4990), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:35,350 - root - INFO - Step 37530: lr=1.77E-06, loss= 1.1379 (max= 1.4990), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:35,350 - root - INFO - Step 37530: lr=1.77E-06, loss= 1.1379 (max= 1.4990), tps=20478, mfu=42.67%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:35,350 - root - INFO - Step 37530: lr=1.77E-06, loss= 1.1379 (max= 1.4990), tps=20477, mfu=42.66%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 03:14:51,244 - root - INFO - Step 37540: lr=1.71E-06, loss= 1.1523 (max= 1.5342), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:51,244 - root - INFO - Step 37540: lr=1.71E-06, loss= 1.1523 (max= 1.5342), tps=20620, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:51,244 - root - INFO - Step 37540: lr=1.71E-06, loss= 1.1523 (max= 1.5342), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:51,244 - root - INFO - Step 37540: lr=1.71E-06, loss= 1.1523 (max= 1.5342), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:51,244 - root - INFO - Step 37540: lr=1.71E-06, loss= 1.1523 (max= 1.5342), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:51,244 - root - INFO - Step 37540: lr=1.71E-06, loss= 1.1523 (max= 1.5342), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:51,244 - root - INFO - Step 37540: lr=1.71E-06, loss= 1.1523 (max= 1.5342), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:14:51,244 - root - INFO - Step 37540: lr=1.71E-06, loss= 1.1523 (max= 1.5342), tps=20621, mfu=42.96%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:07,234 - root - INFO - Step 37550: lr=1.65E-06, loss= 1.1482 (max= 1.5662), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:07,234 - root - INFO - Step 37550: lr=1.65E-06, loss= 1.1482 (max= 1.5662), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:07,234 - root - INFO - Step 37550: lr=1.65E-06, loss= 1.1482 (max= 1.5662), tps=20497, mfu=42.71%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:07,234 - root - INFO - Step 37550: lr=1.65E-06, loss= 1.1482 (max= 1.5662), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:07,234 - root - INFO - Step 37550: lr=1.65E-06, loss= 1.1482 (max= 1.5662), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:07,234 - root - INFO - Step 37550: lr=1.65E-06, loss= 1.1482 (max= 1.5662), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:07,234 - root - INFO - Step 37550: lr=1.65E-06, loss= 1.1482 (max= 1.5662), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:07,234 - root - INFO - Step 37550: lr=1.65E-06, loss= 1.1482 (max= 1.5662), tps=20497, mfu=42.71%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:23,153 - root - INFO - Step 37560: lr=1.59E-06, loss= 1.1537 (max= 1.4857), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:23,154 - root - INFO - Step 37560: lr=1.59E-06, loss= 1.1537 (max= 1.4857), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:23,154 - root - INFO - Step 37560: lr=1.59E-06, loss= 1.1537 (max= 1.4857), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:23,154 - root - INFO - Step 37560: lr=1.59E-06, loss= 1.1537 (max= 1.4857), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:23,154 - root - INFO - Step 37560: lr=1.59E-06, loss= 1.1537 (max= 1.4857), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:23,154 - root - INFO - Step 37560: lr=1.59E-06, 
loss= 1.1537 (max= 1.4857), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:23,154 - root - INFO - Step 37560: lr=1.59E-06, loss= 1.1537 (max= 1.4857), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:23,154 - root - INFO - Step 37560: lr=1.59E-06, loss= 1.1537 (max= 1.4857), tps=20588, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:39,100 - root - INFO - Step 37570: lr=1.53E-06, loss= 1.1248 (max= 1.4688), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:39,100 - root - INFO - Step 37570: lr=1.53E-06, loss= 1.1248 (max= 1.4688), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:39,100 - root - INFO - Step 37570: lr=1.53E-06, loss= 1.1248 (max= 1.4688), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:39,100 - root - INFO - Step 37570: lr=1.53E-06, loss= 1.1248 (max= 1.4688), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:39,100 - root - INFO - Step 37570: lr=1.53E-06, loss= 1.1248 (max= 1.4688), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:39,100 - root - INFO - Step 37570: lr=1.53E-06, loss= 1.1248 (max= 1.4688), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:39,100 - root - INFO - Step 37570: lr=1.53E-06, loss= 1.1248 (max= 1.4688), tps=20554, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:39,100 - root - INFO - Step 37570: lr=1.53E-06, loss= 1.1248 (max= 1.4688), tps=20553, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
03:15:55,055 - root - INFO - Step 37580: lr=1.47E-06, loss= 1.1903 (max= 1.5715), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:55,055 - root - INFO - Step 37580: lr=1.47E-06, loss= 1.1903 (max= 1.5715), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:55,055 - root - INFO - Step 37580: lr=1.47E-06, loss= 1.1903 (max= 1.5715), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:55,055 - root - INFO - Step 37580: lr=1.47E-06, loss= 1.1903 (max= 1.5715), tps=20542, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:55,055 - root - INFO - Step 37580: lr=1.47E-06, loss= 1.1903 (max= 1.5715), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:55,055 - root - INFO - Step 37580: lr=1.47E-06, loss= 1.1903 (max= 1.5715), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:55,055 - root - INFO - Step 37580: lr=1.47E-06, loss= 1.1903 (max= 1.5715), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:15:55,055 - root - INFO - Step 37580: lr=1.47E-06, loss= 1.1903 (max= 1.5715), tps=20541, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:10,972 - root - INFO - Step 37590: lr=1.41E-06, loss= 1.1329 (max= 1.5716), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:10,972 - root - INFO - Step 37590: lr=1.41E-06, loss= 1.1329 (max= 1.5716), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:10,973 - root - INFO - Step 37590: lr=1.41E-06, loss= 1.1329 (max= 1.5716), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:10,973 - root - INFO - Step 37590: lr=1.41E-06, loss= 1.1329 (max= 1.5716), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:10,973 - root - INFO - Step 37590: lr=1.41E-06, loss= 1.1329 (max= 1.5716), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:10,973 - root - INFO - Step 37590: lr=1.41E-06, loss= 1.1329 (max= 1.5716), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:10,973 - root - INFO - Step 37590: lr=1.41E-06, loss= 1.1329 (max= 1.5716), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:10,973 - root - INFO - Step 37590: lr=1.41E-06, loss= 1.1329 (max= 1.5716), tps=20591, mfu=42.90%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:26,940 - root - INFO - Step 37600: lr=1.35E-06, loss= 1.1553 (max= 1.6133), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:26,940 - root - INFO - Step 37600: lr=1.35E-06, loss= 1.1553 (max= 1.6133), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:26,940 - root - INFO - Step 37600: lr=1.35E-06, loss= 1.1553 (max= 1.6133), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:26,940 - root - INFO - Step 37600: lr=1.35E-06, loss= 1.1553 (max= 1.6133), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:26,940 - root - INFO - Step 37600: lr=1.35E-06, loss= 1.1553 (max= 1.6133), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:26,940 - root - INFO - Step 37600: lr=1.35E-06, loss= 1.1553 (max= 1.6133), 
tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:26,940 - root - INFO - Step 37600: lr=1.35E-06, loss= 1.1553 (max= 1.6133), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:26,940 - root - INFO - Step 37600: lr=1.35E-06, loss= 1.1553 (max= 1.6133), tps=20526, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:42,893 - root - INFO - Step 37610: lr=1.29E-06, loss= 1.1708 (max= 1.5696), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:42,893 - root - INFO - Step 37610: lr=1.29E-06, loss= 1.1708 (max= 1.5696), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:42,893 - root - INFO - Step 37610: lr=1.29E-06, loss= 1.1708 (max= 1.5696), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:42,893 - root - INFO - Step 37610: lr=1.29E-06, loss= 1.1708 (max= 1.5696), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:42,893 - root - INFO - Step 37610: lr=1.29E-06, loss= 1.1708 (max= 1.5696), tps=20544, mfu=42.80%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:42,893 - root - INFO - Step 37610: lr=1.29E-06, loss= 1.1708 (max= 1.5696), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:42,893 - root - INFO - Step 37610: lr=1.29E-06, loss= 1.1708 (max= 1.5696), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:42,893 - root - INFO - Step 37610: lr=1.29E-06, loss= 1.1708 (max= 1.5696), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:58,823 - root - INFO - Step 
37620: lr=1.24E-06, loss= 1.1654 (max= 1.4868), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:58,823 - root - INFO - Step 37620: lr=1.24E-06, loss= 1.1654 (max= 1.4868), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:58,823 - root - INFO - Step 37620: lr=1.24E-06, loss= 1.1654 (max= 1.4868), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:58,823 - root - INFO - Step 37620: lr=1.24E-06, loss= 1.1654 (max= 1.4868), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:58,823 - root - INFO - Step 37620: lr=1.24E-06, loss= 1.1654 (max= 1.4868), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:58,823 - root - INFO - Step 37620: lr=1.24E-06, loss= 1.1654 (max= 1.4868), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:58,823 - root - INFO - Step 37620: lr=1.24E-06, loss= 1.1654 (max= 1.4868), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:16:58,823 - root - INFO - Step 37620: lr=1.24E-06, loss= 1.1654 (max= 1.4868), tps=20574, mfu=42.87%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:14,762 - root - INFO - Step 37630: lr=1.18E-06, loss= 1.1607 (max= 1.6532), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:14,762 - root - INFO - Step 37630: lr=1.18E-06, loss= 1.1607 (max= 1.6532), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:14,762 - root - INFO - Step 37630: lr=1.18E-06, loss= 1.1607 (max= 1.6532), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 03:17:14,762 - root - INFO - Step 37630: lr=1.18E-06, loss= 1.1607 (max= 1.6532), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:14,762 - root - INFO - Step 37630: lr=1.18E-06, loss= 1.1607 (max= 1.6532), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:14,762 - root - INFO - Step 37630: lr=1.18E-06, loss= 1.1607 (max= 1.6532), tps=20562, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:14,762 - root - INFO - Step 37630: lr=1.18E-06, loss= 1.1607 (max= 1.6532), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:14,762 - root - INFO - Step 37630: lr=1.18E-06, loss= 1.1607 (max= 1.6532), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:30,731 - root - INFO - Step 37640: lr=1.12E-06, loss= 1.1337 (max= 1.6199), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:30,731 - root - INFO - Step 37640: lr=1.12E-06, loss= 1.1337 (max= 1.6199), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:30,731 - root - INFO - Step 37640: lr=1.12E-06, loss= 1.1337 (max= 1.6199), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:30,731 - root - INFO - Step 37640: lr=1.12E-06, loss= 1.1337 (max= 1.6199), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:30,731 - root - INFO - Step 37640: lr=1.12E-06, loss= 1.1337 (max= 1.6199), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:30,731 - root - INFO - Step 37640: lr=1.12E-06, loss= 1.1337 (max= 1.6199), tps=20524, mfu=42.76%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:30,731 - root - INFO - Step 37640: lr=1.12E-06, loss= 1.1337 (max= 1.6199), tps=20525, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:30,731 - root - INFO - Step 37640: lr=1.12E-06, loss= 1.1337 (max= 1.6199), tps=20524, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:46,640 - root - INFO - Step 37650: lr=1.07E-06, loss= 1.1386 (max= 1.5431), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:46,640 - root - INFO - Step 37650: lr=1.07E-06, loss= 1.1386 (max= 1.5431), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:46,640 - root - INFO - Step 37650: lr=1.07E-06, loss= 1.1386 (max= 1.5431), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:46,640 - root - INFO - Step 37650: lr=1.07E-06, loss= 1.1386 (max= 1.5431), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:46,640 - root - INFO - Step 37650: lr=1.07E-06, loss= 1.1386 (max= 1.5431), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:46,640 - root - INFO - Step 37650: lr=1.07E-06, loss= 1.1386 (max= 1.5431), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:46,640 - root - INFO - Step 37650: lr=1.07E-06, loss= 1.1386 (max= 1.5431), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:17:46,640 - root - INFO - Step 37650: lr=1.07E-06, loss= 1.1386 (max= 1.5431), tps=20601, mfu=42.92%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:02,590 - root - INFO - Step 37660: lr=1.01E-06, loss= 1.1349 
(max= 1.7487), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:02,590 - root - INFO - Step 37660: lr=1.01E-06, loss= 1.1349 (max= 1.7487), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:02,590 - root - INFO - Step 37660: lr=1.01E-06, loss= 1.1349 (max= 1.7487), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:02,590 - root - INFO - Step 37660: lr=1.01E-06, loss= 1.1349 (max= 1.7487), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:02,590 - root - INFO - Step 37660: lr=1.01E-06, loss= 1.1349 (max= 1.7487), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:02,590 - root - INFO - Step 37660: lr=1.01E-06, loss= 1.1349 (max= 1.7487), tps=20549, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:02,591 - root - INFO - Step 37660: lr=1.01E-06, loss= 1.1349 (max= 1.7487), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:02,591 - root - INFO - Step 37660: lr=1.01E-06, loss= 1.1349 (max= 1.7487), tps=20548, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:18,548 - root - INFO - Step 37670: lr=9.56E-07, loss= 1.1357 (max= 1.7346), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:18,548 - root - INFO - Step 37670: lr=9.56E-07, loss= 1.1357 (max= 1.7346), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:18,548 - root - INFO - Step 37670: lr=9.56E-07, loss= 1.1357 (max= 1.7346), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:18,548 - root 
- INFO - Step 37670: lr=9.56E-07, loss= 1.1357 (max= 1.7346), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:18,548 - root - INFO - Step 37670: lr=9.56E-07, loss= 1.1357 (max= 1.7346), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:18,548 - root - INFO - Step 37670: lr=9.56E-07, loss= 1.1357 (max= 1.7346), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:18,548 - root - INFO - Step 37670: lr=9.56E-07, loss= 1.1357 (max= 1.7346), tps=20539, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:18,548 - root - INFO - Step 37670: lr=9.56E-07, loss= 1.1357 (max= 1.7346), tps=20538, mfu=42.79%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:34,518 - root - INFO - Step 37680: lr=9.01E-07, loss= 1.1285 (max= 1.5999), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:18:34,518 - root - INFO - Step 37680: lr=9.01E-07, loss= 1.1285 (max= 1.5999), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:18:34,518 - root - INFO - Step 37680: lr=9.01E-07, loss= 1.1285 (max= 1.5999), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:18:34,518 - root - INFO - Step 37680: lr=9.01E-07, loss= 1.1285 (max= 1.5999), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:18:34,518 - root - INFO - Step 37680: lr=9.01E-07, loss= 1.1285 (max= 1.5999), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:18:34,518 - root - INFO - Step 37680: lr=9.01E-07, loss= 1.1285 (max= 1.5999), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.00%) +2025-10-25 03:18:34,518 - root - INFO - Step 37680: lr=9.01E-07, loss= 1.1285 (max= 1.5999), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:18:34,518 - root - INFO - Step 37680: lr=9.01E-07, loss= 1.1285 (max= 1.5999), tps=20523, mfu=42.76%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.00%) +2025-10-25 03:18:50,466 - root - INFO - Step 37690: lr=8.46E-07, loss= 1.1495 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:50,466 - root - INFO - Step 37690: lr=8.46E-07, loss= 1.1495 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:50,466 - root - INFO - Step 37690: lr=8.46E-07, loss= 1.1495 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:50,466 - root - INFO - Step 37690: lr=8.46E-07, loss= 1.1495 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:50,466 - root - INFO - Step 37690: lr=8.46E-07, loss= 1.1495 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:50,466 - root - INFO - Step 37690: lr=8.46E-07, loss= 1.1495 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:50,466 - root - INFO - Step 37690: lr=8.46E-07, loss= 1.1495 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:18:50,466 - root - INFO - Step 37690: lr=8.46E-07, loss= 1.1495 (max= 1.5753), tps=20551, mfu=42.82%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:06,417 - root - INFO - Step 37700: lr=7.91E-07, loss= 1.1507 (max= 1.6606), tps=20547, mfu=42.81%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:06,417 - root - INFO - Step 37700: lr=7.91E-07, loss= 1.1507 (max= 1.6606), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:06,417 - root - INFO - Step 37700: lr=7.91E-07, loss= 1.1507 (max= 1.6606), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:06,417 - root - INFO - Step 37700: lr=7.91E-07, loss= 1.1507 (max= 1.6606), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:06,417 - root - INFO - Step 37700: lr=7.91E-07, loss= 1.1507 (max= 1.6606), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:06,417 - root - INFO - Step 37700: lr=7.91E-07, loss= 1.1507 (max= 1.6606), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:06,417 - root - INFO - Step 37700: lr=7.91E-07, loss= 1.1507 (max= 1.6606), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:06,417 - root - INFO - Step 37700: lr=7.91E-07, loss= 1.1507 (max= 1.6606), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:22,338 - root - INFO - Step 37710: lr=7.37E-07, loss= 1.1542 (max= 1.5266), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:22,338 - root - INFO - Step 37710: lr=7.37E-07, loss= 1.1542 (max= 1.5266), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:22,338 - root - INFO - Step 37710: lr=7.37E-07, loss= 1.1542 (max= 1.5266), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:22,338 - root - INFO - Step 37710: lr=7.37E-07, 
loss= 1.1542 (max= 1.5266), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:22,338 - root - INFO - Step 37710: lr=7.37E-07, loss= 1.1542 (max= 1.5266), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:22,338 - root - INFO - Step 37710: lr=7.37E-07, loss= 1.1542 (max= 1.5266), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:22,338 - root - INFO - Step 37710: lr=7.37E-07, loss= 1.1542 (max= 1.5266), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:22,338 - root - INFO - Step 37710: lr=7.37E-07, loss= 1.1542 (max= 1.5266), tps=20586, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:38,331 - root - INFO - Step 37720: lr=6.83E-07, loss= 1.1263 (max= 1.4568), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:38,331 - root - INFO - Step 37720: lr=6.83E-07, loss= 1.1263 (max= 1.4568), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:38,331 - root - INFO - Step 37720: lr=6.83E-07, loss= 1.1263 (max= 1.4568), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:38,331 - root - INFO - Step 37720: lr=6.83E-07, loss= 1.1263 (max= 1.4568), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:38,331 - root - INFO - Step 37720: lr=6.83E-07, loss= 1.1263 (max= 1.4568), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:38,331 - root - INFO - Step 37720: lr=6.83E-07, loss= 1.1263 (max= 1.4568), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 
03:19:38,331 - root - INFO - Step 37720: lr=6.83E-07, loss= 1.1263 (max= 1.4568), tps=20494, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:38,331 - root - INFO - Step 37720: lr=6.83E-07, loss= 1.1263 (max= 1.4568), tps=20493, mfu=42.70%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:54,254 - root - INFO - Step 37730: lr=6.30E-07, loss= 1.1466 (max= 1.4722), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:54,254 - root - INFO - Step 37730: lr=6.30E-07, loss= 1.1466 (max= 1.4722), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:54,254 - root - INFO - Step 37730: lr=6.30E-07, loss= 1.1466 (max= 1.4722), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:54,254 - root - INFO - Step 37730: lr=6.30E-07, loss= 1.1466 (max= 1.4722), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:54,254 - root - INFO - Step 37730: lr=6.30E-07, loss= 1.1466 (max= 1.4722), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:54,254 - root - INFO - Step 37730: lr=6.30E-07, loss= 1.1466 (max= 1.4722), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:54,254 - root - INFO - Step 37730: lr=6.30E-07, loss= 1.1466 (max= 1.4722), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:19:54,254 - root - INFO - Step 37730: lr=6.30E-07, loss= 1.1466 (max= 1.4722), tps=20584, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:10,205 - root - INFO - Step 37740: lr=5.77E-07, loss= 1.1545 (max= 1.5270), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) 
time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:10,205 - root - INFO - Step 37740: lr=5.77E-07, loss= 1.1545 (max= 1.5270), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:10,205 - root - INFO - Step 37740: lr=5.77E-07, loss= 1.1545 (max= 1.5270), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:10,205 - root - INFO - Step 37740: lr=5.77E-07, loss= 1.1545 (max= 1.5270), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:10,205 - root - INFO - Step 37740: lr=5.77E-07, loss= 1.1545 (max= 1.5270), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:10,205 - root - INFO - Step 37740: lr=5.77E-07, loss= 1.1545 (max= 1.5270), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:10,205 - root - INFO - Step 37740: lr=5.77E-07, loss= 1.1545 (max= 1.5270), tps=20547, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:10,206 - root - INFO - Step 37740: lr=5.77E-07, loss= 1.1545 (max= 1.5270), tps=20546, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:26,182 - root - INFO - Step 37750: lr=5.24E-07, loss= 1.1558 (max= 1.7008), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:26,182 - root - INFO - Step 37750: lr=5.24E-07, loss= 1.1558 (max= 1.7008), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:26,182 - root - INFO - Step 37750: lr=5.24E-07, loss= 1.1558 (max= 1.7008), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:26,182 - root - INFO - Step 37750: lr=5.24E-07, loss= 1.1558 (max= 1.7008), 
tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:26,182 - root - INFO - Step 37750: lr=5.24E-07, loss= 1.1558 (max= 1.7008), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:26,182 - root - INFO - Step 37750: lr=5.24E-07, loss= 1.1558 (max= 1.7008), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:26,182 - root - INFO - Step 37750: lr=5.24E-07, loss= 1.1558 (max= 1.7008), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:26,182 - root - INFO - Step 37750: lr=5.24E-07, loss= 1.1558 (max= 1.7008), tps=20514, mfu=42.74%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:42,148 - root - INFO - Step 37760: lr=4.71E-07, loss= 1.1572 (max= 1.5057), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:42,148 - root - INFO - Step 37760: lr=4.71E-07, loss= 1.1572 (max= 1.5057), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:42,148 - root - INFO - Step 37760: lr=4.71E-07, loss= 1.1572 (max= 1.5057), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:42,148 - root - INFO - Step 37760: lr=4.71E-07, loss= 1.1572 (max= 1.5057), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:42,148 - root - INFO - Step 37760: lr=4.71E-07, loss= 1.1572 (max= 1.5057), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:42,148 - root - INFO - Step 37760: lr=4.71E-07, loss= 1.1572 (max= 1.5057), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:42,148 - root - INFO - Step 
37760: lr=4.71E-07, loss= 1.1572 (max= 1.5057), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:42,148 - root - INFO - Step 37760: lr=4.71E-07, loss= 1.1572 (max= 1.5057), tps=20528, mfu=42.77%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:58,082 - root - INFO - Step 37770: lr=4.19E-07, loss= 1.1556 (max= 1.4949), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:58,082 - root - INFO - Step 37770: lr=4.19E-07, loss= 1.1556 (max= 1.4949), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:58,082 - root - INFO - Step 37770: lr=4.19E-07, loss= 1.1556 (max= 1.4949), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:58,082 - root - INFO - Step 37770: lr=4.19E-07, loss= 1.1556 (max= 1.4949), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:58,082 - root - INFO - Step 37770: lr=4.19E-07, loss= 1.1556 (max= 1.4949), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:58,082 - root - INFO - Step 37770: lr=4.19E-07, loss= 1.1556 (max= 1.4949), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:58,082 - root - INFO - Step 37770: lr=4.19E-07, loss= 1.1556 (max= 1.4949), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:20:58,082 - root - INFO - Step 37770: lr=4.19E-07, loss= 1.1556 (max= 1.4949), tps=20569, mfu=42.86%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:14,021 - root - INFO - Step 37780: lr=3.67E-07, loss= 1.1520 (max= 1.6224), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) 
+2025-10-25 03:21:14,021 - root - INFO - Step 37780: lr=3.67E-07, loss= 1.1520 (max= 1.6224), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:14,021 - root - INFO - Step 37780: lr=3.67E-07, loss= 1.1520 (max= 1.6224), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:14,021 - root - INFO - Step 37780: lr=3.67E-07, loss= 1.1520 (max= 1.6224), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:14,021 - root - INFO - Step 37780: lr=3.67E-07, loss= 1.1520 (max= 1.6224), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:14,021 - root - INFO - Step 37780: lr=3.67E-07, loss= 1.1520 (max= 1.6224), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:14,021 - root - INFO - Step 37780: lr=3.67E-07, loss= 1.1520 (max= 1.6224), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:14,021 - root - INFO - Step 37780: lr=3.67E-07, loss= 1.1520 (max= 1.6224), tps=20563, mfu=42.84%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:29,993 - root - INFO - Step 37790: lr=3.15E-07, loss= 1.1424 (max= 1.5453), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:29,994 - root - INFO - Step 37790: lr=3.15E-07, loss= 1.1424 (max= 1.5453), tps=20519, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:29,994 - root - INFO - Step 37790: lr=3.15E-07, loss= 1.1424 (max= 1.5453), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:29,994 - root - INFO - Step 37790: lr=3.15E-07, loss= 1.1424 (max= 1.5453), tps=20520, mfu=42.75%, memory: 
152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:29,994 - root - INFO - Step 37790: lr=3.15E-07, loss= 1.1424 (max= 1.5453), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:29,994 - root - INFO - Step 37790: lr=3.15E-07, loss= 1.1424 (max= 1.5453), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:29,994 - root - INFO - Step 37790: lr=3.15E-07, loss= 1.1424 (max= 1.5453), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:29,994 - root - INFO - Step 37790: lr=3.15E-07, loss= 1.1424 (max= 1.5453), tps=20520, mfu=42.75%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:45,939 - root - INFO - Step 37800: lr=2.63E-07, loss= 1.1520 (max= 1.5220), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:45,939 - root - INFO - Step 37800: lr=2.63E-07, loss= 1.1520 (max= 1.5220), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:45,939 - root - INFO - Step 37800: lr=2.63E-07, loss= 1.1520 (max= 1.5220), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:45,939 - root - INFO - Step 37800: lr=2.63E-07, loss= 1.1520 (max= 1.5220), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:45,939 - root - INFO - Step 37800: lr=2.63E-07, loss= 1.1520 (max= 1.5220), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:45,939 - root - INFO - Step 37800: lr=2.63E-07, loss= 1.1520 (max= 1.5220), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:45,939 - root - INFO - Step 37800: lr=2.63E-07, loss= 1.1520 
(max= 1.5220), tps=20555, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:45,939 - root - INFO - Step 37800: lr=2.63E-07, loss= 1.1520 (max= 1.5220), tps=20554, mfu=42.83%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:21:51,463 - root - WARNING - Empty document detected at /work/production/data/munin-open-dyna-0-of-1-cp-0-of-16-train/munin-open-dyna-0-of-1-cp-0-of-16-train.parquet:254823 +2025-10-25 03:22:01,900 - root - INFO - Step 37810: lr=2.12E-07, loss= 1.1406 (max= 1.5038), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:01,900 - root - INFO - Step 37810: lr=2.12E-07, loss= 1.1406 (max= 1.5038), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:01,900 - root - INFO - Step 37810: lr=2.12E-07, loss= 1.1406 (max= 1.5038), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:01,901 - root - INFO - Step 37810: lr=2.12E-07, loss= 1.1406 (max= 1.5038), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:01,901 - root - INFO - Step 37810: lr=2.12E-07, loss= 1.1406 (max= 1.5038), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:01,901 - root - INFO - Step 37810: lr=2.12E-07, loss= 1.1406 (max= 1.5038), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:01,901 - root - INFO - Step 37810: lr=2.12E-07, loss= 1.1406 (max= 1.5038), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:01,901 - root - INFO - Step 37810: lr=2.12E-07, loss= 1.1406 (max= 1.5038), tps=20534, mfu=42.78%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:17,853 - 
root - INFO - Step 37820: lr=1.61E-07, loss= 1.1396 (max= 1.5658), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:17,853 - root - INFO - Step 37820: lr=1.61E-07, loss= 1.1396 (max= 1.5658), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:17,853 - root - INFO - Step 37820: lr=1.61E-07, loss= 1.1396 (max= 1.5658), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:17,853 - root - INFO - Step 37820: lr=1.61E-07, loss= 1.1396 (max= 1.5658), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:17,853 - root - INFO - Step 37820: lr=1.61E-07, loss= 1.1396 (max= 1.5658), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:17,853 - root - INFO - Step 37820: lr=1.61E-07, loss= 1.1396 (max= 1.5658), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:17,853 - root - INFO - Step 37820: lr=1.61E-07, loss= 1.1396 (max= 1.5658), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:17,853 - root - INFO - Step 37820: lr=1.61E-07, loss= 1.1396 (max= 1.5658), tps=20545, mfu=42.81%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:33,752 - root - INFO - Step 37830: lr=1.11E-07, loss= 1.1800 (max= 1.6669), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:33,752 - root - INFO - Step 37830: lr=1.11E-07, loss= 1.1800 (max= 1.6669), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:33,752 - root - INFO - Step 37830: lr=1.11E-07, loss= 1.1800 (max= 1.6669), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s 
(max=0.00s, 0.01%) +2025-10-25 03:22:33,752 - root - INFO - Step 37830: lr=1.11E-07, loss= 1.1800 (max= 1.6669), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:33,752 - root - INFO - Step 37830: lr=1.11E-07, loss= 1.1800 (max= 1.6669), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:33,752 - root - INFO - Step 37830: lr=1.11E-07, loss= 1.1800 (max= 1.6669), tps=20614, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:33,753 - root - INFO - Step 37830: lr=1.11E-07, loss= 1.1800 (max= 1.6669), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:33,753 - root - INFO - Step 37830: lr=1.11E-07, loss= 1.1800 (max= 1.6669), tps=20615, mfu=42.95%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:49,632 - root - INFO - Step 37840: lr=6.02E-08, loss= 1.1691 (max= 1.5356), tps=20639, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:49,632 - root - INFO - Step 37840: lr=6.02E-08, loss= 1.1691 (max= 1.5356), tps=20639, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:49,632 - root - INFO - Step 37840: lr=6.02E-08, loss= 1.1691 (max= 1.5356), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:49,632 - root - INFO - Step 37840: lr=6.02E-08, loss= 1.1691 (max= 1.5356), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:49,632 - root - INFO - Step 37840: lr=6.02E-08, loss= 1.1691 (max= 1.5356), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:49,632 - root - INFO - Step 37840: lr=6.02E-08, loss= 1.1691 (max= 1.5356), tps=20639, mfu=43.00%, 
memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:49,632 - root - INFO - Step 37840: lr=6.02E-08, loss= 1.1691 (max= 1.5356), tps=20640, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:22:49,632 - root - INFO - Step 37840: lr=6.02E-08, loss= 1.1691 (max= 1.5356), tps=20639, mfu=43.00%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:23:05,552 - root - INFO - Step 37850: lr=1.00E-08, loss= 1.0880 (max= 1.6204), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:23:05,552 - root - INFO - Step 37850: lr=1.00E-08, loss= 1.0880 (max= 1.6204), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:23:05,552 - root - INFO - Step 37850: lr=1.00E-08, loss= 1.0880 (max= 1.6204), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:23:05,552 - root - INFO - Step 37850: lr=1.00E-08, loss= 1.0880 (max= 1.6204), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:23:05,552 - root - INFO - Step 37850: lr=1.00E-08, loss= 1.0880 (max= 1.6204), tps=20588, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:23:05,552 - root - INFO - Step 37850: lr=1.00E-08, loss= 1.0880 (max= 1.6204), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:23:05,552 - root - INFO - Step 37850: lr=1.00E-08, loss= 1.0880 (max= 1.6204), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:23:05,552 - root - INFO - Step 37850: lr=1.00E-08, loss= 1.0880 (max= 1.6204), tps=20587, mfu=42.89%, memory: 152.90GiB(85.73%) time/data_loading=0.00s (max=0.00s, 0.01%) +2025-10-25 03:23:08,167 - root - INFO - Saving a model weights only 
checkpoint in torch.bfloat16 at step 37852 +2025-10-25 03:23:08,167 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-25 03:23:08,167 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 37852 +2025-10-25 03:23:08,167 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-25 03:23:08,180 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 37852 +2025-10-25 03:23:08,181 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-25 03:23:08,181 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 37852 +2025-10-25 03:23:08,181 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-25 03:23:08,181 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 37852 +2025-10-25 03:23:08,181 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-25 03:23:08,182 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 37852 +2025-10-25 03:23:08,182 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-25 03:23:08,183 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 37852 +2025-10-25 03:23:08,183 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-25 03:23:08,183 - root - INFO - Saving a model weights only checkpoint in torch.bfloat16 at step 37852 +2025-10-25 03:23:08,183 - root - INFO - CheckpointManager: State dict keys: dict_keys(['model']) +2025-10-25 03:23:13,907 - root - INFO - Finished saving the checkpoint in 5.74 seconds +2025-10-25 03:23:13,907 - root - INFO - Sleeping 2 seconds for other ranks to complete +2025-10-25 03:23:13,909 - root - INFO - Finished saving the checkpoint in 5.73 seconds +2025-10-25 03:23:13,909 - root - INFO - Finished saving the checkpoint in 5.73 seconds +2025-10-25 03:23:13,910 - root - 
INFO - Finished saving the checkpoint in 5.74 seconds +2025-10-25 03:23:13,910 - root - INFO - Finished saving the checkpoint in 5.73 seconds +2025-10-25 03:23:13,910 - root - INFO - Finished saving the checkpoint in 5.73 seconds +2025-10-25 03:23:13,910 - root - INFO - Training successfully completed! +2025-10-25 03:23:13,910 - root - INFO - Training successfully completed! +2025-10-25 03:23:13,910 - root - INFO - Finished saving the checkpoint in 5.73 seconds +2025-10-25 03:23:13,910 - root - INFO - Training successfully completed! +2025-10-25 03:23:13,910 - root - INFO - Training successfully completed! +2025-10-25 03:23:13,910 - root - INFO - Finished saving the checkpoint in 5.73 seconds +2025-10-25 03:23:13,910 - root - INFO - Training successfully completed! +2025-10-25 03:23:13,911 - root - INFO - Training successfully completed! +2025-10-25 03:23:13,911 - root - INFO - Training successfully completed! +2025-10-25 03:23:15,908 - root - INFO - Training successfully completed!