Driver Behavior Detection Model (Epoch 7)

A Video Swin Transformer-based model for detecting abnormal driver behavior.

Model Description

  • Architecture: Video Swin Transformer Tiny (swin3d_t)
  • Backbone Pretrained: Kinetics-400
  • Parameters: 27.85M
  • Input: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)

Classes (5)

Label  Class               F1-Score
0      Normal              0.97
1      Drowsy Driving      0.99
2      Reaching/Searching  0.96
3      Phone Usage         0.96
4      Driver Assault      1.00

Performance (Epoch 7)

Metric              Value
Accuracy            98.05%
Macro F1            0.9757
Validation Samples  1,371,062
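
Macro F1 is the unweighted mean of the per-class F1 scores, so the two tables can be cross-checked. The sketch below averages the per-class values from the Classes table. Those values are rounded to two decimals, so the result only approximates the reported 0.9757.

```python
# Per-class F1 scores as listed in the Classes table (rounded to two decimals).
per_class_f1 = {
    "Normal": 0.97,
    "Drowsy Driving": 0.99,
    "Reaching/Searching": 0.96,
    "Phone Usage": 0.96,
    "Driver Assault": 1.00,
}


def macro_f1(scores):
    """Unweighted mean of per-class F1 scores."""
    return sum(scores.values()) / len(scores)


approx = macro_f1(per_class_f1)
print(f"{approx:.3f}")  # 0.976, consistent with the reported 0.9757
```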

Training Configuration

Parameter              Value
Hardware               2x NVIDIA RTX A6000 (48 GB)
Distributed            DDP (DistributedDataParallel)
Batch Size             32 (16 per GPU x 2 GPUs)
Gradient Accumulation  4
Effective Batch        128
Optimizer              AdamW (lr=1e-3, weight_decay=0.05)
Scheduler              OneCycleLR
Mixed Precision        FP16
Loss                   CrossEntropy + Label Smoothing (0.1)
Regularization         Mixup (alpha=0.4), Dropout (0.3)
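
Two of the values above follow from simple arithmetic, sketched below. The effective batch is the per-GPU batch times the GPU count times the accumulation steps. The label-smoothing targets assume the convention used by PyTorch's CrossEntropyLoss, where the smoothing mass is spread uniformly over all classes. This card does not confirm which implementation was used, and the helper names are illustrative only.

```python
def effective_batch_size(per_gpu_batch, num_gpus, accumulation_steps):
    """Number of samples contributing to one optimizer step."""
    return per_gpu_batch * num_gpus * accumulation_steps


def smoothed_targets(true_label, num_classes, smoothing):
    """Target distribution under uniform label smoothing (assumed convention)."""
    uniform = smoothing / num_classes
    return [
        (1.0 - smoothing) + uniform if k == true_label else uniform
        for k in range(num_classes)
    ]


batch = effective_batch_size(per_gpu_batch=16, num_gpus=2, accumulation_steps=4)
targets = smoothed_targets(true_label=1, num_classes=5, smoothing=0.1)

print(batch)  # 128
print([round(t, 2) for t in targets])  # true class gets 0.92, the others 0.02
```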

Files

File                     Size    Description
pytorch_model.bin        121 MB  PyTorch weights (FP32)
model.onnx               164 MB  ONNX model for mobile deployment
config.json              1.2 KB  Model configuration
model.py                 6.9 KB  Model architecture code
convert_coreml_macos.py  2.2 KB  CoreML conversion script (macOS)

Platform-specific Usage

PyTorch (Server/Desktop)

import torch
from model import DriverBehaviorModel

model = DriverBehaviorModel(num_classes=5, pretrained=False)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()
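
Once the model has produced an output for a clip, the scores need to be mapped back to a class name. The following is a minimal sketch in plain Python. It assumes the model returns one row of five raw logits per clip, which this card does not state explicitly. The example logits are made up, and `decode` is a hypothetical helper, not part of model.py.

```python
import math

# Label-to-class mapping taken from the Classes table.
CLASS_NAMES = [
    "Normal",
    "Drowsy Driving",
    "Reaching/Searching",
    "Phone Usage",
    "Driver Assault",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label index, class name, probability) of the top class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, CLASS_NAMES[idx], probs[idx]


example_logits = [0.2, 3.1, -1.0, 0.5, -0.7]  # hypothetical values
label, name, confidence = decode(example_logits)
print(label, name, round(confidence, 3))
```

With PyTorch, the list of logits for the first clip in a batch could be obtained with something like `model(x)[0].tolist()`, again assuming the forward pass returns a [B, 5] tensor.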

iOS (CoreML)

  1. Copy model.onnx to a Mac.
  2. Run the conversion script:
python convert_coreml_macos.py
  3. Add the generated DriverBehavior.mlpackage to your Xcode project.

Android (ONNX Runtime)

// build.gradle
implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'

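// Kotlin
import ai.onnxruntime.OrtEnvironment

val env = OrtEnvironment.getEnvironment()
// Read the model from the app's assets, closing the stream afterwards
val modelBytes = assetManager.open("model.onnx").use { it.readBytes() }
val session = env.createSession(modelBytes)

// inputTensor: an OnnxTensor of shape [1, 3, 30, 224, 224]
val output = session.run(mapOf("video_input" to inputTensor))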

Preprocessing (All Platforms)

Input Shape: [1, 3, 30, 224, 224]  (batch, channels, frames, height, width)
Channel Order: RGB
Normalization: (pixel / 255.0 - mean) / std
  - mean = [0.485, 0.456, 0.406]
  - std = [0.229, 0.224, 0.225]
Resize: 224x224 (BILINEAR)
Frames: 30 frames uniformly sampled
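
The frame sampling and normalization steps above can be sketched in plain Python as follows. The card does not say exactly how frames are "uniformly sampled", so the endpoint-inclusive spacing used here is an assumption, and both function names are illustrative. Resizing and the layout into [1, 3, 30, 224, 224] are left to the image library of each platform.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel mean, RGB order
STD = [0.229, 0.224, 0.225]   # per-channel std, RGB order
NUM_FRAMES = 30


def uniform_frame_indices(total_frames, num_frames=NUM_FRAMES):
    """Pick num_frames indices spread evenly over [0, total_frames - 1].

    Indices repeat when the video has fewer frames than requested.
    """
    if total_frames <= 0:
        raise ValueError("video has no frames")
    if num_frames == 1:
        return [0]
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to floats: (pixel / 255.0 - mean) / std."""
    return [(p / 255.0 - m) / s for p, m, s in zip(rgb, MEAN, STD)]


indices = uniform_frame_indices(300)  # e.g. a 300-frame clip
white = normalize_pixel([255, 255, 255])
print(indices[0], indices[-1], len(indices))
print([round(v, 4) for v in white])
```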

Dataset

  • Total Videos: 243,979
  • Total Samples (windows): 1,371,062
  • Window Size: 30 frames
  • Stride: 15 frames
  • Resolution: 224x224
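
With a 30-frame window and a 15-frame stride, consecutive windows overlap by half. The sketch below counts the windows a single video yields, assuming trailing frames that do not fill a whole window are dropped (the card does not say how partial windows are handled). It also derives the average number of windows per video from the totals above.

```python
WINDOW = 30
STRIDE = 15


def count_windows(num_frames, window=WINDOW, stride=STRIDE):
    """Number of full sliding windows in a video; partial windows are dropped."""
    if num_frames < window:
        return 0
    return (num_frames - window) // stride + 1


def window_starts(num_frames, window=WINDOW, stride=STRIDE):
    """Start frame index of each full window."""
    return [i * stride for i in range(count_windows(num_frames, window, stride))]


print(count_windows(105))  # 6
print(window_starts(105))  # [0, 15, 30, 45, 60, 75]

# Average windows per video, from the dataset totals listed above.
avg_windows = 1_371_062 / 243_979
print(round(avg_windows, 2))  # 5.62
```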

Training Progress

Epoch  Accuracy  Macro F1
5      97.35%    0.9666
6      97.74%    0.9720
7      98.05%    0.9757

License

This model is for research purposes only.

Citation

@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}