Driver Behavior Detection Model (Epoch 7)
A Video Swin Transformer-based model for detecting abnormal driver behavior.
Model Description
- Architecture: Video Swin Transformer Tiny (swin3d_t)
- Backbone Pretrained: Kinetics-400
- Parameters: 27.85M
- Input: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)
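The input shape and parameter count above translate directly into memory requirements, which matters for the mobile targets. The following is a rough sketch of that arithmetic only; the helper names are hypothetical, and it ignores activations, optimizer state, and any file-format overhead.

```python
# Rough memory arithmetic for one input clip and the model weights.
# Shape and parameter count come from the model description above;
# the helper names are illustrative, not part of model.py.
CHANNELS, FRAMES, HEIGHT, WIDTH = 3, 30, 224, 224
PARAMS = 27.85e6  # reported parameter count


def clip_elements(batch: int = 1) -> int:
    """Number of scalar values in a [B, 3, 30, 224, 224] tensor."""
    return batch * CHANNELS * FRAMES * HEIGHT * WIDTH


def megabytes(n_values: float, bytes_per_value: int) -> float:
    """Size in MB (10^6 bytes) of n_values stored at the given width."""
    return n_values * bytes_per_value / 1e6


print(clip_elements())                                   # values per clip
print(f"{megabytes(clip_elements(), 4):.1f} MB per FP32 clip")
print(f"{megabytes(PARAMS, 4):.1f} MB of FP32 weights")
```

A single FP32 clip is about 18 MB, and the FP32 weights alone come to roughly 111 MB, the same order as the listed size of pytorch_model.bin.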
Classes (5)
| Label | Class | F1-Score |
|---|---|---|
| 0 | Normal (정상) | 0.97 |
| 1 | Drowsy Driving (졸음운전) | 0.99 |
| 2 | Reaching/Searching (물건찾기) | 0.96 |
| 3 | Phone Usage (휴대폰 사용) | 0.96 |
| 4 | Driver Assault (운전자 폭행) | 1.00 |
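To turn a model output into one of the labels above, a small decoding step is needed. The sketch below assumes the model returns five raw logits per clip (plausible given the cross-entropy loss, but not confirmed by the files listed here); `ID2LABEL`, `softmax`, and `decode` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

# Label IDs from the class table above.
ID2LABEL = {
    0: "Normal",
    1: "Drowsy Driving",
    2: "Reaching/Searching",
    3: "Phone Usage",
    4: "Driver Assault",
}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: Sequence[float]) -> Tuple[str, float]:
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, confidence = decode([0.1, 3.2, -1.0, 0.4, 0.0])
print(label, round(confidence, 3))
```

The same logic applies on every platform: with PyTorch the logits come from `model(x)`, and with ONNX Runtime from the session output.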
Performance (Epoch 7)
| Metric | Value |
|---|---|
| Accuracy | 98.05% |
| Macro F1 | 0.9757 |
| Validation Samples | 1,371,062 |
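Macro F1 is the unweighted mean of the per-class F1 scores, so it can be sanity-checked against the class table. The sketch below uses the two-decimal values published above, which is why it only approximates the reported 0.9757.

```python
from statistics import mean

# Per-class F1 scores as published (rounded to two decimals).
per_class_f1 = {
    "Normal": 0.97,
    "Drowsy Driving": 0.99,
    "Reaching/Searching": 0.96,
    "Phone Usage": 0.96,
    "Driver Assault": 1.00,
}

# Macro averaging weights every class equally, regardless of support.
macro_f1 = mean(per_class_f1.values())
print(f"macro F1 from rounded per-class scores: {macro_f1:.3f}")
```

The result agrees with the reported Macro F1 to within the rounding of the per-class scores.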
Training Configuration
| Parameter | Value |
|---|---|
| Hardware | 2x NVIDIA RTX A6000 (48GB) |
| Distributed | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 per GPU x 2 GPUs) |
| Gradient Accumulation | 4 |
| Effective Batch | 128 |
| Optimizer | AdamW (lr=1e-3, wd=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy + Label Smoothing (0.1) |
| Regularization | Mixup (alpha=0.4), Dropout (0.3) |
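The batch-size rows in the table are related by simple multiplication. As a minimal sketch (the `BatchConfig` class is hypothetical and not taken from the training code):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchConfig:
    """Batch-size settings from the training configuration table."""
    per_gpu_batch: int = 16
    num_gpus: int = 2
    grad_accum_steps: int = 4

    @property
    def global_batch(self) -> int:
        # Samples processed per forward/backward pass across all GPUs.
        return self.per_gpu_batch * self.num_gpus

    @property
    def effective_batch(self) -> int:
        # Samples contributing to each optimizer step.
        return self.global_batch * self.grad_accum_steps


cfg = BatchConfig()
print(cfg.global_batch, cfg.effective_batch)
```

With gradient accumulation, the optimizer steps once every 4 passes, so each update sees 128 samples even though only 32 are processed per pass.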
Files
| File | Size | Description |
|---|---|---|
| pytorch_model.bin | 121 MB | PyTorch weights (FP32) |
| model.onnx | 164 MB | ONNX model for mobile deployment |
| config.json | 1.2 KB | Model configuration |
| model.py | 6.9 KB | Model architecture code |
| convert_coreml_macos.py | 2.2 KB | CoreML conversion script (macOS) |
Platform-specific Usage
PyTorch (Server/Desktop)
import torch
from model import DriverBehaviorModel
model = DriverBehaviorModel(num_classes=5, pretrained=False)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()
iOS (CoreML)
- Copy model.onnx to a macOS machine.
- Run the conversion script:
python convert_coreml_macos.py
- Add the generated DriverBehavior.mlpackage to the Xcode project.
Android (ONNX Runtime)
implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
val session = OrtEnvironment.getEnvironment()
.createSession(assetManager.open("model.onnx").readBytes())
val output = session.run(mapOf("video_input" to inputTensor))
Preprocessing (All Platforms)
Input Shape: [1, 3, 30, 224, 224] (batch, channels, frames, height, width)
Channel Order: RGB
Normalization: (pixel / 255.0 - mean) / std
- mean = [0.485, 0.456, 0.406]
- std = [0.229, 0.224, 0.225]
Resize: 224x224 (BILINEAR)
Frames: 30 frames uniformly sampled
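The sampling and normalization steps above can be sketched in plain Python. This is an illustration under assumptions: the exact uniform-sampling scheme is not specified here, so evenly spaced indices from first to last frame are assumed, and `sample_frame_indices` and `normalize_pixel` are hypothetical helper names. Decoding and the bilinear resize are left to whatever image library the platform provides.

```python
from typing import List, Sequence, Tuple

# ImageNet statistics listed above, in RGB order.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def sample_frame_indices(total_frames: int, num_samples: int = 30) -> List[int]:
    """Evenly spaced frame indices from the first to the last frame.

    Assumed scheme; clips shorter than num_samples repeat some frames.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_samples == 1:
        return [0]
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def normalize_pixel(rgb: Sequence[int]) -> Tuple[float, ...]:
    """Apply (pixel / 255.0 - mean) / std to one 8-bit RGB pixel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


indices = sample_frame_indices(300)   # e.g. a 300-frame clip
print(indices[0], indices[-1], len(indices))
print(normalize_pixel((255, 255, 255)))
```

After normalization, the 30 frames are stacked into the [1, 3, 30, 224, 224] layout, with the channel axis before the frame axis.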
Dataset
- Total Videos: 243,979
- Total Samples (windows): 1,371,062
- Window Size: 30 frames
- Stride: 15 frames
- Resolution: 224x224
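A window size of 30 with a stride of 15 means consecutive samples overlap by half. The sketch below shows one way such windows could be enumerated; it assumes trailing frames that do not fill a whole window are dropped, which may differ from how the dataset was actually built, and `window_starts` is a hypothetical name.

```python
from typing import List

WINDOW = 30   # frames per sample
STRIDE = 15   # frames between consecutive window starts


def window_starts(num_frames: int, window: int = WINDOW, stride: int = STRIDE) -> List[int]:
    """Start indices of every full window that fits in the video.

    Assumption: incomplete trailing windows are discarded.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return list(range(0, num_frames - window + 1, stride))


for n in (29, 30, 60, 100):
    print(n, window_starts(n))

# Average number of windows per video implied by the dataset totals.
print(round(1_371_062 / 243_979, 2))
```

The dataset totals work out to between five and six windows per video on average.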
Training Progress
| Epoch | Accuracy | Macro F1 |
|---|---|---|
| 5 | 97.35% | 0.9666 |
| 6 | 97.74% | 0.9720 |
| 7 | 98.05% | 0.9757 |
License
This model is for research purposes only.
Citation
@misc{driver-behavior-detection-2026,
title={Driver Behavior Detection using Video Swin Transformer},
author={C-Team},
year={2026}
}