MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model • Paper • 2603.18892 • Published 1 day ago • 1
MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents • Paper • 2603.09827 • Published 10 days ago • 28
🛡️ Safe-VLMs • Collection • Safe Vision-Language Models with Visual Guard Module (https://huggingface.co/spaces/etri-vilab/Ko-LLaVA) • 9 items • Updated about 12 hours ago • 2
KOALA: Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis • Paper • 2312.04005 • Published Dec 7, 2023 • 2