Multimodal System - a btjhjeon Collection

btjhjeon 's Collections

Multimodal Agent

Multimodal System

Multimodal Reasoning

Multimodal Analysis

Multimodal Alignment

PEFT

LLM

LLM context length

Multimodal Dataset

Multimodal Benchmarks

Multimodal System

updated 18 days ago

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Paper • 2503.13964 • Published Mar 18, 2025 • 20
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

Paper • 2510.06710 • Published Oct 8, 2025 • 42
VIDEOP2R: Video Understanding from Perception to Reasoning

Paper • 2511.11113 • Published Nov 14, 2025 • 111
Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models

Paper • 2512.04981 • Published Dec 4, 2025 • 8
Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning

Paper • 2512.06835 • Published Dec 7, 2025 • 4
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

Paper • 2601.04720 • Published 28 days ago • 52
BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published 26 days ago • 195