Rao MingChu

maomao1840855365

AI & ML interests

None yet

Recent Activity

liked a dataset 2 months ago

FreedomIntelligence/TCM-Pretrain-Data-ShizhenGPT

Organizations

None yet

upvoted a paper 18 days ago

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published 26 days ago • 191

upvoted a collection 2 months ago

Qwen3-VL

37 items • Updated Dec 31, 2025 • 738

upvoted an article 2 months ago

Article

Welcome Gemma 4: Frontier multimodal intelligence on device

merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift +5 • Apr 2 • 907

upvoted a collection 3 months ago

IFBench

Datasets for IFBench benchmark and paper! • 3 items • Updated Dec 23, 2025 • 12

upvoted a paper 3 months ago

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

Paper • 2602.22839 • Published Feb 26 • 3

upvoted 2 collections 3 months ago

PPTBench

An extensive benchmark for LLMs on PowerPoint related tasks. • 4 items • Updated Jan 6, 2025 • 1

Qwen3.5

21 items • Updated Mar 9 • 1.67k

upvoted an article 3 months ago

Article

NEO-unify: Building Native Multimodal Unified Models End to End

sensenova • Mar 5 • 164

upvoted 2 collections 3 months ago

NEO1_5

From Pixels to Words -- Towards Native One-Vision Models at Scale • 3 items • Updated 10 days ago • 6

NEO1_0

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale • 7 items • Updated Jan 27 • 9

upvoted a paper 5 months ago

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25, 2024 • 59

upvoted a paper 6 months ago

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper • 2312.14238 • Published Dec 21, 2023 • 20

upvoted a collection 6 months ago

PaddleOCR-VL

Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model • 5 items • Updated Feb 11 • 32