Papers - Image - Clip
Paper
• 2309.16671
• Published
• 21
Model Stock: All we need is just a few fine-tuned models
Paper
• 2403.19522
• Published
• 13
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper
• 2404.01367
• Published
• 22
On the Scalability of Diffusion-based Text-to-Image Generation
Paper
• 2404.02883
• Published
• 19
Learning Transferable Visual Models From Natural Language Supervision
Paper
• 2103.00020
• Published
• 19
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper
• 2310.03502
• Published
• 79
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper
• 2404.07448
• Published
• 12
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper
• 2404.07973
• Published
• 32
RegionGPT: Towards Region Understanding Vision Language Model
Paper
• 2403.02330
• Published
• 2
On Speculative Decoding for Multimodal Large Language Models
Paper
• 2404.08856
• Published
• 13
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper
• 2301.07093
• Published
• 4
A Multimodal Automated Interpretability Agent
Paper
• 2404.14394
• Published
• 23
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Paper
• 2404.14239
• Published
• 9
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper
• 2404.15653
• Published
• 29
MoDE: CLIP Data Experts via Clustering
Paper
• 2404.16030
• Published
• 15
DOCCI: Descriptions of Connected and Contrasting Images
Paper
• 2404.19753
• Published
• 13
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Paper
• 2404.19427
• Published
• 74
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Paper
• 2406.06911
• Published
• 12
DataComp: In search of the next generation of multimodal datasets
Paper
• 2304.14108
• Published
• 2
Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity
Paper
• 2406.17720
• Published
• 8
SLIP: Self-supervision meets Language-Image Pre-training
Paper
• 2112.12750
• Published
• 1
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Paper
• 2407.21794
• Published
• 6
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
Paper
• 2403.06976
• Published
• 2