Paper Reviews / Multi Modal (3)

[X:AI] Flamingo Paper Review
Flamingo: a Visual Language Model for Few-Shot Learning 🦩
Original paper: https://arxiv.org/abs/2204.14198
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this abili..
2024. 8. 26.

[X:AI] DALL-E Paper Review
Zero-Shot Text-to-Image Generation
Original paper: https://arxiv.org/abs/2102.12092
Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentati..
2. Method Stage 1Di..
2024. 8. 20.

[X:AI] BLIP Paper Review
Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Original paper: https://arxiv.org/abs/2201.12086
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in eit..
2024. 8. 20.