
Transformer (8)

[X:AI] ViT Paper Review. Original paper: https://arxiv.org/abs/2010.11929 (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale). "While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to rep…" 1. Introduction S… 2025. 7. 14.
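The ViT preview above comes down to one idea: the image is cut into fixed-size 16x16 patches, each patch is linearly projected to an embedding, and the resulting sequence goes into a standard Transformer encoder. A minimal PyTorch sketch of that patch-embedding step (the module name, shapes, and defaults are illustrative assumptions, not code from the post):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each one to an embedding."""

    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A convolution with kernel == stride == patch_size is equivalent to
        # flattening each 16x16 patch and applying one shared linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                   # (batch, embed_dim, 14, 14) for a 224x224 input
        x = x.flatten(2).transpose(1, 2)   # (batch, num_patches, embed_dim)
        return x

patches = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(patches.shape)  # torch.Size([1, 196, 768])
```

After prepending a class token and adding position embeddings, this 196-token sequence is what the standard Transformer encoder consumes.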
[Paper Review] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Original paper: https://arxiv.org/abs/2005.11401 (Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks). "Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still lim…" 1. Introduction: Pre-trained neural lang… 2025. 4. 20.
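The RAG preview is about pairing a parametric generator with non-parametric memory: retrieve passages relevant to the query, then let a seq2seq model generate conditioned on them. The sketch below only illustrates that retrieve-then-condition flow; the bag-of-words retriever, the toy documents, and the [SEP] formatting are stand-ins for the paper's dense retriever and generator, not anything from the post:

```python
from collections import Counter
import math

# Toy document store standing in for the external knowledge source.
DOCS = [
    "The Eiffel Tower is located in Paris, France.",
    "BART is a sequence-to-sequence Transformer pretrained as a denoising autoencoder.",
    "Retrieval-augmented generation conditions a generator on retrieved passages.",
]

def bow(text):
    """Bag-of-words term counts; a crude stand-in for a dense passage encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(DOCS, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_generator_input(query, k=1):
    """Concatenate retrieved passages with the query; a real RAG system would
    feed this conditioned input to a seq2seq generator such as BART."""
    return " ".join(retrieve(query, k)) + " [SEP] " + query

print(build_generator_input("Where is the Eiffel Tower located?"))
```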
[Paper Review] Robust Speech Recognition via Large-Scale Weak Supervision. Original paper: https://arxiv.org/abs/2212.04356 (Robust Speech Recognition via Large-Scale Weak Supervision). "We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard…" 1. Introd… 2025. 4. 19.
[X:AI] BART Paper Review. Original paper: https://arxiv.org/abs/1910.13461v1 (BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension). "We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Tra…" 2025. 2. 11.
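The BART preview states the pretraining recipe directly: (1) corrupt the text with an arbitrary noising function, (2) train the model to reconstruct the original. A rough, simplified sketch of one such noising function, in the spirit of span masking / text infilling (the function name, mask probability, and span lengths are assumptions, not the paper's exact scheme):

```python
import random

def mask_spans(tokens, mask_token="<mask>", mask_prob=0.3, max_span=3):
    """Corrupt a token sequence by collapsing random short spans into a single
    mask token, a simplified take on BART-style text infilling."""
    corrupted, i = [], 0
    while i < len(tokens):
        if random.random() < mask_prob:
            corrupted.append(mask_token)          # whole span becomes one mask token
            i += random.randint(1, max_span)      # skip the masked span
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted

original = "BART learns to reconstruct the original text from a corrupted version".split()
print(mask_spans(original))
# The sequence-to-sequence model is then trained to map this corrupted
# sequence back to the original one.
```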
[X:AI] RoBERTa Paper Review. Original paper: https://arxiv.org/abs/1907.11692 (RoBERTa: A Robustly Optimized BERT Pretraining Approach). "Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperpar…" 1. Abstract & Introduction: Self-supervised learning (S… 2025. 2. 4.
[X:AI] BERT Paper Review. Original paper: https://arxiv.org/abs/1810.04805 (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding). "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unla…" 1. Abstract… 2024. 2. 15.
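Because BERT's bidirectional representations are learned with a masked-language-modeling objective, the quickest way to see them in action is a fill-mask query. A short usage sketch, assuming the Hugging Face transformers library is installed (it downloads bert-base-uncased on first run; the example sentence is made up):

```python
from transformers import pipeline

# The fill-mask pipeline uses the masked-language-modeling head that
# BERT was pretrained with.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for pred in unmasker("The goal of pretraining is to learn [MASK] representations."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")  # top predictions with probabilities
```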