
Transformer (5)

[X:AI] BART Paper Review. Original paper: https://arxiv.org/abs/1910.13461v1. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Tra.. (a small sketch of this corrupt-then-reconstruct setup follows after the list) 2025. 2. 11.
[X:AI] RoBERTa Paper Review. Original paper: https://arxiv.org/abs/1907.11692. RoBERTa: A Robustly Optimized BERT Pretraining Approach. Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperpar.. 1. Abstract & Introduction: Self-supervised learning (S.. 2025. 2. 4.
[X:AI] BERT Paper Review. Original paper: https://arxiv.org/abs/1810.04805. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unla.. 1. Abstract.. 2024. 2. 15.
[X:AI] GPT-1 Paper Review. Improving Language Understanding by Generative Pre-Training. Original paper: https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf. Abstract: Natural language understanding (NLU) spans a wide range of tasks, including textual entailment, question answering, semantic similarity assessment, and document classification. Previously, each of these tasks required a specially designed model, and the problem is that training such models requires labeled data, which is scarce. So.. 2024. 2. 11.
[X:AI] Transformer Paper Review. Original paper: https://arxiv.org/abs/1706.03762. Attention Is All You Need. The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new.. Abstract: The dominant sequence transduction models are based on RNN or CNN networks that include an encoder and a decoder. The best per.. (see the attention sketch below) 2024. 2. 10.
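The Transformer entry above quotes the abstract's mention of an attention mechanism connecting encoder and decoder. As a minimal sketch, assuming only the scaled dot-product attention formula softmax(QK^T / sqrt(d_k))V from the paper, the NumPy snippet below applies it to toy inputs; the shapes and variable names are illustrative and not taken from the post.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the attention used in the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)            # (batch, q_len, k_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)               # row-wise softmax
    return weights @ V                                           # (batch, q_len, d_v)

# Toy shapes: batch 1, 3 query positions, 4 key/value positions, dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(1, n, 8)) for n in (3, 4, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 3, 8)
```

In the full model this block is wrapped in multi-head attention and used both for self-attention and for the encoder-decoder connection the abstract refers to.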
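The BART entry at the top of the list describes training by (1) corrupting text with a noising function and (2) learning to reconstruct the original. Below is a minimal sketch of that corrupt-then-reconstruct idea, assuming a toy random-masking noising function; the names mask_token and mask_prob are made up for illustration and are not BART's actual noising schemes (token masking, text infilling, sentence permutation, and so on).

```python
import random

def corrupt(tokens, mask_token="<mask>", mask_prob=0.3, seed=0):
    """Toy noising function: randomly replace some tokens with a mask token.

    This only illustrates the corrupt-then-reconstruct setup; BART's real
    noising schemes are richer than a single masking rule."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_prob else t for t in tokens]

original = "BART is trained to reconstruct the original text".split()
noisy = corrupt(original)

# A sequence-to-sequence model would take `noisy` as encoder input and be
# trained with cross-entropy to generate `original` on the decoder side.
print("noisy input :", noisy)
print("target      :", original)
```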