Week 1 - Research Papers

These notes were developed from the lectures, materials, and transcripts of the DeepLearning.AI & AWS course Generative AI with Large Language Models.

Transformer Architecture

  • Attention is All You Need - This paper introduced the Transformer architecture and its core “self-attention” mechanism, and it is the foundation of modern LLMs. (A minimal sketch of self-attention follows this list.)
  • BLOOM: BigScience 176B Model - BLOOM is an open-source LLM with 176B parameters (comparable in scale to GPT-3), trained in an open and transparent way. In this paper, the authors present a detailed discussion of the dataset and process used to train the model. You can also see a high-level overview of the model here.
  • Vector Space Models - A series of lessons from DeepLearning.AI’s Natural Language Processing specialization covering the basics of vector space models and their use in language modeling. (A toy similarity example follows below.)
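
As a companion to the Attention is All You Need entry above, here is a minimal NumPy sketch of scaled dot-product self-attention: a single head, no batching, no masking, with toy dimensions invented for illustration. The full Transformer wraps this in multi-head attention, residual connections, and feed-forward layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) token embeddings
    # w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len) pairwise token similarities
    weights = softmax(scores, axis=-1)       # each row is a distribution over the sequence
    return weights @ v                       # each output is a weighted mix of all value vectors

# Toy usage: 4 tokens, model dimension 8, head dimension 4 (arbitrary sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 4)
```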
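
The vector space lessons above rest on the idea that words map to vectors whose geometry encodes meaning, most often compared with cosine similarity. A sketch with made-up 3-dimensional vectors (hypothetical values, not taken from any trained model):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 1.0 means identical direction
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical word vectors, invented for illustration
king  = np.array([0.8, 0.3, 0.1])
queen = np.array([0.7, 0.4, 0.2])
apple = np.array([0.1, 0.9, 0.6])

print(cosine_similarity(king, queen))  # ~0.98: related words point in similar directions
print(cosine_similarity(king, apple))  # ~0.44: unrelated words diverge
```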

Pre-training and Scaling Laws

Model Architectures and Pre-training Objectives

Scaling Laws and Compute-Optimal Models