Week 3 - Research Papers
These notes were developed using lectures/material/transcripts from the DeepLearning.AI & AWS - Generative AI with Large Language Models course
Reinforcement Learning from Human Feedback (RLHF)
- Training language models to follow instructions with human feedback - Paper by OpenAI introducing a human-in-the-loop process to create a model that is better at following instructions (InstructGPT).
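- Learning to summarize from human feedback - This paper presents a method for improving language model-generated summaries by training a reward model on human preferences and optimizing the summarizer against it. The resulting summaries are preferred over the human-written reference summaries.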
Proximal Policy Optimization (PPO)
- Proximal Policy Optimization Algorithms - The paper from researchers at OpenAI that first proposed the PPO algorithm. It reports the algorithm's performance on a number of benchmark tasks, including simulated robotic locomotion and Atari game playing.
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model - This paper presents a simpler alternative to RLHF for controlling the behavior of large-scale unsupervised language models: it aligns them with human preferences directly, without fitting a separate reward model or running reinforcement learning.
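As a rough illustration of the two objectives named above, here is a minimal per-sample sketch in plain Python. The function names, argument names, and default values (`epsilon=0.2`, `beta=0.1`) are choices made for this example only. Real implementations operate on batches of token log-probabilities inside a deep learning framework.

```python
import math

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective for one sample (to be maximized).

    ratio     -- pi_new(a|s) / pi_old(a|s)
    advantage -- estimated advantage of the action taken
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to push the ratio outside
    # [1 - epsilon, 1 + epsilon] in the direction that raises the objective.
    return min(ratio * advantage, clipped_ratio * advantage)

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (to be minimized).

    Each argument is the summed log-probability of a full completion
    under the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), written in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

The DPO loss shrinks as the policy assigns relatively more probability to the preferred completion than the reference model does, which is how the preference data enters training without an explicit reward model.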
Scaling Human Feedback
- Constitutional AI: Harmlessness from AI Feedback - This paper introduces a method for training a harmless AI assistant without human labels, allowing better control of AI behavior with minimal human input.
Advanced Prompting Techniques
- Chain-of-thought Prompting Elicits Reasoning in Large Language Models - Paper by researchers at Google exploring how chain-of-thought prompting improves the ability of LLMs to perform complex reasoning.
- PAL: Program-aided Language Models - This paper proposes an approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps.
- ReAct: Synergizing Reasoning and Acting in Language Models - This paper presents an advanced prompting technique that allows an LLM to make decisions about how to interact with external applications.
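The program-aided idea can be sketched in a few lines: the model writes a short program as its reasoning, and an interpreter, not the model, computes the final answer. In this sketch no model is called. `GENERATED_PROGRAM` is a hard-coded, hypothetical stand-in for model output, and `run_generated_program` is a helper name invented for the example.

```python
# Hypothetical stand-in for text an LLM might return when prompted to
# answer a word problem by writing Python (no model is called here).
GENERATED_PROGRAM = """
# Roger starts with 5 tennis balls.
tennis_balls = 5
# He buys 2 cans with 3 balls each.
bought_balls = 2 * 3
# The answer is the total number of balls.
answer = tennis_balls + bought_balls
"""

def run_generated_program(program: str):
    """Execute the generated code and return its `answer` variable."""
    namespace = {}
    # Emptying the builtins limits what the code can call, but this is
    # NOT a real sandbox; untrusted code needs proper isolation.
    exec(program, {"__builtins__": {}}, namespace)
    return namespace["answer"]

if __name__ == "__main__":
    print(run_generated_program(GENERATED_PROGRAM))
```

Delegating the arithmetic to the interpreter means a correct program yields a correct number even when the model itself is unreliable at calculation.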
LLM-powered Application Architectures
- LangChain Library (GitHub) - This library is aimed at assisting in the development of LLM-powered applications, such as question answering systems, chatbots, and other agents. The documentation is linked from the repository.
- Who Owns the Generative AI Platform? - The article examines the market dynamics and business models of generative AI.