Breaking down GPT-2 and the Transformer

What is GPT-2?

Figure 1: Transformer decoder block
Figure 2: Overview of the GPT-2 process
Figure 3: Matrix of token embeddings; the embedding dimension varies with the model size
Figure 4: Matrix of positional embeddings
Figure 5: The input embedding is the sum of the token embedding and the positional embedding
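The embedding step in Figures 3–5 can be sketched in a few lines of NumPy. The sizes below match GPT-2 small (a 50,257-token vocabulary, 768-dimensional embeddings, and a 1,024-position context); the random tables here are stand-ins for the learned ones:

```python
import numpy as np

# Illustrative sizes (GPT-2 small: vocab_size=50257, d_model=768, max_len=1024)
vocab_size, max_len, d_model = 50257, 1024, 768
rng = np.random.default_rng(0)

# Learned lookup tables (randomly initialized stand-ins here)
token_embedding = rng.normal(size=(vocab_size, d_model))    # Figure 3
positional_embedding = rng.normal(size=(max_len, d_model))  # Figure 4

token_ids = np.array([464, 3290, 318, 845])  # e.g. an encoded 4-token prompt
seq_len = len(token_ids)

# Input embedding = token embedding + positional embedding (Figure 5)
x = token_embedding[token_ids] + positional_embedding[:seq_len]
print(x.shape)  # (4, 768)
```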

Self-attention mechanism

Figure 6: Brief illustration of a masked self-attention layer
Figure 7: Analogy of self-attention to sticky-note matching
Figure 8: The dot product between the query vector and each key vector, passed through a softmax, becomes a probability
Figure 9: Illustration of calculating the weighted vector corresponding to “it”
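The masked self-attention computation in Figures 6–9 — query·key scores, the causal mask, the softmax, and the weighted sum of value vectors — can be sketched in NumPy as follows; the projection matrices here are random stand-ins for the learned weights:

```python
import numpy as np

def masked_self_attention(x, Wq, Wk, Wv):
    """Single-head masked self-attention: each position attends
    only to itself and earlier positions."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)          # query–key similarity (Figure 8)
    # Causal mask: block attention to future tokens (Figure 6)
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -1e9
    # Softmax turns scores into probabilities that sum to 1 per row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                        # weighted sum of values (Figure 9)

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = masked_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

A quick sanity check on the mask: because each position attends only to earlier positions, changing the last token leaves the outputs for earlier positions untouched.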
Figure 10: The first layer of the feed-forward neural network
Figure 11: The second layer of the feed-forward neural network
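The two feed-forward layers in Figures 10 and 11 amount to an expansion to 4×d_model followed by a projection back down. A small sketch, using the tanh approximation of GELU that GPT-2 uses (the weights here are random stand-ins):

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # First layer: expand to 4*d_model and apply GELU (Figure 10)
    h = x @ W1 + b1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    # Second layer: project back down to d_model (Figure 11)
    return h @ W2 + b2

rng = np.random.default_rng(0)
d_model = 8
x = rng.normal(size=(4, d_model))
W1, b1 = rng.normal(size=(d_model, 4 * d_model)), np.zeros(4 * d_model)
W2, b2 = rng.normal(size=(4 * d_model, d_model)), np.zeros(d_model)
y = feed_forward(x, W1, b1, W2, b2)
print(y.shape)  # (4, 8)
```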
Figure 12: Matrices inside a transformer decoder block
Figure 13: Reshaping the long vector to get the Q, K, and V vectors for each attention head
Figure 14: Concatenation of the outputs of the attention heads
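The reshape-and-concatenate steps in Figures 13 and 14 can be sketched as follows: one long projection is split into per-head Q, K, and V, attention runs per head, and the head outputs are concatenated back to d_model. The causal mask is omitted here for brevity, and the weights are random stand-ins:

```python
import numpy as np

seq_len, d_model, n_heads = 4, 768, 12
d_head = d_model // n_heads  # 64 dimensions per head in GPT-2 small
rng = np.random.default_rng(0)

# One long projection produces Q, K, V for all heads at once
x = rng.normal(size=(seq_len, d_model))
W_qkv = rng.normal(size=(d_model, 3 * d_model))
qkv = x @ W_qkv                                   # (seq_len, 3*d_model)

# Split the long vector and reshape it per head (Figure 13)
q, k, v = np.split(qkv, 3, axis=-1)
q = q.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)  # (heads, seq, d_head)
k = k.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
v = v.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

# Per-head attention (mask omitted for brevity), then concatenate (Figure 14)
scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
heads_out = weights @ v                           # (heads, seq, d_head)
concat = heads_out.transpose(1, 0, 2).reshape(seq_len, d_model)
print(concat.shape)  # (4, 768)
```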

Surveying the landscape of the whole GPT-2 model

Figure 15: The whole picture of GPT-2
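The whole picture in Figure 15 boils down to: embeddings in, a stack of decoder blocks, then a projection back onto the vocabulary through the (tied) token-embedding matrix. A schematic sketch under toy sizes — it omits the learned LayerNorm gains and biases, uses ReLU in place of GELU, and shrinks the dimensions (GPT-2 small actually has 12 layers and d_model = 768):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def decoder_block(x, p):
    # Masked self-attention sub-layer with a residual connection
    h = layer_norm(x)
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores += np.triu(np.full_like(scores, -1e9), k=1)  # causal mask
    x = x + softmax(scores) @ v @ p["Wo"]
    # Feed-forward sub-layer with a residual connection
    h = np.maximum(0, layer_norm(x) @ p["W1"])  # ReLU standing in for GELU
    return x + h @ p["W2"]

# Toy configuration (GPT-2 small: 12 layers, d_model=768, vocab=50257)
vocab, d_model, n_layers = 100, 16, 2
wte = rng.normal(size=(vocab, d_model))  # token embeddings
wpe = rng.normal(size=(512, d_model))    # positional embeddings
blocks = [
    {name: rng.normal(size=shape) * 0.02
     for name, shape in [("Wq", (d_model, d_model)), ("Wk", (d_model, d_model)),
                         ("Wv", (d_model, d_model)), ("Wo", (d_model, d_model)),
                         ("W1", (d_model, 4 * d_model)), ("W2", (4 * d_model, d_model))]}
    for _ in range(n_layers)
]

ids = np.array([5, 17, 42, 7])
x = wte[ids] + wpe[:len(ids)]            # input embedding
for p in blocks:
    x = decoder_block(x, p)
logits = layer_norm(x) @ wte.T           # project back onto the vocabulary
next_token = logits[-1].argmax()         # greedy next-token prediction
print(logits.shape)  # (4, 100)
```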





Ph.D. student in Computer Science

Zheng Zhang
