Breaking down GPT-2 and the Transformer

What is GPT-2?

Figure 1: A transformer decoder block
Figure 2: Overview of the GPT-2 process
Figure 3: Matrix of token embeddings; the embedding length varies with the model size
Figure 4: Matrix of positional embeddings
Figure 5: The input embedding is the sum of the token embedding and the positional embedding
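
Figures 3–5 describe how the input to the decoder stack is formed: each token id looks up a row of the token-embedding matrix, a positional embedding of the same width is looked up by position, and the two are summed element-wise. Here is a minimal NumPy sketch of that step, using GPT-2 small's sizes (50,257-token vocabulary, 768-dimensional embeddings, 1,024 positions); the tables are randomly initialized and the token ids are made up purely for illustration:

```python
import numpy as np

VOCAB_SIZE, D_MODEL, MAX_POS = 50257, 768, 1024   # GPT-2 small sizes

# Learned lookup tables; random here purely for illustration.
token_embedding = np.random.randn(VOCAB_SIZE, D_MODEL) * 0.02
positional_embedding = np.random.randn(MAX_POS, D_MODEL) * 0.02

def embed(token_ids):
    """Input embedding = token embedding + positional embedding (Figure 5)."""
    positions = np.arange(len(token_ids))
    return token_embedding[token_ids] + positional_embedding[positions]

x = embed([464, 3290, 318])   # made-up token ids for illustration
print(x.shape)                # (3, 768)
```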

Self-attention mechanism

Figure 6: Brief illustration of a masked self-attention layer
Figure 7: An analogy between self-attention and sticky-note matching
Figure 8: The dot product between the query vector and each key vector produces a score; the softmax turns these scores into probabilities
Figure 9: Illustration of calculating the weighted vector corresponding to “it”
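
Figures 6–9 walk through one masked self-attention head: each token's query vector is matched against every key vector, positions after the current token are masked out, a softmax turns the match scores into probabilities, and the head's output is the probability-weighted sum of the value vectors. A single-head sketch of that computation, with random matrices standing in for the learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(x, W_q, W_k, W_v):
    """One attention head over a sequence x of shape (T, d)."""
    T = x.shape[0]
    Q, K, V = x @ W_q, x @ W_k, x @ W_v          # queries, keys, values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # query-key match scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                       # hide future positions
    weights = softmax(scores)                    # scores -> probabilities (Figure 8)
    return weights @ V                           # weighted sum of value vectors

d = 64
x = np.random.randn(5, d)
W_q, W_k, W_v = [np.random.randn(d, d) * 0.02 for _ in range(3)]
print(masked_self_attention(x, W_q, W_k, W_v).shape)   # (5, 64)
```
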
Figure 10: The first layer of the feed-forward neural network
Figure 11: The second layer of the feed-forward neural network
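
Figures 10–11 show the two-layer feed-forward network that follows attention in each block. The first layer expands each position's 768-dimensional vector to 3,072 dimensions and applies a GELU activation; the second projects it back to 768. A sketch with those GPT-2 small sizes (weights random for illustration):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, the activation GPT-2 uses
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, W1, b1, W2, b2):
    """Applied independently at every position: expand, activate, project back."""
    return gelu(x @ W1 + b1) @ W2 + b2

D_MODEL, D_FF = 768, 4 * 768    # GPT-2 small: 768 -> 3072 -> 768
W1, b1 = np.random.randn(D_MODEL, D_FF) * 0.02, np.zeros(D_FF)
W2, b2 = np.random.randn(D_FF, D_MODEL) * 0.02, np.zeros(D_MODEL)
print(feed_forward(np.random.randn(3, D_MODEL), W1, b1, W2, b2).shape)   # (3, 768)
```
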
Figure 12: The matrices inside a transformer decoder block
Figure 13: Reshaping the long vectors to get the Q, K, and V vectors for each attention head
Figure 14: Concatenation of the outputs of the attention heads
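
Figures 12–14 describe how multi-head attention is laid out in practice: one large projection produces long Q, K, and V vectors, each is reshaped into 12 per-head slices of width 64, every head runs masked attention independently, and the head outputs are concatenated back into a single 768-dimensional vector before a final output projection. A sketch of that reshape, attend, and concatenate flow:

```python
import numpy as np

N_HEADS, D_MODEL = 12, 768           # GPT-2 small: 12 heads, each of width 64
D_HEAD = D_MODEL // N_HEADS

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, W_qkv, W_out):
    """Masked multi-head self-attention over x of shape (T, D_MODEL)."""
    T = x.shape[0]
    qkv = x @ W_qkv                       # one projection -> long Q, K, V vectors
    q, k, v = np.split(qkv, 3, axis=-1)   # each (T, D_MODEL)
    # Reshape each long vector into per-head slices (Figure 13).
    q = q.reshape(T, N_HEADS, D_HEAD).transpose(1, 0, 2)   # (heads, T, D_HEAD)
    k = k.reshape(T, N_HEADS, D_HEAD).transpose(1, 0, 2)
    v = v.reshape(T, N_HEADS, D_HEAD).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(D_HEAD)    # (heads, T, T)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[:, mask] = -np.inf             # causal mask, applied in every head
    out = softmax(scores) @ v             # (heads, T, D_HEAD)
    # Concatenate the head outputs back into one long vector (Figure 14).
    out = out.transpose(1, 0, 2).reshape(T, D_MODEL)
    return out @ W_out                    # final output projection

x = np.random.randn(5, D_MODEL)
W_qkv = np.random.randn(D_MODEL, 3 * D_MODEL) * 0.02
W_out = np.random.randn(D_MODEL, D_MODEL) * 0.02
print(multi_head_attention(x, W_qkv, W_out).shape)   # (5, 768)
```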

Enjoying the landscape of the whole GPT-2 model

Figure 15: The whole picture of GPT-2
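
Figure 15 puts the pieces together: the input embedding flows through a stack of identical decoder blocks (12 in GPT-2 small), each wrapping masked multi-head attention and the feed-forward network in layer-normalized residual connections, and the final hidden states are projected back onto the vocabulary by reusing the token-embedding matrix. A high-level sketch that reuses the embed, multi_head_attention, and feed_forward helpers from the snippets above:

```python
import numpy as np

N_BLOCKS = 12   # GPT-2 small stacks 12 decoder blocks

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def decoder_block(x, p):
    # Pre-norm residual layout, as in GPT-2.
    x = x + multi_head_attention(layer_norm(x), p["W_qkv"], p["W_out"])
    x = x + feed_forward(layer_norm(x), *p["ffn"])
    return x

def gpt2_forward(token_ids, blocks):
    x = embed(token_ids)            # token + positional embeddings (Figure 5)
    for p in blocks:                # the stack of decoder blocks
        x = decoder_block(x, p)
    x = layer_norm(x)
    return x @ token_embedding.T    # tie weights: project back onto the vocabulary

def init_block():
    return {
        "W_qkv": np.random.randn(D_MODEL, 3 * D_MODEL) * 0.02,
        "W_out": np.random.randn(D_MODEL, D_MODEL) * 0.02,
        "ffn": (np.random.randn(D_MODEL, 4 * D_MODEL) * 0.02, np.zeros(4 * D_MODEL),
                np.random.randn(4 * D_MODEL, D_MODEL) * 0.02, np.zeros(D_MODEL)),
    }

logits = gpt2_forward([464, 3290, 318], [init_block() for _ in range(N_BLOCKS)])
print(logits.shape)   # (3, 50257): one next-token score per vocabulary entry
```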

