GPT-2 has shown an impressive capacity of getting around a wide range of NLP tasks. In this article, I will break down the inner workings of this versatile model, illustrating the architecture of GPT-2 and its essential component — transformer. This article distills the content of Jay Alammar’s inspirational blog…