Hands-On-Large-Language-Models/chapter03/Chapter 3 - Looking Inside LLMs.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "adFzzFsB-Ofl"
      },
      "source": [
        "<h1>Chapter 3 - Looking Inside Transformer LLMs</h1>\n",
        "<i>An extensive look into the transformer architecture of generative LLMs</i>\n",
        "\n",
        "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\"><img src=\"https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon\"></a>\n",
        "<a href=\"https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/\"><img src=\"https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K\"></a>\n",
        "<a href=\"https://github.com/HandsOnLLM/Hands-On-Large-Language-Models\"><img src=\"https://img.shields.io/badge/GitHub%20Repository-black?logo=github\"></a>\n",
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter03/Chapter%203%20-%20Looking%20Inside%20LLMs.ipynb)\n",
        "\n",
        "---\n",
        "\n",
        "This notebook is for Chapter 3 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).\n",
        "\n",
        "---\n",
        "\n",
        "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\">\n",
        "<img src=\"https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png\" width=\"350\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### [OPTIONAL] - Installing Packages on <img src=\"https://colab.google/static/images/icons/colab.png\" width=100>\n",
        "\n",
        "If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following code block to install the dependencies for this chapter:\n",
        "\n",
        "---\n",
        "\n",
        "💡 **NOTE**: We recommend using a GPU to run the examples in this notebook. In Google Colab, go to\n",
        "**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.\n",
        "\n",
        "---\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# %%capture\n",
        "# !pip install \"transformers>=4.41.2\" \"accelerate>=0.31.0\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "W_23Z_do-faF"
      },
      "source": [
        "# Loading the LLM"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 759,
          "referenced_widgets": [
            "5ecafba0f8f04685a56d2b1495baea24",
            "53945d03a26044878ddb7fc6eadfd8db",
            "99db408011ea43c79adbd4a880839484",
            "fd181746067945febc55f94e2dcf6f67",
            "c5acd3e22bbe4a1897f7a12051e8eae9",
            "446d084fde5f422a8c0525e8c5b47f93",
            "6a0dabe02c874ecabd257c5da5f4a7c5",
            "1ee8996717864f79af6cf0314cd27c59",
            "7c719ce3694845b384ad4ae7207d31cc",
            "989fc06dfb25420eaf155bfc0174c692",
            "443850bc37d94a90aa037e73a77f9369",
            "184bbd6daf424ddeba7f8cbe0b5b34d9",
            "a6ee07658a234a54b38d849ff2017d6c",
            "bf63e0a23fa147778dd88b48075611e3",
            "acf8b6868c484885871964cc538359c6",
            "32cc3b668de84e8a9f8cde25d3846822",
            "b1115e5097c4404694e5a7c228e25234",
            "6ca20198bc144510a14b795aaf813940",
            "982a35743f51448cba583e54e7d3987d",
            "17250e2791ba42099c50efa594e229ce",
            "a4c438b7029f47d9a54d5a0ce16541c0",
            "fafd3d7714d4466ea00f354b20dc954a",
            "0a18c83d2645496797d74aef5e84dafa",
            "bfbbea9912104cb1803c7c827c1cea7d",
            "18c23f29824e4cb58e461bf89811a32a",
            "74105769e6a541098769921b84d1f1cd",
            "52109f76852e4388b6df4e5441c43a48",
            "4f39386aea7c4ac9abf2d66291cbcb4e",
            "4893546c2d6f4eec9cb5b7f6853013f0",
            "88616761593a4b2b9365294427a3136d",
            "d3ab3ad192f54d5b9f9e08db62884f0c",
            "6be54d3b31864134bc36d3b9d997530a",
            "27a1df02460948e19cbd40b69a89bec6",
            "6b81bad9c639454980de2a67b414b988",
            "1e8224a73a724058b28a13db2c2197c2",
            "3cbbd1ae4e4d4bf1a34c9d960ee7d26d",
            "e9c0309e4af44726b180c977692e2469",
            "97421a19ce25438e846a84b45321f9d0",
            "c6aa0ad56ce44b1eab1356d3bc56706b",
            "87d0758fdaaa4d90b21b1291f4d20039",
            "8c85660c9e9f4027bd64d596160a7d7f",
            "fce8b8595211452f8a759a6e98410f6a",
            "4249df3055d4427493e2a2e775dc1a93",
            "e42e39002e66410b9d714656a632adea",
            "3cf79ac9541c4b7bbf686b604ff73b81",
            "b709537463c3485dbbcef93e3636af2d",
            "cf7c24d7629b4362ae0905f8e4bbd997",
            "d9f9f98cb9dd465daed1da06d7d4084b",
            "cb20a4ccd9ea4548bc7f694068bb30f6",
            "b91d534a7a5c4863a9e858cedbda9fd5",
            "a65601c4918c4c2cb2020837cd1e1f85",
            "99505d538b894a359bc3791cd95423d5",
            "1a48245eddfc4923a5f267cd799ba9c7",
            "6c50ddb227b4421da5cd391e4d6ec94c",
            "7d62b5c9c90644378b0ca96cee430419",
            "9ec56d025f8446a08184b055fe11598e",
            "54894d44acce4fe6b0ef749d5c02e3cd",
            "e2dd956536a0407fb9b9c4a01c01ba9c",
            "54370883565644c5a2529e092db7f259",
            "7ec24f1bff5f4ca78d7c36a637cfa294",
            "36ff492b5c7c4ff3bcf72cc574d03a38",
            "8279cee867884166bb09df4e02635e2a",
            "a2644dee82b14cfbab069a484f3841e2",
            "e761a9afe1b847579fb51eb0eddd4488",
            "c59d9e04e5964a8891cc3ddce33d6f86",
            "a9007e7552ab4634ac44577279b242ac",
            "2cff15834149418c81eee5239a5e275a",
            "6623f077d95f4faf883fa2ca4397169d",
            "98529c941229460da8563b94d3419c37",
            "3bef1bf002c94340a3323592d616e7ac",
            "47c8fd6b7c9844d983014e07f1999cd2",
            "aa24ef9c6e1d49708b7bb4755a9adb78",
            "28261270cd0449f8ac85dc4b0efdac57",
            "a231962de8584f69a6a107c275d1cda5",
            "2d35dfe75a744987ab210f5fb0118301",
            "1cbaa3bcf20b4af099f6d4310dd071dd",
            "4f7088db853e47e6b6a6ccd654e379dc",
            "7e0710dc5b5c4002b4bcc189bf5514cf",
            "66dd13ca8234409eab15ecbf7a009914",
            "8f021e2bfb7246248baa49b08f4d3358",
            "0a91788829fc49ef95a69efd3256a8e3",
            "b42a56f9b2e44b6992413e023ed44b0e",
            "f9fc98e3d8ed4338bd763d152f8cc5f9",
            "f786e0117562476697092ef828ceb1b2",
            "07fed042c0894ca5aebe717eca6f3018",
            "3bea08fef3ef456e9f180d8a23ded5bc",
            "35f3727f37c44640986d8416141f8069",
            "0dd1ce8a2306431c9b0412e5992f4f84",
            "dcea31213c9f421abc7ffabc3499ecb9",
            "61643fa2fcb54ebaad8da5106dec9ea0",
            "0c4f8c58d213493494120c070d86ac76",
            "e15e9978db794891b9fa0d8ce096c983",
            "b15098d69b5f418da3f81aba8fb79de0",
            "f7cf40042a2e4c8cb8b87444893d8ec1",
            "6dcbc6ec46d44a55a1b38267201926b8",
            "6de4dd6e36b444638b2f65e0ca80bc9a",
            "60febb04a12447c192e1b8eb2aa5ba28",
            "6d200e6b1bdc4e918093670df8e37dc8",
            "934e950a3c0c48f08802d551ee1bd429",
            "34058533e3cf46a88db8927372102b9f",
            "2a7ba0f87814436386e66f4ef7f1111d",
            "d3e41286c5b747a8bb5cf326f9f80ad3",
            "fe92dc2d5e8c449bbb36dafbd6c9935f",
            "615d4ec0fb194688a392b71b99cf3621",
            "e2056e3dba884242807a98b9b3837843",
            "6ca50fb814b64fccaf0e1c6c11d8f4d8",
            "9ec1d173921748b2af19a3a21df9ed40",
            "f8c3d566e98d47239ef2b823544b75a5",
            "3618dcec35a740c485ecafa5589e0c91",
            "f66dd730f3364e35974a3918d12ff51d",
            "dcb9240335394bfa8d3949ef1cdbcdf8",
            "2df88b2fd5e242eca8b1d3f6cea1349b",
            "1b103c69baf74fceb551ebcb5a0ac5e8",
            "6847b7b6b3854d6e9b72f40040f84c8f",
            "ac3e67f03f604883ab4787930cd316f3",
            "6d7e012a7fbf4d788054ead5020e9314",
            "a432e2e32c7c4e56b138ebaefad76c93",
            "352fe4ea215240149d73478e34cd9b66",
            "bf9bc31d7f99477bb391f69df41b8dbe",
            "41b8e463309e4013a50268677f44d4b5",
            "989326b74cd146e3b5b2f2d5f19bdf41",
            "36b7269d8eb849b084694fd1f3b177b9",
            "2d09f9ed15fa452ba8d9ce9aba9f61ac",
            "53818082d62749a28552a1eebd304d88",
            "766f67cb6a8a4a58965edb671ce624e8",
            "a86d22308dbf4c11bf3d6f6515aef561",
            "e9e7d944715b402ab149f86862b92259",
            "4aaa732bb1b94b4895ca3f00f93cd762",
            "9fb02b3bbe79434a93f32291c208aaad",
            "dd9b3a5e84ba44cb9717ded470c258b4",
            "fa89524d446b480aa50d203d01ec7bb7",
            "c8b77256d5fc436fbfdcc150843a6b5b",
            "f5f5b592768048169676e09cca453645",
            "77f40b8bf30c437ba987b71178d0e9f6",
            "3f39ec300bd84852a2388dadaafd8c4b",
            "250cac43e6da47dd8d732ea57d8c50ec",
            "f929d12aad68458b98c21e0669da3d8e",
            "44ddbcadcc4c477c80daf278122de46d",
            "eb606db4125e4eb097d5b7d3cdb90976",
            "b6c177de60b54edd887d1ca983ea7546",
            "6db56d7c52244a3984a0638e060a81cb",
            "11d17ae63dd44ecc8813d482ee17dd95",
            "ff3733c6a1f34580b037e296e3abed6b",
            "2d31d51641e945f695f7315b68e0ad2e",
            "1b694930328e46bd9e0d61063b9141d4",
            "b544b6f2c2bb4f36bf9a983005a8bdb8",
            "afb8b0e602b649fcb92634c8aab4caf7",
            "24fc703c916f43aab5900288b8aa5aca",
            "16cffa93ab234718a8ede1044596e8b2",
            "2d7999217424413d988cd29d41ed5ace",
            "0d1546581c90418fa1cfc37491339134",
            "0e233853a3b74211a0b65dcdd001feed",
            "a1d2163af40a4aaab8a765b148845807",
            "2aaa722b303b4e1f823cee828fc7958c"
          ]
        },
        "executionInfo": {
          "elapsed": 130259,
          "status": "ok",
          "timestamp": 1718959891215,
          "user": {
            "displayName": "Maarten Grootendorst",
            "userId": "11015108362723620659"
          },
          "user_tz": -120
        },
        "id": "-5RLd6dI-Ytm",
        "outputId": "fb085ff7-e06f-4142-8e95-5ff98b212e37"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n",
            "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
            "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
            "You will be able to reuse this secret in all of your notebooks.\n",
            "Please note that authentication is recommended but still optional to access public models or datasets.\n",
            "  warnings.warn(\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "5ecafba0f8f04685a56d2b1495baea24",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer_config.json:   0%|          | 0.00/3.17k [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "184bbd6daf424ddeba7f8cbe0b5b34d9",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "0a18c83d2645496797d74aef5e84dafa",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "6b81bad9c639454980de2a67b414b988",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "3cf79ac9541c4b7bbf686b604ff73b81",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "special_tokens_map.json:   0%|          | 0.00/568 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "9ec56d025f8446a08184b055fe11598e",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "config.json:   0%|          | 0.00/931 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "2cff15834149418c81eee5239a5e275a",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "configuration_phi3.py:   0%|          | 0.00/10.4k [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:\n",
            "- configuration_phi3.py\n",
            ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "7e0710dc5b5c4002b4bcc189bf5514cf",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "modeling_phi3.py:   0%|          | 0.00/73.8k [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:\n",
            "- modeling_phi3.py\n",
            ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n",
            "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.\n",
            "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "dcea31213c9f421abc7ffabc3499ecb9",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model.safetensors.index.json:   0%|          | 0.00/16.3k [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "34058533e3cf46a88db8927372102b9f",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "dcb9240335394bfa8d3949ef1cdbcdf8",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "36b7269d8eb849b084694fd1f3b177b9",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "f5f5b592768048169676e09cca453645",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "2d31d51641e945f695f7315b68e0ad2e",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "generation_config.json:   0%|          | 0.00/172 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\n",
        "\n",
        "# Load model and tokenizer\n",
        "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")\n",
        "\n",
        "model = AutoModelForCausalLM.from_pretrained(\n",
        "    \"microsoft/Phi-3-mini-4k-instruct\",\n",
        "    device_map=\"cuda\",\n",
        "    torch_dtype=\"auto\",\n",
        "    trust_remote_code=False,\n",
        ")\n",
        "\n",
        "# Create a pipeline\n",
        "generator = pipeline(\n",
        "    \"text-generation\",\n",
        "    model=model,\n",
        "    tokenizer=tokenizer,\n",
        "    return_full_text=False,\n",
        "    max_new_tokens=50,\n",
        "    do_sample=False,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "REqcz-ID_XgV"
      },
      "source": [
        "# The Inputs and Outputs of a Trained Transformer LLM\n"
      ]
    },
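    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The pipeline below hides the fact that a causal LLM generates text one token at a time: each forward pass produces a score (logit) for every token in the vocabulary, one token is chosen, it is appended to the input, and the model runs again. The next cell is a minimal, self-contained sketch of that greedy loop under simplified assumptions. `toy_next_token_logits` is a hypothetical stand-in for a real forward pass (it simply favors the token ID after the last one) and is not part of `transformers`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import torch\n",
        "\n",
        "VOCAB_SIZE = 10  # hypothetical vocabulary size for the toy model\n",
        "\n",
        "def toy_next_token_logits(input_ids):\n",
        "    # Hypothetical stand-in for an LLM forward pass: one score per vocabulary\n",
        "    # entry, strongly favoring (last token ID + 1) modulo VOCAB_SIZE\n",
        "    logits = torch.zeros(VOCAB_SIZE)\n",
        "    logits[(input_ids[-1].item() + 1) % VOCAB_SIZE] = 5.0\n",
        "    return logits\n",
        "\n",
        "def greedy_generate(input_ids, max_new_tokens):\n",
        "    # Autoregressive loop: score the sequence, pick the argmax token, append it, repeat\n",
        "    for _ in range(max_new_tokens):\n",
        "        logits = toy_next_token_logits(input_ids)\n",
        "        next_id = logits.argmax(-1, keepdim=True)\n",
        "        input_ids = torch.cat([input_ids, next_id])\n",
        "    return input_ids\n",
        "\n",
        "generated = greedy_generate(torch.tensor([3]), max_new_tokens=4)\n",
        "print(generated.tolist())  # prints [3, 4, 5, 6, 7]"
      ]
    },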
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "executionInfo": {
          "elapsed": 4955,
          "status": "ok",
          "timestamp": 1718959896168,
          "user": {
            "displayName": "Maarten Grootendorst",
            "userId": "11015108362723620659"
          },
          "user_tz": -120
        },
        "id": "17h6TPHluJ-i",
        "outputId": "18727eeb-ccd6-40f8-aab1-25c8d9a03cbe"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:You are not running the flash-attention implementation, expect numerical differences.\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "\n",
            "\n",
            "Solution 1:\n",
            "\n",
            "Subject: My Sincere Apologies for the Gardening Mishap\n",
            "\n",
            "\n",
            "Dear Sarah,\n",
            "\n",
            "\n",
            "I hope this message finds you well. I am writing to express my deep\n"
          ]
        }
      ],
      "source": [
        "prompt = \"Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.\"\n",
        "\n",
        "output = generator(prompt)\n",
        "\n",
        "print(output[0]['generated_text'])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "executionInfo": {
          "elapsed": 1,
          "status": "ok",
          "timestamp": 1718959898745,
          "user": {
            "displayName": "Maarten Grootendorst",
            "userId": "11015108362723620659"
          },
          "user_tz": -120
        },
        "id": "eoFkdTd6_g5o",
        "outputId": "bdcfde9f-28b7-4f43-ec0c-32c16677a776"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Phi3ForCausalLM(\n",
            "  (model): Phi3Model(\n",
            "    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)\n",
            "    (embed_dropout): Dropout(p=0.0, inplace=False)\n",
            "    (layers): ModuleList(\n",
            "      (0-31): 32 x Phi3DecoderLayer(\n",
            "        (self_attn): Phi3Attention(\n",
            "          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)\n",
            "          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)\n",
            "          (rotary_emb): Phi3RotaryEmbedding()\n",
            "        )\n",
            "        (mlp): Phi3MLP(\n",
            "          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)\n",
            "          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)\n",
            "          (activation_fn): SiLU()\n",
            "        )\n",
            "        (input_layernorm): Phi3RMSNorm()\n",
            "        (resid_attn_dropout): Dropout(p=0.0, inplace=False)\n",
            "        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)\n",
            "        (post_attention_layernorm): Phi3RMSNorm()\n",
            "      )\n",
            "    )\n",
            "    (norm): Phi3RMSNorm()\n",
            "  )\n",
            "  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)\n",
            ")\n"
          ]
        }
      ],
      "source": [
        "print(model)"
      ]
    },
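    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The printed module tree gives enough information to estimate the model size by hand. The next cell multiplies out the `Embedding` and `Linear` shapes shown above (all with `bias=False`) and adds one weight vector per RMSNorm, assuming each `Phi3RMSNorm` holds a single weight of size 3072. Note that `qkv_proj` fuses the query, key, and value projections (3 × 3072 = 9216 outputs) and `gate_up_proj` fuses the gate and up projections (2 × 8192 = 16384 outputs), which is why `down_proj` takes 8192 inputs. The estimate can be compared with `sum(p.numel() for p in model.parameters())`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Shapes copied from the printed Phi3ForCausalLM module tree above\n",
        "hidden, vocab, n_layers, mlp_inner = 3072, 32064, 32, 8192\n",
        "\n",
        "embed_tokens = vocab * hidden\n",
        "per_layer = (\n",
        "    hidden * hidden              # o_proj\n",
        "    + hidden * 3 * hidden        # qkv_proj (fused Q, K, V)\n",
        "    + hidden * 2 * mlp_inner     # gate_up_proj (fused gate and up)\n",
        "    + mlp_inner * hidden         # down_proj\n",
        "    + 2 * hidden                 # two RMSNorm weights per layer (assumed)\n",
        ")\n",
        "final_norm = hidden\n",
        "lm_head = hidden * vocab\n",
        "\n",
        "total = embed_tokens + n_layers * per_layer + final_norm + lm_head\n",
        "print(f'{total:,} parameters (~{total / 1e9:.2f}B)')  # 3,821,079,552 parameters (~3.82B)"
      ]
    },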
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RTrwzB67BYVY"
      },
      "source": [
        "# Choosing a single token from the probability distribution (sampling / decoding)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sEcxYgJxBYbJ"
      },
      "outputs": [],
      "source": [
        "prompt = \"The capital of France is\"\n",
        "\n",
        "# Tokenize the input prompt\n",
        "input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n",
        "\n",
        "# Move the input IDs to the GPU\n",
        "input_ids = input_ids.to(\"cuda\")\n",
        "\n",
        "# Get the output of the model before the lm_head\n",
        "model_output = model.model(input_ids)\n",
        "\n",
        "# Get the output of the lm_head\n",
        "lm_head_output = model.lm_head(model_output[0])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 36
        },
        "executionInfo": {
          "elapsed": 421,
          "status": "ok",
          "timestamp": 1718960391623,
          "user": {
            "displayName": "Maarten Grootendorst",
            "userId": "11015108362723620659"
          },
          "user_tz": -120
        },
        "id": "68YUSS4GBf9Q",
        "outputId": "2dc25e8d-03b6-4bca-b46c-fec3e3a4a492"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "'Paris'"
            ]
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Greedy decoding: pick the highest-scoring token at the last position\n",
        "token_id = lm_head_output[0, -1].argmax(-1)\n",
        "tokenizer.decode(token_id)"
      ]
    },
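    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Taking the `argmax` of the last position's logits is greedy decoding, which is also what `do_sample=False` selects in the pipeline at the top of this notebook. Sampling instead turns the logits into a probability distribution with a softmax (optionally dividing by a temperature first) and draws a token from it. The next cell is a small sketch of both; the five logits are made up for illustration, and with the real model you would use `lm_head_output[0, -1]` instead."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import torch\n",
        "\n",
        "torch.manual_seed(0)  # make the sampled tokens reproducible\n",
        "\n",
        "# Hypothetical logits over a 5-token vocabulary (stand-in for lm_head_output[0, -1])\n",
        "logits = torch.tensor([2.0, 1.0, 0.5, 0.1, -1.0])\n",
        "\n",
        "# Greedy decoding: always pick the highest-scoring token\n",
        "greedy_id = logits.argmax(-1).item()\n",
        "\n",
        "def sample(logits, temperature=1.0):\n",
        "    # Lower temperature sharpens the distribution, higher temperature flattens it\n",
        "    probs = torch.softmax(logits / temperature, dim=-1)\n",
        "    return torch.multinomial(probs, num_samples=1).item()\n",
        "\n",
        "print('greedy:', greedy_id)  # greedy: 0\n",
        "print('probs at T=1.0:', torch.softmax(logits, dim=-1))\n",
        "print('probs at T=0.5:', torch.softmax(logits / 0.5, dim=-1))\n",
        "print('samples at T=1.0:', [sample(logits, 1.0) for _ in range(10)])"
      ]
    },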
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "executionInfo": {
          "elapsed": 901,
          "status": "ok",
          "timestamp": 1718960415287,
          "user": {
            "displayName": "Maarten Grootendorst",
            "userId": "11015108362723620659"
          },
          "user_tz": -120
        },
        "id": "cWWrfC5oBjwp",
        "outputId": "c2fdeab7-e787-466f-88f4-988cd5f939a6"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "torch.Size([1, 6, 3072])"
            ]
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "model_output[0].shape"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "executionInfo": {
          "elapsed": 1079,
          "status": "ok",
          "timestamp": 1718960424560,
          "user": {
            "displayName": "Maarten Grootendorst",
            "userId": "11015108362723620659"
          },
          "user_tz": -120
        },
        "id": "nC1PdOnTBnxZ",
        "outputId": "1fd5f482-7046-4536-b745-4e681d6ecdaf"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "torch.Size([1, 6, 32064])"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "lm_head_output.shape"
      ]
    },
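    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Both shapes follow the pattern `[batch, sequence_length, dim]`: each of the 6 positions of the tokenized prompt gets a 3072-dimensional hidden state, and `lm_head` is a single linear layer that maps every hidden state to 32064 vocabulary logits. The next cell reproduces that shape bookkeeping with small, made-up sizes (an assumption so it runs quickly on CPU); only the last position is needed to predict the next token."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import torch\n",
        "from torch import nn\n",
        "\n",
        "# Small hypothetical sizes standing in for hidden_size=3072 and vocab_size=32064\n",
        "batch, seq_len, hidden_size, vocab_size = 1, 6, 16, 50\n",
        "\n",
        "hidden_states = torch.randn(batch, seq_len, hidden_size)  # plays the role of model_output[0]\n",
        "lm_head = nn.Linear(hidden_size, vocab_size, bias=False)  # plays the role of model.lm_head\n",
        "\n",
        "logits = lm_head(hidden_states)    # one row of vocabulary scores per position\n",
        "next_token_logits = logits[0, -1]  # only the last position predicts the next token\n",
        "\n",
        "# torch.Size([1, 6, 16]) torch.Size([1, 6, 50]) torch.Size([50])\n",
        "print(hidden_states.shape, logits.shape, next_token_logits.shape)"
      ]
    },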
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Of2_rP4QBqrZ"
      },
      "source": [
        "# Speeding up generation by caching keys and values\n"
      ]
    },
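    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In a decoder, the keys and values of earlier tokens do not change when a new token is appended, because causal attention never lets earlier positions look ahead. Caching them means each generation step only computes the query, key, and value for the newest token. The next cell is a single-head sketch with random, hypothetical projection matrices (not Phi-3's weights) that checks that attending with cached keys and values gives the same output for the newest token as recomputing causal attention over the full sequence."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import torch\n",
        "\n",
        "torch.manual_seed(0)\n",
        "d = 8  # hypothetical head dimension\n",
        "W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))  # toy projection weights\n",
        "\n",
        "def causal_attention(x):\n",
        "    # Full recomputation: project every token and apply a causal mask\n",
        "    q, k, v = x @ W_q, x @ W_k, x @ W_v\n",
        "    scores = q @ k.T / d ** 0.5\n",
        "    mask = torch.triu(torch.ones(len(x), len(x), dtype=torch.bool), diagonal=1)\n",
        "    scores = scores.masked_fill(mask, float('-inf'))\n",
        "    return torch.softmax(scores, dim=-1) @ v\n",
        "\n",
        "x = torch.randn(5, d)  # embeddings of the 5 tokens seen so far\n",
        "\n",
        "# Cache built from the first 4 tokens (what earlier steps would have stored)\n",
        "k_cache, v_cache = x[:4] @ W_k, x[:4] @ W_v\n",
        "\n",
        "# New step: project only the newest token and append its key and value to the cache\n",
        "new = x[4:5]\n",
        "k_cache = torch.cat([k_cache, new @ W_k])\n",
        "v_cache = torch.cat([v_cache, new @ W_v])\n",
        "cached_out = torch.softmax((new @ W_q) @ k_cache.T / d ** 0.5, dim=-1) @ v_cache\n",
        "\n",
        "full_out = causal_attention(x)[-1:]\n",
        "print(torch.allclose(cached_out, full_out, atol=1e-4))  # True"
      ]
    },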
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "B0n6JhNHBrin"
      },
      "outputs": [],
      "source": [
        "prompt = \"Write a very long email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.\"\n",
        "\n",
        "# Tokenize the input prompt\n",
        "input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n",
        "input_ids = input_ids.to(\"cuda\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "executionInfo": {
          "elapsed": 47155,
          "status": "ok",
          "timestamp": 1718960517928,
          "user": {
            "displayName": "Maarten Grootendorst",
            "userId": "11015108362723620659"
          },
          "user_tz": -120
        },
        "id": "BwIvt6jSByAF",
        "outputId": "e71c4141-2ca3-488a-fdfb-8d9357af0125"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "6.66 s ± 2.22 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
          ]
        }
      ],
      "source": [
        "%%timeit -n 1\n",
        "# Generate the text\n",
        "generation_output = model.generate(\n",
        "  input_ids=input_ids,\n",
        "  max_new_tokens=100,\n",
        "  use_cache=True\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "executionInfo": {
          "elapsed": 152674,
          "status": "ok",
          "timestamp": 1718960670601,
          "user": {
            "displayName": "Maarten Grootendorst",
            "userId": "11015108362723620659"
          },
          "user_tz": -120
        },
        "id": "dFb1dcvJByCW",
        "outputId": "0aba6a01-9bc7-40b7-e2e1-e064f13b4c88"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "21.9 s ± 94.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
          ]
        }
      ],
      "source": [
        "%%timeit -n 1\n",
        "# Generate the text\n",
        "generation_output = model.generate(\n",
        "  input_ids=input_ids,\n",
        "  max_new_tokens=100,\n",
        "  use_cache=False\n",
        ")"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.14"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}