{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "adFzzFsB-Ofl"
},
"source": [
"<h1>Chapter 3 - Looking Inside Transformer LLMs</h1>\n",
"<i>An extensive look into the transformer architecture of generative LLMs</i>\n",
"\n",
"<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\"><img src=\"https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon\"></a>\n",
"<a href=\"https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/\"><img src=\"https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K\"></a>\n",
"<a href=\"https://github.com/HandsOnLLM/Hands-On-Large-Language-Models\"><img src=\"https://img.shields.io/badge/GitHub%20Repository-black?logo=github\"></a>\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter03/Chapter%203%20-%20Looking%20Inside%20LLMs.ipynb)\n",
"\n",
"---\n",
"\n",
"This notebook is for Chapter 3 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).\n",
"\n",
"---\n",
"\n",
"<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\">\n",
"<img src=\"https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png\" width=\"350\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### [OPTIONAL] - Installing Packages on <img src=\"https://colab.google/static/images/icons/colab.png\" width=100>\n",
"\n",
"If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following code block to install the dependencies for this chapter:\n",
"\n",
"---\n",
"\n",
"💡 **NOTE**: You need a GPU to run the examples in this notebook. In Google Colab, go to\n",
"**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.\n",
"\n",
"---\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %%capture\n",
"# !pip install \"transformers>=4.41.2\" \"accelerate>=0.31.0\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "W_23Z_do-faF"
},
"source": [
"# Loading the LLM"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 759,
"referenced_widgets": [
"5ecafba0f8f04685a56d2b1495baea24",
"53945d03a26044878ddb7fc6eadfd8db",
"99db408011ea43c79adbd4a880839484",
"fd181746067945febc55f94e2dcf6f67",
"c5acd3e22bbe4a1897f7a12051e8eae9",
"446d084fde5f422a8c0525e8c5b47f93",
"6a0dabe02c874ecabd257c5da5f4a7c5",
"1ee8996717864f79af6cf0314cd27c59",
"7c719ce3694845b384ad4ae7207d31cc",
"989fc06dfb25420eaf155bfc0174c692",
"443850bc37d94a90aa037e73a77f9369",
"184bbd6daf424ddeba7f8cbe0b5b34d9",
"a6ee07658a234a54b38d849ff2017d6c",
"bf63e0a23fa147778dd88b48075611e3",
"acf8b6868c484885871964cc538359c6",
"32cc3b668de84e8a9f8cde25d3846822",
"b1115e5097c4404694e5a7c228e25234",
"6ca20198bc144510a14b795aaf813940",
"982a35743f51448cba583e54e7d3987d",
"17250e2791ba42099c50efa594e229ce",
"a4c438b7029f47d9a54d5a0ce16541c0",
"fafd3d7714d4466ea00f354b20dc954a",
"0a18c83d2645496797d74aef5e84dafa",
"bfbbea9912104cb1803c7c827c1cea7d",
"18c23f29824e4cb58e461bf89811a32a",
"74105769e6a541098769921b84d1f1cd",
"52109f76852e4388b6df4e5441c43a48",
"4f39386aea7c4ac9abf2d66291cbcb4e",
"4893546c2d6f4eec9cb5b7f6853013f0",
"88616761593a4b2b9365294427a3136d",
"d3ab3ad192f54d5b9f9e08db62884f0c",
"6be54d3b31864134bc36d3b9d997530a",
"27a1df02460948e19cbd40b69a89bec6",
"6b81bad9c639454980de2a67b414b988",
"1e8224a73a724058b28a13db2c2197c2",
"3cbbd1ae4e4d4bf1a34c9d960ee7d26d",
"e9c0309e4af44726b180c977692e2469",
"97421a19ce25438e846a84b45321f9d0",
"c6aa0ad56ce44b1eab1356d3bc56706b",
"87d0758fdaaa4d90b21b1291f4d20039",
"8c85660c9e9f4027bd64d596160a7d7f",
"fce8b8595211452f8a759a6e98410f6a",
"4249df3055d4427493e2a2e775dc1a93",
"e42e39002e66410b9d714656a632adea",
"3cf79ac9541c4b7bbf686b604ff73b81",
"b709537463c3485dbbcef93e3636af2d",
"cf7c24d7629b4362ae0905f8e4bbd997",
"d9f9f98cb9dd465daed1da06d7d4084b",
"cb20a4ccd9ea4548bc7f694068bb30f6",
"b91d534a7a5c4863a9e858cedbda9fd5",
"a65601c4918c4c2cb2020837cd1e1f85",
"99505d538b894a359bc3791cd95423d5",
"1a48245eddfc4923a5f267cd799ba9c7",
"6c50ddb227b4421da5cd391e4d6ec94c",
"7d62b5c9c90644378b0ca96cee430419",
"9ec56d025f8446a08184b055fe11598e",
"54894d44acce4fe6b0ef749d5c02e3cd",
"e2dd956536a0407fb9b9c4a01c01ba9c",
"54370883565644c5a2529e092db7f259",
"7ec24f1bff5f4ca78d7c36a637cfa294",
"36ff492b5c7c4ff3bcf72cc574d03a38",
"8279cee867884166bb09df4e02635e2a",
"a2644dee82b14cfbab069a484f3841e2",
"e761a9afe1b847579fb51eb0eddd4488",
"c59d9e04e5964a8891cc3ddce33d6f86",
"a9007e7552ab4634ac44577279b242ac",
"2cff15834149418c81eee5239a5e275a",
"6623f077d95f4faf883fa2ca4397169d",
"98529c941229460da8563b94d3419c37",
"3bef1bf002c94340a3323592d616e7ac",
"47c8fd6b7c9844d983014e07f1999cd2",
"aa24ef9c6e1d49708b7bb4755a9adb78",
"28261270cd0449f8ac85dc4b0efdac57",
"a231962de8584f69a6a107c275d1cda5",
"2d35dfe75a744987ab210f5fb0118301",
"1cbaa3bcf20b4af099f6d4310dd071dd",
"4f7088db853e47e6b6a6ccd654e379dc",
"7e0710dc5b5c4002b4bcc189bf5514cf",
"66dd13ca8234409eab15ecbf7a009914",
"8f021e2bfb7246248baa49b08f4d3358",
"0a91788829fc49ef95a69efd3256a8e3",
"b42a56f9b2e44b6992413e023ed44b0e",
"f9fc98e3d8ed4338bd763d152f8cc5f9",
"f786e0117562476697092ef828ceb1b2",
"07fed042c0894ca5aebe717eca6f3018",
"3bea08fef3ef456e9f180d8a23ded5bc",
"35f3727f37c44640986d8416141f8069",
"0dd1ce8a2306431c9b0412e5992f4f84",
"dcea31213c9f421abc7ffabc3499ecb9",
"61643fa2fcb54ebaad8da5106dec9ea0",
"0c4f8c58d213493494120c070d86ac76",
"e15e9978db794891b9fa0d8ce096c983",
"b15098d69b5f418da3f81aba8fb79de0",
"f7cf40042a2e4c8cb8b87444893d8ec1",
"6dcbc6ec46d44a55a1b38267201926b8",
"6de4dd6e36b444638b2f65e0ca80bc9a",
"60febb04a12447c192e1b8eb2aa5ba28",
"6d200e6b1bdc4e918093670df8e37dc8",
"934e950a3c0c48f08802d551ee1bd429",
"34058533e3cf46a88db8927372102b9f",
"2a7ba0f87814436386e66f4ef7f1111d",
"d3e41286c5b747a8bb5cf326f9f80ad3",
"fe92dc2d5e8c449bbb36dafbd6c9935f",
"615d4ec0fb194688a392b71b99cf3621",
"e2056e3dba884242807a98b9b3837843",
"6ca50fb814b64fccaf0e1c6c11d8f4d8",
"9ec1d173921748b2af19a3a21df9ed40",
"f8c3d566e98d47239ef2b823544b75a5",
"3618dcec35a740c485ecafa5589e0c91",
"f66dd730f3364e35974a3918d12ff51d",
"dcb9240335394bfa8d3949ef1cdbcdf8",
"2df88b2fd5e242eca8b1d3f6cea1349b",
"1b103c69baf74fceb551ebcb5a0ac5e8",
"6847b7b6b3854d6e9b72f40040f84c8f",
"ac3e67f03f604883ab4787930cd316f3",
"6d7e012a7fbf4d788054ead5020e9314",
"a432e2e32c7c4e56b138ebaefad76c93",
"352fe4ea215240149d73478e34cd9b66",
"bf9bc31d7f99477bb391f69df41b8dbe",
"41b8e463309e4013a50268677f44d4b5",
"989326b74cd146e3b5b2f2d5f19bdf41",
"36b7269d8eb849b084694fd1f3b177b9",
"2d09f9ed15fa452ba8d9ce9aba9f61ac",
"53818082d62749a28552a1eebd304d88",
"766f67cb6a8a4a58965edb671ce624e8",
"a86d22308dbf4c11bf3d6f6515aef561",
"e9e7d944715b402ab149f86862b92259",
"4aaa732bb1b94b4895ca3f00f93cd762",
"9fb02b3bbe79434a93f32291c208aaad",
"dd9b3a5e84ba44cb9717ded470c258b4",
"fa89524d446b480aa50d203d01ec7bb7",
"c8b77256d5fc436fbfdcc150843a6b5b",
"f5f5b592768048169676e09cca453645",
"77f40b8bf30c437ba987b71178d0e9f6",
"3f39ec300bd84852a2388dadaafd8c4b",
"250cac43e6da47dd8d732ea57d8c50ec",
"f929d12aad68458b98c21e0669da3d8e",
"44ddbcadcc4c477c80daf278122de46d",
"eb606db4125e4eb097d5b7d3cdb90976",
"b6c177de60b54edd887d1ca983ea7546",
"6db56d7c52244a3984a0638e060a81cb",
"11d17ae63dd44ecc8813d482ee17dd95",
"ff3733c6a1f34580b037e296e3abed6b",
"2d31d51641e945f695f7315b68e0ad2e",
"1b694930328e46bd9e0d61063b9141d4",
"b544b6f2c2bb4f36bf9a983005a8bdb8",
"afb8b0e602b649fcb92634c8aab4caf7",
"24fc703c916f43aab5900288b8aa5aca",
"16cffa93ab234718a8ede1044596e8b2",
"2d7999217424413d988cd29d41ed5ace",
"0d1546581c90418fa1cfc37491339134",
"0e233853a3b74211a0b65dcdd001feed",
"a1d2163af40a4aaab8a765b148845807",
"2aaa722b303b4e1f823cee828fc7958c"
]
},
"executionInfo": {
"elapsed": 130259,
"status": "ok",
"timestamp": 1718959891215,
"user": {
"displayName": "Maarten Grootendorst",
"userId": "11015108362723620659"
},
"user_tz": -120
},
"id": "-5RLd6dI-Ytm",
"outputId": "fb085ff7-e06f-4142-8e95-5ff98b212e37"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n",
"The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
"To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
"You will be able to reuse this secret in all of your notebooks.\n",
"Please note that authentication is recommended but still optional to access public models or datasets.\n",
" warnings.warn(\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5ecafba0f8f04685a56d2b1495baea24",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tokenizer_config.json: 0%| | 0.00/3.17k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "184bbd6daf424ddeba7f8cbe0b5b34d9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tokenizer.model: 0%| | 0.00/500k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0a18c83d2645496797d74aef5e84dafa",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tokenizer.json: 0%| | 0.00/1.84M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "6b81bad9c639454980de2a67b414b988",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"added_tokens.json: 0%| | 0.00/293 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3cf79ac9541c4b7bbf686b604ff73b81",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"special_tokens_map.json: 0%| | 0.00/568 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9ec56d025f8446a08184b055fe11598e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"config.json: 0%| | 0.00/931 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2cff15834149418c81eee5239a5e275a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"configuration_phi3.py: 0%| | 0.00/10.4k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:\n",
"- configuration_phi3.py\n",
". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7e0710dc5b5c4002b4bcc189bf5514cf",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"modeling_phi3.py: 0%| | 0.00/73.8k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:\n",
"- modeling_phi3.py\n",
". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n",
"WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.\n",
"WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "dcea31213c9f421abc7ffabc3499ecb9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model.safetensors.index.json: 0%| | 0.00/16.3k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "34058533e3cf46a88db8927372102b9f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading shards: 0%| | 0/2 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "dcb9240335394bfa8d3949ef1cdbcdf8",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00001-of-00002.safetensors: 0%| | 0.00/4.97G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "36b7269d8eb849b084694fd1f3b177b9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00002-of-00002.safetensors: 0%| | 0.00/2.67G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f5f5b592768048169676e09cca453645",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2d31d51641e945f695f7315b68e0ad2e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"generation_config.json: 0%| | 0.00/172 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\n",
"\n",
"# Load model and tokenizer\n",
"tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")\n",
"\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" \"microsoft/Phi-3-mini-4k-instruct\",\n",
" device_map=\"cuda\",\n",
" torch_dtype=\"auto\",\n",
" trust_remote_code=False,\n",
")\n",
"\n",
"# Create a pipeline\n",
"generator = pipeline(\n",
" \"text-generation\",\n",
" model=model,\n",
" tokenizer=tokenizer,\n",
" return_full_text=False,\n",
" max_new_tokens=50,\n",
" do_sample=False,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "REqcz-ID_XgV"
},
"source": [
"# The Inputs and Outputs of a Trained Transformer LLM\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 4955,
"status": "ok",
"timestamp": 1718959896168,
"user": {
"displayName": "Maarten Grootendorst",
"userId": "11015108362723620659"
},
"user_tz": -120
},
"id": "17h6TPHluJ-i",
"outputId": "18727eeb-ccd6-40f8-aab1-25c8d9a03cbe"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.ff07dc01615f8113924aed013115ab2abd32115b.modeling_phi3:You are not running the flash-attention implementation, expect numerical differences.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"Solution 1:\n",
"\n",
"Subject: My Sincere Apologies for the Gardening Mishap\n",
"\n",
"\n",
"Dear Sarah,\n",
"\n",
"\n",
"I hope this message finds you well. I am writing to express my deep\n"
]
}
],
"source": [
"prompt = \"Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.\"\n",
"\n",
"output = generator(prompt)\n",
"\n",
"print(output[0]['generated_text'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 1,
"status": "ok",
"timestamp": 1718959898745,
"user": {
"displayName": "Maarten Grootendorst",
"userId": "11015108362723620659"
},
"user_tz": -120
},
"id": "eoFkdTd6_g5o",
"outputId": "bdcfde9f-28b7-4f43-ec0c-32c16677a776"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Phi3ForCausalLM(\n",
" (model): Phi3Model(\n",
" (embed_tokens): Embedding(32064, 3072, padding_idx=32000)\n",
" (embed_dropout): Dropout(p=0.0, inplace=False)\n",
" (layers): ModuleList(\n",
" (0-31): 32 x Phi3DecoderLayer(\n",
" (self_attn): Phi3Attention(\n",
" (o_proj): Linear(in_features=3072, out_features=3072, bias=False)\n",
" (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)\n",
" (rotary_emb): Phi3RotaryEmbedding()\n",
" )\n",
" (mlp): Phi3MLP(\n",
" (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)\n",
" (down_proj): Linear(in_features=8192, out_features=3072, bias=False)\n",
" (activation_fn): SiLU()\n",
" )\n",
" (input_layernorm): Phi3RMSNorm()\n",
" (resid_attn_dropout): Dropout(p=0.0, inplace=False)\n",
" (resid_mlp_dropout): Dropout(p=0.0, inplace=False)\n",
" (post_attention_layernorm): Phi3RMSNorm()\n",
" )\n",
" )\n",
" (norm): Phi3RMSNorm()\n",
" )\n",
" (lm_head): Linear(in_features=3072, out_features=32064, bias=False)\n",
")\n"
]
}
],
"source": [
"print(model)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RTrwzB67BYVY"
},
"source": [
"# Choosing a single token from the probability distribution (sampling / decoding)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "sEcxYgJxBYbJ"
},
"outputs": [],
"source": [
"prompt = \"The capital of France is\"\n",
"\n",
"# Tokenize the input prompt\n",
"input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n",
"\n",
"# Move the token IDs to the GPU\n",
"input_ids = input_ids.to(\"cuda\")\n",
"\n",
"# Get the output of the model before the lm_head\n",
"model_output = model.model(input_ids)\n",
"\n",
"# Get the output of the lm_head\n",
"lm_head_output = model.lm_head(model_output[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 36
},
"executionInfo": {
"elapsed": 421,
"status": "ok",
"timestamp": 1718960391623,
"user": {
"displayName": "Maarten Grootendorst",
"userId": "11015108362723620659"
},
"user_tz": -120
},
"id": "68YUSS4GBf9Q",
"outputId": "2dc25e8d-03b6-4bca-b46c-fec3e3a4a492"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'Paris'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pick the highest-scoring token at the last position (greedy decoding)\n",
"token_id = lm_head_output[0,-1].argmax(-1)\n",
"tokenizer.decode(token_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 901,
"status": "ok",
"timestamp": 1718960415287,
"user": {
"displayName": "Maarten Grootendorst",
"userId": "11015108362723620659"
},
"user_tz": -120
},
"id": "cWWrfC5oBjwp",
"outputId": "c2fdeab7-e787-466f-88f4-988cd5f939a6"
},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([1, 6, 3072])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Hidden states: (batch_size, num_tokens, hidden_size)\n",
"model_output[0].shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 1079,
"status": "ok",
"timestamp": 1718960424560,
"user": {
"displayName": "Maarten Grootendorst",
"userId": "11015108362723620659"
},
"user_tz": -120
},
"id": "nC1PdOnTBnxZ",
"outputId": "1fd5f482-7046-4536-b745-4e681d6ecdaf"
},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([1, 6, 32064])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Logits: (batch_size, num_tokens, vocab_size)\n",
"lm_head_output.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Of2_rP4QBqrZ"
},
"source": [
"# Speeding up generation by caching keys and values\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B0n6JhNHBrin"
},
"outputs": [],
"source": [
"prompt = \"Write a very long email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.\"\n",
"\n",
"# Tokenize the input prompt\n",
"input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n",
"input_ids = input_ids.to(\"cuda\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 47155,
"status": "ok",
"timestamp": 1718960517928,
"user": {
"displayName": "Maarten Grootendorst",
"userId": "11015108362723620659"
},
"user_tz": -120
},
"id": "BwIvt6jSByAF",
"outputId": "e71c4141-2ca3-488a-fdfb-8d9357af0125"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6.66 s ± 2.22 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit -n 1\n",
"# Generate the text with the key/value cache enabled\n",
"generation_output = model.generate(\n",
" input_ids=input_ids,\n",
" max_new_tokens=100,\n",
" use_cache=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 152674,
"status": "ok",
"timestamp": 1718960670601,
"user": {
"displayName": "Maarten Grootendorst",
"userId": "11015108362723620659"
},
"user_tz": -120
},
"id": "dFb1dcvJByCW",
"outputId": "0aba6a01-9bc7-40b7-e2e1-e064f13b4c88"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"21.9 s ± 94.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit -n 1\n",
"# Generate the text with the key/value cache disabled\n",
"generation_output = model.generate(\n",
" input_ids=input_ids,\n",
" max_new_tokens=100,\n",
" use_cache=False\n",
")"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 4
}