Transformer fundamentals
 
bigram.Bigram Namespace Reference

Classes

class  BigramLanguageModel
 

Functions

 main ()
 
 get_batch (split, train_data, val_data)
 new version of get_batch that allows batches to be moved to the GPU
 
 estimate_loss (model, train_data, val_data)
 

Variables

int batch_size = 256
 
int block_size = 8
 
int max_iters = 100000
 
int eval_interval = 100
 
float learning_rate = 1e-2
 
str device = "cpu"
 
int eval_iters = 200
 
int n_embd = 32
 

Function Documentation

◆ estimate_loss()

bigram.Bigram.estimate_loss (model, train_data, val_data)

Definition at line 138 of file Bigram.py.

def estimate_loss(model, train_data, val_data):
    out = {}
    model.eval()  # switch to eval mode while estimating
    for split in ["train", "val"]:
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            X, Y = get_batch(split, train_data, val_data)
            logits, loss = model(X, Y)
            losses[k] = loss.item()
        out[split] = losses.mean()  # average over eval_iters batches
    model.train()  # restore training mode
    return out

References get_batch().

Referenced by main().
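
A minimal usage sketch, assuming model, train_data, and val_data already exist as they do in main(): the function averages the loss over eval_iters batches per split, which gives a much less noisy number than a single-batch loss.

losses = estimate_loss(model, train_data, val_data)
print(f"train {losses['train']:.4f} | val {losses['val']:.4f}")

Because the definition is decorated with @torch.no_grad() (line 137 of Bigram.py), no gradients are tracked while these evaluation batches run, so the estimate stays cheap.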

◆ get_batch()

bigram.Bigram.get_batch (split, train_data, val_data)

new version of get_batch that allows batches to be moved to the GPU

The previous version of get_batch in the Tensor_Prac didn't have the ability for the GPU to actually be used, even if the device was present. This new version takes advantage of that hardware.

Parameters
    split       string, either 'train' or 'val', used to pick which data to batch from
    train_data  the data that is split for training
    val_data    the data that is split for validation

Returns
    Tuple(Tensor of inputs, Tensor of targets)

Definition at line 114 of file Bigram.py.

def get_batch(split, train_data, val_data):
    """
    @brief new version of get_batch that allows batches to be moved to the GPU

    The previous version of get_batch in the Tensor_Prac doesn't have the ability
    for the GPU to actually be used, even if the device is present. So this new
    version will take advantage of that hardware.

    @param split: string, either 'train' or 'val', used to pick which data to batch from
    @param train_data: the data that is split for training
    @param val_data: the data that is split for validation
    @return Tuple(Tensor of inputs, Tensor of targets)
    """
    # generate a small batch of data of inputs x and targets y
    data = train_data if split == "train" else val_data
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + block_size + 1] for i in ix])
    # IMPORTANT: This is the part that is needed for the GPU
    x, y = x.to(device), y.to(device)
    return x, y


# decorator on the estimate_loss() definition that follows in the file:
# gradients are not tracked while the loss is being estimated
@torch.no_grad()

Referenced by estimate_loss(), and main().
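
A small self-contained sketch of calling get_batch; the toy tensors here are hypothetical stand-ins, since in Bigram.py the splits come from tp.train_val_split(data).

import torch

# hypothetical toy splits standing in for tp.train_val_split(data)
train_data = torch.arange(1000, dtype=torch.long)
val_data = torch.arange(1000, 1200, dtype=torch.long)

xb, yb = get_batch("train", train_data, val_data)
print(xb.shape, yb.shape)  # torch.Size([256, 8]) each: (batch_size, block_size)
print(xb.device)           # matches the module-level device, "cpu" by default

Each row of yb is the matching row of xb shifted one position to the right, which is exactly the next-character target the bigram model is trained to predict.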

◆ main()

bigram.Bigram.main ()

Definition at line 55 of file Bigram.py.

def main():
    torch.manual_seed(1337)
    input_path = os.path.abspath(
        os.path.join(os.path.dirname(__file__), "..", "input.txt")
    )
    with open(input_path, "r", encoding="utf-8") as f:
        text = f.read()
    chars = sorted(list(set(text)))
    vocab_size = len(chars)

    # create a mapping from characters to integers
    stoi = {ch: i for i, ch in enumerate(chars)}
    # create the reverse mapping taking an integer to a string
    itos = {i: ch for i, ch in enumerate(chars)}

    encode = lambda s: [stoi[c] for c in s]
    decode = lambda l: "".join([itos[i] for i in l])

    # create a tensor representing the encoding of all the text
    data = torch.tensor(encode(text), dtype=torch.long)

    train_data, val_data = tp.train_val_split(data)
    input, targs = get_batch("train", train_data, val_data)

    model = BigramLanguageModel(vocab_size)
    # IMPORTANT: Need to load to something like a GPU using .to(device) on the model
    m = model.to(device)

    context = torch.zeros((1, 1), dtype=torch.long, device=device)
    # before training, generating tokens will spit out complete
    # nonsense since the weights are random
    print("Before training: 1000 generated tokens")
    print(decode(m.generate(context, max_new_tokens=1000)[0].tolist()))

    # TRAINING WORK
    print("Training loss values")
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    for step in range(max_iters):

        # every so often, evaluate the loss on the train and val data sets
        if step % eval_interval == 0:
            losses = estimate_loss(model, train_data, val_data)
            print(
                f"step {step}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}"
            )

        # sample a batch of data
        input, targs = get_batch("train", train_data, val_data)

        # evaluate the loss
        logits, loss = model(input, targs)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()

    print("After training: 1000 generated tokens")
    print(decode(m.generate(context, max_new_tokens=1000)[0].tolist()))

References estimate_loss(), and get_batch().
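
Before any training, main() builds a character-level tokenizer out of the raw text. A standalone sketch of that round trip, using a hypothetical short string in place of input.txt:

text = "hello world"  # hypothetical stand-in for the contents of input.txt
chars = sorted(list(set(text)))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}
encode = lambda s: [stoi[c] for c in s]
decode = lambda l: "".join([itos[i] for i in l])

ids = encode("hello")
print(ids)          # [3, 2, 4, 4, 5] for this toy vocabulary
print(decode(ids))  # "hello"

The vocab_size passed to BigramLanguageModel is simply len(chars), so every distinct character in input.txt becomes one token id.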

Variable Documentation

◆ batch_size

int bigram.Bigram.batch_size = 256

Definition at line 14 of file Bigram.py.

◆ block_size

int bigram.Bigram.block_size = 8

Definition at line 15 of file Bigram.py.

◆ device

str bigram.Bigram.device = "cpu"

Definition at line 32 of file Bigram.py.
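
device is pinned to "cpu" here. A hedged variant (an assumption, not what Bigram.py currently does) would pick a GPU when one is available, which is what the .to(device) calls in get_batch() and main() are written for:

import torch

# assumption: replaces the hard-coded "cpu"; Bigram.py itself sets device = "cpu"
device = "cuda" if torch.cuda.is_available() else "cpu"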

◆ eval_interval

int bigram.Bigram.eval_interval = 100

Definition at line 17 of file Bigram.py.

◆ eval_iters

int bigram.Bigram.eval_iters = 200

Definition at line 34 of file Bigram.py.

◆ learning_rate

float bigram.Bigram.learning_rate = 1e-2

Definition at line 18 of file Bigram.py.

◆ max_iters

int bigram.Bigram.max_iters = 100000

Definition at line 16 of file Bigram.py.

◆ n_embd

int bigram.Bigram.n_embd = 32

Definition at line 35 of file Bigram.py.