Transformer fundamentals
 
bigram.Bigram.BigramLanguageModel Class Reference
Inheritance diagram for bigram.Bigram.BigramLanguageModel:

Public Member Functions

 __init__ (self, vocab_size)
 constructor for BigramLanguageModel: calls super().__init__() and creates the token embedding table, the position embedding table, and the language-model head lm_head
 
 forward (self, idx, targets=None)
 runs the forward pass: computes the next-token logits and, when targets are given, the cross-entropy loss
 
 generate (self, idx, max_new_tokens)
 generates max_new_tokens new tokens from the bigram model and appends them to idx
 

Public Attributes

 token_embedding_table = nn.Embedding(vocab_size, n_embd)
 
 position_embedding_table = nn.Embedding(block_size, n_embd)
 
 lm_head = nn.Linear(n_embd, vocab_size)
 

Detailed Description

Definition at line 152 of file Bigram.py.

Constructor & Destructor Documentation

◆ __init__()

bigram.Bigram.BigramLanguageModel.__init__ ( self,
vocab_size )

constructor for BigramLanguageModel: calls super().__init__() and creates the token embedding table, the position embedding table, and the language-model head lm_head

Parameters
self            the current BigramLanguageModel object
vocab_size      the size of the vocabulary
Returns
new BigramLanguageModel

Definition at line 154 of file Bigram.py.

154 def __init__(self, vocab_size):
155 """
156 @brief constructor for BigramLanguageModel: calls super().__init__() and
157 creates the token and position embedding tables and the lm_head
158
159 @param self: the current BigramLanguageModel object
160 @param vocab_size: the size of the vocabulary
161 @return new BigramLanguageModel
162 """
163 super().__init__()
164 # each token directly reads off the logits for the next token from a lookup table
165 self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
166 self.position_embedding_table = nn.Embedding(block_size, n_embd)
167 self.lm_head = nn.Linear(n_embd, vocab_size)
168

References __init__().

Referenced by __init__().
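
As a hedged usage sketch (not part of Bigram.py): the snippet below constructs the model, assuming the bigram.Bigram module is importable and that it defines n_embd and block_size at module level, as the attributes above suggest. The toy text and the resulting vocab_size are illustrative.

from bigram.Bigram import BigramLanguageModel

text = "hello world"                      # illustrative training text
chars = sorted(set(text))
vocab_size = len(chars)                   # size of the character vocabulary

model = BigramLanguageModel(vocab_size)
print(model.token_embedding_table)        # Embedding(vocab_size, n_embd)
print(model.position_embedding_table)     # Embedding(block_size, n_embd)
print(model.lm_head)                      # Linear(n_embd -> vocab_size)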

Member Function Documentation

◆ forward()

bigram.Bigram.BigramLanguageModel.forward ( self,
idx,
targets = None )

runs the forward pass: computes the next-token logits and, when targets are given, the cross-entropy loss

Parameters
self            the current BigramLanguageModel object
idx             the current context batch
targets         the tokens that actually come next
Returns
a (logits, loss) tuple: the next-token logits and the cross-entropy loss (loss is None when no targets are given)

Definition at line 169 of file Bigram.py.

169 def forward(self, idx, targets=None):
170 """
171 @brief runs the forward pass: computes the next-token logits and, when
172 targets are given, the cross-entropy loss
173
174 @param self: the current BigramLanguageModel object
175 @param idx: the current context batch
176 @param targets: the tokens that actually come next
177 @return logits, loss: Tuple(logits, loss), the next-token logits and
178 the cross-entropy loss (None when targets is None)
179 """
180
181 # idx and targets are both (B, T) tensors of integers
182 tok_emb = self.token_embedding_table(idx) # (B, T, C)
183 logits = self.lm_head(tok_emb) # (B, T, vocab_size)
184
185 # negative log likelihood loss (cross-entropy): how close are the logits
186 # to the targets?
187 if targets is None:
188 loss = None
189 else:
190 B, T, C = logits.shape
191 logits = logits.view(B * T, C)
192 targets = targets.view(B * T)
193 loss = F.cross_entropy(logits, targets)
194
195 return logits, loss
196

References lm_head, and token_embedding_table.
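
A minimal sketch of calling forward(), assuming bigram.Bigram is importable (it may load its training text and hyperparameters at import time). The batch is random toy data and all sizes here are illustrative.

import torch
from bigram.Bigram import BigramLanguageModel

vocab_size = 65                                  # illustrative vocabulary size
model = BigramLanguageModel(vocab_size)

B, T = 4, 8                                      # illustrative batch size and context length
idx = torch.randint(0, vocab_size, (B, T))       # current context batch
targets = torch.randint(0, vocab_size, (B, T))   # the tokens that actually come next

logits, loss = model(idx, targets)
print(logits.shape)    # torch.Size([32, 65]): logits are reshaped to (B*T, C) when targets are given
print(loss.item())     # roughly -ln(1/65) ≈ 4.17 for an untrained model

logits, loss = model(idx)                        # no targets
print(logits.shape)    # torch.Size([4, 8, 65])
print(loss)            # None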

◆ generate()

bigram.Bigram.BigramLanguageModel.generate ( self,
idx,
max_new_tokens )

generates max_new_tokens new tokens from the bigram model and appends them to idx

idx is the current context of characters in the current batch, a (B, T) tensor of indices. On each step we run the model on idx and focus on the logits at the last position in the T dimension. These are converted to probabilities using softmax and sampled using multinomial. The sampled indices are then concatenated onto idx along the T dimension, so the context grows by one token per step.

Parameters
self            the current BigramLanguageModel object
idx             the current context batch (which is dim(B by T))
max_new_tokens  the number of new tokens to generate
Returns
idx: modified idx with the new generated tokens appended

Definition at line 197 of file Bigram.py.

197 def generate(self, idx, max_new_tokens):
198 """
199 @brief generates max_new_tokens new tokens from the bigram model and appends them to idx
200
201 idx is the current context of characters in the current batch, a (B, T)
202 tensor of indices. On each step we run the model on idx and focus on
203 the logits at the last position in the T dimension.
204 These are converted to probabilities using softmax and then sampled
205 using multinomial. The sampled indices are concatenated onto idx along
206 the T dimension, so the context grows by one token per step.
207
208 @param self: the current BigramLanguageModel object
209 @param idx: the current context batch (which is dim(B by T))
210 @param max_new_tokens: the number of new tokens to generate
211 @return idx: modified idx with the newly generated tokens appended
212 """
213 # idx is a (B, T) array of indices in the current context
214 for _ in range(max_new_tokens):
215 # get the predictions
216 logits, loss = self(idx)
217 # focus only on the last time step
218 logits = logits[:, -1, :] # now dim(B by C)
219 # apply the softmax to get probabilities
220 probs = F.softmax(logits, dim=-1) # dim(B by C)
221 # sample from the distribution
222 idx_next = torch.multinomial(probs, num_samples=1) # (B, 1)
223 # append sampled index to the running sequence
224 idx = torch.cat((idx, idx_next), dim=1) # now dim(B by T+1)
225 return idx
226
227
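
A short generation sketch, again assuming bigram.Bigram is importable; the model here is freshly constructed and untrained, so the sampled indices are essentially uniform noise. Decoding them back to characters would use the project's own index-to-character mapping.

import torch
from bigram.Bigram import BigramLanguageModel

vocab_size = 65                                   # illustrative vocabulary size
model = BigramLanguageModel(vocab_size)

context = torch.zeros((1, 1), dtype=torch.long)   # start from a single (B=1, T=1) token
out = model.generate(context, max_new_tokens=20)  # dim(1 by 21): one token appended per step
print(out.shape)
print(out[0].tolist())                            # raw indices; decode with the project's itos mapping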

Member Data Documentation

◆ lm_head

bigram.Bigram.BigramLanguageModel.lm_head = nn.Linear(n_embd, vocab_size)

Definition at line 167 of file Bigram.py.

Referenced by forward(), and working_gpt.GPTLanguageModel.forward().

◆ position_embedding_table

bigram.Bigram.BigramLanguageModel.position_embedding_table = nn.Embedding(block_size, n_embd)

Definition at line 166 of file Bigram.py.

Referenced by working_gpt.GPTLanguageModel.forward().

◆ token_embedding_table

bigram.Bigram.BigramLanguageModel.token_embedding_table = nn.Embedding(vocab_size, n_embd)

Definition at line 165 of file Bigram.py.

Referenced by forward(), and working_gpt.GPTLanguageModel.forward().


The documentation for this class was generated from the following file: