Transformer fundamentals
 
bigram.Bigram Namespace Reference

Classes

class  BigramLanguageModel
 

Functions

 main ()
 
 get_batch (split, train_data, val_data)
 new version of get_batch that allows batches to be moved to the GPU
 
 estimate_loss (model, train_data, val_data)
 

Variables

int batch_size = 256
 
int block_size = 8
 
int max_iters = 100000
 
int eval_interval = 100
 
float learning_rate = 1e-2
 
str device = "cpu"
 
int eval_iters = 200
 
int n_embd = 32
 

Function Documentation

◆ estimate_loss()

bigram.Bigram.estimate_loss (model, train_data, val_data)

Definition at line 138 of file Bigram.py.

def estimate_loss(model, train_data, val_data):
    out = {}
    model.eval()  # switch to eval mode while estimating
    for split in ["train", "val"]:
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            X, Y = get_batch(split, train_data, val_data)
            logits, loss = model(X, Y)
            losses[k] = loss.item()
        out[split] = losses.mean()  # average over eval_iters batches
    model.train()  # restore training mode
    return out

References get_batch().

Referenced by main().
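
A minimal usage sketch, assuming model, train_data, and val_data already exist as they do in main(): the function averages the loss over eval_iters batches per split, which gives a much less noisy number than a single-batch loss.

losses = estimate_loss(model, train_data, val_data)
print(f"train {losses['train']:.4f} | val {losses['val']:.4f}")

Because the definition is decorated with @torch.no_grad() (line 137 of Bigram.py), no gradients are tracked while these evaluation batches run, so the estimate stays cheap.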

◆ get_batch()

bigram.Bigram.get_batch (split, train_data, val_data)

new version of get_batch that allows batches to be moved to the GPU

The previous version of get_batch in the Tensor_Prac didn't have the ability for the GPU to actually be used, even if the device was present. This new version takes advantage of that hardware.

Parameters
    split       string, either 'train' or 'val', used to pick which data to batch from
    train_data  the data that is split for training
    val_data    the data that is split for validation

Returns
    Tuple(Tensor of inputs, Tensor of targets)

Definition at line 114 of file Bigram.py.

def get_batch(split, train_data, val_data):
    """
    @brief new version of get_batch that allows batches to be moved to the GPU

    The previous version of get_batch in the Tensor_Prac doesn't have the ability
    for the GPU to actually be used, even if the device is present. So this new
    version will take advantage of that hardware.

    @param split: string, either 'train' or 'val', used to pick which data to batch from
    @param train_data: the data that is split for training
    @param val_data: the data that is split for validation
    @return Tuple(Tensor of inputs, Tensor of targets)
    """
    # generate a small batch of data of inputs x and targets y
    data = train_data if split == "train" else val_data
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + block_size + 1] for i in ix])
    # IMPORTANT: This is the part that is needed for the GPU
    x, y = x.to(device), y.to(device)
    return x, y


# decorator on the estimate_loss() definition that follows in the file:
# gradients are not tracked while the loss is being estimated
@torch.no_grad()

Referenced by estimate_loss(), and main().
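
A small self-contained sketch of calling get_batch; the toy tensors here are hypothetical stand-ins, since in Bigram.py the splits come from tp.train_val_split(data).

import torch

# hypothetical toy splits standing in for tp.train_val_split(data)
train_data = torch.arange(1000, dtype=torch.long)
val_data = torch.arange(1000, 1200, dtype=torch.long)

xb, yb = get_batch("train", train_data, val_data)
print(xb.shape, yb.shape)  # torch.Size([256, 8]) each: (batch_size, block_size)
print(xb.device)           # matches the module-level device, "cpu" by default

Each row of yb is the matching row of xb shifted one position to the right, which is exactly the next-character target the bigram model is trained to predict.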

◆ main()

bigram.Bigram.main ()

Definition at line 55 of file Bigram.py.

def main():
    torch.manual_seed(1337)
    input_path = os.path.abspath(
        os.path.join(os.path.dirname(__file__), "..", "input.txt")
    )
    with open(input_path, "r", encoding="utf-8") as f:
        text = f.read()
    chars = sorted(list(set(text)))
    vocab_size = len(chars)

    # create a mapping from characters to integers
    stoi = {ch: i for i, ch in enumerate(chars)}
    # create the reverse mapping taking an integer to a string
    itos = {i: ch for i, ch in enumerate(chars)}

    encode = lambda s: [stoi[c] for c in s]
    decode = lambda l: "".join([itos[i] for i in l])

    # create a tensor representing the encoding of all the text
    data = torch.tensor(encode(text), dtype=torch.long)

    train_data, val_data = tp.train_val_split(data)
    input, targs = get_batch("train", train_data, val_data)

    model = BigramLanguageModel(vocab_size)
    # IMPORTANT: Need to load to something like a GPU using .to(device) on the model
    m = model.to(device)

    context = torch.zeros((1, 1), dtype=torch.long, device=device)
    # before training, generating tokens will spit out complete
    # nonsense since the weights are random
    print("Before training: 1000 generated tokens")
    print(decode(m.generate(context, max_new_tokens=1000)[0].tolist()))

    # TRAINING WORK
    print("Training loss values")
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    for step in range(max_iters):

        # every so often, evaluate the loss on the train and val data sets
        if step % eval_interval == 0:
            losses = estimate_loss(model, train_data, val_data)
            print(
                f"step {step}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}"
            )

        # sample a batch of data
        input, targs = get_batch("train", train_data, val_data)

        # evaluate the loss
        logits, loss = model(input, targs)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()

    print("After training: 1000 generated tokens")
    print(decode(m.generate(context, max_new_tokens=1000)[0].tolist()))

References estimate_loss(), and get_batch().
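
Before any training, main() builds a character-level tokenizer out of the raw text. A standalone sketch of that round trip, using a hypothetical short string in place of input.txt:

text = "hello world"  # hypothetical stand-in for the contents of input.txt
chars = sorted(list(set(text)))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}
encode = lambda s: [stoi[c] for c in s]
decode = lambda l: "".join([itos[i] for i in l])

ids = encode("hello")
print(ids)          # [3, 2, 4, 4, 5] for this toy vocabulary
print(decode(ids))  # "hello"

The vocab_size passed to BigramLanguageModel is simply len(chars), so every distinct character in input.txt becomes one token id.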

Variable Documentation

◆ batch_size

int bigram.Bigram.batch_size = 256

Definition at line 14 of file Bigram.py.

◆ block_size

int bigram.Bigram.block_size = 8

Definition at line 15 of file Bigram.py.

◆ device

str bigram.Bigram.device = "cpu"

Definition at line 32 of file Bigram.py.
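
device is pinned to "cpu" here. A hedged variant (an assumption, not what Bigram.py currently does) would pick a GPU when one is available, which is what the .to(device) calls in get_batch() and main() are written for:

import torch

# assumption: replaces the hard-coded "cpu"; Bigram.py itself sets device = "cpu"
device = "cuda" if torch.cuda.is_available() else "cpu"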

◆ eval_interval

int bigram.Bigram.eval_interval = 100

Definition at line 17 of file Bigram.py.

◆ eval_iters

int bigram.Bigram.eval_iters = 200

Definition at line 34 of file Bigram.py.

◆ learning_rate

float bigram.Bigram.learning_rate = 1e-2

Definition at line 18 of file Bigram.py.

◆ max_iters

int bigram.Bigram.max_iters = 100000

Definition at line 16 of file Bigram.py.

◆ n_embd

int bigram.Bigram.n_embd = 32

Definition at line 35 of file Bigram.py.