Overall, I understand the following code from the Keras tutorial:
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import random
import io
path = keras.utils.get_file(
    "nietzsche.txt", origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt"
)
with io.open(path, encoding="utf-8") as f:
    text = f.read().lower()
text = text.replace("\n", " ")  # We remove newline chars for nicer display
print("Corpus length:", len(text))
chars = sorted(list(set(text)))
print("Total chars:", len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i : i + maxlen])
    next_chars.append(text[i + maxlen])
print("Number of sequences:", len(sentences))
# one-hot encode: x has shape (num sequences, maxlen, len(chars)), y has shape (num sequences, len(chars))
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)  # np.bool was removed in recent NumPy; plain bool works
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
model = keras.Sequential(
    [
        keras.Input(shape=(maxlen, len(chars))),
        layers.LSTM(128),
        layers.Dense(len(chars), activation="softmax"),
    ]
)
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss="categorical_crossentropy", optimizer=optimizer)
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature  # rescale log-probs: low temperature sharpens, high flattens
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)  # re-normalize into a probability distribution
    probas = np.random.multinomial(1, preds, 1)  # draw a single one-hot sample
    return np.argmax(probas)
epochs = 40
batch_size = 128
for epoch in range(epochs):
    model.fit(x, y, batch_size=batch_size, epochs=1)
    print()
    print("Generating text after epoch: %d" % epoch)
    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print("...Diversity:", diversity)
        generated = ""
        sentence = text[start_index : start_index + maxlen]
        print('...Generating with seed: "' + sentence + '"')
        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.0
            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]
            sentence = sentence[1:] + next_char
            generated += next_char
        print("...Generated: ", generated)
        print()
# maxlen is 40
# number of characters (len(chars)) is 56
We download the dataset, build x and y for training, then train the model and generate sample text after every epoch. During sampling, the model is run 400 times, once for each of the 400 generated characters.
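For concreteness, one step of that generation loop looks roughly like this (a minimal sketch assuming the model, maxlen, chars, char_indices, indices_char, and sample() defined above; the seed string below is made up):

# a single generation step (sketch); the seed is a hypothetical 40-character window
seed = "the free spirit has to ask himself what "   # exactly maxlen = 40 characters
x_pred = np.zeros((1, maxlen, len(chars)))          # shape (1, 40, 56): one one-hot encoded window
for t, char in enumerate(seed):
    x_pred[0, t, char_indices[char]] = 1.0
preds = model.predict(x_pred, verbose=0)[0]         # shape (56,): one probability per character
next_char = indices_char[sample(preds, 0.5)]        # exactly one character is drawn per model call
seed = seed[1:] + next_char                         # slide the window by one character and repeat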
Here's the one thing I can't understand. As I learned it, if there are 128 LSTM units, the output of the first unit is passed as the input of the second unit, the output of the second unit is passed as the input of the third unit, and so on. In that case, shouldn't the model be generating 127 characters at a time?
But the code here seems to work in a different way. The shape of the input is (maxlen, number of characters) and the shape of the output is (number of characters,), which comes from the softmax layer; a quick shape check is sketched below.
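To make the shapes I am describing concrete, here is a small check (assuming the model defined above has already been built; 56 is simply len(chars) for this corpus):

model.summary()  # reports the LSTM layer output as (None, 128) and the Dense layer output as (None, 56)
dummy = np.zeros((1, maxlen, len(chars)))      # one all-zero input window, shape (1, 40, 56)
print(model.predict(dummy, verbose=0).shape)   # (1, 56): a single softmax vector per input window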
Can anyone help me understand this? Thanks