Recurrent neural networks are very useful when it comes to processing sequential data like text. In this tutorial, we are going to use LSTM (Long Short-Term Memory) neural networks to teach our computer to write texts like Shakespeare.
Yes, you heard that right! We are going to train a neural network to write texts similar to those of the famous poet. At least kind of. Since recurrent neural networks, and LSTMs in particular, have a kind of short-term memory, we can train one to “guess” the next letter based on the letters that came before. Repeating this guess letter by letter is what lets our network produce these texts. For this, of course, we will need a substantial amount of training data.
Loading Shakespeare Texts
In this tutorial we will only need a Python installation, the libraries TensorFlow and NumPy, and Python's built-in random module. Nothing else!
Now we can get to work. First of all, we need a large number of sentences and texts from Shakespeare himself in order to train the model. For this we will use this file! And we will download it directly into our script.
Here we directly download the file into our script and start reading and decoding it. The next step is to prepare the data so that we can process it.
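A minimal sketch of the download step, using only the standard library. The function name `load_text` and the placeholder URL are assumptions for illustration; substitute the address of the Shakespeare file mentioned above.

```python
from urllib.request import urlopen


def load_text(url: str) -> str:
    """Fetch a text file and decode its bytes as UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Hypothetical placeholder; replace it with the real location of the file.
SHAKESPEARE_URL = "https://example.com/shakespeare.txt"
# text = load_text(SHAKESPEARE_URL)
```

If you prefer to stay within TensorFlow, `tf.keras.utils.get_file` downloads a file into a local cache and returns its path, which you can then open and decode in the same way.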
The problem that we have right now with our data is that we are dealing with text. We cannot just train a neural network on letters or sentences. We need to convert all of these values into numerical data. So we have to come up with a system that allows us to convert the text into numbers, then predict specific numbers based on that data and then again convert the resulting numbers back into text.
For the sake of simplicity I am going to modify the last code line that we wrote. I immediately convert all of the text to lower case so that we have fewer possible characters to choose from. Also, I am not going to use the whole text file as training data. If you have the computing capacity and the time to train your model on the whole data, do it! It will produce much better results. But if your machine is slow or you have limited time, you might consider using just a part of the text.
Here we select all the characters from character number 300,000 up until 800,000. So we are processing a total of 500,000 characters, which should be enough for pretty decent results.
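The lower-casing and slicing could be sketched as follows. The helper name `prepare_text` is an assumption, and the demo uses a tiny stand-in string so that it runs without the real file.

```python
def prepare_text(raw: str, start: int = 300_000, end: int = 800_000) -> str:
    """Lower-case the corpus and keep only the characters in [start, end)."""
    return raw.lower()[start:end]


# Tiny stand-in corpus; with the real file you would keep the default bounds.
demo = prepare_text("To Be, Or Not To Be", start=3, end=9)
print(demo)  # be, or
```

Because Python slices exclude the end index, the default bounds keep exactly 500,000 characters.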
Now we create a sorted set of all the unique characters that occur in the text. In a set no value appears more than once, so this is a good way to filter out the characters. After that we define two structures for converting the values. Both are dictionaries that enumerate the characters. In the first one, the characters are the keys and the indices are the values. In the second one it is the other way around. Now we can easily convert a character into a unique numerical representation and vice versa.
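Here is a small sketch of the two lookup dictionaries. The variable names are assumptions, and the text is a short stand-in for the Shakespeare corpus.

```python
text = "to be, or not to be"  # stand-in for the Shakespeare corpus

# Sorted list of the unique characters that occur in the text.
characters = sorted(set(text))

# Two lookup tables: character -> index and index -> character.
char_to_index = {c: i for i, c in enumerate(characters)}
index_to_char = {i: c for i, c in enumerate(characters)}

# Round trip: text -> numbers -> text.
encoded = [char_to_index[c] for c in "to be"]
decoded = "".join(index_to_char[i] for i in encoded)
```

Sorting the set keeps the mapping stable between runs, so the same character always gets the same index.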
In this next step, we define how long a sequence shall be and how many characters we step forward to start the next one. Note that a “sentence” here is just a fixed-length slice of the text, not a grammatical sentence. The idea is to take each such sentence as an input and save the letter that follows it as the target.
We iterate through the whole text and gather all sentences and their next character. This is the training data for our neural network. Now we just need to convert it into a numerical format.
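The loop could look roughly like this. The values 40 and 3, the name `STEP_SIZE`, and the helper `build_sequences` are assumptions chosen for illustration; the demo call uses a short string and smaller values so the result is easy to check by hand.

```python
SEQ_LENGTH = 40  # assumed length of one "sentence" in characters
STEP_SIZE = 3    # assumed number of characters between sentence starts


def build_sequences(text, seq_length, step_size):
    """Collect fixed-length slices of the text and the character after each."""
    sentences, next_chars = [], []
    for i in range(0, len(text) - seq_length, step_size):
        sentences.append(text[i:i + seq_length])
        next_chars.append(text[i + seq_length])
    return sentences, next_chars


sentences, next_chars = build_sequences("hello world", seq_length=4, step_size=3)
print(sentences)   # ['hell', 'lo w', 'worl']
print(next_chars)  # ['o', 'o', 'd']
```

A smaller step produces more, heavily overlapping training examples; a larger step produces fewer and trains faster.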
This might seem a little bit complicated right now but it is not. We are creating two NumPy arrays full of zeros. The data type of those is bool, which stands for boolean. Wherever a character appears in a certain sentence at a certain position we will set it to a one or a True. For the input array we have one dimension for the sentences, one dimension for the positions of the characters within the sentences and one dimension to specify which character is at this position. The target array only needs two dimensions: one for the sentences and one to specify which character comes next.
Building the Recurrent Neural Network
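The encoding described above can be sketched with a handful of toy sentences. The names `x`, `y` and the small stand-in data are assumptions.

```python
import numpy as np

# Toy stand-ins for the sentences and next characters gathered earlier.
sentences = ["hell", "lo w", "worl"]
next_chars = ["o", "o", "d"]
characters = sorted(set("hello world"))
char_to_index = {c: i for i, c in enumerate(characters)}
seq_length = 4

# x: (sentence, position in sentence, character), y: (sentence, character).
x = np.zeros((len(sentences), seq_length, len(characters)), dtype=bool)
y = np.zeros((len(sentences), len(characters)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_to_index[char]] = True
    y[i, char_to_index[next_chars[i]]] = True

print(x.shape, y.shape)  # (3, 4, 8) (3, 8)
```

Each position in `x` has exactly one True entry, which is why this format is often called one-hot encoding.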
Now that our training data is prepared, let us start with building the neural network. To make our code simpler, we are going to import the specific tools that we are going to use.
Of course you can also just refer to these things by their full module paths if you want to. We will use Sequential for our model, Activation, Dense and LSTM for our layers and RMSprop for optimization during the compilation of our model. LSTM stands for long short-term memory and is a type of recurrent neural network layer. It might be called the memory of our model. This is crucial, since we are dealing with sequential data.
Our structure is simple! The inputs flow directly into our LSTM layer with 128 neurons. Our input shape is the length of a sentence times the number of unique characters. This layer is followed by a Dense layer with one neuron per possible character, matching the target format in which the character that shall follow is set to True or one. In the end we use the Softmax activation function to make our results add up to one. This gives us a probability for each character.
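A sketch of the imports and the model, under a few assumptions: the sizes `SEQ_LENGTH` and `NUM_CHARACTERS` are placeholder values, the Dense layer is given one unit per character so that the softmax yields one probability per character, and the explicit `Input` layer is my addition for declaring the input shape.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Activation

SEQ_LENGTH = 40      # assumed sentence length
NUM_CHARACTERS = 39  # assumed; use len(characters) in the real script

model = Sequential([
    Input(shape=(SEQ_LENGTH, NUM_CHARACTERS)),  # one-hot sentence
    LSTM(128),                                  # recurrent "memory" layer
    Dense(NUM_CHARACTERS),                      # one score per character
    Activation("softmax"),                      # scores -> probabilities
])

model.summary()
```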
Now we compile the model and train it with our training data that we prepared above. We choose a batch size of 256 (which you can change if you want) and four epochs. This means that our model is going to see the same data four times.
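Compilation and training might look like the sketch below. The loss function and the learning rate are assumptions, since the text only names the optimizer, the batch size and the number of epochs. The data are tiny random one-hot stand-ins so the example finishes in seconds; in the real script you would pass the arrays prepared earlier.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Activation
from tensorflow.keras.optimizers import RMSprop

SEQ_LENGTH, NUM_CHARACTERS = 10, 5  # tiny stand-in sizes

# Random one-hot stand-ins for the prepared training arrays.
rng = np.random.default_rng(0)
eye = np.eye(NUM_CHARACTERS, dtype="float32")
x = eye[rng.integers(0, NUM_CHARACTERS, size=(64, SEQ_LENGTH))]
y = eye[rng.integers(0, NUM_CHARACTERS, size=64)]

model = Sequential([
    Input(shape=(SEQ_LENGTH, NUM_CHARACTERS)),
    LSTM(128),
    Dense(NUM_CHARACTERS),
    Activation("softmax"),
])

# Assumed loss and learning rate; the optimizer is the one named in the text.
model.compile(loss="categorical_crossentropy",
              optimizer=RMSprop(learning_rate=0.01))

# Batch size 256 and four epochs, as described above.
history = model.fit(x, y, batch_size=256, epochs=4, verbose=0)
print(history.history["loss"])
```

Categorical cross-entropy is a common choice when the target is a one-hot vector and the output is a softmax, which is why it is used here.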
Our model is now trained but it only outputs the probabilities for the next character. We therefore need some additional functions to make our script generate some reasonable text.
This helper function called sample is copied from the official Keras tutorial.
Link to the tutorial: https://keras.io/examples/lstm_text_generation/
It basically just picks one of the characters from the output. As parameters it takes the result of the prediction and a temperature. This temperature indicates how risky the pick shall be. With a high temperature, less likely characters get picked more often. A low temperature leads to conservative choices, so the most likely character almost always wins.
Now we can get to the final function of our script: the function that generates the final text.
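A sketch of such a helper, modeled on the sampling function in the Keras example linked above; treat it as an approximation rather than a verbatim copy. It divides the log-probabilities by the temperature, normalizes them again and then draws one index at random.

```python
import numpy as np


def sample(preds, temperature=1.0):
    """Draw one character index from the predicted probabilities."""
    preds = np.asarray(preds).astype("float64")
    # Rescale the log-probabilities: a temperature below 1 sharpens the
    # distribution, a temperature above 1 flattens it.
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # One multinomial draw yields a one-hot vector; argmax recovers the index.
    probas = np.random.multinomial(1, preds, 1)
    return int(np.argmax(probas))


# With a very low temperature the most likely index is practically certain.
print(sample([0.1, 0.8, 0.1], temperature=0.01))
```

At a temperature of exactly 1.0 the probabilities are left unchanged, so the helper simply samples from the model's own distribution.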
Again, it is less complicated than it looks. We choose a random starting position within the text because we need some starting text in order to predict the “next” character. So the first SEQ_LENGTH characters will be copied from the original text. But we could just cut them off afterwards and we would end up with text that is completely generated by our neural network.
So we choose some random starting text and then we run a for loop in the range of the length that we want. We can generate a text with 100 characters or one with 20,000. We then convert our sentence into the input format that we already talked about. The sentence is now an array with ones or Trues wherever a character occurs. Then we use the predict method of our model to predict the likelihoods of the next characters and pass them, together with the temperature parameter, to our sample helper function. Of course the result we get needs to be converted from the numerical format back into a readable character. Once this is done, we add the character to our generated text, shift the sentence forward by one character so that it ends with the new character, and repeat the process until we reach the desired length.
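The generation loop could be sketched like this. To keep the example self-contained, the function takes the model and lookup tables as parameters instead of reading globals; that signature is an assumption, and `sample` is repeated in the same approximate form as the Keras helper mentioned above. Any object with a Keras-style `predict` method works as `model`.

```python
import random
import numpy as np


def sample(preds, temperature=1.0):
    """Draw one index from the probabilities, rescaled by the temperature."""
    preds = np.log(np.asarray(preds).astype("float64")) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return int(np.argmax(np.random.multinomial(1, preds, 1)))


def generate_text(model, text, char_to_index, index_to_char,
                  seq_length, length, temperature):
    """Generate `length` new characters after a random seed sentence."""
    # Copy a random seed sentence from the original text.
    start_index = random.randint(0, len(text) - seq_length - 1)
    sentence = text[start_index:start_index + seq_length]
    generated = sentence

    for _ in range(length):
        # One-hot encode the current sentence as a batch of size one.
        x_pred = np.zeros((1, seq_length, len(char_to_index)), dtype="float32")
        for t, char in enumerate(sentence):
            x_pred[0, t, char_to_index[char]] = 1.0

        # Predict the next-character probabilities and sample from them.
        predictions = model.predict(x_pred, verbose=0)[0]
        next_char = index_to_char[sample(predictions, temperature)]

        # Append the character and slide the window forward by one.
        generated += next_char
        sentence = sentence[1:] + next_char

    return generated
```

With a trained model you would call it as, for example, `generate_text(model, text, char_to_index, index_to_char, SEQ_LENGTH, 300, 0.4)`. The result starts with the copied seed, so slice off the first `seq_length` characters if you only want generated text.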
The results are actually quite good! Let’s take a look at some samples. I played around with the parameters, in order to diversify the results. I am not going to post the full results here but just some interesting snippets.
Settings – Length: 300 – Temperature: 0.4
Settings – Length: 300 – Temperature: 0.6
Settings – Length: 300 – Temperature: 0.8
If you consider the fact that our computer doesn’t even understand what a word or a sentence is, these results are mind-blowing. Of course, they are not perfect, and a lot of the time (especially when choosing a high temperature) we will end up with some creatively invented words. But still, it is kind of impressive.
That’s it for today’s tutorial. Of course, here we just used Shakespeare’s texts. You can also export your WhatsApp chats and load these into this script. If you want to see a tutorial on that, let me know in the comments!
I hope you enjoyed this blog post! If you want to tell me something or ask questions, feel free to ask in the comments! Down below you will find some additional links, including the full source code. Check out my Instagram page or the other parts of this website, if you are interested in more! I also have a lot of Python programming books! Stay tuned!