In this workflow we use simple text-processing techniques to reduce the complexity of the captions used for training. This limits the set of words the network can predict, which makes the task somewhat easier. We also add special start and end tokens to the cleaned captions. As output, the workflow writes two tables to the data folder: one containing the vocabulary and one containing the processed captions.
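
The steps described above could look roughly like the following Python sketch. It is an illustration only and not the workflow's actual implementation: the specific cleaning rules (lowercasing, removing non-letter characters, dropping words that occur fewer than `min_count` times), the token strings `<start>` and `<end>`, and all function names are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical token strings; the workflow may use different ones.
START_TOKEN = "<start>"
END_TOKEN = "<end>"


def clean_caption(caption):
    """Lowercase a caption, replace non-letter characters with spaces, and split into words."""
    caption = caption.lower()
    caption = re.sub(r"[^a-z\s]", " ", caption)
    return caption.split()


def build_vocabulary(tokenized_captions, min_count=2):
    """Return the sorted list of words that occur at least min_count times."""
    counts = Counter(word for tokens in tokenized_captions for word in tokens)
    return sorted(word for word, count in counts.items() if count >= min_count)


def process_captions(captions, min_count=2):
    """Clean the captions, restrict them to the vocabulary, and add start/end tokens."""
    tokenized = [clean_caption(c) for c in captions]
    vocabulary = build_vocabulary(tokenized, min_count)
    keep = set(vocabulary)
    processed = [
        " ".join([START_TOKEN] + [w for w in tokens if w in keep] + [END_TOKEN])
        for tokens in tokenized
    ]
    return vocabulary, processed
```

Calling `process_captions(["A dog runs!", "The dog sleeps.", "A cat runs."])` yields the vocabulary `["a", "dog", "runs"]` and the processed captions `"<start> a dog runs <end>"`, `"<start> dog <end>"`, and `"<start> a runs <end>"`. Rare words are simply dropped here; mapping them to a dedicated unknown-word token would be an equally plausible choice. Writing the two results out as tables is left out of the sketch.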

