01_caption_preprocessing

In this workflow we look up a GloVe embedding vector (https://nlp.stanford.edu/projects/glove/) for each word in our caption vocabulary. As in the previous workflow, these vectors can be used as word features to train our caption model, which removes the need to train our own embedding. First, the GloVe model is downloaded and extracted to the data folder. The vectors are stored in a plain text file. The actual look-up is then performed in a Python Script node, which creates a Python dictionary mapping each word of the vocabulary to its corresponding GloVe vector. The Python Script node writes this dictionary directly to the data folder, because the next workflow needs it in this format during model definition.
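For readers who want to see roughly what such a look-up involves, here is a minimal sketch. It is not the code from the Python Script node. It assumes each line of the GloVe text file holds a word followed by space-separated numbers, and it assumes pickle as the on-disk format, which the description above does not specify. The function names `build_glove_lookup` and `save_lookup` are hypothetical, and vectors are kept as plain lists of floats so the example needs only the standard library.

```python
import pickle


def build_glove_lookup(glove_path, vocabulary):
    """Map each vocabulary word found in the GloVe file to its vector."""
    wanted = set(vocabulary)
    lookup = {}
    with open(glove_path, encoding="utf-8") as handle:
        for line in handle:
            # Assumed line layout: <word> <v1> <v2> ... <vN>
            parts = line.rstrip().split(" ")
            word = parts[0]
            if word in wanted:
                lookup[word] = [float(value) for value in parts[1:]]
    return lookup


def save_lookup(lookup, out_path):
    """Write the dictionary to disk (pickle is an assumed format)."""
    with open(out_path, "wb") as handle:
        pickle.dump(lookup, handle)
```

Words from the vocabulary that do not occur in the GloVe file are simply missing from the resulting dictionary in this sketch. How the actual workflow treats such words is not stated here, so the next workflow may need a fallback, for example a zero vector.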


This is a companion discussion topic for the original entry at https://kni.me/w/S_w16stlaQSisyS8