In this workflow, we create an image feature vector for each image in the training data using a pretrained InceptionV3 model (https://arxiv.org/abs/1512.00567). These feature vectors are then used in workflow 4 to train our caption network, so that we do not need to include a computationally expensive convolutional branch in the network architecture. As output, this workflow writes a table containing one feature vector per image to the data folder.
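The extraction step can be sketched in Python as follows, assuming TensorFlow/Keras is available. The classifier head is dropped (`include_top=False`) and the final convolutional block is global-average-pooled into a single 2048-dimensional vector per image; the function name and dummy input are illustrative, not part of the workflow itself.

```python
# Sketch: per-image feature extraction with a headless InceptionV3.
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# Drop the classification head and global-average-pool the last conv block,
# yielding one 2048-dimensional vector per image.
# In practice use weights="imagenet"; weights=None skips the download here.
model = InceptionV3(weights=None, include_top=False, pooling="avg")

def image_to_feature_vector(image_array: np.ndarray) -> np.ndarray:
    """Map one RGB image (299x299x3, values 0-255) to a 2048-dim vector."""
    batch = preprocess_input(image_array[np.newaxis].astype("float32"))
    return model.predict(batch, verbose=0)[0]

# Example with a dummy image; real use would loop over the training images
# and write the resulting vectors out as one table row per image.
vec = image_to_feature_vector(np.zeros((299, 299, 3)))
print(vec.shape)  # (2048,)
```

Precomputing these vectors once means the caption network in workflow 4 only ever sees fixed-length numeric inputs, which keeps its training fast.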
This is a companion discussion topic for the original entry at https://kni.me/w/ryZbpSgP87hLKVvb