In the last workflow of this series, we predict the captions of the images in our test sample. This is done with a similar iterative approach: each caption is predicted word by word. First, we create a seed, which is a partial caption containing only the special start token. The seed is fed into the network to predict the first word of the caption. The predicted word is appended to the partial caption, and the process is repeated until the special end token is predicted. In KNIME, this is implemented with a Recursive Loop. In each loop iteration, we check for every row whether the end token has been predicted. If it has, the full caption is complete and the row is excluded from further iterations. All other rows are passed on to the next loop iteration. The loop ends once the end token has been predicted for all examples.
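
For readers who think in code, the loop logic described above can be sketched in plain Python. This is only an illustration of the idea, not the KNIME workflow itself: the token strings `<start>` and `<end>`, the function `predict_captions`, the `max_len` safety limit, and the toy stand-in `toy_predict_next_word` (which replaces the trained network) are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# Hypothetical token names; the actual workflow may use different markers.
START, END = "<start>", "<end>"


def predict_captions(
    image_ids: List[str],
    predict_next_word: Callable[[str, List[str]], str],
    max_len: int = 20,
) -> Dict[str, List[str]]:
    """Predict captions word by word, dropping rows once they are finished."""
    # Seed: every image starts with a partial caption holding only the start token.
    active = {img: [START] for img in image_ids}
    finished: Dict[str, List[str]] = {}

    # max_len is an assumed safety limit in case the end token never appears.
    for _ in range(max_len):
        if not active:
            break  # the end token has been predicted for all examples
        still_active = {}
        for img, partial in active.items():
            word = predict_next_word(img, partial)
            if word == END:
                # Caption complete: exclude this row from further iterations.
                finished[img] = partial[1:]  # drop the start token
            else:
                # Extend the partial caption and keep the row in the loop.
                still_active[img] = partial + [word]
        active = still_active

    # Rows that reached max_len without an end token are returned as they are.
    for img, partial in active.items():
        finished[img] = partial[1:]
    return finished


# Toy stand-in for the trained network: it replays a fixed caption per image.
TOY_CAPTIONS = {"img1": ["a", "dog"], "img2": ["a", "cat", "sleeping"]}


def toy_predict_next_word(img: str, partial: List[str]) -> str:
    target = TOY_CAPTIONS[img]
    position = len(partial) - 1  # number of words predicted so far
    return target[position] if position < len(target) else END


if __name__ == "__main__":
    print(predict_captions(["img1", "img2"], toy_predict_next_word))
```

In this sketch, the shrinking `active` dictionary plays the role of the rows fed back into the next Recursive Loop iteration, while `finished` collects the rows that have already been excluded.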

This is a companion discussion topic for the original entry at https://kni.me/w/hH01-0kMQfBLrAq4