Word Embeddings: Read (cached) vocabulary/vectors?

Hello, 

I was wondering if anyone has tips on how to read or import pre-calculated word vectors. With the deeplearning4j Word Embeddings module we can train and apply a model, or even extract the vocabulary, but as far as I understand we cannot re-use word vectors that were calculated beforehand (elsewhere, for example).

If this were possible, I would expect it to produce an object that fits the in-port of the Word Vector Apply node.

Given that calculating word embeddings is very time-consuming, I would have guessed this was possible, but I haven't found a way. 

Any ideas?

Hi torobotaki,

Unfortunately, reading pre-calculated word vector models is not directly possible in the current version of KNIME. However, we are currently working on nodes that will provide that functionality. There will be a node that can read word vector models created by KNIME as well as most external models (if saved in a suitable format).

However, some pre-trained models are distributed as plain text in a CSV-like format (e.g. GloVe: http://nlp.stanford.edu/projects/glove/). Each row contains a word followed by the components of its vector, separated by whitespace. So you could read these models with the normal File Reader node, using a space as the column delimiter, and then try to work with the table.
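To illustrate the layout, here is a minimal Python sketch of parsing such a file outside KNIME. It is only an illustration of the row format, not what the File Reader node does internally; the function name `load_glove_text` and the sample values are made up for this example, and it assumes one word per line followed by space-separated numbers with no header row.

```python
import io


def load_glove_text(handle):
    """Parse GloVe-style text: each line is a word followed by its vector components."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny made-up sample in the same layout (these are not real GloVe values).
sample = io.StringIO("the 0.1 -0.2 0.3\ncat 0.4 0.5 -0.6\n")
embeddings = load_glove_text(sample)
```

For a real file you would pass an open file handle (for example `open(path, encoding="utf-8")`) instead of the in-memory sample; the resulting word-to-vector mapping corresponds to the table you would get from the File Reader, with the word in the first column and one numeric column per dimension.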

If you have further questions I'm happy to help.

Cheers

David