The architecture I would like to replicate can be found in this paper: https://arxiv.org/pdf/2111.06377.pdf
I am curious whether anyone has experience with masked autoencoders in KNIME and how to set them up.
The first problem I have is building a vision transformer in KNIME. I usually work with the Keras nodes, but as far as I understand, it is currently not possible to build a vision transformer with them. Am I right?
If anybody has any ideas on how to approach this problem or has examples to share, I would be very grateful.
My goal is to use the vector from the autoencoder to enhance an existing NLP-based classifier. I tried several CNN-based architectures before, but due to the limited amount of labelled data the results are far from satisfactory. I think an unsupervised approach might work better because I have a lot of similar but unlabelled images available. I stumbled across the paper above, which I think is pretty neat. I'm also curious to hear about any other methods for unsupervised image vector creation.
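To make clearer what I would need to reproduce, here is a rough sketch of the preprocessing step as I understand it from the paper: the image is cut into non-overlapping patches, and a random subset (75% in the paper) is hidden from the encoder. This is plain NumPy, not KNIME or Keras node code; the function names `patchify` and `random_masking`, the 224x224 input size, and the patch size of 16 are my own assumptions for illustration.

```python
import numpy as np


def patchify(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_masking(patches, mask_ratio=0.75, rng=None):
    """Keep a random subset of patches; the rest are masked out.

    Returns the visible patches, a boolean mask (True = masked),
    and the sorted indices of the visible patches.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx


# Stand-in for a real image (random values, for illustration only).
image = np.random.default_rng(0).random((224, 224, 3))
patches = patchify(image)
visible, mask, keep_idx = random_masking(patches, rng=np.random.default_rng(1))
print(patches.shape, visible.shape, int(mask.sum()))
```

Only the `visible` patches would go through the transformer encoder; the decoder would then have to reconstruct the masked ones. The encoder and decoder themselves are the part I do not know how to build with the existing nodes.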