How to: image vector creation with a masked autoencoder

Hi there,

The architecture I would like to replicate can be found in this paper:

I am curious to find out whether anyone has any experience with masked autoencoders in KNIME and how to set it up.

The first problem I have is building a vision transformer in KNIME. I usually work with the Keras nodes, but from my understanding it is not possible to build a vision transformer with the Keras nodes at the moment, am I right?

If anybody has any ideas on how to approach this problem or has examples to share, I would be very grateful.

My goal is to use the vector from the autoencoder to enhance an existing NLP-based classifier. I tried several CNN-based architectures before, but due to the limited amount of labelled data the results are far from satisfactory. I think an unsupervised approach might work better, because I have a lot of similar but unlabelled images available. I stumbled across the paper above, which I think is pretty neat. I'm also curious to hear about any other methods out there for unsupervised image vector creation.
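For context, the core trick of a masked autoencoder is simple: split each image into patches, hide a large random fraction of them, and train the network to reconstruct the hidden patches from the visible ones. A minimal NumPy sketch of just the masking step (all sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one 32x32 RGB image split into 8x8 patches
image = rng.random((32, 32, 3))
patch = 8
patches = image.reshape(4, patch, 4, patch, 3).swapaxes(1, 2).reshape(16, -1)  # (16, 192)

# MAE-style masking: hide 75% of the patches; the encoder only sees the rest
mask_ratio = 0.75
num_keep = int(len(patches) * (1 - mask_ratio))  # keep 4 of 16 patches
perm = rng.permutation(len(patches))
visible = patches[perm[:num_keep]]   # fed to the encoder
masked_idx = perm[num_keep:]         # positions the decoder must reconstruct
print(visible.shape, len(masked_idx))  # (4, 192) 12
```

The encoder then only processes the visible patches, which is what makes this kind of pretraining cheap compared to running a full autoencoder over every pixel.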



Hi @e2dboden,

As you can see from the many responses, this is quite a tough question :sweat_smile: Let me try to give some hints, even though I don’t have experience with masked autoencoders.

Did I understand correctly that a vision transformer is a concatenation of certain layers? If so, this should be possible via the Keras Layer nodes (see here for a list). If it’s more complex, it’s probably more convenient to build the vision transformer with the DL Python Network Creator – KNIME Hub or the DL Python Network Editor – KNIME Hub.
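To illustrate the Python route, here is a rough sketch of a tiny transformer encoder over image patches in tf.keras. All sizes and layer choices are illustrative, positional embeddings are omitted for brevity, and a real masked autoencoder would additionally need the masking logic and a reconstruction decoder. In the DL Python Network Creator you would assign the finished model to the node’s output variable (`output_network`, if I recall the node’s convention correctly):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes -- adjust to your data
image_size, patch_size, dim, num_heads = 32, 8, 64, 4
num_patches = (image_size // patch_size) ** 2  # 16 patches

inputs = keras.Input(shape=(image_size, image_size, 3))
# Patchify + linearly embed each patch with one strided convolution
x = layers.Conv2D(dim, kernel_size=patch_size, strides=patch_size)(inputs)
x = layers.Reshape((num_patches, dim))(x)
# One transformer block: self-attention and an MLP, each with a residual
attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=dim // num_heads)(x, x)
x = layers.LayerNormalization()(x + attn)
mlp = layers.Dense(dim * 2, activation="gelu")(x)
mlp = layers.Dense(dim)(mlp)
x = layers.LayerNormalization()(x + mlp)
# Pool the patch embeddings into a single vector per image
outputs = layers.GlobalAveragePooling1D()(x)
model = keras.Model(inputs, outputs)

vec = model.predict(np.zeros((2, image_size, image_size, 3), dtype="float32"), verbose=0)
print(vec.shape)  # (2, 64)
```

The pooled output is exactly the kind of fixed-length image vector you could then feed into a downstream classifier.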

You could also use an existing trained network, if one is available, and modify it (or parts of it). You can find an example of this here: Fine-tune VGG16 (Python) – KNIME Hub
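As a sketch of that route: load VGG16 without its classification head and use global average pooling to get one fixed-length vector per image. (Here `weights=None` keeps the example self-contained and offline; in practice you would use `weights="imagenet"`, which downloads the pretrained weights on first use.)

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# include_top=False drops the classifier head; pooling="avg" turns the last
# feature map into one 512-dimensional vector per image.
# weights=None keeps this sketch offline; use weights="imagenet" in practice.
base = VGG16(weights=None, include_top=False, pooling="avg",
             input_shape=(224, 224, 3))

# Dummy batch standing in for real 224x224 RGB images
images = np.random.rand(4, 224, 224, 3).astype("float32") * 255.0
vectors = base.predict(preprocess_input(images), verbose=0)
print(vectors.shape)  # (4, 512)
```

Those vectors could serve the same purpose as the masked-autoencoder embedding, and are a much quicker baseline to try first.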

But let’s see whether this bump to the topic works and someone more experienced in deep learning has an idea!

Kind regards, Lukas


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.