Mixing text and other columns in a classification model

Is there any way to combine features from text analysis with other columns to build a model? I would like to do some classification on patent data. As well as the text of each patent, a number of classification codes are attached to identify its purpose, and I’d like to build a model that uses both the text and these codes. I was wondering whether some of the deep learning nodes could be used to generate an encoding of the text (possibly with an LSTM layer) that then becomes an input to another model alongside the codes. Has anyone tried something like this, and are there any tips on how to go about it?

Hi @EwarWright

We did this once to combine network analysis and text mining. You can find the example here: Network Analytics meets Text Processing – KNIME Hub

However, using the same principle you can also combine all kinds of other data into a text mining model.


Hi @EwarWright -

To add on to what Iris said, you may also want to take a look at this workflow that uses an LSTM approach for text classification:

Hi Iris, thanks for the interesting reference. It doesn’t look to be quite what I’m after, though, as it seems that the two models are generated independently and then combined for visualisation. What I would like to do is combine the text with the classification codes within the same model. I was considering something like the LSTM model that Scott has referenced, but instead of adding a standard output layer, I would concatenate the output of the LSTM layer with the additional categorical codes (probably encoded) in a new layer, and then feed this larger layer into a dense layer for output. Does this sound possible (or even sensible)?
Thanks,
Ed
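
For reference, the architecture described above (an LSTM encoding of the text concatenated with encoded classification codes, followed by a dense output layer) can be sketched in plain Keras code using the functional API. This is only a minimal sketch under assumptions, not a KNIME workflow: the vocabulary size, sequence length, number of codes, number of target classes and layer widths are all hypothetical placeholders, the text is assumed to be already tokenised into integer indices, and the codes are assumed to be multi-hot encoded. The data at the end is random and only there to check that the shapes fit together.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 5000    # distinct token indices (0 is reserved for padding)
MAX_TOKENS = 40      # padded length of each tokenised patent text
NUM_CODES = 50       # distinct classification codes, multi-hot encoded
NUM_CLASSES = 4      # target classes to predict


def build_model():
    # Branch 1: token indices -> embedding -> LSTM -> fixed-size text encoding.
    text_in = keras.Input(shape=(MAX_TOKENS,), dtype="int32", name="text_tokens")
    embedded = layers.Embedding(VOCAB_SIZE, 64, mask_zero=True)(text_in)
    text_encoding = layers.LSTM(32)(embedded)

    # Branch 2: the classification codes as a multi-hot vector.
    codes_in = keras.Input(shape=(NUM_CODES,), name="class_codes")

    # Merge both branches into one larger layer, then dense layers for output.
    merged = layers.Concatenate()([text_encoding, codes_in])
    hidden = layers.Dense(32, activation="relu")(merged)
    output = layers.Dense(NUM_CLASSES, activation="softmax")(hidden)

    model = keras.Model(inputs=[text_in, codes_in], outputs=output)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()

# Random stand-in data, used only to confirm that the two inputs line up.
rng = np.random.default_rng(0)
tokens = rng.integers(1, VOCAB_SIZE, size=(8, MAX_TOKENS))
codes = rng.integers(0, 2, size=(8, NUM_CODES)).astype("float32")
probs = model.predict([tokens, codes], verbose=0)
```

Because both branches sit in a single model, the LSTM weights are trained jointly with the dense layers, so the text encoding is learned with the codes present rather than produced by a separate model first. With many distinct codes, a second embedding layer for the codes might be a reasonable alternative to the multi-hot vector.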