How to programmatically modify table columns and adjust column domains

Hi all,

(I want to thank all the Knimers in Zurich for a wonderful conference.  Top notch!)

I have what I believe may be a common problem for people who use learning algorithms.  The Weka J48 node requires that (1) all attributes in the test data be present in the training data, and (2) all values for any nominal attribute in the test data be present in the training data.

Over time, my test data starts to include new attributes, or new values for existing nominal attributes. I want my model predictor to be robust to those changes.

To work around this, I was advised in Zurich to append an extra column, "Other Attribute", to my training data, and to add a nominal value "Other Value" to each nominal attribute. This is quite easy to accomplish in Knime.
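
To make the training-side half concrete, here is a minimal sketch in plain Python rather than Knime nodes. The function name `extend_training`, the placeholder value "none" for the extra column, and the sample columns are assumptions made for illustration only; they are not part of the advice given in Zurich.

```python
OTHER_VALUE = "Other Value"
OTHER_ATTRIBUTE = "Other Attribute"


def extend_training(rows, nominal_columns, placeholder="none"):
    """Return (rows with the catch-all column added, domain per nominal column).

    Assumption: the catch-all column holds a constant placeholder in the
    training data, because no new attributes exist there yet.
    """
    # Copy each row so the caller's data is left untouched.
    extended = [{**row, OTHER_ATTRIBUTE: placeholder} for row in rows]
    # Every nominal domain gets the catch-all value in addition to the
    # values actually observed in the training rows.
    domains = {
        col: {row[col] for row in rows} | {OTHER_VALUE}
        for col in nominal_columns
    }
    # The catch-all column is nominal too, so it needs its own domain.
    domains[OTHER_ATTRIBUTE] = {placeholder, OTHER_VALUE}
    return extended, domains


# Hypothetical training table with one nominal and one numeric column.
training = [
    {"colour": "red", "size": 1.0},
    {"colour": "blue", "size": 2.5},
]
extended, domains = extend_training(training, ["colour"])
print(sorted(domains["colour"]))
```

The sketch only records which values are allowed; it does not change how any particular learner is trained.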

The second half of this trick is harder. When the test data arrives with a new structure, I need to recognize this and modify the test data automatically. That is, I need to combine the new attribute column(s) into a single column named "Other Attribute". I also need to compare the nominal values present in the training and test data for each nominal column, and change every new nominal value to "Other Value". Since this will be an automated program running every few minutes, I cannot simply modify the test data by hand.
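
As a rough illustration of that test-side clean-up, the following plain-Python sketch aligns one test row with the training structure. It is not a Knime workflow. The function name `align_test_row`, the placeholder "none", and the rule used to combine new columns (the catch-all column is set to "Other Value" whenever a row carries any column unknown to the training data) are all assumptions chosen for the example.

```python
OTHER_VALUE = "Other Value"
OTHER_ATTRIBUTE = "Other Attribute"


def align_test_row(row, training_columns, domains, placeholder="none"):
    """Reshape one test row so it only uses training columns and values."""
    # Keep only the columns the model was trained on; absent ones become None.
    aligned = {col: row.get(col) for col in training_columns}
    # Map nominal values the training data never contained to the catch-all.
    for col, allowed in domains.items():
        if aligned.get(col) is not None and aligned[col] not in allowed:
            aligned[col] = OTHER_VALUE
    # Assumption: new columns are collapsed into a single flag rather than
    # having their contents preserved.
    has_new = any(
        value is not None
        for col, value in row.items()
        if col not in training_columns
    )
    aligned[OTHER_ATTRIBUTE] = OTHER_VALUE if has_new else placeholder
    return aligned


# Hypothetical training structure and one test row that has drifted.
training_columns = ["colour", "size"]
domains = {"colour": {"red", "blue", OTHER_VALUE}}
row = {"colour": "green", "size": 3.0, "shape": "round"}
aligned = align_test_row(row, training_columns, domains)
print(aligned)
```

Collapsing the new columns into a flag keeps the catch-all column inside a fixed two-value domain, which matters because the predictor rejects unseen nominal values; whether discarding the new columns' contents is acceptable depends on the data.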

However, filtering the test data programmatically has stumped me. It seems I need nodes that manipulate columns and cell contents based on table specifications. Does anyone have suggestions for the best way to proceed? I may also have the option of doing this in the stored procedure that produces the data, using SQL instead of Knime.

Kind regards,

Bill Nowlin

Hi Bill, 

Have you tried the Extract Table Spec node? If you enable the possible values checkbox, you can extract the nominal values from a table and use them with a Reference Row Filter to separate the rows whose values are in a particular domain from those that are not.
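
The logic of that split can be sketched in plain Python, purely as an illustration of the idea and not as a description of how the nodes work internally. The function name `split_by_domain` and the sample data are hypothetical.

```python
def split_by_domain(rows, column, possible_values):
    """Split rows by whether their value in `column` is a known value."""
    in_domain, out_of_domain = [], []
    for row in rows:
        if row.get(column) in possible_values:
            in_domain.append(row)
        else:
            out_of_domain.append(row)
    return in_domain, out_of_domain


# Possible values taken from a hypothetical training table.
training = [{"colour": "red"}, {"colour": "blue"}]
possible = {row["colour"] for row in training}

test = [{"colour": "red"}, {"colour": "green"}, {"colour": "blue"}]
known, unknown = split_by_domain(test, "colour", possible)
print(len(known), len(unknown))
```

The rows in `unknown` are the ones whose value would need to be replaced before prediction.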

Does that help at all?

Regards,

Aaron