Problems with input data format for GeneralizedSequentialPatterns Weka node

Hi colleagues,

I'm trying to figure out how to use the GeneralizedSequentialPatterns node that performs the GSP algorithm.

Here (you can download it if you want) is a preprocessed dataset from the UCI page. The data are in a format suitable for the GSP algorithm: I have already tested them with the SPMF Java tool and everything went fine.

Unfortunately, with KNIME something goes wrong because of compatibility problems related to the input data format. I have tried to process the data as integer, double, or string, which are the only accepted data formats, but it doesn't work.

Any suggestions?

Thanks in advance.

-Giulio

Hello Giulio,

I think you should be able to process the data as strings. Also, when you configure the GeneralizedSequentialPatterns (3.7) node, please make sure to set the right minimum support threshold (the default is 0.9).
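To illustrate why the threshold matters, here is a minimal Python sketch. It is not KNIME or Weka code: the toy sequences and the `support` helper are made up for illustration, and it only counts single items rather than full sequential patterns. It assumes the usual definition of support as the fraction of sequences that contain the item. With a minimum support of 0.9, an item has to occur in at least 90% of the sequences to be reported at all.

```python
def support(sequences, item):
    """Fraction of sequences in which `item` occurs in at least one itemset."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if any(item in itemset for itemset in seq))
    return hits / len(sequences)


# Hypothetical toy data: each sequence is an ordered list of itemsets.
sequences = [
    [{"a"}, {"b", "c"}],
    [{"a"}, {"c"}],
    [{"b"}, {"c"}],
    [{"a", "c"}],
]

min_support = 0.9  # same value as the node default mentioned above

# Keep only the items whose support reaches the threshold.
frequent = sorted(
    item for item in {"a", "b", "c"} if support(sequences, item) >= min_support
)
```

In this toy data only `c` appears in every sequence, so it is the only item that survives a 0.9 threshold. Lowering `min_support` lets `a` and `b` through as well.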

Hope that helps,

Best,

Vincenzo

Hi @Vincenzo,

I have already tried processing the input data in every possible format, and with every kind of support threshold. Have you taken a look at the input data that I provided? Did you try using them as input for that node?

I think it could be an unresolved bug in that Weka node.

I have already seen some discussions about this topic, and none of the users who tried this node managed to make it work. There also isn't a KNIME example showing this node working successfully.

-Giulio

Hi giulio89,

Yes, I tried with the data you provided. Another thing that I did not mention in the previous message is that you need to make sure that no missing values are passed as input to the GeneralizedSequentialPatterns (3.7) node.

Actually, the Preliminary Attribute check option in the node configuration gives you an overview of the variables that are not OK.
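If you want to double-check the file outside KNIME, a small Python sketch like the following can flag cells that look missing. It is only an illustration under assumptions: the column names (`seq_id`, `item`), the inline sample data, and the set of strings treated as missing are all hypothetical and would need adjusting to the real file.

```python
import csv
import io

# Assumption: these strings count as "missing"; adjust to your data.
MISSING_MARKERS = {"", "?", "NA", "NaN"}


def find_missing(rows):
    """Return (row_index, column_name) pairs for cells that look missing."""
    problems = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            if col is None:
                continue  # extra unnamed fields in a malformed row; skip them
            if value is None or value.strip() in MISSING_MARKERS:
                problems.append((i, col))
    return problems


# Hypothetical sample standing in for the real CSV file.
sample = io.StringIO("seq_id,item\n1,a\n1,\n2,?\n2,b\n")
rows = list(csv.DictReader(sample))
problems = find_missing(rows)
```

Here `problems` lists the row index (counted from the first data row) and column of each suspicious cell. An empty list means nothing matched the assumed missing-value markers.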

Hope that helps,

Best,

Vincenzo

 

Hi @Vincenzo,

There are no missing values in the data I provided. If you have one, could you upload a working example here? Maybe one that uses the data from my first message?

Thanks in advance.

-Giulio