Trouble replicating cascading K-Means clustering and C4.5 DT

clx_lb · December 4, 2020, 7:45pm

Hey there!

I’m currently trying to replicate a network anomaly detection algorithm based on cascading K-means and the C4.5 decision tree. It’s actually my first time using KNIME; up until now I have only worked with a couple of Python libs.

Sadly I’m not able to progress successfully with my current workflow. I’m trying to cluster the WIFI on ICE (Deutsche Bahn) dataset based on geodata and ping, but I have not been able to replicate the mentioned algorithm correctly and would appreciate some help. Apparently I keep feeding the correct information into the decision tree learner unintentionally, which results in a 100% accuracy rate.

That’s the current workflow:

Thanks in advance!

janina · December 9, 2020, 2:30pm

Hello @clx_lb,

Yes, you are using the exact same data set to train your decision tree model as to evaluate it. Therefore, your model reaches 100% accuracy. You would need to split your data into a training data set and a test data set using the Partitioning node (similar to what you have done for the K-means clustering).
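Since you mentioned a Python background, here is a minimal sketch of the same idea outside KNIME. Everything in it is an assumption made for illustration: the data is synthetic (random latitude, longitude and ping values with a noisy label), the helper names `partition`, `nearest_label` and `accuracy` are made up, and a 1-nearest-neighbour lookup stands in for the decision tree because it memorises its training rows in a similar way to an unpruned tree. It is not the workflow from this thread.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into (train, test) without overlap."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def nearest_label(train, features):
    """Stand-in learner: return the label of the closest training row."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return min(train, key=squared_distance)[1]


def accuracy(train, eval_rows):
    """Fraction of eval_rows whose label is predicted correctly from train."""
    hits = sum(nearest_label(train, f) == label for f, label in eval_rows)
    return hits / len(eval_rows)


# Synthetic rows (hypothetical): (latitude, longitude, ping) plus a noisy label.
data_rng = random.Random(0)
rows = []
for _ in range(200):
    features = (
        data_rng.uniform(47.0, 55.0),
        data_rng.uniform(6.0, 15.0),
        data_rng.uniform(10.0, 500.0),
    )
    label = "anomaly" if features[2] + data_rng.gauss(0, 100) > 300 else "normal"
    rows.append((features, label))

train, test = partition(rows)

# Evaluating on the training rows: every row finds itself, so this is 1.0.
train_acc = accuracy(train, train)
# Evaluating on held-out rows gives the honest estimate.
test_acc = accuracy(train, test)

print(f"accuracy on training rows: {train_acc:.2f}")
print(f"accuracy on held-out rows: {test_acc:.2f}")
```

The first number is perfect by construction, which is the same effect as the 100% in your workflow. Only the second number says anything about how the model handles rows it has not seen.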

Best,
Janina
