I have a workflow that trains a DBSCAN model using the OPTICS nodes (Cluster Compute & Cluster Assigner) to detect anomalies in my data. The model is trained on almost all of my historical data (aggregated by day, 729 days in total), holding out only the last month. Now I'm trying to use that model (the output of OPTICS Cluster Compute) together with yesterday's data to evaluate whether there was an anomaly yesterday, but I don't know how to do it.
After loading the model with a “Model Reader” node, the OPTICS Cluster Assigner gives me an error about mismatched sizes (“The length of the model doesn’t correspond to the given data. (1!=729)”). Obviously I don’t want to re-train the model, only to use it to cluster yesterday’s data.
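To illustrate what I think is happening outside KNIME, here is a minimal scikit-learn sketch (my assumption of what the nodes do internally; the data and parameter values are placeholders, not my real settings). DBSCAN/OPTICS label only the rows they were fitted on, so there is nothing to “apply” to a single new row, which matches the 1 != 729 mismatch in the error:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
history = rng.normal(size=(729, 4))    # stand-in for my 729 daily aggregates
yesterday = rng.normal(size=(1, 4))    # stand-in for yesterday's single row

db = DBSCAN(eps=0.8, min_samples=5).fit(history)
print(db.labels_.shape)                # (729,): one label per *training* row
# There is no db.predict(yesterday): the clustering only covers the rows it
# was fitted on, so a 1-row input cannot be matched against a 729-row model.
```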
I can only do some guessing here… depending on how you set up the original model…
How about using the Partitioning node in your original workflow to train the model on only one month of data… then apply that model to the new data… so the size of the data is about the same…
Thanks for your suggestion, but I need to train my model on daily aggregations of events, and then each day detect anomalies in the same aggregation computed for the previous day.
I want to use the DBSCAN node the same way I use, for example, Fuzzy c-Means, but the DBSCAN node doesn’t allow me to save the model for later use :(. Neither does OPTICS’s DBSCAN. At the moment I only have two alternatives: Fuzzy c-Means and Weka DBSCAN.
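A workaround I’m considering outside KNIME, sketched in Python with scikit-learn (a hedged sketch only: the synthetic data, eps/min_samples values, and file name are hypothetical stand-ins): persist the fitted DBSCAN and label a new day by its distance to the stored core samples. A point farther than eps from every core sample is exactly what DBSCAN itself calls noise, which is the anomaly signal I want:

```python
import joblib
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin_min

# --- training time: fit on the historical days and persist the result ---
history, _ = make_blobs(n_samples=729, n_features=4, centers=3, random_state=0)
db = DBSCAN(eps=1.5, min_samples=5).fit(history)
joblib.dump(db, "dbscan_history.joblib")   # hypothetical file name

# --- scoring time: load the fit and classify yesterday's aggregate ---
db = joblib.load("dbscan_history.joblib")
yesterday = np.array([[50.0, 50.0, 50.0, 50.0]])  # deliberately far from the blobs

# components_ holds the core samples found during fit; a new point joins the
# cluster of its nearest core sample if it lies within eps of it, otherwise
# DBSCAN itself would have marked it as noise.
idx, dist = pairwise_distances_argmin_min(yesterday, db.components_)
if dist[0] <= db.eps:
    label = db.labels_[db.core_sample_indices_[idx[0]]]
else:
    label = -1                             # noise -> flag the day as an anomaly
print("anomaly" if label == -1 else f"cluster {label}")
```

The eps-to-nearest-core-sample rule is the same criterion DBSCAN uses internally to mark noise, so the daily flag stays consistent with the original clustering of the 729 historical days.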