How to implement a workflow to get live data?

Hey,

Is it possible to get live data from a PC rather than a database? I am trying to analyse traffic flow and inspect packets. The data that travels into the router is captured by Wireshark or tcpdump, and it's not a database. Is there any solution?

I have a complete workflow, but I don't know how to implement it and what options KNIME provides.


What form is it captured in? The File Reader node will read CSV and many other delimited file formats. There are also XML Reader and JSON Reader nodes, which will read single XML or JSON files, and various nodes in the IO section of the Vernalis community contribution.
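If your capture is still a raw pcap file, one approach would be to export it to CSV first so the File Reader can pick it up, e.g. with tshark (Wireshark's command-line tool). Here's a minimal sketch in Python; the paths and the field list are just examples, adjust them to the columns you need:

```python
import subprocess

# Export selected fields from a pcap capture to CSV for KNIME's File Reader.
# capture.pcap / capture.csv are example paths.
with open("capture.csv", "w") as out:
    subprocess.run(
        ["tshark", "-r", "capture.pcap",   # read the tcpdump/Wireshark capture
         "-T", "fields",                   # emit selected fields, not a full dissection
         "-e", "frame.time_epoch",
         "-e", "ip.src",
         "-e", "ip.dst",
         "-e", "frame.len",
         "-E", "header=y",                 # write a CSV header row
         "-E", "separator=,"],
        stdout=out,
        check=True,
    )
```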

Steve


Thanks for your response, Steve.

I have a trained model now and it shows good results. Do I have to write it out with the PMML Writer first?

And then how can I feed it with live data captured by tcpdump? Does KNIME have a solution to feed the data in live and keep the results up to date, or does the user have to upload data each time?

Ahmad Wali


There is a blog post discussing an extension for using streaming data in KNIME. Maybe you could explore that.

https://www.knime.com/blog/streaming-data-in-knime

The node is still in beta.

@AhmadWali - yes, if your model is in PMML form then you can save / load it using the corresponding nodes.

I don't think you can pick up the data directly from Wireshark continuously. The best options I can think of would be either:

  • Put your reader node, reading the data from Wireshark, in a loop, and repeatedly read and analyse it in whatever way you are doing, or
  • Run the whole workflow at fixed repeated intervals from the command line (see https://www.knime.com/faq#q12), using Task Scheduler or a cron job; a sketch of this is below.
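Here's a minimal sketch of the second option, driving the whole capture-then-analyse cycle from a Python script. The install path, workflow directory, interface name, and capture window are assumptions for illustration; the KNIME batch application ID and the tcpdump/tshark flags are as documented.

```python
import subprocess
import time

KNIME = "/path/to/knime"                     # assumed KNIME install location
WORKFLOW = "/path/to/workspace/TrafficFlow"  # assumed workflow directory

while True:
    # Capture one 60-second window of traffic (usually needs root;
    # -G/-W make tcpdump rotate the file once and then exit).
    subprocess.run(
        ["tcpdump", "-i", "eth0", "-G", "60", "-W", "1", "-w", "capture.pcap"],
        check=True,
    )
    # Export the capture to the CSV that the workflow's File Reader expects
    # (same tshark call as in the earlier sketch).
    with open("capture.csv", "w") as out:
        subprocess.run(
            ["tshark", "-r", "capture.pcap", "-T", "fields",
             "-e", "frame.time_epoch", "-e", "ip.src", "-e", "ip.dst", "-e", "frame.len",
             "-E", "header=y", "-E", "separator=,"],
            stdout=out, check=True,
        )
    # Run the workflow headless with KNIME's batch executor.
    subprocess.run(
        [KNIME, "-nosplash", "-reset",
         "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
         f"-workflowDir={WORKFLOW}"],
        check=True,
    )
    time.sleep(1)  # small pause before the next cycle
```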

@mlauber71 - Streaming doesn't help here, unless @AhmadWali is planning on writing his own node to interface directly with Wireshark in some way. Streaming is a different way that nodes execute: they pass their output rows on to the downstream node(s) as soon as they are complete, rather than generating the complete output table, saving it to disk, and then passing it to the next downstream node. If you have lots of data this can help, as the disk I/O is often the slow part, even with an SSD. One side effect is that you can't view the intermediate tables, as the data in them is not saved. Nice thought though!
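To illustrate the execution pattern in plain Python (not KNIME code, just an analogy): a streaming step hands each row to the next step as soon as it is ready, while the default mode builds each full intermediate table first.

```python
# Default mode: each step materialises its whole output before the next step starts.
def read_rows_batch(n):
    return [{"id": i, "len": i * 10} for i in range(n)]  # complete table in memory

def filter_batch(rows):
    return [r for r in rows if r["len"] > 20]            # another complete table

# Streaming mode: rows flow downstream one at a time via generators.
def read_rows_stream(n):
    for i in range(n):
        yield {"id": i, "len": i * 10}                   # emit one row at a time

def filter_stream(rows):
    for r in rows:
        if r["len"] > 20:
            yield r                                      # passed on immediately

for row in filter_stream(read_rows_stream(5)):
    print(row)  # downstream sees rows without any intermediate table being saved
```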

Steve


@AhmadWali Sir, did you find a solution to your problem? Please let me know how it can be done.