I have a log file (see below) from an external program (i.e., not KNIME) that I would like to analyze and display with the KNIME Reporter.
As the file can be regarded as a concatenation of many different files, I think the best approach is to split it into many smaller files that can then be read using the File Reader. Any other option using KNIME with either the LineReader or the File Reader would involve a VERY complicated workflow that also wouldn't be very flexible.
Do you have any other ideas on how to process this file? I don't like the idea of producing many files, as some of them will probably contain only one line and they just pollute the file system...
Thanks a lot for any comments.
Bernd
As you can see the file contains sections that are encapsulated in ">> Name" and ">>END_MODULE". Within these sections a File Reader would be able to read the columns appropriately.
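If you'd rather not write lots of small files, one alternative (outside KNIME, e.g. as a preprocessing script) is to split the log into sections in memory. The following is a minimal Python sketch, assuming each section starts with a line of the form ">> Name" and ends with a line ">>END_MODULE", as described above. Lines outside any section are ignored, sections with the same name are merged, and the name split_sections is hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Assumed markers (from the description above): ">> Name" opens a section,
# ">>END_MODULE" closes it.
END_MARKER = ">>END_MODULE"
START_RE = re.compile(r"^>>\s*(?P<name>\S.*?)\s*$")


def split_sections(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the lines of a concatenated log into named sections, kept in memory."""
    sections: Dict[str, List[str]] = {}
    current = None  # name of the section currently open, if any
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == END_MARKER:  # check END first: it also starts with ">>"
            current = None
            continue
        match = START_RE.match(line)
        if match:
            current = match.group("name")
            sections.setdefault(current, [])  # repeated names are merged
            continue
        if current is not None:  # lines outside any section are dropped
            sections[current].append(line)
    return sections


if __name__ == "__main__":
    demo = ">> Timing\nstep seconds\nload 1.5\n>>END_MODULE\n>> Memory\nkb\n2048\n>>END_MODULE\n"
    for name, body in split_sections(demo.splitlines()).items():
        print(name, len(body))
```

Each entry of the resulting dictionary could then be handed to whatever reads a single table, without touching the file system.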
One option is to copy the log text file to its own directory somewhere. Then use the "Flat File Document Parser" node in "KNIME Labs/Text Processing" and choose the location of that directory in the node configuration.
This puts the log file in a "document" cell. Now use the "Document Data Extractor" node to convert it into a "String" cell; in the node configuration, choose "text". Then connect up the "Data To Report" node.
In the Report Designer, choose the column (which will be called Text), add it to the report page and highlight it. Then be sure to go to Advanced Options, find "WhiteSpace" and change it from "No Wrapping" to "Normal"; otherwise you will only see the first line of the log file.
I actually want to plot some graphs based on the values and might even compare different files. So I think I will need the data within KNIME in a usable form.
Maybe I could write a new input node that has an arbitrary number of outputs? Is this possible? (I only know of variable inputs)...
Otherwise, I have completed the script to create multiple files and it seems to be working. Unfortunately, this was easier than devising an algorithm in KNIME. An awk / gawk node might be very useful... Well, the hackathon is coming soon and maybe we can come up with a solution there...
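For comparison, the "one file per section" approach can be sketched in Python instead of awk. This is not the script mentioned above (which isn't shown); it assumes the same ">> Name" / ">>END_MODULE" markers, and the function name write_sections and the NNN_Name.txt naming scheme are hypothetical. The numeric prefix keeps file names unique if a section name occurs more than once.

```python
from pathlib import Path
from typing import List


def write_sections(log_path: str, out_dir: str) -> List[Path]:
    """Write each '>> Name' ... '>>END_MODULE' block of the log to its own text file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    blocks = []  # (name, lines) pairs in the order they appear in the log
    current = None
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            stripped = line.strip()
            if stripped == ">>END_MODULE":  # assumed end marker
                current = None
            elif stripped.startswith(">>"):  # assumed start marker ">> Name"
                current = (stripped[2:].strip(), [])
                blocks.append(current)
            elif current is not None:  # text outside a section is skipped
                current[1].append(line)
    written = []
    for index, (name, lines) in enumerate(blocks):
        # Replace characters that are awkward in file names.
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in name) or "section"
        target = out / f"{index:03d}_{safe}.txt"
        target.write_text("".join(lines), encoding="utf-8")
        written.append(target)
    return written
```

The returned paths are in log order, so they could be listed and read back in the same order.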
Reading in multiple files in KNIME is quite easy. If you copy all the files to a single directory, you can then use the "List Files" node to get a list of all the filenames in that directory. Then connect up a "TableRow To Variable Loop Start" node, which now provides the filenames as flow variables, one file per iteration.
You can now use the "File Reader" node by showing its variable ports and connecting the output of the "TableRow To Variable Loop Start" node to the variable input port of the "File Reader" node. To configure this in the "File Reader" node, go to the Flow Variables tab and choose the variable called "URL" from the dropdown list next to "DataURL". Then put a "Loop End" node after this node. This will load all the text files from the directory in one go. Is this helpful?
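Outside KNIME, the same List Files → loop → Loop End pattern comes down to a few lines. This Python sketch is only an analogy, not how KNIME implements it; it assumes tab-separated text files matching *.txt in one directory, and read_all is a hypothetical name. The file name is added as the first column so the files can still be told apart after the rows are stacked.

```python
import csv
from pathlib import Path
from typing import List


def read_all(directory: str, pattern: str = "*.txt", delimiter: str = "\t") -> List[List[str]]:
    """Read every matching file and stack the rows, prefixing each row with its file name."""
    rows: List[List[str]] = []
    for path in sorted(Path(directory).glob(pattern)):  # ~ "List Files"
        with open(path, newline="", encoding="utf-8") as handle:  # ~ one loop iteration
            for record in csv.reader(handle, delimiter=delimiter):
                if record:  # csv.reader yields [] for blank lines; skip them
                    rows.append([path.name, *record])  # ~ "Loop End" collecting results
    return rows
```

Note that this stacks rows without checking that every file has the same columns, which is the same limitation raised in the next reply.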
If the File Reader node doesn't load the files the way you want, remember you can swap it for the "Flat File Document Parser" node instead, followed by the "Document Data Extractor" node, and then the "Loop End" node. This approach also lets you do more manipulation of the documents: instead of extracting everything into one cell with "Document Data Extractor", you can extract one sentence per cell with "Sentence Extractor", or filter out certain terms and characters with the range of nodes in "Text Processing/PreProcessing".
One problem with looping over different files is that the File Reader won't adjust the format (i.e., the table structure, column names, etc.) for each new input file. That will cause a problem...
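One way around a single fixed schema, sketched here in Python, is to let every section carry its own header and turn its rows into records keyed by those names. This assumes the first line of each section holds whitespace-separated column names. section_to_records is a hypothetical helper, and all values stay strings (no type guessing).

```python
from typing import Dict, List


def section_to_records(lines: List[str]) -> List[Dict[str, str]]:
    """Turn one section (first non-blank line = column names) into a list of row dicts.

    Assumes whitespace-separated columns. Because each section keeps its own
    header, sections with different column layouts need not share one schema.
    """
    rows = [line.split() for line in lines if line.strip()]  # drop blank lines
    if not rows:
        return []
    header, *data = rows
    # zip() stops at the shorter list, so missing trailing values are simply absent.
    return [dict(zip(header, values)) for values in data]
```

For example, a timing section with columns "step seconds" and a memory section with a single "kb" column each come out with their own keys.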
I have now solved the problem with the awk script, which creates the different files that I then read in individually (no loop). This is OK since they all have to be handled differently in the report anyway and also need different Data To Report nodes.
Though the Report Designer is very flexible, it is still limited to very specific cases where the data has to be nearly the same. And that is actually what I want anyway. It just takes time to build, and you have to have a plan before starting. That is somewhat unsatisfactory because the data changes quite often, but I guess I can live with that.
I was looking at this solution and noticed that, as of today, the File Reader doesn't have an option to update the DataURL. Do you have any additional recommendations?
This thread is more than 10 years old, so it’s likely that the solution posted has since been superseded.
It sounds like you’re interested in reading in multiple files from a directory? Please make a new thread and include as many details as possible about what you’re attempting to do.