I've just read in an excel file via the "XLS Reader" Node into knime, which results in an output table with around 50 rows and 9 columns.
A lot of the information stored inside that output table is actually more or less worthless for my specific purposes and I am wondering now if there is any way to remove those "useless" data fields and rearrange the data I need into a specific order to have just one row with alle the information concerning that excel file?
Thx in advance for any kind of help!
I am fairly new into KNIME and data mining in general, so I hope not to bother you guys too much!
To remove unnecessary data fields, simply use the Column Filter node.
If you want to selectively remove rows, you could use the Transpose node first, and then the Column Filter node again.
I am not clear exactly what you are trying to do, if this is sufficient, or you are trying to do additional transformations.
As Simon says, the Column filter node, and also the Row filter node for selecting certain rows. How you procede from there really depends exactly on what you want to do with your data to reduce it to single row?
You can limit the parsed range directly in the XLS reader node as well, if that helps. When it comes to "having it all in one row", I presume you're trying to sort out the ever-popular "business format" Excel style with merged cell headers etc.? These can be a real pain, best to clean up outside of KNIME and parse the result only (and educate your users...). The only alternative is massive filtering and concatenation of fields via the "GroupBy" node's concatenate options.
big thanks - it indeed helped me a lot!
i used the groupby node for each of the columns to eliminate the empty data fields, transponed the "trimmed" columns to get everything into a row format and cross joined everything in the end.
i am not done yet and have just been testing it to a specific prototype excel file. we'll see what other issues will come up with the real files!
You can also use the Column Aggregator node which does the same as the GroupBy node but across columns, just a FYI.
works even better now! thanks.
does anyone of you have any idea how to extract text from those annoying text boxes within an excel file? i until now did not know those things actually exist. if i wanted to type text in an excel sheet, i just typed it somewhere, but did not use an additional textbox... the xls reader does not seem to recognize them unfortunately.
As far I am aware, floating text boxes and graphics are not supported by the xls reader.
i guess it would be difficult to know where they would be assigned to in a data table.
I halfway got around to parsing these at one point - you may want to try saving as xlsx, unzipping to a temporary location and parsing the XML content with the XML reader. It's fairly horrible, but I guess it's doable with enough XPath patience. However, be prepared for a steep learning curve if you have no prior experience with it (which is why I only got "halfway" there back then, now it'd be easier).