Converting XLSX to specific TXT

Stanislaw · April 28, 2021, 6:36am

Hello all,

I have a table in Excel file and I would like to convert it to a specific .txt file.

I assume I’d need some kind of loop to achieve this.
Is it possible to do such things in Knime?

Thank you!

Please see the attachments:

SourceFile.xlsx (8.3 KB) output.txt (180 Bytes)

mlauber71 · April 28, 2021, 7:55am

Maybe you could take a look at this example and adapt it for your needs:

takbb · April 28, 2021, 10:36am

Hi @Stanislaw , I have taken some ideas from @mlauber71’s flow and put the attached flow together for your example. In this case, it increases the number of nodes used, but removes the loops.

You have a number of specific challenges here. Firstly, your data is in “matrix format” and this ideally needs to be converted into tabular format so that it can then be processed. This can be done by the Unpivot node (Step 1 in my workflow).

Secondly (Step 2), now with the data in tabular form, ItemCode - Date - Value, and sorted on ItemCode/Date you need to process it and determine the required outputs. (In my workflow, I converted you date strings to actual date/time columns for the purposes of sorting. I think your dates are already in sortable order, so this is actually unnecessary in this instance. However, if you ever supplied dates in a different string format then this step would probably be required.)
You are requiring different outputs depending on if this is the first row for an ItemCode, the final row, or one of the rows in between. Effectively its like having a “head”, “body” and “tail”. You therefore need to know where you are within the processing for each line. For this you can Rank the rows, and determine how many there are for each ItemCode. After that, a Column Expressions node can create the required output per row (encoding line separators - I used pipe | characters - within the output as required).

Third (Step 3), having generated your “pseudo” output, you need to turn this into a data set of individual rows for each line to be output. You can do this with a Cell Splitter and Unpivoting which generates new rows based on the divided cell contents.

Finally (Step 4) is any tidy up required and then writing to the output using CSV Writer.

In my workflow I have written to Output.csv using CSV Writer, and Output2.csv using CSV Writer (deprecated) nodes. I was unsuccessful in making the “current” version of CSV Writer output blank lines. If anybody knows a trick for that it would be useful!

KNIME_Excel to TXT.knwf (223.9 KB)

Stanislaw · April 28, 2021, 11:07am

Thank you, guys! I’m quite busy right now, but I’ll analyze your posts later and let you know if it works.

Mark_Ortmann · April 29, 2021, 11:32am

@takbb,

we just tried on Windows and Ubuntu to reproduce the CSV Writer issue you’re reporting, but without any success. Are you certain that the CSV Writer is not able to create the empty line?

Best
Mark

takbb · April 29, 2021, 11:47am

Hi @Mark_Ortmann, that’s very attentive of you!

Now here is the weird thing. I’ve just creating a new data flow to test this, containing just a Table Creator and a CSV Writer, and sure enough it IS writing the blank lines. So that is good news.

But in my other flow where I found the “issue”, I have just tried it again, and yes it is still skipping the blanks. So I shall do some more experimentation to see what the difference between them is. (It’s actually good to see the two different behaviours in the same environment, as it ought to make the cause/difference easier to track down for me!)

It may well be something “odd” then that I have done to the data in the above flow that causes a slight difference between the old and new CSV writer nodes. I’ll get back to you when I find something.
br
Brian

takbb · April 29, 2021, 12:50pm

Hello again @Mark_Ortmann, I think you can rest easy. I would say that the error is “between keyboard and chair” (i.e. my bad! )

What I have now realised is that I had at some point saved my workflow under a (subtly different) new name but in my editor when I was checking the results, I was still pointing at the old file location (my data file being relative to workflow) and there I had an old copy of the file that didn’t have line breaks. But I think that was because at the time that was written, I didn’t have blank lines in the flow.

So when I was checking, the file that I was looking at in the editor wasn’t the one being updated! Coincidentally, (in my “clutching-at-straws” defence ) I had dropped a CSV Reader node on my workflow to “conveniently” read that file back in while I was trying out different encoding settings, and even though that WAS pointing at the correct file, it seems (and I have now discovered it mentioned elsewhere) that CSV Reader doesn’t show the blank lines, so it reinforced my view that the Writer wasn’t working.

So it was either the Perfect Storm or schoolboy error, depending on viewpoint!

Sorry to have put you to the trouble of investigating, and thanks again for following up.

Stanislaw · April 30, 2021, 8:18am

I’ve just followed @takbb 's schema and it surely works. Thank you! Since I’m new to this game it seems a little bit complicated to me, but I hope I’ll figure it all out in the future. You described every step (node) which is great and will be very helpful.
Yeah, I need to understand the graph quite well, because this example is just a simple example. Real life source files are more complicated - for example some item codes have to be excluded from the output file, etc.

Anyway, thank you for all the help! Cheers

system · May 7, 2021, 8:18am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.