Read XLS file in a Text Processing workflow

I have copied all the articles of our site ZDnet.be to an Excel spreadsheet. For 2016 it is a spreadsheet with approximately 4000 rows and 5 columns. All the articles are stored in the "Content" column, and every row is a new article. I want to read this spreadsheet (especially the Content column) into a Text Processing node. I have tried XLS Reader followed by Strings to Term (did not work) and XLS Reader followed by Strings to Document (did not work). I want to mine all the articles as a whole (so not an analysis of every separate article/row), because I want to be able to say something about the content of the whole site.

Which workflow do I have to follow to read this Excel spreadsheet and then pass it on to the Text Processing nodes?

I have copied the first 5 rows into the attachment so you can see the structure of the spreadsheet.

Leo

Hi Leo,

When you say "did not work", what exactly didn't work? Reading in the file, or turning the articles into Documents? Which error did you get, if any?

I haven't tested this, but I assume you can read in the XLS then concatenate all the Content cells into one gigantic string, convert that string to a Document, then use the Text Processing nodes to carry on your analysis task.
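To make the concatenation step concrete, here is a minimal sketch in plain Python rather than in KNIME nodes. It only illustrates the idea of joining every Content cell into one string; the function name `concatenate_column`, the separator, and the sample rows are hypothetical and not taken from your spreadsheet.

```python
def concatenate_column(rows, column="Content", separator="\n\n"):
    """Join the non-empty cells of one column into a single string."""
    cells = []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # skip rows that have no cell in this column
        text = str(value).strip()
        if text:  # skip cells that are empty or only whitespace
            cells.append(text)
    return separator.join(cells)


# Hypothetical sample rows standing in for the spreadsheet (one dict per row)
sample_rows = [
    {"Title": "First", "Content": "Text of the first article."},
    {"Title": "Second", "Content": "  "},
    {"Title": "Third", "Content": "Text of the third article."},
]

# One big string covering the whole site, ready to become a single Document
corpus = concatenate_column(sample_rows)
```

The blank-line separator is just one choice; it keeps the article boundaries visible in the combined text in case sentence detection needs them later.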

Have you tried that already?

Cheers,
Marco.

Hello Marco,

What a good idea! I split the spreadsheet into 2 tables and then concatenated them. It worked! I can now go on with my preprocessing. I hope I will not have to bother you again! Thank you for the tip.

Leo

You are welcome. Feel free to post here again in case you get stuck.

Cheers,
Marco.