Read CV from SAP

Hi! I downloaded some tables from SAP as a .txt file and I would like to open them in KNIME.


As you can see in the image, there are several tables. The first two contain information about the format and the download. The third contains the actual data, split across multiple pages.

I used the Tika Parser for reading, and after a long workflow I got the third part the way I wanted, but I feel like there must be a better way.
This is my workflow:


This is a minimum working example:
Prueba minimo.txt (787 Bytes)
The result:

Thanks in advance :smile:

Hi @pablobato

Below is an alternative way to do it. Note that this approach is a starting point and may need refinement depending on your actual data set.

A Line Reader is a bit more suitable than the Tika Parser in this case. The former already processes the file row by row, which avoids having to manipulate the single cell containing all the content that the Tika Parser creates.
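To illustrate the difference outside KNIME, here is a minimal Python sketch (not the KNIME node itself) of row-by-row reading: each line of the file becomes its own row instead of part of one big text cell. The sample text and the `read_lines` helper are hypothetical.

```python
import io

def read_lines(stream):
    """Yield one row per line, similar in spirit to a Line Reader."""
    for line in stream:
        # Drop the line terminator so each row holds only the line content
        yield line.rstrip("\r\n")

# Hypothetical sample standing in for the exported .txt file
sample = io.StringIO("header line\n|123|01.02.2020|\n")
rows = list(read_lines(sample))
print(rows)  # ['header line', '|123|01.02.2020|']
```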

The main working principle here is to check each row for the type of content you need (in its raw format).
In this case, that's a number, a pipe, and a date, in that order. You can check this with a RegexMatcher (available in the Column Expressions node) by using .*[0-9].*[|][0-9]{2}.[0-9]{2}.[0-9]{4}.*
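A minimal Python sketch of the same filter, using Python's `re` module as a stand-in for the Column Expressions RegexMatcher; `re.fullmatch` is used because KNIME matches the whole row. The sample lines and the `is_data_row` name are made up for illustration and are not taken from the actual SAP export.

```python
import re

# Pattern from the post: a digit, later a pipe directly followed by a date
# in dd.mm.yyyy form (the "." is unescaped, so it matches any character)
ROW_PATTERN = re.compile(r".*[0-9].*[|][0-9]{2}.[0-9]{2}.[0-9]{4}.*")

def is_data_row(line: str) -> bool:
    # fullmatch requires the pattern to cover the entire row
    return ROW_PATTERN.fullmatch(line) is not None

# Hypothetical lines, not from the real export
lines = [
    "Format: unconverted",
    "|Doc.No |Date      |Amount |",
    "|4711   |01.02.2020|  10,00|",
    "-----------------------------",
]
# Keep only the rows for which the matcher returns True
data_rows = [line for line in lines if is_data_row(line)]
print(data_rows)  # ['|4711   |01.02.2020|  10,00|']
```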


See it in action here: regex101 (build, test, and debug regex).

Note that in KNIME the regex must match the entire row, which is why the pattern has leading and trailing wildcards (.*). The disclaimer above applies here too: if the same file contains different data formats, the regex needs to be expanded to capture those cases as well.
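The effect of the leading and trailing wildcards can be checked in Python. This only approximates KNIME's whole-row matching with `re.fullmatch`, and the data row is hypothetical.

```python
import re

line = "|4711   |01.02.2020|  10,00|"  # hypothetical data row

core = r"[0-9].*[|][0-9]{2}.[0-9]{2}.[0-9]{4}"
with_wildcards = ".*" + core + ".*"  # same pattern as in the post

# Whole-row matching: the bare pattern fails because the row starts with "|"
# and continues after the date
print(re.fullmatch(core, line) is not None)            # False
print(re.fullmatch(with_wildcards, line) is not None)  # True
# A substring search would find the bare pattern anywhere in the row
print(re.search(core, line) is not None)               # True
```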

The RegexMatcher generates a true/false output, which you can then filter on.

Filtered data:

Using a Cell Splitter with | as the delimiter, you can create the desired columns.
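In Python, the Cell Splitter step corresponds roughly to `str.split("|")`. The leading and trailing pipes produce empty cells at both ends, which likely need to be removed afterwards. The row and the `split_cells` helper are hypothetical.

```python
def split_cells(row):
    # Split on the pipe delimiter, like a Cell Splitter with "|" as delimiter
    return row.split("|")

row = "|4711   |01.02.2020|  10,00|"  # hypothetical filtered row
print(split_cells(row))  # ['', '4711   ', '01.02.2020', '  10,00', '']
```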

With some generic clean-up, you’ll end up with the desired output.
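What the generic clean-up consists of is not spelled out here. Assuming it means dropping the empty outer cells and trimming the spaces used to align the columns, a sketch could look like this (the `clean_cells` helper and the input are hypothetical):

```python
def clean_cells(cells):
    # Drop the empty first/last cells produced by the leading and trailing "|"
    if cells and cells[0].strip() == "":
        cells = cells[1:]
    if cells and cells[-1].strip() == "":
        cells = cells[:-1]
    # Trim the padding spaces around each value
    return [c.strip() for c in cells]

cells = ['', '4711   ', '01.02.2020', '  10,00', '']
print(clean_cells(cells))  # ['4711', '01.02.2020', '10,00']
```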


See the workflow: Read CV from SAP.knwf (26.8 KB)

Hope this provides some inspiration!


Hi @pablobato,

I think there is an easier solution. It seems that you exported the tables with the transaction “se16” and did the following:


and then this:
(Sorry, my system is in German; it says “export to local file” and then “unconverted”.)

But what you could do instead is choose “Text with Tabs” and use a File Reader with tab as the separator.
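Assuming the “Text with Tabs” export is plain tab-separated text, a CSV parser with a tab delimiter can read it directly. A minimal Python sketch; the sample content and column names are hypothetical:

```python
import csv
import io

# Hypothetical tab-separated export; real column names will differ
sample = io.StringIO("DocNo\tDate\tAmount\n4711\t01.02.2020\t10,00\n")

reader = csv.reader(sample, delimiter="\t")
header = next(reader)  # first row holds the column names
rows = list(reader)    # remaining rows are the data
print(header)  # ['DocNo', 'Date', 'Amount']
print(rows)    # [['4711', '01.02.2020', '10,00']]
```

For the real file, pass `open(path, newline="", encoding=...)` with the encoding of your export instead of the `io.StringIO` sample.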

ORRRR

you could export it to Excel…

Hope it helps,

Paul


Thanks ArjenEX, now my workflow is cleaner and more efficient. :smile:
When I have a final version, I’ll upload it in case you are interested!


Thanks goodvirus,
Sadly, the tables are too big to fit in Excel, so I tried .txt.

I tried your format, but I still have to remove the first two tables. I'll see if I can remove them before the download. Thanks!

Pablo

Hi pablobato,

Too big for Excel? So we are talking about more than 1 million rows?
What you could also do is use the CSV Reader and skip the lines you don't need:
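The idea of skipping the leading lines can be sketched in Python as well. This assumes the info tables occupy a fixed number of lines at the top of the file (here 2, a hypothetical value) and that the rest is tab-separated; `read_skipping` is a made-up helper, not a KNIME setting.

```python
import csv
import io
from itertools import islice

def read_skipping(stream, skip, delimiter="\t"):
    # Discard the first `skip` lines (e.g. the format/download info),
    # then parse the rest as delimited data
    remaining = islice(stream, skip, None)
    return list(csv.reader(remaining, delimiter=delimiter))

# Hypothetical export: two info lines before the actual table
sample = io.StringIO("Exported by: X\nFormat: txt\nDocNo\tDate\n4711\t01.02.2020\n")
table = read_skipping(sample, skip=2)
print(table)  # [['DocNo', 'Date'], ['4711', '01.02.2020']]
```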

Maybe you could tell us how you access the data in SAP (se16, custom report, quickview) and we can come up with an easier solution.

Best regards,

Paul

Hi @goodvirus,
thank you for your help.

The data is approximately 1.3 million rows.
I am downloading from GJLI in this format.

I tried the CSV Reader as you suggested, but I think the lines separating the headers from the table are causing an error. Is there any further configuration I can add to make it work?
Thanks! :smile:

Hi @pablobato,
Have you tried “Text with Tabs”? Could you attach a sample of the text file (just the first 50 rows)?

Hi @goodvirus,
I tried text with tabs, but I had a similar issue. I'll try again tomorrow.
As for the text sample, it is attached to the initial question. Let me know if it's corrupted or if you have difficulties opening it.
I implemented a solution similar to ArjenEX's, but I feel like there is an easier way.

Thanks

I meant the file exported with “Text with Tabs”.