Read CV from SAP

pablobato · January 3, 2023, 11:40am

Hi! I downloaded some tables from SAP as a .txt file and I would like to open them in KNIME.

As you can see in the image, there are several tables. The first two contain information about the format and the download. The third contains the actual data, split across multiple pages.

I used the Tika Parser to read the file, and after a long workflow I got the third part the way I wanted, but I feel there must be a better way.
This is my workflow:

This is a minimal working example:
Prueba minimo.txt (787 Bytes)
The result:

Thanks in advance

ArjenEX · January 3, 2023, 12:19pm

Hi @pablobato

Below is an alternative way to do it. Note that this approach is meant as a starting point and may need to be extended depending on your actual data set.

A Line Reader is better suited than the Tika Parser in this case. It already processes the file row by row, which avoids having to manipulate the single cell holding all the content that the Tika Parser creates.

The main working principle here is to evaluate each row for the type of content that you need (in its raw format).
In this case, that’s a number, a pipe, and a date, in that particular order. You can check this with a RegexMatcher (available in the Column Expressions node) by using .*[0-9].*[|][0-9]{2}.[0-9]{2}.[0-9]{4}.*

See it in action here: regex101.

Note that in KNIME the regex needs to match the entire row, which is what the leading and trailing wildcards take care of. The same disclaimer applies here: if you have different data formats within the same file, the regex needs to be expanded to capture those cases as well.

The RegexMatcher will generate a true/false output which you can subsequently filter on.
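To make the row check concrete outside KNIME, here is a minimal Python sketch of the same idea, assuming the file has already been read one line per row, as the Line Reader does. The pattern is the one above; Python's re.fullmatch stands in for the whole-row match, and the sample lines are invented for illustration, not taken from the attachment.

```python
import re

# Pattern from the post: a digit, then later a pipe directly followed by a dd.mm.yyyy-like date.
# Note: the unescaped "." in the date part matches any character, not only a dot.
ROW_PATTERN = re.compile(r".*[0-9].*[|][0-9]{2}.[0-9]{2}.[0-9]{4}.*")


def is_data_row(line):
    """Return True if the whole line matches (the true/false output of the regex check)."""
    return ROW_PATTERN.fullmatch(line) is not None


def filter_data_rows(lines):
    """Keep only the lines that look like data rows (the filtering step)."""
    return [line for line in lines if is_data_row(line)]


if __name__ == "__main__":
    # Hypothetical lines imitating an SAP list export.
    sample = [
        "03.01.2023      Dynamic List Display      1",
        "----------------------------------------------",
        "| Doc. no.   |Posting dt|    Amount |",
        "|  100000123 |02.01.2023|    150,00 |",
        "|  100000124 |03.01.2023|     75,50 |",
    ]
    for row in filter_data_rows(sample):
        print(row)
```

Title lines, separator lines and the header row fail the match, so only the two data rows survive the filter.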

Filtered data:

Using a Cell Splitter with | as delimiter, you can create the desired columns.

With some generic clean-up, you’ll end up with the desired output.
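The Cell Splitter step plus the clean-up can be sketched in Python as well. This assumes each data row is framed by pipes (so the pieces before the first and after the last pipe are empty) and that "generic clean-up" means trimming the padding spaces; the helper name split_row and the sample row are hypothetical.

```python
def split_row(line, delimiter="|"):
    """Split one data row on the delimiter and trim whitespace from each field.

    Assumes rows look like "| a | b |", so one empty piece appears before the
    first delimiter and one after the last; exactly those two are dropped.
    """
    parts = line.split(delimiter)
    # Drop the empty edge pieces produced by the leading/trailing delimiter.
    if parts and parts[0].strip() == "":
        parts = parts[1:]
    if parts and parts[-1].strip() == "":
        parts = parts[:-1]
    # Generic clean-up: strip the padding spaces inside each cell.
    return [part.strip() for part in parts]


if __name__ == "__main__":
    print(split_row("|  100000123 |02.01.2023|    150,00 |"))
    # ['100000123', '02.01.2023', '150,00']
```

Empty cells in the middle of a row are kept as empty strings, so the column count stays stable across rows.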

See WF: Read CV from SAP.knwf (26.8 KB)

Hope this provides some inspiration!

goodvirus · January 4, 2023, 8:23am

Hi @pablobato,

I think there is an easier solution. It seems that you exported the tables with the transaction “se16” and did the following:

and then this:

(sorry, I have a German system; it says “export to local file” and then “unconverted”).

But what you could do is choose “Text with Tabs” and use a File Reader with tab as the separator.

ORRRR

you could export it to Excel…
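For the tab-separated option, reading the file comes down to splitting each line on tabs. A minimal Python sketch with the standard csv module, assuming the table rows in the export are tab-separated; in KNIME this corresponds to setting the reader's separator to a tab. The sample content is made up, and a real export may still contain extra lines above the table.

```python
import csv
import io


def read_tab_separated(text):
    """Parse tab-separated text into a list of rows, each a list of strings."""
    # QUOTE_NONE: treat double quotes in the data as ordinary characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Blank lines come back as empty lists; drop them.
    return [row for row in reader if row]


if __name__ == "__main__":
    sample = "Doc. no.\tPosting dt\tAmount\n100000123\t02.01.2023\t150,00\n"
    for row in read_tab_separated(sample):
        print(row)
```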

Hope it helps,

Paul

pablobato · January 4, 2023, 9:44am

Thanks, ArjenEX, now my workflow is cleaner and more efficient.
When I have a final version, I’ll upload it in case you are interested!

pablobato · January 4, 2023, 9:56am

Thanks goodvirus,
Sadly the tables are too big to fit in Excel, so I tried txt.

I tried your format, but I still have to remove the first two tables. I’ll see if I can remove them before the download. Thanks!

Pablo

goodvirus · January 5, 2023, 12:27pm

Hi pablobato,

Too big for Excel? So we are talking > 1 million rows?
What you could also do is use the CSV Reader and skip the lines that you don’t need:
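Skipping the unwanted lines can be sketched in two ways: drop a fixed number of leading lines, which only works if the format/download block always has the same length, or drop lines until the header row appears. A minimal Python sketch of both; the line counts, the header_prefix value and the sample lines are hypothetical and have to be adapted to the actual export.

```python
from itertools import dropwhile, islice


def skip_by_count(lines, count):
    """Drop a fixed number of leading lines (assumes the preamble length never changes)."""
    return list(islice(lines, count, None))


def skip_until_header(lines, header_prefix):
    """Drop lines until the first one starting with header_prefix (tolerates a varying preamble)."""
    return list(dropwhile(lambda line: not line.startswith(header_prefix), lines))


if __name__ == "__main__":
    exported = [
        "Format: unconverted",      # hypothetical metadata from the first two tables
        "Downloaded: 03.01.2023",
        "",
        "Doc. no.\tAmount",
        "100000123\t150,00",
    ]
    print(skip_by_count(exported, 3))
    print(skip_until_header(exported, "Doc. no."))
```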

Maybe you could tell us how you access the data in SAP (se16, custom report, quickview) and we can come up with an easier solution.

Best regards,

Paul

pablobato · January 23, 2023, 9:43am

Hi @goodvirus,
thank you for your help.

The data is approximately 1.3 million rows.
I am downloading from GJLI in this format.

I tried to use the CSV Reader as you suggested, but I think the lines that separate the headers from the table are causing an error. Is there any further configuration I can add to make it work?
Thanks!
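Since the data is split across multiple pages, a possible cause is that the dashed separator lines and the header row repeat on every page, so the reader meets rows with a different shape in the middle of the data. A minimal Python sketch that drops rule lines and keeps only the first copy of the header, assuming a pipe-framed (unconverted) export where the first non-rule line containing a pipe is the header; the sample lines are invented.

```python
import re

# Lines consisting only of dashes, equals signs, pipes and whitespace (table rules).
SEPARATOR = re.compile(r"[-=|\s]+")


def clean_list_export(lines):
    """Drop rule lines, non-table lines and repeated headers from a paged list export."""
    header = None
    cleaned = []
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped.strip() or SEPARATOR.fullmatch(stripped):
            continue  # blank or separator line
        if "|" not in stripped:
            continue  # page titles and other lines outside the table
        if header is None:
            header = stripped  # first table line is assumed to be the header
            cleaned.append(stripped)
        elif stripped == header:
            continue  # header repeated at the top of a new page
        else:
            cleaned.append(stripped)
    return cleaned


if __name__ == "__main__":
    sample = [
        "03.01.2023   List   1",
        "-----------",
        "| Doc | Date |",
        "|-----|------|",
        "| 1 | 02.01.2023 |",
        "-----------",
        "| Doc | Date |",
        "| 2 | 03.01.2023 |",
    ]
    print(clean_list_export(sample))
```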

goodvirus · January 23, 2023, 5:07pm

Hi @pablobato,
Have you tried “Text with Tabs”? And could you attach a sample of the text file (just the first 50 rows)?

pablobato · January 23, 2023, 5:57pm

Hi @goodvirus,
I tried Text with Tabs but ran into a similar issue; I’ll try again tomorrow.
As for the text sample, it is attached to the initial question. Let me know if it’s corrupted or if you have difficulties opening it.

I implemented a solution similar to ArjenEX’s, but I feel there is an easier way.

Thanks

goodvirus · January 23, 2023, 6:10pm

I meant the file exported with “Text with Tabs”.

system · April 23, 2023, 6:11pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.