Parsing EDIFACT files

Hi everyone,

I have a few thousand EDIFACT files that I need to parse.
For those who don’t know what they look like: they consist of segments identified by a 3-character tag (such as DTM for date and time, or NAD for name and address). Within a segment, data elements are separated by a plus sign and their components by a colon (the meaning of each position is defined by the segment), and each segment ends with a single quote mark that separates it from the next one.

The raw data looks like this:

UNB+UNOC:3+CUSTOMER+SUPPLIER+220206:0255+000000416'UNH+1+DELFOR:D:04A:UN:GAVB10'BGM+241::6:ANY+000000416+9'DTM+137:20220206:102'DTM+2'NAD+BY+BUYER::92'NAD+SE+SELLER CODE::92++SELLER NAME+SELLER STREET+SELLER CITY++POSTCODE+COUNTRY'...

After segmenting it for better readability:

[image: segmented view of the raw data]
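
For readers who want to reproduce the segmented view, here is a minimal Python sketch of the structure described above. It uses only the standard library and the first few segments of the raw data; the function name `split_segments` is made up for illustration. It assumes the default delimiters (`'`, `+`, `:`) and does not handle the EDIFACT release character `?`, so it is a starting point rather than a full parser.

```python
# First few segments of the raw data shown above.
RAW = (
    "UNB+UNOC:3+CUSTOMER+SUPPLIER+220206:0255+000000416'"
    "UNH+1+DELFOR:D:04A:UN:GAVB10'"
    "BGM+241::6:ANY+000000416+9'"
    "DTM+137:20220206:102'"
    "NAD+BY+BUYER::92'"
)


def split_segments(raw):
    """Return a list of segments shaped like [tag, element, element, ...].

    Each element is a list of components (split on ':').
    Assumption: default delimiters, no '?' release-character escapes.
    """
    segments = []
    for chunk in raw.split("'"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final quote
        tag, *elements = chunk.split("+")
        segments.append([tag] + [element.split(":") for element in elements])
    return segments


segments = split_segments(RAW)
for segment in segments:
    print(segment)
```

Each printed line is one segment: the tag first, followed by its data elements as lists of components.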

I would like to be able to extract a few segment names to column headers and extract part of the segment content as table rows. Like this:

| BGM | DTM | NAD_BY | NAD_SE |
| --- | --- | --- | --- |
| 000000416 | 20220206 | BUYER CODE | SELLER CODE |

I used the file reader to separate the segments into columns by using the quote as column delimiter, but I’m stuck now.

How do I get the segment names as headers and retrieve dedicated parts of the segments as rows?

Any help is highly appreciated. Thanks.
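
One possible way to get from the segments to the single-row table asked about above, sketched in plain Python. The helper names `parse` and `header_row` are hypothetical, and the choice of positions (document number from the second BGM element, date from DTM with qualifier 137, party code from the second NAD element) is an assumption read off the sample data, not a general rule. No release-character handling is included.

```python
# Header segments taken from the raw data in the question.
RAW = (
    "BGM+241::6:ANY+000000416+9'"
    "DTM+137:20220206:102'"
    "NAD+BY+BUYER::92'"
    "NAD+SE+SELLER CODE::92++SELLER NAME+SELLER STREET+SELLER CITY++POSTCODE+COUNTRY'"
)


def parse(raw):
    """Split into segments; each segment is a list of elements,
    each element a list of components. Element 0 holds the tag."""
    return [
        [element.split(":") for element in segment.split("+")]
        for segment in raw.split("'")
        if segment.strip()
    ]


def header_row(segments):
    """Pick selected values into a dict keyed by the desired column name."""
    row = {}
    for seg in segments:
        tag = seg[0][0]
        if tag == "BGM" and len(seg) > 2:
            row["BGM"] = seg[2][0]  # document number
        elif tag == "DTM" and len(seg) > 1 and seg[1][0] == "137":
            row["DTM"] = seg[1][1]  # document date
        elif tag == "NAD" and len(seg) > 2 and seg[1][0] in ("BY", "SE"):
            row["NAD_" + seg[1][0]] = seg[2][0]  # party code
    return row


row = header_row(parse(RAW))
columns = ["BGM", "DTM", "NAD_BY", "NAD_SE"]
print("\t".join(columns))
print("\t".join(row.get(column, "") for column in columns))
```

Running this over many files and collecting one `row` per file would give the table, one line per file.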

@gentile you might try something along these lines:

Maybe you could provide us with a few files that would represent the full spectrum of your challenge and also a file showing the expected results.

Hi @gentile,

I don’t think KNIME has an EDIFACT parser, and it would be quite a task to write a real parser with just KNIME nodes, because EDIFACT has some nesting in it (I worked with it in the German utilities sector).
I would advise parsing it with a Python package and getting the output from a Python Script node.
For example, you could use https://github.com/nerdocs/pydifact (Python) or https://github.com/php-edifact/edifact (note that this one is a PHP library).

Best Regards,

Paul

@goodvirus that sounds like a very good idea

Thank you. I will take a look at those entries and see if they show me the light :wink:
Are you still interested in seeing a few “real” files?

You are right: the nesting in EDIFACT can be a challenge. Maybe transforming EDIFACT to JSON as a first step could help with that.
Thanks for the Python hint: I had found those packages too, but I have hardly any experience with Python and don’t even know how to set them up or make them do anything :see_no_evil:
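
As a rough illustration of the EDIFACT-to-JSON idea mentioned above, here is a standard-library sketch that groups segments into messages (UNH to UNT) and dumps them as JSON. The function name `to_messages` and the JSON layout are assumptions made for this example, and the closing `UNT` segment in the sample is hypothetical, since the raw data in the thread is truncated before it. Segments outside a message (such as UNB) are simply dropped here, and the release character `?` is not handled.

```python
import json

# UNH/BGM/DTM come from the thread's sample; the UNT segment is a
# hypothetical closing segment added so the message is complete.
RAW = (
    "UNB+UNOC:3+CUSTOMER+SUPPLIER+220206:0255+000000416'"
    "UNH+1+DELFOR:D:04A:UN:GAVB10'"
    "BGM+241::6:ANY+000000416+9'"
    "DTM+137:20220206:102'"
    "UNT+4+1'"
)


def to_messages(raw):
    """Group segments into messages delimited by UNH ... UNT."""
    messages = []
    current = None
    for chunk in raw.split("'"):
        chunk = chunk.strip()
        if not chunk:
            continue
        tag, *elements = chunk.split("+")
        segment = {"tag": tag, "elements": [e.split(":") for e in elements]}
        if tag == "UNH":
            # Start a new message; the first element is its reference number.
            current = {"reference": elements[0] if elements else None,
                       "segments": []}
            messages.append(current)
        if current is not None:
            current["segments"].append(segment)
        if tag == "UNT":
            current = None  # message closed
    return messages


messages = to_messages(RAW)
print(json.dumps(messages, indent=2))
```

The resulting JSON could then be read into KNIME with the usual JSON nodes, which keeps the nesting visible instead of flattening it too early.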

@gentile I could try and use the Python package on it and see if I can find a working example.

About KNIME and Python: if you add Python to your KNIME setup, it will greatly enhance your capabilities:

https://docs.knime.com/latest/python_installation_guide/index.html#_introduction

It might be a bit of a challenge at first, but once you have it set up, it opens up the world of Python for you :slight_smile:

Sounds like the promised land :wink:
Guess I need to give Python a chance and see what happens.

Please find attached two calloff files: one of them containing one message (described by the UNH to UNT loop) and the other one containing multiple messages. Each message contains demand dates and quantities for one material number. Dates are specific days or a period.
Basically every message tells the vendor when they are required to ship a specific material to their customer.

220207_025514.7fdd40cb-ffa4-4f87-92e7-c19b0093b9be_mod.txt (2.9 KB)
220207_025514.911ede23-62b9-4da1-8f59-f66382c3674c_mod.txt (9.1 KB)

The task is to list the dates and periods for a material to better understand changes that can occur from one calloff to another.

For better understanding I tried to describe the looping within the file here:

LIN contains the material number.
Actual demand is in QTY+113 with DTM+2 under SCC+24.
Forecast demand (described by a period) is in QTY+113 with DTM+64 (start) and DTM+63 (end), under SCC+4.

The output would be something like this:

| NAD_SE | LIN | SCC_24_QTY | SCC_24_DTM | SCC_4_QTY | SCC_4_DTM |
| --- | --- | --- | --- | --- | --- |
| SELLER CODE | 7915288-02 | 2304 | 20220207 | 1584 | CW15 |
| SELLER CODE | 7915288-02 | 720 | 20220209 | 1728 | CW16 |
| SELLER CODE | 7915288-02 | 432 | 20220211 | 2304 | CW17 |
| SELLER CODE | 7915288-02 | 864 | 20220214 | 2016 | CW18 |
| SELLER CODE | 7915288-02 | 720 | 20220216 | | |
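
The looping described above (LIN, then SCC, then QTY followed by its DTM segments) can be followed with a small state machine. The sketch below is only an illustration under several assumptions: the sample string is hypothetical (the attached files are not reproduced here, and the forecast dates are invented), the material number is assumed to sit in the third LIN element, and the function name `demand_rows` is made up. It produces one row per quantity in long format rather than the side-by-side layout shown above, and leaves the conversion of forecast periods to calendar weeks out.

```python
# Hypothetical sample following the structure described in the post;
# quantities and the first dates echo the expected output, the forecast
# period dates are invented for illustration.
SAMPLE = (
    "NAD+SE+SELLER CODE::92'"
    "LIN+++7915288-02:IN'"
    "SCC+24'"
    "QTY+113:2304:PCE'"
    "DTM+2:20220207:102'"
    "QTY+113:720:PCE'"
    "DTM+2:20220209:102'"
    "SCC+4'"
    "QTY+113:1584:PCE'"
    "DTM+64:20220411:102'"
    "DTM+63:20220417:102'"
)


def demand_rows(raw):
    """Walk the segments in order and emit one dict per QTY+113."""
    rows = []
    seller = material = scc = None
    current = None  # the row that following DTM segments belong to
    for chunk in raw.split("'"):
        chunk = chunk.strip()
        if not chunk:
            continue
        tag, *elements = chunk.split("+")
        parts = [e.split(":") for e in elements]
        if tag == "NAD" and len(parts) > 1 and parts[0][0] == "SE":
            seller = parts[1][0]
        elif tag == "LIN" and len(parts) > 2:
            material = parts[2][0]  # assumed position of the material number
            scc = current = None
        elif tag == "SCC" and parts:
            scc = parts[0][0]  # "24" = actual demand, "4" = forecast
            current = None
        elif (tag == "QTY" and parts and len(parts[0]) > 1
              and parts[0][0] == "113" and scc in ("24", "4")):
            current = {"NAD_SE": seller, "LIN": material, "SCC": scc,
                       "QTY": parts[0][1], "DTM": None, "DTM_END": None}
            rows.append(current)
        elif tag == "DTM" and current is not None and parts and len(parts[0]) > 1:
            qualifier, value = parts[0][0], parts[0][1]
            if qualifier in ("2", "64"):  # delivery date or period start
                current["DTM"] = value
            elif qualifier == "63":  # period end
                current["DTM_END"] = value
    return rows


rows = demand_rows(SAMPLE)
for row in rows:
    print(row)
```

From rows like these, a pivot on the `SCC` column (in KNIME or pandas) would be one way to arrive at separate actual and forecast columns.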

Do you think this is feasible somehow?

@gentile I tried a first import and split up the messages with the Python library and imported them into KNIME (and Excel to show what they look like). Further work will have to be done to split the lines into the tables you want.

I am not sure whether the Python package also offers this option, but it might be worth a try (I am not familiar enough with the format to feed some sort of pattern into the code). Otherwise you will have to identify the blocks in the data and transform them to your needs.

In the folder /script/ there is a Jupyter notebook to try the code ‘pure’: kn_forum_39612_python_edifact_parse.ipynb

That looks amazing! Thank you so much @mlauber71 for your time and effort!
I should be able to pick it up now and take it further :smiling_face:
