Combine multiple XML files: how to do this in KNIME

Hi everyone,
I want to merge XML files as in the picture. How do I do this? I need a method to extract the content from all of the files.

image

I want to get the result that the following workflow produces, but I can't add an "XML Reader" node for each of 2500 XML files, and it wouldn't make sense to do so. I need to read and merge the files in bulk by selecting the folder that contains them. I don't know how to do this in KNIME.

image

Hi,

You can do this to read and concatenate the XML files:

xml

xml.knwf (21.7 KB)

Best,
Armin
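
For readers who want to see the read-and-concatenate idea outside KNIME, here is a minimal Python sketch. It is not the attached workflow: the function name `read_folder`, the `queries` mapping, and the example path are assumptions for illustration, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_folder(folder, queries):
    """Return one row (dict) per XML file found in `folder`.

    `queries` maps a column name to an ElementTree path expression
    (hypothetical names; adapt them to your own files).
    """
    rows = []
    for xml_file in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        row = {"file": xml_file.name}
        for column, path in queries.items():
            # findtext returns None when the path matches nothing
            row[column] = root.findtext(path)
        # Appending inside the loop keeps a row for every file,
        # not just the first or the last one
        rows.append(row)
    return rows
```

A call such as `read_folder("my_xml_folder", {"title": "./title"})` would then give one row per file, which corresponds roughly to what the loop end collects in a workflow.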


In addition to what @armingrudd wrote, you can read more about handling XML and JSON files in this thread

And this blog


test.zip (677.0 KB)
Hi Armin,
I have attached a short sample file. I don't understand what I'm doing wrong. It seems to have read only the first file.
xml_combine.knwf (16.3 KB)


@mlauber71 @armingrudd The files are read normally, but only a single result comes out. What is missing here, or what am I doing wrong?

Dear @mlauber71 and @armingrudd,
I'd appreciate it if you could describe the last step needed to reach the solution.
The files are read individually, but the end result contains only the content of the last file.

thanks

You have not entered the XPath queries in the XPath node.


I think you have some serious work to do on your XPath definitions. They do not seem to work for some of the XML files, and I think most of the ones in your example are not going to work.

I defined some of them but not all since I do not know what information you are interested in.

kn_example_xml_loop.knar (681.0 KB)


I have entered them and tried it, but the only result is the row that belongs to the last file. If you could make an example with any content, I can edit it myself. I can't figure out why the files it reads end up as a single result.

As @mlauber71 said, there is a problem with your files. I think they don't all have the same structure, so the queries work for some of them and not for others.

Make sure all your files have the same structure and then add the correct XPath queries for the elements you want.

Best,
Armin


You have to define every single one of the elements you want to extract. I tried to define an extraction for the URI_LIST. It will result in a collection field you then might have to ungroup and do further manipulations.
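
To illustrate what a collection field and a later ungroup step amount to, here is a small Python sketch. Only the name URI_LIST comes from the files discussed here; the child element `URI`, the `id` element, the sample content, and the function name `ungroup` are all made up for the example.

```python
import xml.etree.ElementTree as ET


def ungroup(xml_text, list_path, key_path):
    """Return one (key, item) pair per element matched by `list_path`."""
    root = ET.fromstring(xml_text)
    key = root.findtext(key_path)
    # findall returns every match; this list plays the role of the collection
    return [(key, item.text) for item in root.findall(list_path)]


# Hypothetical sample document, not taken from the real files
sample = """<doc>
  <id>42</id>
  <URI_LIST><URI>http://a.example</URI><URI>http://b.example</URI></URI_LIST>
</doc>"""

rows = ungroup(sample, "./URI_LIST/URI", "./id")
```

Each entry of the list becomes its own row, with the document's key repeated next to it.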

KNIME's XPath node does help you define the XPaths by letting you click on the items you want and suggesting a definition. That works most of the time. Sometimes you have to refine it, and since I am not an expert in XML and do not know the original structure and intent, it is mostly trial and error.

And again: it seems the XML files have varying structures. It might still work if you come up with definitions for both/all variants (?) and later remove the missing columns.

I think KNIME is the tool to help you with your task but you will have to work on the definitions.
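
The idea of defining paths for several structural variants and removing the missing columns afterwards can be sketched in Python as follows. This is only an illustration under assumptions: the function names `extract` and `drop_empty_columns` and the example paths are hypothetical, and ElementTree handles only a subset of XPath.

```python
import xml.etree.ElementTree as ET


def extract(root, variants):
    """Build one row, trying each candidate path for a column in order."""
    row = {}
    for column, paths in variants.items():
        row[column] = None  # stays None if no variant matches
        for path in paths:
            value = root.findtext(path)
            if value is not None:
                row[column] = value
                break
    return row


def drop_empty_columns(rows):
    """Remove columns that are missing (None) in every row."""
    columns = rows[0].keys() if rows else []
    keep = [c for c in columns if any(r[c] is not None for r in rows)]
    return [{c: r[c] for c in keep} for r in rows]
```

For example, `variants = {"name": ["./name", "./header/name"]}` would pick up the name from either of two layouts, and a column whose paths never match in any file is dropped at the end.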

image


The name is not Martin, by the way :) I wrote something about defining XPaths here:

It is a little bit tricky, but once you have done it the magic starts to happen. Click on the blue parts until you get green feedback :)

Thanks for sharing the link.

Oh, sorry. I searched for your username, found a similar username on Twitter with the first name Martin, and thought it was you. Excuse me for addressing you by the wrong name.


@mlauber71 and @armingrudd
Thank you very much for all your answers and solutions. You guys are great.
I learned a lot from you.
I'll work on it some more as the next step.


@mlauber71 and @armingrudd
Hello, how can I convert the table into an Excel file (xlsx)?
Is there a specific node I can use for this?

Hi,
have a look at the Excel Writer:

Bye
Tobias


Hi, I have studied this discussion and it helped a lot, but I can't build my own workflow and I don't understand it well enough.
I downloaded the sample workflow and it works.
But KNIME told me that the Table Row to Variable Loop Start node is deprecated and will be discontinued.
Of course, there is a new version of the node.
When I use the new node, I can't load the data.
I found the problem: the Path variable is not passed into the loop, and I don't quite understand why.
In the old version the variable is Location and I can see it among the variables, but I don't see it in the new version.
Can you please advise me where I am making a mistake?
I am attaching the older version, which works, and the new one, which does not.
The data is approximately 1,000,000 XML files that are publicly available as open data in the Czech Republic.
ARES import xml ok working reseted.knwf (20.4 KB)
ares_problem_with_variable reseted.knwf (27.5 KB)


Hi,
You don't want to use the deprecated nodes. Use the updated ones instead and replace them in your flow.
KNIME has introduced a new data type called "Path" (P). You should point to the Path variable, not Location, which is a string in your screenshot.
Hope that helps.
br


Hi,
I am happy to use “Path” (P).
Unfortunately, I cannot see it in the combo box.


That is the problem.