How to transform XML data to a table

Thank you very much, indeed it is very easy for one file. This is a first step. Then I will have to upload a lot of xml files and extract all the same objects I need and include them into one table (File 1/File 2/File 3…). I will try and I’ll come back if I need. :grinning:

It would be as straightforward as doing it on a single file, if the structure of the files are all the same.

:blush:

Okay, I still need help please. I have several issues.

  1. How to write the right query
    I want to get all the element “oai_pmh” written in my XML file to compile all the attributes “code” in my table. It works if I select one element then I click on “Add Xpath”. But there are a lot of these elements “oai_pmh” in my XML file. So which query should I write to get all of them without clicking one by one?
    code_table

2)How to get the name of a node?

I want to take the name of an element in my XML file, in my example the name of the 3rd node “oai_pmh”. If I want to get the name of my root, it is ok with name(/*), but I can’t do the same for the 3rd node “oai-pmh”. I tried with “name(/source_oai_pmh/source/)” but it doesn’t wotk. Any idea?

Thank you for your help,

Melanie

  1. Select the path > remove the identifier number of the element in the path > select “Multiple Rows” in the configuration window of the path.
    e.g. /source_oai_pmh/source/oai_pmh[1]/@code to /source_oai_pmh/source/oai_pmh/@code

  2. Since the XPath node supports XPath 1, you can only take a single name:
    The first element: name(/source_oai_pmh/source/*). Which here would be “oai_pmh”.
    The second element: name(/source_oai_pmh/source/*[2]) and so on.

:blush:

3 Likes

Well, thank you @armingrudd :slight_smile: . I come back about my question on uploading multiple xml files and extract my data in a single table. I read on the forum that you suggest this:


In list files, I browse the folder with all my XML files but then in Xml reader what should I browse?

Thank you

The Location or URL variable. Click on the little button next to the browse button and select one of the variable (Location or URL).

:blush:

Thank you. It works for XML files with exactly the same name of root and item, but do you know if there is a possibility to do it with XML files which have the same structure but with diffrent name of root and items? In my below example, I have a file with a root “sources_sitemap” etc… and on the other file I have “sources_oai_pmh”. Is there a possibility or not at all? Thank you very much (I very beginner in Xpath/XML language so I try to understand step by step)


It is possible. You have to use the root name in a String Manipulation node to create the XPath in each iteration.

If you cannot handle it, provide me with 2 sample files and I’ll do it for you.

:blush:

I am unfortunately not enough advanced levelto success in this issue. I am very grateful if you could show me with these two XML files. Thank you :slight_smile: abes-calames-rdfa.xml (3.9 KB) hal-oai.xml (32.8 KB)

Here you are:

xpath_loop.knwf (59.6 KB)

:blush:

Well thank you it works well.
Then I would like another item “baseUrl”, so I added it in Column Expression with this expression “join(”/dns:sources_", column(“item”),"/dns:source/dns:", column(“item”),"/dns:parameters/dns:crawl/@baseUrl")" and in Xpath Query but it is weird that it doesn’t work for this specific item. I mean it works for one type of “source” (oai-pmh) I can can find in my table but not for the other (?sitemap). Is there an error in my query? Do I need to change something in the RegexReplace in String Manipulation? Thank you

The XPath you have created in the Column expressions node is converted to a flow variable which should be assigned to the corresponding xpath variable option the XPath node. Go to the Flow Variables tab of the XPath node and assign it to the xpath option. the first one is xpath0, then xpath1 and so on.

:blush:

Excellent, thank you, you’re great! If I want to convert my end table into a CSV files, should I use the XLS writer after the End loop or this one has to be included somewhere into my workflow? And should I use a new string manipulation with a new loop?


:slight_smile:

1 Like

If you want to export the final table (after reading all XML files) to a CSV file, you have to use a CSV Writer node after the Loop End node. But if you want to export the extracted data from each XML file to a separated CSV file, then use it after the String Manipulation inside the loop (the output port of the String Manipulation connects to the Loop End and CSV Writer nodes). Remember you have to create different file names using the Create File Name node if you want to use the writer node inside the loop.

:blush:

2 Likes

Thank you, it worked well yesterday. However, today, I tried again and I had a problem with a warning “Input file ‘C:\Program Files\KNIME\missing’ does not exist”. In XML reader, I have chosen “location” as you told me, but when I choose this there the word “missing” which appears and then it doesn’t work. I don’t understant what happened.


I chose a folder for my all XML files.

Thank you for your help

Just execute it if your XML files are in “C:\Users\Bunel\knime-workspace\Test2”.
If the path to the folder which contains the XML files is correct, then just ignore the error and execute the workflow and should work. In this case, the error appears since the List Files and the Table Row To Variable Loop Start nodes are not executed and there is no variable for “Input location”.

But if the error appears even after executing the previous nodes, then you have entered a wrong path to the folder. Do you have your XML files in “C:\Users\Bunel\knime-workspace\Test2”?

1 Like

It works well now. Thank you

1 Like

I still have a problem with an XML file. Apparently, it is well structured (I tried with Oxygen) but it is not recognize in KNIME. Even if I use a very eazy workflow like XML reader–>Xpath, when I try to create a Xpath query, it tells me that XML elements are not attributes or tags. I don’t know if it a file problem or KNIME problem. Could you please try to help me? PLease find my file here.univ-rouen-irihs.xml (3.6 KB)

Thank you

The XML file looks normal at the first glance but if you investigate further, there are line separator characters (ls) in some attributes and that’s the problem. So you have to use the String Manipulation node with the expression below (which will look like there is nothing inside double quotes when you paste it), then use the String to XML node and finally the XPath node.

EDIT: It seems the expression with the line separator character cannot be displayed here. So I share the workflow with the String Manipulation node included. The expression looks like this:
regexReplace($XML$, "", "")
But actually there is the ls character inside the first pair of double quotes.

You can use this sequence in a loop in your workflow. It has no side effects or impact on healthy files.

magic_regex.knwf (30.0 KB)

:blush:

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.