XML: replace some values with new values (based csv) and convert It back to (Original XML)

Hello friends,
I’d like some help with a task that I understand to be simple, but I haven’t been able to complete.

(I’ve looked at some topics on the subject, but honestly, none were simple to execute.)

What I want is this:

Scenario: I have a template XML file that already has its fields filled in. I need to replace some of these fields with new values. To do this, I’ll create a Loop with a second table of new values. Each row should result in a new XML output.

To simplify, I just want someone to help me with the REPLACE process of 1 of the values in the template with the new values. After the replacement, I need the output XML to be exactly the same as the original, meaning only the fields will be changed.

If someone could provide just one example, including the output, that would be great.

What is the best strategy?
Use xPath or regexRepalacer?

Goal: After the output XML file is generated, I will submit it to a website that reads it and generates a PDF with the payment slip.

“I thought about asking Gemini to do the work, but it will be hard, but I preferred to learn in KNIME. It would also be cool to create an LLM within KNIME, but I haven’t been able to start learning yet. I still find it difficult to understand (API configurations, prompts, vectors, etc.).”

Even thought, I read something about XSLT node, and I think I will have to use it.

This is my workflow, but I’m thinking if the best approach would be xml to string, then regex Replace, but, some XML has same tag name, but ident in another group, and replace could, replace both values.

XML_REPLACE.knar (213.3 KB)

Hello @Felipereis50,
Although it’s somewhat redundant, I’ve created a workflow to replace values in the XML. This is the result from the first iteration. The right column is the new one. Does it meet your expectations?

XML_REPLACE_tark.knwf (199.8 KB)


3 Likes

Kudos tark!

I just looked into this and - to be honest like anything else that is data to either JSON or XML - was a little put off as I feel it is pretty difficult to turn data into JSON / XML structures with KNIME.

Without having implemented any of this my current thought process to achieve this using a “path of least resistance” is as follows:

Provided that:

  • the overall structure of target XML does not change
  • you can uniquely identify any “leaf” where you want to insert the target values

My approach is:

  • spent some time and add unique placeholder values into the otherwise “blank” target XML structure
  • e.g. value from your column “NF” => placeholder in XML “##NF##”, value in your column “ChaveNFe” => placeholder in XML “##ChaveNFe##”
  • Once the full structure with placeholders is prepared, load it
  • Create a separate table that maps each of your column names to the corresponding placeholder above (one column contains each column header in a differet row, a second column has the mapped placeholder
  • Send your main data coming from ETL node into a loop, inside this loop transpose the incoming row, then join in the data from above mapping table.
  • turn your XML into string and then use String Replacer (Dictonary) to replace the placeholder values in the XML with the actual values from the incoming row into the loop.
3 Likes

Hi Tark, I analyzed your structure, and it is very… very… ‘creative’. :ok_hand:
I see that you transposed and used the Join node.

But now, with your structure, I will update and create a better version.

I will come back soon.

2 Likes

I anticipate that at some point I’ll have a problem with regex (later)

Below is the complete XML structure (downloaded from the website that explains the rules).

Explanation
Notice that some tags are repeated, for example, the group:

<identificacao>
 <CPF>...</CPF>
 <CNPJ>...</CNPJ>
 <IE>...</IE>
</identificacao>

This group appears twice; however, the “parent” tag is different:

(1)

<contribuinteEmitente>
 <identificacao>
 <CPF>...</CPF>
 <CNPJ>...</CNPJ>
 <IE>...</IE>
 </identificacao>

(2)

<contribuinteDestinatario>
 <identificacao>
 <CPF>...</CPF>
 <CNPJ>...</CNPJ>
 <IE>...</IE>

There are several rules in the XMLs (which I won’t explain here), but depending on the condition, some of them will be omitted. For example, if there is no value for the <CNPJ> tag of the <contribuinteEmitente> group, it should not be included, nor should <IE>, resulting in this:

<contribuinteEmitente>
 <identificacao>
 <CPF>...</CPF>
 </identificacao>

I’ll figure out how to implement these rules, but there’s still one point I’m unsure about. In your help, you used a regex replace, assuming we are iterating the <CNPJ> field of the <contribuinteEmitente> group. When we use regexReplace to modify the XML, it will also change the same tag in the <contribuinteDestinatario> group.

Since we are at the “row” level without any identification, it’s more complicated to identify.

These types of rules in XMLs are very common, where tags are repeated within groups. I asked Gemini to teach me how to build an XML based on conditions, and it gave me rules in Python (which I know nothing about).

Besides your help, I thought: How do large companies build XMLs in an organized way, considering these rules? My idea would be, instead of doing regex replace, to build the XML in blocks of groups based on IF rules. But for that, “I have no idea yet.”

Anyway, one step at a time. Tonight I’ll start modifying my code considering your rules.

(By the way, I’ve never used the Recursive Loop. I use the group loop a lot, and I’ve watched several videos and examples, and it never clicked in my head “when” I should use it. In your example, it became easier for me to understand. You used it to iterate over the same “XML” column until the iteration for the number of rows ended. Very interesting.)

Completed structure

Versão 2.00
<TDadosGNRE versao="2.00">
  <ufFavorecida>...</ufFavorecida>
  <tipoGnre>...</tipoGnre>
  <contribuinteEmitente>
   <identificacao>
    <CPF>...</CPF>
    <CNPJ>...</CNPJ>
    <IE>...</IE>
   </identificacao>
   <razaoSocial>...</razaoSocial>
   <endereco>...</endereco>
   <municipio>...</municipio>
   <uf>...</uf>
   <cep>...</cep>
   <telefone>...</telefone>
  </contribuinteEmitente>
  <itensGNRE>
   <item>
    <receita>...</receita>
    <detalhamentoReceita>...</detalhamentoReceita>
    <documentoOrigem tipo="..." >...</documentoOrigem>
    <produto>...</produto>
    <referencia>
     <periodo>...</periodo>
     <mes>...</mes>
     <ano>...</ano>
     <parcela>...</parcela>
    </referencia>
    <dataVencimento>...</dataVencimento>
    <valor tipo="..." >...</valor>
    <convenio>...</convenio>
    <contribuinteDestinatario>
     <identificacao>
      <CPF>...</CPF>
      <CNPJ>...</CNPJ>
      <IE>...</IE>
     </identificacao>
     <razaoSocial>...</razaoSocial>
     <municipio>...</municipio>
    </contribuinteDestinatario>
    <camposExtras>
     <campoExtra>
      <codigo>
      <valor>
     </campoExtra>
    </camposExtras>
   </item>
  </itensGNRE>
   <valorGNRE>...</valorGNRE>
   <dataPagamento>...</dataPagamento>
</TDadosGNRE>
1 Like

Hi @MartinDDDD,
Thank you for sharing your approach. As you mentioned, mapping of xml tags and the table column headers appaers to be required. I think your placeholder approach is more robust one.

2 Likes

Hi @Felipereis50,
Thank you for sharing your situation. It seems more challenging. I think the columns in the second table need additional information to distinguish the XML tags. I had to resort to string manipulation and tried it with KNIME, so it ended up becoming quite complicated. :sweat_smile:

XML_REPLACE_tark 2.knwf (261.3 KB)

This is a sample xml I prepared. There are two CPF values with different parent elements.

<contribuinteEmitente> ... <CPF>012345678</CPF>
<contribuinteDestinatario> ... <CPF>01280332239</CPF>

This is a sample table for replacing XML values with new ones. Here are the rules:

  • The column names should match the XML tags.
  • If there are identical XML tags, add one parent name to the column name to distinguish the tags.

In this example, do as follows:

<contribuinteEmitente>CPF
<contribuinteDestinatario>CPF

With this, the preparation is complete. When you run the workflow, the values should be replaced. However, I’m not confident it will work for other cases and it’s a complicated workflow. I hope a more efficient solution can be found.

Output

2 Likes

Different approach: I often use Handlebars templating for building HTML, XML, etc. from a workflow.

You can use this node, input the data a JSON, use iteration, loops, conditionals, etc. (see here for some more details).

5 Likes

Hello Tark,

First of all, I apologize for the delay. This week was very busy and I didn’t have time to analyze it.

Here we go:

OMG, what a lot of work this is. Jesus.
You put in a lot of effort. I found it very interesting. A little complex either.

I’ll do the following: I’ll analyze every detail and see if I can use anything.

I will return as soon as I find the final solution. I’m trying a different strategy. I’m going to try to build the XML step by step.

1 Like

Tark
I did it differently.
I built the XML step by step.
I followed the manual and created a single string, considering the rules.
My flow became much smaller and simpler.

Anyway, I appreciate your help. :fist_right: :sparkles: :fist_left:

I think for XML, the ideal approach is to build it step by step, and if there’s any group that shouldn’t be included, to do it by rule.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.