XPATH configuration - xpath query script

Hello,
I need to extract the row string references (to extract the exls cell color data) from this XLM file (XLS file with XLM format). I’m a beginner on the topic but, with teachings, I can succeed. What is the “xpat value query command” to extract all “” present in the XLM file?

XLM file and root:

@umberto75

I’m not quite sure where the cell color is in your data… in general if you want to capture a value that is nested somewhere - let’s say you want to grab what is between and :

you can do it like this

//c/v
1 Like

thankx @MartinDDDD your feedback, now i try it!
Do u know why i think there is a value color in string? Becasue i can do the chack with file xlsx master and i know, for exeple, the cells C5 and C9 are RED. Infact, the “s” value “=6”, in XML-cell same, are different rispect athers. Are u sure is not color referce?

C5
image

C9
image

I use this suggest string " //c/v" but i haven’t all output row, only one!

the settings:

the output:

@umberto75

Hi Rico,

I’m afraid I am not familiar with Excel XML structure to confirm (or deny) that the “c” tag does include color information…

I asked ChatGPT to create example data based on your image and with that I managed to do the following:

  • extract all the values between v tags
  • extract all the r attributes from the c tags:

Get the v tag values: Select a v-tag so that it shows in green the Xpath. Then click the Add Xpath button so the window appears. Delete what I have highlighted in red box - so that only the part with /v/c (in my example with the weird dns in front of it…) stays. Finally make sure that there is a double-forward slash at the beginning. Also select to output into multiple rows as string cell:

Final query should look like this:

Get the r attribute of c tag:

Perform similar steps as above, but select the r attribute inside the c tag this time so it shows green. Do the same steps as per above incl the //.

Final query should look like this:

Output:

You can try it with this example workflow. Note: By “selecting” the parts you want to extract you make sure you take care of this weird namespace thing (sorry have to admit I dont fully get why it’s dns:c/… for me…

Workflow with Example data:

ExcelXML.knwf (92.7 KB)

2 Likes

Hello @MartinDDDD your feedback is excellent !!! :clap:

But it not is done all !

The cells “r” are extracts all and good.
I use the similar DNS configuration with “s” ( the reference color ) with script:

//dns:c/@s

but i think there is a error with logical root tree dns because only the first output row are correct:

Can you help me with the correct configuration of XPath query?

Total configuration done in XPath:

@umberto75

Not sure how to help you - the table does not show any missing values for the second Xpath query so it seems to return a value that is set for that attribute.

To help further I need the data.

How do you turn this into XML?

yes i convert xls in xlm

I dont know how to do that - can you tell me how to do that or share the workflow that you use for the conversion?

Sorry ! OK

1-Create a new folder
2- you can use the node “UNZIP FILES (Legacy)”
3- configure unzip files ( legacy ) : set the file .xlsx and choose the output-folder

file data.xlsx (19.7 KB)

image

4- after esecute the node and look inside the folder all folders
5- look the root folder //xl/worksheet/sheet1.xml
6- the propriety set file .xlsx are in sheet1.xml

image

ty - managed to do that.

Have to admit I am a bit confused - the XPATH seems to select OK and as expected, but the values in the XML feel weird:

Red - 35 matches between cell value and what is in the XML
Yellow - XML says 1, Excel cell is empty
Orange - XML says value should be 544 whereas text in the cell says “Horror”

Also odd for the values behind s attributes:

Not really sure what is going on there to be honest…

1 Like

Right @MartinDDDD !
I use ugual DNS linguage, for the frist cell it’ s play!
U can show, for check ugual to my file extract, in the cell C22 and C23 show empty!
Why? :man_facepalming:

@umberto75

@armingrudd i don’t know if you follow the topic. Perhaps, you can help ours!

Thank your support.

Hi @Rico_Rico,

You should first extract the “c” element as a node cell and then extract the insider attributes or elements. The reason is that there are elements without “r” attribute or without “v” child element. When you parse them all at once, it lists all existing values so missing ones cannot be determined.

So, first a XPath with this path:
/dns:worksheet/dns:sheetData/dns:row/dns:c
and “Return type” = “Node cell” in multiple rows.
Then another XPath to parse inside the “c”.

1st:

2nd:

result:

I hope this solves the issue.

2 Likes

@armingrudd you are a genius! :100: :boom: :clap:

I was unaware the difference XPath data Type: “NODE” and “STRINC CELL” series

I undestand the root isn’t unique but haven’t a sistem solutin.

THANK YOU SO MUCH :1st_place_medal:

@MartinDDDD
@umberto75

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.