XML Parsing based on self ending tag element name & converting it to csv

suhridghosh.01 · July 31, 2020, 4:52am

Hello,

     I have started working on XML file extraction using XML reader & XPATH node. 
     But there is one such case which i have encountered in my XML file & i am 
     not getting how to parse & get it in desired csv format. 
     My XML holds information like PhoneNo & IMSI which are optional parameters. 
     It is like as shown below:

XML Input: 
<CDRFile>
    <OCCRecord>
	  <resultCode>2001</resultCode>
	  <nodeName>thocc1a</nodeName>
        <servedSubscriptionID>
            <SubscriptionID>
                <subscriptionIDType><PhoneNo/></subscriptionIDType>
                <subscriptionIDValue>8989767856</subscriptionIDValue>
            </SubscriptionID>
            <SubscriptionID>
                <subscriptionIDType><IMSI/></subscriptionIDType>
                <subscriptionIDValue>408998000678628</subscriptionIDValue>
            </SubscriptionID>
        </servedSubscriptionID>
    </OCCRecord>
    <OCCRecord>
        <resultCode>2001</resultCode>
        <nodeName>thocc2a</nodeName>        <servedSubscriptionID>
            <SubscriptionID>
                <subscriptionIDType><PhoneNo/></subscriptionIDType>
                <subscriptionIDValue>89797834567</subscriptionIDValue>
            </SubscriptionID>
        </servedSubscriptionID>
    </OCCRecord>
    <OCCRecord>
        <resultCode>2001</resultCode>
        <nodeName>thocc3a</nodeName>        <servedSubscriptionID>
            <SubscriptionID>
                <subscriptionIDType><IMSI/>
</subscriptionIDType>
                <subscriptionIDValue>897554000678628</subscriptionIDValue>
            </SubscriptionID>
        </servedSubscriptionID>
    </OCCRecord>

</CDRFile>

I would like to get it in the below desired format in csv

resultCode, nodeName, phoneNo, IMSI
2001,thocc1a,8989767856,408998000678628
2001,thocc2a,89797834567,-1(since it is optional needs to be replaced with -1 if not present)
2001,thocc3a,-1(since it is optional needs to be replaced with -1 if not present),897554000678628

Please guide me how i can achieve this using xpath. Appreciate everyones response.
XML-Test-Sample.xml (1.3 KB)

Thanks,
Suhrid Ghosh

armingrudd · July 31, 2020, 6:36pm

Hi @suhridghosh.01,

You can easily select elements in the XPath node and extract values. What is you problem?

If a values is not available, there will be missing cell which can be handled by the Missing Value node later.

suhridghosh.01 · August 3, 2020, 4:15am

Hello @armingrudd Sir,

Thank You for your reply , but my problem is how do i get it in the desired format in csv, which i have been trying. As you see subscriptionIDType would contain the type of attribute as self ending tag & its value will be present in the following subscriptionIDValue tag. I have been tring to convert self ending tag in to value between the subscriptionIDType tags using string manipulation but not been able to.
If you can guide me on this if i can easily achieve this using xpath or provide a sample workflow for the above xml file ,it would be much helpful to see how it is achieved.

armingrudd · August 3, 2020, 6:48am

Use this expression in the String Manipulation node: (change $XML$ if your xml column has a different name)

regexReplace(
	regexReplace(
		regexReplace($XML$, "\n", "_nnn_")
	, "(?<=<subscriptionIDType>)_nnn_\\s+<([^>]*)>_nnn_\\s+<\\/[^>]*>_nnn_\\s+(?=<\\/subscriptionIDType>)", "$1")
, "_nnn_", "\n")

Then using the String to XML, you can convert the column back to XML and parse it with the XPath node as desired.

suhridghosh.01 · August 3, 2020, 7:02am

Hello @armingrudd Sir,

                                 Thank you for the reply. I have tried this regex but it doesnt work as per my workflow. Attaching the sample workbook & data set for the same which i have been working on.

XML processing work flow.knwf (9.7 KB)

XML-Test-Sample.xml (1.3 KB)

Thanks,
Suhrid Ghosh

armingrudd · August 3, 2020, 7:15am

You should have used the String Manipulation after the XML Reader. Here is the modified version of your workflow:

25578-1-1.knwf (41.9 KB)

suhridghosh.01 · August 3, 2020, 7:23am

Hello @armingrudd Sir,

                            Thank You Sir , it works perfectly fine , sorry my mistake that i should have used it after XML Reader. I will try to understand the regex :).

                             Thank you once again sir.

Thanks,
Suhrid Ghosh

armingrudd · August 3, 2020, 7:30am

First I have replaced new line characters with something else. Then I have used lookbehind and lookahead to locate the type values and replace whatever is between type tags with the type name.

You can learn more about regular expressions here.

PS Take a look at the XPath nodes as well. I have used different configurations.

suhridghosh.01 · August 3, 2020, 7:37am

Hello @armingrudd Sir,

                                 Thank you once again for the detailed explanation on regex, I have seen the XPATH configurations as well & its perfectly understood. I will surely go through that link for regex for my future workflows.

Thanks,
Suhrid Ghosh

system · August 10, 2020, 7:37am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.