Split and remove text from input lines and store the result in new columns using the String Manipulation node

Hi KNIME Community Team,

I am new to KNIME and have a small question about the String Manipulation node.

I have an input file like the one below.

2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <xml>..xml contents.</xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <xml>..xml contents.</xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <xml>..xml contents.</xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <xml>..xml contents.</xml>

...

...

2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <xml>..xml contents.</xml>

Using the Line Reader node, I have loaded each line into its own row.

Now I would like to remove everything up to and including TechnologyName in each line and keep only the <xml>...</xml> contents as a string in the table for further processing.

I have tried some of the functions provided in the String Manipulation node, but with no luck. Could you suggest how to achieve this?

Thanks in advance.

Kind regards,

velu

Hi,

Here is the expression for cutting everything before <xml> using the String Manipulation node:

substr($column1$, indexOf($column1$, "<xml>"))

Best, Iris
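
For readers who want to check the logic outside KNIME, the substr/indexOf combination above can be sketched in Python. This is only an illustration, not KNIME code: the function name `cut_before` and the fallback for a missing marker are assumptions made for the example, and `str.find` stands in for indexOf (it also returns -1 when the marker is absent).

```python
def cut_before(line: str, marker: str = "<xml>") -> str:
    """Return the part of `line` starting at the first `marker`.

    If the marker is missing, the line is returned unchanged
    (a choice made for this sketch, not KNIME behaviour).
    """
    start = line.find(marker)  # -1 when the marker does not occur
    if start < 0:
        return line
    return line[start:]


sample = ("2015-05-01 00:03:21,962 NodeName CellName ModuleName "
          "TechnologyName <xml>..xml contents.</xml>")
print(cut_before(sample))  # <xml>..xml contents.</xml>
```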

Hi Iris,

Thank you so much for your reply. Your solution worked as I expected.

In addition, I am trying something more involved with the input.

e.g.

2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <one: xml>..xml contents.</one: xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <two: xml>..xml contents.</two: xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <three: xml>..xml contents.</three: xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <four: xml>..xml contents.</four: xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <five: xml>..xml contents.</five: xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <six: xml>..xml contents.</six: xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <seven: xml>..xml contents.</seven: xml>

For that, I executed the nested expression below:

substr(substr(substr(substr(substr(substr(substr($Column$, indexOf($Column$, "<seven: xml")),indexOf($Column$, "<six: xml")),indexOf($Column$, "<five: xml")),indexOf($Column$, "<four: xml")),indexOf($Column$, "<three: xml")),indexOf($Column$, "<two: xml")),indexOf($Column$, "<one: xml"))

It worked fine, with no issues.

Is there another way to do this instead of the nested expression?
That is, I would like to execute multiple expressions in the String Manipulation node,
because later I plan to store the remaining values (2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName) in columns as strings as well, alongside the XML column.
Like this:

| Date | Time | Node | Cell | Module | TechnologyName | XML Column |

Kindly share your suggestions for achieving this. I appreciate your great work.


Best Regards,
Velu
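
One way to avoid listing every tag name is to search for the first "<" rather than for a specific tag. The Python sketch below illustrates the idea. It assumes that the leading fields (date, time, names) never contain "<", and the helper name `xml_part` is hypothetical. The analogous KNIME expression would presumably be substr($Column$, indexOf($Column$, "<")), which is untested here.

```python
def xml_part(line: str) -> str:
    """Return everything from the first '<' onwards, or '' if there is none."""
    start = line.find("<")
    return line[start:] if start >= 0 else ""


lines = [
    "2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName "
    "<one: xml>..xml contents.</one: xml>",
    "2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName "
    "<seven: xml>..xml contents.</seven: xml>",
]
for line in lines:
    print(xml_part(line))
```

Because the tag name is never inspected, the same call handles <one: xml>, <two: xml>, and so on without nesting.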

Hi Velu,

I would do this in two nodes. First, split the XML from the rest. Second, use a Cell Splitter on the first part with a space as the delimiter. Afterwards you can rename the split columns.

A String Manipulation node always produces only one output column. That's why you would need one node per new column.

Best, Iris
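
The two-step approach described above (separate the XML first, then split the remaining part on spaces) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a KNIME workflow: the function name `split_line` is made up, the column names are taken from the table header in the question, and the header is assumed to contain exactly six space-separated fields with no "<" in them.

```python
COLUMNS = ["Date", "Time", "Node", "Cell", "Module", "TechnologyName"]


def split_line(line: str) -> dict:
    """Split one log line into the six header fields plus the XML column."""
    start = line.find("<")
    if start < 0:
        raise ValueError("no XML part found in line")
    # Step 1: separate the XML from the rest.
    header, xml = line[:start], line[start:]
    # Step 2: split the header on runs of whitespace.
    fields = header.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(
            f"expected {len(COLUMNS)} header fields, got {len(fields)}"
        )
    row = dict(zip(COLUMNS, fields))
    row["XML Column"] = xml
    return row


row = split_line(
    "2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName "
    "<one: xml>..xml contents.</one: xml>"
)
print(row["Date"], row["Time"], row["XML Column"])
```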

Hi Iris,

Now, I have two columns,

| Column1 | Column2 |
| 2015-05-01 Name Technology <one: xml>..xml contents.</one: xml> | <one: xml>..xml contents.</one: xml> |
| 2015-05-02 Name Technology <two: xml>..xml contents.</two: xml> | <two: xml>..xml contents.</two: xml> |

Now I would like to remove the XML content from Column1. For that, I used the following expression:

substr($Column1$, 0, indexOf($Column1$, "<one: xml"))

It worked fine and returned the expected output (2015-05-01 Name Technology).

But when I execute the nested expression, it does not work:

substr(substr($Column1$, 0, indexOf($Column1$, "<one: xml")), 0, indexOf($Column1$, "<two: xml"))

I expected to get the following output:

2015-05-01 Name Technology

2015-05-02 Name Technology

Can you please help me with this? What did I miss?

Best Regards,

Velu

Hi Iris,

Thanks for your support and I will try your suggestions as well.

Best Regards,

velu

Hi Iris,

2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <one: xml>..xml contents.</one: xml>
2015-05-01 00:03:21,962 NodeName CellName ModuleName TechnologyName <two: xml>..xml contents.</two: xml>

When I cut the input at <one: xml using the following expression,

substr($Raw_Input$, 0, indexOf($Raw_Input$, "<one: xml"))

it works fine. I store the result in a new column. After that, I use a Cell Splitter to split the result, which also works fine.

However, when I try to execute the nested expression below,

substr(substr($Raw_Input$, 0, indexOf($Raw_Input$, "<one: xml")) , 0, indexOf($Raw_Input$, "<two: xml"))

it does not work.

Could you please help me with this nested expression?

Best Regards,

Velu
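
A likely reason the nested expression fails: each row contains only one of the two tags, so for every row one of the indexOf calls returns -1, and that -1 is then passed to substr as its third argument. In addition, the outer indexOf is computed on the original $Raw_Input$, not on the already shortened inner result. A per-row alternative is to try the markers in turn and cut at the first one that is actually found. The Python sketch below illustrates this. `header_part` and the `MARKERS` list are hypothetical names for the example, and how KNIME's substr treats a negative third argument has not been verified here.

```python
MARKERS = ["<one: xml", "<two: xml"]


def header_part(line: str, markers=MARKERS) -> str:
    """Cut `line` at the first marker from `markers` that occurs in it."""
    for marker in markers:
        pos = line.find(marker)  # -1 means this marker is not in the row
        if pos >= 0:
            # Keep the text before the marker, without the trailing space.
            return line[:pos].rstrip()
    return line  # no marker found: leave the row unchanged


rows = [
    "2015-05-01 Name Technology <one: xml>..xml contents.</one: xml>",
    "2015-05-02 Name Technology <two: xml>..xml contents.</two: xml>",
]
for r in rows:
    print(header_part(r))
```

The explicit check for a non-negative position is the step the nested expression lacks: a marker that is absent from the row is skipped instead of being used as a cut position.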