I’m learning to use KNIME and I need to use two variables to establish a range and I’m trying to use node Variable Creator to create the range. OK.
I have a file with thousands of lines and I need to change a range. The reference in the file is identified like this: |C010| followed by a numerical identification.
In notepad++ I look for all occurrences with |C010| of the file to know the starting and ending line, as shown in the result in the list below. Let’s say I need to change the block [|C010|15436940000103|1|].
In the “Rule-based Row Splitter” node I manually add [ $rowindex$ > 8 AND $rowindex$ < 291 => TRUE ] - (without the square brackets) - to apply the rule in the range where I will later proceed with the changes to be performed in this interval with the Column Express node that I already use in other dataflows.
I accept help with other suggestions as well even if I have to add other nodes. I need to use the nodes [“Math Formula”] as a numeric ID and not a String and I need the node [“Rule-based Row Splitter”] because I will join the information by concatenating the data after the treatments in the interval.
One of the options I tried was to use within the Rule-based Row Splitter node the definition of a variable in the Expression area to store the value of the row, but without success.
This range varies depending on the file, so it is not always on the same line that the information below will be.
Line 8: |C010|15436940000103|1|
Line 291: |C010|15436940000367|1|
Line 537: |C010|15436940000286|1|
Line 1071: |9900|C010|3|
Hi,
what you think about to split your input-column into at least 2 new ones to obtain a column with your search items like “C010”?
So I tried following:
Remove leading “|” with string manipulation
“Cell splitter” Node with “|” as column delimiter
Add Math Node to Add a Row-Index
Afterwards you can add a Group-By node to get the minimum and maximum Row-Index
|- ID -|- MIN -|- MAX-|
|C010|7|536|
|C180|8|1063|
|C181|9|1064|
If you are interested in all Row IDs you can use “List” option within Group By node.
Thanks for the feedback.
I’m slowly getting to know the power of KNIME. Bearing in mind that I have also received help on other occasions that have been extremely useful.
This type of file cannot undergo any type of modification in its structure as it is a tax file. Including the following changes that are necessary to make, I deal with the column expression with decision structure not to consider the " | " as a delimiter due to its different structure throughout the entire file. It was just a small sample that I made available of the data that needs to be changed, for testing. After the ETL process the file needs to be the same size on your property.
There are requests that I will need to change the data that is between the cnpj 15436940000367 ( |C010|15436940000367|1| ) inside the file. Considering the example below so I’m currently using the Rule-based Row Splitter node and entering the row number information.
/*
Line 8: |C010|15436940000103|1|
Line 291: |C010|15436940000367|1|
Line 537: |C010|15436940000286|1|
Line 1071: |9900|C010|3|
*/
$rowindex$ > 291 AND $rowindex$ < $537 => TRUE
With my little knowledge and some research I used this method to define the range that should be considered for modification.
If there is a method to store this variable every time it is found |C010|15436940000103|1| or |C010|15436940000367|1| or |C010|15436940000286|1| , I wouldn’t use the line number anymore so I don’t have to keep each file opening in np++ and looking for the beginning and end of each “cnpj”. Each file changes the line placement of this “cnpj” information.
Sometimes some manipulations are required to achieve the desired result. What Andreas suggested can easily be reversed with a Column Aggregator (and a String Manipulation to get the first “|” back).
The files structure will be unchanged afterwards.
Yes, the files will always have the same reference information. The reference I use to identify the range to be changed is by the cnpj which is in the second position after " | ". Example: changes should only be made to the lines below cnpj 15436940000367 and above cnpj 15436940000286
I have managed to get to the point as shown in the attached file.
I think I’m close to the goal.
Once I found the line number where each cnpj starts, now I’m not able to get that data into the Rule-based Row Splitter node to define the range from one cnpj to another through the line number.
Maybe my example wasn’t clear enough. I think yor need to filter twice: First to create a table with the corresponding row indexes, and the second time to apply these indexes.
Hello @andymesmo
I’ve been messing around with your data. I interpreted that your pain point is to set variables from data frame.
Have a look to this different approach.
Some updates happened since latest post (specially to the picture). Split data was achieved with the proposed rules by using ROWINDEX variables; as suggested in @ActionAndi 's workflow, but applied to a Rule Based Row Splitter straight away.
ROWINDEX addressing to string variable’s values, has been approached based on a REGEX wildcard logical indexing method.
Your interpretation and understanding was completely helpful. The result is exactly what I need. I applied all the flow logic by making some adjustments with the data flow I was working on. I learned and added even more knowledge from everyone who cooperated together, @Thyme and @ActionAndi. With the clarifications of each one, I have sought to achieve the expected result and this I consider as a challenge and learning.
To share, I’ll leave my data flow for learning purposes, as I understand that KNIME has several ways to achieve results. It’s a very powerful tool.