Split in more than two parts

Hello!

Here is my problem.

My table contains rows that have a mol_id. This mol_id is shared by hundred of rows. I'd like to separate the rows depending on the mol_id and then do some stuff with only these rows. A spliter would be great but the "splitter" node can only separate rows into two parts and I have more than two mol_id.

One of my idea is to tell Knime: take only the rows with one mol_id in particular and do the job, then choose another mol_id and do the same job, until you take all the rows. That's sound like a loop, but I can't find a loop that allows me to loop with the mol_id. I'm a beginner with the loop but I think it's possible, I just don't know how...

Anyone has an idea?

Thanks.

Nico

Use a similar process to the workflow I built you on a previous post.

Use a GroupBy node on the Mol_ID column, this will get all the unique Mol_IDs. Then take this resultant list and pass it into a TableRow to Variable node.

Now connect a Row Filter node upto your main dataset, right click on the node and choose show flow variables. Connect the TableRow to Variable node to the left variable port of this row filter node.

Now configure the Row Filter node, choose to Include by Attribute value, use Pattern matching, and either click on the little variable button next to pattern matching and choose Mol_IDs from the dropdown, or click on Flow Variable tab and choose Mol_ID from the dropdown next to Pattern. Now click ok.

Now complete this with a Loop End.

This will now iterate over each Mol_ID per loop, all you need to do now is put in the nodes you want to use for the manipulate (i.e. what you want to do to the data) in between the Row Filter node and Loop End node.

Hope this helps.

 

Simon.

1 Like

Thank you Simon, it works perfectly.

Now if I want to write a xls file for each mol_id at the end of the work, how could I specify a different filename for each of them?

I've tried to add the "xls writer" after the "loop end". Then I've connected the left variable port of the "xls writer" to the right variable port of the "rwo filter". In the flow variables settings, for filename I've selected mol_id but it doesn't work. If I specify a filename in the writer options, after the first iteration it can't write another file cause the name is the same.

Nicolas

You are nearly there.

Remember that the Mol_ID variable will not have a pathname in it. So you will need to write the variable out to a column with Variable to TableColumn first. Then generate another column with RuleEngine for instance to put in the pathname part, i.e. C:\Temp\

Then use a Column Combiner node to join these two columns together so you will then have contents which look like C:\Temp\Mol_ID. Now convert this to a variable with TableRow To Variable. Use this new variable as the filename in the XLS Writer node from the Flow Variables tab.

Hope this helps.

Simon.

If you put the xls writer before the loop end (you can put the same table into both nodes), to ensure that the writer executes as part of the loop, connect a flow variable link from the top right flow variable port of the xls writer to the top left flow variable port of the loop end.  The link doesnt transfer any data, but ensures KNIME treats the writer as part of the loop.

Steve

Hello Simon (again), hello Steve,

Thanks to both of you, I have something that almost work.

I've done what Simon told to generate pathname. I have connected the "Variable to TableColumn" node to the row filter, and I have put a link between the flow variable port of the "TableRow to Variable Loop Start" and the flow variable port of the "Variable to TableColumn". Then I generate the pathname, and I put a link between the flow variable port of the"TableRow to Variable" and the left flow variable port of the "xls writer" node which is also connected at the "row filter".

To finish I have connected the "row filter' to the "loop end" and I've linked the flow variable port of the "xls writer" and the "loop end".

I hope you're still here ;)

So now when I execute the job, some xls files are created and they contain what I what but there are 2 things:

- I have these kind of warning :

WARN      XLS Writer     Errors loading flow variables into node : No output filename specified.
WARN      XLS Writer     Errors loading flow variables into node : No output filename specified.
WARN      TableRow To Variable     Table has 97 rows, ignored all rows except the first one

- One of the id I use to create filename contain a "/". That's horrible! I guess you now why...

Do you have ideas to solve these problems?

Nico

Well before you do the Column Combine step, you could take the Mol_ID column, and use a String Replacer node to replace all instances of "/" with something of your choosing. So now any MolIDs which have this "/" in the Mol String will have been replaced.

Simon.

The warning messages I see it are nothing to be alarmed about.

The XLS Writer warning are just saying no filename is specified, probably because the workflow is at a point where it doesnt know what the variable will be for the XLS Writer node. As the workflow works through the loop, the variable is defined and it generates the XLS file as expected. So this doesnt mean anything has gone wrong.

The TableRow to Variable warning is just indicating that at the step where the Filename is defined as a variable with the TableRow to Variable node, it is just saying that it only took the first row in defining that, and ignored the rest of the rows in that particular loop iteration. Again, this is fine, as all the other rows will have the same definition of the variable, as it was duplicated down the rows with the RuleEngine node.

Simon.

Okay, problem solved!

Thank you very much!!

Nico

Hello,

I'm relatively new to Knime, would like to use row filter and use pattern matching based on a flow variable but with wild cards before and after the flow variable value. would very much appreciate any suggestions. Thanks!

Karen

The way I do this is to use the variable to table row node, and then constant value column node to generate two new columns, each containing * only. Use column resorter to position the variable which is now as a column Inbetween the two columns containing a *.

now use column combiner node to combine all 3 columns. You will now have a new column with the variable contents with a * either side. Now use table row to variable.

now use this variable in your row filter node.

simon.

Hi Simon, thank you so much, I would not have figured it out myself being new to Knime, and not knowing all the functionalities/availabilities of nodes. It is really powerful!

-Karen