Table parsing / Multidimensional assay (multiple experiments with multiple objects that have multiple features)

gbonamy · April 15, 2009, 11:47pm

Dear all,

I am new to KNIME and would like some help parsing a multidimensional table. The table contains multiple lines, and each line corresponds to a full well. However, multiple objects are recorded in each well, and each object has multiple features associated with it.

If you could help me parse the following table into the destination format shown below, that would be very helpful. I have loaded KNIME in expert mode to gain access to the more advanced loop features, but I have a hard time understanding how some of these nodes work.

Let us assume the following CSV table, where the columns are labeled in the first row. Notice that the data in columns Data1, Data2 and Data3 are strings in which the values are separated by “;”. Within a row, one can expect Data1, Data2 and Data3 to hold the same number of values, as they are properties of the same objects.
Original table:
Barcode,WellID,Data1,Data2,Data3
AAAA,A01,2;2;5;4;5;,3.2;2.3;3.4;4.6;3.1;,A;B;C;B;A;
BBBB,B02,8;7;9;,3.2;2.1;1.0;,A;A;A;

Desired output table:
Barcode,WellID,Object#,Data1,Data2,Data3
AAAA,A01,1,2,3.2,A
AAAA,A01,2,2,2.3,B
AAAA,A01,3,5,3.4,C
AAAA,A01,4,4,4.6,B
AAAA,A01,5,5,3.1,A
BBBB,B02,1,8,3.2,A
BBBB,B02,2,7,2.1,A
BBBB,B02,3,9,1.0,A

I could write a specific parser in Java or Perl, but I am most interested in seeing how this could be done using KNIME.
Thanks in advance for the help.
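
For reference, the reshaping asked for above can be sketched outside KNIME in a few lines of Python. This is only an illustration of the target transformation, not a KNIME workflow. The helper names `split_cell` and `explode_wells` are made up for this sketch, and it assumes that every column other than Barcode and WellID is a packed data column with a trailing “;”.

```python
import csv
import io

SAMPLE = """Barcode,WellID,Data1,Data2,Data3
AAAA,A01,2;2;5;4;5;,3.2;2.3;3.4;4.6;3.1;,A;B;C;B;A;
BBBB,B02,8;7;9;,3.2;2.1;1.0;,A;A;A;
"""


def split_cell(cell, sep=";"):
    """Split a packed cell such as '8;7;9;' into ['8', '7', '9']."""
    if cell.endswith(sep):
        cell = cell[: -len(sep)]  # drop the trailing separator
    return cell.split(sep) if cell else []


def explode_wells(csv_text, key_columns=("Barcode", "WellID")):
    """Turn one row per well into one row per object (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: every non-key column is a packed data column.
    data_columns = [c for c in reader.fieldnames if c not in key_columns]
    rows = []
    for record in reader:
        split = {c: split_cell(record[c]) for c in data_columns}
        counts = {len(values) for values in split.values()}
        if len(counts) > 1:
            raise ValueError(f"unequal value counts in row {record}")
        for i in range(counts.pop() if counts else 0):
            row = {k: record[k] for k in key_columns}
            row["Object#"] = i + 1  # 1-based object index within the well
            row.update({c: split[c][i] for c in data_columns})
            rows.append(row)
    return rows
```

Calling `explode_wells(SAMPLE)` yields one dictionary per object, with the values left as strings; converting them to numbers is a separate step.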

aborg · April 16, 2009, 9:11am

If you read the data with File Reader so that each column is read separately, you could try my node. It can be found here (the update site): http://hits.googlecode.com/svn/trunk/ie.tcd.imm.hits.update
It is not really documented, but I guess the Pivot node will do the trick for you.

aborg · April 16, 2009, 10:22am

Sorry, my bad. I was not cautious enough when I answered. In your case the Unpivot node (from the KNIME utilities feature) might be useful, but only if the rows generate the same number of results. Otherwise you have to filter out the rows with empty content in the generated columns.

Here is a setup which does the job for you: File Reader → Cell Splitter → Cell Splitter → Cell Splitter → Column Filter → Unpivot → Row Filter → Rename

In this case you should split the Data1, Data2, Data3 columns, then remove the original (unsplit) columns from the result. After that, set this pattern in Unpivot: Data\d_(Arr[\d]). Now you can filter the rows. You may want to rename the columns, because they have a trailing underscore.

Hope this helps. If you need this parsing multiple times, I would recommend using it as a meta node. (See this thread to save it as an extension to KNIME.)
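
To make the Unpivot and Row Filter steps more concrete, here is a rough Python sketch of the idea, under stated assumptions. It is not the node's implementation. The function name `unpivot` is invented for this sketch, and the grouping behavior (one output row per captured `Arr[...]` label) is inferred from the description above. The regular expression is written for Python's `re` module, where literal square brackets must be escaped, so it differs from the pattern as posted.

```python
import re

# Assumption: split columns are named like "Data1_Arr[0]", as produced by
# Cell Splitter. Group 1 is the feature name, group 2 the per-object label.
PATTERN = re.compile(r"(Data\d+)_(Arr\[\d+\])")


def unpivot(wide_row, key_columns=("Barcode", "WellID")):
    """Turn one wide row into one row per Arr[...] label (hypothetical helper)."""
    by_label = {}
    for name, value in wide_row.items():
        match = PATTERN.fullmatch(name)
        if match is None:
            continue  # key columns and anything else are not unpivoted
        feature, label = match.group(1), match.group(2)
        by_label.setdefault(label, {})[feature] = value

    long_rows = []
    for label, features in by_label.items():
        # Row Filter step: drop rows whose generated cells are all empty.
        if all(value is None for value in features.values()):
            continue
        row = {k: wide_row[k] for k in key_columns}
        row["Object#"] = label
        row.update(features)
        long_rows.append(row)
    return long_rows
```

A well with three objects in a table that was split into five columns per feature has two empty cells per feature; those produce all-empty rows, which the filter removes.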

gbonamy · April 16, 2009, 11:41pm

Aborg,

Thanks for the answers.

Somehow I cannot find the Unpivot node (from the KNIME utilities feature). The only Unpivot node I can find comes from: http://hits.googlecode.com/svn/trunk/ie.tcd.imm.hits.update. How can I install it?

As for the table, as you can see, the number of objects in plate AAAA,A01 is different from that in BBBB,B02. I am not sure whether this is what you meant by "the rows generate the same number of results". I suppose you mean that once the unpivot is done, I can remove the empty rows with a Row Filter?

Finally, as you may suspect, I do have more than 3 Data columns (I just used 3 as an example). However, I have not figured out how I could pass the row name via flow variables (or some other mechanism) to, let's say, the Cell Splitter. I suppose that to iterate I would need to provide a table with the list of rows to split, etc., and then repeat the procedure. Perhaps one way would be to rename the column that needs to be modified?

Thanks for the help,

aborg · April 17, 2009, 12:57am

Hi,

Sorry, maybe I was not clear enough. You can add the update site as described on this page. The only difference is that you should select the KNIME utilities/KNIME utilities 0.0.1 feature instead of the HiTS main feature. (The latter might also be useful if you want to analyse data from HTS/HCS experiments.)
Yes, Unpivot is designed to be used with the same split count, but it works (by generating empty cells) even if this assumption does not hold.
I guessed you have more than 3 data columns. It is easy to go through the columns with a Cell Splitter node. If they have the same structure (like Data and a number) this will not be a problem, but if there is more than one naming convention you might need one Unpivot node per convention, each with a different configuration. (Assuming you have the same naming convention, you might use the following pattern instead of the previous one: Data\d+(_Arr[\d+]).) I forgot to mention that the column which you called Object# in this example will have values like _Arr[0] instead of 1, and so on. I hope this is not a huge problem. (You might use a Java/Python snippet node, or some other nodes, to extract only what you want from that column.)
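
As an illustration of that last step, a label such as `_Arr[0]` could be turned into a 1-based object number with a small regular expression. This is a minimal sketch; the function name `object_number` is made up here, and the expression assumes labels of exactly the form `_Arr[n]` or `Arr[n]`. Note that in Python's `re` syntax the literal square brackets have to be escaped.

```python
import re

# Matches "_Arr[0]" or "Arr[0]" and captures the zero-based index.
_LABEL = re.compile(r"_?Arr\[(\d+)\]")


def object_number(label):
    """Convert a label like '_Arr[0]' into the 1-based object number 1."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return int(match.group(1)) + 1
```

The same logic should translate directly to a Java snippet or a string replacement inside KNIME.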

PS: If you have problems with the update site, you might try copying the ie.tcd.imm.knime.util_0.0.1.jar file to the plugins folder of your Eclipse/KNIME installation, and the ie.tcd.imm.knime.util.feature_0.0.1.jar file to its features folder (do this while Eclipse/KNIME is not running).

gbonamy · April 17, 2009, 3:23am

Thanks again for the very prompt answer. Perhaps the confusion came from the fact that the update manager does not detect the HiTS installation but only KNIME utilities 0.0.1 (perhaps this is linked to using Eclipse 3.4, and could be fixed). After adding the provided URL in my update manager, all I see is: HITS Official update site -> KNIME utilities -> KNIME utilities 0.0.1.

This does install the Unpivot node under "data manipulation node".

For the rest, I think the hard part will be iterating over the list of columns, since they do have different naming conventions (but the same number of elements, since these are features for a set of objects that remains constant). Once I figure out how to run the loops I will be all set.

Finally, the name of the object should not be a problem: I can do a text replace or write a small Java snippet.
Thanks again,

aborg · April 19, 2009, 6:34pm

It seems it is not doable with the current tools, but hopefully someone will correct me if I am wrong.

It is easy to turn the column names into table row values: use Column Filter to select only the proper columns, then Row Sampling so that only one unnecessary column remains in the result after Transpose, then RowID to put the original column names into a column, and finally Column Filter to remove the unnecessary column left over from the Row Sampling step.
The problem is the differing table structure when one tries to use the Variables Loop (Data) with the Cell Splitter node. I got these error messages:
Input table’s structure differs from reference (first iteration) table:
Column 5 [name=Data2_Arr[0],type=DoubleCell] vs. [name=Data1_Arr[0],type=IntCell]
Column 6 [name=Data2_Arr[1],type=DoubleCell] vs. [name=Data1_Arr[1],type=IntCell]
Column 7 [name=Data2_Arr[2],type=DoubleCell] vs. [name=Data1_Arr[2],type=IntCell]
Column 8 [name=Data2_Arr[3],type=DoubleCell] vs. [name=Data1_Arr[3],type=IntCell]
Column 9 [name=Data2_Arr[4],type=DoubleCell] vs. [name=Data1_Arr[4],type=IntCell]

Input table’s structure differs from reference (first iteration) table:
Column 5 [name=Data3_Arr[0],type=StringCell] vs. [name=Data2_Arr[0],type=DoubleCell]
Column 6 [name=Data3_Arr[1],type=StringCell] vs. [name=Data2_Arr[1],type=DoubleCell]
Column 7 [name=Data3_Arr[2],type=StringCell] vs. [name=Data2_Arr[2],type=DoubleCell]
Column 8 [name=Data3_Arr[3],type=StringCell] vs. [name=Data2_Arr[3],type=DoubleCell]
Column 9 [name=Data3_Arr[4],type=StringCell] vs. [name=Data2_Arr[4],type=DoubleCell]

The most interesting thing is that after the third attempt (there were three data columns in the example) it finished without error, although it had only performed the split for the last column. Is this a bug, is the feature just not complete yet, or will this never work?
Thanks, gabor

PS: Yes, the problem is with Eclipse 3.4. With 3.3 the update site works well. Next week I plan to release a new version that supports updating from Eclipse 3.4, along with some new features.