Limitations for headers, functions restricted either to rows or columns, read in file-names

I started using KNIME a couple of weeks ago and find it really great for extracting and transforming data from microarrays. The integration of R functions is also extremely useful, since it allows me to use functions such as quantile normalisation within the workflow. However, some useful functions/nodes are missing or are restricted to either rows or columns.

  1. Header
    Headers are not easy to handle. E.g. microarray files usually contain a header, and the column names are not given in the first row but in one of the following rows. In order to read these files, the option “Enable short lines” has to be activated. Unfortunately, many of the fields containing the column names are then replaced by a “?”. I have no idea why some fields are not converted. E.g. the following row
    Block Row Column Name ID X Y
    is converted to
    Block ? ? Name ID ? ?
    Is there any way to prevent replacement of fields by the missing value “?”? This would be very helpful, since then all rows above the header could be removed and the first row could be turned into a column ID.
  2. Functions restricted to either rows or columns
    Some functions are restricted to either rows or columns, which I find very inconvenient, since I have to use the transform node as a workaround (e.g. the meta node “Iterate list of files” extends rows only and not columns, and a node corresponding to RowID is not available for columns). Furthermore, the transform node is quite time-consuming, particularly for larger files. Would it be possible to provide some of these nodes for both columns and rows?
  3. Read files into meta nodes
    If several files are read into a meta node (e.g. “Iterate list of files”), it would be very convenient if the names of the files could be read in and used as a suffix for the column or row names. However, I did not find an appropriate function for this.
  4. Training
    For many of the available nodes I really have no idea how to use them, but I guess some of them might be extremely useful. Is there a plan to hold a user seminar/workshop that demonstrates the usage of nodes and allows users to discuss their problems? I could imagine that this might be interesting for many other users.
    Thanks in advance for any comment.
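
To make point 1 concrete, this is roughly the behaviour I am looking for, sketched in plain Python outside KNIME. It is only an illustration under assumptions: the function name `read_with_late_header`, the tab delimiter, the idea of recognising the header row by its first field ("Block"), and the sample preamble lines are all made up for this example and are not KNIME functionality.

```python
import csv
import io


def read_with_late_header(lines, first_column="Block", delimiter="\t"):
    """Drop every line above the header row and return (header, rows).

    The header row is recognised by the value of its first field
    (an assumption made for this sketch).
    """
    rows = csv.reader(lines, delimiter=delimiter)
    for row in rows:
        if row and row[0] == first_column:
            header = row
            break
    else:
        raise ValueError(f"no header row starting with {first_column!r}")
    # Everything after the header is treated as data; empty lines are skipped.
    data = [r for r in rows if r]
    return header, data


# Hypothetical file content: two short preamble lines, then the header row.
sample = io.StringIO(
    "some preamble line\n"
    "Type=Example\n"
    "Block\tRow\tColumn\tName\tID\tX\tY\n"
    "1\t1\t1\tgeneA\tid001\t100\t200\n"
    "1\t1\t2\tgeneB\tid002\t110\t200\n"
)
header, data = read_with_late_header(sample)
print(header)
print(len(data), "data rows")
```

The short preamble lines are simply discarded here, so no field of the header row has to be padded or replaced by a missing value.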

Hi Stefan, I agree mostly with what you say. The nodes are mostly used to either work on columns or rows. Let me go through your list:

  1. (Header) The File Reader currently doesn’t allow you to skip the first n lines (that would probably fix your problem). Unfortunately, I don’t understand your example above, probably because the line breaks were not preserved in your posting?
  2. (Row- vs. column-based operation). Understood and agreed, though I guess you were talking about the Transpose node and not a Transform node? The only node that works on both columns and rows at the same time is the JPython Scripting node. However, it does not give you the ability to define the column names during the execute, … and it doesn’t allow you to have a dynamic number of output columns (you need to specify the number of (additional) output columns in the dialog). Please don’t take this as a built-in limitation of KNIME. We are just missing nodes that are more flexible in that respect (and we are missing a clear picture of what such a node would look like). You can always donate such nodes to KNIME (or sponsor them); the same is true for missing functionality, e.g. the “skip rows” option in the File Reader or the Column ID counterpart to the RowID node.
  3. (Column names to contain current file name). This should be possible via variables. My idea is to take the current file name (which is available as a variable), turn it into a (single-cell) table and use it with the RowID node to replace the actual row ID column.
  4. (Training). We do offer trainings. The next one will be in Zurich in Feb 2010. (See here for details.)
I hope this answers most of your questions, Bernd
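
To illustrate the idea behind point 3 outside KNIME: the following is a minimal Python sketch of using the file name as a column-name suffix when several files are combined column-wise. The helper names `suffix_columns` and `merge_by_key`, the key column "ID" and the file names are assumptions made up for this example; this is not how any KNIME node is implemented.

```python
from pathlib import Path


def suffix_columns(header, file_name, keep=("ID",)):
    """Append the file's base name to every column name except key columns."""
    stem = Path(file_name).stem
    return [c if c in keep else f"{c}_{stem}" for c in header]


def merge_by_key(tables, key="ID"):
    """Combine tables column-wise; columns are tagged with their file name.

    `tables` maps a file name to a (header, rows) pair.
    """
    merged = {}
    for file_name, (header, rows) in tables.items():
        new_header = suffix_columns(header, file_name, keep=(key,))
        key_index = header.index(key)
        for row in rows:
            entry = merged.setdefault(row[key_index], {})
            for name, value in zip(new_header, row):
                if name != key:
                    entry[name] = value
    return merged


# Hypothetical input: two small tables that share the "ID" column.
tables = {
    "array_01.txt": (["ID", "X"], [["g1", "1"], ["g2", "2"]]),
    "array_02.txt": (["ID", "X"], [["g1", "3"]]),
}
merged = merge_by_key(tables)
print(suffix_columns(["ID", "X", "Y"], "data/array_01.txt"))
print(merged["g1"])
```

Keeping the key column unsuffixed is a design choice of the sketch: it lets rows from different files line up on the same ID while the measurement columns stay distinguishable.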