Creating a New Column and Group By.

kasthuri · April 9, 2007, 5:22pm

First, I would like to thank the developers of KNIME. It's really cool. I am wondering how I can add the entries of two columns and form a new column (or make a new column by doing operations on an existing column). Also, it will be great to know if I can do a 'group by' operation on existing column(s). For example, if my data set has column A with the following data: 2,3,2,5,3.

A simple 'group by' with a record count will give

Number    Record Count
2         2
3         2
5         1

It just counts the appearances of each number. If there are two or more columns, we can do a similar 'group by' based on each column (just like sorting on more than one column). Is there any way to do this in KNIME?
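
For readers who want to see the counting logic spelled out, here is a minimal sketch in plain Java. It is not KNIME code: the class name GroupByCount and the method countByValue are made up for illustration, and the column is assumed to hold integers.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of KNIME: counts how often each value occurs.
final class GroupByCount {

    // Returns value -> number of rows holding that value.
    // A TreeMap keeps the groups sorted by value.
    static Map<Integer, Integer> countByValue(int[] column) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int value : column) {
            counts.merge(value, 1, Integer::sum); // start at 1, then add 1 per repeat
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] columnA = {2, 3, 2, 5, 3};
        System.out.println(countByValue(columnA)); // prints {2=2, 3=2, 5=1}
    }
}
```

Grouping on more than one column would work the same way, with a combined key (for example a list of the column values) in place of the single integer.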

Thanks.
Kasthuri.

unknown_user · April 11, 2007, 12:27pm

Hi Kasthuri,

The solution to your first problem (add the entries of two columns and append the result as a new column) is straightforward: use the "Java Snippet" node. The node description (this neat online documentation thingie that you usually find on the right of the KNIME window) contains an example for adding two columns.
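
As a rough illustration of what "adding two columns" means row by row, here is a standalone Java sketch. It does not use the KNIME API; the class AddColumns and its add method are hypothetical, and both columns are assumed to be numeric with the same number of rows. Inside the Java Snippet node only the per-row expression would be needed, and the node description explains how columns are referenced there.

```java
// Hypothetical helper, not part of KNIME: row-wise sum of two numeric columns.
final class AddColumns {

    // Returns a new column whose i-th entry is a[i] + b[i].
    static double[] add(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("columns must have the same number of rows");
        }
        double[] sum = new double[a.length];
        for (int row = 0; row < a.length; row++) {
            sum[row] = a[row] + b[row];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] colA = {1.0, 2.0, 3.0};
        double[] colB = {10.0, 20.0, 30.0};
        for (double v : add(colA, colB)) {
            System.out.println(v);
        }
    }
}
```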

Quote:

Also, it will be great to know if I can do a 'group by' operation on existing column(s). For example, if my data set has column A with the following data: 2,3,2,5,3. ...

That one is not that easy to solve (at least not in the current version). I can't think of any node that accomplishes a 'group by'. We shall have a node that does this in KNIME 1.3. For now, I can only suggest using the 'Pie Chart' node (available in the JFreeChart extension) or one of the histogram nodes. Although you won't be able to use this count information in the flow (the numbers will not be available in the node outports), you at least get an idea of what's in your data.

Bernd

unknown_user · April 12, 2007, 1:06am

wiswedel wrote:

We shall have a node that does this in KNIME 1.3.

For the aggregation options, in addition to the normal ones, 'first' and 'last' are useful in certain situations.

unknown_user · June 1, 2007, 11:08am

Hi,

I came across KNIME when searching for workflow tools and I found the software beautiful. Most of us here work on Macs, and the OS X Java problems are really annoying :-(. However, I could test KNIME on a Linux server and it worked perfectly fine :-).

I'm posting in this thread because the question was about aggregation and column addition to tables. As I was recently using a commercial workflow tool called Amadea (whose license costs are prohibitive), I would like to comment on the utility of workflow tools that deal with tables, tabular data.

We found out that it is really very convenient to have all the typical database table operations in a workflow without having to actually deal with building the database. Column addition, row addition, matrix transposition and especially efficient joins are a joy to use in a graphical workflow environment. Aggregation is also very useful ('group by' in SQL) but only if accompanied by special functions that work on the formed groups. For example, if one has a table with microarray fluorescence values in which several rows correspond to the signal for a single gene, one could be interested in easily obtaining a table where each line corresponds to a gene and contains the average value of the corresponding signals. Simple functions would have 'aggregate' versions for this kind of scenario.
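
The microarray scenario above can be sketched in plain Java as a 'group by' followed by an average. This is only an illustration under assumptions: the class GroupAverage, the method meanByKey, and the gene names in the example are invented, and nothing here is KNIME API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of KNIME: average signal per gene.
final class GroupAverage {

    // genes[i] is the group key of row i, signals[i] its measured value.
    // Returns gene -> mean signal, in order of first appearance.
    static Map<String, Double> meanByKey(String[] genes, double[] signals) {
        if (genes.length != signals.length) {
            throw new IllegalArgumentException("columns must have the same number of rows");
        }
        // Per gene, keep a running {sum, count} pair.
        Map<String, double[]> acc = new LinkedHashMap<>();
        for (int row = 0; row < genes.length; row++) {
            double[] sumAndCount = acc.computeIfAbsent(genes[row], k -> new double[2]);
            sumAndCount[0] += signals[row];
            sumAndCount[1] += 1;
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    public static void main(String[] args) {
        String[] genes = {"geneA", "geneB", "geneA"};
        double[] signals = {1.0, 4.0, 3.0};
        System.out.println(meanByKey(genes, signals)); // prints {geneA=2.0, geneB=4.0}
    }
}
```

Other 'aggregate' versions of simple functions (minimum, maximum, first, last) would follow the same pattern with a different accumulator.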

Oh, and it would be great if one could select columns or rows from a table based on more complex queries. Regular expressions are great but, as I have seen in the current implementation, only allow testing of values from a single column. Moreover, while programmers are familiar with regexps, many scientists are not; they would greatly benefit from support in building those queries.
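
To make "a query over more than one column" concrete, here is a small plain-Java sketch. It is an assumption-laden illustration, not KNIME functionality: the class RowFilter, the method matchingRows, and the two example columns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical helper, not part of KNIME: selects rows by a condition on two columns.
final class RowFilter {

    // Returns the indices of all rows for which the condition holds.
    static List<Integer> matchingRows(String[] names, double[] values,
                                      BiPredicate<String, Double> condition) {
        if (names.length != values.length) {
            throw new IllegalArgumentException("columns must have the same number of rows");
        }
        List<Integer> rows = new ArrayList<>();
        for (int row = 0; row < names.length; row++) {
            if (condition.test(names[row], values[row])) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] names = {"geneA", "ctrl", "geneB"};
        double[] values = {1.0, 5.0, 3.0};
        // Keep rows whose name starts with "gene" AND whose value exceeds 2.0.
        System.out.println(matchingRows(names, values,
                (name, value) -> name.startsWith("gene") && value > 2.0)); // prints [2]
    }
}
```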

Many of my fellow scientists have difficulty working with tabular data even though they need to do that more and more each day. They are used to Excel; however, for the kind of operations I described above and for large tables, Excel is useless. In view of the large and fast-growing amounts of microarray data, I believe KNIME could provide a solution to this widespread problem by providing generic table operations and functions. KNIME has enormous potential. You did excellent work, thank you.

Sincerely yours,

Cosmin

lucatoldo · September 2, 2009, 11:47am

In order to compute a “New Column from expression”, the Java Snippet works really well.
For example, to add a new column called “DataType” with the value “Cheese”, one simply has to
create a Java Snippet node, connect the input node, and configure the Java Snippet with the following
Java code:
return "Cheese";
and then set Append column: “DataType” with return type String.
Thank you so much!

simulyant · September 2, 2009, 12:45pm

Hello, kasthuri.
You wrote “Also, it will be great to know if I can do a ‘group by’ operation on existing column(s)”. There is a “GroupBy” node in the Data Manipulation \ Row section. This node can group by several columns and calculate aggregate functions.
BR