Colored Interactive Table

corwinjoy · September 28, 2007, 10:16pm

I have developed a simple modification of the Knime Interactive Table that allows you to set coloring on individual cells in the table by accepting multiple Color Models as inputs. This ends up giving a table with functionality similar to conditional color formatting in Excel.

I've been using KNIME for a bit, and one limitation I did not like is that the Interactive Table View lets you specify only a single ColorManager for formatting colors in the table. Furthermore, this color is shown only at the beginning of the row. What I wanted was to set multiple color managers on a table, so that a column for molecular weight could be colored one way and a column for hydrophobicity could be colored another way. I also wanted these color formats to be applied directly to cells in the table (basically like conditional cell coloring in Excel). It turns out that it is fairly easy to modify the Knime Interactive Table code to create a table that accepts multiple color models and then colors its cells based on these color model specifications. Below is a link to the code. Basically, it just involves reading color models from the input ports and changing the cell renderer to use the appropriate color model in each column. There is a bit of bloat here, in that I had to make a copy of TableNodeView and ColorNodeModel rather than just inheriting and making the small changes needed. Other than that, it's a fairly simple node and may be of interest to others. I have included source + a compiled add-in binary.

https://sourceforge.net/project/showfiles.php?group_id=150558&package_id=247317
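
For readers who want the idea without downloading the code, here is a minimal sketch of per-column cell coloring in plain Java. It is not the code from the link above and it does not use any KNIME classes: the class name ColumnColorSketch, its methods, and the gradient helper are all hypothetical names made up for illustration. A table cell renderer could call colorFor(...) with the column name and cell value to pick the background color.

```java
import java.awt.Color;
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleFunction;

/** Hypothetical stand-in for "one color model per column"; not a KNIME class. */
class ColumnColorSketch {
    // Column name -> function that maps a numeric cell value to a color.
    private final Map<String, DoubleFunction<Color>> models = new HashMap<>();

    /** Registers the color model to use for one column. */
    void setModel(String column, DoubleFunction<Color> model) {
        models.put(column, model);
    }

    /** Color for a single cell; white if the column has no color model. */
    Color colorFor(String column, double value) {
        DoubleFunction<Color> model = models.get(column);
        return model == null ? Color.WHITE : model.apply(value);
    }

    /** Linear gradient from one color to another over [min, max], clamped at both ends. */
    static DoubleFunction<Color> gradient(double min, double max, Color from, Color to) {
        return v -> {
            double t = Math.max(0.0, Math.min(1.0, (v - min) / (max - min)));
            return new Color(
                blend(from.getRed(), to.getRed(), t),
                blend(from.getGreen(), to.getGreen(), t),
                blend(from.getBlue(), to.getBlue(), t));
        };
    }

    // Interpolates one 0..255 color channel.
    private static int blend(int a, int b, double t) {
        return (int) Math.round(a + t * (b - a));
    }
}
```

In this sketch, a molecular weight column and a hydrophobicity column would each get their own setModel(...) call with a different gradient, and columns without a model stay white.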

Corwin

unknown_user · September 29, 2007, 2:01pm

Most impressive!

The notion of connecting several ColorHandlers via the Model Inports is quite original, although not quite the way we envision the property handlers being used :-)

The underlying idea is that colors (and other properties) are applied to data records along the way (i.e. at a previous stage of the pipeline). The reason why property nodes also offer model ports is that sometimes you want to define your coloring (size/shape...) scheme on one data flow and apply it in the same way to color (size, shape...) the records of a second flow.

In your case you can grab at least one color handler from the inports directly. However, we do not plan to support multiple color handlers along one "pipe", since it would be very confusing to communicate to the user where the colors of a particular plot come from (imagine having three scatter plots open, all using different colors?!). So if you need to add more visual dimensions, you should preferably use conceptually different dimensions such as size or shape. Right now a subsequent node does not need to know which column defines color (shape/size...) but only knows which color (size/shape) a row has. This is also to allow us to add color (...) handlers which are based on information from several columns in parallel (imagine class information spread out over more than one column).

Having said all of this, I can understand what you are trying to do. I am wondering whether this is something that should really be specified inside this one "colorable interactive view", making it clear to the user that the colors in this particular view have nothing to do with the colors available on the pipeline. This would be similar to what you can already do by changing the renderer of a color column to gray-scale. Try right clicking the column header and choosing this from "Available Renderers"...
(We could add one that keeps the numbers in there as well...)

Does this make any sense at all?

Cheers, Michael

unknown_user · October 1, 2007, 5:15pm

Michael,
Thanks for the response, I'm glad you liked the node.
I see what you are saying about wanting to have a single global / row format for the pipeline. Personally, my preference is that formatting should be more separated from the pipeline. The idea is to define a group of color models, shape models, number formats etc. that define how you want to color a particular kind of data (be that drug response data, cancer patients, chemical compounds etc.). This group then gets bundled as a named format (cancer remission, drug family SAR, GO classification etc.). I would then see nodes referencing re-usable named formats to specify how they want colors done. I would like to use your color models here, since they are already reasonably well set up to incorporate multiple columns, which I agree is valuable. I also like having these format settings be a set of explicit nodes rather than just buried as an attribute in a table viewer, since I think that gives better documentation of how settings are being applied. The only piece I would want to add on top of this is a kind of "model join" node, so that model outputs can be grouped together into a collection:

e.g. Model 1 + Model 2 --> Join Node --> Array of (Model 1, Model 2)

It looks like the ModelContent class can support this; I've just been a bit lazy about doing it.
Anyway, this would allow collections of format properties so that nodes like ColoredTable would then need just a single input port.
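
To make the "model join" idea a bit more concrete, here is a rough sketch in plain Java. It deliberately does not use ModelContent or any other KNIME API, since I have not written that version; FormatModel and ModelJoinSketch are hypothetical names used only for illustration. The point is just that a join takes two inputs and hands the downstream node one ordered collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a "model join"; none of these types are KNIME API. */
class ModelJoinSketch {

    /** Illustrative format model: a name plus the column it should be applied to. */
    static final class FormatModel {
        final String name;
        final String column;

        FormatModel(String name, String column) {
            this.name = name;
            this.column = column;
        }
    }

    /**
     * Joins two model collections into one read-only collection,
     * keeping the left models first and the right models after them.
     */
    static List<FormatModel> join(List<FormatModel> left, List<FormatModel> right) {
        List<FormatModel> joined = new ArrayList<>(left);
        joined.addAll(right);
        return Collections.unmodifiableList(joined);
    }
}
```

Because join(...) takes and returns collections, joins can be chained, so three or more models still arrive at the viewing node through a single port.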

Also, I don't really agree that it would be confusing to have several formats on a pipeline. If a node is set up to reference a formatting scheme explicitly, then it should always be clear what is being done.

For us, we are dealing with large sets of data so having "dense" and informative formatting is very helpful. Actually, what I would really like is a table with shape and color in each column... :o (this may be a job for R tho: http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=63)

By the way, here is a link on color formatting in Excel if anyone is not already familiar with this:
http://blogs.msdn.com/excel/archive/2005/10/06/477948.aspx

unknown_user · October 7, 2007, 9:40pm

Hi Corwin,

Merging models is not a problem, yes. Actually, for some of our meta nodes (boosting) we have a similar setup. I am not quite warming up to the idea of having a separate "format pipeline" just yet, however. The current setup is that these things are attached to the pipeline by specific nodes (the property nodes) and are then an inherent part of the data tables flowing across the pipeline.
If you separate the formats, one would need to allow for matching them to the data tables again, and simply going by column name may not work. What happens if someone transposes a matrix in the middle and afterwards applies the previously created format to the new table?
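
The transpose concern can be illustrated with a small sketch in plain Java. This is only an illustration under simple assumptions, not KNIME code: FormatMatchSketch and its methods are hypothetical names, a format is reduced to the set of column names it refers to, and a transpose is modeled as the old row IDs becoming the new column names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of matching a detached format to a table by column name. */
class FormatMatchSketch {

    /** Returns the columns a format refers to that the table does not contain. */
    static List<String> unmatched(Set<String> formatColumns, List<String> tableColumns) {
        List<String> missing = new ArrayList<>();
        for (String column : formatColumns) {
            if (!tableColumns.contains(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    /** Assumption for this sketch: after a transpose, the old row IDs are the column names. */
    static List<String> columnsAfterTranspose(List<String> rowIds) {
        return new ArrayList<>(rowIds);
    }
}
```

With a format defined on "MolWeight" and "LogP", unmatched(...) is empty against the original table, but against the transposed table (columns "Row0", "Row1", ...) every format column comes back as missing, so the format silently has nothing to apply to.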

But let us ponder this thought a bit more, I agree with you that it would allow more control/power in terms of adding visual dimensions.

Maybe a good topic for a dinner discussion at our upcoming workshop? ;-)
knime.org/events

Cheers, Michael

unknown_user · October 9, 2007, 4:20pm

Sounds good. I'll think about some different ways we could do this. I'm probably not going to be at the conference, but my boss Bertrand Fabre will be there and we can hopefully discuss a few ideas we've had about Knime.

Corwin