Feature suggestions for the Color Manager node

The owner of the tarantula-infested hotel goes to his MBA brother-in-law for advice, who tells him, “Problems are opportunities in disguise!” Inspired by this wisdom, the owner decides to turn his business into a theme hotel for arachnophiles. Visitors are encouraged to catch the tarantulas and bring them to the kitchen, where the tarantulas become ingredients in all kinds of unusual fusion dishes. This attracts the attention of the nutrition science department of the nearby university, which wants to know whether tarantulas are a good source of protein. They send a group of students, who start tracking the weights of the guests and comparing them to those of a vegetarian control group. The students collect the data and process it with a KNIME workflow:

image

image

The professor is impressed by the students’ work, especially by the fact that the vegetarians and the tarantularians can be easily distinguished in the plot by the line colors. However, he sends them back to collect more data in order to achieve statistical significance. A few days later, the students have added one more tarantularian and one more vegetarian.

image

Uh oh:

image

The workflow cannot be executed without reconfiguring the Color Manager. How awesome would it be if one could color the lines using regular expressions… In this case, color all columns that match “Tarantularian[0-9]+” blue and all columns that match “Vegetarian[0-9]+” green. Then we could keep adding data to the table without having to worry about the Color Manager.
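
To make the idea concrete, here is a minimal Python sketch of regex-based coloring. It is not KNIME code: the rule list, the hex values standing in for blue and green, and the function name `color_for_column` are assumptions for illustration. Only the two patterns come from the suggestion above.

```python
import re

# Hypothetical rule list of (pattern, color) pairs. The hex values are
# assumed stand-ins for "blue" and "green".
COLOR_RULES = [
    (re.compile(r"Tarantularian[0-9]+"), "#0000FF"),
    (re.compile(r"Vegetarian[0-9]+"), "#008000"),
]


def color_for_column(name, rules=COLOR_RULES, default=None):
    """Return the color of the first rule whose pattern matches the whole name."""
    for pattern, color in rules:
        if pattern.fullmatch(name):
            return color
    return default


columns = ["Tarantularian1", "Tarantularian2", "Tarantularian3",
           "Vegetarian1", "Vegetarian2"]
colors = {name: color_for_column(name) for name in columns}
```

With rules like these, a newly added column such as Tarantularian4 would pick up its color without any reconfiguration, and a column matching no rule would fall back to the default.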

A second suggestion: the Color Manager’s feature for coloring data by a continuous variable, aka the “color range” functionality, is a bit limited because only two colors can be used. It seems to be impossible to generate a more complex color range such as:

image
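
As a rough illustration of what a multi-color range would mean, the sketch below interpolates linearly between any number of color stops. The stop positions and RGB values (blue, white, red) are assumed for the example and are not taken from the image above. The function name `gradient_color` is hypothetical as well.

```python
from bisect import bisect_right


def gradient_color(value, stops):
    """Piecewise-linear interpolation between (position, (r, g, b)) stops.

    `stops` must be sorted by strictly increasing position. Values outside
    the covered range are clamped to the first or last color.
    """
    positions = [position for position, _ in stops]
    if value <= positions[0]:
        return stops[0][1]
    if value >= positions[-1]:
        return stops[-1][1]
    # Index of the first stop that lies strictly to the right of `value`.
    i = bisect_right(positions, value)
    (p0, c0), (p1, c1) = stops[i - 1], stops[i]
    t = (value - p0) / (p1 - p0)
    return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))


# Assumed example with three stops: blue -> white -> red.
STOPS = [(0.0, (0, 0, 255)), (0.5, (255, 255, 255)), (1.0, (255, 0, 0))]
midpoint = gradient_color(0.5, STOPS)
```

A two-color range is just the special case with two stops, so a feature like this would not have to change the existing behavior.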

Best
Aswin

Hello @Aswin,

Thank you for your awesome story and for pointing out the issues you face. You are right about both observations, and I will open development tickets for them.
However, there is a fairly easy workaround for the first one: take the names in the ID column, remove the number at the end, colorize the unique values, and send this color table to the <line plot>. This involves only two extra nodes, as you can see in the green area of the attached workflow. The <table creator> in the green area just represents your updated data with Tarantularian3 and Vegetarian2.
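
The renaming step in this workaround amounts to stripping the trailing digits from each name and coloring by the resulting label. Outside KNIME, the same logic could be sketched in Python roughly as follows. The function name `group_label` and the palette are illustrative assumptions, and this does not reproduce the nodes in the attached workflow.

```python
import re


def group_label(column_name):
    """Drop a trailing run of digits, e.g. 'Tarantularian3' -> 'Tarantularian'."""
    return re.sub(r"[0-9]+$", "", column_name)


names = ["Tarantularian1", "Tarantularian2", "Tarantularian3",
         "Vegetarian1", "Vegetarian2"]

# One color per unique label. The palette is an assumption for illustration.
palette = {"Tarantularian": "blue", "Vegetarian": "green"}

# First column: original name. Second column: label the color is defined on.
color_table = [(name, group_label(name), palette[group_label(name)])
               for name in names]
```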


color-manager-improvement.knwf (25.0 KB)
Best regards,
Kevin

Dear @kevin_sturm,

thank you, it works! :smiley: :+1: But the table generated by “Color Manager” now contains 2 columns.

From the “Line Plot” help text:

The node automatically selects the column from the Color Manager and assigns the color values to the plotted columns.

…and…

Input Ports: 1: Data table containing one column with the column names of table which has in addition a color assigned. (optional)

Now that the color table has 2 columns instead of one, how does the Line Plot node know which column to take?

To answer my own question: it seems that the “Line Plot” node simply takes the first column, no matter what the name is. If the columns are moved around with the “Column Resorter” node, your solution stops working.

Maybe the help text can be made more specific?

I also would like to add that your solution does not work for the “Line Chart (JFreeChart)” node (see also this earlier post). I still often use the “Line Chart (JFreeChart)” node because it generates png images of the charts 1-2 orders of magnitude faster than the “Line Plot” node.

Thanks again and best regards,
Aswin

Thank you @Aswin for your reply and observations.

You are right, the description of the <line plot> is not very specific about the expected dimensions of the input table. The node ignores all columns after the first one and uses only the first column, including its assigned colors. Removing the other columns with a <column filter> does not work, because it also removes the colorization (the colors were defined on the removed column).

This is only a problem in the case you described, when using the older <line chart (jfreechart)>. We are sorry to hear that the difference between the two nodes when creating a png is so unexpectedly large. Can you give us some numbers here? Apart from that, we do not necessarily recommend using the old <line chart (jfreechart)> anymore.

When working with the <line plot> of the Plotly extension, the workaround shown does not work either, since this node expects the input data in a different format. Because there is no additional (and optional) input port for colors, your entire input table needs to have colors already assigned. That might make it complicated to achieve the exact same plot in some cases, since the table cannot be transposed.

Best regards,
Kevin

Dear @kevin_sturm

this is a typical part of my workflow where I create a table with 928 miniplots of 300x200 pixels using a group loop:

In the top branch I use “line chart (jfreechart)”. This takes 35 seconds.

In the middle branch I replaced it with “line plot”. This takes 649 seconds, almost 20x slower.

Note that “line plot” can only produce svg images. If I want png, I have to add a “Renderer to Image” node, which I do in the bottom branch. Now the loop takes even longer, 705 seconds.

Now you perhaps think “that’s still 1 order of magnitude, not 2”. In the following example I plot a typical output of my lab equipment… 58985 points:
image

The JFreeChart version takes 0.32 seconds, the “line plot” version needs 57 seconds.
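
Spelled out, the ratios behind these numbers are as follows. This is a small Python check that uses only the timings quoted above. The helper name `slowdown` is just for illustration.

```python
import math


def slowdown(slow_seconds, fast_seconds):
    """Return (ratio, orders of magnitude) for two run times."""
    ratio = slow_seconds / fast_seconds
    return ratio, math.log10(ratio)


# "line plot" vs. "line chart (jfreechart)" for the 928 miniplots
loop_ratio, loop_orders = slowdown(649, 35)
# same loop with the "Renderer to Image" node added
png_ratio, png_orders = slowdown(705, 35)
# single plot with 58985 points
single_ratio, single_orders = slowdown(57, 0.32)

# prints: 18.5 20.1 178.1
print(round(loop_ratio, 1), round(png_ratio, 1), round(single_ratio, 1))
```

So the loop is a bit more than one order of magnitude slower, and the single large plot is a bit more than two.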

The sluggishness of “line plot” is, I guess, a result of the fact that it is written in JavaScript and not in Java like most of KNIME. I therefore do not have much hope that it can ever come close to the performance of “line chart (jfreechart)”, so I hope that the JFreeChart nodes will remain in KNIME for the foreseeable future…

Best
Aswin
