KNIME 2.4 Comments

Hi KNIME Guys,

Many thanks for the new version with all the added functionality, which is much appreciated. I am using 32-bit Windows, and I have identified a couple of bugs and have some suggestions for improvements :-)

- In the Rule Engine node, if you enter a rule which is not accepted when you click "Add", it continues to add quotation marks to the outcome box every time you click "Add".

- In the Rule Engine node, selecting a column in the Outcome box and then ticking "Is A Column Reference" prevents the rule from being added. As far as I can tell the column is entered correctly by double clicking on it, which adds its name within $ characters. This is a shame, as this is an exciting new feature I look forward to using.

- In the Rule Engine node, when you first open the dialog, clicking in the Rule box and then double clicking on a column does not insert the column into the box.

- XLS Reader is becoming slower and slower with every release. Loading a spreadsheet of 200 columns by 2000 rows takes around 1 minute from selecting the file to running the node.

- Also the Table Creator node is getting very slow, including when completing after clicking OK.

- Insert Column Header and Column Rename (Regex) work great, many thanks. However, the location of "Insert Column Header" seems odd; surely it should be with Column Rename in Column/Convert&Replace.

- Column List Loop Start is a great new node, but is it possible to have the option for the "excluded" columns to be either 1. passed through the node (current situation), or 2. not passed through the node (would be more useful). It is often useful to loop across individual columns at a time, and this would be useful. The current situation with the "Column List Loop Start" does not seem that useful unless you plan to loop across all the columns as otherwise you end up with lots of duplicated columns, unless I am missing a feature of this node.

- The new metanode features are great, being able to collapse nodes into a metanode and expand them again. One missing feature is being able to modify the number of in ports and out ports of an existing metanode. Also it would have been nice to be able to expand and collapse metanodes on nodes which have already been run.

- Really like the new Ensemble Learning nodes, which add a lot more versatility to the modelling facilities, but some bugs seem to be present (below).

- In the Delegating metanode, should the Column Filter not be set to Enforce the Exclusion of the Prediction column only? Otherwise all the data columns are lost on the second loop iteration. Also, can the Java Snippet node be made more user friendly, with a config box to select the real value column and prediction column, maybe as a "True or False Match" node? A lot of users like myself are terrified of Java programming! I don't want to be editing this text string every time I change the model or real value column.

- The Voting Loop End doesn't seem to report the most frequent value correctly all the time. Looking at the individual winner columns, the reported "most frequent winner" column does not match what I get from simply adding "the individual winner" columns up manually. I have tried numerous datasets with the same issue. Also, can there be an option to only report the "most frequent winner" column instead of being forced to have all the individual winner columns?

- Maybe I am doing something wrong with the Boosting Learner, but after just one iteration the loop fails for me with the warning "Prediction Error to big. Finishing", yet the prediction error is only 0.153, with a model weight of 3.5. I would have said 0.153 is a low error. Other model learners (Naive Bayesian) giving initial errors of 0.5 work fine in the metanode, producing many loop iterations to refine the model, so surely either the error message is incorrect and should be saying "Prediction Error too small. Finishing", or something else is going wrong.

- The "Enh 2724: Feature Elimination node to have filter by error threshold option" is a nice addition. However, is it possible for the selected outcome to be highlighted in the node config box, so the user can go back into the node to see which outcome was automatically selected? It's a little unfriendly to have to go to the node preview window to work out which columns were picked.

- Parallel Chunk Looping nodes work well. Are there plans to include an option for existing looping nodes, such as the "TableRow to Variable LoopStart", to run in parallel?

- The additional controls around Loops with Stepping are also a nice feature for keeping track of what is happening within a loop when things go wrong! They make troubleshooting easier.

- Really good to have the Fingerprint Bayesian nodes, a much welcome addition! However, I don't quite understand how the Scoring works on the Target Class, other than negative is bad and positive is good. Just what is a good score? A percentage confidence would be more useful if possible.

- With the Column Merger, a missing option is to take the average when there is data for both primary and secondary columns, otherwise another useful addition.

- Column Resorter node still fails when new columns are introduced. It should put new columns at the end, continue, and just provide a warning that new columns were found. It is frustrating to go back through workflows when columns change.

- Numeric Binner node fails when new columns are introduced or removed, even when it doesn't affect the columns being binned.

- The Binner Dictionary node works great and doesn't suffer from the problem mentioned above, where the Numeric Binner node fails when unrelated columns are added/removed.

- Statistics node is still missing "Total Datapoints count" despite having a "Missing Values" count which seems a little bizarre.

- In the XLS Writer node, in addition to the overwrite data option, please can we have an "Append Data" option.

- In the Maths Formula node, is it possible to have a "Variables List" box besides the "Column List" box. I constantly find myself having to convert my variables into a column so I can do some Mathematical calculations with it.

- Also in the Maths node, is it possible to have an operator facility to specify the number of decimal points or significant figures.

- Can the GroupBy node have geometric means added to the aggregate options besides the standard arithmetic mean.

- In the String Replacer node, is it possible to replace Strings from within the RowID column. Being able to replace Strings across multiple columns would also be useful.

- Also can the String Replacer Dictionary be given a second in port to load the Dictionary table by. That way you are not reliant on an external file.

- Strange Behaviour with the Annotation boxes around Copy and Paste. If you highlight a string inside an Annotation Box and then copy to Clipboard, if you then click somewhere else in the Annotation box to paste this string, instead of doing this it actually pastes another entire Annotation box.

- In the Dictionary Tagger node (Text Processing), can we have a second in port to enter in the definitions for the dictionary instead of loading an external text file, this would be more user friendly (so 2nd in port would take a table of one column containing the definitions to match). Also the Named Entity tag of 6 options is limited, this would be better as an open box for the user to enter in their description.

- Also on Text Processing, are there any plans for a Chemistry Tagger in the Enrichment section, such as the OSCAR Java software implementation? This was something mentioned a while back. http://apidoc.ch.cam.ac.uk/oscar3/

- In the Table Viewers, either the Interactive Table Viewer or just previewing the data in the node, we need the facility to drag the column header down, so that the column header name can be wrapped over multiple lines. Single-line column headers are a significant inconvenience when you have lots of column names which are similarly named apart from at the end.

- In the Data Viewers, is it possible to have an option in the data view nodes such as Histogram, Scatterplot etc. to set up the view just as you want it and then have a button to "Set as Default", such that every time the viewer is opened it loads the view settings you defined, such as the axis scale, the x and y datasets, dot size, bar size etc.? This would be useful when passing workflows on to others with whom you want to share a specific analysis, as the view would be set up automatically. Or even when you come back to reanalyse the data at a later point, you will want the same views set up as before. At the moment it is quite painful to set up the view every time you open the node viewer.

- Also in the Rule Engine node, is it possible to specify an option to play a basic Windows audio WAV file as the outcome, as an audio alarm for when the rule condition is met? This can be useful to alert the user to a specific event.

- Can the "Hilite Collector" node be improved such that the node remains in execution mode until the user enters their desired content in the Hilite Collector window and clicks on a Finish button. Current situation allows annotations to be made when the "Hilite Collector" node has already been executed resulting in the annotations not being applied.
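For anyone wanting to repeat the manual check on the Voting Loop End output described above, here is a minimal Python sketch of a per-row majority vote. It is not KNIME's implementation; the function name `most_frequent_winner`, the sample data and the tie rule (the first value seen wins) are assumptions made purely for illustration.

```python
from collections import Counter


def most_frequent_winner(winners):
    """Return the most frequent value among one row's individual winner columns.

    Assumption: ties are broken in favour of the value that appears first.
    """
    if not winners:
        raise ValueError("at least one winner value is required")
    counts = Counter(winners)
    best = max(counts.values())
    # Walk the values in their original order so ties resolve predictably.
    for value in winners:
        if counts[value] == best:
            return value


# Each inner list holds the individual winners for one row (made-up data).
rows = [["active", "inactive", "active"], ["inactive", "inactive", "active"]]
print([most_frequent_winner(row) for row in rows])
```

Comparing the result of such a check with the reported "most frequent winner" column, row by row, would show exactly which rows disagree.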

Thanks for the updates,

Simon.

I would like to add a few feature requests:

1) Node to remove empty columns

2) Filter Column (ID) using a Regex.

3) Normalization nodes should also offer median and MAD schemes (i.e. x/Median and (x-Median)/MAD)

4) Group by should also have robust statistics (percentile, quartile, MAD...)

5) Text replace using regex (cf. previous post).

6) A few nodes around Non-Linear discriminant analysis node would be great.

7) It would be nice if the Spotfire node (in KNIME Tech) could support 64-bit platforms. I imagine that 32-bit computing is slowly set to disappear!
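The two normalization schemes in item 3 can be sketched in a few lines of Python. This is only an illustration of the formulas, not a proposal for how the KNIME node would be written; the function names are made up, and the MAD here is the raw median absolute deviation with no consistency constant applied (an assumption).

```python
from statistics import median


def median_scale(values):
    """Scheme 1: x / Median."""
    med = median(values)
    if med == 0:
        raise ValueError("median is zero, so x / Median is undefined")
    return [x / med for x in values]


def median_mad_scale(values):
    """Scheme 2: (x - Median) / MAD, where MAD = median(|x - Median|)."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero, so (x - Median) / MAD is undefined")
    return [(x - med) / mad for x in values]


data = [1, 2, 3, 4, 100]  # made-up values with one outlier
print(median_mad_scale(data))
```

Because both the centre and the spread are medians, the single outlier in the sample data does not distort the scaling of the other values.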

Thank you very much for this detailed analysis!

This is valuable feedback.

We will fix the bugs and we have the enhancements on our list now!

Thank you for your time and effort.

Best regards,

 - Peter.

About the Delegating node:

No, the filter is set correctly, as we only want to pass the original columns back to the loop node.

I thought about your second comment, but I don't think I can make it any easier with the current nodes. One could use the Rule Engine, but then that would have to be changed every time too.

Thank you for your comments.

Iris

Hi Iris,

For the Delegating metanode:

Yes, we only want the original columns back to the loop start node, but the current settings of the Column Filter seem to remove everything except the RowIDs, so all the columns are lost and just the RowIDs are sent back to the loop start. If the Column Filter setting is changed to exclude only the Prediction column with the "Force Exclusion" setting, then this would have the desired outcome, in that any other columns the node sees will be retained (i.e. all columns retained except the predictions).
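As an illustration of the difference described above, here is a minimal Python sketch of the two filter behaviours. It only models the semantics as described in this thread and is not KNIME code; the function names and column names are made up.

```python
def enforce_inclusion(columns, include):
    """Keep only the listed columns; anything not listed is dropped."""
    return [c for c in columns if c in include]


def enforce_exclusion(columns, exclude):
    """Drop only the listed columns; any other column passes through."""
    return [c for c in columns if c not in exclude]


columns = ["Col A", "Col B", "Class", "Prediction"]  # hypothetical names

# Excluding just the prediction keeps every data column for the next iteration.
print(enforce_exclusion(columns, {"Prediction"}))

# An include list that matches none of the incoming columns passes nothing on.
print(enforce_inclusion(columns, set()))
```

With exclusion enforced, columns the filter has never seen before are retained, which is the behaviour wanted for the loop.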

I agree that with the current set of nodes the Java Snippet is the easiest way, but it is still not ideal (as this has to be changed every time too for different datasets and different Mining nodes). The best solution would be a dedicated "True or False" node, which would be much more user friendly to set up.

Thanks

Simon.

Hi,

A new feature in KNIME 2.4 which I feel has made KNIME worse compared to the previous version is the node repository quick search box.

In KNIME 2.4 it updates the nodes matching the string as-you-type. This makes entering a string in the box very slow and unresponsive. Please revert to the previous way (version 2.3.4 and earlier) of entering a string first, then pressing return, and then showing the nodes matching the string. Either that or find a way of drastically improving the node searching.

 

Thanks,

Simon.

I found it very useful Simon. Since the whole tree is expanded, it does not need 3 clicks to reach the terminal node as earlier. It may be advisable to set this behaviour in preferences rather than reverting?

What IS annoying though are the file IO related nodes, which open at the OS's default location even though one has changed to some other directory. My preference is to open the last opened location/directory. In most file open/save dialogs it is very easy to navigate to the desktop or home.

Hi Simon,

We added this as a new, often requested feature and it works fine (and fast) on our end. Can you let us know what kind of system you use so we can find out what causes the slow down on your end? How many node extensions do you have installed; we have tried it with more than 600 nodes without any performance problem.

Thanks.

Hi Gabriel, I work at Lilly with Mike Bodkin who you may be familiar with.

I don't believe we have more than 600 nodes (Schrodinger, MOE, Tripos, Think are custom addons we have).

It seems to take around 3 seconds per letter to type. It doesn't sound like long, but it is a little frustrating trying to see what you are typing, as when you finish a word you are waiting some while for it to catch up.

PC wise, it's 2 GB, Windows XP SP3, 3 GHz CPU.

Simon.

I would have to agree that the new behaviour is, in practice, a retrograde step.  I find that the lag when typing is just annoying, rather than unworkable - but the lag when deleting letters seems worse - eg type "row filter" and it is a bit sluggish, but ok.  However, if I want to delete what I have just typed, by the time I get to deleting the 'o', then the 'r' it gets really sticky (as the nodes all collapse back into their respective parent categories)!

I also find that if a search brings back a lot of matches, then the list doesn't default to the top (eg just typing 'filter') - which seems a bit arbitrary.

I am currently running both 32-bit and 64-bit KNIME on 64-bit Windows 7, i7 CPU, 8 GB RAM.

Kind regards

James

Hi James,

I completely agree that deleting characters after you have typed a word into the box is sluggish to say the least.

Deleting characters is much much slower than adding characters.

Simon.

Hello Simon,

I'm happy to report that the group by node will support geometric mean as well as counting of FALSE and TRUE values of boolean cells with the next version of KNIME (2.4.1).

I hope to add more robust statistic operations with the version after the next one. If you are missing further aggregation methods feel free to contact me or submit a feature request in the forum.

Bye,

Tobias

That is brilliant to include the geometric mean. Is it possible to include the geometric standard deviation too, for completeness? There will be a lot of happy people with the geometric mean being in the next release.
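For reference, the two measures can be sketched in Python as follows. This is only an illustration of the usual definitions, not the GroupBy implementation; the function names are made up, and using the sample (n-1) standard deviation of the logarithms is an assumption.

```python
import math
from statistics import mean, stdev


def geometric_mean(values):
    """exp of the arithmetic mean of ln(x); all values must be positive."""
    logs = [math.log(x) for x in values]  # math.log raises ValueError for x <= 0
    return math.exp(mean(logs))


def geometric_std(values):
    """exp of the standard deviation of ln(x).

    Assumption: the sample (n-1) standard deviation is used, so at least
    two values are required.
    """
    logs = [math.log(x) for x in values]
    return math.exp(stdev(logs))


data = [1, 10, 100]  # made-up values spread over two orders of magnitude
print(geometric_mean(data), geometric_std(data))
```

The geometric standard deviation is a multiplicative factor rather than an additive spread, so it is read as "times or divided by" the geometric mean.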

Other useful statistical measures would be;

The top quarter percentile value  (Top Quartile) and the bottom quarter percentile value (Bottom Quartile).  The reason this is useful is sometimes the max and min end up reporting back the outliers within your dataset, but what you want to see is the max and min values of where the majority of the dataset lies. (i.e. the upper and lower quarter percentiles). It would be more versatile if the percentiles could be specified, i.e. in case you want the 10th and 90th percentile values. May be more difficult to implement, but would add a lot more versatility.
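To show the effect being asked for, here is a minimal Python sketch of a user-specified percentile. Linear interpolation between neighbouring ranks is an assumption (several percentile conventions exist), and the function name and data are made up for illustration.

```python
import math


def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    rank = (len(ordered) - 1) * p / 100
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # made-up values with one outlier
print(max(data), percentile(data, 75), percentile(data, 25))
```

In the sample data the maximum reports the outlier, while the upper quartile stays within the range where most of the values lie.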

Another useful aggregation technique, which again may be tricky to implement, is a binning percentage count. For a column of binned values, say with 3 bins ("Low Metabolism", "Medium Metabolism" and "High Metabolism"), it would return three new aggregation columns called "Bin Percentage Count - Low Metabolism", "Bin Percentage Count - Medium Metabolism" etc., with the percentage count in each.
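The idea can be sketched in Python for a single group as follows. This is only an illustration of the requested output, not an existing KNIME aggregation; the function name and sample rows are made up.

```python
from collections import Counter


def bin_percentage_counts(labels):
    """Map each bin label found in one group to its share of rows, in percent."""
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {
        f"Bin Percentage Count - {label}": 100.0 * n / total
        for label, n in counts.items()
    }


# Binned values for the rows of one group (made-up data).
group = ["Low Metabolism", "Low Metabolism", "High Metabolism", "Medium Metabolism"]
for column, pct in bin_percentage_counts(group).items():
    print(column, pct)
```

Note that this sketch only produces columns for bins that actually occur in the group; a real aggregation would probably need to emit a zero for bins that are absent.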

 

Many thanks Tobias for these implementations, KNIME is just getting better and better, and consequently more and more addictive and powerful. I hope the KNIME team didn't mind this huge bug list and improvement list, I just want it better and better.

Simon.

Hi All,

On my end the selection is not really sluggish, though I must admit that when the selection is cleared, it would be great if the view could stay at the last value you selected.

Best,

Ghislain

Dear All

I am very happy with the node repository quick search box in KNIME 2.4; it would be a step backwards to revert it to how it used to function.

Best wishes

Stephen

> The "Enh 2724: Feature Elimination node to have filter by error
> threshold option" is a nice addition, however, is it possible for
> the selected outcome to be highlighted in the node config box so user
> can go back into the node to see which outcome was automatically
> selected, its a little unfriendly to go to the node preview window to
> work out which columns were picked.

Actually, I'm seeing the selected features when I open the dialog in the feature list on the right. Is this not what you want?

Hi Thor,

This seems to be working now in KNIME 2.4.1; it didn't seem to work for me when "Select Features Automatically" was selected in 2.4. The only minor criticism is whether it could also highlight the left hand table to show the error/no. of features row it selected, as this does not show when "Select Features Automatically" is selected.

Thanks,

Simon.

We have addressed your concerns in the latest bugfix release and added a Java property that allows you to disable the live search via knime.ini.

From the change log:
Enh 2809: Added Java property to disable live search in node repository (add -Dknime.repository.non-instant-search=true to knime.ini; temporary workaround for problem reported in forum discussion)

Dominik

Some of the issues above have been addressed in the newest KNIME v2.4.1 bugfix release, please see the changelog for details. Thanks again for your valuable input and please keep us updated :)