Hi KNIME Guys,
Many thanks for the new version with all the added functionality, which is much appreciated. I am using 32-bit Windows, and I have identified a couple of bugs and have some suggestions for improvements :-)
- In the Rule Engine node, if you enter a rule which is not accepted when you click "Add", it keeps adding quotation marks to the Outcome box every time you click "Add".
- In the Rule Engine node, selecting a column in the Outcome box and then ticking "Is A Column Reference" prevents the rule from being added. As far as I can tell the column is entered correctly by double-clicking on it, which adds its name within $ characters. This is a shame, as this is an exciting new feature I look forward to using.
- In the Rule Engine node, when you first open the dialog, clicking in the Rule box and then double-clicking on a column does not insert the column into the box.
- XLS Reader is becoming slower and slower with every release. Loading a spreadsheet of 200 columns by 2000 rows takes around 1 minute from selecting the file to executing the node.
- Also the Table Creator node is getting very slow, including when completing after clicking OK.
- Insert Column Header and Column Rename (Regex) work great, many thanks. However, "Insert Column Header" seems to be in an odd location; surely it should be with Column Rename in Column/Convert&Replace?
- Column List Loop Start is a great new node, but is it possible to have an option for the "excluded" columns to be either 1. passed through the node (current situation), or 2. not passed through the node (would be more useful)? It is often useful to loop across individual columns one at a time. The current behaviour does not seem that useful unless you plan to loop across all the columns, as otherwise you end up with lots of duplicated columns, unless I am missing a feature of this node.
- The new metanode features are great, especially being able to collapse nodes into a metanode and expand them again. One missing feature is being able to modify the number of in ports and out ports of an existing metanode. It would also have been nice to be able to expand and collapse metanodes for nodes which have already been run.
- Really like the new Ensemble Learning nodes, which add a lot more versatility to the modelling facilities, but some bugs seem to be present (below).
- In the Delegating metanode, should the Column Filter not be set to Enforce Exclusion of the Prediction column only? Otherwise all the data columns are lost on the second loop iteration. Also, can the Java Snippet node be made more user friendly, with a config box to select the real value column and the prediction column, maybe as a "True or False Match" node? A lot of users like myself are terrified of Java programming! I don't want to be editing this text string every time I change the model or real value column.
- The Voting Loop End doesn't seem to report the most frequent value correctly all the time. Looking at the individual winner columns, the reported "most frequent winner" column does not match what I get from simply tallying the individual winner columns manually. I have tried numerous datasets with the same issue. Also, can there be an option to only report the "most frequent winner" column instead of being forced to have all the individual winner columns?
- Maybe I am doing something wrong with the Boosting Learner, but after just one iteration the loop fails for me with the warning "Prediction Error to big. Finishing", even though the prediction error is only 0.153 and the model weight is 3.5. I would have said 0.153 is a low error. Other model learners (Naive Bayesian) giving initial errors of 0.5 work fine in the metanode, producing many loop iterations to refine the model. So surely either the message is incorrect and should say "Prediction Error too small. Finishing", or something else is going wrong.
- The "Enh 2724: Feature Elimination node to have filter by error threshold option" is a nice addition. However, is it possible for the selected outcome to be highlighted in the node config box, so the user can go back into the node to see which outcome was automatically selected? It's a little unfriendly to have to go to the node preview window to work out which columns were picked.
- Parallel Chunk Looping nodes work well. Are there plans to include an option for existing looping nodes, such as the "TableRow to Variable LoopStart", to run in parallel?
- The additional controls around loops with stepping are also a nice feature for keeping track of what is happening within a loop when things go wrong! They make troubleshooting easier.
- Really good to have the Fingerprint Bayesian nodes, a very welcome addition! However, I don't quite understand how the scoring works on the Target Class, other than negative is bad and positive is good. Just what is a good score? A percentage confidence would be more useful if possible.
- With the Column Merger, a missing option is to take the average when there is data for both primary and secondary columns, otherwise another useful addition.
- Column Resorter node still fails when new columns are introduced. It should put new columns at the end and continue, just providing a warning that new columns were found. It is frustrating to have to go back through workflows when columns change.
- Numeric Binner node fails when new columns are introduced or removed, even when it doesn't affect the columns being binned.
- The Binner Dictionary node works great and doesn't suffer from the problem mentioned above, where the Numeric Binner node fails when unrelated columns are added/removed.
- Statistics node is still missing "Total Datapoints count" despite having a "Missing Values" count which seems a little bizarre.
- In the XLS Writer node, in addition to the overwrite data option, please can we have an "Append Data" option.
- In the Maths Formula node, is it possible to have a "Variables List" box beside the "Column List" box? I constantly find myself having to convert my variables into a column so I can do some mathematical calculations with them.
- Also in the Maths node, is it possible to have an operator to specify the number of decimal places or significant figures?
- Can the GroupBy node have the geometric mean added to the aggregation options, besides the standard arithmetic mean?
- In the String Replacer node, is it possible to replace strings within the RowID column? Being able to replace strings across multiple columns would also be useful.
- Also, can the String Replacer Dictionary be given a second in port through which to load the dictionary table? That way you are not reliant on an external file.
- Strange Behaviour with the Annotation boxes around Copy and Paste. If you highlight a string inside an Annotation Box and then copy to Clipboard, if you then click somewhere else in the Annotation box to paste this string, instead of doing this it actually pastes another entire Annotation box.
- In the Dictionary Tagger node (Text Processing), can we have a second in port to enter the definitions for the dictionary instead of loading an external text file? This would be more user friendly (so the second in port would take a table of one column containing the definitions to match). Also, the Named Entity tag with only 6 options is limiting; this would be better as an open box for the user to enter their own description.
- Also on Text Processing, are there any plans for a Chemistry Tagger in the Enrichment section, such as the OSCAR Java software implementation? This was something mentioned a while back. http://apidoc.ch.cam.ac.uk/oscar3/
- In the Table Viewers, either the Interactive Table Viewer or just previewing the data in the node, we need the facility to drag the column header down so that the column header name can be wrapped over multiple lines. Single-line column headers are a significant inconvenience when you have lots of column names which are similarly named apart from at the end.
- In the Data Viewers, is it possible to have an option in the data view nodes such as Histogram, Scatterplot etc. to set up the view just as you want it and then have a "Set as Default" button, so that every time the viewer is opened it loads the view settings you defined, such as the axis scale, the x and y datasets, dot size, bar size etc.? This would be useful when passing workflows on to others with whom you want to share a specific analysis, as the view would be set up automatically. Likewise, when you come back to reanalyse the data at a later point, you will want the same views set up as before. At the moment it is quite painful to set up the view every time you open the node viewer.
- Also in the Rule Engine node, is it possible to specify an option to play a basic Windows audio WAV file in the outcome, as an audio alarm for when the rule condition is met? This can be useful to alert the user to a specific event.
- Can the "Hilite Collector" node be improved such that the node remains in execution mode until the user enters their desired content in the Hilite Collector window and clicks on a Finish button? The current situation allows annotations to be made when the "Hilite Collector" node has already been executed, resulting in the annotations not being applied.
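
For the Voting Loop End issue above, the manual check can be sketched roughly as follows. This is only a minimal Python sketch of the comparison, not how the node works internally. The function names and the data layout (one list of individual winner values per row, plus a list of the reported values) are assumptions made for illustration, and ties are resolved by first occurrence, which may well differ from what the node does.

```python
from collections import Counter


def most_frequent_winner(winners):
    """Return the most frequent value among one row's individual winners.

    Assumption: ties go to the value that appears first in the row.
    The node's real tie-breaking rule is not known here.
    """
    if not winners:
        return None
    counts = Counter(winners)
    best = max(counts.values())
    for value in winners:
        if counts[value] == best:
            return value


def find_mismatches(rows, reported):
    """Return the indices of rows whose reported winner differs
    from the winner obtained by tallying the individual columns."""
    return [
        index
        for index, (winners, shown) in enumerate(zip(rows, reported))
        if most_frequent_winner(winners) != shown
    ]


if __name__ == "__main__":
    # Hypothetical example data, not taken from a real workflow.
    individual_winners = [["A", "B", "A"], ["B", "B", "A"]]
    reported_winner = ["A", "A"]
    print(find_mismatches(individual_winners, reported_winner))
```

Any row index returned by `find_mismatches` is a row where the reported "most frequent winner" disagrees with a plain tally, which makes it easier to attach concrete rows to a bug report.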
Thanks for the updates,
Simon.