Hi,
(I hope this is the right forum to post feedback on the KNIME source code. I also wanted to ask whether there is a more "direct" way to suggest patches.)
I ran into a potential bottleneck in the ARFF Reader.
In the function extractNominalValues, all values are extracted from the ARFF header and stored in a Vector. To ensure that the values are unique, it is verified that each value occurs only once.
This verification step is pretty expensive for large domains (presumably quadratic, since each check has to scan the Vector). However, it can easily be made more efficient by replacing the Vector with a Set.
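To illustrate the difference, here is a minimal sketch. It is not the actual KNIME code: the class and method names (NominalDedup, uniqueWithVector, uniqueWithSet) are hypothetical, and I'm assuming duplicates are simply skipped. With a Vector, every contains() call walks the list, so n values cost O(n^2) comparisons in total; with a hash-based Set, each add() is expected O(1).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.Vector;

// Hypothetical sketch, not KNIME's extractNominalValues.
public class NominalDedup {

    // Vector-based check: contains() scans the whole vector,
    // so deduplicating n values takes O(n^2) comparisons overall.
    static List<String> uniqueWithVector(List<String> values) {
        Vector<String> result = new Vector<>();
        for (String v : values) {
            if (!result.contains(v)) {
                result.add(v);
            }
        }
        return result;
    }

    // Set-based check: add() is expected O(1) for a hash set.
    // LinkedHashSet additionally keeps the first-seen (header) order.
    static List<String> uniqueWithSet(List<String> values) {
        Set<String> seen = new LinkedHashSet<>();
        for (String v : values) {
            // add() returns false if v was already present; a reader that
            // must reject duplicates could throw here instead of skipping.
            seen.add(v);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> values = List.of("red", "green", "red", "blue", "green");
        System.out.println(uniqueWithVector(values)); // [red, green, blue]
        System.out.println(uniqueWithSet(values));    // [red, green, blue]
    }
}
```

Both versions return the same result here; only the cost of the uniqueness check differs. (List.of needs Java 9 or later.)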
As far as I know, the order of the attributes is crucial in the context of Weka (nodes). As I do not know whether KNIME respects the order internally, I'm not sure whether to suggest a HashSet or an ordered set (e.g. a LinkedHashSet).
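In case it helps decide, this small sketch shows how the standard Java sets differ in iteration order. The helper name inOrder and the sample values are only illustrative. A LinkedHashSet keeps insertion order (i.e. the order in the ARFF header), a TreeSet sorts the values (which would change their indices), and a HashSet's order is unspecified.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class NominalOrder {

    // Hypothetical helper: insert the header values into the given set
    // and return them in that set's iteration order.
    static List<String> inOrder(Set<String> target, List<String> headerValues) {
        target.addAll(headerValues);
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        List<String> header = List.of("sunny", "overcast", "rainy");
        // LinkedHashSet: iteration follows insertion order (header order).
        System.out.println(inOrder(new LinkedHashSet<>(), header)); // [sunny, overcast, rainy]
        // TreeSet: iteration is sorted, so value indices would change.
        System.out.println(inOrder(new TreeSet<>(), header));       // [overcast, rainy, sunny]
        // HashSet: iteration order is unspecified; don't rely on it.
        System.out.println(inOrder(new HashSet<>(), header));
    }
}
```

So if the order does matter, a LinkedHashSet would give the speed-up of a hash set while keeping the header order.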
Cheers,
Ingo