Combining regression predictions into a row

jordan · November 7, 2012, 1:51pm

I have created a flow that passes test data through a groupby loop for analysis. The groupby loop performs a linear regression on several targets and then makes several predictions. The intended outcome is that the 5-10 rows representing a single unit are summarized into a single row. The only way I have found to accomplish this is to create an individual regression and prediction for each target of interest, so now I must somehow collect these individual predictions into a single row.

Can anyone suggest the best way to combine the individual predictions into a single row? Is there any way to build a regression model with several targets? As an example: a model to predict pressure and flow rate as a function of valve position. I know that it can be done with individual regression nodes, but I am hoping that there is an easier, more generic way.

In an ideal world, I would love to preserve hiliting through this groupby loop, but I don't believe the mapping feature that exists in the groupby node is implemented in the groupby loop. Is this assumption correct? Thanks to anyone who can help.
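To make the question concrete outside of KNIME, here is a minimal Python/NumPy sketch of "one regression, several targets, one summary row per unit". It is only an illustration under assumptions, not a KNIME node: the function names `fit_multi_target` and `summarize_groups`, the row layout, and the `predict_at` parameter are all hypothetical.

```python
import numpy as np

def fit_multi_target(x, Y):
    """Fit Y[:, j] ~ a_j + b_j * x for every target column j in one solve.

    Returns an array of shape (2, n_targets): row 0 holds the intercepts,
    row 1 holds the slopes.
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    A = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # handles all targets at once
    return coef

def summarize_groups(rows, predict_at):
    """Collapse the rows of each unit into one summary row.

    rows: iterable of (unit_id, valve_position, pressure, flow_rate)
          (hypothetical column layout).
    predict_at: valve position at which both targets are predicted.
    """
    groups = {}
    for unit, valve, pressure, flow in rows:
        groups.setdefault(unit, []).append((valve, pressure, flow))

    summary = []
    for unit, points in groups.items():
        pts = np.array(points, dtype=float)
        coef = fit_multi_target(pts[:, 0], pts[:, 1:])
        pred = coef[0] + coef[1] * predict_at      # one prediction per target
        summary.append({
            "unit": unit,
            "pressure_pred": float(pred[0]),
            "flow_pred": float(pred[1]),
        })
    return summary
```

Each dictionary in the returned list plays the role of the single combined row per unit, so no separate joining step is needed in this sketch.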

jordan · November 7, 2012, 4:48pm

I've managed to do this myself using the column joiner. The next part of my problem has to do with speed. Fitting a trend line (single-variable regression) through 500 five-point sets takes about 10 minutes. Is there any way to improve the speed of this process?
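For a sense of scale, the arithmetic itself is small. The closed-form least-squares slope and intercept for many short series can be computed in one pass, as in this NumPy sketch. It is written outside KNIME purely for illustration; the function name `fit_lines` and the `(n_sets, n_points)` array layout are assumptions.

```python
import numpy as np

def fit_lines(x, y):
    """Fit y ~ intercept + slope * x independently for each row.

    x, y: arrays of shape (n_sets, n_points), e.g. (500, 5).
    Returns (slope, intercept), each of shape (n_sets,).
    Assumes every row of x contains at least two distinct values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = x.mean(axis=1, keepdims=True)
    y_mean = y.mean(axis=1, keepdims=True)
    dx = x - x_mean
    # slope = covariance(x, y) / variance(x), computed per row
    slope = (dx * (y - y_mean)).sum(axis=1) / (dx ** 2).sum(axis=1)
    intercept = y_mean[:, 0] - slope * x_mean[:, 0]
    return slope, intercept
```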

Jordan

jordan · November 7, 2012, 5:16pm

Wow! Not surprisingly, the speed of execution is strongly influenced by the log level. By default, the log level is set to DEBUG, which writes a ton of messages. Switching to a less verbose log level reduced the time needed from 18 minutes to about 45 seconds.

So, I've blundered my way through two of my problems. Anybody know if supporting hiliting through a groupby loop is possible? If not, could it be implemented?

Thanks.

gabriel · November 7, 2012, 5:36pm

If you are using the Group Loop Start, or have enabled hiliting explicitly in the GroupBy node, and have un-checked the option 'Uniquify row IDs' in the Loop End node, then hiliting also works across the loop. In general, hiliting always works based on row IDs. If a node produces the same row IDs, hiliting works right away; in other nodes, such as Pivoting and GroupBy, hiliting needs to be enabled manually.
I can confirm that tons of messages printed in the KNIME Console can slow down workflow execution. This is simply because writing the output consumes resources that are then unavailable for executing the workflow.

jordan · November 7, 2012, 7:03pm

Thanks Gabriel.

I've found the ability to hilite across a groupby node, and I've also observed the ability to preserve hiliting across a regular loop (the "Uniquify row IDs" option must be disabled). However, I think the only way to perform hiliting across a groupby loop is if the loop builds a map, just as the groupby node does. Can you confirm that you have been able to preserve hiliting across a groupby loop start and loop end node?

The rows that I produce are summarized results. Basically, I don't expect them to preserve hiliting, because the incoming row IDs don't correspond one-to-one to the results. It is a many-to-one correspondence from input to results. Does a node exist to establish this link for hilite purposes?
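To illustrate the kind of link I mean, here is a small Python sketch of a many-to-one map from summary rows back to input row IDs. It is a conceptual illustration only and does not reflect how KNIME implements hiliting internally; the names `build_hilite_map` and `expand_hilite` and the example row IDs are hypothetical.

```python
from collections import defaultdict

def build_hilite_map(input_rows):
    """Record which input row IDs were summarized into each result row.

    input_rows: iterable of (row_id, group_key) pairs, where group_key
    doubles as the row ID of the summary row for that group.
    Returns {summary_row_id: set of input row IDs}.
    """
    mapping = defaultdict(set)
    for row_id, group_key in input_rows:
        mapping[group_key].add(row_id)
    return dict(mapping)

def expand_hilite(mapping, hilited_summary_ids):
    """Translate hilited summary rows into the input rows behind them."""
    hilited_inputs = set()
    for summary_id in hilited_summary_ids:
        # Unknown summary IDs simply contribute nothing.
        hilited_inputs |= mapping.get(summary_id, set())
    return hilited_inputs
```

With such a map, hiliting one summary row would select all 5-10 input rows of that unit.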

Regarding the logging: I think it would be valuable to make it clear to users that the default log level is set to "DEBUG", so that new users (like me) aren't discouraged by what could at first appear to be a very slow tool.