Problems with Vernalis Benchmark Nodes - Please help

Hi all,
I have a huge workflow with over 600 nodes. Part of it consists of the Excel file formatter nodes in KNIME, which take roughly 50% of the processing time.
I want to measure the time for the pure workflow until the Excel file is written, and after that I want to measure the Excel formatting part separately.
Therefore I’m using one Vernalis Benchmark Start node and two Benchmark End nodes.
I have connected the Vernalis Benchmark Start and End nodes via the variable ports, see the scheme in the attachment. It is only meant as an example and is not working.

My questions to that:
1.) Is this usage of the Vernalis Benchmark nodes right for measuring the time of the workflow, i.e. one Benchmark Start node and two Benchmark End nodes at the end of the workflow?
2.) How do the Benchmark nodes (2 ports and 3 ports) work? Can you give me a simple example?
3.) Are there existing examples of how the different Vernalis Benchmark nodes work?
4.) Why is the 2nd example with Vernalis Benchmark nodes not working? See the attachment “2nd Vernalis Benchmark example for Excel nodes” and the questions in the workflow.

2nd Vernalis Benchmark example for Excel nodes.knwf (147.2 KB)
KNIME Excel Reading Decimal Number Test.knwf (94.9 KB)

Many thanks in advance for your help, it’s highly appreciated! :wink:

Kind regards
Burkhard

Hi Burkhard,

You are right as to the idea behind the benchmark nodes - they are designed to measure execution times and optionally memory usage during workflow execution. A little more detail…

  • The nodes are KNIME loop start/end nodes, so they need to obey the usual loop rules - i.e. matching start/end pairs (this is why your second example fails - more on that later)
  • By default, they only run the loop body once - you can change this to get a better sense of average performance - in particular Java often has a ‘warm-up’ time on the first use of something - this may or may not show up in the timings when you execute multiple iterations, depending on exactly how the JVM (Java Virtual Machine) decides to go about things
  • The Benchmark Start nodes (1 port, 2 port, 3 port and Flow Variable versions) simply pass the input table straight through - they just act as a marker, saying “Start timing now”. This is also where you can do some setting up of options such as number of iterations, and whether you eventually report the individual node timings within the loop - this option might be useful for you if you want to pin down where in your workflow the slowest parts are
  • The Benchmark End nodes in all versions (again 1 port, 2 port, 3 port and Flow Variable) pass through the tables from the inputs, but also add two extra outputs ahead of them: a flow variable port carrying the summary information (number of executions, average, min and max execution times etc.) as flow variables, and a data table containing the timing information
  • You can mix and match any of the start/end nodes, so a 3 port start, for example, could pair with a flow variable end.
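
To make the loop idea above concrete, here is a minimal Java sketch of what "run the body N times and summarise the timings" amounts to. It is not the Vernalis implementation: the class name `BenchmarkLoopSketch`, the `Summary` type and both methods are hypothetical names made up for illustration, and only standard JDK calls are used.

```java
// Hypothetical illustration only - not the code behind the Vernalis nodes.
class BenchmarkLoopSketch {

    /** Summary of the per-iteration timings, in nanoseconds. */
    static final class Summary {
        final int iterations;
        final long min;
        final long max;
        final double mean;

        Summary(int iterations, long min, long max, double mean) {
            this.iterations = iterations;
            this.min = min;
            this.max = max;
            this.mean = mean;
        }
    }

    /** Runs the body the requested number of times and records each duration. */
    static long[] timeIterations(Runnable body, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();   // the "Benchmark Start" marker
            body.run();                       // everything inside the loop
            nanos[i] = System.nanoTime() - start; // the "Benchmark End" marker
        }
        return nanos;
    }

    /** Reduces the raw timings to count, min, max and mean. */
    static Summary summarize(long[] nanos) {
        if (nanos.length == 0) {
            throw new IllegalArgumentException("need at least one timing");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long total = 0;
        for (long t : nanos) {
            min = Math.min(min, t);
            max = Math.max(max, t);
            total += t;
        }
        return new Summary(nanos.length, min, max, (double) total / nanos.length);
    }

    public static void main(String[] args) {
        // Stand-in workload; in a workflow this would be the nodes in the loop.
        long[] timings = timeIterations(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        }, 5);
        Summary s = summarize(timings);
        System.out.printf("runs=%d min=%d ns max=%d ns mean=%.1f ns%n",
                s.iterations, s.min, s.max, s.mean);
    }
}
```

With several iterations the first run is often the slowest because of JVM warm-up, which is why a single run can give a misleading picture.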

Now to the specifics of your example 2…

You have 4 benchmark starts and 2 ends (and a lot of stuff in between - that’s fine!)

I think the Ends are easy to solve - as far as I can tell, you want to end the timing after the ‘XLS Formatter (apply)’ node - in which case, you are probably best to use a ‘Benchmark End (Flow Variable)’ node, attached to the hidden variable output port, as shown here:

image

As for the ‘Starts’, that’s a bit more complicated… (because we don’t have a 4-port start - more on that at the end…)

I’m guessing you are trying not to include the time for the 2 Excel readers to do their stuff?

Yes… I want to exclude the Excel readers…
OK, use a Benchmark Start (2 Port) node, with the 2 Excel readers feeding into it, replacing the 2 benchmark start nodes ‘Node 103’ and ‘Node 115’, and remove the other 2 benchmark starts immediately downstream of the Table Creator nodes:

The caveat here is that the two ‘XLS Control Table Generator’ nodes will not be included in the timing - if you want that too, then use the hidden flow variable ports to connect them to the output side of the Benchmark Start:

image

Modified version of 2nd example (141.4 KB)

No… I don’t mind including the Excel readers…
This is actually a lot easier…
The Benchmark Start (Flow Variable) node doesn’t need an input, so you can put it before all the nodes to time and connect via their optional input ports:


Modified version of 2nd example (142.7 KB)

Hopefully that makes some sense…?

Finally, I mentioned the lack of a 4-port node… I’ve been doing some work reworking some of our other flow control nodes to take advantage of the new configurable ports feature. I plan to update the benchmark nodes similarly to the ‘Multiport Loop End’ node we released recently, so a single node for each of start and end will be able to be configured with multiple ports of different types…

Yours,

Steve

Hi Steve,
Sorry for the late answer, but I was on vacation.
First, many thanks for your answer; it is very helpful and it is running.

Your modified version works and I understand the principle.
But how can I separate the time measurements for the different sections?
Is my understanding right that I have to wait until the update of the benchmark nodes that you mentioned?

Thanks for your answer

BR
Burkhard

At the moment, unless you can form individual benchmark loops at places in the workflow using either flow variable or standard data table ports, the only option you have is to check the ‘Report node times’ and ‘Probe Wrapped Metanode timings’ options in the ‘Benchmark Start’ node. Then, after the Benchmark End, you can use an Ungroup node on the timings output port to get the timings of all the individual nodes.
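
As a rough picture of what the Ungroup step does to the timings table, here is a small Java sketch. The row layout is an assumption made for illustration (one row per iteration, holding a list of node names and a matching list of times); the real column names and types of the timings output may differ, and `UngroupTimingsSketch` and `GroupedRow` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "ungrouping" collection cells into rows.
class UngroupTimingsSketch {

    /** One row per iteration with two collection cells (assumed layout). */
    static final class GroupedRow {
        final int iteration;
        final List<String> nodeNames;
        final List<Long> nodeMillis;

        GroupedRow(int iteration, List<String> nodeNames, List<Long> nodeMillis) {
            if (nodeNames.size() != nodeMillis.size()) {
                throw new IllegalArgumentException("collections must have equal length");
            }
            this.iteration = iteration;
            this.nodeNames = nodeNames;
            this.nodeMillis = nodeMillis;
        }
    }

    /** Expands every grouped row into one output line per node. */
    static List<String> ungroup(List<GroupedRow> rows) {
        List<String> flat = new ArrayList<>();
        for (GroupedRow row : rows) {
            for (int i = 0; i < row.nodeNames.size(); i++) {
                flat.add(row.iteration + " | " + row.nodeNames.get(i)
                        + " | " + row.nodeMillis.get(i) + " ms");
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        // Made-up numbers, purely to show the shape of the result.
        List<GroupedRow> rows = Arrays.asList(
                new GroupedRow(0,
                        Arrays.asList("Excel Reader", "XLS Formatter (apply)"),
                        Arrays.asList(120L, 950L)));
        for (String line : ungroup(rows)) {
            System.out.println(line);
        }
    }
}
```

If the node timings were not collected in the first place, there is nothing in the collection cells to expand, so no per-node breakdown can appear.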

Steve

Hi Steve,

thanks again for your answer.

I have tried your Vernalis Benchmark nodes in a small KNIME workflow and it works, but the Ungroup node gives no breakdown by the nodes used. Why do I get no breakdown by the individual nodes here?

If I use this in a big KNIME Workflow, it doesn’t work and I get the error message:
Execute failed: fromIndex(1) > toIndex(0)
Can you tell me why this error happened?

I started this workflow with the “Benchmark Start (Flow Variable)” node and ended it with the “Benchmark End (Flow Variable)” node.
The workflow has > 500 nodes, ca. 10 Excel Readers and a lot of Excel formatters.

I have connected the “Benchmark Start (Flow Variable)” to only “one” variable port, and likewise the “Benchmark End (Flow Variable)”.

The “Benchmark Start (Flow Variable)” is working; the “Benchmark End (Flow Variable)” is not and gives the above error message.

Thanks in advance for your help and answer

BR
Burkhard

Hi Steve,
I created a screenshot of a short test of the “Benchmark End” node at the beginning of my workflow.
It is only to test whether the error appears right from the beginning, which is the case, just like at the end of the workflow. Maybe this helps you a little to find the problem.
Benchmark node problem

BR
Burkhard

Thanks Burkhard. Is there any more detail in the KNIME console or log when this happens? (You might need to set the console logging level to INFO or DEBUG temporarily if there is nothing more than the message above.)

Steve

Hi Steve,
the error message which I sent comes directly from the node info, when the white cross in the red circle appears.
I found the following error message in the console:
ERROR Benchmark End (Flow Variable) 3:1634:0:2356 Execute failed: fromIndex(1) > toIndex(0)
As already written, I have placed the Benchmark End (Flow Variable) node at the beginning of my workflow to see if the error also happens there.

KNIME tells me that I should install the “KNIME AP Core feature”, but that is not possible with my KNIME 4.3.3; I think it is not compatible with 4.3.4. Does this problem have to do with this extension? I don’t think so, because the “small” Excel workflow from your first reply was working.

Thanks for your answer :wink:
BR
Burkhard

Hi Steve,
just as additional information: I also tried it with KNIME 4.4, and the same failure happened with the same error message again: ERROR Benchmark End (Flow Variable) 0:1634:0:2356 Execute failed: fromIndex(1) > toIndex(0)

Are there any rules about which nodes are allowed between a “Benchmark Start (Flow Variable)” and a “Benchmark End (Flow Variable)”?

BR
Burkhard

OK, a few more questions to try to get to the bottom of this.

  1. Do you have any metanodes / components in the full workflow between the start / end nodes?
  2. Does it run OK without the individual nodes timings options, or does it fail in both cases?
  3. Could you also look in the Log viewer (View > Open KNIME Log) - this is best done immediately after a failure, then scroll right down to the very newest entries. I’m looking for something vaguely like:
2021-08-24 15:47:59,015 : ERROR : ModalContext :  : AggregationMethods : GroupBy : 0:23 : Problems during initialization of aggregation operator (with id 'com.vernalis.knime.fingerprint.aggregators.bitvector.BitVectorAndOperator'.)
org.knime.base.data.aggregation.DuplicateOperatorException: Operator with id: Bitvector AND already registered
	at org.knime.base.data.aggregation.AggregationMethods.addOperator(AggregationMethods.java:468)
	at org.knime.base.data.aggregation.AggregationMethods.registerExtensionPoints(AggregationMethods.java:445)
	at org.knime.base.data.aggregation.AggregationMethods.<init>(AggregationMethods.java:336)
	at org.knime.base.data.aggregation.AggregationMethods.getInstance(AggregationMethods.java:348)
	at org.knime.base.data.aggregation.AggregationMethods.getMethod4Id(AggregationMethods.java:738)
	at org.knime.base.data.aggregation.ColumnAggregator.loadColumnAggregators(ColumnAggregator.java:282)
	at org.knime.base.data.aggregation.ColumnAggregator.loadColumnAggregators(ColumnAggregator.java:239)
	at org.knime.base.data.aggregation.ColumnAggregator.loadColumnAggregators(ColumnAggregator.java:224)
	at org.knime.base.node.preproc.groupby.GroupByNodeModel.validateSettings(GroupByNodeModel.java:402)
	at org.knime.core.node.Node.validateModelSettings(Node.java:667)
	at org.knime.core.node.workflow.FileNativeNodeContainerPersistor.loadNCAndWashModelSettings(FileNativeNodeContainerPersistor.java:211)
	at org.knime.core.node.workflow.FileSingleNodeContainerPersistor.loadNodeContainer(FileSingleNodeContainerPersistor.java:259)
	at org.knime.core.node.workflow.WorkflowManager.postLoad(WorkflowManager.java:8297)
	at org.knime.core.node.workflow.WorkflowManager.loadContent(WorkflowManager.java:8198)
	at org.knime.core.node.workflow.WorkflowManager.postLoad(WorkflowManager.java:8313)
	at org.knime.core.node.workflow.WorkflowManager.loadContent(WorkflowManager.java:8198)
	at org.knime.core.node.workflow.WorkflowManager.load(WorkflowManager.java:8152)
	at org.knime.core.node.workflow.WorkflowManager.load(WorkflowManager.java:8073)
	at org.knime.core.node.workflow.WorkflowManager.load(WorkflowManager.java:8046)
	at org.knime.core.node.workflow.WorkflowManager.loadProject(WorkflowManager.java:7891)
	at org.knime.workbench.editor2.LoadWorkflowRunnable.run(LoadWorkflowRunnable.java:182)
	at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:122)

(The details will be different - I’m guessing a bit but something like java.lang.IllegalArgumentException perhaps - this is just a recent stack trace I could find in my own log file)

  4. Could you go to Help > About KNIME Analytics Platform and click on the Installation Details button, then change over to the ‘Configuration’ tab and copy/paste the configuration details (please delete everything apart from the sections headed *** Features: and *** Plug-in Registry:)

  5. If you can reproduce with a scaled-down, shareable workflow, then that would also be really useful!

Thanks again,

Steve

Hi Steve,

thanks for your detailed answer.

Here my answer to your questions:
1.) Yes, I have one component node in the full workflow between the start / end nodes.
But I have tested it in the workflow with the Benchmark End node placed far before the component node.
I have also tested the Excel formatting workflow with a component node that I created. It works with the component node too, see attachment.
2.) When I delete the Ungroup node it is not working.
When I unselect “Report node times” it is working, surprise, surprise.
3.) I have saved the knime.log for you, without the XLS Formatter infos because the file is too big otherwise, see attachment.
4.) Please also find the saved configuration details in the attachment.
5.) I will try my best. You will get a workflow later.
6.) What about this error message: can’t install “KNIME AP Core feature”? Do I need this?

knime_without XLSFormatter Info.log (2.0 KB)
Configuration KNIME.txt (356.0 KB)
Modified 2nd Vernalis Benchmark example for Excel nodes with component.knwf (146.6 KB)

Thanks in advance for your help I appreciate it very much! :wink:

BR
Burkhard

Thanks for this - I will have another look into this tomorrow - 3) does at least pinpoint where the node failure is - now I just need to figure out why it fails - which is probably related in some way to the mysterious ‘KNIME AP Core feature’ error.

More anon…

Steve

Hi Steve, in the meantime I found out that the “KNIME AP Core feature” error message was caused by the Vernalis nodes in all 3 versions (v4.2.1, v4.3.4, v4.4.1). When I delete both Vernalis benchmark nodes, save and open the workflow again, the failure message no longer appears and the problems with the Excel nodes (old and new ones) also disappear.
I do not know why, but now our huge workflow with 1600 nodes is running very well.

But if you find the cause of the error in the Vernalis nodes, I’m interested to know it, because I want to use them in the future. Until then I will use the “Timer Info” node.

BR
Burkhard

Thanks Burkhard, and thanks for your patience with this.

I just created a new vanilla KNIME 4.2 with our public nodes in and unfortunately I still don’t see the KNIME AP Core feature message - where and when do you see it?

Also, could I check which version of our nodes do you have installed?

I’m going to tag @gab1one here in case he can see anything obvious that I’m missing (Gabriel - the source for the plugin that contributes these nodes is at:

)

Steve

I think that message only would appear if you import a workflow that was created with a later version of AP than the one you have installed.

Hi Burkhard,

I’m looking again at this - could you do something for me please? Could you right click on the failed Benchmark End node and select the ‘Select Scope’ option, and let me know what happens? The only reason I can find for this error to happen at the moment is if KNIME doesn’t think the loop is closed and therefore has only the loop (= benchmark) end in it, and this is the only way I can think of to see if that is what is happening here.
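
For anyone wondering how a scope containing only the end node could lead to exactly this message, here is a small Java sketch. It is not the Vernalis source; `LoopBodySketch`, `loopBody` and `describe` are hypothetical names, and the idea that the node slices a list of scope nodes this way is an assumption. What is standard is that `java.util.List.subList` throws an `IllegalArgumentException` worded like this when the start index is greater than the end index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration - assumes the loop body is taken as
// "everything between the first and the last node of the scope".
class LoopBodySketch {

    /** Returns the nodes strictly between the start and the end node. */
    static List<String> loopBody(List<String> scope) {
        return scope.subList(1, scope.size() - 1);
    }

    /** Reports either the loop body or the failure message. */
    static String describe(List<String> scope) {
        try {
            return "body: " + loopBody(scope);
        } catch (IllegalArgumentException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Properly closed loop: start, two nodes, end.
        List<String> closed = new ArrayList<>(Arrays.asList(
                "Benchmark Start", "Excel Reader", "XLS Formatter", "Benchmark End"));
        System.out.println(describe(closed));

        // Scope that contains only the end node: subList(1, 0) is invalid.
        List<String> onlyEnd = new ArrayList<>(Arrays.asList("Benchmark End"));
        System.out.println(describe(onlyEnd));
    }
}
```

With a single-element list the call becomes `subList(1, 0)`, which is the same index pair that appears in the reported error.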

Steve

Hi Steve,
sorry that you haven’t heard from me for so long, but I was busy with another project.

I have now tried it again with KNIME version 4.4.1 and the same workflow.
And surprise, it is working, I can’t believe my eyes.
I again use the “Benchmark Start Flow Variable” and the “Benchmark End Flow Variable”.
The running time is 1,834.655 (s), displayed in the “Ungroup” node.
Why this is now running I can’t tell you, but I’m really positively surprised.
I will try it again with KNIME version 4.3.4. Let’s see what happens with the same workflow.

You will get my update.

BR
Burkhard

Hi Steve,
I have saved my old workflow with the new KNIME version 4.4.1.
Therefore I can no longer test the workflow with KNIME 4.3.4.
KNIME tells me that it is not possible to load a workflow in KNIME 4.3.4 when it was saved with KNIME 4.4.1.
If you have an idea how I can test it, I would like to help you.
Otherwise I would say the new KNIME version works, and the important thing is that I have a solution.
BR & many thanks for your help
Burkhard

Hi Burkhard,

Thanks for the updates - I’m glad it’s now working for you at least. Hopefully it stays that way and doesn’t break for anybody else!

Steve