Missing sample datasets

I'm looking at the samples/examples, in particular the cross validation example. The zip file does not contain test.csv or training.csv. I've browsed the site, searched the forums, and googled, all without any luck.

Can someone tell me where to find these files? The examples don't seem to do anything unless you provide the data. The 'green light' is on for all nodes, indicating I should be able to see some result from one or more nodes, but the option to view the results is not available.

If it's possible to see the results without getting these files, and I'm just missing something obvious (or not so obvious), feel free to point it out.


For reference, I did find the 'benchmark' archive, but the filenames, locations, and potentially contents are significantly different from what these examples expect. For example, I'm playing with the CDK example. It has the "NCI HIV data" and "NCI Mechset" input steps. The closest dataset I see in the benchmark archive is a folder named nci_aids containing all.csv, cleaned.csv, and readme.txt.

If anyone can help ... I'd appreciate it.


Hi Brian,

The workflows that are available on the examples page are all partially executed. In particular, the source nodes (such as the file reader in the cross validation example) are already executed. This has been done on purpose so that users don't need to care about the source files and can concentrate solely on the cross validation node or the CDK nodes (depending on which example you run).

If you happen to reset those source nodes, you have a problem, I agree. In this case you should start over from scratch, i.e. delete the workflow and import it again from the zip file.

If you want to get the data out of the imported workflow (for instance, if you want to run a different program on the data set), I suggest using a CSV writer node connected to the outport of the file reader.

The benchmark data set file contains only a small subset of potentially interesting data sets. Most of them are from the UCI Machine Learning Repository. (And so is the data set in the cross validation example.)

Let me know if you need more details.


The trouble is that I don't just want to see the end result; I want to see what happened along the way: what each step did to the data, how it affected things, and so on. Most of the steps don't have a way to view those results. Only a couple of them offer that feature, unless you re-run the scripts against the original source data yourself.

This may be a bug, in which case I can write it up. I just wanted to familiarize myself with the various nodes by looking at an existing example that shows not just the nodes operating on the data, but also the source and modified data being used 'properly', so to speak. It's easy to find yourself wondering what the right way to do something is when it isn't entirely obvious.


Hi Phantal,

You can download the source files for the cross validation and the CDK example under this link. The zip archive needs to be extracted somewhere in your local file system, and you will need to point the individual source nodes (file readers and SDF reader) to the new locations. All but one of the nodes should configure themselves correctly once you set the new file location. The file reader in the CDK example (which reads the file 'HIV-CA-02.smiles.gz') needs a little configuration: in the text field "column delimiter" enter "=>", and change the column type of the single column being read to Smiles by right-clicking the column header.
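If you would rather peek at such a file outside KNIME first, the following is a minimal Python sketch of the same idea: read a gzip-compressed text file and split each line on "=>". The function name `read_delimited_gz` is made up for illustration, and the exact layout of 'HIV-CA-02.smiles.gz' is not described in this thread, so the one-record-per-line, UTF-8 layout is an assumption. A line that does not contain the delimiter simply comes back as a single field.

```python
import gzip


def read_delimited_gz(path, delimiter="=>"):
    """Read a gzip-compressed text file and split every non-empty line.

    Hypothetical helper: assumes one record per line and UTF-8 text.
    Lines without the delimiter are returned as a single-field row.
    """
    rows = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            rows.append([field.strip() for field in line.split(delimiter)])
    return rows
```

Calling `read_delimited_gz("HIV-CA-02.smiles.gz")` from the folder where you extracted the archive would return a list of rows that you can inspect before wiring up the file reader.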

Please also note that each node in the pipeline has an outport view, which shows the output tables. These views can help a lot to see what has changed from one node to the next. Just right-click any node and then select "Data Output 0", for example.

Hope it helps.


Thank you, I appreciate you going through the effort. I've been impressed with KNIME thus far; I just wish I had more free time :)