Datasets

Hello, this is my first post here on the KNIME forums, so I apologize if anything I say or do isn't in accordance with the rules.

I am going to try to explain what I need as best I can. I apologize again if this isn't the right place to ask this question, but I'm running out of ideas.

I need to run a few tests on several KNIME processes for the Data Mining tasks of Classification, Clustering, Association and Regression (I can be more specific about exactly which processes I want to test if need be).

For these tests I need to find datasets compatible with those tasks, preferably as big as 4 GB or somewhere near that size.

My problem (after looking through most of the well-known dataset repositories) is that it's hard to know beforehand which datasets fit each task (not to mention that most don't have the size I need), and I do not have time to look into them all.

Can anyone point me to some datasets with the required specifications that I can run in KNIME as soon as possible?

This is why I came here, where I expect some people to be more experienced in these matters, to get some help. I hope you can help me learn as much as I can. Sorry for any mistakes, and thanks in advance for the help.

Hi Pedro,

I cannot quickly come up with the kind of dataset you are asking for. But if you need *any* type of dataset that is large enough and suitable for the KNIME Data Mining nodes (if I understood you correctly), then you might have a look at the Data Generator node.

If you generate ~100K rows (Pattern Count parameter), it would be around 4 GB. You can then append a Table Writer node to save the generated data to your hard disk, though it might take a while to finish.

You can, of course, adjust the other parameters to suit your needs or use a more specific data generator - you will find a lot of them in IO -> Other -> Modular Data Generation.
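
If you would rather prepare a file outside KNIME, the same idea (labelled rows drawn around a few cluster centres, written out until the file is big enough) can be sketched in plain Python. This is only an illustration and not what the Data Generator node does internally; the function name write_synthetic_csv and all of its parameters are made up for this example.

```python
import os
import random


def write_synthetic_csv(path, target_bytes, n_features=10, n_clusters=4, seed=0):
    """Write labelled Gaussian-cluster rows to `path` until it holds
    at least `target_bytes` bytes. Returns the number of data rows written.

    Hypothetical helper for illustration only (not a KNIME API).
    """
    rng = random.Random(seed)
    # One random centre per cluster; every row is drawn around one of them.
    centres = [[rng.uniform(-10.0, 10.0) for _ in range(n_features)]
               for _ in range(n_clusters)]
    header = ",".join([f"x{i}" for i in range(n_features)] + ["cluster"]) + "\n"
    written = 0
    rows = 0
    # newline="\n" keeps one byte per line ending, so the byte count is exact.
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write(header)
        written += len(header)
        while written < target_bytes:
            label = rng.randrange(n_clusters)
            values = [f"{rng.gauss(c, 1.0):.4f}" for c in centres[label]]
            line = ",".join(values + [str(label)]) + "\n"
            f.write(line)
            written += len(line)
            rows += 1
    return rows


if __name__ == "__main__":
    # Small demo size; for roughly 4 GB pass target_bytes=4 * 1024**3.
    n = write_synthetic_csv("synthetic.csv", target_bytes=1_000_000)
    print(n, "rows,", os.path.getsize("synthetic.csv"), "bytes")
```

The resulting file can be loaded with the CSV Reader node. The "cluster" column can serve as a class label for Classification or be dropped for Clustering; the data is purely random, so it is only useful for testing how the nodes cope with the size, not for judging model quality.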

Best,
Oleg