How to use SQL select-from-limit?

paskal007r · October 22, 2013, 9:48pm

Hi, I'm new here. I'm trying to use KNIME for my master's thesis (Twitter data analysis). In particular, I need to perform some basic k-means clustering over tweet features that a parser daemon measures.

Now, I don't need to parse all the data for that, but just a sample.

I considered using a row sampling node, but I wonder if there is a way to just use a LIMIT clause in the SQL query. I tried adding "LIMIT 0,1000" to the SQL query in both the Database Connector and the Database Reader, but when I click "apply" the node gets a red dot.

Is there a way to avoid downloading everything (ideally with an SQL LIMIT clause)?

Thank you for your time and attention!

thor · October 23, 2013, 12:42pm

The red dot should also show you the reason in its tooltip. In your case I believe the SQL syntax is wrong: LIMIT only takes one argument (the number of rows) instead of two.

paskal007r · October 27, 2013, 11:10am

Dear thor, thanks a lot for your reply,

I tried using only one argument in the query, but the result is the same error, although I find the error message quite illuminating:

while my query is now

SELECT tweet.metadata FROM tweet LIMIT '1';

the error I get is:

WARN Database Reader com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''1'; LIMIT 0' at line 1

which makes me think that there is a limit clause that the program itself adds to the query. Now I have to find where to edit that value to fit my task.

Does anyone know how?

thor · October 29, 2013, 10:22am

This is an error in the node. It adds a "LIMIT 0" at the end of the query in order to determine the table structure. If you have a LIMIT already in the original query, the resulting SQL is invalid. This will be fixed in 2.9.
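
To illustrate the failure mode described above, here is a minimal sketch. It is not the node's actual code: it assumes the node simply concatenates " LIMIT 0" onto the user's query. The helper names `append_probe`, `wrap_probe` and `column_names` are hypothetical, and SQLite (through Python's `sqlite3` module) stands in for MySQL so the example runs on its own. Wrapping the query in a subquery is one possible way to probe the table structure when the query already has a LIMIT; whether the 2.9 fix works this way is not stated in this thread.

```python
import sqlite3


def append_probe(query: str) -> str:
    """Mimic the behaviour described above: blindly append LIMIT 0."""
    return query + " LIMIT 0"


def wrap_probe(query: str) -> str:
    """Hypothetical alternative: probe the structure through a subquery."""
    inner = query.strip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS probe LIMIT 0"


def column_names(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Run a probe query and return only the column names."""
    cursor = conn.execute(sql)
    return [col[0] for col in cursor.description]


# Tiny in-memory stand-in for the real tweet table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (id INTEGER, metadata TEXT)")
conn.executemany("INSERT INTO tweet VALUES (?, ?)", [(1, "a"), (2, "b")])

user_query = "SELECT tweet.metadata FROM tweet LIMIT 1"

# Appending a second LIMIT produces invalid SQL, so the probe fails.
try:
    column_names(conn, append_probe(user_query))
    naive_ok = True
except sqlite3.OperationalError:
    naive_ok = False

# The wrapped probe is valid SQL and returns the structure with zero rows.
wrapped_columns = column_names(conn, wrap_probe(user_query))
```

With the naive probe the statement ends in two LIMIT clauses and is rejected, which matches the error reported above; the wrapped probe keeps the user's LIMIT inside the subquery, so the outer "LIMIT 0" stays valid.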

paskal007r · November 3, 2013, 6:08pm

Ok, thank you!

That's quite a problem for my project. I'll build a dummy table to hold just the sample and use the data from that instead of the original one, but at least now I know what the problem is ;)
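
A rough sketch of that dummy-table workaround, under a few assumptions: the table name `tweet_sample` and the sample size of 1000 (taken from the earlier "LIMIT 0,1000" attempt) are illustrative, and SQLite stands in for the MySQL server so the snippet is self-contained. Note that LIMIT without an ORDER BY just takes whichever rows the database returns first, so this is not a random sample.

```python
import sqlite3

SAMPLE_SIZE = 1000  # assumed sample size, from the "LIMIT 0,1000" attempt

# In-memory stand-in for the real database, filled with fake tweets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, metadata TEXT)")
conn.executemany(
    "INSERT INTO tweet (metadata) VALUES (?)",
    [(f"tweet {i}",) for i in range(5000)],
)

# Materialise the sample once, on the database side.
conn.execute(
    f"CREATE TABLE tweet_sample AS SELECT metadata FROM tweet LIMIT {SAMPLE_SIZE}"
)

# The query given to the reader no longer needs a LIMIT clause of its own.
reader_query = "SELECT metadata FROM tweet_sample"
sample_rows = conn.execute(reader_query).fetchall()
```

Because the LIMIT lives in the one-off CREATE TABLE statement, the query entered in the reader node stays free of LIMIT and the appended "LIMIT 0" no longer clashes with it.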

thanks again and goodbye!