Lately I keep getting the error: <Execute failed: (“StackOverflowError”): null>
I have many nodes working on the database. The error happens with a simple inner join. Neither table is empty. Another join also fails.
Could it be that some limit on the number of characters, or something like that, is exceeded? I copied the SQL code of the previous working node and it has 4,908 lines. With the inner join it might exceed 5,000, and perhaps that is why it fails?
Hi @andre17 and welcome to the KNIME Community.
A StackOverflowError indeed suggests that the query might be too big. There are more factors than just the number of rows to look at when determining the size of a query: How many columns are retrieved? What kind of data is in those columns? Are any subqueries involved?
It all comes down to how much data you are dealing with.
For example, say in both cases we compare only a single column: 1 row of text containing a full article is much bigger than 10 rows of numeric values. So it’s not just about the number of rows.
Can you check how big your query is?
Hi @bruno29a and thank you for your response.
I have checked, and the preceding working node processes a dataset of 26.4 GB and returns 570,000 rows. The join compares only one column and adds only one column.
To reduce the dataset, as a test I removed all columns except the one relevant for the join before joining. Same result.
Hi @andre17 , no problem.
As you can see, you are dealing with quite a large size of data.
What is the size of the result after you reduced the dataset and removed all the columns you do not need?
You can also share your query to see if it can be optimized.
I have made some tests:
- Reduced the period to one day, so the dataset consisted of 1.8 GB and 11,500 rows. The error still occurred, so I would rule out the amount of data as the cause.
- The join column is a STRING whose content describes a unique brand set. So I added some nodes afterwards to create a set_id of type LONG and ran the join over set_id instead of the unique brand set string. Same error.
- I took out six nodes that served to reduce the dataset (among other things because I need the WHERE ... IN clause and had to use a workaround). The dataset to join became larger, but with fewer nodes the query also got shorter (less complex for KNIME?). Now the join worked without any problems. Unfortunately, the result is not satisfying: it is important that I analyze only the subset, not the full dataset.
The main problem is the missing WHERE ... IN support in KNIME. I need to concatenate 8 tables to determine specific session_ids, but in doing so I lose some of the rows belonging to each session. Therefore I need to go back to the source table and filter it by the determined session_ids (with WHERE ... IN), and then join it again with the 7 tables to get all information about the sessions.
Is there a way in KNIME to store the result of a query under a new temporary table name? I could then access it in subsequent nodes and greatly reduce the complexity. Like this:
(long complex query)
WHERE session_id IN (
    SELECT session_id FROM #table#
)
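The filter-by-subset pattern being described can be sketched in plain SQL. A minimal runnable example using SQLite via Python, where the table names (`events`, `wanted_sessions`) are hypothetical stand-ins for the source table and for KNIME’s `#table#` placeholder:

```python
import sqlite3

# In-memory database with hypothetical tables: 'events' stands in for the
# source table, 'wanted_sessions' for the concatenated session-id result.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE events (session_id INTEGER, payload TEXT)")
cur.execute("CREATE TABLE wanted_sessions (session_id INTEGER)")
cur.executemany("INSERT INTO events VALUES (?, ?)",
                [(1, "a"), (1, "b"), (2, "c"), (3, "d")])
cur.executemany("INSERT INTO wanted_sessions VALUES (?)", [(1,), (3,)])

# The WHERE ... IN (sub-SELECT) form described above: keep every row of
# the source table whose session_id appears in the subset table.
rows = cur.execute("""
    SELECT session_id, payload
    FROM events
    WHERE session_id IN (SELECT session_id FROM wanted_sessions)
    ORDER BY session_id, payload
""").fetchall()
print(rows)  # [(1, 'a'), (1, 'b'), (3, 'd')]
```

Note that this keeps all rows of sessions 1 and 3, including rows that were not in the concatenated result, which is exactly the behavior asked for above.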
Hi @andre17, sub-SELECT IN statements are generally bad - it’s an issue with a few db systems. It’s better to move the sub-SELECT in as a joined table rather than as a condition, like this:
FROM tempTable t
JOIN (SELECT DISTINCT session_id FROM #table#) AS s ON t.session_id = s.session_id
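To show that the two forms return the same rows, here is a small sketch using SQLite via Python; `tempTable` and `ids` are hypothetical stand-ins for the long-query result and for KNIME’s `#table#` placeholder:

```python
import sqlite3

# Hypothetical tables: tempTable holds the long-query result, ids holds
# the session ids (standing in for the #table# placeholder). Note that
# ids deliberately contains a duplicate.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tempTable (session_id INTEGER, val TEXT)")
cur.execute("CREATE TABLE ids (session_id INTEGER)")
cur.executemany("INSERT INTO tempTable VALUES (?, ?)",
                [(1, "x"), (2, "y"), (2, "z"), (4, "w")])
cur.executemany("INSERT INTO ids VALUES (?)", [(2,), (2,), (4,)])

# IN form (the one that made the generated query blow up):
in_rows = cur.execute("""
    SELECT t.session_id, t.val
    FROM tempTable t
    WHERE t.session_id IN (SELECT session_id FROM ids)
    ORDER BY t.session_id, t.val
""").fetchall()

# JOIN form suggested above; DISTINCT in the derived table prevents
# duplicate ids from multiplying the matched rows.
join_rows = cur.execute("""
    SELECT t.session_id, t.val
    FROM tempTable t
    JOIN (SELECT DISTINCT session_id FROM ids) AS s
      ON t.session_id = s.session_id
    ORDER BY t.session_id, t.val
""").fetchall()

print(in_rows == join_rows)  # True
print(join_rows)             # [(2, 'y'), (2, 'z'), (4, 'w')]
```

The `DISTINCT` is important: without it, the duplicate id in `ids` would make the JOIN form return session 2’s rows twice, which the IN form never does.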
Thank you very much.
I followed your advice and removed or rebuilt the sub-SELECT IN statements. Now the workflow works again. In summary, it was probably the length or the unnecessary complexity of the query.
Thanks for the support!