In DB GroupBy node, DB Type is Number but Knime Type is String

I am connecting to a SQLite DB that uses the Chinook dataset (an online music company’s DB with data on albums, artists, etc.). I want to build a simple workflow that select the album table (DB Table Selector), counts the number of albums by artists (DB GroupBy), and then filters the results based on artists having > 10 albums (DB Row Filter).

The table I selected has 3 columns: AlbumId (Integer), ArtistId (Integer), and Title (Strig). In the DB GroupBy node, I set the grouping variable to ArtistId and do a manual aggregation of AlbumId by Count. The node runs successfully, but as you can see in the output below, the Count of AlbumID has a DB Type of Numeric whereas the KNIME Type is a String. As a result, I am unable to filter by values. Why is the variable treated as string despite being an integer, and how can I fix that? Thanks.

output

Hi @saddas ,

This appears to be a bug in the sqllite jdbc driver.

To investigate, I created a demo workflow for others to play with, using SQLLite which demonstrates the issue you have raised. You can find that below.

I did a search of the forum and found this: DB Reader ignoring types for SQLite databases - #4 by mlauber71

The suggestion there was to maybe update the SQLite driver to the latest version, or alternatively are you able to move to using an H2 database instead of SQLite? I haven’t tried updating my SQLite driver to see if that will resolve this specific problem. I’m currently using the SQLite 3.36 driver which was the default install on my KNIME version (4.7.1)

Demonstration of the issue

Demo SqlLite Count as String issue.knwf (34.8 KB)

In three of four different ways of trying to obtain a count, it returns a String. It appears on the face of my examples that the act of renaming the count() column causes KNIME to return the data as a string, even though previewing the data in the DB Table Selector returns it as Integer

image

The only way KNIME returned it as an Integer was by writing a custom query in the DB Table Selector, and not renaming the column.

i.e. This returns an Integer:

SELECT "ArtistId", 
cast(count("AlbumId") as Integer)  
FROM #table#
group by "ArtistId"

But this returns a String:

SELECT "ArtistId", 
cast(count("AlbumId") as Integer) as "COUNT(AlbumId)"
FROM #table#
group by "ArtistId"

I hope this provides some assistance or pointers. Maybe somebody who already has a later SQLite version installed can try the workflow above out to see if the issue persists?

4 Likes

@takbb : Thanks for reproducing the issue and letting me know it’s a bug with the JDBC driver. I downloaded the latest SQLite JDBC driver (3.41), but the problem still happens.

image

Ideally, I would like to keep working with SQLite and not switch to H2. If I understand correctly, the only workaround that works for this as of now is to write custom queries and not to rename the column?

Hi @saddas , that’s what I’m assuming. I don’t know for certain that the bug isn’t elsewhere but it certainly seems specific to SQLite, and from that other thread it does seem to implicate the driver. I tried the same with a different database (Oracle) and it works ok. It looks to me that renaming the column (which the DB Groupby will also be doing internally) is linked to the problem.

Given that you are using SQLite, the advantages of performing the query “in database” are not as great as if you were using a full client server database with large volumes of data being pulled across a network, and for this particular use case you could possibly pull the albums table data back using a DB Query Reader and let KNIME perform the group by operation with its Group By node instead. This would achieve what you want without writing sql but whether that is appropriate will depend on what else you are trying to do, or if maybe this is for a proof of concept.

Could you test using a name without special characters?

Hi @mlauber71 , no unfortunately it doesn’t seem to make a difference. I just tried it with returning a simple column name with no brackets etc but it still came in as a String.

However, @saddas , Here is a workaround

Further to this, I think now that what is happening is that under specific circumstances (possibly to do with renaming of the column), SQLite is failing to inform KNIME of the data type of the returned column and therefore KNIME is having to treat it as String, being the default option to fit a “variant/unknown” data type.

So, if the DB Table Selector is configured to give it a helping hand with the type mapping, it can be made to work:

It is important that the Source Type is specified as NULL. This is the clue, I think that suggests that SQLite is failing to return the data type of the column. Using regex we can tell it to return any columns beginning COUNT as a numeric (Integer)

There is then no need to do a custom query:

and at the end, the count is returned correctly:
image

I’ve uploaded an updated demo, with this workaround at the bottom

Demo SqlLite Count as String issue - with workaround.knwf (43.3 KB)

2 Likes

@takbb: Thanks very much for your efforts to explain this bug and to come up with a workaround.
To provide more context of my use case, this is purely for pedagogical reasons to teach students about the use of SQLite, and we want to do it within KNIME as we use KNIME for other types of data analytics as well. The idea is to compare between the query nodes (relying fully on SQL queries) and the combination with KNIME DB nodes to simplify the process.

It seems that if I simply write a SQL query to perform the count (e.g., in DB Query Reader: select ArtistId, count(AlbumId) as NumAlbums from “Album”), the issue with the conversion to string will not occur.
As for your workaround, it looks great but when I tried it on my workflow, it didn’t work:
SQL_sample.knwf (23.0 KB)

I am still unsure why the input mapping didn’t work in my case.
But in the meantime, another workaround is to first read the selected DB table into a KNIME table and then to use the regular KNIME nodes. This maintains the correct data types:

I figured out now why the input mapping wasn’t working. The Source Type was set to NULL. When I changed it to NUMERIC, it worked and now the COUNT variable is treated as NUMERIC.
Thanks @takbb for leading me in the right direction.

2 Likes

Here is my final workflow showing three options for getting the result I needed (counting the number of albums for each artist and filtering for >5 albums):

Hi @saddas,

My pleasure! It is interesting that you needed to specify the Source Type mapping as “NUMERIC” instead of “NULL” that I used in the demo. In my workflow, if I do that (instead of NULL) it comes through as String, so there are clearly some other (as yet unknown) factors at play here.

Unfortunately I cannot try out your uploaded workflow as it references your local database files, but I’m glad you managed to get the workaround (to the workaround) to work for you :wink: .

1 Like