KNIME v4 database node changes issue

#1

I am currently working on migrating my 3.7.2 workflows to v4 and found a problem I just can’t seem to find a solution to.
I have workflows that use a SQL Server db query that uses the “Database Looping” node.
The query extracts data from multiple tables (joined) for each account number in a column.
The looping node is now legacy and there isn’t a version in the new database nodes.
Does anyone know of a solution?
Thanks
Steven

0 Likes

legacy Database looping replacement
Database Looping node
#2

Try to replace it by Parameterized Query. I would say it is a real loss.

0 Likes

#3

I’ve tried the parameterized query reader now.
It seems to continually run without result. and also seems to need a black connector where as the SQL server connector puts out a red connector.
Not a clean alternative to the legacy looping node.

0 Likes

#4

I had trouble with legacy Parameterized Query node and use Database Looping one as a better option. Why KNIME decided to eliminate it?

0 Likes

#5

Hello,
we haven’t re-implemented the existing Database Looping node for the new db framework because the looping node had some flaws which were already eliminated by the Parameterized Database Query node in the old database framework. Among the problems is the duplicate value handling, optional database connection in-port, the hard coded quoting of the value list, and the performance problems since the looping node does not use prepared statements. That is the main reason why we haven’t migrated the Database Looping node but the Parameterized Database Query node which is the Parameterized DB Query Reader node in the new db framework. We think that the parameterized nodes provide more functionality, better control over the generated SQL query and improved performance then the looping node. But we might be wrong which is why your feedback is important to us. So if you have a case that can not be solved with the Parameterized DB Query Reader node let us know and we need to rethink our decision.

For the migrating from the Database Looping node to the Parameterized DB Query Reader node I suggest to first assemble the query that would return all of your data e.g. join the tables you need and use this as input for the Parameterized DB Query Reader in which you then simply filter based on the values of the input column(s).

If you still want to use something similar to the old Database Looping node you could use a loop construct with a DB Reader in KNIME. Admitted, depending on the options you use from the node this loop might be a bit more complicated :slight_smile: That is why I have created a component that emulates the old Database Looping node for the new database framework but which hides all the complexity. You can copy&paste the node from the workflow that I have uploaded to the KNIME Hub at https://kni.me/w/7tCeIJ6M6ZX0o3xO

@izaychik63: It would be great if you could tell us which problems you had with the Parameterized Database Query node.

Thanks for using the new database framework and your valuable feedback. Your feedback helps us to improve KNIME Analytics Platform and the new database framework.
Bye
Tobias

2 Likes

#6

Thank you, Tobias, for detailed explanation. First of all I have trouble to switch to new framework because it is not possible to use old and new nodes in parallel. In my situation I cannot have 2 parallel installations of KNIME because of strict company policy.
As for Parameterized Database Query node, it works only with flow variable as query parameters but not with fields from input table. Believe in new framework it works.
In general new framework takes much space. In version 4 two compact nodes gone Database Reader and Database Looping. Many people do not need details and flexibility but functionality and compactness.

Just checked, Database Looping returns same number of rows as MS-SQL SMS and Parameterized Database Query returns less records (query has distinct clause).
Please, bring Database Looping back to the new release.

1 Like

#7

In general the old and new framework can we used in parallel except when you use native authentication to connect to a Microsoft SQL Server as discussed in this thread. We weren’t aware of this problem and are working on a solution for it. We know that our users use the database node extensively and with many different settings. That is why the old framework is still available and only flagged as legacy. So as long as we haven’t solved the above mentioned problem you can use the old framework as before.

The Prameterized Database Query node works with the column values of an input table. To do so simply double click in the node dialog on the column name and a place holder will be inserted into the SQL query at the current cursor position. When the node is executed this placeholder is replaced by the value from the column. This way you can also use several input columns in your SQL query.

The Database Reader node has been replaced by the DB Query Reader node. For details about the mapping from old database nodes to new db nodes have a look at the node name mapping table in the database documentation.

Can you please provide more information or ideally an example workflow that demonstrates the problem where the Parameterized DB Query Reader node returns less records then the Database Looping node. This would help me a lot.
Thanks
Tobias

0 Likes

#8

Tobias, I need to state again that Prameterized Database Query is not accepting parameters from Column List with error massage The index 1 is out of range.
It is not possible for me to provide data set as far it is confidential. And my reason to avoid this node initially was based on the incorrect result comparatively to Database Looping.
Also, performance of Database Looping is better then combination of the Chunk loop and Prameterized Database Query.

0 Likes

#9

Sorry for being so persistent but I try to understand the problem so that we can fix it. If you can not provide the data can you at lease tell me the query you are using e.g. where you are using the distinct clause. You can also change the column and table names if needed or send it to me via personal mail.

I tried to reproduce the problem with the Parameterized Reader nodes but couldn’t so far. I have uploaded the workflow that I used to try it out. It would help me a lot to find the problem if you could could either manipulate my example workflow in a way so that it fails or if you could send me the KNIME log with the full error message. To view the KNIME log go to View->Open KNIME log then you can go to File->Save as to save it on your local disk.

Thanks
Tobias

0 Likes

#10

Here’s SQL and the screen. When I use Flow Variable NPI as parameter it works (but incorrectly in some cases (some NPI returns less records then expected), when I use NPI from column list - index 1 is out of range error goes to log).

select distinct dt.DateValue as Service_Date, enc.EncounterEpicCsn

,pos.Name as POS, pd.ProviderEpicID, pd.NPI, pd.Name as Billing_Prov, pd.ClinicianTitle, pd.PrimarySpecialty

, pp.ProviderEpicID as Serv_ProvID, pp.Name as Serv_Prov, pp.ClinicianTitle as Ser_ClinicianTitle, pp.PrimarySpecialty as Ser_PrimarySpecialty

,c.PrimaryMrn, c.Name as Patient

, a.BillingProcedureQuantity

–, coalesce(d.CPTCode, d.HcpcsCode) as CPT_Code

, a.BillingProcedureCode as CPT_Code, a.BillingProcedureDescription as CPT_Desc

, dx.Value as Dx, dx.DisplayString as Dx_Desc

, cvg.BenefitPlanName, cvg.PayorName

from fullaccess.BillingTransactionFact a with(nolock)

join fullaccess.PatientDim_current c with(nolock)

on c.patientKey = a.patientKey

join DiagnosisTerminologyDim dx with(nolock)

on dx.DiagnosisKey = a.PrimaryDiagnosisKey and dx.Type = ‘ICD-10-CM’

join ProviderDim pd with(nolock)

on pd.ProviderKey = a.BillingProviderKey and pd.IsCurrent = 1

join ProviderDim pp with(nolock)

on pp.ProviderKey = a.ServiceProviderKey and pp.IsCurrent = 1

join PlaceOfServiceDim POS with(nolock)

on pos.PlaceOfServiceKey = a.PlaceOfServiceKey

join CoverageDim cvg with(nolock)

on cvg.CoverageKey = a.OriginalCoverageKey

join EncounterFact enc with(nolock)

on enc.EncounterKey = a.EncounterKey

join DateDim dt with(nolock)

on dt.DateKey = a.ServiceDateKey

where a.TransactionSubtype =‘Post’ and a.TransactionDetailType = ‘New Charge’

and dt.DateValue > dateadd(mm, -2, getdate())

and (pd.NPI =’$${SNPI}$$’ or pp.NPI = ‘$${SNPI}$$’)

0 Likes

#11

Hello izaychik63,
thanks a lot for the information. I was now able to reproduce the the index 1 is out of range problem. I haven’t thought about using the same column twice. I have created a ticket for it to fix it as soon as possible.

We will also gather further feedback to see if we can re-implement the Database Looping node. But I can see that the Prameterized nodes would return more rows then the Looping node if you have a distinct clause for all result columns. However I guess this would also only work reliably if you create an in statement that contains all possible input values and thus execute only a single query.
Bye
Tobias

0 Likes

#12

Thank you, Tobias. I agree that if Parameterized node will have an option to create a temp table for input values to generate code like Select * from table where value in (select in_parameter from temp_table). Of cause implementing Database Looping in v. 4. will help to switch to the new version.

0 Likes

#13

Hi izaychik63,
we have just released KNIME 4.0.1 which also contains the fix for the Parameterized Reader bug (AP-12329). For further details see the changelog. To update simply open KNIME and go to File->Update KNIME…
Bye
Tobias

2 Likes

#14

Thank you, Tobias, for a good news. Is the issue with parallel use of new and legacy DB node functioning is resolved too?

0 Likes

#15

Hello izaychik63,
do you mean connecting to a SQL Server via native authentication with the legacy and the new MS SQL Server Connector in parallel? If yes this is not fixed but we are working on it for the winter release.
Bye
Tobias

1 Like