Database Row Filter with Row to Variable Loop

Hi all

I need help with my workflow trying to query from a database with over 160 millions of records. Here is my workflow and a sample of the data:

sample data.xlsx (7.9 KB)

Sample workflow.knwf (21.5 KB)

I am having problems getting my Database Row Filter work to filter on SWFID as a flow variable so the Database Connection Table Reader only reads the data corresponding to one SWFID at a time to avoid processing the entire records from the database.

Any tips would be greatly appreciated.

Thanks
Jennifer

hi there
try a group by on SWFID and use a columnfilter so only this column is avasilable
also maybe experiment with different loop starts such as chunk loop start…

Hi Jennifer,

i personally would use a 2-step approach.

  1. Use the “select distinct SWFID from [table]” to get unique values for SWFID
  2. Connect the Output to the Table Row to Variable loop Start Node
  3. Use this Variable to Filter the Data in the Database Reader with the criteria "where SWFID = ‘$SWFID$’.
1 Like

Hi there

Your suggestion helps and thanks for your help.

Cheers
Jennifer

Hi there

I tried to use the following SQL statement in the Database Table Selector node:

SELECT COUNT(DISTINCT SAM.SAM_Data_Old_JoinedData_Final_Jen.Date)
FROM SAM.SAM_Data_Old_JoinedData_Final_Jen
WHERE SAM_Data_Old_JoinedData_Final_Jen.Date <= ‘2016/12/31’ & SAM_Data_Old_JoinedData_Final_Jen.SpeedZone <> 10
GROUP BY SAM_Data_Old_JoinedData_Final_Jen.SWFID
HAVING COUNT(DISTINCT SAM_Data_Old_JoinedData_Final_Jen.Date) > 7;

SELECT * FROM SAM.SAM_Data_Old_JoinedData_Final_Jen

But it did not work. I could not figure out why. Please help.

Thanks
Jennifer

Hi Jennifer,
with your statement you only retrieve the counts for the rows but no SWFID column which is necessary for the Loop.

please try the following statement.

SELECT SAM_Data_Old_JoinedData_Final_Jen.SWFID,
COUNT(SAM.SAM_Data_Old_JoinedData_Final_Jen.Date)
FROM SAM.SAM_Data_Old_JoinedData_Final_Jen
WHERE SAM_Data_Old_JoinedData_Final_Jen.Date <= ‘2016/12/31’
AND SAM_Data_Old_JoinedData_Final_Jen.SpeedZone <> 10
GROUP BY SAM_Data_Old_JoinedData_Final_Jen.SWFID
HAVING COUNT(SAM_Data_Old_JoinedData_Final_Jen.Date) > 7;

1 Like

Hi there

When I run the sql statements, KNIME gives me an error saying “Error while validating…” if I don’t add Select columns From table after the Having Count…; statement.

Would you know why?

Thanks
Jennifer

Hi Jennifer,

maybe you have trouble with the date-format.

is it possible to run the query without the date filter criteria and only with the SpeedZone criteria?

e.g.

… where SAM_DATA_OLD_JoinedData_Final_Jen.SpeedZone <> 10 …

if this work you have to verify your date format.

Regards
Hermann

Hi Hermann

I have tried running the query without the date filter criterion and made some debugging on the sql statements. I still can’t work out why I can’t run the Database Table Selector without the additional Select and From statements after the query as below:

SELECT SAM_Data_Old_JoinedData_Final_Jen.SWFID,
COUNT(DISTINCT SAM.SAM_Data_Old_JoinedData_Final_Jen.Date)
FROM SAM.SAM_Data_Old_JoinedData_Final_Jen
WHERE SAM.SAM_Data_Old_JoinedData_Final_Jen.Date = ‘2016-12-31’
AND SAM.SAM_Data_Old_JoinedData_Final_Jen.SpeedZone > 10
GROUP BY SAM.SAM_Data_Old_JoinedData_Final_Jen.SWFID
HAVING COUNT(DISTINCT SAM.SAM_Data_Old_JoinedData_Final_Jen.Date) >=7;

Would there be other reasons than the date format?

Thanks
Jennifer

Hi Jennifer,

please try it by using the Database Reader Node and not the Database Table Selector Node.
Probably this solve your problem.

Regards
Hermann

1 Like

Hi Hermann

Thank for the tip. Replacing the Database Table Selector Node with the Database Reader Node worked.

Wonder why we can’t just use Database Table Selector Node to do the same filter.

Anyway thanks a lot for your help.

Cheers
Jennifer

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.