How to avoid duplicate data for Tableau Writer

sroberts · August 31, 2022, 8:22pm

Hi all,

I have a workflow that calculates training performance data for each supervisor for each day. KNIME adds a column with the execution date so that I can ‘trend’ the data over time as the information changes. The idea is to run this workflow daily to capture the data at that moment in time.

The issue is that I can “append” the data to the Tableau file after each run, but if I do, I get duplicate data whenever I accidentally run the workflow twice on the same day.

The Duplicate Row Filter node won’t work in this context because the duplication only happens at the very end of the workflow.

I’m at a loss for how I can solve this issue. Any ideas?

iCFO · August 31, 2022, 9:46pm

Use the Joiner node. Join the Tableau dataset with your updates and only append the updates that don’t match.
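
In case it helps, here is a rough sketch of that join logic in Python. It is only an illustration: the column names (`supervisor`, `training_date`, `execution_date`) and the `rows_to_append` helper are made up, and it assumes you have a readable copy of what was already written. In KNIME the Joiner node would play this role.

```python
# Hypothetical key columns: whatever uniquely identifies one row of your data.
KEY_COLUMNS = ("supervisor", "training_date", "execution_date")


def row_key(row):
    """Build a hashable key from the columns that identify a row."""
    return tuple(row[col] for col in KEY_COLUMNS)


def rows_to_append(existing_rows, new_rows):
    """Return only the new rows whose key is not already present (an anti-join)."""
    seen = {row_key(r) for r in existing_rows}
    fresh = []
    for row in new_rows:
        key = row_key(row)
        if key not in seen:
            seen.add(key)  # also guards against duplicates inside new_rows
            fresh.append(row)
    return fresh
```

Running the workflow twice on the same day would then produce an empty list the second time, so nothing extra gets appended.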

sroberts · August 31, 2022, 9:47pm

But we only have a Tableau Writer node, not a reader node.

I was just using the KNIME workflow to create a Tableau HYPER file.

iCFO · August 31, 2022, 9:54pm

I typically write to a local dataset within the workflow at the same time, so that it mirrors what is on the server. Alternatively, you could do a full overwrite instead of an append, but that may not be feasible because of size and update schedule.

You may be able to download Tableau’s hyper file and convert it to a KNIME-friendly format another way as well. I am fairly sure I have done this via Alteryx in the past. It may be possible in Tableau Desktop as well, but I am not sure. You only have to pull it off once to generate that local data copy for the join comparison.

eamendola · August 31, 2022, 11:06pm

This problem isn’t really within the scope of KNIME, but I understand, because we continuously face it at my company. You cannot alter the content of a hyper file; only full updates or incremental updates are allowed.

Since KNIME cannot read a hyper file (Tableau Prep Builder can), we pretty much take the approach suggested by @iCFO. We keep all of our datasets in a local/remote repository, a DB engine (like Postgres), or whatever file type is convenient (CSV, Excel, etc.), and we deal with issues like duplication or transformations at the ETL level, in the KNIME workflow: run the workflow and read the file/repository, compare the outputs and filter out possible duplicates, then write the full file again with the desired output and recreate the hyper file. Then we let our Tableau Server manage the type of refresh we want.
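
To make the "read, compare, rewrite" step concrete, here is a minimal sketch in Python using a CSV as the local master copy. The column list, the `update_master` helper, and the rule "drop anything already stored for today's execution date" are assumptions for illustration, not our actual workflow.

```python
import csv
from pathlib import Path

# Hypothetical column layout of the master file.
FIELDS = ["supervisor", "training_date", "score", "execution_date"]


def update_master(master_path, todays_rows, run_date):
    """Replace any rows already stored for run_date, then rewrite the full file."""
    path = Path(master_path)
    kept = []
    if path.exists():
        with path.open(newline="", encoding="utf-8") as f:
            # Keep history from other days; discard an earlier run from today.
            kept = [r for r in csv.DictReader(f) if r["execution_date"] != run_date]
    combined = kept + [dict(r, execution_date=run_date) for r in todays_rows]
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(combined)
    return combined
```

Because a second run on the same day overwrites that day's rows instead of adding to them, the master file stays duplicate-free, and the hyper file can be recreated from it in full.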

If you are having duplication issues, you need to recreate the entire hyper file on each run. If that is not possible, there are workarounds within Tableau that use special calculations to get rid of the duplicated rows, but that unfortunately kills your performance at the analysis level.

Hope that this gives you some ideas.

iCFO · September 1, 2022, 11:59am

I checked to make sure I was right about using Tableau Desktop as well and found this…

You can connect to the hyper extract with Tableau Desktop for a one-time conversion, and then export to CSV from the data menu.

This looks like the easiest way to get your hyper data back into KNIME so that you can manage data duplication of your server dataset at the ETL level moving forward.

Daniel_Weikert · September 1, 2022, 3:38pm

Not sure if I understood correctly, but this is what I would do:
Write it to a database, then connect to that database and publish it to Tableau Server as a published data source. From then on, always update the database table instead of writing to the server directly. The server can take care of the refresh automatically (either via extract or live).
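
A small sketch of how the database table can protect against double runs, using SQLite from Python only because it needs no setup. The table name, columns, and the `save_daily_rows` helper are hypothetical. The idea is a primary key over the columns that identify a row, so a repeated run replaces rows instead of adding them.

```python
import sqlite3


def save_daily_rows(conn, rows):
    """Insert today's rows; a rerun with the same keys overwrites instead of duplicating."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS training_performance (
            supervisor     TEXT NOT NULL,
            training_date  TEXT NOT NULL,
            execution_date TEXT NOT NULL,
            score          REAL,
            PRIMARY KEY (supervisor, training_date, execution_date)
        )
        """
    )
    # INSERT OR REPLACE is SQLite syntax; other engines have their own upsert form.
    conn.executemany(
        "INSERT OR REPLACE INTO training_performance "
        "(supervisor, training_date, execution_date, score) "
        "VALUES (:supervisor, :training_date, :execution_date, :score)",
        rows,
    )
    conn.commit()
```

On Postgres the equivalent would be an `INSERT ... ON CONFLICT` statement against the same kind of key.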
br

ScottF · September 1, 2022, 4:40pm

Just to add a bit of commentary to this: As part of the license agreement, Tableau explicitly prohibits vendors that embed their API (like KNIME does) from reading in Hyper files. So while we can create nodes to export data to Tableau, we can’t go the other direction. Frustrating, I know.
