Data Profiling in Knime

josephgp · October 25, 2016, 6:22pm

Hi,

Are there any existing Nodes or Extensions for profiling data to assess Data Quality? Something along the lines of what other tools like Ataccama DQ Analyzer or eBay's Griffin DQ Service can do.

Thanks,

JGP

Ergonomist · October 26, 2016, 10:17am

José,

You can always tie in specialist tools via KNIME's extension points, but normally my approach would be to build the quality screening I need in KNIME and ignore everything I don't need. The Statistics node is a great starting point for this, but of course you'd expect commercial specialist DQ tools to go above and beyond that. I don't know them well enough to really assess this, though...

-E
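
To make the "build the screening you need" idea concrete, here is a minimal sketch of a per-column profile in pandas, the kind of thing that could sit in a Python scripting node. The function name `profile`, the chosen metrics, and the sample table are assumptions for illustration only; this is not what the Statistics node computes internally.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row of basic quality metrics per input column (hypothetical helper)."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),                    # storage type of each column
        "missing": df.isna().sum(),                        # count of missing cells
        "missing_pct": (df.isna().mean() * 100).round(1),  # share of missing cells in percent
        "distinct": df.nunique(dropna=True),               # distinct non-missing values
    })


# Made-up sample data, only to show the shape of the result.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["Berlin", None, "Berlin", "Zurich"],
})
report = profile(sample)
print(report)
```

The result is itself a table with one row per column, so it can be filtered or exported like any other data, for example to flag columns whose missing share exceeds a threshold you choose.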

allen_n · November 1, 2018, 10:25pm

If you need a quick & easy work-around, convert the index to a new (1st) column in the DataFrame:

df.reset_index(inplace = True)

then name it:

df = df.rename(columns={df.columns[0]: "row_index"})

If you need to restore the row indices in a downstream Python node:

df = df.set_index('row_index')
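
As a self-contained illustration of that round trip, here is a small sketch. The sample table and its row IDs are made up, and it uses reassignment rather than `inplace=True`, because `rename` and `set_index` return a new DataFrame unless the result is assigned.

```python
import pandas as pd

# Hypothetical table standing in for a node's input; the row IDs are invented.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["Row0", "Row1", "Row2"])

# Move the index into a new first column, then give that column an explicit name.
flat = df.reset_index()
flat = flat.rename(columns={flat.columns[0]: "row_index"})

# Later, for example in a downstream Python node, restore the original index.
restored = flat.set_index("row_index")
restored.index.name = None  # drop the helper name so the index looks like the original

print(flat)
print(restored)
```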

Prasanthsk · June 27, 2019, 10:59am

How do we integrate the Ataccama DQ tool via a KNIME extension?

ipazin · July 17, 2019, 1:50pm

For @Prasanthsk's question, see here: How do i report the data quality issues in some file format like csv ,xlsx or pdf etc...

ben_westphal · April 21, 2021, 7:21am

Agreed, would love to understand how (if at all) this can be done. Keen on anyone’s thoughts.

ipazin · April 21, 2021, 11:15am

Hello @ben_westphal,

I have never used Ataccama, so I am having trouble understanding how a KNIME and Ataccama integration would work. Can you explain it a bit more? Also, which of its functionalities are missing in KNIME, if any? (I am guessing that is the reason why you would like to use both KNIME and Ataccama.)

It’s worth mentioning that the Data Explorer node has features for data profiling in case you haven’t tried it out yet.

Br,
Ivan

ben_westphal · April 26, 2021, 5:46am

I would have to get deeper into it… I suspect KNIME can do everything; it just needs to be built. I liked the Ataccama interface, with insights available immediately. Just looking for shortcuts really! I’ll check out the Data Explorer node, thanks.

Ben