Is artificial intelligence used in KNIME's data cleansing techniques?

Is artificial intelligence used in KNIME's data cleansing techniques, especially when cleansing database tables?

Thanks and Regards,
Ram

I would say no. Why would you assume that?
br

Thank you for your quick response, Daniel. I asked because many of the blogs I have read say that machine learning, when used for data cleansing, can reduce the effort spent on cleaning and produce more accurate data.

Regards,
Ram

Hi @Ramakrishnaam and welcome to the forum.

I think the answer here is “it depends”. If you are talking about cleansing techniques like data deduplication, probably not. But if you wanted to, for example, predict missing values in your dataset using more complete records in the same dataset, you could easily build a model in KNIME to do that.

If you can be more specific about what techniques you mean, and what types of ML you are thinking about, maybe someone can give a more complete answer.

Thank you Scott,

Which ones do we use in KNIME to predict missing values in a dataset?

@Ramakrishnaam I think it might help if you could tell us more about what you want to do.

If we are talking about automated data preparation and feature engineering, there is no genuine KNIME node for that, but I have set up sample workflows that use the R package vtreat to (semi-)automatically prepare data. A similar approach would be featuretools with Python.

Please also note the links at the bottom of the pages.

If you are looking for other techniques of dimension reduction there is this large workflow along with accompanying articles:

@Ramakrishnaam
If you were referring to AutoML, then that is a completely different story, and I misunderstood your question. I was referring to almost all the “standard” data prep nodes.
But even with AutoML it makes sense to know what you are doing!
br

You could try the component for Guided Missing Value Handling. Among other things, it uses a random forest model to calculate missing values for a given field.

Thank you all for the suggestions. I will try Guided Missing Value Handling and see if it fits my needs.

Regards,
Ram

@ScottF, for Guided Missing Value Handling, I see it uses the “Column Selection Configuration” node, which needs table data as its input. For a Postgres DB table, I did not find any matching node whose output “Column Selection Configuration” accepts.

Could you please suggest what to do if we want to handle missing values in Postgres DB tables using the “Guided Missing Value Handling” workflow?

Thank You,
Ram

Hello @Ramakrishnaam,

You can use the DB Reader or DB Query Reader node to read data from your database into KNIME and then apply the above-mentioned component.

Br,
Ivan

Thank you @ipazin, this worked 👍

I am still not able to connect the dots. I tried using the “Guided Missing Value Handling” component, but I am still not getting the desired output. I might be going wrong somewhere. Could we connect on a Zoom call, if possible?

Thank You,
Ram

Hello @Ramakrishnaam,

I have looked a bit at the Guided Missing Value Handling component, and it is a pretty specific component. Maybe, for a start, try something simpler for missing values?
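
As a rough idea of what "something simpler" can look like, here is a small plain-Python sketch that replaces missing values with a column statistic (median for numbers, most frequent value for strings). It is only an illustration of the technique, not the implementation of any KNIME node. The helper `fill_missing`, the column names, and the sample rows are assumptions made up for the example.

```python
from statistics import median, mode


def fill_missing(rows, column, strategy="median"):
    """Replace None in `column` with a statistic computed from the present values."""
    present = [r[column] for r in rows if r[column] is not None]
    if not present:
        # Nothing to learn a replacement from; return unchanged copies.
        return [dict(r) for r in rows]

    if strategy == "median":
        fill = median(present)
    elif strategy == "most_frequent":
        fill = mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Hypothetical sample data with one gap per column.
rows = [
    {"age": 30, "city": "Berlin"},
    {"age": None, "city": "Paris"},
    {"age": 40, "city": None},
    {"age": 50, "city": "Berlin"},
]

cleaned = fill_missing(rows, "age", strategy="median")
cleaned = fill_missing(cleaned, "city", strategy="most_frequent")
print(cleaned)
```

Simple replacement like this is easy to reason about, which makes it a useful baseline before moving to model-based imputation.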

The plain Missing Value node might help you out. You can also check KNIME Hub for workflow examples on dealing with missing values (and other cleaning techniques):

Br,
Ivan
