Synthetic Data Augmentation with Copulas

Hi all! :wave:

I created a KNIME workflow that uses the Synthetic Data (Copulas) component to perform data augmentation on tabular datasets.

It starts with the Iris dataset, generates 500 synthetic rows using a Gaussian copula, and compares the real and synthetic data using the Statistics and Linear Correlation nodes and a 3D scatter plot for visual inspection.
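For readers curious about what a Gaussian copula does under the hood, here is a minimal, hand-rolled sketch in plain Python. It is not the Synthetic Data (Copulas) component's implementation or the copulas library's API. It assumes empirical marginals and uses only the standard library, so it also runs on Python 3.9. The steps are: rank-transform each column to (0, 1), map those values to standard-normal scores, estimate their correlation matrix, draw correlated normals via a Cholesky factor, and map the draws back through each column's empirical quantiles. All function names (`gaussian_copula_sample`, `pseudo_observations`, etc.) are illustrative only. The input is a handful of rows from Iris, not the full dataset.

```python
import math
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def pearson(x, y):
    """Pearson correlation (statistics.correlation needs Python 3.10+, the env above uses 3.9)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pseudo_observations(column):
    """Rank-transform a column into (0, 1), giving tied values their average rank."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    u = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and column[order[j + 1]] == column[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            u[order[k]] = avg_rank / (n + 1)
        i = j + 1
    return u


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(a[i][i] - s, 1e-12))  # guard against round-off
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def empirical_quantile(sorted_col, u):
    """Inverse empirical CDF with linear interpolation; always stays within [min, max]."""
    pos = u * (len(sorted_col) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_col) - 1)
    return sorted_col[lo] + (pos - lo) * (sorted_col[hi] - sorted_col[lo])


def gaussian_copula_sample(rows, n_samples, seed=None):
    """Fit a Gaussian copula with empirical marginals to `rows` and draw `n_samples` new rows."""
    rng = random.Random(seed)
    d = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(d)]
    # 1) Marginals -> uniforms -> standard-normal scores.
    z = [[STD_NORMAL.inv_cdf(u) for u in pseudo_observations(col)] for col in columns]
    # 2) Dependence structure: correlation matrix of the normal scores.
    L = cholesky([[pearson(z[i], z[j]) for j in range(d)] for i in range(d)])
    sorted_cols = [sorted(col) for col in columns]
    samples = []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # 3) Correlated normals (L @ eps) -> uniforms -> each column's original scale.
        zs = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(d)]
        samples.append([empirical_quantile(sorted_cols[j], STD_NORMAL.cdf(zs[j]))
                        for j in range(d)])
    return samples


# Eleven rows from the Iris dataset: sepal length, sepal width, petal length, petal width (cm).
IRIS_SAMPLE = [
    [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2], [5.0, 3.6, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5], [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9], [7.1, 3.0, 5.9, 2.1],
]
synthetic = gaussian_copula_sample(IRIS_SAMPLE, 500, seed=42)

# Rough analogue of the Statistics / Linear Correlation comparison in the workflow.
for j, name in enumerate(["sepal_len", "sepal_wid", "petal_len", "petal_wid"]):
    real = [r[j] for r in IRIS_SAMPLE]
    syn = [r[j] for r in synthetic]
    print(f"{name}: real mean {mean(real):.2f}, synthetic mean {mean(syn):.2f}")
real_corr = pearson([r[2] for r in IRIS_SAMPLE], [r[3] for r in IRIS_SAMPLE])
syn_corr = pearson([r[2] for r in synthetic], [r[3] for r in synthetic])
print(f"petal len/wid corr: real {real_corr:.2f}, synthetic {syn_corr:.2f}")
```

Because the marginals are empirical, every synthetic value stays within the observed range of its column. A library such as copulas can instead fit parametric marginals, which lets samples extend beyond the observed minimum and maximum.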

It's great for testing, privacy-related use cases, or boosting small datasets.

Check it out!

Cool idea, thanks for posting it!

I can’t find the conda packages anywhere. Could you suggest how to install them? My Python skills are weak at best.


If you like the Iris Data, I can also recommend the New Iris Data :slight_smile:

Hi @rfeigel,

I apologize for the delay in responding. I was caught up with other commitments right after your question.

To get started, please install Anaconda on your computer. The installer is available from the official Anaconda website.

Once installed, follow these steps:

  1. Open the Anaconda Prompt.
  2. Run the following command to create a new environment:
    conda create --name knime_workflows -c knime -c conda-forge knime-python-scripting python=3.9
    
  3. Activate the newly created environment:
    conda activate knime_workflows
    
  4. Install the Copulas library:
    pip install copulas
    
  5. Open the KNIME Preferences (File > Preferences), navigate to KNIME > Python, select Conda, and in the Python 3 section choose the knime_workflows environment.
  6. Try running the workflow and let me know if you encounter any issues.
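If the workflow still complains about a missing package, a quick check is to confirm which interpreter is actually running and whether it can see copulas. The small script below is a hedged sketch that uses only the standard library; the helper name `has_module` is hypothetical. It could be run from the activated environment, or pasted into a Python Script node, where the printed output should appear in the node's console.

```python
import importlib.util
import sys


def has_module(name):
    """Return True if `name` can be found by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None


# Which Python is running? Compare this path with the environment selected in KNIME.
print("Interpreter:", sys.executable)
print("copulas installed:", has_module("copulas"))
```

If `copulas installed` prints False, the package most likely went into a different environment than the one KNIME is configured to use.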

Looking forward to your feedback!

Best regards,
Carlos

Hi Iris,

Your paper was really awesome! Thank you for sharing! :rose:

Best,
Carlos

Thanks for the reply. I already had Anaconda installed, but for some unknown reason the Anaconda Prompt disappeared and won’t reinstall. I finally got the copulas package installed from conda-forge using a regular cmd prompt: “conda install conda-forge::copulas”. I installed it into an existing environment, and your workflow now works for me. Thanks again, nice work.

@rfeigel I have written down what I know about KNIME and Python, including how to handle conda environments, in this article.
