I was just thinking that KNIME could be advertised as the best possible glue between R and Python, but perhaps only with better R integration. Currently, a do-nothing Python Script node executes about 10x faster than a do-nothing R Snippet node.
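For context, "do-nothing" can be taken literally: a node that just forwards its input table. A minimal sketch using the current `knime.scripting.io` API (my assumption; the actual benchmark node may have looked different):

```python
import knime.scripting.io as knio

# Pass the input table straight through; no computation at all.
knio.output_tables[0] = knio.input_tables[0]
```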
As you might imagine, the development team has spent quite a bit more time on the Python side of things than on R. But I can pass your feedback to them; a 10x difference in speed is a lot!
Hi @ScottF, indeed the speed of Python nodes has improved greatly since I posted about it a few years ago, which is awesome! R nodes are still kind of the same, though. Unfortunately, all of my colleagues are totally into R.
ChatGPT: Here are the steps to create a KNIME workflow for PCA on the Iris dataset:
1. Open the KNIME Analytics Platform and create a new workflow.
2. Import the Iris dataset into KNIME using the “File Reader” node. Configure the node to read the dataset file and choose the “CSV” format.
3. Split the dataset into training and testing sets using the “Partitioning” node. Configure the node to split the dataset 70/30, with 70% of the data going to the training set and 30% to the testing set.
4. Normalize the training set using the “Normalizer” node. This step is important because PCA is sensitive to the scale of the input features.
5. Perform PCA on the normalized training set using the “PCA” node. Configure the node to use the default settings.
6. Use the “PCA Apply” node to apply the PCA model to the testing set.
7. Visualize the results of the PCA using the “Scatter Plot (with Color)” node. Connect the output of the PCA Apply node to the input of the Scatter Plot node. Configure the node to use the first two principal components as the X and Y axes, respectively.
8. Save and run the workflow to see the PCA results.
That’s it! You have created a KNIME workflow for performing PCA on the Iris dataset.
Me: I think you forgot to perform the normalization on the test set as well.
ChatGPT: You’re right! Normalizing the test set is an important step to ensure that the same scaling factors are applied to the test set as were applied to the training set. Here are the updated steps to create a KNIME workflow for PCA on the Iris dataset… etc, etc
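The correction matters because the scaling parameters must come from the training data only and then be reused on the test data. A rough scikit-learn equivalent of the node chain (my own translation, not KNIME's implementation; MinMaxScaler stands in for the Normalizer node's default min-max mode, and the split ratio mirrors the steps above):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)

# Fit the scaler on the training split only ("Normalizer" node).
scaler = MinMaxScaler().fit(X_train)

# Fit PCA on the normalized training data ("PCA" node).
pca = PCA(n_components=2).fit(scaler.transform(X_train))

# "PCA Apply" on the test set: reuse the SAME scaler and PCA model.
X_test_pca = pca.transform(scaler.transform(X_test))
print(X_test_pca[:5])
```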
@Aswin As of now, ChatGPT is not able to create KNIME workflows directly, though it is interesting that it can lay out a plan of what to do. I have tried several things with KNIME and Python, since it can write Python (and R and other) code that you can then integrate into KNIME; a small sketch of that pattern follows.
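Generated pandas code can often be dropped into a Python Script node more or less verbatim. The derived column and column names below are hypothetical, just to illustrate the wrapper:

```python
import knime.scripting.io as knio

df = knio.input_tables[0].to_pandas()

# --- paste the generated logic here; a hypothetical example: ---
df["sepal_ratio"] = df["sepal length"] / df["sepal width"]

knio.output_tables[0] = knio.Table.from_pandas(df)
```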
Concerning your initial question about KNIME and R: there are two ways to send data from KNIME to R and back (data.frame, data.table), and you could also employ Parquet (or SQLite) as an intermediate format. The question is whether the transfer speed really is that crucial compared to the R functions you might want to use.
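A sketch of the Parquet route, writing from the Python side so R can pick the file up with the arrow package (the path and sample data are illustrative assumptions):

```python
import pandas as pd

# Build a sample table; in a workflow this would come from an input port.
df = pd.DataFrame({"x": range(1_000_000)})

# Requires pyarrow (or fastparquet) to be installed.
df.to_parquet("/tmp/handoff.parquet")

# On the R side, e.g. in an R Snippet:
#   library(arrow)
#   df <- arrow::read_parquet("/tmp/handoff.parquet")
```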