Hi All,
I have a Document Vector output with a Product ID column followed by a large number of columns (over 5K), each representing a word (a product feature) with its Document Vector value. I am using the Distance Matrix Calculate node and then the Distance Matrix Pair Extractor node to get the distance between each pair of products based on their features. However, my dataset has over 50K rows, which makes the Distance Matrix Calculate node take forever to run. I have already tried PCA to reduce the number of columns, but that also takes too much time.
Can anyone suggest an alternative solution to my problem? In the end, I am trying to get the cosine distance between each pair of products based on their text features. I need a solution that runs locally on my computer, as I cannot use connectors or APIs to process data outside my environment.
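To make the goal concrete, here is a minimal Python sketch of the computation I am after. It is only an illustration, not something I have running in my workflow: it assumes the feature columns have been loaded into a NumPy array, and the function name `cosine_distance_blocks` and the `block_size` parameter are made up for this example. It normalises each row once and then computes the distances one block of rows at a time, so the full 50K x 50K matrix never has to sit in memory at once.

```python
import numpy as np

def cosine_distance_blocks(X, block_size=1000):
    """Yield (start_row, block) pairs, where block[i, j] is the cosine
    distance between row start_row + i and row j of X.

    Hypothetical helper for illustration; X holds one product per row
    and one word feature per column.
    """
    X = np.asarray(X, dtype=np.float32)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows stay zero instead of dividing by 0
    Xn = X / norms           # unit-length rows, so a dot product is the cosine
    for start in range(0, Xn.shape[0], block_size):
        sims = Xn[start:start + block_size] @ Xn.T
        # cosine distance = 1 - cosine similarity; clip guards rounding error
        yield start, 1.0 - np.clip(sims, -1.0, 1.0)

# Tiny usage example with three made-up feature vectors
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
for start, block in cosine_distance_blocks(X, block_size=2):
    print(start, block.shape)
```

Each block could be filtered or written to disk before the next one is computed. Note that an all-zero row ends up at distance 1 from every row, including itself, which may or may not be the desired convention.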
Many thanks!!
Best,
Ricardo