Calculating cosine similarity

I created some sentences and want to find the ones that are similar. I used cosine similarity, and the result is very strange: the sentences have nothing in common, yet they come out as perfectly similar (cosine similarity = 1). Why? I don't understand this. When I use Python and the scikit-learn library it works fine, so something must be wrong here.


The dataset I created:


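The value computed here is not a similarity but a distance. A cosine distance of 1 means the sentences are as far apart as possible: for non-negative vectors such as term counts or tf-idf weights, this happens exactly when they share no terms. The similarity is therefore 1 - distance.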
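A minimal NumPy sketch of the relationship, assuming two toy term-count vectors over a made-up four-word vocabulary (the helper names `cosine_similarity` and `cosine_distance` are hypothetical, not from any tool in this thread). Two sentences with no shared terms get a similarity of 0 and therefore a distance of 1:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_distance(u, v):
    """Cosine distance, defined here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

# Hypothetical term-count vectors over the vocabulary ["cat", "dog", "car", "road"].
s1 = np.array([1.0, 1.0, 0.0, 0.0])  # "cat dog"
s2 = np.array([0.0, 0.0, 1.0, 1.0])  # "car road" (no terms in common with s1)

print(cosine_similarity(s1, s2))  # 0.0 -> not similar at all
print(cosine_distance(s1, s2))    # 1.0 -> maximally distant
```

Comparing a sentence with itself gives the opposite extreme: a distance of (almost exactly) 0, up to floating-point rounding.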
Kind regards


Oh, you are right, thanks. I also have another question:

I have two datasets, A and B, each containing documents.

I want to calculate the cosine distances between the documents in dataset A and those in dataset B.

Dataset A has shape 300 rows x 1000 columns (tf-idf).
Dataset B has shape 900 rows x 1000 columns (tf-idf).

As a result, I would like to obtain a 300x900 matrix in which each cell holds the cosine distance between one document from A and one document from B.
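One way to sketch this in Python, assuming A and B are available as NumPy arrays: normalise every row to unit length, so that a single matrix product yields all pairwise cosine similarities, then subtract from 1. The function name `cosine_distance_matrix` is hypothetical, and the random matrices below are synthetic placeholders standing in for the real tf-idf tables, not the poster's data.

```python
import numpy as np

def cosine_distance_matrix(A, B):
    """Pairwise cosine distances between the rows of A (m x d) and B (n x d).

    Returns an (m x n) array with entry [i, j] = 1 - cos(A[i], B[j]).
    All-zero rows (empty documents) are kept as zero vectors, so their
    distance to every other document is 1.
    """
    A_norms = np.linalg.norm(A, axis=1, keepdims=True)
    B_norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Divide each row by its length, avoiding division by zero.
    A_unit = A / np.where(A_norms == 0, 1.0, A_norms)
    B_unit = B / np.where(B_norms == 0, 1.0, B_norms)
    similarity = A_unit @ B_unit.T  # (m x n) cosine similarities
    # Clip tiny floating-point overshoots into the valid range [0, 2].
    return np.clip(1.0 - similarity, 0.0, 2.0)

rng = np.random.default_rng(0)
# Synthetic non-negative stand-ins with the shapes from the question.
A = rng.random((300, 1000))
B = rng.random((900, 1000))

D = cosine_distance_matrix(A, B)
print(D.shape)  # (300, 900)
```

If scikit-learn is available, `sklearn.metrics.pairwise.cosine_distances(A, B)` should produce the same 300x900 matrix.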

Creating such a table is possible, but it may be somewhat time-consuming. Do you really need the full distance matrix, or would the k nearest neighbours in B for each document in A be enough? If the latter, you can use the Similarity Search, which is quite fast.
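To illustrate what a k-nearest-neighbours result looks like, here is a minimal NumPy sketch that, for each row of a distance matrix D (documents of A vs documents of B), returns the column indices of the k closest documents in B. It assumes D has already been computed and is not meant to reproduce how the Similarity Search tool works internally; the function name `k_nearest` and the small matrix are hypothetical.

```python
import numpy as np

def k_nearest(D, k):
    """For each row of distance matrix D, return the column indices of the
    k smallest distances (nearest first) together with those distances.
    Requires 1 <= k <= D.shape[1]."""
    # argpartition moves the k smallest entries of each row into the
    # first k positions, in arbitrary order, without a full sort.
    idx = np.argpartition(D, kth=k - 1, axis=1)[:, :k]
    part = np.take_along_axis(D, idx, axis=1)
    # Sort only those k entries so the nearest neighbour comes first.
    order = np.argsort(part, axis=1)
    idx = np.take_along_axis(idx, order, axis=1)
    dist = np.take_along_axis(part, order, axis=1)
    return idx, dist

# Hypothetical distances: 2 documents from A vs 4 documents from B.
D = np.array([[0.9, 0.1, 0.5, 0.3],
              [0.2, 0.8, 0.05, 0.6]])

idx, dist = k_nearest(D, 2)
print(idx)   # [[1 3]
             #  [2 0]]
print(dist)  # [[0.1  0.3 ]
             #  [0.05 0.2 ]]
```

For the real data, each row of `idx` would then hold the positions in dataset B of the k most similar documents for one document of dataset A. scikit-learn's `NearestNeighbors(metric="cosine")` offers a similar lookup without building the matrix by hand.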
Kind regards,

