Hello community,
I need your help with a rather complex task, which is to identify similar listings across multiple tables.
Let me explain my project in detail. I have several data tables that have collected item sale listings from various internet sources using web scraping.
I would like to group the listings across these tables that describe the same item. The challenge is that the information for the same listing can differ from one source site to another. Therefore, I need to compute some kind of similarity score between listings based on several fields, such as the listing name, author name, description, location, etc.
Adding to the complexity, the same piece of information can also be written differently from one source to another.
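To make the idea concrete, here is a rough sketch of the kind of per-field score I have in mind, using only Python's standard library. The field names, the weights, and the helper names (`normalize`, `field_similarity`, `listing_similarity`) are all assumptions I made up for illustration, not something taken from my real tables, and the weights are not tuned on any data.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; the values are guesses, not tuned on real data.
WEIGHTS = {"name": 0.4, "author": 0.2, "description": 0.3, "location": 0.1}


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences do not count."""
    return " ".join(str(text or "").lower().split())


def field_similarity(a, b):
    """Return difflib's ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def listing_similarity(x, y, weights=WEIGHTS):
    """Weighted average of per-field similarities.

    Fields that are missing or empty on either side are skipped, and the
    remaining weights are renormalized so the result stays in [0, 1].
    """
    total = 0.0
    used = 0.0
    for field, weight in weights.items():
        if x.get(field) and y.get(field):
            total += weight * field_similarity(x[field], y[field])
            used += weight
    return total / used if used else 0.0
```

Two listings would then be grouped together when `listing_similarity` exceeds some threshold, which I would still have to choose by inspecting real pairs.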
I’m not sure how to approach this, so any examples or ideas would be welcome. I’ve already done some research: for instance, I know that there are tools that calculate a similarity score (or edit distance) between two strings, but I’m unsure whether this is effective for a listing description containing hundreds of words.
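For the long descriptions, one alternative I came across is to compare the words used rather than the characters, for example with a bag-of-words cosine similarity. Below is a minimal sketch of that idea with the standard library only; the function names `tokens` and `cosine_similarity` are my own, and I have not checked how well this works on real listing text.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts.

    Word order is ignored, so reordered sentences still score high.
    Returns 0.0 if either text has no tokens.
    """
    counts_a, counts_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Unlike a character-level edit distance, this does not penalize two descriptions for presenting the same words in a different order, but it also treats every word as equally important, which may be too crude for my case.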
If you have any ideas or advice, I would be grateful.
Thank you very much.