Hello everyone:
I need help from someone who knows text mining, or at least I think that is what this problem calls for. I have two files. The first contains part of the information from used-car sale ads. Its most important field is "Model", where the person who placed the ad types the model of the car they want to sell. The second file contains data from the Internal Revenue Service: a list of cars with their makes, their "Models", and the amount that must be paid for each to circulate on the streets of Chile. What I need to know, row by row, is which model in the second file each record in the first file corresponds to. The "Model" field in the second file is very tidy. In the first file, however, it is written at the discretion of whoever placed the ad, so the field is very messy and the same model may appear written in several different ways. The "Model" field of the first file also contains information that has nothing to do with the model itself.
The original files are much larger and contain many other makes and models; for this example I filtered them down to 3 that are common in the Chilean market.
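To make the task concrete, here is a minimal Python sketch of the kind of matching described above. It is only an illustration under assumptions: the function names `normalize` and `best_match`, the 0.8 threshold, and the sample ads and reference models are all hypothetical and do not come from the real files. It uses only the standard library (`difflib`) and compares each tidy reference model against sliding windows of words in the ad text, so that unrelated words in the ad are ignored.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents, and keep only letters, digits and single spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def best_match(ad_model, reference_models, threshold=0.8):
    """Return (reference_model, score) for the closest reference model.

    The first element is None when no candidate reaches the threshold
    (the 0.8 default is an arbitrary assumption and would need tuning).
    """
    ad_tokens = normalize(ad_model).split()
    best, best_score = None, 0.0
    for ref in reference_models:
        ref_norm = normalize(ref)
        n = len(ref_norm.split())
        # Slide a window of n words over the ad text so that extra words
        # (year, trim, sales pitch) do not drag the similarity down.
        last_start = max(1, len(ad_tokens) - n + 1)
        windows = [" ".join(ad_tokens[i:i + n]) for i in range(last_start)]
        score = max(SequenceMatcher(None, w, ref_norm).ratio() for w in windows)
        if score > best_score:
            best, best_score = ref, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical sample values, not taken from the real files.
reference_models = ["Yaris", "Corolla", "Spark"]
ads = ["toyota yaris 2012 full equipo", "COROLA xli impecable", "vendo bicicleta"]

for ad in ads:
    print(ad, "->", best_match(ad, reference_models))
```

Rows that come back with `None` would still need manual review or a looser threshold, and a real solution would probably match on make first to narrow down the candidate models.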
Thanks in advance to anyone who can help me. I have attached a very basic flow and some sample files.
Gabriel.