Importing and merging csv files

jonathanco · July 28, 2016, 2:08am

Hi,

This is very much a beginner's question.

I have a history of transactions for large number of properties that I want to prepare for analysis. The transactions are in a file which I have/can setup to also import into a database table. The support data is also in files (CSV) but could also be put into database tables.

Questions are:

I need to merge some additional data that is in CSV files, looking up by ID - this would just effectively add columns from one file to the master table to perform a simple join - the example is taking a property transaction and then adding some additional type classification by looking up that transaction in a seperate file/table - is the join table node the best to use for this?
I then need to 'flag' some records - we are looking up whether a property transaction is of a type that is listed in a seperate file/table - in this case a list of transaction types that are considered distinct types for review - I think the right way to do this is with an outer join to add the flag?
I also then need to match to one last data set which links the transaction to a client record - getting the correct client record requires looking at the transaction date and then searching for the correct client record in a file that has the client ID with a start date and end date for each property. I am really not sure of the best way to go about this:
1. Should I take the joined table and loop over it with the 'Table Row to Variable Loop Start'?
2. If so then what would be the best approach or nodes to use to take each row, pick out the ID and date to use for a look up and then take that and query the client dataset?
3. Or, is it better and more stratightfoward to push client data and the joined table into a database and then write a view or stored procedure that handles this search and combine for me?

Many thanks in advance,

Jonathan

marco_ghislanzoni · July 29, 2016, 12:20pm

Hi Jonathan,

most of the tasks you describe can be accomplished with the Joiner node and with the Rule Engine node (like flagging records depending on certain conditions). If your data are in a database, you can access it from KNIME and use the Database Joiner node to perform similar operations.

Instead of coming up with complex rules, you can often simplify your workflow by adding working columns (e.g. date ranges, number of days, etc.) which can be removed when you no longer need them.

It is difficult to be more specific without knowing your data structure, size and quality (i.e. are there missing values? Are the matches going to be exact ones or do you need fuzzy matching? Will you be working on few hundred lines or millions?)

Hope this helps anyway. Feel free to give it a try and post a follow-up for additional help.

Cheers,
Marco.

peter_miller · December 15, 2016, 12:06pm

Hi,

for me, i merge CSV before i import using merge csv files.