Using multiple CSV files for Machine Learning

I would be grateful for some help. I am a radiation oncologist trying to get au fait with Machine Learning. I have data! That’s the first hurdle I suppose.

My data exists in three CSV files: ClinicalData.CSV (which holds the target variable for grouping; between 19 and 300 columns, many of which contain missing data), DVH.CSV (Dose-Volume Histogram data describing the radiation dose received by various ROIs drawn on a CT; ~700 columns), and PyRadiomics.CSV (features extracted from the same ROIs drawn on a CT; ~107 columns).

All CSV files contain the same patient ID. The DVH and PyRadiomics files have the same number of rows (each ROI for each patient gives one DVH row and one PyRadiomics row).

I would like to know how I can ‘combine’ these three files automatically for use in one ML instance.

Thanks for the assistance, both now and in the past. The community is truly a benefit.

Take a look at

node. It can read all the files in a folder and output a single combined table.


It seems to me that the combination of files needs to have the same structure. Is this correct?

You got it right. Some variations in line structure, such as short lines, can be adjusted automatically by setting this option.

Without having example data for us to look at, it’s hard to say precisely, but you may also want to try importing the files individually and then joining them on a common key (like patientID, in your case).
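To make the join idea concrete outside of any particular tool, here is a minimal pandas sketch. Everything in it is an assumption for illustration: the column names (PatientID, ROI, D95, V20, Sphericity, Age, Outcome) and the tiny in-memory tables are made up, and it presumes that the DVH and PyRadiomics files share an ROI column so their rows can be paired up. With real data you would pass the file paths to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the three CSV files (invented columns and values).
clinical_csv = io.StringIO(
    "PatientID,Age,Outcome\n"
    "P1,61,0\n"
    "P2,,1\n"          # missing Age, as clinical data often has gaps
)
dvh_csv = io.StringIO(
    "PatientID,ROI,D95,V20\n"
    "P1,Lung,48.2,21.0\n"
    "P1,Heart,12.5,3.1\n"
    "P2,Lung,50.1,25.4\n"
)
radiomics_csv = io.StringIO(
    "PatientID,ROI,Sphericity\n"
    "P1,Lung,0.61\n"
    "P1,Heart,0.74\n"
    "P2,Lung,0.58\n"
)

clinical = pd.read_csv(clinical_csv)
dvh = pd.read_csv(dvh_csv)
radiomics = pd.read_csv(radiomics_csv)

# Step 1: pair each DVH row with its radiomics row.
# Both tables have one row per (patient, ROI), so the join key is the pair.
# validate raises an error if a key is duplicated on either side.
per_roi = dvh.merge(
    radiomics, on=["PatientID", "ROI"], how="inner", validate="one_to_one"
)

# Step 2: attach the clinical columns (including the target) to every ROI row.
# A left join keeps all ROI rows even if a patient has no clinical record.
combined = per_roi.merge(
    clinical, on="PatientID", how="left", validate="many_to_one"
)

print(combined)
```

The result has one row per patient and ROI, with the clinical columns repeated across that patient's ROIs. Whether that layout suits the model, or whether the ROIs should instead be pivoted into one wide row per patient, depends on the question being asked.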