Using multiple CSV files for Machine Learning

AAM · August 28, 2022, 5:59am

I would be grateful for some help. I am a radiation oncologist trying to get au fait with Machine Learning. I have data! That’s the first hurdle I suppose.

My data exists in three CSV files: ClinicalData.CSV (which holds the target variable for grouping; between 19 and 300 columns, many with missing data), DVH.CSV (Dose-Volume Histogram data describing the radiation dose received by various ROIs drawn on a CT; ~700 columns), and PyRadiomics.CSV (features extracted from the same ROIs drawn on a CT; ~107 columns).

All three CSV files contain the same patient IDs. The DVH and PyRadiomics files have the same number of rows (each ROI for each patient gives one DVH row and one PyRadiomics row).

I would like to know how I can ‘combine’ these three files automatically for use in one ML instance.

Thanks for the assistance, both now and in the past. The community is truly a benefit.

izaychik63 · August 28, 2022, 11:07am

Take a look at

node. It can read all the files in a folder and output them as one combined table.

AAM · August 29, 2022, 8:12am

It seems to me that the combination of files needs to have the same structure. Is this correct?

izaychik63 · August 29, 2022, 11:38am

You got it right. Some variations in line structure, such as short lines, can be adjusted automatically by setting this option.
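
For anyone who wants to see the same-structure requirement outside of the node, here is a minimal sketch in Python with pandas. Python is my assumption here, not something used in this thread, and `read_folder`, the file names and the column names are made up for illustration. It stacks every CSV in a folder row-wise and refuses to do so if the column layouts differ.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_folder(folder):
    """Stack all *.csv files in `folder` row-wise (hypothetical helper)."""
    # Note: "*.csv" will not match ".CSV" on case-sensitive file systems.
    files = sorted(Path(folder).glob("*.csv"))
    frames = [pd.read_csv(f) for f in files]
    # Row-wise stacking only makes sense if every file has the same columns.
    first_cols = list(frames[0].columns)
    for f, frame in zip(files, frames):
        if list(frame.columns) != first_cols:
            raise ValueError(f"{f.name} has a different column layout")
    return pd.concat(frames, ignore_index=True)


# Two tiny made-up files with identical structure, written to a temp folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("PatientID,D50\nP1,12.0\n")
    Path(tmp, "b.csv").write_text("PatientID,D50\nP2,15.2\n")
    stacked = read_folder(tmp)

print(stacked)
```

This kind of stacking appends rows, so it suits files that describe the same columns for different patients. It does not place columns from different files side by side.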

ScottF · August 29, 2022, 2:28pm

Without having example data for us to look at it’s hard to say precisely, but you may also want to try importing the files individually and then joining them using some common key (like patientID, in your case).
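
As a rough illustration of the join idea, here is a sketch in Python with pandas. This is an assumption on my part, since I have not seen the real files: the column names `PatientID`, `ROI`, `D50`, `Sphericity` and `Outcome` are invented, and `combine_tables` is a hypothetical helper. It pairs each DVH row with its radiomics row using patient ID plus ROI, then attaches the patient-level clinical columns to every ROI row.

```python
import pandas as pd


def combine_tables(clinical, dvh, radiomics, id_col="PatientID", roi_col="ROI"):
    """Join three tables on a shared key (column names are assumptions)."""
    # DVH and radiomics both have one row per (patient, ROI), so join on both keys.
    # validate= makes pandas raise if the keys are not unique as expected.
    per_roi = dvh.merge(
        radiomics, on=[id_col, roi_col], how="inner", validate="one_to_one"
    )
    # Clinical data has one row per patient; copy it onto each of that patient's ROI rows.
    return per_roi.merge(clinical, on=id_col, how="left", validate="many_to_one")


# Tiny made-up tables standing in for the three CSV files (e.g. loaded via pd.read_csv).
clinical = pd.DataFrame({"PatientID": ["P1", "P2"], "Outcome": [0, 1]})
dvh = pd.DataFrame(
    {
        "PatientID": ["P1", "P1", "P2"],
        "ROI": ["Lung", "Heart", "Lung"],
        "D50": [12.0, 3.5, 15.2],
    }
)
radiomics = pd.DataFrame(
    {
        "PatientID": ["P1", "P1", "P2"],
        "ROI": ["Lung", "Heart", "Lung"],
        "Sphericity": [0.70, 0.80, 0.65],
    }
)

combined = combine_tables(clinical, dvh, radiomics)
print(combined)
```

The result has one row per patient and ROI. If the model needs one row per patient instead, the ROI rows would first have to be reshaped into columns, which depends on how the ROIs are named in the real data.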
