How to deal with multiple rows for one prediction

Hello,
This is my problem.
I have a dataset of sports training data. Several training sessions lead up to a competition, and the competition result is coded as “good result = 1” or “bad result = 0”. For practical reasons, the result is recorded on the same row as the last training session. My question is: how do I feed multiple distinct rows into a single prediction while keeping the chronological order of the training sessions?
Example data:
Exemple data.xlsx (20.6 KB)

Thanks
Br

Hi,
In the end you probably want to train a model to predict the score, right? You will need to figure out the set of features that you want to use to make the prediction. Then you can use a GroupBy or other aggregation methods to create a single row with all your features and the score as the target variable. You can then train your predictor on that. A standard predictor expects one row per prediction; it cannot make a prediction from multiple rows.
Kind regards,
Alexander

Thanks for your answer.
I added a “training session” column, which groups, per athlete name, all the training rows before a competition that ends in “outcome 1/0”. For example, Pierre from August 1st to 29th = training session number 1 (there are four trainings during this period). Outcome on August 29th = 1. How do I group these 4 trainings into a single outcome? Or is it ultimately the same to specify the outcome on each training row? If so, the disadvantage is that this approach will not take into account whether there was any evolution across the trainings.

Thanks
Br

Hi,
You will need to think of your problem as a set of independent variables (features) and a dependent variable (target, score, …). Your model is a function that maps the independent variables to the target variable. This implies that you need one row for every session. How you compress your multiple rows into one row is up to you. For example, for each session you could calculate the minimum, maximum, mean, and/or median distance traveled and do the same for other numeric variables you have. For the categorical variables, you could calculate the mode, or just if some value was present in any of the rows. Doing this means that you become independent of the actual number of rows for any given session, so that every record for a session “looks the same”, i.e. has the same number of features. And then you can use that for your prediction.
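To make the compression step concrete, here is a minimal sketch in plain Python. The column names (`distance_km`, `surface`) and the values are made up for illustration and are not taken from the attached file; the point is only that every session ends up with the same set of features, however many rows it had.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Hypothetical rows: one per training, several trainings per session.
rows = [
    {"name": "Pierre", "session": 1, "distance_km": 8.0, "surface": "track"},
    {"name": "Pierre", "session": 1, "distance_km": 10.0, "surface": "road"},
    {"name": "Pierre", "session": 1, "distance_km": 12.0, "surface": "track"},
    {"name": "Pierre", "session": 2, "distance_km": 5.0, "surface": "road"},
    {"name": "Pierre", "session": 2, "distance_km": 7.0, "surface": "road"},
]


def summarise(rows):
    """Collapse all rows of each (name, session) pair into one feature dict."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["name"], row["session"])].append(row)

    features = {}
    for key, members in groups.items():
        distances = [m["distance_km"] for m in members]
        surfaces = [m["surface"] for m in members]
        features[key] = {
            # Numeric variable: summary statistics over the session.
            "distance_min": min(distances),
            "distance_max": max(distances),
            "distance_mean": mean(distances),
            "distance_median": median(distances),
            # Categorical variable: most frequent value, plus a presence flag.
            "surface_mode": Counter(surfaces).most_common(1)[0][0],
            "any_track": "track" in surfaces,
        }
    return features


features = summarise(rows)
for key, feats in features.items():
    print(key, feats)
```

Session 1 has three rows and session 2 has two, yet both come out with the same six features, which is what the model needs.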
Kind regards,
Alexander

NTS.xlsx (11.1 KB)

My reference columns are Name + Number of training session.
I have 10 sessions.
I have 1 target.
For all the features I keep the mean.

How do I finally get only one row that summarises all the training sessions, i.e. one target?
How should I deal with that?

Thanks
Br

Hey,
Just use a GroupBy node: group by “Entrainement” and aggregate all other columns except “Outcome” with mean or mode. Aggregate “Outcome” with “First”.
You will end up with a single row for your session.
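If it helps to see the same logic outside of KNIME, a rough pandas equivalent could look like the sketch below. Only “Name”, “Entrainement” and “Outcome” come from this thread; the “Distance” and “Duration” columns and all values are invented, and grouping by “Name” as well as “Entrainement” is my assumption based on the reference columns mentioned above.

```python
import pandas as pd

# Hypothetical data: several training rows per athlete and session.
df = pd.DataFrame({
    "Name": ["Pierre", "Pierre", "Pierre", "Pierre", "Marie", "Marie", "Marie"],
    "Entrainement": [1, 1, 1, 1, 1, 1, 1],
    "Distance": [8.0, 10.0, 12.0, 10.0, 5.0, 6.0, 7.0],
    "Duration": [40.0, 50.0, 60.0, 50.0, 30.0, 35.0, 40.0],
    "Outcome": [1, 1, 1, 1, 0, 0, 0],
})

group_cols = ["Name", "Entrainement"]

# Every column that is neither a grouping key nor the target is a feature.
feature_cols = [c for c in df.columns if c not in group_cols + ["Outcome"]]

# Mean for the features, first value for the target.
agg_spec = {col: "mean" for col in feature_cols}
agg_spec["Outcome"] = "first"

result = df.groupby(group_cols, as_index=False).agg(agg_spec)
print(result)
```

Each (Name, Entrainement) pair becomes one row with the averaged features and a single “Outcome”, which is the table you would then hand to the learner.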
Kind regards,
Alexander
