Small Dataset Predictive Modeling

Hi Community,
I am working with a very small dataset (9 rows and 7 attributes). Is it even feasible to train a predictive model (e.g. a decision tree or a neural network) on so little data? Even if I could train a model, how much weight can I put on its accuracy? Thanks in advance for any feedback!

Hi,

In my opinion, several factors can affect a model: the number of class values, the value range of the attributes, the number of attributes, the number of records, and more.
As you mentioned, your dataset is small, but I would still want to look at the dataset and the use case before judging.
What is the case you want to build the model for?
How many class values do you have?
How diverse are the values of the attributes?

Although it is very unlikely that you can build a reliable model from such a small dataset (too few records and a high attribute-to-record ratio), it is not impossible. For example, if the case is very simple and your dataset covers it well enough, the model may be fine.
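To illustrate why the attribute-to-record ratio matters, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data is pure random noise (9 rows, 7 attributes, random 0/1 labels), and the 1-nearest-neighbour model is just a stand-in for any flexible learner. It shows that accuracy measured on the training rows can look perfect even when there is nothing to learn, which is why a held-out estimate such as leave-one-out is the number to look at.

```python
import random

def nearest_label(train_rows, query):
    """Return the label of the training row closest to `query` (1-nearest neighbour)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train_rows, key=lambda row: sq_dist(row[0], query))
    return label

def training_accuracy(rows):
    """Accuracy when the model is scored on the same rows it memorised."""
    hits = sum(nearest_label(rows, features) == label for features, label in rows)
    return hits / len(rows)

def leave_one_out_accuracy(rows):
    """Accuracy when each row is predicted from the other rows only."""
    hits = 0
    for i, (features, label) in enumerate(rows):
        rest = rows[:i] + rows[i + 1:]
        hits += nearest_label(rest, features) == label
    return hits / len(rows)

# Hypothetical data: 9 rows, 7 random attributes, labels unrelated to the attributes.
rng = random.Random(0)
rows = [([rng.random() for _ in range(7)], rng.choice([0, 1])) for _ in range(9)]

print("training accuracy:     ", training_accuracy(rows))
print("leave-one-out accuracy:", leave_one_out_accuracy(rows))
```

The training accuracy is 1.0 by construction, because every row is its own nearest neighbour. The leave-one-out figure depends on the random draw and has no reason to be better than guessing.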

That’s what I think, but please don’t consider me an expert.

I hope this helps.

Best,
Armin


My first instinct would be to say no. But as @armingrudd said, it depends on the circumstances: how well this file represents your population and how strong the underlying rules are. To give you an example: you have a dataset of 6 adults and 4 babies, and you want to derive a rule that tells you, based on age, that the adults are larger than the babies. Your model would likely reach 100% accuracy using age alone, and the rule might also hold for 99.99% of the world population. So from a very small sample you would have derived a nearly universal rule (adults are larger than babies). OK, you might have reached that conclusion without a fancy neural network, but still.
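The adults-and-babies example can be sketched in a few lines of plain Python. The ages and heights below are made up for illustration, as is the 120 cm cut-off used to define "large"; the model is a single age threshold (a one-split decision tree), checked with leave-one-out validation.

```python
# Hypothetical sample: (age in years, height in cm) for 4 babies and 6 adults.
people = [
    (0.2, 58), (0.5, 66), (0.8, 72), (1.0, 75),
    (23, 162), (31, 178), (38, 170), (45, 185), (52, 168), (67, 174),
]
# Label each person as "large" using an assumed 120 cm cut-off.
samples = [(age, height > 120) for age, height in people]

def fit_age_threshold(samples):
    """Pick the age threshold that best separates large from small people."""
    ages = sorted({age for age, _ in samples})
    # Candidate thresholds are the midpoints between neighbouring ages.
    candidates = [(a + b) / 2 for a, b in zip(ages, ages[1:])]
    def accuracy(threshold):
        hits = sum((age > threshold) == is_large for age, is_large in samples)
        return hits / len(samples)
    return max(candidates, key=accuracy)

def leave_one_out_accuracy(samples):
    """Refit the threshold without each person, then predict that person."""
    hits = 0
    for i, (age, is_large) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        threshold = fit_age_threshold(rest)
        hits += (age > threshold) == is_large
    return hits / len(samples)

print("threshold on all data: ", fit_age_threshold(samples))
print("leave-one-out accuracy:", leave_one_out_accuracy(samples))
```

With this made-up sample the rule generalises to every held-out person, because the gap between the oldest baby and the youngest adult is huge. That is the point of the example: a strong, simple rule can survive a tiny sample, while a weak one cannot.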

So it very much depends on how representative the data is and how strong and universal your rules are. It is not entirely impossible, but in any normal business environment I would be very careful. That said, I have very limited knowledge of, e.g., medical data, where models for very small groups might be more common (but 7 attributes still seems like very little information).
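On the question of how much weight to put on an accuracy figure from 9 rows, one way to get a feeling for it is a confidence interval on the accuracy. The sketch below uses the Wilson score interval, which is my choice for illustration and not something from this thread. It assumes the 9 predictions are independent held-out predictions, which is only approximately true for leave-one-out results.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# 9 correct predictions out of 9 rows.
low, high = wilson_interval(9, 9)
print(f"9/9 correct: accuracy plausibly between {low:.2f} and {high:.2f}")
```

Even a perfect 9 out of 9 only supports a lower bound of roughly 0.70 under these assumptions, so a reported 100% accuracy on 9 rows is compatible with a model that is wrong about three times in ten.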
