Limitations on molecule size for geometry optimization?

Your suggested workflow certainly will not result in something reasonable if you use it for model building. You are calculating one set of coordinates and then optimizing it, which means you get more or less one random conformer. And then you want to build a model from this? In one case you catch something close to the minimum, in another it's >20 kcal/mol above it. You're comparing apples to oranges.

You will definitely need multiple conformers for each molecule and go from there. But that is ultimately the problem with 3D descriptors: they are different for each conformer, and now you need to decide how to build an ML model from that. Note: I do not have an answer for what is correct, just possible options, none of which are very nice. Most reasonable IMHO: you can simply use the conformers as data augmentation, e.g. one row per conformer instead of one per molecule (you should probably apply an energy cut-off first).
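To make the augmentation idea concrete, here is a minimal sketch of the "energy cut-off, then one row per conformer" step. It assumes you have already computed, per molecule, a list of (relative energy, descriptor vector) pairs; the function name `expand_to_conformer_rows`, the data layout, and the 10 kcal/mol default cut-off are all illustrative choices, not a prescribed recipe.

```python
def expand_to_conformer_rows(records, cutoff_kcal=10.0):
    """Turn one record per molecule into one row per low-energy conformer.

    records: list of (mol_id, label, conformers), where conformers is a
    list of (energy_kcal, descriptor_vector) pairs. This layout is
    hypothetical -- adapt it to however your descriptors are stored.
    """
    rows = []
    for mol_id, label, conformers in records:
        if not conformers:
            continue
        # energies are compared to the lowest-energy conformer found
        e_min = min(e for e, _ in conformers)
        for energy, descriptors in conformers:
            # energy cut-off: drop conformers far above the minimum
            if energy - e_min <= cutoff_kcal:
                rows.append((mol_id, label, descriptors))
    return rows

# toy example: two molecules, each with the same activity label
# repeated on every surviving conformer row
records = [
    ("mol_a", 1, [(0.0, [1.0, 2.0]), (5.0, [1.1, 2.1]), (25.0, [3.0, 0.5])]),
    ("mol_b", 0, [(2.0, [0.2, 0.3]), (3.0, [0.2, 0.4])]),
]
rows = expand_to_conformer_rows(records, cutoff_kcal=10.0)
# mol_a keeps 2 of 3 conformers (the 25 kcal/mol one is cut), mol_b keeps both
```

One caveat with this augmentation: rows from the same molecule are not independent, so any train/test split should keep all conformers of a molecule in the same fold.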
Still, given the huge added computational effort, the benefit will mostly not exist. You will have to try it out for yourself, however. (On top of that, compared to e.g. neural networks, this additional computational effort also applies to prediction: you need to generate and predict multiple conformers of each molecule you want to predict, and then add extra logic to decide whether it is active or not.)
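The "extra logic" at prediction time could look like the sketch below: the trained model scores each conformer, and a molecule-level decision is aggregated from those scores. The max-over-conformers rule used here (a molecule is called active if any low-energy conformer looks active) is just one possible assumption; mean pooling or majority vote are equally defensible.

```python
def predict_molecule(per_conformer_probs, threshold=0.5):
    """Aggregate per-conformer model outputs into one molecule-level call.

    per_conformer_probs: predicted probability of 'active' for each
    conformer of one molecule (already filtered by your energy cut-off).
    Returns (is_active, molecule_score).
    """
    # max-aggregation: the most active-looking conformer decides the score
    p_mol = max(per_conformer_probs)
    return p_mol >= threshold, p_mol

# toy usage: three conformer-level probabilities from some trained model
active, score = predict_molecule([0.2, 0.7, 0.4])
# one conformer scores above the 0.5 threshold, so the molecule is called active
```

Whichever aggregation you pick, use the same rule when evaluating the model, otherwise the reported performance will not match deployment behavior.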