Now that I have finished what I set out to do, I can add a few comments.
Probably the most helpful node in this case is the RDKit Catalogue filter: it is the fastest of them all and still gives pretty good information.
The RDKit Molecule Substructure filter turned out to be a weird one. On my smaller set, with my own SMARTS strings, it worked just fine, but on the big set it simply refused to output any data (0 matching molecules). The strangest part is that my test set is literally a diversity selection from the big one, with everything else being the same.
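For anyone who runs into the same 0-match result, one way to narrow it down is to run the same SMARTS over the same structures outside KNIME and compare the counts. Below is a minimal sketch, assuming the RDKit Python package is installed and the molecules are available as SMILES strings. The function name count_smarts_matches and the example inputs are made up for illustration and are not taken from the workflow discussed here.

```python
from rdkit import Chem


def count_smarts_matches(smiles_list, smarts):
    """Return (matched, unparsed) for one SMARTS pattern over a list of SMILES.

    Hypothetical helper for illustration only; it is not part of KNIME or RDKit.
    """
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        # MolFromSmarts returns None when the pattern cannot be parsed
        raise ValueError(f"SMARTS could not be parsed: {smarts!r}")

    matched = 0
    unparsed = 0
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Count structures that fail to parse separately, so they are
            # not silently mistaken for non-matching molecules
            unparsed += 1
            continue
        if mol.HasSubstructMatch(pattern):
            matched += 1
    return matched, unparsed


# Made-up example input: three valid SMILES and one deliberately broken string
example_smiles = ["CCO", "c1ccccc1", "CC(=O)O", "not_a_smiles"]
matched, unparsed = count_smarts_matches(example_smiles, "[OX2H]")
print(f"matched={matched}, unparsed={unparsed}")
```

If a check like this finds matches in the big set while the node reports none, the issue is more likely in the node configuration or the input column than in the SMARTS strings themselves.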
And lastly, it turned out that the problem I was having with the SMARTS Query node (getting stuck at 25% forever) can be solved by throwing more computing power and time at it. Running it on a beefier PC overnight solved the issue and provided the info I needed.