Dear KNIME community,
I’ve downloaded the ZINC database and filtered it by the properties I wanted. I now have ~1000 files with the compounds I want, but some files contain a single molecule (in SMILES format) and others contain >10 million molecules. I’d like to normalize the files so that each one has 1 million molecules.
I can use the Chunk Loop to write files of 1 million molecules, but this requires that I first read in the entire set of molecules (>200 million) and my computer can’t handle this. I’m trying to develop a workflow that:
- reads in files until >= 1 million molecules are available in a “buffer table”;
- splits off 1 million molecules and writes them to a file, then keeps writing files of 1 million molecules until there are no longer enough molecules in the buffer table;
- loops back to step 1 and continues until all files are processed.
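To make the intended logic concrete, here is a minimal sketch of the three steps in plain Python rather than as KNIME nodes. It assumes each input file is plain text with one SMILES record per line and no header row; the names `iter_molecules`, `rechunk`, `chunk_size`, `prefix`, and the `.smi` extension are all made up for illustration. Because the files are streamed line by line, the buffer never holds more than one chunk at a time.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_molecules(paths: Iterable[Path]) -> Iterator[str]:
    """Yield one record at a time (assumption: one SMILES per line, no header)."""
    for path in paths:
        with open(path, "r", encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line.strip():  # skip blank lines
                    yield line


def rechunk(paths: Iterable[Path], out_dir: Path,
            chunk_size: int = 1_000_000, prefix: str = "chunk") -> List[Path]:
    """Stream all input files and write output files of chunk_size records.

    The last output file holds whatever remains, so it may be smaller.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    buffer: List[str] = []  # plays the role of the "buffer table"

    def flush() -> None:
        out_path = out_dir / f"{prefix}_{len(written) + 1:05d}.smi"
        with open(out_path, "w", encoding="utf-8") as out:
            out.write("\n".join(buffer) + "\n")
        written.append(out_path)
        buffer.clear()

    for molecule in iter_molecules(paths):
        buffer.append(molecule)
        if len(buffer) == chunk_size:  # enough molecules: split off one file
            flush()
    if buffer:  # leftover molecules after all files are processed
        flush()
    return written
```

Calling something like `rechunk(sorted(Path("filtered").glob("*.smi")), Path("normalized"))` would process the files in name order; the folder names here are placeholders. If the files do have a header row, it would need to be skipped in `iter_molecules`.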
I’m using the following workflow to read in multiple files:
I’m having a hard time modifying this workflow to perform the desired loop described above. Would anyone be able to help?
Many thanks,
Jeremy.