In case anyone is interested in using the curated USPTO rsmi/csv datasets by Lowe or Schwaller, I have curated the yields in these datasets. As given in the original files, the yields aren’t very helpful.
A KNIME-based workflow with some explanation is available on the KNIME Hub:
Yield cleanup of USPTO csv/rsmi files
The ready-to-use datasets are available on figshare: