Chemical Names

Hello,

I hope you all are doing well.
A few other doctors and I are working on a research project on the COVID-19 virus. Our objective is to identify effective antiviral agents from the CAS COVID-19 Antiviral Candidate Compounds Dataset available at www.CAS.org. One issue we have run into is that the chemical names of the antiviral compounds in the dataset appear to be truncated, as in the example below. We will need the full chemical names for downstream analysis. Is there any way we can obtain the complete, untruncated chemical names of the antiviral compounds?
Example of an Antiviral Agent with a truncated Chemical Name

Acetamide 2-amino-N-(1-tricyclo[3.3.1.137]dec-1-ylethyl)-
Mrv1921 09212018412D

17 19 0 0 0 0 999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.8517 -1.6585 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.1163 -2.0322 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5657 -1.2419 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.3012 -1.6156 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0368 -1.2419 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.5873 -2.0322 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0368 -0.4945 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.3012 -0.1208 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.8517 -0.9111 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5657 -0.4945 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
3 4 1 0 0 0 0
4 5 1 0 0 0 0
3 6 2 0 0 0 0
1 7 1 0 0 0 0
1 8 1 0 0 0 0
8 9 1 0 0 0 0
9 10 1 0 0 0 0
10 11 1 0 0 0 0
11 12 1 0 0 0 0
12 13 1 0 0 0 0
8 13 1 0 0 0 0
15 17 1 0 0 0 0
14 15 1 0 0 0 0
15 16 1 0 0 0 0
8 16 1 0 0 0 0
12 14 1 0 0 0 0
10 17 1 0 0 0 0
M END

Thank you,
Dr. Hussain.

  1. Where is the truncated chemical name?
    If you’re referring to “Acetamide 2-amino-N-(1-tricyclo[3.3.1.137]dec-1-ylethyl)-”, then it is not truncated, it’s just rearranged: the parent name comes first and the substituents follow. This is not uncommon. You can manipulate the string to “un-rearrange” it if it’s that important. Split the name cell with the Cell Splitter node using an appropriate delimiter (in this case, it looks like a space), and then use the String Manipulation node to join the parts in the desired order.

  2. What downstream analysis requires the names? In my opinion, in an SD file, the chemical name is the least useful bit of information since you have the structure.
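
For anyone who would rather do the rearranging from point 1 in a script (for example inside a Python scripting node), here is a minimal sketch. It assumes the name has exactly the shape of the example: parent name, one space, then a substituent part ending in a hyphen. The function name `uninvert_name` is made up for illustration, and names with more parts (stereo descriptors, salts, and so on) would need extra handling.

```python
def uninvert_name(name: str) -> str:
    """Move the leading parent name behind the substituent part.

    Assumes the simple shape "<Parent> <substituents>-"; any name that
    does not match that shape is returned unchanged.
    """
    parent, sep, substituents = name.strip().partition(" ")
    if not sep or not substituents.endswith("-"):
        return name  # not in the expected rearranged form
    parent = parent.rstrip(",")  # some sources write "Acetamide, ..."
    # Drop the trailing hyphen and append the parent in lower case.
    return substituents[:-1] + parent.lower()


example = "Acetamide 2-amino-N-(1-tricyclo[3.3.1.137]dec-1-ylethyl)-"
rearranged = uninvert_name(example)
```

This mirrors the Cell Splitter plus String Manipulation approach: split on the first space, then join the pieces in the other order.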

Is there any KNIME node to derive chemical names from the molecular formulae of compounds?

Thanks.

You can try the CIR node.

Or you can generate the CIR API call (to https://cactus.nci.nih.gov/chemical/structure) yourself with the REST nodes.
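
If it helps to see what such a call looks like outside KNIME, here is a rough Python sketch. It assumes the usual CIR URL pattern of base URL, then the identifier, then the requested representation, with `iupac_name` as the representation. The helper names `cir_url` and `resolve` are made up for illustration. Note that a molecular formula alone is ambiguous (isomers share one formula), so the identifier here is assumed to be something structure-specific such as a SMILES string.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the post above.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str = "iupac_name") -> str:
    """Build a CIR request URL; the identifier is percent-encoded."""
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


def resolve(identifier: str, representation: str = "iupac_name",
            timeout: float = 30.0) -> str:
    """Send the request and return the plain-text response body.

    urlopen raises urllib.error.HTTPError when the service cannot
    resolve the identifier, so callers may want to catch that.
    """
    with urlopen(cir_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Build (but do not send) a request for the SMILES of acetamide.
url = cir_url("CC(=O)N")
```

Calling `resolve("CC(=O)N")` would send the request. In KNIME the same URL can be built in a String Manipulation node and passed to the GET Request node.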

Good luck.
