I am using the Chemical Identity Resolver node to search for CAS numbers based on SMILES codes. I want to check whether the CAS number I have for the SMILES is among the results provided by the CIR node.
In other words, I want to check if a string is among a list of strings. The problem is the format of the data. The CIR node gives the results as multiple lines in one column. I think that is why the Rule Engine does not do its job. Please find below an example:
The problem isn’t the CAS CIR column, it’s your rules.
As stated in the description box, the LIKE function checks whether the value of the left expression is like the wildcard pattern defined by the right expression. You’ve flipped this around for some reason.
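To make the direction of the comparison concrete, here is a minimal sketch in Python rather than Rule Engine syntax. The helper `like` and the CAS values are made up for illustration, and `fnmatchcase` only approximates the wildcard behaviour described above; it is not how the Rule Engine itself is implemented.

```python
from fnmatch import fnmatchcase


def like(value: str, pattern: str) -> bool:
    """Hypothetical stand-in for LIKE: value on the left, wildcard pattern on the right."""
    return fnmatchcase(value, pattern)


# Illustrative data only (not taken from the thread).
cas_cir = "50-00-0 64-17-5"   # result list, kept on a single line here
cas_search = "*64-17-5*"      # search value wrapped in wildcards

correct = like(cas_cir, cas_search)   # value LIKE pattern
flipped = like(cas_search, cas_cir)   # pattern LIKE value (the flipped form)

print(correct, flipped)
```

With the operands flipped, the result list is treated as a pattern without wildcards, so it can only match a string identical to itself.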
Maybe I wasn’t clear or extensive in my previous explanation.
By ‘removing the asterisks’ I meant to draw attention to your $CAS Search$ column in case you attempt the ‘regex match’ approach, since the asterisks won’t be needed there. I was not referring to the ‘Rule Engine’ method.
Hi @Alkaline, I can see that this appears to be a bug (or undocumented feature!): in my test, even just a wildcard * with no other characters fails a pattern match as soon as the string contains line feeds. That is certainly not the behaviour I would expect.
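One plausible explanation, and this is only an assumption that the thread does not confirm, is that the wildcard is translated into a regular expression in which `.` does not match line terminators. The Python sketch below shows that general regex behaviour with made-up data; it says nothing definite about KNIME's internals.

```python
import re

# Illustrative multi-line value, similar in shape to the CIR output.
multi_line = "50-00-0\n64-17-5"

# By default "." does not match "\n", so ".*" cannot cover the whole string.
default_match = re.fullmatch(r".*", multi_line) is not None

# With DOTALL, "." also matches line breaks and the same pattern succeeds.
dotall_match = re.fullmatch(r".*", multi_line, flags=re.DOTALL) is not None

print(default_match, dotall_match)
```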
To be honest, I would say that the suggestion from @elsamuel …
… probably provides the least complex workaround to your problem.
Depending on how the multi-line strings were created, you might have to replace just newlines (\n), or possibly carriage return + line feed (\r\n), with an alternative character. That could be just a space or some more obscure character such as ¬, just so that the problem goes away, as suggested.
Having done that, your code should work. (It worked for me when I set up test data similar to yours)
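As a rough sketch of that workaround outside KNIME (in Python, with hypothetical helper names and made-up CAS values), the line breaks are flattened first and the wildcard match is applied afterwards:

```python
import re
from fnmatch import fnmatchcase


def flatten_line_breaks(text: str, replacement: str = " ") -> str:
    """Replace \\r\\n, \\r and \\n with a single replacement character."""
    return re.sub(r"\r\n|\r|\n", replacement, text)


def contains_cas(cas_cir: str, cas_search: str) -> bool:
    """Check whether cas_search occurs anywhere in the flattened result list."""
    flat = flatten_line_breaks(cas_cir)
    return fnmatchcase(flat, f"*{cas_search}*")


# Illustrative values only.
print(contains_cas("50-00-0\r\n64-17-5", "64-17-5"))
print(contains_cas("50-00-0\n64-17-5", "67-56-1"))
```

The replacement character is a parameter so that a space or a rarer character such as ¬ can be used, depending on what the data may already contain.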
@elsamuel , @takbb
‘Row0’ (in elsamuel’s table) has to return ‘False’ because $CAS Search$ is not in the first position of the $CAS CIR$ list. The same happens for your ‘Row2’, because the rules do not ask for it.
I’d go with the workaround of replacing the newlines with some other character, as @elsamuel suggested. @Alkaline, if you want to match only the first CAS, you can then take your first idea and put the asterisk only at the end of the CAS Search value. That is, make the string match only if the beginning matches, using the Rule Engine logic proposed by @elsamuel.
Of course, @gonhaddock’s solution with the cell splitter works just as well. I also wanted to let you know that I filed a bug under reference AP-17933, so the newline issue is in our system. I don’t know whether changing the behavior is a good idea from a backwards-compatibility perspective, but an explanatory note in the description of “LIKE” saying that newlines are not treated as regular characters would help.
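The cell-splitter idea mentioned above amounts to splitting the multi-line value into separate entries and comparing them exactly. A minimal Python sketch of that logic follows; the function names and CAS values are hypothetical, and this is not the KNIME node itself. It also covers the "first CAS only" variant discussed earlier in the thread.

```python
from typing import List


def split_cas_list(cas_cir: str) -> List[str]:
    """Split a multi-line result into trimmed, non-empty entries."""
    return [line.strip() for line in cas_cir.splitlines() if line.strip()]


def cas_in_list(cas_cir: str, cas_search: str) -> bool:
    """True if cas_search equals any entry of the result list."""
    return cas_search.strip() in split_cas_list(cas_cir)


def cas_is_first(cas_cir: str, cas_search: str) -> bool:
    """True only if cas_search equals the first entry of the result list."""
    entries = split_cas_list(cas_cir)
    return bool(entries) and entries[0] == cas_search.strip()


# Illustrative values only.
results = "50-00-0\r\n64-17-5"
print(cas_in_list(results, "64-17-5"), cas_is_first(results, "64-17-5"))
```

Comparing whole entries also avoids a substring pitfall: a search for 50-00-0 would otherwise match inside a longer entry such as 150-00-0.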
Thanks all for reporting and proposing solutions/workaround!
Thank you all for your help. To me it looks like everyone got slightly different results, but in the end we managed to solve the issue with the suggestions from everyone. And thank you for taking care that the bug will be fixed.