String Replacer bug

When using the String Replacer node, if you try to replace any string that contains the * character, the node freezes partway through execution. This happens when you have a Smiles column. It also happens for any Smiles column you have converted to a String column using the "Rename" node.

For example, trying to replace "[*1]" with "[Br]", with "Replace Whole String" selected, stops the node from completing.

 

Simon.

Hi Simon,

* is a wildcard, so it matches any number of arbitrary characters, rather than specifically '*'.

I found that using [?1] as the pattern replaces the [*1]s in my example, but this will also replace anything like [21], [a1], [x1] etc. I'm not sure that this would be a problem in SMILES, but you should be aware.
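
To make the wildcard behaviour above concrete, here is a minimal Java sketch. It assumes '*' means "any run of characters", '?' means "exactly one character", and everything else (including the square brackets) is literal, as described in this thread. The class and method names are made up for illustration, and this is not the String Replacer's actual implementation.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of KNIME: emulates simple wildcard matching.
class WildcardSketch {

    // Translate a wildcard pattern into a regular expression:
    // '*' becomes ".*", '?' becomes ".", every other character is quoted literally.
    static String toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    // True if the whole text matches the wildcard pattern.
    static boolean matchesWhole(String wildcard, String text) {
        return Pattern.matches(toRegex(wildcard), text);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[?1]", "[*1]"));        // true
        System.out.println(matchesWhole("[?1]", "[a1]"));        // true: '?' accepts any single character
        System.out.println(matchesWhole("[?1]", "[Br]"));        // false
        System.out.println(matchesWhole("[*1]", "[anything1]")); // true: '*' is a wildcard, not a literal star
    }
}
```

The last line shows why a pattern of [*1] cannot be relied on to target only the literal text [*1].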

In general you can 'escape' special characters with a preceding backslash, but this doesn't seem to work in the String Replacer node. In the Java Snippet (which I know you don't like :-)) you can do the following, which is more robust. Java's String.replace treats both arguments as literal text, so the * needs no escaping there:

return $column1$.replace("[*1]", "[Br]");

and set return type to String, where 'column1' is your smiles column.
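
For anyone who wants to try the replacement outside KNIME, here is a small standalone Java sketch of the same idea. The class name and the sample string are invented for illustration. It shows the literal String.replace call next to a regex-based equivalent, where Pattern.quote is needed so that '[', '*' and ']' are not read as regex syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone demo, not a KNIME class.
class LiteralReplaceSketch {

    // String.replace works on literal text, so "[*1]" needs no escaping.
    static String literal(String smiles) {
        return smiles.replace("[*1]", "[Br]");
    }

    // replaceAll expects a regex, so the search text must be quoted first.
    static String viaRegex(String smiles) {
        return smiles.replaceAll(Pattern.quote("[*1]"), Matcher.quoteReplacement("[Br]"));
    }

    public static void main(String[] args) {
        String sample = "[*1]CCO"; // made-up input string
        System.out.println(literal(sample));  // [Br]CCO
        System.out.println(viaRegex(sample)); // [Br]CCO
    }
}
```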

Hope this helps,

Dave

It is known that the String Replacer cannot replace * and ? because they are meta-characters and cannot be escaped. We cannot easily change this without potentially breaking existing workflows. We will think of a solution. However, the node should not freeze if you accidentally try to replace *. Is this reproducible?

Hi Thor,

Yes, it is reproducible, but the freezing seems to affect only Smiles columns that have first been converted back to String columns using the Rename node. For some reason, the String column still seems to retain some "Smiles" character: the String Replacer node freezes, and when you cancel it you sometimes get an error saying "[1*] is not a smiles truetype cell" or a comment to that effect. This is even though I have renamed the column back to String from Smiles. Maybe it's a bug in the Rename node?

 

And thanks Dave for the suggestion. [?1] sounds like a good solution for what I want to do. As you say, I don't like the Java Snippet, so I won't be venturing there unless I really need to. Too complicated for me!

Simon.

This sounds like a bug related to replacing text in Smiles columns. It is already fixed in KNIME 2.5.

BTW the Rename node does not physically change the type of a column; it merely rewrites the type information in the table specification. The cells are still Smiles cells. This may lead to all kinds of seemingly strange behaviour.

Simon,

I too have run into problems trying to replace strings containing the '*' character, and also with the Rename node 'pretending' to re-type Smiles to String. If you want to achieve this properly, the best way I have found is to use a Java Snippet with the output type set to String, the output set to replace your Smiles column, and this script:

String r=$My Smiles Column$;

return r;

Hope this helps

Steve

Many thanks for all your comments, and I'm glad to hear the String Replacer will work correctly in Smiles columns in KNIME 2.5.

 

Thanks Steve for the dreaded Java Snippet solution :-).

I will need to start storing all these Java Snippet solutions, or start learning to use Java, Python or Perl etc !!

 

Simon.

Hi,

Related to the use of the string replacer node, is there a way to replace all cells containing letters and punctuation with NULL (i.e. replace ABC123 with NULL but if the cell is 123, then keep it as 123)?  If this node isn't able to perform that function, what node should I use?

The String Manipulation node is what you need here; you could use regex syntax to achieve what you want.

simon.

Instead of listing each letter in the alphabet and every possible punctuation mark, is there a regular expression that'll remove/replace all?

replace($column$, "a", "", "i")

replace($column$, "b", "", "i")

replace($column$, "c", "", "i")

I'm no expert on RegEx; there are better people on the forum than me who know about it. Wikipedia is a good source for understanding RegEx.

What I can tell you is that square brackets enclose a set of possible characters. So [A-Za-z] covers any single letter, and to match several of these letters in a row you can follow it with +, so [A-Za-z]+ is 1 or more letters. For punctuation, you could add those characters inside the square brackets as well.
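
As a rough illustration of the character class idea above, here is a short Java sketch. The class and method names are invented, and \p{Punct} is Java's built-in shorthand for ASCII punctuation, used here instead of listing each mark by hand. Whether a given KNIME node accepts this syntax is a separate question.

```java
import java.util.regex.Pattern;

// Hypothetical demo of a character class, not a KNIME class.
class CharClassSketch {

    // One or more ASCII letters or ASCII punctuation characters.
    private static final Pattern LETTERS_OR_PUNCT =
            Pattern.compile("[A-Za-z\\p{Punct}]+");

    // Remove every run of letters/punctuation from the text.
    static String strip(String s) {
        return LETTERS_OR_PUNCT.matcher(s).replaceAll("");
    }

    // Return null if the text contains any letter or punctuation, else the text itself.
    static String nullIfNotClean(String s) {
        return LETTERS_OR_PUNCT.matcher(s).find() ? null : s;
    }

    public static void main(String[] args) {
        System.out.println(strip("ABC-123!"));        // 123
        System.out.println(nullIfNotClean("ABC123")); // null
        System.out.println(nullIfNotClean("123"));    // 123
    }
}
```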

Simon.

I'm not sure if the String Manipulation node actually has a function expecting a regular expression, but it would be something like replace(string, ".*[A-Za-z].*", ""). So anything that contains at least one letter character somewhere is replaced with the empty string.

The 'String To Number' node could be an easy solution. Any cell with non-numeric characters will fail the conversion and be replaced with a NULL value.
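
The idea behind this suggestion can be sketched in plain Java as "try to parse, and fall back to null on failure". This is only an analogy with an invented class name; the real node's parsing rules (number types, decimal separators and so on) may differ.

```java
// Hypothetical demo of parse-or-null, not the String To Number node itself.
class ParseOrNullSketch {

    // Return the cell as an Integer, or null if it is not a plain integer.
    static Integer toIntegerOrNull(String cell) {
        if (cell == null) {
            return null;
        }
        try {
            return Integer.valueOf(cell.trim());
        } catch (NumberFormatException e) {
            // Anything with letters or punctuation ends up here.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toIntegerOrNull("ABC123")); // null
        System.out.println(toIntegerOrNull("123"));    // 123
    }
}
```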

If you do go the RegEx route, I found the regular expressions primer over here a valuable introduction:

http://www.macresearch.org/introduction-regular-expressions

It's written for Python users, but still generic enough for all situations.

 

(the other) Simon

The string manipulation node doesn't seem to accept regular expressions. However, I'm able to achieve my desired results by simply using the string to number node =)