How can I split a cell into alpha and numeric columns?

I have a column with an identifier that has both alpha and numeric characters. I want to split this column into 2 columns, one containing only the alphas, the other containing only the numerics. For example:

Col1            Col2    Col3
1012BXY  --->   1012    BXY

I have tried using the Regex Split node but I'm having difficulty formatting an expression that works (I am a regex newbie). Can anyone give me an example of what expression should be input into the node to enable the split?

Thanks

Jon Timko

"(\d+)(\w+)" should do the trick (\d = digits, \w = word characters).

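For anyone who wants to try the pattern outside KNIME first, here is a minimal Python sketch of the same idea. It uses Python's re module rather than the Java regex engine behind KNIME, and it assumes the whole cell has to match the pattern. The helper names split_id and split_id_strict are made up for illustration. Since \w also matches digits and underscores, the second helper swaps in [A-Za-z]+ as a stricter alternative for when the tail should be letters only.

```python
import re

# Pattern from the answer above: leading digits, then word characters.
LOOSE = re.compile(r"(\d+)(\w+)")
# Stricter variant (an assumption, not from the thread): letters only in the tail.
STRICT = re.compile(r"(\d+)([A-Za-z]+)")


def split_id(cell):
    """Return (numeric, alpha) if the whole cell matches LOOSE, else None."""
    m = LOOSE.fullmatch(cell)
    return m.groups() if m else None


def split_id_strict(cell):
    """Same as split_id, but the second group must consist of letters only."""
    m = STRICT.fullmatch(cell)
    return m.groups() if m else None


if __name__ == "__main__":
    print(split_id("1012BXY"))
    print(split_id_strict("1012BXY"))
```

With the looser pattern an all-digit cell such as 1012 still matches, because \w+ takes over the last digit; the stricter pattern rejects it.
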
Dear All,

How can I further split Col2 into its individual digits using Regex Split?

Col4 Col5 Col6 Col7

1 0 1 2

I am sorry for the simple query.

With best regards,

Philip

([0-9])([0-9])([0-9])([0-9]) should do it.

However, I would probably use the "Cell Splitter By Position" node and for split indices use "1,2,3,4".
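To compare the two suggestions, here is a rough Python sketch. The first function applies the four-group regex. The second one cuts the string at fixed positions, which is only an approximation of what a position-based splitter does; the real node may treat the indices differently. Both function names are hypothetical.

```python
import re


def split_digits_regex(cell):
    """Split a four-digit string into its digits using four capture groups."""
    m = re.fullmatch(r"([0-9])([0-9])([0-9])([0-9])", cell)
    return list(m.groups()) if m else None


def split_by_position(cell, indices):
    """Cut cell at each index in indices (assumed ascending); keep any remainder."""
    pieces = []
    start = 0
    for idx in indices:
        pieces.append(cell[start:idx])
        start = idx
    if start < len(cell):
        pieces.append(cell[start:])
    return pieces


if __name__ == "__main__":
    print(split_digits_regex("1012"))
    print(split_by_position("1012", [1, 2, 3, 4]))
```

The regex version only works for exactly four digits, while the position-based version scales to longer strings by extending the index list.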

Simon.

Dear Simon,

Thank you very much for looking into it. I am trying to split a column containing a chemical fingerprint (of length 1024). Maybe the regex pattern is not sufficient for this.
Could you please look at the attached protocol (modified from an existing protocol) and suggest again?

With best regards,
Philip

I don't have access to KNIME right now, but the Expand BitVector node should be what you are looking for to split out a fingerprint.

Simon.

Dear Simon,

Thank you again, but I was not able to split it with the Expand BitVector node. Could you please look into it again by downloading the workflow and trying it yourself?
I am sorry to trouble you with this.

With best regards,
Philip

I'd like to help by looking at this, but without a data file it's a bit of a pain.

Going by the description in your initial post, could it be that you have a decimal fingerprint? I think the Expand BitVector node only works with binary fingerprints.

That aside, you are using the RD-kit fingerprint node, which gives a binary fingerprint, so I'm not sure why the expand node wouldn't work. Do you use it right after the fingerprint node or after the Bitvector Generator? I don't see the point of having the latter.

The Erlwood nodes include a "Fingerprints Expander" (basically the same thing); you could try that instead.

If your fingerprint column has somehow lost its data type and become a string column, you may need to regenerate it with the Create BitVector node and then use the Expand BitVector node. This should work for you.
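As a conceptual illustration of what "expanding" means, here is a small Python sketch that turns a fingerprint stored as a plain string of 0s and 1s into one integer per bit. It assumes the string form really is just 0/1 characters, and it does not try to reproduce KNIME's bit ordering or column naming. The function name and the bit0, bit1, ... labels are made up.

```python
def expand_bit_string(bits):
    """Turn a string such as '1011' into a list of ints, one per bit."""
    unexpected = set(bits) - {"0", "1"}
    if unexpected:
        raise ValueError(f"not a binary fingerprint, found: {sorted(unexpected)}")
    return [int(ch) for ch in bits]


def as_columns(bits):
    """Label each bit with a hypothetical column name: bit0, bit1, ..."""
    return {f"bit{i}": value for i, value in enumerate(expand_bit_string(bits))}


if __name__ == "__main__":
    print(as_columns("1011"))
```

Unlike a fixed regex, this works for any length, so a 1024-bit fingerprint simply yields 1024 values.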

Simon.

I have now tested this: I took the original workflow and added the RD-kit fingerprint and bit vector expander nodes, and it works as intended for me.

I believe it's as richards99 says: you lost your data type. As far as I can see, that is because you use a column rename and end up with a string column. That is the problem.

Forget the rename. When you expand the bit vector you can (and most likely will) replace the original FP column anyway, and with 500 or 1000 bit columns you don't need to worry about naming.

Dear Docminus and Simon,

Thank you for the suggestions.

Is it possible to share your workflow with me? I am still not able to figure out the problem, and I do not use the column rename node anywhere in my workflow. Instead, I used a String Manipulation node to convert the RD-kit fingerprint format to a string format so that the ‘Bitvector Generator’ node can recognize it.

Please kindly attach your workflow and hopefully it will all be OK.

Thank you and with best regards,
Philip

Sorry, my bad, I meant your String Manipulation. You don't need to convert to string format.

Why do you want to use the Bitvector Generator? RD-kit fingerprint makes a fingerprint. Mathematically speaking, that is a bit vector; "fingerprint" is just the name used in cheminformatics.

So all you need is:

RD-Kit Fingerprint (makes your bitvector) -> Expand Bit Vector (if that is what you need)

done.

 

Dear Docminus,

Thank you again, I understand and I have tried your suggestions.
But the ‘Bitvector Generator’ does not recognize the ‘Col0 (Fingerprint)’ column, which contains the bit vector of type ‘DenseBitVectorCell’. Maybe the ‘Bitvector Generator’ node does not recognize dense bit vectors.
Could you please take a look at my workflow again and advise?

Thank you,
With best regards,
Philip