Is it possible to import BitVector columns into Java snippets, as BitVector or BitSet objects?
I remember this being a shortcoming that was going to be addressed ages ago, with a forum post back in '15 asking for the same feature: Programming (java snippet) with Bitvectors/Bitstrings?
I could have sworn it was possible, but the only option I appear to have is to load the BitVector into the Java snippet node as a String. That wouldn’t be a problem if the whole BitVector were represented in the String, but it isn’t: it’s truncated, with “…” at the end.
You could try using some of the Fingerprint to ... nodes from Vernalis (see https://nodepit.com/iu/com.vernalis.knime.fingerprint) before the snippet, and possibly a Fingerprint from ... node after it to convert back.
Hmm, wouldn’t it be easier if the BitVector types implemented the @DataValueAccessMethod and @DataCellFactoryMethod annotations, or if there were explicit converters for the Java types?
I have a feeling I round-trip through the Java BitSet, or through our internal BitSet types, which are implemented with the Java Snippet, as a workaround.
I thought you guys had a BitSet type, but couldn’t find it on NodePit - ‘internal’ explains it, I guess…
Yeah, I thought there was a BitVector to BitSet converter node. Or am I remembering an internal node from when I was at Lhasa?
@Steve: Your FP to binary string is probably what I’m after.
Edit: maybe not; the output is also truncated. I presume you’re just using the toString method.
I thought I was using my own method, which didn’t truncate! Maybe try some of the others. I don’t think the set bits list truncates.
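If the set bits list does come through untruncated, rebuilding a BitSet from it inside the snippet is straightforward. Here is a rough sketch, assuming the list reaches the snippet as a comma-separated String of zero-based bit indices (possibly wrapped in brackets). That format is my assumption, not something confirmed for the Vernalis nodes, and the class and method names are made up for illustration.

```java
import java.util.BitSet;

// Hypothetical helper: the input format (comma-separated, zero-based
// indices, optionally wrapped in [] or {}) is an assumption.
final class SetBitsParser {

    /** Builds a BitSet with one bit set per index in the list. */
    static BitSet fromSetBits(String setBits) {
        BitSet bits = new BitSet();
        if (setBits == null) {
            return bits;
        }
        // Drop any surrounding brackets or braces before splitting.
        String cleaned = setBits.replaceAll("[\\[\\]{}]", "").trim();
        for (String token : cleaned.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue; // tolerate empty input and stray commas
            }
            // Throws NumberFormatException on anything that is not an int,
            // which includes a trailing ellipsis from a truncated value.
            bits.set(Integer.parseInt(t));
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(fromSetBits("0, 5, 12")); // prints {0, 5, 12}
    }
}
```

A side benefit of parsing indices rather than a 0/1 string is that a truncated value fails loudly on the first non-numeric token instead of silently giving a shorter fingerprint.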
In the end I took quite a different approach to what I was trying to achieve. Still, I’m surprised the BitVector isn’t a snippet-friendly data type at this point. Greg did a blog post a while back, where he used BitVector as a String to create RDKit FPs. That’s dangerous if the String might come out truncated. My FPs weren’t a standard folded length though, so maybe a more normal 1024- or 2048-bit FP would be fine.
I think the truncation is at either 8192 bits or 8192 characters in the String representation. I can’t remember which, and obviously in some cases those are the same.
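For anyone who does go the String route, it is probably worth guarding against truncation explicitly rather than trusting the length. A minimal sketch in plain Java, assuming a 0/1 string where the character at position i corresponds to bit i (reverse the string first if your source writes the highest bit first) and assuming you know the expected fingerprint length up front. The class and method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical guard: bit order and the ellipsis markers checked here
// are assumptions about how the String was produced.
final class BinaryStringGuard {

    /** True if the value ends with an ellipsis marker ("…" or "..."). */
    static boolean looksTruncated(String s) {
        return s.endsWith("\u2026") || s.endsWith("...");
    }

    /** Parses a 0/1 string into a BitSet, refusing truncated or short input. */
    static BitSet fromBinaryString(String s, int expectedBits) {
        if (looksTruncated(s)) {
            throw new IllegalArgumentException("Binary string appears truncated");
        }
        if (s.length() != expectedBits) {
            throw new IllegalArgumentException(
                "Expected " + expectedBits + " bits but got " + s.length());
        }
        BitSet bits = new BitSet(expectedBits);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '1') {
                bits.set(i);
            } else if (c != '0') {
                throw new IllegalArgumentException(
                    "Unexpected character '" + c + "' at position " + i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(fromBinaryString("0101", 4)); // prints {1, 3}
    }
}
```

Failing with an exception is deliberate here: a silently shortened fingerprint would still parse and still give similarity values, just wrong ones.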