DNA to Protein translator

eygnt · August 31, 2022, 6:06pm

Hi,
Is there a node that translates a given DNA sequence into peptide/protein (one/three letter aa sequence)?
Sounds pretty trivial, but I couldn’t find one.

Thanks

Martyna · September 1, 2022, 7:55pm

Hi @eygnt

that’s a good question, I didn’t find one either.
In principle, it could work using our usual nodes for data manipulation, a mix of Transpose, Rule Engine and String Manipulation.

An alternative could be to use a REST Service from EBI: EMBOSS transeq Help and Documentation - Job Dispatcher Sequence Analysis Tools - EMBL-EBI

In any case, this could be a nice component with a sequence as input and a peptide as output.

Best!
Martyna

Vernalis · September 2, 2022, 8:39am

None that I know of.

I thought I had started looking into this when I wrote the Sequence to SMILES nodes, but I don’t see any obvious sign of it having actually got anywhere. As I recall, I realised that it was more complicated than I had first thought, with different codon sets depending on the organism (DNA and RNA codon tables - Wikipedia), and the possibility of stop codons, primer regions, introns, exons, etc.

I still think that for most cases it would be simple enough, assuming you wanted the translation to start at the beginning of the nucleotide sequence and stop at the first stop codon, or at the end if none is encountered.
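As a rough illustration of that simple case, here is a minimal Java sketch. It assumes the standard genetic code (NCBI translation table 1), reads from the first base in frame 1, and stops at the first stop codon or at the end of the sequence. The class name `SimpleTranslator`, the method name `translate`, and the choice of 'X' for codons containing anything other than A, C, G or T are illustrative assumptions, not part of any KNIME node or library.

```java
import java.util.Locale;

class SimpleTranslator {
    // Base order used to index the table below (assumption: NCBI table 1 layout).
    private static final String BASES = "TCAG";
    // Standard genetic code, one letter per codon, ordered TTT, TTC, TTA, TTG, TCT, ...
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    /** Translates frame 1 of a DNA (or RNA) string until the first stop codon. */
    static String translate(String dna) {
        // Normalise case and treat RNA 'U' as DNA 'T'.
        String seq = dna.toUpperCase(Locale.ROOT).replace('U', 'T');
        StringBuilder peptide = new StringBuilder();
        // Only read complete codons; a trailing 1 or 2 bases are ignored.
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int b1 = BASES.indexOf(seq.charAt(i));
            int b2 = BASES.indexOf(seq.charAt(i + 1));
            int b3 = BASES.indexOf(seq.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) {
                // Ambiguous or unexpected base: emit a placeholder (hypothetical choice).
                peptide.append('X');
                continue;
            }
            char aminoAcid = AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3);
            if (aminoAcid == '*') {
                break; // first stop codon ends the translation
            }
            peptide.append(aminoAcid);
        }
        return peptide.toString();
    }
}
```

For example, `SimpleTranslator.translate("ATGGCCTAA")` returns "MA". Supporting another organism's codon set would mean swapping the `AMINO_ACIDS` string for the corresponding table.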

Not the answer you were hoping for, I guess.

Steve

dnaki · September 2, 2022, 4:55pm

Hi,
If you know Java, another option would be to use a Java Snippet node and the BioJava library. biojava-tutorial/translating.md at master · biojava/biojava-tutorial · GitHub

-Don

Vernalis · September 2, 2022, 5:15pm

Unfortunately, that has become pretty difficult to integrate with KNIME since the log4shell issue, as it now requires you to provide and configure your own log4j. So far I’ve not managed to get it to work.

But, a snippet with a simple look-up would probably work:

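// In imports: java.util.HashMap and java.util.Map

// In global variables
Map<String,String> codonLookup = null;

// In the snippet
// Build the codon -> one-letter amino acid lookup once, on first use
if(codonLookup == null) {
    codonLookup = new HashMap<>();
    codonLookup.put("TTT", "F");
    codonLookup.put("TTC", "F");
    codonLookup.put("TTA", "L");
    // etc etc..
}
StringBuilder peptide = new StringBuilder();
// Step through the sequence one 3-base codon at a time
for(int i = 0; i < sequence.length(); i += 3) {
    // substring's end index is exclusive, so i+3 gives a 3-character codon
    peptide.append(codonLookup.get(sequence.substring(i, i + 3)));
}
o_Peptide = peptide.toString();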

NB - this has no checks that, for example, you have 3 chars left each time, or that the ‘codon’ is present in the lookup, etc.
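One way those two checks could be added is sketched below as a standalone Java method, so it can be tried outside KNIME. The class name `CheckedTranslator`, the `UNKNOWN` placeholder "X", and the decision to silently ignore a trailing 1 or 2 bases are all assumptions made for illustration; throwing an exception instead may suit some workflows better.

```java
import java.util.Locale;
import java.util.Map;

class CheckedTranslator {
    // Placeholder appended when a codon is not in the lookup (hypothetical choice).
    static final String UNKNOWN = "X";

    /** Translates a sequence codon by codon using the supplied lookup table. */
    static String translate(String sequence, Map<String, String> codonLookup) {
        if (sequence == null || sequence.isEmpty()) {
            return "";
        }
        String seq = sequence.trim().toUpperCase(Locale.ROOT);
        StringBuilder peptide = new StringBuilder();
        // Check 1: loop only while a full 3-character codon remains,
        // so substring never runs past the end of the sequence.
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            String codon = seq.substring(i, i + 3);
            // Check 2: fall back to the placeholder instead of appending "null"
            // when the codon is missing from the lookup.
            peptide.append(codonLookup.getOrDefault(codon, UNKNOWN));
        }
        return peptide.toString();
    }
}
```

In a Java Snippet node the body of `translate` could be pasted into the snippet section, with `codonLookup` kept as the global variable shown above.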

Steve

dnaki · September 2, 2022, 5:45pm

Thanks, this is good to know, Steve.
I’m guessing this has been an issue in the development of custom (native) nodes, yes? I have a number of custom nodes that use the BioJava library that I need to update, so it’s likely I’ll run into this issue. I’ll share if I find a decent solution to this issue.

Thanks,
-Don

Vernalis · September 3, 2022, 3:01pm

Yes - unfortunately so. I’d be interested to hear if you figure out how to sort it.

Steve
