DNA to Protein translator

Is there a node that translates a given DNA sequence into peptide/protein (one/three letter aa sequence)?
Sounds pretty trivial, but I couldn’t find one.

Thanks :slight_smile:


Hi @eygnt

that’s a good question, I didn’t find one either.
In principle, it could work using our usual nodes for data manipulation, a mix of Transpose, Rule Engine and String Manipulation.

An alternative could be to use a REST Service from EBI: EMBOSS transeq Help and Documentation - Job Dispatcher Sequence Analysis Tools - EMBL-EBI

In any case this could be a nice component with a sequence as input and peptide as output :slight_smile:



None that I know of.

I thought I had started looking into this when I wrote the Sequence to SMILES nodes, but I don’t see any obvious sign of it having got anywhere. As I recall, I found that it was more complicated than I had originally thought: the codon table differs depending on the organism (DNA and RNA codon tables - Wikipedia), and there is the possibility of stop codons, primer regions, introns, exons, etc.

I still think that for most cases it would be simple enough, assuming you wanted the translation to start at the beginning of the nucleotide sequence and stop at the first stop codon, or at the end if none is encountered.
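To make that simple case concrete, here is a minimal sketch in plain Java. It assumes the standard genetic code only (NCBI translation table 1), reads from position 0 in a single frame, and stops at the first stop codon or at the end of the sequence. The class name `StandardCodeTranslator` and the choice to emit `X` for codons containing anything other than A, C, G or T are my own assumptions for illustration, not part of any KNIME node.

```java
import java.util.Locale;

public final class StandardCodeTranslator {
    // Base order used to index the table below.
    private static final String BASES = "TCAG";

    // Standard genetic code, one letter per codon, ordered TTT, TTC, TTA, TTG, TCT, ...
    // (first base varies slowest, third base fastest). '*' marks a stop codon.
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    private StandardCodeTranslator() { }

    /**
     * Translates from the first nucleotide and stops at the first stop codon,
     * or at the end of the sequence if none is encountered.
     * A trailing partial codon (1 or 2 bases) is ignored.
     */
    public static String translate(String dna) {
        String seq = dna.toUpperCase(Locale.ROOT);
        StringBuilder peptide = new StringBuilder();
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int b1 = BASES.indexOf(seq.charAt(i));
            int b2 = BASES.indexOf(seq.charAt(i + 1));
            int b3 = BASES.indexOf(seq.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) {
                // Assumption: ambiguous or invalid bases give an unknown residue.
                peptide.append('X');
                continue;
            }
            char aminoAcid = AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3);
            if (aminoAcid == '*') {
                break; // first stop codon ends the translation
            }
            peptide.append(aminoAcid);
        }
        return peptide.toString();
    }
}
```

Supporting another organism's codon set would mean swapping in a different 64-letter string; introns, primer regions and alternative reading frames are deliberately out of scope here.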

Not the answer you were hoping for, I guess.



If you know Java, another option would be to use a Java Snippet node and the BioJava library. biojava-tutorial/translating.md at master · biojava/biojava-tutorial · GitHub



Unfortunately, that has become pretty difficult to integrate with KNIME since the log4shell issue, as it now requires you to provide and configure your own log4j. So far I’ve not managed to get it to work successfully.

But, a snippet with a simple look-up would probably work:

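// In global variables (needs imports for java.util.Map and java.util.HashMap)
Map<String,String> codonLookup = null;

// In the snippet
if (codonLookup == null) {
    // Build the codon table once, the first time the snippet runs
    codonLookup = new HashMap<>();
    codonLookup.put("TTT", "F");
    codonLookup.put("TTC", "F");
    codonLookup.put("TTA", "L");
    // etc etc..
}
StringBuilder peptide = new StringBuilder();
for (int i = 0; i < sequence.length(); i += 3) {
    // substring's end index is exclusive, so i + 3 gives a full 3-letter codon
    peptide.append(codonLookup.get(sequence.substring(i, i + 3)));
}
o_Peptide = peptide.toString();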

NB - this has no checks, for example that you have 3 chars left each time, or that the ‘codon’ is present in the lookup, etc.
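For what it’s worth, those checks could be added along the following lines. This is only a sketch: `GuardedCodonLookup` is a hypothetical helper class rather than anything KNIME provides, and dropping a trailing partial codon and writing `X` for a codon missing from the lookup are assumptions you may want to change (for example, to fail the row instead).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public final class GuardedCodonLookup {
    private GuardedCodonLookup() { }

    /**
     * Translates codon by codon using the supplied lookup table.
     * Guards: null input gives an empty result, a trailing partial codon
     * is skipped, and a codon missing from the table becomes "X".
     */
    public static String translate(String sequence, Map<String, String> codonLookup) {
        if (sequence == null) {
            return "";
        }
        // Normalise case and surrounding whitespace before splitting into codons.
        String seq = sequence.trim().toUpperCase(Locale.ROOT);
        StringBuilder peptide = new StringBuilder();
        // "i + 3 <= length" guarantees three characters are left each time.
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            String codon = seq.substring(i, i + 3);
            // getOrDefault avoids appending the literal text "null" for unknown codons.
            peptide.append(codonLookup.getOrDefault(codon, "X"));
        }
        return peptide.toString();
    }

    /** Small partial table, just for demonstration. */
    public static Map<String, String> demoTable() {
        Map<String, String> table = new HashMap<>();
        table.put("TTT", "F");
        table.put("TTC", "F");
        table.put("TTA", "L");
        return table;
    }
}
```

In a Java Snippet node the body of `translate` could be pasted in place of the loop, with the lookup map still kept in the global variables section so it is built only once.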



Thanks, this is good to know, Steve.
I’m guessing this has been a problem in the development of custom (native) nodes, yes? I have a number of custom nodes that use the BioJava library that I need to update, so it’s likely I’ll run into it. I’ll share if I find a decent solution.


Yes - unfortunately so. Be interested to hear if you figure how to sort it.