Hello, I need help regarding gene and protein sequences

Faheeemahmed · August 14, 2021, 3:01pm

Hello everyone. I am currently working with gene and protein sequences in a bioinformatics project using KNIME.
I am trying to apply the k-mer counting method that is widely used in Python. Specifically, I want to define a five-character filter such as “ATACG” and count how many times it occurs in a gene sequence.
The filter should be slid over the sequence, moving one letter at a time, and the number of times it is found should be recorded. A sequence example is given below.
I look forward to hearing from anyone. This is urgent and important for me, so even the slightest help would be highly appreciated. Thank you in anticipation.
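
For readers landing on this thread, the sliding-window count described above can be sketched in plain Python. This is only an illustration under stated assumptions: the window advances one letter at a time, so overlapping matches are counted, and the function name `count_filter` and the toy sequence are made up for the example.

```python
def count_filter(sequence: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `sequence`, sliding one letter at a time."""
    k = len(pattern)
    if k == 0:
        return 0
    # Compare every window of length k against the pattern; overlaps are counted.
    return sum(
        1
        for i in range(len(sequence) - k + 1)
        if sequence[i:i + k] == pattern
    )


# Toy sequence (hypothetical), containing "ATACG" at positions 0 and 7.
toy = "ATACGGCATACGT"
print(count_filter(toy, "ATACG"))
```

Note that Python's built-in `str.count` skips overlapping matches (`"AAAA".count("AA")` is 2, while the sliding window above finds 3), so it is not a drop-in replacement if overlaps matter.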

temesgen-dadi · August 17, 2021, 9:09am

Hi @Faheeemahmed ,

Are you looking for a solution that gives you a list of k-mers and their counts in a given biological sequence? If yes, this is going to be a computationally intensive task in KNIME. There is also the question of how you want the result to be presented, because every row in a KNIME table that holds a sequence can yield an arbitrary number of k-mers and the same number of counts.
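
One possible way around the presentation problem is a "long" layout with one output row per (sequence, k-mer) pair, so the number of columns stays fixed however many k-mers a sequence contains. The sketch below assumes a step of one base between windows; `kmer_counts`, `to_long_rows` and the toy input are hypothetical names for illustration, not part of KNIME or of any method mentioned in this thread.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every k-mer in `sequence` using a window that advances one base at a time."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def to_long_rows(sequences: dict, k: int) -> list:
    """Flatten per-sequence counts into (sequence_id, kmer, count) rows."""
    rows = []
    for seq_id, seq in sequences.items():
        # Sort k-mers so the output order is reproducible.
        for kmer, count in sorted(kmer_counts(seq, k).items()):
            rows.append((seq_id, kmer, count))
    return rows


# Hypothetical toy input: two short sequences keyed by a row ID.
example = {"row0": "ATGATG", "row1": "GGGG"}
for row in to_long_rows(example, 3):
    print(row)
```

A sequence of length n produces at most n - k + 1 distinct k-mers, so the long table grows roughly linearly with total sequence length.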

Can you please point me to the widely used Python method that you mentioned? We could call that same method from KNIME using the Python integration.

I am also missing the sequence example you mentioned.

Best,
Temesgen

Faheeemahmed · August 17, 2021, 4:12pm

Hello @temesgen-dadi, I am really glad to hear from you. I am working on a bioinformatics project, and I want to convert the biological sequences into a form that is effective for training algorithms and finding useful patterns. I am running out of the time given by my instructor.
For k-mers, I want to create a filter like ‘AATGCATTA’ and slide it over the sequence in a row. For every match of the filter in a row, it should count 1, and so on for all the rows. Alternatively, it should take two or more filters and follow the same process.
An example row is given below.

ATGTCAGAAACTTCCAGGACCGCCTTTGGAGGCAGAAGAGCAGTTCCACCCAATAACTCTAATGCAGCGGAAGATGACCTGCCCACAGTGGAGCTTCAGGGCGTGGTGCCCCGGGGCGTCAACCTGCAAGAGTTTCTTAATGTCACGAGCGTTCACCTGTTCAAGGAGAGATGGGACACTAACAAGGTGGACCACCACACTGACAAGTATGAAAACAACAAGCTGATTGTCCGCAGAGGGCAGTCTTTCTATGTGCAGATTGACTTCAGTCGTCCATATGACCCCAGAAGGGATCTCTTCAGGGTGGAATACGTCATTGGTCGCTACCCACAGGAGAACAAGGGAACCTACATCCCAGTGCCTATAGTCTCAGAGTTACAAAGTGGAAAGTGGGGGGCCAAGATTGTCATGAGAGAGGACAGGTCTGTGCGGCTGTCCATCCAGTCTTCCCCCAAATGTATTGTGGGGAAATTCCGCATGTATGTTGCTGTCTGGACTCCCTATGGCGTACTTCGAACCAGTCGAAACCCAGAAACAGACACGTACATTCTCTTCAATCCTTGGTGTGAAGATGATGCTGTGTATCTGGACAATGAGAAAGAAAGAGAAGAGTATGTCCTGAATGACATCGGGGTAATTTTTTATGGAGAGGTCAATGACATCAAGACCAGAAGCTGGAGCTATGGTCAGTTTGAAGATGGCATCCTGGACACTTGCCTGTATGTGATGGACAGAGCACAAATGGACCTCTCTGGAAGA

temesgen-dadi · August 18, 2021, 10:03am

Hi @Faheeemahmed

Your question is still not clear to me. I am a trained bioinformatician. I don’t mean to be rude, but if this is homework and you couldn’t do it in time yourself, you had better talk to your instructor to get a better understanding of the problem and of how to solve it.

If what you are looking for is a KNIME workflow that

- takes a list of query strings (you are calling them filters/k-mers) and another list of longer DNA sequences like the one you provided above
- calculates the occurrence count of each query string in each DNA sequence

then the attached workflow might help.
count_occurrence.knwf (18.1 KB)
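
The contents of the attached workflow are not reproduced here. As a rough Python counterpart of the two steps listed above, the sketch below builds one row of counts per DNA sequence and one column per query string. It assumes overlapping occurrences should be counted; `count_overlapping`, `occurrence_table` and the toy inputs are hypothetical names chosen for illustration.

```python
import re


def count_overlapping(sequence: str, query: str) -> int:
    """Count occurrences of `query` in `sequence`, including overlapping ones."""
    if not query:
        return 0
    # A zero-width lookahead matches at every start position without consuming text.
    return len(re.findall(f"(?={re.escape(query)})", sequence))


def occurrence_table(queries: list, sequences: list) -> list:
    """Return one row of counts per sequence, with one column per query."""
    return [[count_overlapping(seq, q) for q in queries] for seq in sequences]


# Hypothetical toy inputs, not taken from the thread.
queries = ["ATG", "GG"]
sequences = ["ATGATGG", "GGGCAT"]
for seq, counts in zip(sequences, occurrence_table(queries, sequences)):
    print(seq, dict(zip(queries, counts)))
```

If only presence or absence per row is needed, each count can be reduced to `count > 0`.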

Best,
Temesgen

Faheeemahmed · August 18, 2021, 2:38pm

Hello again @temesgen-dadi, it is a pleasure to hear from you again.
I am trying, but maybe I have not managed to explain the problem clearly.

Actually, I am a PhD student and this is part of my research work. I need to finish this project soon, as the deadline is approaching.
Could you share your email address with me so that I can contact you directly and explain the problem in a better way?

Thank you in advance.

system · February 17, 2022, 2:38am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.