Seeking help with a text search related question for the Uniprot database

KnimeLearner · April 4, 2013, 10:13am

Hi, I am new to Knime. I would be very thankful if someone could help me or give me some advice on a question related to bioinformatics. I have a spreadsheet which contains the names of 1000 genes, and I would like to collect information from the Uniprot database for each gene. I can download the Uniprot database in several different formats, for instance fasta, text and so on, and I can read the data into an Excel sheet (I actually store the relevant part of the database, corresponding to around 500 000 rows, in an Excel sheet).

Thus I have an input spreadsheet with gene names, for instance, BCR1, and I would like to search the "Gene Name"-column in the Uniprot spreadsheet (or fasta or text-format) in order to identify the row which contains the information for this gene and then store this in a separate file.

Is there a ready-made workflow for such a task, or is there a simple way to learn how to create such a workflow?

I would be very grateful for any sort of help on this matter! Also, sorry if I posted this in the wrong forum!

Best regards,

Bobby

frank · April 4, 2013, 3:56pm

Hi

There are several text processing nodes in the KNIME Labs node library. One of these nodes is the Dictionary Tagger node. That is exactly what you need. The list of genes is your dictionary. The Excel list can be read by the XLS Reader and converted to a cell of document type.
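
For anyone who wants to see the lookup itself spelled out, here is a minimal sketch in plain Python rather than KNIME nodes. It is not the Dictionary Tagger; it only illustrates the matching described in the question. It assumes a tab-separated export with a column called "Gene Name" in which one field may hold several space-separated names. The function name `find_gene_rows` and the sample rows are made up for illustration.

```python
import csv
import io


def find_gene_rows(gene_names, uniprot_rows, gene_column="Gene Name"):
    """Return the rows whose gene-name field contains one of the wanted genes.

    Matching is case-insensitive and done on whole names, so "BCR1"
    does not match "BCR10".
    """
    wanted = {name.strip().upper() for name in gene_names if name.strip()}
    matches = []
    for row in uniprot_rows:
        # Assumption: several names/synonyms in one field are space-separated.
        tokens = (row.get(gene_column) or "").upper().split()
        if wanted.intersection(tokens):
            matches.append(row)
    return matches


# Tiny made-up sample standing in for a tab-separated database export.
sample = (
    "Entry\tGene Name\tProtein\n"
    "X0001\tBCR1\tExample protein A\n"
    "X0002\tABC2 ABC2B\tExample protein B\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

hits = find_gene_rows(["BCR1", "XYZ9"], rows)
print([row["Entry"] for row in hits])  # prints ['X0001']
```

Building a set of wanted names first keeps each row check cheap, which matters with around 500 000 rows. The matching rows could then be written to a separate file with `csv.DictWriter`.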

There is a whitepaper about text processing on the KNIME webpage to get familiar with these nodes:

http://tech.knime.org/files/knime_text_processing_introduction_technical_report_120515.pdf

Frank

KnimeLearner · April 5, 2013, 9:49am

Hi Frank,

Thanks a lot for so quickly putting me on the right track!!!

Best regards,

Bobby

frank · April 5, 2013, 12:25pm

You are welcome. It takes a while to get an overview of the text processing nodes, but the paper is a good starting point because it describes a typical sequence of text processing nodes in a workflow.

Frank

Aaron_Hart · April 5, 2013, 2:25pm

Moved to the general section as this is not really focused on HCS. Thanks for the contributions.

Cheers,

Aaron