Receiving an "Invalid Character Error" when using OMSSAAdapter

Hi,

I am following the OpenMS tutorial using KNIME, but with my own LC/MS data. I downloaded the uniref90 FASTA database (from UniProt) and created an index file with NCBI’s makeblastdb -dbtype prot -in uniref90.fasta. However, I get an error that says "Invalid Character 0xB in line LINE and column COLUMN" for the mzML file I use as input.

I have converted my data with both MSConvertGUI (from ProteoWizard) and the command-line msconvert tool, but both result in the same error.

I’m not exactly sure where this error is coming from (OpenMS/KNIME or MSConvert), so I thought I would start with OpenMS/KNIME and go from there.

My questions are:

  1. Does anyone know why I get this error, and some general steps I might take to fix it?
  2. Is the reference database I downloaded acceptable, or is there a better-known one? This may be user-specific, but I thought I would ask anyway.

I appreciate any help!

Edit: I should also mention that this is very new to me, so I apologize for any missing information.

Hello, screenshots of the problem and a workflow (with real or fake data) that reproduces it usually help greatly. Please create a shareable workflow with the minimal information needed to replicate your issue, and I can inspect what may be happening.

Okay, sounds good! I have included an image of my workflow, along with a link to my data, the reference database, and the KNIME workflow itself.

Here are the basics of the workflow:


This is the “Standard output”:
Error: Unable to read file (- due to that error of type Parse Error in: C:\jenkins\ws\openms\ntly\TstPkg\9447518b\source\src\openms\source\FORMAT\HANDLERS\XMLHandler.cpp@132-void __cdecl OpenMS::Internal::XMLHandler::fatalError(enum OpenMS::Internal::XMLHandler::ActionMode,const class OpenMS::String &,unsigned int,unsigned int) const)

And this is what the “Error output” shows:
C:\jenkins\ws\openms\ntly\TstPkg\9447518b\source\src\openms\source\FORMAT\HANDLERS\XMLHandler.cpp(131): While loading 'C:\Users\joshl\Desktop\Data\PWU1F_P1-C1_1_1003.mzML': invalid character 0xB( in line 209752 column 56085)


I have put my workflow, data and reference database on OneDrive; I hope that’s okay.
Link to data

Hi!

I think the problem is that your data is gzipped. The easiest solution would be to unzip your mzml.gz first.
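If you prefer to script that step, a minimal sketch using only Python's standard gzip and shutil modules can decompress an .mzML.gz next to the original. The function name gunzip_mzml is just an illustrative choice, not part of OpenMS or KNIME:

```python
import gzip
import shutil
from pathlib import Path


def gunzip_mzml(gz_path: str) -> Path:
    """Decompress e.g. 'run.mzML.gz' to 'run.mzML' in the same folder and return the new path."""
    src = Path(gz_path)
    if src.suffix.lower() != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dst = src.with_suffix("")  # drops only the trailing '.gz'
    # Stream in chunks so a large mzML is never held in memory all at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

The uncompressed file can then be selected in the Input File node as a normal .mzML.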

It might be possible to correctly overwrite the filetype in the Input File node to mzml.gz but I am not sure if that will work with all nodes.

Hi,
Sorry, I should have specified that I gzipped my data only for the upload because my internet connection is slow. The data is unzipped in my workflow.

Okay, then there is probably a vertical tab character (0x0B, not the ordinary tab 0x09) somewhere in your mzML that should not be there.
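For context, XML 1.0 restricts which characters may appear in a document. Its Char production allows tab (0x09), LF (0x0A), CR (0x0D) and code points from 0x20 upward (excluding the surrogate range and 0xFFFE/0xFFFF), so any other control character, including 0x0B, makes the parser stop. A minimal Python sketch of that rule (the function name is illustrative):

```python
def is_xml10_char(code_point: int) -> bool:
    """Return True if code_point matches the XML 1.0 'Char' production."""
    return (
        code_point in (0x09, 0x0A, 0x0D)        # tab, line feed, carriage return
        or 0x20 <= code_point <= 0xD7FF         # normal characters
        or 0xE000 <= code_point <= 0xFFFD       # skips surrogates and 0xFFFE/0xFFFF
        or 0x10000 <= code_point <= 0x10FFFF    # supplementary planes
    )
```

With this, is_xml10_char(0x09) is True while is_xml10_char(0x0B) is False, which matches the error in the log.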

Okay!

Do you have any idea how I could go about finding the tab character?

The most efficient way would be on a command line (e.g. PowerShell on Windows):

Or bash on Linux/macOS:

If you cannot get it to work, you might also be able to open the file in a lightweight editor like Sublime Text and just Find/Replace the character.
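A scripted, cross-platform alternative is a minimal Python sketch that first reports where the character occurs and then writes a cleaned copy. It assumes the offending character is the single byte 0x0B; the function names are hypothetical, and columns are counted in bytes, which equals the parser's character column only for plain-ASCII content:

```python
from typing import Iterator, Tuple

VERTICAL_TAB = b"\x0b"  # the character the parser reported as "invalid character 0xB"


def find_positions(path: str, target: bytes = VERTICAL_TAB) -> Iterator[Tuple[int, int]]:
    """Yield 1-based (line, column) positions of each occurrence of target."""
    with open(path, "rb") as fh:
        for line_no, line in enumerate(fh, start=1):
            idx = line.find(target)
            while idx != -1:
                yield line_no, idx + 1
                idx = line.find(target, idx + 1)


def replace_all(src: str, dst: str, target: bytes = VERTICAL_TAB,
                replacement: bytes = b" ") -> int:
    """Copy src to dst with every target byte replaced; return the number replaced."""
    if len(target) != 1:
        # Chunked replacement is only safe when the pattern cannot span a chunk boundary.
        raise ValueError("target must be a single byte")
    count = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(1 << 20)  # 1 MiB at a time
            if not chunk:
                break
            count += chunk.count(target)
            fout.write(chunk.replace(target, replacement))
    return count
```

Running find_positions on PWU1F_P1-C1_1_1003.mzML should include a hit at or near line 209752, column 56085 from the error log. replace_all writes to a new file so the original stays untouched. It substitutes a space by default; pass replacement=b"" to delete the character instead. Which is appropriate depends on where the character sits, so it is worth inspecting the reported position first.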

This worked great, thank you!

I do have another problem: OMSSAAdapter only works with MS2 data, and I am currently working with MS1 data.

Do you know if there is another tool included in KNIME/OpenMS that I can use for MS1 data?

Usually, you should have a mixture of MS1 and MS2 scans in your data. If you really only have MS1 data, then, for proteomics you can only do quantification, no identification. For metabolomics data, you could do an AccurateMassSearch based on a compound database.

Okay. Neither my supervisor nor I have worked with KNIME/OpenMS before, so I apologize for the naive questions and the hand-holding I need.

He said I should be using OpenMS for pairwise comparisons of our MS-only full-scan data. Am I able to do this in OpenMS? Or do you know of any papers or write-ups describing something similar?

I see from this issue that OMSSAAdapter is no longer really supported and that users should switch to CometAdapter or MSGFPlusAdapter instead.

I appreciate your time helping me with this.

Are you working with proteins/peptides or small molecules/metabolites?

I am working with proteins/peptides.

Well, you could use the quantification tools and compare quantities of unknown features. That works with just MS1.
But to annotate peptides and distinguish peptide sequences with the same composition you need MS2 spectra in OpenMS.
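In OpenMS, MS1-only quantification is typically assembled from tools such as FeatureFinderCentroided (feature detection) and FeatureLinkerUnlabeledQT (matching features across runs). To illustrate what "comparing quantities of unknown features" means, here is a toy Python sketch, not OpenMS code. It assumes two feature lists of (m/z, RT, intensity), pairs them greedily within hypothetical tolerances of 10 ppm and 30 s, and reports log2 fold changes. The names Feature and pair_and_compare are made up for this example:

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    mz: float         # centroid m/z
    rt: float         # retention time in seconds
    intensity: float  # summed feature intensity


def pair_and_compare(
    run_a: List[Feature],
    run_b: List[Feature],
    ppm_tol: float = 10.0,  # hypothetical m/z tolerance
    rt_tol: float = 30.0,   # hypothetical RT tolerance in seconds
) -> List[Tuple[Feature, Feature, float]]:
    """Pair each feature in run_a with the closest unused feature in run_b
    within both tolerances; return (a, b, log2(b.intensity / a.intensity))."""
    used = set()
    results = []
    for a in run_a:
        best_idx, best_dist = None, math.inf
        for j, b in enumerate(run_b):
            if j in used:
                continue
            ppm = abs(b.mz - a.mz) / a.mz * 1e6
            drt = abs(b.rt - a.rt)
            if ppm <= ppm_tol and drt <= rt_tol:
                # Scale both deviations by their tolerance so they weigh equally.
                dist = ppm / ppm_tol + drt / rt_tol
                if dist < best_dist:
                    best_idx, best_dist = j, dist
        if best_idx is not None:
            used.add(best_idx)
            b = run_b[best_idx]
            results.append((a, b, math.log2(b.intensity / a.intensity)))
    return results
```

This toy version ignores retention-time alignment and intensity normalisation, which a real comparison of runs needs. Without MS2 spectra, the matched features also remain unidentified m/z and RT pairs rather than peptides.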