OpenMS nodes for LC-MS/MS data in combination with MetaboliteSpectralMatcher

ktzanakis · November 2, 2021, 7:32pm

Hello!

I am trying to build a workflow for MS/MS data based on the MetaboliteSpectralMatcher node, and I would like to know whether that is possible.

Could the BaselineFilter and NoiseFilterSGolay nodes be used for MS/MS data?
Could the FeatureFinderMetabo and nodes also be used for LC-MS/MS data?
Could the MapAlignerPoseClustering and FeatureLinkerUnlabeledQT nodes also be used for LC-MS/MS data?

Are the above nodes strictly for MS1 data? If they are only for MS1, is there any other way to perform baseline reduction and noise filtering, or to correct RT distortions and group features from multiple maps of LC-MS/MS data, before using the MetaboliteSpectralMatcher node?

ktzanakis · November 4, 2021, 12:48pm

So, basically, can we use FeatureFinderMetabo (FFM) and MapAlignerPoseClustering (MAPC) with MS/MS data and then MetaboliteSpectralMatcher (MSM)?

Also, MSM does not give the abundance of a metabolite in the output table. Would it make sense to run FFM with MAPC in a separate branch from MSM and then join their results, matching the FFM rt and mz columns against the MSM retention_time and exp_mass_to_charge columns?

jpfeuffer · November 4, 2021, 1:09pm

That is exactly what I would do. You could also try IDMapper to map the IDs from MSM onto the quantified features from FFM (but I am not sure if MSM output is compatible yet).

FFM automatically extracts only MS1 spectra, while MSM extracts only MS2 spectra. They could be in the same mzML file (and most of the time they are).
FeatureLinker and MapAligner use only featureXMLs so they do not need the

And yes, you could use the Baseline/Golay filters on the MS2 level, but I am not sure if this is really necessary. I would try without them first.

ktzanakis · November 4, 2021, 1:34pm

Thank you so much for your answer!

In your 2nd paragraph at the end you write “…so they do not need the…”?

Yes… IDMapper and MSM are not compatible! The point is that I ran FFM on one file and got ~23,000 features. I also ran the same file through MSM and got 29 hits. When I try to join the two tables on RT and m/z afterwards, I get an empty table! The columns are all doubles with the same number of decimals (I used the Round Double node to round the values in both tables). Unfortunately, even if I round m/z and RT to 0 digits in both tables (one from FFM, one from MSM), the Joiner still returns an empty table.

If I do not use FFM in that branch and just use TextExporter and FeatureTextReader, I get ~3,100,000 features, and the join does NOT return an empty table! But skipping FFM is a bit weird, as many intensities in the raw mzML file have a value of 0!

jpfeuffer · November 4, 2021, 2:06pm

Unfortunately, you have to write a small script that matches with a tolerance, especially on the RT level.
Features are 2D boxes with a startRT and endRT, and potentially a span in the mz direction as well (if multiple isotopic traces were found).
Features can elute for several seconds, and the mass spectrometer can take an MS2 scan at any point in time during that window.
I am not sure if FeatureTextReader also reads rt_start and rt_end. However, the default rt value is the centroid.
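To illustrate why an exact join on rounded values can come back empty, here is a minimal standalone Java sketch (Java 16+ for records). It assumes each FFM feature row carries the centroid RT plus rt_start/rt_end, and that an MSM hit carries retention_time and exp_mass_to_charge. The class, record and method names, the numbers, and the 0.01 m/z window are hypothetical placeholders, not values from OpenMS or KNIME.

```java
public class RtMatchDemo {
    // Hypothetical feature: centroid m/z and RT plus the RT span of its box (seconds).
    record Feature(double mz, double rt, double rtStart, double rtEnd) {}

    // Roughly what an exact Joiner on columns rounded to 0 digits does.
    static boolean exactRoundedMatch(Feature f, double hitMz, double hitRt) {
        return Math.round(f.mz()) == Math.round(hitMz)
                && Math.round(f.rt()) == Math.round(hitRt);
    }

    // Tolerance match: hit RT inside the feature's RT box, m/z within an absolute window.
    static boolean boxMatch(Feature f, double hitMz, double hitRt, double mzTol) {
        return hitRt >= f.rtStart() && hitRt <= f.rtEnd()
                && Math.abs(hitMz - f.mz()) <= mzTol;
    }

    public static void main(String[] args) {
        Feature f = new Feature(180.0634, 312.4, 305.1, 320.6);
        // Hypothetical MS2 scan taken a few seconds before the feature centroid.
        double hitMz = 180.0631, hitRt = 309.8;
        System.out.println(exactRoundedMatch(f, hitMz, hitRt)); // false: 312 vs 310
        System.out.println(boxMatch(f, hitMz, hitRt, 0.01));   // true
    }
}
```

Even after rounding, the centroid RT and the MS2 scan RT land on different integers, while a range check against the feature box still finds the pair.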

jpfeuffer · November 4, 2021, 2:12pm

By the way, what does IDMapper say? Maybe we can fix that?

ktzanakis · November 4, 2021, 2:26pm

The script would be a bit tricky! If I could use the Joiner node with a tolerance, it would be exactly that!

Well, MSM produces mzTab as output, while IDMapper takes mzid/idXML on the 1st port (where I guess the MSM output should go), featureXML/consensusXML/mzq on the 2nd port (where the FFM output should go), and an optional mzML on the 3rd port for unidentified spectra.

ktzanakis · November 4, 2021, 3:14pm

I managed to do it using a Cross Joiner to cover all combinations of rt and mz, followed by a Java Snippet that checks on every row for plus/minus 10 in rt and plus/minus 0.5 in mz. I get 1120 identified features with intensities. However, a Cross Joiner is a bit expensive, and I am also not 100% sure whether rt_tolerance = 10 and mz_tolerance = 0.5 are satisfactory values. Any luck with IDMapper?
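If the Cross Joiner becomes too expensive, one common alternative is to sort the features by m/z once and binary-search the m/z window for each hit, so only nearby candidates are tested instead of all n × m row pairs. The sketch below is a standalone Java illustration of that idea, not a KNIME node; the class and record names are hypothetical, and the plus/minus 0.5 m/z and plus/minus 10 RT values are simply the ones from the post, not recommendations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedMzMatcher {
    // Hypothetical row types standing in for the FFM and MSM tables.
    record Feature(double mz, double rt, double intensity) {}
    record Hit(double mz, double rt) {}

    private final Feature[] byMz;

    SortedMzMatcher(List<Feature> features) {
        // Sort once by m/z: O(n log n) instead of materializing all n*m pairs.
        byMz = features.toArray(new Feature[0]);
        Arrays.sort(byMz, Comparator.comparingDouble(Feature::mz));
    }

    // Index of the first feature with mz >= lo (lower-bound binary search).
    private int lowerBound(double lo) {
        int a = 0, b = byMz.length;
        while (a < b) {
            int mid = (a + b) >>> 1;
            if (byMz[mid].mz() < lo) a = mid + 1; else b = mid;
        }
        return a;
    }

    // Features within +/- mzTol in m/z and +/- rtTol in RT of the hit.
    List<Feature> match(Hit hit, double mzTol, double rtTol) {
        List<Feature> out = new ArrayList<>();
        for (int i = lowerBound(hit.mz() - mzTol);
             i < byMz.length && byMz[i].mz() <= hit.mz() + mzTol; i++) {
            Feature f = byMz[i];
            if (Math.abs(hit.rt() - f.rt()) <= rtTol) out.add(f);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMzMatcher m = new SortedMzMatcher(List.of(
                new Feature(150.0550, 207.0, 1.0e5),
                new Feature(180.0634, 312.4, 2.5e6),
                new Feature(180.0700, 405.0, 7.0e4)));
        System.out.println(m.match(new Hit(180.0631, 309.8), 0.5, 10.0).size()); // 1
    }
}
```

The RT test inside the loop can be swapped for an rt_start/rt_end containment check without changing the m/z search.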

jpfeuffer · November 4, 2021, 3:49pm

Hmm yes, I can see the issue with IDMapper. Unfortunately, it is not a quick fix but a larger restructuring, which is actually something I wanted to look at anyway.

Did you check whether you have rt_start and rt_end in your tables? Then you could get rid of the RT tolerance. Try exporting the convex_hulls via the corresponding parameter in FFM.

ktzanakis · November 4, 2021, 4:21pm

Yes there is rt_start and rt_end. You mean use these two values instead of minus/plus 10?
Yes, convex_hulls are exported.

jpfeuffer · November 4, 2021, 4:23pm

Yes, just check whether the RT from MSM is between RT_start and RT_end of FFM.
For simplicity, I would do the same for mz if possible.

ktzanakis · November 4, 2021, 4:31pm

Ok I set it with rt_start and rt_end. You mean not to set “mz tolerance” at all? Now I get ~25000 metabolites instead of 1120!
Yes, it would be nice if there were “mz_start” and “mz_end”.

jpfeuffer · November 4, 2021, 4:33pm

OK, if there is no mz_start and mz_end, then you have to use a tolerance there. But your current tolerance is too high.
I would suggest a relative tolerance of 10 to 20 ppm (parts per million, where 1 ppm = 1/1,000,000 of the m/z value).
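Because a ppm tolerance is relative, the absolute window scales with the m/z value: 20 ppm at m/z 200 is 200 × 20 / 1,000,000 = 0.004. A minimal standalone Java sketch of the check follows; the class and method names are hypothetical. Dividing by the double literal 1e6 keeps the arithmetic in floating point; in Java, an integer expression such as 20 * 1 / 1000000 evaluates to 0.

```java
public class PpmTolerance {
    // Half-width of the m/z window for a relative tolerance given in ppm.
    static double ppmWindow(double refMz, double ppm) {
        return refMz * ppm / 1e6; // 1e6 is a double literal, so no integer division
    }

    // True if the observed m/z lies within +/- ppm of the reference m/z.
    static boolean withinPpm(double refMz, double observedMz, double ppm) {
        return Math.abs(observedMz - refMz) <= ppmWindow(refMz, ppm);
    }

    public static void main(String[] args) {
        // Show how the absolute window grows with m/z.
        for (double mz : new double[] {100.0, 500.0, 1000.0}) {
            System.out.printf("m/z %.1f: 20 ppm = +/- %.4f%n", mz, ppmWindow(mz, 20));
        }
        System.out.println(withinPpm(180.0634, 180.0631, 20)); // true (about 1.7 ppm off)
        System.out.println(withinPpm(180.0634, 180.0700, 20)); // false (about 37 ppm off)
    }
}
```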

ktzanakis · November 4, 2021, 4:44pm

I am not sure whether by, e.g., 20 ppm you simply mean 20/1,000,000, i.e. mz plus/minus 0.00002. This is my Java snippet:

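// absolute window: exp_mass_to_charge plus/minus 0.00002
if (c_exp_mass_to_charge - 0.00002 <= c_mz && c_mz <= c_exp_mass_to_charge + 0.00002) {
    // MSM retention time must lie inside the feature's RT box
    if (c_retention_time >= c_rt_start && c_retention_time <= c_rt_end) {
        out_newrt = c_rt;
        out_newmz = c_mz;
    }
}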

But that gives me only 10 metabolites.

ktzanakis · November 4, 2021, 4:55pm

I missed something there. It must be around plus/minus 0.01, with which I get 175 metabolites.

jpfeuffer · November 4, 2021, 5:22pm

You usually do ref_mz - ref_mz * tolerance * 1/1000000 <= mz && mz <= ref_mz + ref_mz * tolerance * 1/1000000, where ref_mz is the reference m/z and tolerance is in ppm.

ktzanakis · November 4, 2021, 5:47pm

Thank you very much for your help!
I changed it to:

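// relative window: exp_mass_to_charge plus/minus 20 ppm
if (c_exp_mass_to_charge - c_exp_mass_to_charge * 20 * 1/1000000 <= c_mz && c_mz <= c_exp_mass_to_charge + c_exp_mass_to_charge * 20 * 1/1000000) {
    // MSM retention time must lie inside the feature's RT box
    if (c_retention_time >= c_rt_start && c_retention_time <= c_rt_end) {
        out_newrt = c_rt;
        out_newmz = c_mz;
    }
}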

and now I get 76 metabolites.

jpfeuffer · November 4, 2021, 8:52pm

That does not sound like much, but in my opinion anything else would be too likely to be a false positive.
The tolerance depends a bit on your instrument resolution.

Which spectral database did you use?

ktzanakis · November 4, 2021, 10:46pm

It is indeed not that many. I used the latest MBSpectra.mzML file, which I built according to this https://github.com/OpenMS/MassBankUpdate. The truth is, I am not aiming at a specific instrument but rather at a generally acceptable tolerance value. So I may tweak it a bit.

P.S.: I didn't know how to update the MB2HMDBMapping.csv, so I left it as it was https://forum.knime.com/t/openms-updating-mb2hmdbmapping-file-to-create-the-mbspectra-mzml-for-metabolitespectralmatcher-node/37097.

ktzanakis · November 5, 2021, 5:26am

Do you think it would make sense to use MAPC and FeatureLinker in the separate branch where FFM is? I would get mz_cf and rt_cf and then join these values with the exp_mass_to_charge and retention_time of every file from MSM?

Or would it make more sense to join on the separate values rt_0, mz_0, etc., from MAPC and FeatureLinker with the exp_mass_to_charge and retention_time of every file from MSM?