Agilent Spectrum Mill MS Proteomics Workbench

MS PMF Search/Summary


Table of Contents


Introduction

Peptide mass fingerprinting (PMF) is a popular technique for protein identification. The method comprises digestion of the protein with site-specific proteases, measurement of the peptide masses by mass spectrometry (MS), and protein identification via a database search. The PMF Search capability within the Spectrum Mill workbench is an advanced, automated database search program for MS-only spectra.

PMF Search is limited to the analysis of digests from simple protein mixtures (usually three to five proteins). When your spectrum contains peptides from too many different proteins, you will not achieve statistically significant scores, because the more complex the protein mixture, the more non-matching (noise) peptides there are for any given protein. Use of the Spectrum Mill workbench's Mixture scoring option helps to overcome this limitation.

With PMF Search, the certainty of the identification is primarily a function of the level of mass accuracy.

PMF Search is an automated program that searches one or more mass-intensity files. The Spectrum Mill workbench also has a Manual PMF Search, where you can type or paste the mass list into the search form. You access Manual PMF Search via the PMF Search page or the Spectrum Mill home page. Use the Manual PMF Search form to paste data from, for example, a Molecular Feature Extraction of Q-TOF and TOF .d files with MassHunter Qualitative Analysis.


To Use the PMF Search Form

The following topics describe options available on the PMF Search form.

Search

Data Directory

Search Parameters

Modifications

Search Criteria

The next several topics describe options available in the Search Criteria section of the PMF Search form.

Matching Tolerances

Data Files

Spectral Features

Contaminant Masses

Recalibrate Data

Report Details


To Use the PMF Summary Form

The following topics describe options available on the PMF Summary form.


Summarize Results for Review

Data Directory

Sorting

Review Fields


To Use the Manual PMF Search Form

The following topics describe options available on the Manual PMF Search form. Use the Manual PMF Search form to paste data from a Molecular Feature Extraction of Q-TOF and TOF .d files with MassHunter Qualitative Analysis.  See the Familiarization Guide and the Application Guide for instructions on how to use MassHunter Qualitative Analysis and Manual PMF Search to process MS-only Q-TOF and TOF .d files.

To return to default settings on the Manual PMF Search page, click the Spectrum Mill button to go to the Spectrum Mill home page.  Then click the link on the home page to go back to the Manual PMF Search page.

Search

Search Parameters

Modifications

Peptide Masses


Mass Tolerance

Set the mass tolerances to be consistent with the mass accuracy of the instrument used to generate the data. For TOF instruments, it is generally better to use units of ppm or % rather than Da, because the mass measurement error of these mass spectrometers is typically mass-dependent and thus cannot be uniformly expressed in Da. For ion trap instruments, it is better to use units of Da.
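To illustrate why a relative unit suits mass-dependent error, the following minimal sketch converts a ppm tolerance into an absolute window in Da at several masses. The function names and example values are illustrative assumptions and are not part of the Spectrum Mill workbench.

```python
def ppm_to_da(mass_da: float, tol_ppm: float) -> float:
    """Absolute tolerance in Da that corresponds to tol_ppm at mass_da."""
    return mass_da * tol_ppm / 1e6


def within_tolerance(observed: float, theoretical: float,
                     tol: float, unit: str = "ppm") -> bool:
    """Return True if the observed mass matches within the tolerance.

    unit is one of "ppm", "%", or "Da" (hypothetical labels for this sketch).
    """
    if unit == "ppm":
        window = ppm_to_da(theoretical, tol)
    elif unit == "%":
        window = theoretical * tol / 100.0
    elif unit == "Da":
        window = tol
    else:
        raise ValueError(f"unknown tolerance unit: {unit}")
    return abs(observed - theoretical) <= window


# A fixed 50 ppm tolerance widens in Da as the peptide mass grows.
for mass in (800.0, 1600.0, 3200.0):
    print(f"{mass:7.1f} Da -> +/- {ppm_to_da(mass, 50):.3f} Da")
```

A fixed Da tolerance, by contrast, gives the same window at every mass, which is why it fits instruments whose error does not scale with mass.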

If you set the mass tolerance too tight, you may miss peptides, but if you set it too loose, you may generate false positives.

Measuring masses as accurately as possible is the single most important thing one can do to achieve the highest certainty of protein identification in a peptide mass fingerprinting experiment. 


Instrument

For MS-only data, when you select an instrument, you trigger the software to configure extraction and search parameters that are designed particularly for the instrument type. You can edit the instrument parameters or add new instruments by editing the files: msparams_mill/instrument.txt and millhtml/SM_js/instrument.js.

E:\SpectrumMill\msparams_mill\instrument.txt  
E:\SpectrumMill\millhtml\SM_js\instrument.js  

If you add an instrument, be sure to set the parameters in instrument.txt in a way that is appropriate for the data you export from that instrument. For example, if the instrument data system already performs deisotoping, set bypassDeIsotoping = 1 in instrument.txt so that the Spectrum Mill workbench does not repeat it.

Examples of supported MS-only instrument configurations are shown below.  Users should ordinarily NOT change these values. For additional supported instruments, see E:\SpectrumMill\msparams_mill\instrument.txt.

| Feature | Description | MALDI-TOF | MALDI-TOF-AGILENT | MALDI-ION-TRAP | MALDI-QSTAR | ESI-TOF-AGILENT |
|---|---|---|---|---|---|---|
| instrument_charges_certain | see below* | yes | if determined | if determined | if determined | yes |
| minSignalNoiseRatio | threshold for peak detection for MS/MS data | 30 | 8 | 5 | 8 | 15 |
| minSignalNoiseRatioPMF | threshold for peak detection for MS (PMF) data | | 2 | 15 | 15 | 15 |
| peakLimitCount | max # of detected peaks to use for interpretation | 25 | 100 | 25 | 25 | 500 |
| peakBinningTolerance | used for centroiding in Data Extractor; expected peak width in amu | 0.6 | 0.2 | | 0.6 | 0.2 |
| bypassDeIsotoping | skip de-isotoping | yes | no | no | no | no |
| bypassSignalNoiseThresholding | skip S/N thresholding | yes | no | no | no | no |

*instrument_charges_certain:


Ranking / Scoring of Results

Probability Scoring

The probability score represents the probability that the protein match occurs by chance. Thus a score of 0.5 means that the match has a 50% chance of occurring randomly. A score of 1e-6 means that the match has a one-in-a-million chance of occurring randomly. The probability distribution is calculated after counting the occurrences in the database of each mass submitted within the specified mass tolerance. Consequently, the score for the same set of submitted masses will change if the mass tolerance, the enzyme, the number of missed cleavages, or the database is changed. Also note that modified amino acids such as met-sulfoxide do not contribute to the score.

There are two types of probability scores. In the PMF Summary report, the column labeled Static Probability Score lists the probability score calculated based on the Peptide mass tolerance chosen in the PMF Search form.  The column labeled Dynamic Probability Score lists the probability score calculated based on the actual peptide mass deviations determined from the data.  Thus, if the actual data is more accurate than the mass tolerance set in PMF Search, then the Dynamic Probability Score will be better (smaller number) than the Static Probability Score.

Mixture scoring

When you invoke mixture scoring within PMF Search, the software assigns probability scores to potential mixtures and color-codes the mass spectrum to show peaks from each component. If the sample represents a mixture and the Mixture scoring check box is marked, then the scoring method is optimized for mixtures. Here is an example:

Say you have a three-component mixture with a total of 100 mass spectral peaks. Component A matches 30 peaks, component B matches 30 peaks, component C matches 30 peaks, and 10 peaks are noise. If you do not mark the check box for Mixture scoring, then the score for component A is penalized for the fact that it represents only 30 peaks out of the 100 total. When you do mark the check box for Mixture scoring, then the score for component A is penalized only by the 10 noise peaks, because it now represents 30 peaks out of the 40 peaks remaining after the software subtracts the peaks attributed to components B and C. In the results, the scores for each individual protein in the mixture are the same as without mixture scoring, but the score for the overall mixture does take into account the scenario described above.
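The peak accounting in this example can be sketched as follows. This shows only the arithmetic of matched versus remaining peaks; it is not the probability score itself, and the function name is a hypothetical one chosen for illustration.

```python
def matched_fraction(matched: int, total: int, attributed_to_others: int = 0) -> float:
    """Fraction of the peaks under consideration that a component explains.

    attributed_to_others is the number of peaks already assigned to the
    other components of the mixture (0 when mixture scoring is off).
    """
    remaining = total - attributed_to_others
    if remaining <= 0:
        raise ValueError("no peaks remain after subtracting other components")
    return matched / remaining


total_peaks = 100
a, b, c = 30, 30, 30  # peaks matched by components A, B, and C

without_mixture = matched_fraction(a, total_peaks)        # 30 of 100 peaks
with_mixture = matched_fraction(a, total_peaks, b + c)    # 30 of 40 peaks

print(f"A without mixture scoring: {without_mixture:.2f}")
print(f"A with mixture scoring:    {with_mixture:.2f}")
```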

Another advantage of mixture scoring is that there are bonus points for the peaks being mutually exclusive (e.g., no overlap of peaks among components). 

The mixture scoring feature is especially useful if you have a mixture where one protein dominates the spectrum, because it avoids the situation where the top hits are the dominant protein and various precursors of the dominant protein. When you enable mixture scoring, you are more likely to identify the less abundant protein components in a mixture.

If you mark the Mixture scoring check box and the sample is actually a single component, the search will take a very long time. In this case, you may want to stop the search, clear the Mixture scoring check box, and restart the search.

If you invoke mixture scoring, it is more convenient to review the results directly from the PMF Search page than from the PMF Summary page.  The links from the results section of the PMF Search page take you directly to the mixture results without requiring additional clicks.

The default parameters for mixture scoring (e.g., the total number of components permitted in the mixture) are set in msparams_mill\mixParamsMsfit.txt.

MOWSE Score

The MOWSE score reported by PMF Search is based on the scoring system described in Pappin et al., Current Biology, 1993, Vol 3, No 6, pp 327-. Because PMF Search offers several options not available in the initial version of MOWSE, several modifications have had to be made.

After the species and molecular weight pre-searches, the remaining proteins undergo theoretical digestion. The resulting peptides are then placed in bins based on their molecular weight and the intact molecular weight of undigested protein they originated from. There are eleven intact molecular weight bins. Under 100000 Da, there are 10 bins of width 10000 Da. The other bin contains all the proteins over 100000 Da.  There are thirty peptide molecular weight bins of width 100 amu between 0-3000 Da.  Peptides above 3000 Da are not binned. Peptides with no missed cleavages contribute 1.0 to the bin total, whereas peptides containing missed cleavages contribute pfactor (a user supplied parameter).
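The binning described above can be sketched as follows. This is a minimal illustration and not the Spectrum Mill implementation: the function names are hypothetical, and the handling of masses that fall exactly on 100000 Da or 3000 Da is an assumption, since the text does not specify it.

```python
from collections import defaultdict

PROTEIN_BIN_WIDTH = 10000.0   # Da, for intact proteins under 100000 Da
PEPTIDE_BIN_WIDTH = 100.0     # Da, for peptides between 0 and 3000 Da
LAST_PROTEIN_BIN = 10         # single bin for the largest proteins
PEPTIDE_LIMIT = 3000.0        # peptides above this mass are not binned


def protein_bin(protein_mw: float) -> int:
    """Bins 0-9 cover 0-100000 Da; bin 10 holds everything larger.

    Assumption: a protein of exactly 100000 Da goes into bin 10.
    """
    return min(int(protein_mw // PROTEIN_BIN_WIDTH), LAST_PROTEIN_BIN)


def peptide_bin(peptide_mw: float):
    """Bins 0-29 cover 0-3000 Da; returns None for unbinned peptides.

    Assumption: a peptide of exactly 3000 Da is treated as unbinned.
    """
    if peptide_mw < 0 or peptide_mw >= PEPTIDE_LIMIT:
        return None
    return int(peptide_mw // PEPTIDE_BIN_WIDTH)


def add_peptide(bin_totals, protein_mw, peptide_mw, missed_cleavages, pfactor):
    """Add one theoretical peptide to the (protein bin, peptide bin) totals."""
    pep = peptide_bin(peptide_mw)
    if pep is None:
        return
    weight = 1.0 if missed_cleavages == 0 else pfactor
    bin_totals[(protein_bin(protein_mw), pep)] += weight


bin_totals = defaultdict(float)
add_peptide(bin_totals, 45000.0, 1234.5, missed_cleavages=0, pfactor=0.4)
add_peptide(bin_totals, 45000.0, 1250.0, missed_cleavages=1, pfactor=0.4)
add_peptide(bin_totals, 45000.0, 3500.0, missed_cleavages=0, pfactor=0.4)  # not binned
print(dict(bin_totals))
```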

Bin frequency values are then calculated by dividing the bin totals by the sum of the bin totals for each 10000 Da protein interval. The bin frequency values are then normalized to the largest bin frequency value to yield frequency values between 0 and 1.
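A minimal sketch of this normalization is shown below. It assumes that the bin totals are held in a dictionary keyed by (protein bin, peptide bin), and that "the largest bin frequency value" is taken within each protein interval; the text does not say whether the maximum is per interval or global, so treat that choice as an assumption.

```python
def normalized_frequencies(bin_totals):
    """Convert bin totals into frequencies scaled to the range 0..1.

    bin_totals maps (protein_bin, peptide_bin) -> total contribution.
    """
    # Group the totals by protein interval.
    by_interval = {}
    for (p_bin, pep_bin), total in bin_totals.items():
        by_interval.setdefault(p_bin, {})[pep_bin] = total

    result = {}
    for p_bin, row in by_interval.items():
        row_sum = sum(row.values())
        if row_sum == 0:
            continue  # nothing to normalize in an empty interval
        # Frequency: each bin total divided by the sum for its interval.
        freqs = {pep_bin: total / row_sum for pep_bin, total in row.items()}
        # Scale so the largest frequency in the interval becomes 1.0
        # (assumed to be per interval, see the note above).
        largest = max(freqs.values())
        for pep_bin, freq in freqs.items():
            result[(p_bin, pep_bin)] = freq / largest
    return result


example_totals = {(4, 12): 3.0, (4, 5): 1.0, (10, 7): 2.0}
example_freqs = normalized_frequencies(example_totals)
for key in sorted(example_freqs):
    print(key, round(example_freqs[key], 3))
```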

Masses in the theoretical digestion which match masses in the data set are divided into scoring matches and non-scoring matches. Scoring matches include unmodified peptides, acrylamide-modified Cys, and, when the unmodified peptide is also present, N-terminal Gln to pyroGlu and oxidation of Met. Non-scoring matches include pyroGlu and oxidation of Met in the absence of the unmodified peptide, acetylated N-termini, phosphorylation of S, T and Y, and single amino acid substitutions. Unmatched masses are ignored. The score for each matching mass is assigned as the appropriate normalized distribution frequency value. In the case of multiple matching masses, the scores are multiplied together. The final product score is inverted and normalized to an average protein molecular weight of 50 kD.
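The final step can be sketched as below. The exact formula used by PMF Search is not given here; this sketch assumes the form used in the original MOWSE description, 50000 / (product of frequencies x protein molecular weight), and the function name is hypothetical. The modified implementation in PMF Search may differ.

```python
import math


def mowse_like_score(matched_frequencies, protein_mw, reference_mw=50000.0):
    """Score from the normalized frequencies of the scoring matches.

    Assumed form: reference_mw / (product of frequencies * protein_mw).
    Returns 0.0 when there are no scoring matches.
    """
    if not matched_frequencies:
        return 0.0
    product = math.prod(matched_frequencies)
    if product <= 0 or protein_mw <= 0:
        raise ValueError("frequencies and protein mass must be positive")
    return reference_mw / (product * protein_mw)


# Rare peptide masses (low frequency) raise the score more than common ones.
print(mowse_like_score([0.5, 0.1], protein_mw=50000.0))
print(mowse_like_score([0.5, 0.1, 0.2], protein_mw=50000.0))
```

Because each frequency lies between 0 and 1, every additional scoring match shrinks the product and therefore increases the inverted score.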

For databases with < 1000 entries (not enough entries to generate valid scoring statistics)

PMF Search scoring systems are turned off and a simple ranking system is used. The results are sorted so that if multiple database entries are matched, more likely sequences are listed higher in the list. All database entries matching the input data and parameters are ranked on the following basis:

  1. Database entries with the highest number of matched masses are ranked higher.
  2. Among equivalent matches (those with the same rank) the results are sorted in order of increasing index number.

Note that the last sort does NOT imply a BETTER ranking, even though one match will be listed higher than another, but is merely intended to provide some organization to the listing.
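The two ranking rules amount to a sort on two keys, as in this minimal sketch. The Hit type and its field names are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    index: int            # database index number of the entry
    matched_masses: int   # number of input masses the entry matches


def rank_hits(hits):
    """Most matched masses first; ties listed by increasing index number."""
    return sorted(hits, key=lambda h: (-h.matched_masses, h.index))


hits = [Hit(index=42, matched_masses=3),
        Hit(index=7, matched_masses=5),
        Hit(index=15, matched_masses=3)]

for hit in rank_hits(hits):
    print(hit.index, hit.matched_masses)
```

Entries 15 and 42 tie on matched masses, so their relative order reflects only the index number and carries no information about match quality.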


Data Recalibration

The data recalibration feature of PMF Search/PMF Summary is useful if you have data files that were acquired without the instrument having been properly calibrated.  This feature recalibrates the experimental mass data based on the peptide masses of the top-scoring database match.  To use this feature:

  1. Run PMF Search.
  2. Run PMF Summary and note the slope and intercept it determines.
  3. Rerun PMF Search with that slope and intercept typed into the search form.
  4. Examine the new results in PMF Search.
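As a rough illustration of what a slope and intercept correction does, the sketch below fits a straight line between observed and theoretical peptide masses by least squares and applies it to the observed data. The model (corrected = slope x observed + intercept) and the function names are assumptions; the text does not state how the workbench defines or computes these values.

```python
def fit_calibration(observed, theoretical):
    """Least-squares line mapping observed masses onto theoretical masses.

    Returns (slope, intercept) such that
    theoretical ~= slope * observed + intercept  (assumed model).
    """
    n = len(observed)
    if n < 2 or n != len(theoretical):
        raise ValueError("need at least two matched mass pairs")
    mean_x = sum(observed) / n
    mean_y = sum(theoretical) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    if sxx == 0:
        raise ValueError("observed masses must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, theoretical))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def recalibrate(masses, slope, intercept):
    """Apply the linear correction to a list of observed masses."""
    return [slope * m + intercept for m in masses]


# Matched peptides of the top-scoring hit: theoretical vs. miscalibrated data.
theoretical = [1000.0, 1500.0, 2000.0]
observed = [(t - 0.2) / 1.0001 for t in theoretical]

slope, intercept = fit_calibration(observed, theoretical)
corrected = recalibrate(observed, slope, intercept)
print(f"slope={slope:.6f} intercept={intercept:.4f}")
```

Because the fit uses the matched peptides of the top-scoring hit, a wrong top hit yields a wrong correction.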

Caveats:

In PMF Summary, the slope and intercept are determined for each sample.  However, when you type these into PMF Search, they apply to the entire sample set.  So, unless you plan to process one sample at a time, this feature corrects for instrument calibration problems, but not for sample-specific calibration issues (e.g., MALDI plate surface irregularities).

If the top-scoring match (used to calculate the slope and intercept) is wrong, then the new calibration is wrong. For a set of samples, it is worthwhile to examine a number of results to ensure that you select a valid slope and intercept to type into PMF Search.