Spectrum Mill MS/MS Search




Introduction

The MS/MS Search module in Spectrum Mill automates the search of processed MS/MS spectra against protein or DNA databases. The MS/MS Search algorithm uses intelligent parallelization to provide extremely fast searches. It can operate in identity mode to find unmodified peptides or in variable modifications or homology modes to look for mutations, post-translational modifications, and chemical modifications.

As you process data with Spectrum Mill, you may iterate through multiple rounds of database search and results validation, with the goal of identifying as many spectra as possible. Spectrum Mill provides a means to segregate search results that contain a valid interpretation of an MS/MS spectrum from those that do not. Spectra that do not have validated matches can then be subjected to subsequent rounds of searches (against larger databases or in variable mode, for example). Spectrum Mill retains a cumulative list of validated matches that you can summarize at any time in the process.


What is a Fragment-Ion Tag?

A Fragment-Ion Tag can be obtained from an MS/MS spectrum and consists of three attributes:

  1. A peptide precursor-ion mass: Pm.
  2. Masses of all sequence-related fragment ions from the peptide: Fi, Fj, ... The fragment ions need not all be of the same ion type. Supported fragment-ion types are described below.
  3. Masses of composition ions that indicate the presence of particular amino acids in the peptide: Ci, Cj, ... These can be immonium and related low-mass ions, or high-mass ions representing side-chain losses from the precursor ion.


Search Mode

Definitions

The following modes are available for MS/MS Search:

To search for an unknown or unexpected modification, select one of the homology modes described above, then click Unassigned single mass gap. This search looks for an unexpected modification (a mass gap) and is a form of "error tolerant database search." You cannot search variable modifications at the same time in this mode. This type of search works best on spectra that earlier searches left without validated hits.

Tips:

Searches run faster and generate fewer false positives when fewer modifications are considered.

Your system administrator can define new mutation/substitution matrices. Do not attempt to modify the existing homology modes.

Comparison of search modes

Included in the search (columns 2 through 5):

Search mode | Exact matches* | Variable modifications** | All single amino acid substitutions | Only single amino acid substitutions that would result from a point mutation
Identity | yes | no | no | no
Variable modifications | yes | yes | no | no
Homology - All mutations, Single AA substitution | yes | yes | yes | no
Homology - All mutations, Unassigned single mass gap | yes | no - only an unassigned modification | no | no
Homology - Single base pair mutations, Single AA substitution | yes | yes | no | yes
Homology - Single base pair mutations, Unassigned single mass gap | yes | no - only an unassigned modification | no | no

*Exact matches take into account fixed modifications, which are applied universally to their respective amino acids. Exact matches also take into account any mix modifications that are applied during the search cycle.

**You must select the variable modifications you wish to search (Choose... button)

Search mode translator across Spectrum Mill versions

Search mode and other settings
Identity
This mode is no longer supported.
Variable modifications
Select the appropriate k, m, q, s, t, y combinations as variable modifications.
Set Precursor mass shift range as shown in the following table:
Mode Precursor mass shift range
Homology Multi - mq -18 to 33
Homology Multi - sty 0 to 241
Homology Multi - mqsty -18 to 257
Homology Multi - mqst -18 to 257
Homology Multi - mqy -18 to 177
Homology Multi - kmqsty -18 to 257
Homology Multi - kmq -18 to 101
Homology – All mutations
Select k, m, q, s, t, and y as variable modifications.
Set Precursor mass shift to +/- 81
Homology – Single base pair mutations


Homology Mode

To match your data to a database sequence that is not identical to the peptide that generated the MS/MS spectrum, you must run MS/MS Search in homology mode. Homology mode can match peptides that differ from the database entry because of a mutation, cross-species substitution, sequence polymorphism, or error in the database. It is based on three concepts:

  1. Allow precursor mass to be shifted from the precursor mass of sequences in the database.
  2. Consider each ion independently rather than examining relationships between ions.
  3. When a precursor ion undergoes fragmentation, at least two pieces are formed (a fragment-ion and a neutral). While only the mass of the ionized fragment is measured, the mass of the neutral piece is easily calculated as precursor mass - fragment ion mass. If the peptide matches a database sequence exactly, then the masses of BOTH the fragment-ion and the neutral will match. If there is a single sequence difference, then the mass of either the fragment-ion or the neutral will match, but NOT BOTH. If there are two sequence differences, fragmentation at any sites located BETWEEN the two mismatched sites will result in NEITHER the fragment-ion nor the neutral matching.
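The third concept can be illustrated with a short Python sketch. It is not Spectrum Mill's implementation: the function name, the tolerance value, and the simplified mass bookkeeping (no adjustment for protons, water, or ion type) are assumptions made for the example.

```python
def classify_cleavage(precursor_mass, fragment_mass,
                      db_fragment_mass, db_neutral_mass, tol=0.5):
    """Report which pieces of one cleavage agree with a database sequence.

    All names and the default tolerance are hypothetical. Masses are
    treated as plain numbers; real ion-type offsets are ignored.
    """
    # Only the fragment ion is measured; the neutral piece is inferred.
    neutral_mass = precursor_mass - fragment_mass
    fragment_ok = abs(fragment_mass - db_fragment_mass) <= tol
    neutral_ok = abs(neutral_mass - db_neutral_mass) <= tol
    if fragment_ok and neutral_ok:
        return "both"      # consistent with an exact sequence match
    if fragment_ok or neutral_ok:
        return "one"       # consistent with a single sequence difference
    return "neither"       # cleavage lies between two differences


# Database peptide of mass 1000 that cleaves into pieces of 400 and 600.
print(classify_cleavage(1000.0, 400.0, 400.0, 600.0))  # exact match
print(classify_cleavage(1016.0, 400.0, 400.0, 600.0))  # +16 on the neutral side
print(classify_cleavage(1016.0, 408.0, 400.0, 600.0))  # +8 on each side
```

In the second call the fragment ion still matches while the inferred neutral is 16 units too heavy, which is the "one but not both" signature of a single sequence difference.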

Fragment-ion Tag and Sequence Mismatching

Matching sequences are filtered through a mutation matrix to try to find a single amino acid (AA) substitution that would transform the calculated mass of the database sequence to the experimentally determined mass. The output displays the necessary substitution and the corresponding sequence consistent with the experimental peptide mass data (not the sequence present in the database).


Precursor Mass Shift

MS/MS Search considers only database sequences whose calculated precursor masses pass a precursor mass filter. In Identity mode, the filter is the specified precursor mass +/- the precursor mass tolerance. In Homology mode, the filter is determined by the specified precursor mass and the precursor mass shift. Do NOT try to accommodate a mass shift by widening the precursor mass tolerance; use a precursor mass tolerance consistent with the accuracy to which the precursor mass is measured.

The default value of +/- 130 allows for the largest possible precursor mass shift associated with a mutation among the 20 standard amino acids and phosphorylation. All database sequences with a calculated precursor mass within +/- 130 Da of the specified precursor mass are then considered. This greatly increases the number of sequences considered, and hence the potential for false positives.

The +/= and -/= features let you specify an anticipated precursor mass shift and so reduce the number of sequences considered in a search. For example, if you expect a phosphorylated peptide, a precursor mass shift of +/= 80 allows matches to database sequences that exactly match the precursor mass and to database sequences that would match the specified precursor mass if 80 Da were added.
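The following Python sketch illustrates how such a filter could behave. It is a hedged illustration, not the MS/MS Search code: the function name and arguments are hypothetical, adding the tolerance to the +/- range is an assumption, and the -/= form is assumed to mirror the +/= form described above.

```python
def passes_precursor_filter(observed, calculated, tolerance,
                            shift=None, form=None):
    """Decide whether a database sequence passes the precursor mass filter.

    observed   -- specified (measured) precursor mass
    calculated -- precursor mass calculated from the database sequence
    tolerance  -- precursor mass tolerance, in the same units
    shift      -- precursor mass shift (ignored when form is None)
    form       -- None (identity), "+/-", "+/=", or "-/=" (hypothetical codes)
    """
    delta = observed - calculated
    if form is None:
        # Identity mode: only the tolerance window applies.
        return abs(delta) <= tolerance
    if form == "+/-":
        # Any shift up to the stated magnitude, in either direction.
        return abs(delta) <= shift + tolerance
    if form == "+/=":
        # Exact match, or a match once the shift is added to the sequence.
        return abs(delta) <= tolerance or abs(delta - shift) <= tolerance
    if form == "-/=":
        # Assumed mirror image: exact match, or match once shift is removed.
        return abs(delta) <= tolerance or abs(delta + shift) <= tolerance
    raise ValueError(f"unknown shift form: {form!r}")


# A phosphorylated peptide measured 80 Da above the database sequence.
print(passes_precursor_filter(1080.0, 1000.0, 0.5))              # identity
print(passes_precursor_filter(1080.0, 1000.0, 0.5, 80, "+/="))   # anticipated shift
print(passes_precursor_filter(1050.0, 1000.0, 0.5, 80, "+/="))   # wrong shift
print(passes_precursor_filter(1050.0, 1000.0, 0.5, 130, "+/-"))  # anywhere in range
```

The last two calls show why the +/= form considers far fewer sequences than the +/- form: a 50 Da difference is rejected when exactly 80 Da is anticipated but accepted when any shift up to 130 Da is allowed.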

The default precursor mass shift range for the Variable modifications search mode is -18 to 177. You can change this setting to encompass the number and type of modifications you expect for your sample.

To summarize, the shift can be set in four different forms, all of which show only homologous matches, thus excluding identity mode matches:

Note that the +/- form makes many more comparisons, so it takes longer to run; the run time is proportional to the magnitude of the Precursor mass shift.


Ranking / Scoring of Results

The explanations below are for a simple score calculation. With version B.04.00 and later, you can also select Discriminant Scoring. See Discriminant Scoring to learn more about the "figures of merit" that make up the discriminant score, and see Search Mode to learn about the discriminant scoring options you can choose from.

Scoring Scheme

Following peak detection, the MS/MS Search algorithm attempts to match every ion present in an MS/MS spectrum to an ion type consistent with fragmentation of a peptide sequence from a database. The scoring system is information-content oriented and based on the following general principles:

Kapp, E. A.; Schutz, F.; Reid, G. E.; Eddes, J. S.; Moritz, R. L.; O'Hair, R. A. J.; Speed, T. P.; Simpson, R. J.; "Mining a Tandem Mass Spectrometry Database To Determine the Trends and Global Factors Influencing Peptide Fragmentation;" Anal. Chem.; 2003; 75(22); 6251-6264. DOI: 10.1021/ac034616t

Do not use proton mobility scoring when peptides are modified in a way that significantly alters the way the peptide fragments.  You can use proton mobility scoring for 14N/15N, SILAC, and 16O/18O because these isotopic labels do not change the structure of the peptides and thus do not change the way these peptides fragment. However, the iTRAQ labels and the lys-imidazole labels do change the way a peptide fragments and thus proton mobility scoring is not advised for these. It is also not advised for guanidination or phosphorylation.
 

The scoring scheme is intended to facilitate review and filtering of large numbers of spectra (hundreds to thousands) so that valid interpretations can be segregated from false positives. Note that probability values associated with the number of proteins/peptides in the database are NOT used. Thus the score for a spectrum against a particular candidate sequence is always the same, because the information content of the spectrum is independent of the database.

MS/MS Search scoring has four particular attributes. Only the first two are used for post-search review/filtering purposes within the Protein/Peptide Summary portion of Spectrum Mill.

Guidelines for correlating peptide score with the correct interpretation and the extent of peptide fragmentation:

The correct cut-off score is dependent on the sample complexity and type.  These values should be used as guidelines only.

Q-TOF

Ion trap

Fragmentation and Sequence Information Content

The figure below illustrates some of the diversity of quality and information content in MS/MS spectra.

Spectral Diversity

If the scores for a single spectrum against multiple candidate sequences are identical, the results are sorted so that more likely sequences are listed higher, on the following basis:

  1. In Variable modifications mode and in Homology mode, matches to database sequences identical to their appearance in the database are listed higher.
  2. In Variable modifications mode and in Homology mode, matches with known modifications are listed higher.
  3. Sequences matching with the least number of unmatched ions are listed higher.
  4. Among equivalent matches, the results are sorted in order of increasing precursor mass.
  5. Among equivalent matches, the results are sorted in order of increasing index number.

Note that the last two sorts do NOT imply a BETTER ranking, even though one match will be listed higher than another, but are merely intended to provide some organization to the listing and to aid the user in viewing the results.
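The tie-breaking order above maps naturally onto a tuple sort key. The sketch below is illustrative only; the Match record and its field names are hypothetical and do not correspond to any Spectrum Mill file format.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One candidate interpretation of a spectrum (hypothetical record)."""
    sequence: str
    score: float
    identical_to_database: bool   # rule 1
    known_modification: bool      # rule 2
    unmatched_ions: int           # rule 3
    precursor_mass: float         # rule 4
    index_number: int             # rule 5


def rank_matches(matches):
    """Sort by descending score, then apply the five tie-breakers in order."""
    return sorted(
        matches,
        key=lambda m: (
            -m.score,                      # higher score first
            not m.identical_to_database,   # False sorts before True
            not m.known_modification,
            m.unmatched_ions,
            m.precursor_mass,
            m.index_number,
        ),
    )


candidates = [
    Match("AAA", 10.0, False, True, 2, 900.0, 5),
    Match("BBB", 10.0, True, False, 3, 950.0, 7),
    Match("CCC", 10.0, False, True, 1, 910.0, 9),
    Match("DDD", 12.0, False, False, 9, 999.0, 1),
]
print([m.sequence for m in rank_matches(candidates)])
```

Because Python compares tuples element by element, a later rule is consulted only when every earlier rule ties, which is the behavior the list describes.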


Discriminant Scoring

Discriminant scoring was introduced in Spectrum Mill version B.04.00. If you enable discriminant scoring on the MS/MS Search page, the discriminant score is calculated in addition to the score, and you can choose to use the discriminant score for Autovalidation.

With Spectrum Mill B.05.00, the default Discriminant mode is Off rather than Score only. Update any workflow parameter files to use the new settings.


The discriminant score combines several figures of merit into a single score, which discriminates between true positives and false positives more successfully than score alone. On average, the number of identifications at 1% and 5% false discovery rates increases by 10% when a discriminant score is used instead of score alone. That is, it improves the Global and Local Peptide FDR autovalidation because it includes factors other than the score.

To use discriminant scoring, you must either use an already validated set of coefficients for the figures of merit for Agilent instruments or create your own set of coefficients, which you can do in the Tool Belt. Creating coefficients requires a carefully validated data set that is typical of the data to be used for future searches. The data set also needs to be large enough to generate reasonable coefficients.


These are the figures of merit from database search scores included in the discriminant score:


Instrument / Fragment-Ion Types

Selecting an instrument triggers the configuration of MS/MS Search scoring and peak detection parameters designed particularly for the type and extent of peptide fragmentation observed on that instrument. The particular parameters can be edited or new instruments added by editing the files: msparams_mill/instrument.txt, and millhtml/SM_js/instrument.js

E:\SpectrumMill\msparams_mill\instrument.txt  
E:\SpectrumMill\millhtml\SM_js\instrument.js  

If you add a new instrument type, be sure to set the parameters in instrument.txt in a way that is appropriate for the data you export from that instrument. For example, if deisotoping is accomplished by the instrument data system, set bypassDeIsotoping = 1 in instrument.txt to avoid repeating deisotoping in Spectrum Mill.

Examples of supported instrument configurations are shown in the three tables below. Agilent ESI instruments are described in the first table, while other ESI instruments are described in the second. MALDI instruments are described in the third table. For additional supported configurations and the latest updated settings, see E:\SpectrumMill\msparams_mill\instrument.txt.

 

Table 1. Examples of supported configurations for Agilent instruments

Feature Description ESI-ION-TRAP-Agilent ESI-ION-TRAP-Agilent-ETD ESI-QTOF-Agilent
nh3_loss NH3 loss residues R, K, Q R, K, Q R, K, Q, N
h2o_loss H2O loss residues S, T, E, D S, T, E, D S, T, E, D
pos_charge charge-bearing residues R, H, K, N, Q R, H, K, N, Q R, H, K, N, Q
instrument charges certain fragment charges certain (allows ambiguity in charge) no no if determined
min_fragment_mass discards peaks below
impacts immonium ion detection capability
105 105 58
max_internal_ion_mass impacts search speed
if internal ions allowed
N/A N/A 750
minSignalNoiseRatio threshold for peak detection 8 0 8
minSignalNoiseRatioPMF threshold for peak detection in MS-only mode   5 15
peakLimitCount max # of detected peaks to use for interpretation 25 25 25
peakBinningTolerance used for centroiding in Data Extractor - expected peak width in amu N/A N/A 0.1
bypassDeIsotoping skip de-isotoping no no no
bypassSignalNoiseThresholding skip S/N thresholding no no yes
composition_bonus_scoring MALDI equivalent to proton mobility scoring, where bonuses are applied only to fragments on the N-terminal side of aspartic or glutamic acid and the C-terminal side of proline, scaled based on the relative intensity of the fragment. It does not give a bonus to any other amino acid. no no no
merge_num_peaks For similarity merging of MS/MS spectra, the number of peaks that match between the two spectra  must be greater than or equal to merge_num_peaks, which is a number between 0 and 50. The similarity merging takes the top 50 peaks from both spectra and compares them. All instruments that generate MS/MS data use the default merge_num_peaks = 25, but if you add an entry to instrument.txt, your entry overrides the default.  The format is merge_num_peaks, followed by a tab, followed by the value. 25 (default) 25 (default) 5
merge_SPI For similarity merging of MS/MS spectra, the percentage of the total intensity of the top 50 spectral peaks that is matched from spectrum A to spectrum B and from spectrum B to spectrum A must be greater than or equal to merge_SPI, which is a number between 0 and 100. All instruments that generate MS/MS data use the defaults of merge_SPI = 70, but if you add an entry to instrument.txt, your entry overrides the defaults.  The format is merge_SPI, followed by a tab, followed by the value. 70 (default) 70 (default) 50
minValidMSMSScore Scores lower than this setting are ignored during search. Lower values allow smaller peptides to be kept as possible hits, at the risk of adding more false hits. Note that this setting also affects reverse scores. 3 (default) 0 3 (default)
minMSMSScoreForOutputFile If the score is lower than this setting, the spo file is not generated. This helps limit “file clutter”. For small peptides, use a smaller number. 3 0 3
Ion type Restrictions ESI-ION-TRAP-Agilent
Score
ESI-ION-TRAP-Agilent-ETD ESI-QTOF-Agilent
Score
a none 0.25 N/A 0.50
b, y none 1.00 N/A, 0.25 0.5, 1.5
a-NH3 contains NH3 loss residue N/A N/A N/A
b-NH3, y-NH3 contains NH3 loss residue 0.50 N/A 0.25, 0.5
b-H2O, y-H2O contains H2O loss residue 0.50 N/A 0.25, 0.5
b+H2O ion contains charge bearing residue
only bn-1, bn-2 ( length n)
1.00 N/A 1.00
d(H) AA is A,C,D,E,K,M,N,R,Q, or S N/A N/A N/A
d(CH3) AA is I,T,or V N/A N/A N/A
w(H) AA is A,C,D,E,K,M,N,R,Q, or S N/A N/A N/A
w(CH3) AA is I,T,or V N/A N/A N/A
b++, b+++, y++, y+++ fragment charges not certain
precursor charge > 2 (++), > 3 (+++)
ion contains sufficient charge bearing residues
1.00 N/A, N/A, 0.25, 0.25 0.5, 0.5, 1.5, 1.5
b++-H2O, y++-H2O fragment charges not certain
precursor charge > 2 (++)
ion contains > 1 charge bearing residue
contains H2O loss residue
corresponding b++, y++ present
0.50 N/A 0.25, 0.5
a-H3PO4 ion contains phosphorylated S, T, Y
automatically turned on in homology mode
following detection of M-H3PO4
N/A N/A N/A
b-H3PO4, y-H3PO4 ion contains phosphorylated S, T, Y
automatically turned on in homology mode
following detection of M-H3PO4
0.25 N/A 0.50
b-SOCH4, y-SOCH4 ion contains oxidized M
automatically turned on in homology mode
following detection of M-SOCH4
0.25 N/A N/A
internal b < max_internal_ion_mass N/A N/A 0.75
internal a < max_internal_ion_mass, internal b present N/A N/A 0.50
internal b-H2O < max_internal_ion_mass, internal b present
ion contains H2O loss residue
N/A N/A 0.50
internal b-NH3 < max_internal_ion_mass, ion contains R N/A N/A 0.50
N-term ladder removal of N-term residues (y equiv.) N/A N/A N/A
C-term ladder removal of C-term residues (b+H2O equiv.) N/A N/A N/A
c cannot cleave at proline N/A 1.00 N/A
c++, c+++ cannot cleave at proline N/A 1.00 N/A
z· cannot cleave at proline N/A 1.00 N/A
z·++, z·+++ cannot cleave at proline N/A 1.00 N/A
c·, c·++, c·+++ cannot cleave at proline N/A 0.25 N/A
z··, z··++, z··+++ cannot cleave at proline N/A 0.25 N/A

*N/A = not applicable
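The merge_num_peaks and merge_SPI rows describe a two-part similarity test for merging MS/MS spectra. The Python sketch below shows one way such a test could be written. It is not Spectrum Mill's code: the function names, the peak-matching tolerance, and the choice to require the peak count in both directions are assumptions made for illustration.

```python
def top_peaks(spectrum, n=50):
    """Return the n most intense (m/z, intensity) peaks of a spectrum."""
    return sorted(spectrum, key=lambda peak: peak[1], reverse=True)[:n]


def matched_stats(peaks_a, peaks_b, tol):
    """Count peaks of A found in B and the percent of A's intensity they carry."""
    total = sum(intensity for _, intensity in peaks_a)
    matched = [
        (mz, intensity)
        for mz, intensity in peaks_a
        if any(abs(mz - mz_b) <= tol for mz_b, _ in peaks_b)
    ]
    matched_intensity = sum(intensity for _, intensity in matched)
    percent = 100.0 * matched_intensity / total if total else 0.0
    return len(matched), percent


def should_merge(spec_a, spec_b, merge_num_peaks=25, merge_spi=70.0, tol=0.5):
    """Apply both thresholds, A against B and B against A (tol is assumed)."""
    a, b = top_peaks(spec_a), top_peaks(spec_b)
    count_ab, percent_ab = matched_stats(a, b, tol)
    count_ba, percent_ba = matched_stats(b, a, tol)
    enough_peaks = min(count_ab, count_ba) >= merge_num_peaks
    enough_intensity = percent_ab >= merge_spi and percent_ba >= merge_spi
    return enough_peaks and enough_intensity


# Thirty equal peaks, compared with copies shifted by 5 and by 10 m/z units.
base = [(100.0 + i, 10.0) for i in range(30)]
shift5 = [(105.0 + i, 10.0) for i in range(30)]
shift10 = [(110.0 + i, 10.0) for i in range(30)]
print(should_merge(base, shift5))   # 25 shared peaks, about 83% of intensity
print(should_merge(base, shift10))  # only 20 shared peaks
```

With the defaults (25 peaks, 70%), the first pair would merge and the second would not; lowering merge_num_peaks, as the ESI-QTOF-Agilent column does, makes merging more permissive.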

 

Table 2. Examples of ESI configurations

Feature Description ESI-ION-TRAP ESI-LINEAR-ION-TRAP ESI-QTRAP ESI-QSTAR ESI-QTOF
nh3_loss NH3 loss residues R, K, Q R, K, Q R, K, Q, N R, K, Q, N R, K, Q, N
h2o_loss H2O loss residues S, T, E, D S, T, E, D S, T, E, D S, T, E, D S, T, E, D
pos_charge charge-bearing residues R, H, K, N, Q R, H, K, N, Q R, H, K, N R, H, K, N, Q R, H, K, N, Q
instrument charges certain fragment charges certain (allows ambiguity in charge) no no no if determined if determined
min_fragment_mass discards peaks below
impacts immonium ion detection capability
105 105 105 105 105
max_internal_ion_mass impacts search speed
if internal ions allowed
N/A N/A 750 750 750
localSignalNoiseRatio Signal-to-noise is calculated in local windows 100 m/z wide above the precursor m/z, and 70 m/z wide below. The window width is increased in integer multiples if there are less than 30 data points in the window above the precursor, or less than 20 data points in the window below the precursor. no yes no no no
minSignalNoiseRatio threshold for peak detection 8 8 8 8 8
minSignalNoiseRatioPMF threshold for peak detection in MS-only mode       15  
peakLimitCount max # of detected peaks to use for interpretation 25 25 25 25 25
peakBinningTolerance used for centroiding in Data Extractor - expected peak width in amu N/A N/A 0.95 0.3 N/A
bypassDeIsotoping skip de-isotoping no no no no no
bypassSignalNoiseThresholding skip S/N thresholding no no no no no
composition_bonus_scoring MALDI equivalent to proton mobility scoring, where bonuses are applied only to fragments on the N-terminal side of aspartic or glutamic acid and the C-terminal side of proline, scaled based on the relative intensity of the fragment. It does not give a bonus to any other amino acid. no no no no no
merge_num_peaks For similarity merging of MS/MS spectra, the number of peaks that match between the two spectra  must be greater than or equal to merge_num_peaks, which is a number between 0 and 50. The similarity merging takes the top 50 peaks from both spectra and compares them. All instruments that generate MS/MS data use the default merge_num_peaks = 25, but if you add an entry to instrument.txt, your entry overrides the default.  The format is merge_num_peaks, followed by a tab, followed by the value. 25 (default) 25 (default) 25 (default) 25 (default) 25 (default)
merge_SPI For similarity merging of MS/MS spectra, the percentage of the total intensity of the top 50 spectral peaks that is matched from spectrum A to spectrum B and from spectrum B to spectrum A must be greater than or equal to merge_SPI, which is a number between 0 and 100. All instruments that generate MS/MS data use the defaults of merge_SPI = 70, but if you add an entry to instrument.txt, your entry overrides the defaults.  The format is merge_SPI, followed by a tab, followed by the value. 70 (default) 70 (default) 70 (default) 70 (default) 70 (default)
Ion type Restrictions ESI-ION-TRAP
Score
ESI-LINEAR-ION-TRAP
Score
ESI-QTRAP
Score
ESI-QSTAR
Score
ESI-QTOF
Score
a none 0.25 0.25 0.25 0.50 0.50
b, y none 1.00 1.00 1.00 1.00 1.00
a-NH3 contains NH3 loss residue N/A* N/A N/A N/A N/A
b-NH3, y-NH3 contains NH3 loss residue 0.50 0.50 0.25 0.25 0.25
b-H2O, y-H2O contains H2O loss residue 0.50 0.50 0.25 0.25 0.25
b+H2O ion contains charge bearing residue
only bn-1, bn-2 ( length n)
1.00 1.00 1.00 1.00 1.00
d(H) AA is A,C,D,E,K,M,N,R,Q, or S N/A N/A N/A N/A N/A
d(CH3) AA is I,T,or V N/A N/A N/A N/A N/A
w(H) AA is A,C,D,E,K,M,N,R,Q, or S N/A N/A N/A N/A N/A
w(CH3) AA is I,T,or V N/A N/A N/A N/A N/A
b++, b+++, y++, y+++ fragment charges not certain
precursor charge > 2 (++), > 3 (+++)
ion contains sufficient charge bearing residues
1.00 1.00 1.00 1.00 1.00
b++-H2O, y++-H2O fragment charges not certain
precursor charge > 2 (++)
ion contains > 1 charge bearing residue
contains H2O loss residue
corresponding b++, y++ present
0.50 0.50 0.25 0.25 0.25
a-H3PO4 ion contains phosphorylated S, T, Y
automatically turned on in homology mode
following detection of M-H3PO4
N/A N/A N/A N/A N/A
b-H3PO4, y-H3PO4 ion contains phosphorylated S, T, Y
automatically turned on in homology mode
following detection of M-H3PO4
0.25 0.25 0.25 0.50 0.50
b-SOCH4, y-SOCH4 ion contains oxidized M
automatically turned on in homology mode
following detection of M-SOCH4
0.25 0.25 0.25 0.25 N/A
internal b < max_internal_ion_mass N/A N/A 0.75 0.75 0.75
internal a < max_internal_ion_mass, internal b present N/A N/A 0.25 0.50 0.50
internal b-H2O < max_internal_ion_mass, internal b present
ion contains H2O loss residue
N/A N/A N/A 0.50 0.50
internal b-NH3 < max_internal_ion_mass, ion contains R N/A N/A N/A 0.50 0.50
N-term ladder removal of N-term residues (y equiv.) N/A N/A N/A N/A N/A
C-term ladder removal of C-term residues (b+H2O equiv.) N/A N/A N/A N/A N/A

*N/A = not applicable

 

Table 3. Examples of MALDI configurations

Feature Description MALDI-ION-TRAP MALDI-TOF-TOF MALDI-TOF-TOF-DB MALDI-QTOF MALDI-QSTAR
nh3_loss NH3 loss residues R, K, Q R, K, Q R, K, Q R, K, Q R, K, Q
h2o_loss H2O loss residues S, T S, T S, T S, T S, T
pos_charge charge-bearing residues R, H, K R, H, K R, H, K R, H, K R, H, K
instrument charges certain fragment charges certain (allows ambiguity in charge) if determined if determined yes if determined if determined
min_fragment_mass discards peaks below
impacts immonium ion detection capability
105 58 58 58 58
max_internal_ion_mass impacts search speed
if internal ions allowed
750 750 750 750 750
minSignalNoiseRatio threshold for peak detection 5 20 20 8 8
minSignalNoiseRatioPMF threshold for peak detection in MS-only mode 15       15
peakLimitCount max # of detected peaks to use for interpretation 25 25 25 25 25
peakBinningTolerance used for centroiding in Data Extractor - expected peak width in amu N/A N/A N/A N/A 0.6
bypassDeIsotoping skip de-isotoping no yes yes no no
bypassSignalNoiseThresholding skip S/N thresholding no yes yes no no
composition_bonus_scoring MALDI equivalent to proton mobility scoring, where bonuses are applied only to fragments on the N-terminal side of aspartic or glutamic acid and the C-terminal side of proline, scaled based on the relative intensity of the fragment. It does not give a bonus to any other amino acid. yes yes yes yes yes
Ion type Restrictions MALDI-ION-TRAP
Score
MALDI-TOF-TOF
Score
MALDI-TOF-TOF-DB
Score
MALDI-QTOF
Score
MALDI-QSTAR
Score
a none 0.50 0.50 0.50 0.50 0.50
b, y none 1.00 1.00 1.00 1.00 1.00
a-NH3 contains NH3 loss residue N/A N/A N/A N/A N/A
b-NH3, y-NH3 contains NH3 loss residue 0.50 0.50 0.50 0.50 0.50
b-H2O, y-H2O contains H2O loss residue 0.50 0.50 0.50 0.50 0.50
b+H2O ion contains charge bearing residue
only bn-1, bn-2 ( length n)
1.00 1.00 1.00 1.00 1.00
d(H) AA is A,C,D,E,K,M,N,R,Q, or S N/A 0.25 0.25 N/A N/A
d(CH3) AA is I,T,or V N/A 0.50 0.50 N/A N/A
w(H) AA is A,C,D,E,K,M,N,R,Q, or S N/A 0.25 0.25 N/A N/A
w(CH3) AA is I,T,or V N/A 0.50 0.50 N/A N/A
b++, b+++, y++, y+++ fragment charges not certain
precursor charge > 2 (++), > 3 (+++)
ion contains sufficient charge bearing residues
N/A N/A N/A N/A N/A
b++-H2O, y++-H2O fragment charges not certain
precursor charge > 2 (++)
ion contains > 1 charge bearing residue
contains H2O loss residue
corresponding b++, y++ present
N/A N/A N/A N/A N/A
a-H3PO4 ion contains phosphorylated S, T, Y
automatically turned on in homology mode
following detection of M-H3PO4
N/A N/A N/A N/A N/A
b-H3PO4, y-H3PO4 ion contains phosphorylated S, T, Y
automatically turned on in homology mode
following detection of M-H3PO4
0.50 0.50 0.50 0.50 0.50
b-SOCH4, y-SOCH4 ion contains oxidized M
automatically turned on in homology mode
following detection of M-SOCH4
N/A N/A N/A N/A N/A
internal b < max_internal_ion_mass 0.75 0.75 0.75 0.75 0.75
internal a < max_internal_ion_mass, internal b present 0.50 0.50 0.50 0.50 0.50
internal b-H2O < max_internal_ion_mass, internal b present
ion contains H2O loss residue
0.50 0.50 0.50 0.50 0.50
internal b-NH3 < max_internal_ion_mass, ion contains R 0.50 0.50 0.50 0.50 0.50
N-term ladder removal of N-term residues (y equiv.) N/A N/A N/A N/A N/A
C-term ladder removal of C-term residues (b+H2O equiv.) N/A N/A N/A N/A N/A

*N/A = not applicable


Selecting Thermo Fisher Scientific Instruments

If you have a Thermo Fisher Scientific Orbitrap or LTQ FT, select your instrument based on where the MS/MS occurs.

Fragmentation modes and location

As of version B.04.00, a list of fragmentation modes is available. The list does away with the need for the "MIX" Instrument types available in previous versions. Select an Instrument, then a fragmentation mode.

Multiply-Charged Ions

When data is of sufficient resolution that the charge state can be determined from the isotope distribution, and the MS/MS Search instrument configuration is designated "fragment charges certain," masses are converted to charge 1 inside MS/MS Search prior to interpretation. However, the charge state is still used to check that matching sequences contain enough basic residues to support the charge. Further, the labels in the output distinguish whether the ion type used inside MS/MS Search was the high-resolution variety converted to charge 1 (y+2) or the ambiguous low-resolution variety (y++).
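Converting a multiply-charged peak to its charge 1 equivalent is standard arithmetic. The sketch below shows the calculation; the function name is hypothetical, and Spectrum Mill's internal routine may differ in details such as the proton mass constant it uses.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def to_singly_charged(mz, charge):
    """Convert an observed m/z at a known charge to the charge 1 m/z.

    (M + zH)/z is observed; the charge 1 value is M + H, which equals
    mz * z minus the (z - 1) extra protons.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - (charge - 1) * PROTON_MASS


# A doubly charged fragment observed at m/z 500.0
print(round(to_singly_charged(500.0, 2), 6))
```

A peak that is already singly charged is returned unchanged, so the same function can be applied to every peak in a spectrum whose fragment charges are certain.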


Immonium Ions / Compositional Marker Ions

Marker ions represent peaks that indicate amino acid composition, but do not indicate sequence. The table below describes the allowed amino acid composition marker ions. In general, the scores correspond to the rarity of the amino acids as described by the number of codons coding for the amino acids that can produce the ion.

Mass | Composition | Score | Additional Feature / Constraint
60 | S | 2/6 |
70 | PR | 2/10 |
72 | V | 2/4 |
73 | R | 2/6 |
86 | IL | 2/9 |
88 | D | 1 |
101 | KQ | 2/4 |
102 | E | 1 |
110 | H | 1 |
112 | R | 2/6 |
120 | F | 1 |
129 | KRQ | 2/10 |
136 | Y | 1 |
159 | W | 2 |
(M+zH-H3PO4)+z | sty | 2 | variable mode required with those modifications selected; automatically turns on ion types b-H3PO4, y-H3PO4
(M+zH-284.2)+z, (M+zH-403.3)+z, (M+zH-477.3)+z | C | 2 | ICAT-D0
(M+zH-288.2)+z, (M+zH-411.3)+z, (M+zH-485.3)+z | C | 2 | ICAT-D8
(M+zH-270.2)+z, (M+zH-375.1)+z, (M+zH-449.3)+z | C | 2 | Acetyl-PEO-Biotin
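The low-mass marker ion scores in the table are consistent with a simple rule: 2 divided by the total number of codons for the amino acids that can produce the ion. This rule is an inference from the tabulated values, not a documented formula, and the function name below is hypothetical. The codon counts are those of the standard genetic code.

```python
from fractions import Fraction

# Number of codons per amino acid in the standard genetic code
# (only the residues that appear in the marker ion table).
CODON_COUNT = {
    "S": 6, "P": 4, "R": 6, "V": 4, "I": 3, "L": 6, "D": 2,
    "K": 2, "Q": 2, "E": 2, "H": 2, "F": 2, "Y": 2, "W": 1,
}


def marker_score(composition):
    """Score a marker ion as 2 / (total codons of its possible residues).

    Inferred from the table; the actual scoring rule may differ.
    """
    total_codons = sum(CODON_COUNT[residue] for residue in composition)
    return Fraction(2, total_codons)


for mass, composition in [(60, "S"), (70, "PR"), (86, "IL"), (88, "D"), (159, "W")]:
    print(mass, composition, marker_score(composition))
```

Fraction reduces its results, so 2/10 prints as 1/5 and 2/2 prints as 1; the values are the same as those in the table. A rare residue such as W (one codon) earns the highest score, while an ion that several common residues can produce earns less.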

Note that the file msparams_mill\smconfig.xml defines additional marker ions and their scoring for a large number of amino acid modifications. The scoring is invoked when the fixed and variable modifications are selected for the search. System administrators can add custom modifications, along with their marker ions.

For scoring purposes, a yes/no distinction cannot be made between marker ions and peaks that are isobaric with marker ions. Spectrum Mill therefore shrinks the intensities of marker ions to 10% of their original intensities. This allows them to be matched when they are isobars, without their intensities causing hit rejection when they are marker ions.


Minimum Scored Peak Intensity

Before scoring, MS/MS Search screens the MS/MS spectrum against candidate sequences using a simple filter, the Minimum scored peak intensity. This approach enhances search speed by allowing candidate sequences to be rejected rapidly once enough spectral peaks have been examined and found not to meet the threshold established by this filter.

For ultimate coverage in MS/MS Search, lower the Minimum scored peak intensity. When there are one or more very intense peaks that overwhelm other peaks but cannot be assigned, setting this value to near 0% may improve the number of hits at the expense of longer search times.

Guidelines: Because the matching that occurs before scoring depends on this filter, set the value according to the quality of peak detection you expect, that is, noise removal and selection of the 12C isotope peaks representing fragment ions that correspond to the selected Allowed Fragment-Ion types in the spectrum prior to searching. This parameter has a very significant impact on search speed; the more unmatched peak intensity allowed (lower percentage), the longer the search time. Composition ions are counted as unmatched intensity, but only at 1/10 their actual peak height.
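The early-rejection idea behind this filter can be sketched in Python as follows. This is an assumption-laden illustration rather than the actual algorithm: the function name, the callback arguments, the most-intense-first order, and the interpretation of the threshold as a percentage of total (rescaled) intensity are all hypothetical.

```python
def passes_min_scored_intensity(peaks, is_matched, is_marker, min_percent):
    """Reject a candidate as soon as too much intensity is unmatched.

    peaks       -- list of (m/z, intensity) pairs
    is_matched  -- callable: does the candidate sequence explain this m/z?
    is_marker   -- callable: is this m/z a composition (marker) ion?
    min_percent -- Minimum scored peak intensity, 0 to 100
    """
    # Marker ions count at one tenth of their actual peak height.
    weighted = [
        (mz, intensity * 0.1 if is_marker(mz) else intensity)
        for mz, intensity in peaks
    ]
    total = sum(weight for _, weight in weighted)
    allowed_unmatched = total * (100.0 - min_percent) / 100.0

    unmatched = 0.0
    # Examine the most intense peaks first so that rejection happens early.
    for mz, weight in sorted(weighted, key=lambda p: p[1], reverse=True):
        if not is_matched(mz):
            unmatched += weight
            if unmatched > allowed_unmatched:
                return False
    return True


peaks = [(86.1, 100.0), (200.0, 50.0), (300.0, 50.0)]
explained = {200.0, 300.0}
print(passes_min_scored_intensity(
    peaks, lambda mz: mz in explained, lambda mz: mz < 160.0, 70.0))
print(passes_min_scored_intensity(
    peaks, lambda mz: mz in explained, lambda mz: False, 70.0))
```

In the first call the intense peak at 86.1 is treated as a marker ion and counts at one tenth of its height, so the candidate survives; in the second call the same peak counts in full and the candidate is rejected after a single peak is examined.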


Mass Tolerances

The tolerances on both the precursor ion and fragment ions should be set to be consistent with the mass accuracy of the instrument used to generate the data. For spectra from time-of-flight instruments, it is generally better to use units of ppm or % rather than Da, because the absolute mass error (in Da) is often smaller at lower mass than at higher mass.
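The relationship between relative and absolute tolerances is simple arithmetic, shown in the sketch below. The function name and the unit strings are hypothetical and are not settings read by Spectrum Mill.

```python
def tolerance_in_da(mass, value, unit):
    """Convert a tolerance to an absolute window in Da at a given mass."""
    if unit == "Da":
        return value
    if unit == "ppm":
        return mass * value / 1_000_000
    if unit == "%":
        return mass * value / 100
    raise ValueError(f"unsupported unit: {unit!r}")


# A 20 ppm tolerance widens in absolute terms as the mass increases.
for mass in (500.0, 1000.0, 3000.0):
    print(mass, tolerance_in_da(mass, 20, "ppm"))
```

A fixed Da tolerance that suits a 3000 Da ion would be six times wider than necessary for a 500 Da ion, which is why a relative unit suits instruments whose absolute error grows with mass.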


Batch Size

When you run MS/MS Search, the batch size determines the maximum number of spectra analyzed in one pass through the database. Since all spectra of similar charge states are grouped together before splitting into batches, the last batch for each charge state will likely contain fewer spectra than the maximum batch size.
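The grouping described above can be pictured with the following Python sketch. It is a simplified illustration with hypothetical names; it does not reproduce the dynamic batch-size reduction or any other internal behavior of the software.

```python
from collections import defaultdict


def make_batches(spectra, batch_size):
    """Group spectra by charge state, then split each group into batches.

    spectra -- iterable of (spectrum_id, charge) pairs
    Returns a list of (charge, [spectrum_id, ...]) batches.
    """
    by_charge = defaultdict(list)
    for spectrum_id, charge in spectra:
        by_charge[charge].append(spectrum_id)

    batches = []
    for charge in sorted(by_charge):
        ids = by_charge[charge]
        for start in range(0, len(ids), batch_size):
            batches.append((charge, ids[start:start + batch_size]))
    return batches


# Five spectra at charge 2 and three at charge 3, batch size 2.
spectra = [(i, 2) for i in range(5)] + [(i, 3) for i in range(5, 8)]
for charge, ids in make_batches(spectra, 2):
    print(charge, ids)
```

Each charge state ends with a partial batch here, which is the effect the paragraph above describes.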

For maximum search speed, the optimum batch size depends on the size of the database, the type of search (identity, variable modifications, or homology), the number of modifications, and the mass accuracy of the instrument. If the batch size is too large for the complexity of the search, the search may time out and fail to complete. For a complex search, there is no advantage to using a larger batch size because the majority of the search time results from the database matching rather than setting up the batches. The following table provides guidelines for the batch size you should enter into the MS/MS Search form.

With B.04.00 and later, you can specify larger batch sizes without the risk of timeouts. If Maximize CPUs is marked, performance is best with a batch size of 150 or more. The default batch size is now 500. If you have less than 16 GB of memory and are searching large data sets, use a batch size less than 500.

If you have a Thermo Fisher Scientific Orbitrap or LTQ  FT, or another  instrument that produces spectra with high mass accuracy, follow the guidelines for Agilent Q-TOF. If you have an ion trap or other  instrument that produces spectra with lower mass accuracy, follow the guidelines for Agilent ion trap.

During searches, the Spectrum Mill software dynamically reduces the batch size as the m/z increases. For variable modifications searches and homology searches, the number of possible combinations rises dramatically with increasing m/z; by dynamically reducing the batch size, the software reduces memory usage.


Reversed Database Search

A reversed database search helps to rule out false positives and allows the software to calculate a false discovery rate. If you obtain similar scores for both forward and reversed searches, there is a higher likelihood of an incorrect assignment.

For a reversed database search, the Spectrum Mill software reverses only the internal portion of the peptide sequences in the database rather than reversing the complete database itself. For example, the peptide:

SAMPLER

is reversed to

SELPMAR

rather than

RELPMAS.

All of these internally-reversed sequences from the database are compared to the MS/MS spectrum and the one that returns the highest score is saved as the reversed database hit. The reversed database hit is not always the reverse of the peptide that matched in the forward search, because a different reversed hit may score higher. That is, all of the possible reversed hits are considered as potential matches for the experimental spectrum.
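The internal reversal shown in the SAMPLER example can be sketched in Python as follows. The sketch is inferred from that single example (first and last residues kept in place, everything between them reversed); the function name is hypothetical, and the treatment of very short peptides is an assumption.

```python
def reverse_internal(peptide):
    """Reverse the internal residues, keeping both terminal residues fixed."""
    if len(peptide) <= 2:
        # Nothing between the termini to reverse (assumed behavior).
        return peptide
    return peptide[0] + peptide[-2:0:-1] + peptide[-1]


print(reverse_internal("SAMPLER"))  # internal reversal
print("SAMPLER"[::-1])              # complete reversal, for comparison
```

Keeping the terminal residues in place means the reversed peptide has the same precursor mass and the same terminal residues as the forward peptide, while its internal fragment ions differ.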

For spectra with high mass accuracy data, such as Agilent Q-TOF, many sequences will not have a reversed hit.


To Use the MS/MS Search Form

The following topics describe options available on the MS/MS Search form.  In general, you should retain the default settings, except for the options highlighted in red text on the form.

See the rest of this document for more details regarding MS/MS Search.

Search

Data Directories

Search Parameters

Modifications

Search Criteria

The next several topics describe options available in the Search Criteria section of the MS/MS Search form.

Matching Tolerances

Spectral Quality Filtering

Certain spectral features are calculated by the SM Data Extractor and can be used with multiple downstream SM modules to craft a smaller subset of high-value spectra. For more details, see Spectral Quality Filtering.

Search Mode

The latter three options apply only in certain homology modes.

Data Files