Sherenga de novo Sequencing
Introduction
Sherenga is an algorithm for de novo interpretation of MS/MS spectra. Sherenga employs graph theory,
a branch of mathematics which describes how networks can be represented and their properties measured. The
program runs in two stages. In the first stage, the program creates a directed graph. Vertices of the graph correspond
to peaks of the MS/MS spectrum, and edges of the graph correspond to mass differences between peaks. In the second
stage, the program searches for a high-scoring path in the graph. This path corresponds to an amino acid sequence
of a peptide.
For more information, refer to:
Dancik, V.; Clauser, K. R.; Addona, T. A.; Vath, J. E.; Pevzner, P. A. "De novo Peptide Sequencing via Tandem Mass Spectrometry." J. Comp. Biol. 1999, 6, 327-342.
PMID: 10582570
DOI: 10.1089/106652799318300
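The two stages described above can be illustrated with a minimal sketch. This is not Sherenga's implementation: the residue table is a small subset, the 0.02 Da tolerance is an assumption, the function names `build_graph` and `best_path` are hypothetical, and the path "score" is simply the number of edges rather than the learned, instrument-dependent scoring that Sherenga uses.

```python
# Monoisotopic residue masses (amino acid minus H2O) for a few residues only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def build_graph(peaks, tol=0.02):
    """Stage 1: vertices are peaks; an edge i -> j is added when the mass
    difference between peaks j and i matches a residue mass within tol (Da)."""
    peaks = sorted(peaks)
    edges = {i: [] for i in range(len(peaks))}
    for i, low in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            diff = peaks[j] - low
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))
    return peaks, edges


def best_path(peaks, edges):
    """Stage 2: find the longest path by dynamic programming.
    Edges always point from lower to higher mass, so the graph is acyclic
    and vertices can be processed from highest mass to lowest."""
    best = [""] * len(peaks)  # best[i] = longest residue string starting at i
    for i in range(len(peaks) - 1, -1, -1):
        for j, aa in edges[i]:
            candidate = aa + best[j]
            if len(candidate) > len(best[i]):
                best[i] = candidate
    return max(best, key=len) if best else ""


# Made-up ladder of peaks spaced by G, A, S, plus one noise peak at 200.5.
peaks, edges = build_graph([100.0, 157.02146, 200.5, 228.05857, 315.0906])
print(best_path(peaks, edges))
```

The noise peak has no edges because none of its mass differences to the other peaks match a residue mass, so it does not appear on the reported path.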
Learned Parameters
The scoring of the paths is instrument-dependent. These dependencies are captured by learned parameters. These
learned parameters are provided in *.sherc files for ESI ion trap and Q-TOF instruments.
Pre-Processing
Prior to interpreting, MS/MS spectra are pre-processed ("combed"). During combing, isotope peaks are merged
and, if instrument precision allows, fragment charge is determined. Precursor-neutral loss regions are also stripped.
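As an illustration of why charge determination depends on instrument precision, the sketch below infers a fragment's charge from the spacing between two adjacent isotope peaks, which is roughly 1.00335/z in m/z units. It is a simplified assumption of how such a step could work, not the combing procedure itself; the function name, the 0.01 tolerance, and the maximum charge of 4 are hypothetical.

```python
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference, in Da


def charge_from_spacing(mz_a, mz_b, max_charge=4, tol=0.01):
    """Infer charge z from the m/z gap between two adjacent isotope peaks.
    Returns None when no charge from 1 to max_charge fits within tol."""
    gap = abs(mz_b - mz_a)
    for z in range(1, max_charge + 1):
        if abs(gap - ISOTOPE_SPACING / z) <= tol:
            return z
    return None


print(charge_from_spacing(500.000, 501.003))  # spacing near 1.0
print(charge_from_spacing(500.000, 500.502))  # spacing near 0.5
```

If the instrument cannot resolve peaks about 0.5 m/z apart, the doubly charged case cannot be told apart from noise, which is why the charge is determined only when precision allows.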
To Use the Sherenga de novo Sequencing Form
The following options are available on the Sherenga de novo Sequencing form. In general,
you should retain the default settings, except for the options highlighted in red text on the form.
Sherenga de novo Sequencing
- Sequence - Click to initiate de novo sequencing. Click this button after you
have either loaded the desired parameter file or manually set the parameters. The name of the current
parameter file appears in red at the top of the form. Once you have saved a parameter file from
this form, you may start the de novo sequencing from a workflow
rather than manually with the Sequence button.
- Save As - Click to save current de novo sequencing settings in a parameter file.
- Load - Click to load a parameter file that contains settings for de novo sequencing.
For default values, select a parameter file from the Defaults
folder.
- Remove all prior Sherenga results - Mark this check box to remove prior de novo results
for this data set.
- Maximum reported hits: Set to the number of de novo interpretations you want for each
spectrum.
- Validation filter - Use this to interpret only those spectra that have a particular validation setting.
See Peptide Validation.
Show Equivalent Masses
- I/L: Choose whether to display mass differences of 113 as isoleucine or leucine. Because
isoleucine and leucine are isomeric (identical mass), this setting only determines which amino acid is displayed in the results.
- K/Q: Choose whether to display mass differences of 128 as lysine, glutamine, or both.
Lysine and glutamine have the same nominal mass and differ by only about 0.036, so this setting determines
which amino acid is displayed in the results. Select Both if your instrument has sufficient mass accuracy to discern lysine
(monoisotopic mass 128.09497) versus glutamine (monoisotopic mass 128.05858).
These are the masses of the amino acids as incorporated into the polypeptide chain (amino acids minus H2O).
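To make the K/Q distinction concrete, the sketch below labels a mass difference near 128 using the two residue masses quoted above. The function name `label_128` and the tolerance values are hypothetical; this only illustrates how mass accuracy decides whether the two residues can be told apart, not how Sherenga assigns them.

```python
LYS = 128.09497  # lysine residue mass, as quoted above
GLN = 128.05858  # glutamine residue mass, as quoted above


def label_128(mass_diff, tol):
    """Label a mass difference as 'K', 'Q', or 'K/Q' when the tolerance
    (same units as the masses) is too wide to separate the two.
    Returns None if the difference matches neither residue."""
    fits_k = abs(mass_diff - LYS) <= tol
    fits_q = abs(mass_diff - GLN) <= tol
    if fits_k and fits_q:
        return "K/Q"
    if fits_k:
        return "K"
    if fits_q:
        return "Q"
    return None


print(label_128(128.06, 0.5))    # wide tolerance: ambiguous
print(label_128(128.059, 0.01))  # tight tolerance: resolved
```

The two masses differ by 0.03639, so a tolerance below roughly half of that (about 0.018) is needed before any measured difference can match only one of the two residues.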
Data Directory
Modifications
Sequencing Parameters
- Scoring: Choose the learned parameter file for your instrument.
- Min. vertex score: If you do not want to include interpretations made on the smaller peaks,
raise this threshold. Amino acids interpreted based on a smaller peak will then be replaced by their
mass gaps.
- Sequence tag length: Mark this check box to restrict de novo sequencing to spectra
meeting minimum sequence tag length criteria. The sequence tag length is the length of the longest
path of amino acids that is represented in the MS/MS spectrum.
- Show Advanced / Hide Advanced: Click this button to toggle the display of the Advanced
Parameters (described below).
Advanced Parameters
- Correct precursor m/z via b/y pairs: Use this option to have Sherenga more accurately
determine the precursor ion mass by calculating relationships with b- and y-ions.
- Click Yes if you want Sherenga to determine the precursor ion mass as described above.
- Click No if you want to use the precursor ion mass from the data file.
- Click Both if you want to use both the original precursor ion mass from the data
file and the precursor ion mass determined by comparison with b- and y-ions.
- Spectrum graph search direction: Use this setting to have Sherenga search the spectrum:
- >>>: from low mass to high mass
- <<<: from high mass to low mass. There is usually less interference from noise when
you search from high to low.
- Both: the default, which calculates both ways and is generally the best choice.
- Vertex stack size: Set to indicate how many scores are reported for each vertex. The
recommended value is 100.
- Allow di- and tri-peptide gaps: Mark this setting to get the most detailed Sherenga
interpretation in the absence of complete fragmentation information. Because MS/MS fragmentation is usually
incomplete, Sherenga typically finds missing vertices (missing mass spectral peaks). When you mark this
check box, the algorithm uses a "gap edge" to compensate for this. A "gap edge" is a sequence of two
or three amino acids whose masses correspond to the distance between two peaks present in the spectrum.
- Penalize for missing ions: Mark this check box if you wish to penalize scores for Sherenga
interpretations where expected ions are missing from the spectrum. When you mark this check box,
the Sherenga score incorporates positive scoring for the ions present in the spectra, as well as negative
scoring for ions that the algorithm expects to see, but does not. For example, Q scores higher than GA
when there is no ion that represents fragmentation between G and A.
- Use intensity thresholds: Mark this check box if you wish to take advantage of Sherenga’s
intelligent thresholding. Clear this check box if you suspect that your spectrum contains secondary ion
types of unusually strong intensities. Because Sherenga uses learned parameters, it knows, for example,
that in ion trap spectra, b- and y-ions generally occur at fairly high intensities. It also
knows that other secondary ion types, such as b-H2O- and y-H2O-ions, generally
occur at low intensities. When you mark this option, you take advantage of Sherenga’s ion-type dependent
thresholding. When you clear this option, you allow for the possibility that some ions in your spectrum
do not fall into the expected intensity ranges.
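The "gap edge" idea described under Allow di- and tri-peptide gaps can be sketched as a search for residue combinations whose summed mass matches the distance between two peaks. This is an illustrative assumption, not Sherenga's code: the residue table is a subset, the 0.02 tolerance and the name `gap_candidates` are hypothetical, and each hit is reported as a composition in alphabetical order, since the gap alone cannot reveal the order of the residues.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (amino acid minus H2O) for a few residues only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
}


def gap_candidates(gap, tol=0.02, max_len=3):
    """Return di- and tri-peptide compositions whose total residue mass
    lies within tol of the gap between two peaks."""
    hits = []
    for n in range(2, max_len + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            total = sum(RESIDUE_MASS[aa] for aa in combo)
            if abs(total - gap) <= tol:
                hits.append("".join(combo))
    return hits


# A gap equal to the glutamine residue mass is also explained by G + A.
print(gap_candidates(128.05858))
```

This also shows why the Q versus GA example under Penalize for missing ions matters: both interpretations fit the same gap, so only the presence or absence of a peak between G and A can separate them.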
Data Files
- Spectrum files - Modify this list if you want to process only a subset of the spectra in the
data directory. Wildcards (*) are supported.
Review Sherenga de novo Sequencing Results
Sherenga de novo sequencing results can be reviewed on their own or integrated with database search results
using the Spectrum Summary tool in Spectrum Mill.