Sherenga de novo Sequencing

Introduction

Sherenga is an algorithm for de novo interpretation of MS/MS spectra. Sherenga employs graph theory, a branch of mathematics which describes how networks can be represented and their properties measured. The program runs in two stages. In the first stage, the program creates a directed graph. Vertices of the graph correspond to peaks of the MS/MS spectrum, and edges of the graph correspond to mass differences between peaks. In the second stage, the program searches for a high-scoring path in the graph. This path corresponds to an amino acid sequence of a peptide.
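
The two stages can be illustrated with a minimal sketch. This is not Sherenga's implementation: the function names, the fixed tolerance, the five-residue mass table, and the per-peak scores are all assumptions made for illustration, and real vertex scores come from the learned parameters described below.

```python
# Hypothetical sketch of a spectrum graph and a best-path search.
# Monoisotopic residue masses (Da) for a small illustrative subset.
RESIDUES = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
            "V": 99.06841, "L": 113.08406}

def build_spectrum_graph(peaks, tol=0.02):
    """peaks: list of (mass, score). Returns the mass-sorted peaks and
    edges[i] = [(j, residue)] where mass[j] - mass[i] matches a residue."""
    peaks = sorted(peaks)
    edges = {i: [] for i in range(len(peaks))}
    for i, (lo, _) in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            diff = peaks[j][0] - lo
            for aa, mass in RESIDUES.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))
    return peaks, edges

def best_path(peaks, edges):
    """Dynamic programming over the graph. Edges always point to a higher
    mass, so increasing index order is a valid topological order."""
    n = len(peaks)
    best = [score for _, score in peaks]  # best path score ending at vertex i
    back = [None] * n                     # (previous vertex, residue) or None
    for i in range(n):
        for j, aa in edges[i]:
            candidate = best[i] + peaks[j][1]
            if candidate > best[j]:
                best[j] = candidate
                back[j] = (i, aa)
    end = max(range(n), key=best.__getitem__)
    total = best[end]
    sequence = []
    v = end
    while back[v] is not None:
        v, aa = back[v]
        sequence.append(aa)
    return "".join(reversed(sequence)), total

# Four peaks spaced by G, A, V plus one unconnected noise peak at 90.5.
demo = [(0.0, 1.0), (57.02146, 1.0), (90.5, 1.0),
        (128.05857, 1.0), (227.12698, 1.0)]
sorted_peaks, graph = build_spectrum_graph(demo)
print(best_path(sorted_peaks, graph))
```

In this toy input the noise peak has no edges, so the highest-scoring path passes through the other four peaks and reads G, A, V.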

For more information, refer to:
Dancik, V.; Addona, T. A.; Clauser, K. R.; Vath, J. E.; Pevzner, P. A. "De novo Peptide Sequencing via Tandem Mass Spectrometry", J. Comput. Biol. 1999, 6, 327-342.
PMID: 10582570
DOI: 10.1089/106652799318300

Learned Parameters

The scoring of the paths is instrument-dependent. These dependencies are captured by learned parameters. These learned parameters are provided in *.sherc files for ESI ion trap and Q-TOF instruments.

Pre-Processing

Before interpretation, MS/MS spectra are pre-processed ("combed"). During combing, isotope peaks are merged and, if instrument precision allows, the fragment charge is determined. Precursor neutral-loss regions are also stripped.
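
As a rough illustration of why charge determination depends on instrument precision: adjacent isotope peaks of an ion with charge z are spaced about 1.00335/z apart in m/z, so the spacing must be measured accurately enough to tell the candidate charges apart. The sketch below is an assumption-laden example, not the combing code used by Sherenga. The function name, the tolerance, and the maximum charge are hypothetical.

```python
C13_DELTA = 1.00335  # approximate mass difference between 13C and 12C, in Da

def charge_from_isotope_spacing(mz_mono, mz_next, max_charge=4, tol=0.01):
    """Guess the charge of an ion from the m/z spacing between its
    monoisotopic peak and the next isotope peak. Returns None when the
    spacing does not match any charge from 1 to max_charge."""
    spacing = mz_next - mz_mono
    for z in range(1, max_charge + 1):
        if abs(spacing - C13_DELTA / z) <= tol:
            return z
    return None

print(charge_from_isotope_spacing(500.25, 500.7517))  # spacing near 0.5
print(charge_from_isotope_spacing(500.25, 501.2533))  # spacing near 1.0
```

With a tolerance much wider than the spacing differences between charges, the candidates can no longer be separated, which is why the step is conditional on precision.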

To Use the Sherenga de novo Sequencing Form

The following options are available on the Sherenga de novo Sequencing form. In general, you should retain the default settings, except for the options highlighted in red text on the form.

Sherenga de novo Sequencing

Sequence - Click to initiate de novo sequencing. Click this button after you have either loaded the desired parameter file or manually set the parameters. The name of the current parameter file appears in red at the top of the form. Once you have saved a parameter file from this form, you may start the de novo sequencing from a workflow rather than manually with the Sequence button.
Save As - Click to save current de novo sequencing settings in a parameter file.
Load - Click to load a parameter file that contains settings for de novo sequencing. For default values, select a parameter file from the Defaults folder.
Remove all prior Sherenga results - Mark this check box to remove prior de novo results for this data set.
Maximum reported hits - Set this to the number of de novo interpretations you want for each spectrum.
Validation filter - Use this to interpret only spectra that have a particular validation setting. See Peptide Validation.

Show Equivalent Masses

I/L: Choose whether to display mass differences of 113 as isoleucine or leucine. Because isoleucine and leucine are isomeric, they cannot be distinguished by mass; this setting only determines which amino acid is displayed in the results.
K/Q: Choose whether to display mass differences of 128 as lysine, glutamine, or both. Because lysine and glutamine are isobaric at nominal mass, this setting determines which amino acid is displayed in the results. Select Both if your instrument has sufficient mass accuracy to distinguish lysine (monoisotopic mass 128.09497) from glutamine (monoisotopic mass 128.05858).
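
To judge whether an instrument can separate lysine from glutamine, compare its fragment mass tolerance with the 0.03639 Da difference between the two masses quoted above. The check below is a simplified rule of thumb, not a criterion used by Spectrum Mill: it assumes a symmetric tolerance window around each mass and only asks that the two windows do not overlap. The function name is hypothetical.

```python
LYS = 128.09497  # monoisotopic residue mass of lysine, as quoted above
GLN = 128.05858  # monoisotopic residue mass of glutamine, as quoted above

def can_distinguish_k_q(fragment_tol_da):
    """True when +/- tolerance windows around K and Q do not overlap."""
    return 2 * fragment_tol_da < (LYS - GLN)

print(round(LYS - GLN, 5))        # mass difference in Da
print(can_distinguish_k_q(0.01))  # e.g. a high-accuracy instrument
print(can_distinguish_k_q(0.5))   # e.g. a unit-resolution instrument
```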

Data Directory

Click the Select... button to select a data directory. See Selecting Data Directories.

Modifications

Click the Choose... button to select modifications appropriate for your sample. See Choosing Modifications.

Sequencing Parameters

Scoring: Choose the learned parameter file for your instrument.
Min. vertex score: If you do not want to include interpretations made on the smaller peaks, raise this threshold. Amino acids interpreted based on a smaller peak will then be replaced by their mass gaps.
Sequence tag length: Mark this check box to restrict de novo sequencing to spectra meeting minimum sequence tag length criteria. The sequence tag length is the length of the longest path of amino acids that is represented in the MS/MS spectrum.
Show Advanced / Hide Advanced: Click this button to toggle the display of the Advanced Parameters (described below).

Advanced Parameters

Correct precursor m/z via b/y pairs: Use this option to have Sherenga determine the precursor ion mass more accurately from complementary b- and y-ion pairs in the spectrum.
- Click Yes if you want Sherenga to determine the precursor ion mass as described above.
- Click No if you want to use the precursor ion mass from the data file.
- Click Both if you want to use both the original precursor ion mass from the data file and the precursor ion mass determined by comparison with b- and y-ions.
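
The idea behind b/y-pair correction can be sketched as follows. For singly charged fragments, a complementary b-ion and y-ion satisfy m/z(b) + m/z(y) = [M+H]+ + 1.007276, so every such pair found in the spectrum gives an independent estimate of the precursor mass. The code is a minimal sketch under those assumptions, not Sherenga's procedure. The function name, the search window, and the use of a median are hypothetical choices.

```python
from statistics import median

PROTON = 1.007276  # proton mass in Da

def corrected_mh(peaks, reported_mh, window=0.5):
    """Estimate [M+H]+ from pairs of singly charged peaks whose sum is
    consistent with the reported precursor. Falls back to the reported
    value when no pair is found."""
    peaks = sorted(peaks)
    estimates = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            mh = low + high - PROTON
            if abs(mh - reported_mh) <= window:
                estimates.append(mh)
    return median(estimates) if estimates else reported_mh

# b1, y1, b2, y2 of the peptide GAV (true [M+H]+ = 246.144821),
# with a reported precursor that is off by about 0.26 Da.
gav_fragments = [58.028736, 118.086251, 129.065846, 189.123361]
print(corrected_mh(gav_fragments, reported_mh=246.4))
```

Here the pairs b1/y2 and b2/y1 both sum to the same value, so the estimate recovers the true precursor mass despite the inaccurate reported one.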
Spectrum graph search direction: Use this setting to have Sherenga search the spectrum:
- >>>: from low mass to high mass
- <<<: from high mass to low mass. There is usually less interference from noise when you search from high to low.
- Both: the default, which calculates both ways and is generally the best choice.
Vertex stack size: Set to indicate how many scores are reported for each vertex. The recommended value is 100.
Allow di- and tri-peptide gaps: Mark this setting to get the most detailed Sherenga interpretation in the absence of complete fragmentation information. Because MS/MS fragmentation is usually incomplete, Sherenga typically finds missing vertices (missing mass spectral peaks). When you mark this check box, the algorithm uses a "gap edge" to compensate for this. A "gap edge" is a sequence of two or three amino acids whose masses correspond to the distance between two peaks present in the spectrum.
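
A gap edge can be pictured as a lookup of amino acid combinations whose summed mass matches the distance between two peaks. The following sketch enumerates unordered combinations of two or three residues. It is illustrative only: the function name and tolerance are assumptions, I and L are represented by L, and Sherenga's own gap handling may differ.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da (I is represented by L).
RESIDUES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def gap_candidates(mass_diff, tol=0.02, max_len=3):
    """Return residue combinations (order unknown) of length 2..max_len
    whose total mass is within tol of mass_diff."""
    hits = []
    for n in range(2, max_len + 1):
        for combo in combinations_with_replacement(sorted(RESIDUES), n):
            if abs(sum(RESIDUES[a] for a in combo) - mass_diff) <= tol:
                hits.append("".join(combo))
    return hits

print(gap_candidates(128.05857))  # a gap the size of G + A
print(gap_candidates(171.06438))  # a gap the size of G + G + G
```

The combinations are unordered because a gap edge, by definition, spans peaks that are missing, so the spectrum carries no information about the order of residues inside the gap.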
Penalize for missing ions: Mark this check box if you wish to penalize scores for Sherenga interpretations where expected ions are missing from the spectrum. When you mark this check box, the Sherenga score incorporates positive scoring for the ions present in the spectrum, as well as negative scoring for ions that the algorithm expects to see but does not find. For example, Q scores higher than GA when there is no ion that represents fragmentation between G and A.
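
The Q versus GA example can be made concrete with a toy scoring function. The reward and penalty values, the function name, and the prefix mass of 200.0 are invented for illustration; Sherenga's actual score is derived from its learned parameters, not from fixed constants like these.

```python
def interpretation_score(expected_mz, observed_mz, tol=0.02,
                         reward=1.0, penalty=0.5, penalize_missing=True):
    """Add a reward for each expected ion found in the spectrum and,
    optionally, subtract a penalty for each expected ion that is absent."""
    score = 0.0
    for mz in expected_mz:
        if any(abs(mz - obs) <= tol for obs in observed_mz):
            score += reward
        elif penalize_missing:
            score -= penalty
    return score

prefix = 200.0                      # hypothetical mass of the preceding ion
observed = [prefix + 128.05858]     # only one peak follows the prefix
expect_q = [prefix + 128.05858]     # Q predicts a single ion
expect_ga = [prefix + 57.02146,     # GA also predicts an ion between G and A
             prefix + 57.02146 + 71.03711]

print(interpretation_score(expect_q, observed))
print(interpretation_score(expect_ga, observed))
print(interpretation_score(expect_ga, observed, penalize_missing=False))
```

With the penalty enabled, Q outscores GA because the ion between G and A is absent; with it disabled, the two interpretations tie.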
Use intensity thresholds: Mark this check box if you wish to take advantage of Sherenga’s intelligent thresholding. Clear this check box if you suspect that your spectrum contains secondary ion types of unusually strong intensities. Because Sherenga uses learned parameters, it knows, for example, that in ion trap spectra, b- and y-ions generally occur at fairly high intensities. It also knows that other secondary ion types, such as b-H₂O- and y-H₂O-ions, generally occur at low intensities. When you mark this option, you take advantage of Sherenga’s ion-type dependent thresholding. When you clear this option, you allow for the possibility that some ions in your spectrum do not fall into the expected intensity ranges.

Data Files

Spectrum files - Modify this list if you want to process only a subset of the spectra in the data directory. Wildcards (*) are supported.
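
To preview which files a wildcard pattern would select, shell-style matching can serve as a rough stand-in. This sketch uses Python's fnmatch module; whether Spectrum Mill applies exactly the same matching rules is an assumption, and the file names shown are hypothetical.

```python
from fnmatch import fnmatchcase

def select_spectra(filenames, patterns):
    """Return the file names that match at least one wildcard pattern."""
    return [name for name in filenames
            if any(fnmatchcase(name, pattern) for pattern in patterns)]

# Hypothetical spectrum file names.
files = ["run01.0005.spec", "run01.0010.spec", "run02.0001.spec"]
print(select_spectra(files, ["run01.*"]))
```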

Review Sherenga de novo Sequencing Results

Sherenga de novo sequencing results can be reviewed on their own or integrated with database search results using the Spectrum Summary tool in Spectrum Mill.