Spectrum Mill Utility Programs
Table of Contents
Archive Data:
Peptide Selector:
MRM Selector:
Multiple Sequence Aligner:
MS Edman:
MS Digest:
MS Product:
MS Comp:
MS Isotope:
Peptide String Match:
Peptide List to Masses:
To Archive Data
This option lets you archive instrument-created files, search result files, spectral files and data directories.
Archiving data saves disk space and makes it easier and faster to copy data to a
backup drive. Data archiving can also be automated in a
workflow. Archiving folders with large numbers (tens of thousands) of hits
(.spo files) can take some time.
- Under Utilities, click Archive Data.
- Under Data Directories, click Select
and choose the directory whose files and/or folder you intend to archive or un-archive (zip or unzip).
- Select the general categories of files or directory you intend to archive and how you want to archive them.
Below the options in each category, please read the description of what happens when you select an option.
- Instrument-created files
- Ignore instrument data files
- Delete data files after making placeholder
- Spectrum Mill search results files
- Ignore
- Zip results_mstag/*.spo to spo.zip
- Unzip spo.zip to results_mstag/*.spo
- Spectrum Mill spectral files
- Ignore
- Zip cpick_in/*.pkl to pkl.zip
- Unzip pkl.zip to cpick_in/*.pkl
- Spectrum Mill data directories
- Ignore
- Zip dataDir/*.* to dataDir.zip
- Click Archive.
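As a rough illustration of what the "Zip results_mstag/*.spo to spo.zip" option amounts to, the Python sketch below collects the .spo files of one data directory into a single archive. It is not the Spectrum Mill implementation. The function name zip_spo_files, the placement of spo.zip in the data directory, and the choice to leave the original files in place are all assumptions made for the example.

```python
import zipfile
from pathlib import Path


def zip_spo_files(data_dir):
    """Zip every results_mstag/*.spo file under data_dir into spo.zip.

    Returns the number of files written. The originals are left untouched
    (an assumption; the real utility may handle them differently).
    """
    data_dir = Path(data_dir)
    spo_files = sorted((data_dir / "results_mstag").glob("*.spo"))
    archive = data_dir / "spo.zip"  # hypothetical archive location
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for spo in spo_files:
            # Store paths relative to the data directory so that unzipping
            # recreates results_mstag/<name>.spo.
            zf.write(spo, arcname=spo.relative_to(data_dir).as_posix())
    return len(spo_files)
```

Because tens of thousands of small files are compressed one at a time, the loop also shows why archiving large result folders can take a while.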
To Use the Peptide Selector Form
Peptide Selector performs theoretical digestions on each protein supplied by accession number or sequence
and then automatically selects from the theoretical peptides those that fit specific filtering criteria.
The most common uses are:
- List only those peptides suited to be synthesized with a stable isotope label and used for quantitation
via a multiple reaction monitoring (MRM) experiment.
- List only those peptides suited for incorporation into an accurate mass inclusion list to be used
in a data-dependent MS/MS experiment.
- List only those peptides expected to give doubly-charged ESI spectra.
- List only the likely detectable forms of all peptides that contain a possible phosphorylation site.
- List only those peptides that contain cysteines.
- List the peptides to expect in specific fractions from isoelectric
focusing by off-gel electrophoresis.
You can use Peptide Selector to create MRM or Q-TOF inclusion lists based on
selection criteria. You may also create Q-TOF MS/MS target lists from prior
results, filtered by unidentified and/or single-peptide-hit proteins.
In many ways, Peptide Selector is similar to MS Digest. Both Peptide Selector and
MS Digest perform automatic protein digestions. The difference is that Peptide Selector additionally
creates limited lists based on specific criteria. You can use it to create an inclusion list for MS/MS analysis.
The following topics describe options available on the Peptide Selector form. If you see settings on the form in
green font, that means that you have marked the check box for Penalize rather
than exclude, and the green text indicates the settings to which that applies.
Selection
- Select - Click to perform a theoretical digestion, calculate the masses of peptides that result,
and select from these the ones that meet the criteria you set in Criteria for Excluding Peptides.
Click the button after you have entered the accession number(s) or sequence of the protein of interest
and have set all parameters (or loaded them from a parameter file). If you have a protein name or partial
sequence but do not know the protein's accession number, use the Spectrum Mill program MS Edman to search
a database and retrieve the accession number.
- Save As - Click to save current settings in a parameter file. (Peptide Selector allows the
use of parameter files, but they cannot be used within an automated workflow.)
- Load - Click to load a parameter file that contains settings for Peptide Selector.
- Hide HTML links - Mark this check box if you want to generate results that are easier to cut
and paste into Microsoft® Excel.
Saved File Parameters
- Generate inclusion list text file - Mark this check box to have Peptide Selector put the results
into an inclusion list (a text file in tab-separated format) for use in data acquisition. When you mark
this check box, more settings appear. Choose one of these formats:
- MassHunter QQQ MRM list
- MassHunter QQQ MRM Optimizer list
- MassHunter Q-TOF MS/MS target list
- MassHunter AM Database
- Xcalibur inclusion list
Valid Results to Filter
When MassHunter Q-TOF MS/MS target list is selected, the target list may be based on validated results from a prior inclusion list run. Mark the From results check box to create the list based on proteins that were not identified or had only a single peptide hit in the selected results.
Make sure the list of protein accession numbers in the Protein(s) to Select From is the same as the original inclusion list that the results are based on. Select one of the following:
- Unidentified + Single peptide : Includes only peptides from unidentified proteins and single peptide hit proteins
- Single peptide : Includes only peptides from proteins that had only one valid peptide
- Unidentified : Includes only peptides from proteins that were not identified in the results
Select one or more data folders that contain the validated results to base the selection on.
IEF: This check box appears when you mark
Generate inclusion or MRM list file. Mark the IEF check box if
you do off-gel electrophoresis and you want to predict the fractions that will
contain peptides of interest. This prediction reduces the number of fractions
you must analyze. The IEF selector paper is published online in the proceedings
of the IEEE BIBM 2014 conference:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6999136.
Filename: To avoid overwriting an existing file, type a new file name for the inclusion list.
Otherwise, keep the default lastPeptideSelectorResult.txt. This file is found in the
\SpectrumMill\results_msedman folder.
Run time (min): Type an upper limit for the LC retention time you wish to include in the file.
Use this setting to exclude unwanted peaks at the end of the run.
List Type: Choose either m/z or uncharged.
- Uncharged - Choose this to export a list to create an accurate mass RT database for MassHunter Qualitative Analysis.
- m/z - Choose this to export a list to MassHunter Acquisition for Q-TOF.
Precursor Charge: Choose a minimum and maximum precursor charge. For
the ion trap CID fragmentation mode, the 2+ charge state is preferred for
peptides because it is most likely to yield b- and y-ions of high
abundance. For Q-TOF instruments, higher charge states can also yield good fragmentation. For the Agilent Q-TOF, the 2+ and 3+ charge states typically yield an equal number of peptide identifications.
Precursor m/z: Choose a minimum and maximum precursor m/z.
When you are ready to transfer the contents of the text file into the data acquisition software, see
instructions below.
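The Precursor Charge and Precursor m/z settings interact: a peptide with a given MH+ lands at a different m/z for each charge state. The sketch below shows the standard conversion from a singly protonated mass to m/z and a simple window check. The function names and the example limits are hypothetical, and the code only illustrates the arithmetic, not how Peptide Selector is implemented.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def precursor_mz(mh_plus, charge):
    """Convert a singly protonated mass (MH+) to m/z at the given charge."""
    return (mh_plus + (charge - 1) * PROTON_MASS) / charge


def charge_states_in_window(mh_plus, z_min, z_max, mz_min, mz_max):
    """Return (charge, m/z) pairs whose m/z falls inside [mz_min, mz_max]."""
    pairs = []
    for z in range(z_min, z_max + 1):
        mz = precursor_mz(mh_plus, z)
        if mz_min <= mz <= mz_max:
            pairs.append((z, round(mz, 4)))
    return pairs


if __name__ == "__main__":
    # A peptide with MH+ 1000.0, charges 1 to 3, m/z window 300 to 600.
    print(charge_states_in_window(1000.0, 1, 3, 300.0, 600.0))
```

For the uncharged list type, the corresponding neutral mass is MH+ minus one proton mass.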
Digest Parameters
Product Ion Parameters
- Show Product Ion Masses - Mark this check box to report product ion masses for each peptide.
The output will report the b2 ion and all y ions. Cleavage sites that are expected to produce
intense product ion signal (N-terminal side of Pro and the C-terminal side of Asp and Glu) will be highlighted.
Criteria for Excluding Peptides
- Max. # basic residues (RHK): Select a maximum number, to exclude sequences with multiple
basic residues.
- Peptide MH+: Select a minimum and maximum MH+.
- AA Composition Filtering
- AAs Required: Requires that candidate peptides from the database contain the
specified amino acid(s). To disable, leave the box blank.
- Disallowed: Requires that candidate peptides from the database do not
contain the specified amino acid(s). To disable, leave the box blank.
- Peptide exclusion criteria: Select options to exclude sequences that may be ambiguous
because certain amino acids could be modified.
- Has nearby cleavage site within n residues - Mark this check box to exclude
all forms of each peptide that contain an enzymatic cleavage site within n residues of
either end of the peptide. Because such a filter does not make sense if more than 0 missed
cleavages are allowed, the program will generate an error if Maximum # missed cleavages
is greater than 0. The n residues setting also determines how many previous and next
amino acids to display in the output (independent of whether you have marked the check box).
- Contains peptide N-terminal Gln to pyroGlu - Any instance of glutamine at the
N-terminus of a peptide (following digestion) could exist as either normal Gln or as pyroglutamic
acid. Mark this check box to exclude all forms of such peptides. The program then excludes any
peptide with a leading Gln, whether or not you select pyroglutamic acid as a variable modification
(under Modifications in the form).
- Contains protein N-terminus Acetylatable - For any database entry with a Met
at the N-terminus, the N-terminal peptide is considered as either in its original form or in
a form where the Met is removed and the next amino acid is acetylated. Mark this check box to
exclude all forms of such peptides. In other words, you exclude any peptide at the N-terminus
of the protein sequence if the N-terminal residue is a Met. While this post-translational modification
does not occur in bacteria, Peptide Selector does not know that. Furthermore, if the database
curators have removed the N-terminal Met from the sequence, then Peptide Selector does not apply
the acetylation modification nor exclude the unmodified peptide.
- Contains consensus N-linked glycosylation site - Mark this check box to exclude
all forms of each peptide that contains the sequence N-x-[S/T], where x represents any amino
acid, and S/T means either S or T.
- Contains no variable modification - Mark this check box to exclude the unmodified
forms of all peptides that meet all of the other filtering criteria. When you mark this check
box, the list includes only peptides that contain a variable modification (for example,
phosphopeptides).
- Protein Position Filtering: Restricts the selection based on the position of the peptides
within the protein.
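To make the exclusion criteria concrete, the following sketch performs a simplified tryptic digestion and then applies three of the filters described above: the maximum number of basic residues, N-terminal Gln, and the consensus N-linked glycosylation motif N-x-[S/T]. It is a minimal illustration under stated assumptions, not Peptide Selector's algorithm: trypsin is modeled as cleaving after K or R except before P, no missed cleavages are allowed, and all function and parameter names are hypothetical.

```python
import re


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except before P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def select_peptides(sequence, max_basic=2, exclude_nterm_gln=True,
                    exclude_nglyc=True):
    """Return the digest peptides that survive the exclusion filters."""
    selected = []
    for pep in tryptic_digest(sequence):
        # Max. # basic residues (RHK)
        if sum(pep.count(aa) for aa in "RHK") > max_basic:
            continue
        # Contains peptide N-terminal Gln to pyroGlu
        if exclude_nterm_gln and pep.startswith("Q"):
            continue
        # Contains consensus N-linked glycosylation site, N-x-[S/T]
        if exclude_nglyc and re.search(r"N.[ST]", pep):
            continue
        selected.append(pep)
    return selected


if __name__ == "__main__":
    print(select_peptides("QAEKNGSLRHHKRPEVKLLDGTK"))
```

In this example the digest yields QAEK, NGSLR, HHK, RPEVK, and LLDGTK. QAEK is dropped for its leading Gln, NGSLR for the N-G-S motif, and HHK for having three basic residues.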
Modifications
Protein(s) to Select From
- Database: Select either User Protein or the name of a database. See
Databases. If you select User Protein, type or paste a
sequence in the User-supplied sequence box.
- User-supplied sequence - Type or paste a sequence as instructed in the form.
Search Mode
- Count Peptide Uniqueness in Database by - Select None to skip this feature. (The
program will also run faster.) Select Sequence to count the number of times each peptide is present
in the entire database, including the accession numbers from which peptides were selected.
- Species - Select a species if you wish to apply a species filter prior to the uniqueness count.
Scoring
- Penalize rather than exclude - Mark this check box if you are not getting enough peptides
to pass the filter, so you want to penalize, rather than exclude, peptides. When you mark this check box,
additional settings appear below. The settings for which you can apply penalties change to green font.
These settings no longer cause peptides to be excluded.
- Max reported peptides/protein - Type a maximum.
- Excess basic residues - Type a penalty factor.
- Nearby cleavage site - Type a penalty factor. This setting is grayed out if the Has nearby cleavage site ... check box is clear.
- Disallowed AA composition - Type a penalty factor. This setting is grayed out if the AAs Disallowed box is empty.
- Lower relative uniqueness - Type a penalty factor. This setting is grayed out if None is selected for Count Peptide Uniqueness ....
Output Features
- Link from Peptide Sequence - Click this link to produce an MS Product report of all
the b/y product ion masses for the peptide.
- Link from #DB entries - Click this link to produce an MS Edman report of all the proteins
in the database that contain the peptide.
- Link from #DB entries, summary line - Click this link to produce a Multiple Sequence
Alignment of all the proteins in the database that contain at least one of the selected peptides. The
alignment will highlight all of the selected peptides.
- Link from Accession Number - Click this link to show a coverage map for the protein
that highlights all the selected peptides.
To transfer the inclusion list into MassHunter software
Transfer to MassHunter Data Acquisition for Triple
Quadrupole
To transfer the inclusion list from the text file into MassHunter Data Acquisition for Triple Quadrupole, do the following:
- Do one of the following:
- When the Peptide Selector results appear, click the blue link to the text file. (Do not click
the link until you see the results.)
- Open the saved text file. The file is in the results_msedman folder within your Spectrum
Mill installation.
- Select the contents of the file (Press CTRL + A on your keyboard).
- Copy the contents of the file (Press CTRL + C on your keyboard).
- Do one of the following:
- Paste directly into MassHunter Data Acquisition and omit the rest of the steps.
- Open Microsoft Excel.
- Paste as text into Excel. (In Excel click Paste > Paste Special, then in the dialog box,
click Text, then click OK.)
- Make any necessary edits.
- (Optional) Save as an Excel file, for future reference.
- Copy the contents from the Excel file and paste into the MRM table in the MassHunter Data Acquisition for Triple Quadrupole.
Import to MassHunter Data Acquisition for Q-TOF
Follow these instructions to import a Q-TOF target list:
- In Peptide Selector, mark Generate inclusion or MRM list file.
- Select MassHunter Q-TOF MS/MS target list.
- In the Filename
text box, type a name of your choice with a .txt suffix.
- From the List Type list, select m/z.
- Click Select.
- In MassHunter Data Acquisition, click the Targeted List tab.
- Right-click the Targeted List table and click Import.
- Import the TXT file from the \\SpectrumMill\results_msedman folder.
Note: The Open dialog box
for importing to MassHunter initially allows only selection of .csv
files. To force the file name selection to list all files, type
"*" and press enter, then select the .txt file.
Export to MassHunter Accurate Mass (AM) Database
Follow the instructions below to
create a file with a .csv suffix containing peptide neutral mass
formulas that can be searched in MassHunter Qualitative Analysis as an
accurate mass database, or used with the Find By Formula
algorithm. This database allows you to verify hits in MassHunter
Qualitative Analysis, especially one-hit wonders.
- In Peptide Selector, mark Generate inclusion or MRM list file.
- Select MassHunter AM database.
- In the Filename
text box, type a name of your choice with a .csv suffix.
- From the List Type list, select uncharged.
- Click Select.
- Copy the .csv file located in the \\SpectrumMill\results_msedman folder to the MassHunter\PCDL folder or MassHunter\databases folder, whichever exists on your system.
- Within MassHunter Qualitative Analysis select the file as the database to search or use it in Find By Formula.
To Use the MRM Selector Form
The MRM Selector uses Spectrum Mill MS/MS Search results to create multiple reaction monitoring (MRM) lists
for triple quadrupole instruments. You select data that has been searched, then filter the results to include
only those peptides that meet certain requirements. (The filters are similar to those in Protein/Peptide Summary.)
The Agilent Q-TOF Data Extractor has been enhanced to output the collision energy (CE), peak apex, and chromatographic
peak width in the specFeatures.tsv file, so you can use these values in dynamic MRM (DMRM) generation.
- You may instead choose to let the program calculate the collision energies based on an equation.
- You have the option to type a chromatographic peak width that will apply for all peaks in the DMRM
analysis. This peak width, which corresponds to the delta RT setting in the MassHunter Data Acquisition
software, is the retention time window for which the MRM transitions are monitored. For example, if you
have a peak apex at 2.5 min and a delta RT of 1.0 min., the MassHunter Data Acquisition software monitors
the MRM transitions for that peak from 2.0 min until 3.0 min.
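The delta RT example above can be written as a one-line calculation. This sketch assumes, as the example implies, that the window is centered on the peak apex. The function name is hypothetical.

```python
def dmrm_window(peak_apex_min, delta_rt_min):
    """Return (start, end) of the monitoring window, centered on the apex."""
    half_width = delta_rt_min / 2.0
    return (peak_apex_min - half_width, peak_apex_min + half_width)


if __name__ == "__main__":
    # Peak apex at 2.5 min with a delta RT of 1.0 min.
    print(dmrm_window(2.5, 1.0))
```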
Select Results for MRM Selection
- Select MRMs - Click to select MRM transitions that meet your criteria. Click this button after
you have either loaded the desired parameter file or manually set the parameters. The name of the current
parameter file appears in red at the top of the form.
- Save As - Click to save current MRM Selector settings in a parameter file. (MRM Selector allows
the use of parameter files, but you do not use MRM Selector within a workflow.)
- Load - Click to load a parameter file that contains settings for MRM Selector.
- Format: Select the format for the output of MRM Selector. Choose one of these formats:
- Agilent Triple Quad DMRM
- Agilent Triple Quad MRM
- Agilent Triple Quad Optimizer
- ABI Triple Quad
- Filter to distinct peptides:
To select only the instance of a particular peptide with the highest MS/MS Search score,
select one of the following:
- Off -- Disables the filtering.
- Case insensitive -- When collapsing to "distinct", a case-insensitive string compare is used, thus peptides with variable modifications (lowercase AA's) and unmodified peptides are combined.
- Case sensitive -- When collapsing to "distinct", a case-sensitive string compare is used, thus peptides with variable modifications (lowercase AA's), different localizations of those variable modifications, and unmodified peptides are kept separate.
- Charge file CS -- When collapsing to "distinct", a case-sensitive string compare is applied to both the sequence and spectrum filename prefix, thus peptides from different LC-MS/MS runs and those with different precursor charges are kept separate.
- Data directories: Click the Select ... button to select a data directory or data directories.
See Selecting Data Directories.
- Search result files: Modify this list if you want to summarize only a subset of the
files in the data directory. Wildcards (*) are supported. To see the names of your search result
files, look in the results_mstag subdirectory under the directory where you placed your raw files.
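The Filter to distinct peptides modes differ only in the key used to decide whether two hits count as the same peptide. The sketch below keeps the highest-scoring hit per key. The tuple layout (sequence, file prefix, score), the mode names, and the example data are assumptions for illustration and do not describe MRM Selector's internal data model.

```python
def filter_distinct(hits, mode="case_sensitive"):
    """Keep the best-scoring hit per distinct peptide.

    hits: list of (sequence, file_prefix, score) tuples, where lowercase
    letters in the sequence mark variably modified residues.
    """
    if mode == "off":
        return list(hits)
    best = {}
    for seq, prefix, score in hits:
        if mode == "case_insensitive":
            key = seq.upper()      # modified and unmodified forms combined
        elif mode == "case_sensitive":
            key = seq              # modification state and site kept separate
        elif mode == "charge_file_cs":
            key = (seq, prefix)    # also separate by run/charge file prefix
        else:
            raise ValueError("unknown mode: " + mode)
        if key not in best or score > best[key][2]:
            best[key] = (seq, prefix, score)
    return list(best.values())


if __name__ == "__main__":
    hits = [("AsDFK", "run1.2", 10.0),
            ("ASDFK", "run1.2", 12.0),
            ("AsDFK", "run2.3", 11.0)]
    for mode in ("case_insensitive", "case_sensitive", "charge_file_cs"):
        print(mode, filter_distinct(hits, mode))
```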
Validation and Sorting
- Filter results by: Choose a filter. See Peptide Validation.
- Protein grouping method: Determines how proteins are grouped
in certain protein summary modes.
- 1 shared peptide - When a peptide sequence >8 residues long is contained in multiple
protein entries in the sequence database, the software groups the proteins together and then
reports the highest-scoring one and its accession number.
- 1 shared peptide, expand subgroups - The software initially
groups the proteins as described for 1 shared peptide. In some cases when the protein
sequences are grouped in this manner, there are distinct peptides that uniquely represent a
lower-scoring member of the group (isoforms and family members). When you choose 1 shared
peptide, expand subgroups, more than one member of the group is reported and counted towards
the total number of proteins.
- Sort proteins by: Determines how proteins are sorted in the text file.
- Filter by protein score: Includes only proteins that match specified score criteria.
- Top n peptides for MRM: Do one of the following:
- If you want to limit the number of peptides per protein, select Limit to and type
a number in the box.
- Otherwise, select Take all.
- Rank peptides by: Determines how peptides
are ranked so the program can decide the top n peptides. You can choose either the database search score
or the total intensity.
- Sort MRM List by: Determines the order of the peptides in the MRM list.
- Filter peptides by: Permits display of only peptides that match specified criteria.
- Score: Filters by database search score.
- % SPI: Filters by percent scored peak intensity. This is the percentage of
the spectral peak-detected ion current explained by the search interpretation.
- Required AAs: Filters search results so that peptides are shown only if they contain
the required amino acid(s). To disable, select any. See
Amino Acid Filtering.
- Disallowed AAs: Filters search results so that peptides are not shown if they
contain disallowed amino acid(s). To disable, select none. See
Amino Acid Filtering.
- Peptide pI: Filters search results by peptide pI. Fill in a range, or mark the check
box for All. If you wish to use the pI filter for modified peptides, ask your server
administrator to first verify that the pK of the modified amino acid is specified in smconfig.std.xml
or smconfig.custom.xml. Spectrum Mill server administrators may set the pK values
for modifications when they define modifications
(only necessary if the pK values are different from those of the unmodified amino acid).
- Accession #'s: Filters search results by accession numbers. You can type or paste
a list of accession numbers in various formats (space-separated, separated by ‘|’, comma-separated,
etc).
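If you prepare accession lists outside the form, the accepted separators (spaces, '|', and commas) can be normalized with a single regular expression. This is only a sketch of one way to tokenize such a list. The function name and the sample accession numbers are hypothetical, and the form may accept further formats.

```python
import re


def parse_accessions(text):
    """Split an accession list on whitespace, '|' or ',' separators."""
    return [token for token in re.split(r"[\s|,]+", text) if token]


if __name__ == "__main__":
    print(parse_accessions("P12345, Q9Y6K9|O15111  P04637"))
```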
MRM Parameters
- Text file export - Creates a file in the data folder, with the name MRMSelectorExport.#.txt,
where # is a number that increments so that none of the files are overwritten.
- Simple sequence - Mark this check box if you plan to use Dynamic MRM on the Agilent MassHunter
QQQ.
- If you do not mark the check box, the Compound Name column reports the peptide
in an annotated form: <accession #>_sequence_<iontype>, which is properly handled by most other
software, but not by Agilent QQQ DMRM.
- When you mark Simple sequence, MRM Selector reports only the peptide sequence in
the Compound Name column, and reports the accession # and ion type as additional columns.
(Do not paste these additional columns into the QQQ Acquisition table.)
- Top n transitions: Allows you to specify the number of MRM transitions to monitor for
each peptide. The recommended setting is 5; you usually want to end up with 3 good transitions, and monitoring
5 initially allows you to later delete any transitions that show interferences.
- ≥ % of precursor - Restricts the MRM transitions
to those where the product ion m/z's are greater than or equal to the precursor m/z's (for example,
a doubly charged precursor and a singly charged product ion)
- y ions only - Restricts MRM transitions to y-ions.
- Z options: Select one of the following settings:
- Observed precursor/fragment charge only - uses only the precursor and fragment charges
that are available in the data file(s) that you selected on this form, and that pass the filtering
criteria.
- Exhaustive precursor and fragment charges - uses all possible precursor and fragment
charges.
- Highest allowed charge pre/frags only - uses only the precursor and fragment ions
that have the maximum charge, based on the number of basic residues.
- Dwell time (ms): Type the dwell time you want to use for the MRM transitions. This setting
does not appear when you select the Format for Agilent Triple Quad DMRM.
- Use peak width of: Mark the check box and type the value you want to use for the retention
time window for dynamic MRM on an Agilent triple quadrupole system. This setting appears only when you
select the Format for Agilent Triple Quad DMRM.
- Collision Energy: Type the slopes and intercepts that the data acquisition software will use
to calculate the collision energy to produce MS/MS fragmentation for each precursor ion.
- Use actual CE if available - Mark this check box to have the data acquisition software use
the collision energy that it used to fragment each precursor ion in the data file(s) that you selected
in this form. This feature is especially useful when you generate a DMRM list for an Agilent Triple Quadrupole from
Agilent Q-TOF data, because the collision cells are the same.
- Declustering Potential: Type the m/z breakpoint and Potentials. These settings
appear only when you select the Format for ABI Triple Quad.
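The Collision Energy setting describes a linear relationship between precursor m/z and collision energy. The sketch below assumes the simplest form, CE = slope x (m/z) + intercept, with one slope and intercept pair per precursor charge. The exact equation and scaling used by the data acquisition software are not stated here, so treat the formula, the function name, and the numeric values as hypothetical placeholders.

```python
# Hypothetical (slope, intercept) pairs keyed by precursor charge.
EXAMPLE_CE_PARAMS = {
    2: (0.04, -5.0),
    3: (0.03, -2.0),
}


def collision_energy(precursor_mz, charge, params=EXAMPLE_CE_PARAMS):
    """Linear collision energy estimate: slope * m/z + intercept."""
    slope, intercept = params[charge]
    return slope * precursor_mz + intercept


if __name__ == "__main__":
    print(round(collision_energy(500.0, 2), 2))
```

When Use actual CE if available is marked, a calculation like this would serve only as the fallback for precursors that lack a recorded collision energy.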
To transfer the MRM list into MassHunter Data Acquisition software
To transfer the MRM list from the text file into the MassHunter Data Acquisition software:
- Do one of the following:
- Observe the message bar at the bottom right of the browser window. When you no longer see "Waiting
for http://<server name>/millscripts/MRMsummaryPP.pl," click the blue link to the text file. (For
large data sets, allow enough time for the program to finish the file creation before you click
the link. Wait until you see "Done" in the message bar at the bottom of the browser window.)
- Open the saved text file. The file is in the data file folder within your Spectrum Mill installation. (If there is more than one folder, the saved file will be in the first one in the list.)
- Select the appropriate contents of the file. If you will copy to Excel, press CTRL + A
on your keyboard to select everything. If you will copy directly into the MassHunter Data Acquisition program,
do not select the Score column. For Dynamic MRM on the Agilent Triple Quadrupole, select only the Compound Name
column.
- Copy the contents of the file (Press CTRL + C on your keyboard).
- Do one of the following:
- Paste directly into MassHunter Data Acquisition and omit the rest of the steps.
- Open Microsoft Excel.
- Paste as text into Excel. (In Excel click Paste > Paste Special, then in the dialog box,
click Text.)
- Make any necessary edits.
- (Optional) Save as an Excel file, for future reference.
- Copy the appropriate contents from the Excel file and paste into the MRM table (for Triple Quadrupole) in MassHunter
Data Acquisition.
As of B.04.01, two additional fields are exported which can be used to filter the list in Excel:
- #Proteins - indicates the number of proteins for the peptide
- Accessions - list of protein accessions for the peptide
To Use the Multiple Sequence Aligner Form
The Multiple Sequence Aligner enables alignment and comparison of the amino acid sequences of proteins that
are present in a database. The Spectrum Mill software highlights the amino acids that differ among the sequences.
The software accomplishes the alignment via a transparent interface to ClustalW, a program that is available
from the European Bioinformatics Institute (EBI). Agilent licenses the ClustalW program, and the Spectrum Mill
installation copies it to the millbin folder on the Spectrum Mill server.
Note: If the database is too large (> 4.2 Gb), the alignment does not work properly. In that case,
create a subset database before you do the alignment.
You can also access multiple sequence alignment from the Protein/Peptide Summary form. For more information
about multiple sequence alignment, please see the help for that form.
Align
- Align - Click to initiate the alignment with ClustalW. Click this button after you
have set all parameters.
Database
- Database - Select the name of the database that contains the proteins you wish to align.
- Accession #'s - Type the accession numbers of the proteins that you wish to compare.
Introduction - MS Edman
MS Edman began as a simple utility for specifying a text string (protein name, sequence, accession number)
and retrieving the database entries associated with that string. Since the algorithm used for accomplishing this
is very similar to the way regular expressions are treated with the UNIX® grep command,
the implementation lends itself well to describing the ambiguity often present in data obtained from an Edman
degradation protein sequencing experiment. Additional features, such as peptide mass filtering and tolerance for
mismatched amino acids, have since been added.
Search Mode - MS Edman
Sequence Only
MS Edman finds amino acid sequences in the selected database that match the regular
expression entered.
In this mode the sequence should be in CAPITAL LETTERS.
Sequence and Mass
MS Edman first finds amino acid sequences in the selected database that match the regular
expression entered, then filters those sequences to eliminate any that do not contain one of the specified peptide
masses WITHIN the sequence. Hence, not all of the specified sequence must be contained in the region defined
by the mass. Thus, residues outside of the peptide in question could be specified (unless done when specifying
No enzyme, since the cleavage rule may prevent matching in such cases).
In this mode the sequence should be in CAPITAL LETTERS.
Name, Accession Number or Species
If the search mode is set to Name, Accession Number or Species, the search only examines
the relevant field of the database entry's FASTA-formatted comment line. In the Name mode, you should type
one or more regular expressions; the case of letters is ignored. In the Accession
Number mode, you should type one or more accession numbers (NOT regular expressions). Again, the case of letters
is ignored. In the Species mode, you should type one or more species from the database .sl file. The output
will be a list of the entries which match anything in the input list. The list of entries can be
saved and searched by a different Spectrum Mill program such as PMF Search or
MS/MS Search.
Regular Expressions - MS Edman
Square brackets have special meaning in a regular expression. The regular expressions used are of the form
used by the UNIX grep facility. Examples (type man grep on a UNIX system for full details):
- [EF]: The amino acid is either E or F.
- [^EF]: The amino acid is anything but E or F.
- . (dot): Any single amino acid is possible.
- .* (dot-star): Represents a sequence of zero or more unknown amino acids. Note that this is "dot-star",
not just "star". This wildcard allows some not entirely obvious features. A match is to the longest
sequence fitting the condition (for example, FMQ.*K will find the last K in the sequence following FMQ).
In Sequence and Mass mode, the sequence is matched first and then a mass WITHIN the sequence is found. Hence, not all of
the specified sequence must be contained in the region defined by the mass. Thus, residues outside
of the peptide in question could be specified (unless done when specifying No enzyme, since
the cleavage rule may prevent matching in such cases).
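The constructs above behave the same way in Python's re module, which makes it easy to try an expression before searching a database. The sketch below is only a stand-in for MS Edman's own matching. The helper name and the test sequence are made up for the example.

```python
import re


def find_match(pattern, sequence):
    """Return the first substring of sequence matching pattern, or None."""
    match = re.search(pattern, sequence)
    return match.group() if match else None


if __name__ == "__main__":
    protein = "AAFMQLKDEGKTT"  # hypothetical sequence
    print(find_match(r"A[EF]M", protein))   # bracket set: E or F
    print(find_match(r"F[^EF]Q", protein))  # negated set: anything but E or F
    print(find_match(r"Q.K", protein))      # dot: any single residue
    print(find_match(r"FMQ.*K", protein))   # dot-star: longest possible match
```

The last call returns FMQLKDEGK rather than FMQLK, because dot-star extends to the last K that follows FMQ.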
Mismatched AA's - MS Edman
By setting the Max. # of Mismatched AA's parameter to a value other than 0, homologous sequences
can be matched. This is done by allowing a number of positions, as determined by this parameter, not to match
protein sequences in the database. This parameter is active in the following search modes:
- Sequence Only
- Sequence and Mass
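Conceptually, allowing mismatches means accepting any stretch of the protein that differs from the query at no more than the chosen number of positions. The sketch below shows this for a plain (non-regular-expression) query by sliding a window along the protein and counting differing positions. It illustrates the idea only. MS Edman's actual algorithm, which also handles regular expressions, is not described here, and the function name is hypothetical.

```python
def find_with_mismatches(query, protein, max_mismatches):
    """Return (start, matched_text, mismatch_count) for each acceptable window."""
    hits = []
    n = len(query)
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        # Count positions where the query and the window disagree.
        mismatches = sum(1 for a, b in zip(query, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    # FMQLK is not present exactly, but FMELK differs at one position.
    print(find_with_mismatches("FMQLK", "AAFMELKDD", 1))
```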
To Use the MS Edman Form
MS Edman allows you to search text fields (such as sequence, name, accession number or species)
in protein databases. MS Edman can help identify a protein if you know only the molecular weight
of a tryptic fragment and some of the amino acid composition. MS Edman is also the first step when you want to
create a specialized subset database of entries that match your text search criteria. For example, you could use
MS Edman to find all proteins that contain a certain amino acid sequence. If you marked the Save hits to file
check box on the MS Edman form, you could then use Protein Databases to create a subset database from these saved
results.
The following topics describe options available on the MS Edman form.
To return to default settings on the MS Edman page, click the Spectrum Mill button to go to the
Spectrum Mill home page. Then click the link on the home page to go back to the MS Edman page.
Search
- Start Search - Click to initiate a text search of a sequence database. Click this button
after you have set all parameters.
- Maximum reported hits: Set to the maximum number of hits you want for each search.
- Display combinatorial peptide output - Mark this check box if you have an ambiguous sequence
and you want to extract from the database the distribution of amino acids at a particular position.
For example, if you had a sequence SPXK (where X was unknown) and you wanted to know the distribution
of possible values of X in a proteome, you would mark this check box and then search for SP.K.
The results would list which of the residues could be substituted for X, along with the number of each
found in the database.
- Sample ID: Type your sample name or other identifier.
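The combinatorial peptide output described above (searching for SP.K and tallying the residues found at the unknown position) can be sketched as follows. This is only an illustration in Python under stated assumptions: the pattern contains exactly one "." wildcard, the sequences are a toy list rather than a real database, and residue_distribution is a hypothetical name.

```python
import re
from collections import Counter


def residue_distribution(pattern, sequences):
    """Count the residues that occupy the single '.' position of pattern."""
    wildcard = pattern.index(".")
    # A lookahead lets overlapping occurrences be counted as well.
    regex = re.compile("(?=(" + pattern + "))")
    counts = Counter()
    for seq in sequences:
        for match in regex.finditer(seq):
            counts[match.group(1)[wildcard]] += 1
    return counts


toy_database = ["MSPAKLLSPGK", "SPAKSPTK", "AASPK"]
distribution = residue_distribution("SP.K", toy_database)
for residue, count in distribution.most_common():
    print(residue, count)
```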
Search Parameters
- Database: Select a database. See Databases.
- Species: Choose a species if you want to narrow the search possibilities and to accelerate
searches. Please see the list of species definitions
that ship with the software, as some definitions do not encompass all possible members. Retain
the default of All to search the entire database. Be aware that because of inconsistencies in
the way species information is organized in different databases, the Spectrum Mill workbench cannot read
about 10% of the species information in NCBInr, and cannot read any of the species information in trEMBL.
See Species Filtering.
- Search hits from file: Mark this check box to search hits saved from a previous search.
Type the filename of the saved hits. See Saving Hits.
- Save hits to file: Mark this check box to save your search hits. Type a
filename.
- DNA frame translation: See Frame Translation in DNA databases.
Note that this setting appears only if you select a DNA database.
- Digest: Select the enzyme used for the proteolytic digestion. See
Enzyme Specificity / Missed Cleavages.
- Maximum # of mismatched AA's: See Mismatched AA's - MS Edman.
- MW of protein: Type the molecular weight range for your protein, or mark the All
check box to search the entire database. See Intact Protein MW Filtering.
Modifications
- Click the Choose... button to select modifications appropriate for your sample. See
Choosing Modifications. The
molecular weight of the protein includes any Fixed modifications.
Remove any Fixed modifications if you do not want their masses
included.
Search Mode
- Search mode: See Search Mode - MS Edman. Your choice
of search mode determines which options are displayed below.
- Regular expression: If you chose Sequence Only or Sequence and Mass search
mode, type text here. See Regular Expressions - MS Edman.
- Filter by peptide mass: If you chose the Sequence and Mass search mode, type
masses here to eliminate candidate sequences that do not contain a peptide of the specified
mass(es). See Search Mode - MS Edman.
- Mass (m/z) - Type an m/z value. See Mass (m/z).
- Charge (z) - Type a charge. See Mass (m/z).
- Mass tolerance: See Mass Tolerance.
- Mass(es) are: See Mass Type.
- Name Regular Expression List, Accession Number List, Species List: If you chose the Name,
Accession Number, or Species search mode, type the appropriate text in the box.
See Search Mode - MS Edman.
Digestion of a User Supplied Sequence - MS Digest
To use MS Digest to digest a user supplied sequence:
- Select User Protein as the Database option.
- Read the instructions in the Protein sequence box.
- Paste or type the sequence in the Protein sequence box.
- Set the other MS Digest parameters as appropriate.
- Click the Digest button.
Database Entry Retrieval Method - MS Digest
It is possible to retrieve entries from the database by specifying either the Accession Number or the
Index Number. The accession number is a unique identifier for a protein within the database. It will not
change between subsequent revisions of the database and is external to the Spectrum Mill package. The index number
for a particular protein is internal to the Spectrum Mill package and is likely to change when you update the
database. Both the index number and the accession number are reported in Spectrum Mill search results. Entries
are generally more efficiently retrieved using index numbers.
To Use the MS Digest Form
MS Digest performs theoretical digestions and calculates masses of peptides that result. The program
accepts both user proteins and sequences from databases.
The following topics describe options available on the MS Digest form.
To return to default settings on the MS Digest page, click the Spectrum Mill button to go to
the Spectrum Mill home page. Then click the link on the home page to go back to the MS Digest page.
Digest
- Digest - Click to perform a theoretical digestion and calculate the masses of peptides that
result. Click the button after you have set all parameters.
- Report multiple charges - The MS Digest output usually lists only singly-charged peptides.
Mark this check box if you want the output to also include multiply charged peptides. The maximum
number of charges a peptide can have is based on the number of basic amino acids in the peptide.
- Hide protein sequence - The complete protein sequence is normally displayed in the MS Digest
output. Mark this check box to disable this display.
- Show only protein sequence - The list of peptides resulting from a digest is normally displayed
in the MS Digest output. Mark this check box to disable this display.
- Hide HTML links - The outputs from Spectrum Mill programs usually contain links to other Spectrum
Mill programs and internet pages (see General Features of Links from Program
Output). Mark this check box to disable these links if you want to reduce network traffic.
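As a rough picture of what a theoretical digestion involves, the sketch below cleaves a sequence with a commonly used trypsin rule and lists m/z values for each peptide. It is not the MS Digest code. The cleavage rule (after K or R, but not before P), the charge cap (one charge per R, K, or H, with a minimum of 1), and all function names are assumptions made for illustration. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in Da (not read from Spectrum Mill).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_digest(sequence, missed_cleavages=0):
    """Assumed rule: cleave after K or R unless the next residue is P."""
    sites = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    peptides = []
    for i in range(len(sites) - 1):
        last = min(i + 2 + missed_cleavages, len(sites))
        for j in range(i + 1, last):
            peptides.append(sequence[sites[i]:sites[j]])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residues plus one water for the termini."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mz(mass, charge):
    return (mass + charge * PROTON) / charge


def max_charge(peptide):
    """Assumed cap: one charge per basic residue (R, K, H), at least 1."""
    return max(1, sum(peptide.count(aa) for aa in "RKH"))


for pep in tryptic_digest("AKGRPLKM"):
    for z in range(1, max_charge(pep) + 1):
        print(pep, z, round(mz(peptide_mass(pep), z), 4))
```

Calling tryptic_digest with missed_cleavages=1 adds the peptides that span one uncleaved site.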
Protein
Modifications
Database
- Database: Select either User Protein or the name of a database. See
Databases. If you select User Protein, type a sequence
in the Protein sequence box, and ignore Retrieve database entry by and Database accession
number. If you select a database name, select an option for Retrieve database entry by.
- Retrieve database entry by: Select Accession Number and type an accession number
below, or select Index Number and type an index number below. See
Database Entry Retrieval Method - MS Digest.
- Database accession number: Type an accession number.
- MS Digest index number: Type an index number.
- Protein sequence - Type the sequence as instructed in the form. If you wish to include a user-specified
amino acid, use a lower-case "u" and fill in the elemental composition for User-specified amino acid
(u). See User-Specified Amino Acid.
- User-specified amino acid (u): If you wish to include a user-specified amino acid, fill in
the elemental composition here. Specify it as a lower-case "u" in the Protein sequence box
below.
Fragment Ion Types - MS Product
Check the boxes next to each ion type to list the corresponding fragment ion masses in the MS Product output.
The default ion types are those generally seen in MS/MS spectra. Supported ion types include:
Ion type | Restrictions
a, b, y | no restrictions
a-NH3, b-NH3, y-NH3 | ion contains R, K, or Q
b-H2O | ion contains S or T
b+H2O | ion contains R, H, or K; only bn-1, bn-2 (length n)
a-H3PO4, b-H3PO4, y-H3PO4 | ion contains phosphorylated S, T
b-SOCH4, y-SOCH4 | ion contains oxidized M
internal b | <800 Da
internal a | <800 Da, internal b present
internal b-H2O | <800 Da, internal b present, ion contains S or T
internal b-NH3 | <800 Da, ion contains R
N-term ladder | removal of N-term residues (y equiv.)
C-term ladder | removal of C-term residues (b+H2O equiv.)
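The way residue restrictions gate the listed ion types can be illustrated with a small sketch. It computes singly charged a, b, and y ions plus a few neutral-loss ions, applying the restrictions from the table above. It is an illustration only, not MS Product's algorithm: the function name fragment_ions is hypothetical, modifications and internal ions are ignored, and the masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in Da (not read from Spectrum Mill).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
CO = 27.994915
NH3 = 17.026549


def fragment_ions(peptide):
    """Singly charged fragment m/z values keyed by ion label."""
    ions = {}
    n = len(peptide)
    for i in range(1, n):
        prefix, suffix = peptide[:i], peptide[n - i:]
        b = sum(MONO[aa] for aa in prefix) + PROTON
        y = sum(MONO[aa] for aa in suffix) + WATER + PROTON
        ions[f"a{i}"] = b - CO
        ions[f"b{i}"] = b
        ions[f"y{i}"] = y
        # Neutral losses are listed only when the ion holds a required residue.
        if any(aa in "RKQ" for aa in prefix):
            ions[f"b{i}-NH3"] = b - NH3
        if any(aa in "ST" for aa in prefix):
            ions[f"b{i}-H2O"] = b - WATER
        if any(aa in "RKQ" for aa in suffix):
            ions[f"y{i}-NH3"] = y - NH3
    return ions


ions = fragment_ions("SAMPLER")
for label in sorted(ions):
    print(label, round(ions[label], 4))
```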
To Use the MS Product Form
MS Product calculates theoretical ion masses from peptides which undergo dissociation via post-source
decay or high- or low-energy collision-induced dissociation.
The following topics describe options available on the MS Product form.
To return to default settings on the MS Product page, click the Spectrum Mill button to go to
the Spectrum Mill home page. Then click the link on the home page to go back to the MS Product page.
Fragmentation
- Fragment - Click to calculate theoretical ion masses of peptides. Click it after you
have set all parameters.
- Calculate masses as: See Mass Type.
- Maximum reported charge: Set to the highest charge state you expect from your instrument.
Peptide Sequence
The variable modifications kmqsty are defined by default for MS Product. But if you select some other modification
of K, M, Q, S, T, or Y (for example, guanidination of K), then that modification is used instead. That is, the
default kmqsty modifications are defined in addition to whatever variable modifications you selected, but any
selected variable modifications have priority.
Modifications
Product Ion Types
- (some types require presence of specific AAs in ion) Mark check boxes for appropriate
ion types for your instrument. The default ion types are those generally seen in MALDI PSD spectra.
See Fragment Ion Types - MS Product.
- User-specified AA elemental composition (u): If you wish to include a user-specified
amino acid, fill in the elemental composition here. Specify it as a lower-case "u" in the Enter
sequence box above.
Combination Type - MS Comp
Amino Acid
Lists the amino acid combinations consistent with the search conditions. Some of these will have identical elemental
compositions.
Peptide Elemental
Lists the unique elemental compositions from the list of amino acid combinations reported by the Amino Acid
option.
Elemental
Lists the elemental compositions consistent with the search conditions. The elemental compositions returned will
obey the nitrogen rule and will have a double bond equivalent
within the range expected for a peptide. The elemental compositions are, however, not guaranteed to have corresponding
peptides. This option will work at much higher mass than the first two options.
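The Amino Acid combination type can be pictured as an exhaustive search over residue multisets whose total mass falls within the tolerance. The sketch below is a simple illustration, not the MS Comp algorithm. It assumes a singly protonated m/z, a tolerance in Da, and hypothetical parameter names. I and L have identical masses, so only L is enumerated and stands for either. The search grows quickly with mass, so it is practical only for small peptides.

```python
# Standard monoisotopic residue masses in Da; "L" stands for L or I.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def compositions(mh_plus, tol=0.01, required="", absent="", max_results=1000):
    """List residue compositions whose MH+ lies within tol Da of mh_plus."""
    target = mh_plus - WATER - PROTON - sum(MONO[aa] for aa in required)
    residues = sorted((mass, aa) for aa, mass in MONO.items() if aa not in absent)
    results = []

    def extend(start, remaining, chosen):
        if abs(remaining) <= tol:
            results.append("".join(sorted(required + "".join(chosen))))
            if len(results) > max_results:
                # Mirror the documented behaviour: an error, not a partial list.
                raise ValueError("maximum reported compositions exceeded")
        for idx in range(start, len(residues)):
            mass, aa = residues[idx]
            if mass > remaining + tol:
                break  # residues are sorted, so nothing heavier can fit
            chosen.append(aa)
            extend(idx, remaining - mass, chosen)  # idx allows repeated residues
            chosen.pop()

    extend(0, target, [])
    return results


print(compositions(234.1084, tol=0.005))
print(compositions(234.1084, tol=0.005, required="S"))
```

Several of the reported compositions share one elemental composition (for example AGS and GGT), which is why the Peptide Elemental option collapses them.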
Nitrogen Rule - MS Comp
The nitrogen rule states that for an organic compound with an even number of nitrogens (including 0), the nominal
mass of the molecular ion will be even; with an odd number of nitrogens it will be odd. Note that this rule was
first observed for EI spectra of small molecules, where the molecular ion is not protonated. Adding a proton flips
the parity, so for a peptide with an even number of nitrogens the nominal mass for an MH+ equivalent must be odd,
and for an odd number of nitrogens it must be even.
The nitrogen rule stems from the fact that most of the common elements that have even nominal masses have even
valence:
12C, valence = 4;
16O, valence = 2;
28Si, valence = 4;
32S, valence = 2.
On the other hand most of the elements with odd nominal masses have odd valence:
1H, valence = 1;
19F, valence = 1;
31P, valence = 3;
35Cl, valence = 1.
Nitrogen is an exception in that it has an even nominal mass but an odd valence:
14N, valence = 3.
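A parity check of this kind is easy to sketch. The example below is an illustration only, with hypothetical function names. It assumes a simple formula string such as C8H15N3O5, uses the nominal masses listed above, and treats a protonated ion as having the opposite parity from the neutral molecule.

```python
import re

# Nominal masses of the most abundant isotopes, as listed above.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32,
           "P": 31, "F": 19, "Cl": 35, "Si": 28}


def parse_formula(formula):
    """Parse 'C8H15N3O5' (spaces allowed) into {'C': 8, 'H': 15, ...}."""
    atoms = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] = atoms.get(element, 0) + int(count or 1)
    return atoms


def obeys_nitrogen_rule(formula, protonated=False):
    atoms = parse_formula(formula)
    nominal = sum(NOMINAL[el] * n for el, n in atoms.items())
    even_n = atoms.get("N", 0) % 2 == 0
    even_mass = nominal % 2 == 0
    # Neutral molecule: even N pairs with even mass. An added proton flips this.
    return (even_n == even_mass) != protonated


print(obeys_nitrogen_rule("C8H15N3O5"))                   # neutral tripeptide GAS
print(obeys_nitrogen_rule("C8H16N3O5", protonated=True))  # its MH+ ion
```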
Double Bond Equivalent - MS Comp
The double bond equivalent (DBE) is the number of rings or double bonds that an ion contains. It can be calculated
from the elemental formula as follows:
DBE = 1 - a/2 + c/2 + d
where:
a = number of atoms with a valence of 1 (H, F, Cl).
b = number of atoms with a valence of 2 (O, S). These atoms do not affect the DBE.
c = number of atoms with a valence of 3 (N, P).
d = number of atoms with a valence of 4 (C, Si).
If the calculated value ends in 0.5, subtract 0.5 to get the true value.
Amino Acid | DBE | Elemental Formula | Calculation
A | 1.0 | C3 H5 N1 O1 | 3 - 5/2 + 1/2
C | 1.0 | C3 H5 N1 O1 S1 | 3 - 5/2 + 1/2
D | 2.0 | C4 H5 N1 O3 | 4 - 5/2 + 1/2
E | 2.0 | C5 H7 N1 O3 | 5 - 7/2 + 1/2
F | 5.0 | C9 H9 N1 O1 | 9 - 9/2 + 1/2
G | 1.0 | C2 H3 N1 O1 | 2 - 3/2 + 1/2
H | 4.0 | C6 H7 N3 O1 | 6 - 7/2 + 3/2
I | 1.0 | C6 H11 N1 O1 | 6 - 11/2 + 1/2
K | 1.0 | C6 H12 N2 O1 | 6 - 12/2 + 2/2
L | 1.0 | C6 H11 N1 O1 | 6 - 11/2 + 1/2
M | 1.0 | C5 H9 N1 O1 S1 | 5 - 9/2 + 1/2
N | 2.0 | C4 H6 N2 O2 | 4 - 6/2 + 2/2
P | 2.0 | C5 H7 N1 O1 | 5 - 7/2 + 1/2
Q | 2.0 | C5 H8 N2 O2 | 5 - 8/2 + 2/2
R | 2.0 | C6 H12 N4 O1 | 6 - 12/2 + 4/2
S | 1.0 | C3 H5 N1 O2 | 3 - 5/2 + 1/2
T | 1.0 | C4 H7 N1 O2 | 4 - 7/2 + 1/2
V | 1.0 | C5 H9 N1 O1 | 5 - 9/2 + 1/2
W | 7.0 | C11 H10 N2 O1 | 11 - 10/2 + 2/2
Y | 5.0 | C9 H9 N1 O2 | 9 - 9/2 + 1/2
The terminal groups and the cation (the added proton) then contribute H3O to the overall elemental formula,
reducing the DBE by 1.5. The constant 1 from the DBE formula, which the per-residue calculations above omit,
is then added once for the whole ion.
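The per-residue arithmetic and the adjustment for a whole protonated peptide can be checked with a few lines of code. The sketch below is illustrative only, with hypothetical function names. It takes the residue formulas from the table, evaluates d - a/2 + c/2 per residue, and for a whole MH+ ion adds H3 and the constant 1, then drops a trailing 0.5 as described above.

```python
# Residue elemental formulas as (C, H, N, O, S), taken from the table above.
RESIDUE_FORMULAS = {
    "A": (3, 5, 1, 1, 0), "C": (3, 5, 1, 1, 1), "D": (4, 5, 1, 3, 0),
    "E": (5, 7, 1, 3, 0), "F": (9, 9, 1, 1, 0), "G": (2, 3, 1, 1, 0),
    "H": (6, 7, 3, 1, 0), "I": (6, 11, 1, 1, 0), "K": (6, 12, 2, 1, 0),
    "L": (6, 11, 1, 1, 0), "M": (5, 9, 1, 1, 1), "N": (4, 6, 2, 2, 0),
    "P": (5, 7, 1, 1, 0), "Q": (5, 8, 2, 2, 0), "R": (6, 12, 4, 1, 0),
    "S": (3, 5, 1, 2, 0), "T": (4, 7, 1, 2, 0), "V": (5, 9, 1, 1, 0),
    "W": (11, 10, 2, 1, 0), "Y": (9, 9, 1, 2, 0),
}


def residue_dbe(aa):
    """Residue contribution: C - H/2 + N/2 (the constant 1 is left out)."""
    c, h, n, _, _ = RESIDUE_FORMULAS[aa]
    return c - h / 2 + n / 2


def peptide_ion_dbe(sequence):
    """DBE of a singly protonated peptide, following the rule in the text."""
    c = sum(RESIDUE_FORMULAS[aa][0] for aa in sequence)
    h = sum(RESIDUE_FORMULAS[aa][1] for aa in sequence) + 3  # H3O: termini + proton
    n = sum(RESIDUE_FORMULAS[aa][2] for aa in sequence)
    raw = 1 - h / 2 + n / 2 + c
    # A value ending in 0.5 has the 0.5 subtracted.
    return raw - 0.5 if raw % 1 == 0.5 else raw


for aa in sorted(RESIDUE_FORMULAS):
    print(aa, residue_dbe(aa))
print("GG", peptide_ion_dbe("GG"))
```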
To Use the MS Comp Form
MS Comp fills in possible amino acid compositions for a peptide, given a peptide mass and partial composition
determined from immonium ions present in MS/MS spectra.
To return to default settings on the MS Comp page, click the Spectrum Mill button to go to the
Spectrum Mill home page. Then click the link on the home page to go back to the MS Comp page.
Compositions
- Composition - Click to calculate possible amino acid compositions. Click it after you
have set all parameters.
- Maximum reported compositions: Type the maximum number of compositions you wish to report.
If in the calculation this number is exceeded, you will see an error message rather than a partial list.
- Combination type: See Combination Type - MS Comp.
Peptide
- m/z: Type the peptide m/z. See Mass (m/z).
- Da +/- - Type the error for the peptide m/z. Choose units of either
Da or ppm. See Mass Tolerances.
- Charge (z): Select the charge that corresponds with the peptide m/z.
- m/z is: See Mass Type.
- Ion types: You can select one or more possible ion types for your m/z value.
The report will list all the possibilities for each ion type in turn.
AA Composition
- Based on immonium and related ions - Mark check boxes for the masses you observe in
your spectrum.
- Based on loss from precursor ion - Mark check boxes for losses you observe in your spectrum.
Amino Acids
- Absent amino acids (Check to prevent possible inclusion) - Mark check boxes for amino acids
you know are absent from your sample.
Modifications
- User defined amino acid: Mark the check box to define your own amino acid. Then fill in the
elemental composition. See user-specified amino acid.
- Click the Choose... button to select modifications appropriate for your sample. See
Choosing Modifications.
To Use the MS Isotope Form
MS Isotope calculates and displays isotope patterns of peptides. The following topics describe options
available on the MS Isotope form.
To return to default settings on the MS Isotope page, click the Spectrum Mill button to go to
the Spectrum Mill home page. Then click the link on the home page to go back to the MS Isotope page.
Isotope Distribution
- Calculate - Click to calculate isotope distribution. Click it after you have set all
parameters.
- Calculate masses as: See Mass Type.
- Show detailed report - Mark this check box to additionally display the contribution of individual
isotopes to the overall percentage of each isotopic mass.
- Peptide sequence - Click this option if you wish to calculate the isotope distribution for
an amino acid sequence. Then type the information described below under Peptide Sequence. The
output shows both the isotopic distribution of the amino acid sequence you typed, and that of "averagine"
for the same precursor mass. The averagine cluster shows the mass distribution you get if you assume
that the peptide is made up of "average" amino acids. The elemental composition of averagine is C 4.9384
H 7.7583 N 1.3577 O 1.4773 S 0.0417. (See Senko MW, Beu SC, McLafferty FW, "Determination of monoisotopic
masses and ion populations for large biomolecules from resolved isotopic distributions," J Am Soc
Mass Spectrom 1995, 6:229-233.)
- Elemental composition - Click this option if you wish to calculate the isotope distribution
for an elemental composition. Then type the information described below under Elemental Composition.
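To make the averagine idea concrete, the sketch below scales the averagine composition quoted above to a target mass and rounds to whole atoms. It is a simplified illustration rather than the MS Isotope procedure. The function names are hypothetical, standard average atomic masses are assumed, and no hydrogen adjustment is made to recover the exact target mass after rounding.

```python
# Averagine composition per "average" residue, as quoted above (Senko et al.).
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
# Standard average atomic masses in Da (an assumption of this sketch).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def averagine_unit_mass():
    """Average mass of one averagine unit, about 111.125 Da."""
    return sum(AVERAGINE[el] * AVERAGE_MASS[el] for el in AVERAGINE)


def averagine_formula(mass):
    """Approximate elemental composition of an average peptide of this mass."""
    units = mass / averagine_unit_mass()
    return {el: round(AVERAGINE[el] * units) for el in AVERAGINE}


print(round(averagine_unit_mass(), 4))
print(averagine_formula(1111.25))
```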
Peptide Sequence
- Sequence - Enter the sequence as instructed in the form. If you wish to include a user-specified
amino acid, use a lower-case "u" and fill in the elemental composition at the bottom of the form.
See User-Specified Amino Acid.
- User-specified AA elemental composition (u): If you wish to include a user-specified amino
acid, fill in the elemental composition here. Specify it as a lower-case "u" in the Sequence
box above.
Elemental Composition
- Enter the elemental composition. Fill in a number for each atomic species.
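One common way to obtain an isotope pattern from an elemental composition is to convolve the per-element isotope abundances atom by atom. The sketch below illustrates that technique. It is not necessarily how MS Isotope computes its output. It bins peaks by nominal mass offset, keeps only the first few peaks, and uses approximate natural abundances that are assumptions of this example.

```python
# Approximate natural abundances, indexed by nominal mass offset (+0, +1, ...).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, limit):
    """Polynomial product of two abundance lists, truncated to limit peaks."""
    out = [0.0] * min(len(a) + len(b) - 1, limit)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_pattern(composition, peaks=5):
    """Relative intensities (most intense peak = 100) for M, M+1, M+2, ..."""
    pattern = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            pattern = convolve(pattern, ISOTOPES[element], peaks)
    top = max(pattern)
    return [100.0 * p / top for p in pattern]


# Composition of the neutral tripeptide GAS, C8 H15 N3 O5.
print([round(p, 2) for p in isotope_pattern({"C": 8, "H": 15, "N": 3, "O": 5})])
```

Truncating to a fixed number of peaks does not change the retained peaks, because each peak depends only on lower mass offsets.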
Modifications
To Use the Peptide String Match Form
Peptide String Match finds peptides that contain a specific sequence of amino acids.
To use this tool:
- Under Utilities, click Peptide String Match.
- Under Database, select the database for which you want
to find peptides that match a given peptide sequence.
- Set options as described below.
- Click Find Peptides.
Filters
- Require Preceding Tryptic Site
- Subset of Accession #'s to search - Type or
paste the accession numbers. Leave the box empty if you want to
find all proteins in the database that match your criteria.
Output Features
- All gene symbols - Supported only for UniProt,
SwissProt and IPI databases
- All Matched Peptides - Select to show the
peptide sequences that were matched.
- Species
- All accession numbers - Select to show the
protein accession numbers for the peptides that were matched.
- All Start AA - Select to show the
locations of the peptide sequence in the proteins.
- Link to ClustalW Multi-Aligner - Provides a
link to align the sequences of the matched proteins with the input
peptide highlighted.
- Link to Database Website
- Show flanking sequence around variable
modification sites - Select this option to indicate the variable
modification site motif within the number of residues indicated,
and for the selected modifications. Output will center on the
specified lower-case amino acid in the sequence and will show the
designated number of leading and trailing residues in the protein
sequence. The flanking residues need not be present in the input
peptide. The output is intended for downstream use with motif logo
generation programs.
Enter Peptide Sequences
- Type or paste the sequences you wish to match. Indicate
variable modifications with a lower-case letter. String matching is
case-insensitive. Regular expressions are also allowed.
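The matching, the tryptic-site filter, and the flanking-sequence output can be illustrated together. The sketch below is a hypothetical Python rendering, not the Peptide String Match code. It assumes plain (non-regex) peptide strings, a dictionary of accession number to sequence as the database, and that the protein N-terminus counts as a valid preceding site. Positions beyond a protein end are padded with "-".

```python
import re


def match_peptide(peptide, proteins, require_tryptic=False, flank=0):
    """Return (accession, 1-based start, flanking motifs) for each match.

    Lower-case letters in peptide mark variable modification sites.
    """
    query = peptide.upper()  # matching itself ignores case
    mod_offsets = [i for i, ch in enumerate(peptide) if ch.islower()]
    hits = []
    for accession, seq in proteins.items():
        for match in re.finditer("(?=(" + query + "))", seq):
            start = match.start()
            if require_tryptic and start > 0 and seq[start - 1] not in "KR":
                continue
            motifs = []
            for offset in mod_offsets:
                site = start + offset
                left = seq[max(0, site - flank):site].rjust(flank, "-")
                right = seq[site + 1:site + 1 + flank].ljust(flank, "-")
                motifs.append(left + seq[site].lower() + right)
            hits.append((accession, start + 1, motifs))
    return hits


proteins = {"P1": "MKAASPEKTR", "P2": "GGAASPEK"}  # toy entries
print(match_peptide("AAsPEK", proteins, flank=3))
print(match_peptide("AAsPEK", proteins, require_tryptic=True, flank=3))
```

The flanking residues come from the protein, so the motif can extend past the ends of the input peptide.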
To Use the Peptide List to Masses Form
Peptide List to Masses calculates the masses and formulas for a set of peptides that you specify.
Calculate Masses
- Calculate masses as:
- Monoisotopic - Select to calculate masses based on the most abundant isotope of each element
- Monoisotopic no e- - Select to calculate masses based on one isotope with no charge
- Average - Select to calculate masses based on the abundance-weighted average mass of the isotopes of each element
- Reported precursor charge Min: Max: -
Type a minimum and maximum charge for the specified peptides
- Limited actual charge by:
- RKH present - Select if arginine, histidine and lysine are present.
- RKHQN present - Select if glutamine and asparagine are present.
- Above min/max - Select if the limited actual charge minimum and maximum are the same as the reported ones.
- Calculate - Click to calculate the masses and formulas for the specified peptides with or without modifications.
Modifications
Peptide Sequences
- Enter the peptide sequences whose masses and formulas you want to calculate. Follow the instructions above the text box.
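The calculation behind a masses-and-formulas report can be sketched as follows. This is an illustration under stated assumptions, not the Peptide List to Masses code: it handles unmodified peptides only, takes residue formulas from the MS Comp table, uses standard element masses, and the function names are hypothetical.

```python
# Residue elemental formulas as (C, H, N, O, S).
RESIDUE_FORMULAS = {
    "A": (3, 5, 1, 1, 0), "C": (3, 5, 1, 1, 1), "D": (4, 5, 1, 3, 0),
    "E": (5, 7, 1, 3, 0), "F": (9, 9, 1, 1, 0), "G": (2, 3, 1, 1, 0),
    "H": (6, 7, 3, 1, 0), "I": (6, 11, 1, 1, 0), "K": (6, 12, 2, 1, 0),
    "L": (6, 11, 1, 1, 0), "M": (5, 9, 1, 1, 1), "N": (4, 6, 2, 2, 0),
    "P": (5, 7, 1, 1, 0), "Q": (5, 8, 2, 2, 0), "R": (6, 12, 4, 1, 0),
    "S": (3, 5, 1, 2, 0), "T": (4, 7, 1, 2, 0), "V": (5, 9, 1, 1, 0),
    "W": (11, 10, 2, 1, 0), "Y": (9, 9, 1, 2, 0),
}
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720707}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}
PROTON = 1.007276


def peptide_formula(sequence):
    """Sum the residue formulas and add H2O for the two termini."""
    totals = {"C": 0, "H": 2, "N": 0, "O": 1, "S": 0}
    for aa in sequence:
        for element, count in zip("CHNOS", RESIDUE_FORMULAS[aa]):
            totals[element] += count
    return totals


def formula_mass(formula, element_masses):
    return sum(element_masses[el] * n for el, n in formula.items())


def format_formula(formula):
    return " ".join(f"{el}{n}" for el, n in formula.items() if n)


def report(sequence, min_charge=1, max_charge=2):
    formula = peptide_formula(sequence)
    mono = formula_mass(formula, MONOISOTOPIC)
    print(sequence, format_formula(formula),
          "mono", round(mono, 4), "avg", round(formula_mass(formula, AVERAGE), 3))
    for z in range(min_charge, max_charge + 1):
        print("  z =", z, "m/z =", round((mono + z * PROTON) / z, 4))


for peptide in ["GAS", "SAMPLER"]:
    report(peptide)
```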