Spectrum Mill Utility Programs

Archiving data saves disk space and makes it easier and faster to copy data to a backup drive. Data archiving can also be automated in a workflow. Archiving folders with large numbers (tens of thousands) of hits (.spo files) can take some time.

Under Utilities, click Archive Data.
Under Data Directories, click Select and choose the directory whose files and/or folder you intend to archive or un-archive (zip or unzip).
Select the general categories of files or directory you intend to archive and how you want to archive them.
Below the options in each category, please read the description of what happens when you select an option.

Instrument-created files

Ignore instrument data files
Delete data files after making placeholder

Spectrum Mill search results files

Ignore
Zip results_mstag/*.spo to spo.zip
Unzip spo.zip to results_mstag/*.spo

Spectrum Mill spectral files

Ignore
Zip cpick_in/*.pkl to pkl.zip
Unzip pkl.zip to cpick_in/*.pkl

Spectrum Mill data directories

Ignore
Zip dataDir/*.* to dataDir.zip

Click Archive.
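The zip and unzip options for search results can be pictured with a short Python sketch. This is only an illustration of the idea, not the code that Archive Data runs; the function names are hypothetical, and placing spo.zip directly inside the data directory is an assumption.

```python
import zipfile
from pathlib import Path


def zip_spo_files(data_dir):
    """Zip every results_mstag/*.spo file under data_dir into spo.zip."""
    root = Path(data_dir)
    archive = root / "spo.zip"  # assumed location of the archive
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for spo in sorted((root / "results_mstag").glob("*.spo")):
            # Store paths relative to the data directory so that
            # unzipping recreates results_mstag/<name>.spo.
            zf.write(spo, arcname=spo.relative_to(root).as_posix())
    return archive


def unzip_spo_files(data_dir):
    """Restore results_mstag/*.spo from spo.zip in data_dir."""
    root = Path(data_dir)
    with zipfile.ZipFile(root / "spo.zip") as zf:
        zf.extractall(root)
```

The sketch leaves the original .spo files in place after zipping; whether the real utility removes them is described on the form itself.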

To Use the Peptide Selector Form

Peptide Selector performs theoretical digestions on each protein supplied by accession number or sequence and then automatically selects from the theoretical peptides those that fit specific filtering criteria. The most common uses are:

List only those peptides suited to be synthesized with a stable isotope label and used for quantitation via a multiple reaction monitoring (MRM) experiment.
List only those peptides suited for incorporation into an accurate mass inclusion list to be used in a data-dependent MS/MS experiment.
List only those peptides expected to give doubly-charged ESI spectra.
List only the likely detectable forms of all peptides that contain a possible phosphorylation site.
List only those peptides that contain cysteines.
List the peptides to expect in specific fractions from isoelectric focusing by off-gel electrophoresis.

You can use Peptide Selector to create MRM or Q-TOF inclusion lists based on selection criteria. You may also create Q-TOF MS/MS target lists from prior results, filtered by unidentified and/or single-peptide-hit proteins.

In many ways, Peptide Selector is similar to MS Digest. Both Peptide Selector and MS Digest perform automatic protein digestions. The difference is that Peptide Selector additionally creates limited lists based on specific criteria. You can use it to create an inclusion list for MS/MS analysis.

The following topics describe options available on the Peptide Selector form. If you see settings on the form in green font, that means that you have marked the check box for Penalize rather than exclude, and the green text indicates the settings to which that applies.

Selection

Select - Click to perform a theoretical digestion, calculate the masses of peptides that result, and select from these the ones that meet the criteria you set in Criteria for Excluding Peptides. Click the button after you have entered the accession number(s) or sequence of the protein of interest and have set all parameters (or loaded them from a parameter file). If you have a protein name or partial sequence but do not know the protein's accession number, use the Spectrum Mill program MS Edman to search a database and retrieve the accession number.
Save As - Click to save current settings in a parameter file. (Peptide Selector allows the use of parameter files, but they cannot be used within an automated workflow.)
Load - Click to load a parameter file that contains settings for Peptide Selector.
Hide HTML links - Mark this check box if you want to generate results that are easier to cut and paste into Microsoft® Excel.

Saved File Parameters

Generate inclusion list text file - Mark this check box to have Peptide Selector put the results into an inclusion list (a text file in tab-separated format) for use in data acquisition. When you mark this check box, more settings appear. Choose one of these formats:

MassHunter QQQ MRM list
MassHunter QQQ MRM Optimizer list
MassHunter Q-TOF MS/MS target list
MassHunter AM Database
Xcalibur inclusion list

Valid Results to Filter

When MassHunter Q-TOF MS/MS target list is selected, the target list may be based on validated results from a prior inclusion list run. Mark the From results check box to create the list based on proteins that were not identified or had only a single peptide hit in the selected results.

Make sure the list of protein accession numbers in Protein(s) to Select From is the same as in the original inclusion list that the results are based on. Select one of the following:

Unidentified + Single peptide : Includes only peptides from both unidentified proteins and single peptide hit proteins
Single peptide : Includes only peptides from proteins that had only one valid peptide
Unidentified : Includes only peptides from proteins that were not identified in the results

Select one or more data folders that contain the validated results to base the selection on.

IEF: This check box appears when you mark Generate inclusion or MRM list file. Mark the IEF check box if you do off-gel electrophoresis and you want to predict the fractions that will contain peptides of interest. This prediction reduces the number of fractions you must analyze. The IEF selector paper is published online in the proceedings of the IEEE BIBM 2014 conference: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6999136.

Filename: To avoid overwriting an existing file, type a new file name for the inclusion list. Otherwise, keep the default lastPeptideSelectorResult.txt. This file is found in the \SpectrumMill\results_msedman folder.

Run time (min): Type an upper limit for the LC retention time you wish to include in the file. Use this setting to exclude unwanted peaks at the end of the run.

List Type: Choose either m/z or uncharged.

Uncharged - Choose this to export a list to create an accurate mass RT database for MassHunter Qualitative Analysis.
m/z - Choose this to export a list to MassHunter Acquisition for Q-TOF.

Precursor Charge: Choose a minimum and maximum precursor charge. In ion trap CID fragmentation mode, the 2+ charge state is preferred for peptides because it is most likely to yield b- and y-ions of high abundance. For Q-TOF instruments, higher charge states can also yield good fragmentation. For the Agilent Q-TOF, the 2+ and 3+ charge states typically yield an equal number of peptide identifications.

Precursor m/z: Choose a minimum and maximum precursor m/z.
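The interaction between the Precursor Charge and Precursor m/z ranges can be sketched as follows. This is a minimal illustration, not the program's code; it assumes the standard relation m/z = (M + z x proton mass) / z for a peptide of neutral monoisotopic mass M, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # monoisotopic mass of a proton, in Da


def precursor_mz(neutral_mass, charge):
    """m/z of a peptide of the given neutral mass at the given charge."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def inclusion_entries(neutral_mass, min_z, max_z, min_mz, max_mz):
    """Return (charge, m/z) pairs that fall inside both selected ranges."""
    entries = []
    for z in range(min_z, max_z + 1):
        mz = precursor_mz(neutral_mass, z)
        if min_mz <= mz <= max_mz:
            entries.append((z, round(mz, 4)))
    return entries
```

For a peptide with a neutral mass of 1500.0 Da, charges 1 to 3, and an m/z range of 300 to 1000, only the 2+ and 3+ forms would be listed, because the 1+ form lies above the upper m/z limit.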

When you are ready to transfer the contents of the text file into the data acquisition software, see instructions below.

Digest Parameters

Digest: Select the enzyme used for the proteolytic digestion. See Enzyme Specificity / Missed Cleavages.
Maximum # missed cleavages: Set the maximum number of missed enzymatic cleavages. See Enzyme Specificity / Missed Cleavages.
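A theoretical digestion with missed cleavages can be sketched in a few lines of Python. The cleavage rule used here (cut after K or R unless the next residue is P) is a common simplification of trypsin specificity and is an assumption; the enzyme definitions in Spectrum Mill may differ, and the function name is hypothetical.

```python
def digest(sequence, max_missed=0):
    """Return peptides from a simplified tryptic digest of sequence."""
    # Cleavage positions: after K or R, unless the next residue is P.
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        # Each extra missed cleavage joins one more adjacent fragment.
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides
```

With Maximum # missed cleavages set to 0, each fragment is reported once; raising it to 1 also reports every pair of adjacent fragments joined together.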

Product Ion Parameters

Show Product Ion Masses - Mark this check box to report product ion masses for each peptide. The output will report the b₂ ion and all y ions. Cleavage sites that are expected to produce intense product ion signal (N-terminal side of Pro and the C-terminal side of Asp and Glu) will be highlighted.
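The b₂ and y ion masses in such a report can be approximated with the sketch below. It assumes singly charged, unmodified product ions and standard monoisotopic residue masses; it is not the program's own calculation, and the function names are hypothetical. The full-length peptide is not listed as a y ion here.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b2_ion(peptide):
    """Singly charged b2 ion: first two residues plus a proton."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[:2]) + PROTON


def y_ions(peptide):
    """Return (label, m/z, enhanced) for singly charged y1 .. y(n-1)."""
    ions = []
    for n in range(1, len(peptide)):
        fragment = peptide[-n:]
        mz = sum(RESIDUE_MASS[aa] for aa in fragment) + WATER + PROTON
        left = peptide[-n - 1]  # residue on the N-terminal side of the bond
        # Intense cleavage expected N-terminal to Pro or C-terminal to Asp/Glu.
        enhanced = fragment[0] == "P" or left in "DE"
        ions.append(("y%d" % n, round(mz, 4), enhanced))
    return ions
```

The enhanced flag mirrors the highlighting rule described above: a cleavage is flagged when the fragment starts with Pro or when the residue just before the cleaved bond is Asp or Glu.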

Criteria for Excluding Peptides

Max. # basic residues (RHK): Select a maximum number, to exclude sequences with multiple basic residues.
Peptide MH⁺: Select a minimum and maximum MH⁺.
AA Composition Filtering
- AAs Required: Requires that candidate peptides from the database contain the specified amino acid(s). To disable, leave the box blank.
- Disallowed: Requires that candidate peptides from the database do not contain the specified amino acid(s). To disable, leave the box blank.
Peptide exclusion criteria: Select options to exclude sequences that may be ambiguous because certain amino acids could be modified.
- Has nearby cleavage site within n residues - Mark this check box to exclude all forms of each peptide that contain an enzymatic cleavage site within n residues of either end of the peptide. Because such a filter does not make sense if more than 0 missed cleavages are allowed, the program generates an error if Maximum # missed cleavages is greater than 0. The n residues setting also determines how many previous and next amino acids to display in the output (independent of whether you have marked the check box).
- Contains peptide N-terminal Gln to pyroGlu - Any instance of glutamine at the N-terminus of a peptide (following digestion) could exist as either normal Gln or as pyroglutamic acid. Mark this check box to exclude all forms of such peptides. The program then excludes any peptide with a leading Gln, whether or not you select pyroglutamic acid as a variable modification (under Modifications in the form).
- Contains protein N-terminus Acetylatable - For any database entry with a Met at the N-terminus, the N-terminal peptide is considered as either in its original form or in a form where the Met is removed and the next amino acid is acetylated. Mark this check box to exclude all forms of such peptides. In other words, you exclude any peptide at the N-terminus of the protein sequence if the N-terminal residue is a Met. While this post-translational modification does not occur in bacteria, Peptide Selector does not know that. Furthermore, if the database curators have removed the N-terminal Met from the sequence, then Peptide Selector does not apply the acetylation modification nor exclude the unmodified peptide.
- Contains consensus N-linked glycosylation site - Mark this check box to exclude all forms of each peptide that contains the sequence N-x-[S/T], where x represents any amino acid, and S/T means either S or T.
- Contains no variable modification - Mark this check box to exclude the unmodified forms of all peptides that meet all of the other filtering criteria. When you mark this check box, the list includes only peptides that contain a variable modification (for example, phosphopeptides).
Protein Position Filtering: Restricts the selection based on the position of the peptides within the protein.
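Several of the exclusion criteria above are simple tests on the peptide sequence. The sketch below shows three of them: the basic-residue count, N-terminal Gln, and the N-x-[S/T] glycosylation motif as defined above. It is an illustration under stated assumptions, not the program's logic; the function name and reason strings are hypothetical, and the motif is checked only inside the peptide, not across its boundaries.

```python
import re

# Consensus N-linked glycosylation site as described above: N-x-[S/T].
N_GLYCOSYLATION = re.compile(r"N.[ST]")


def exclusion_reasons(peptide, max_basic=2):
    """Return the list of criteria that would exclude this peptide."""
    reasons = []
    if sum(peptide.count(aa) for aa in "RHK") > max_basic:
        reasons.append("too many basic residues")
    if peptide.startswith("Q"):
        reasons.append("N-terminal Gln")
    if N_GLYCOSYLATION.search(peptide):
        reasons.append("N-linked glycosylation motif")
    return reasons
```

A peptide passes these three filters when the returned list is empty.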

Modifications

Click the Choose... button to select modifications appropriate for your sample. See Choosing Modifications.

Protein(s) to Select From

Database: Select either User Protein or the name of a database. See Databases. If you select User Protein, type or paste a sequence in the User-supplied sequence box.
User-supplied sequence - Type or paste a sequence as instructed in the form.

Search Mode

Count Peptide Uniqueness in Database by - Select None to skip this feature. (The program will also run faster.) Select Sequence to count the number of times each peptide is present in the entire database, including the accession numbers from which peptides were selected.
Species - Select a species if you wish to apply a species filter prior to the uniqueness count.

Scoring

Penalize rather than exclude - Mark this check box if too few peptides pass the filters and you want to penalize, rather than exclude, peptides. When you mark this check box, additional settings appear below. The settings for which you can apply penalties change to green font. These settings no longer cause peptides to be excluded.

Max reported peptides/protein - Type a maximum.
Excess basic residues - Type a penalty factor.
Nearby cleavage site - Type a penalty factor - grayed out if the Has nearby cleavage site ... check box is clear.
Disallowed AA composition - Type a penalty factor - grayed out if AAs Disallowed box is empty.
Lower relative uniqueness - Type a penalty factor - grayed out if None is selected for Count Peptide Uniqueness ....

Output Features

Link from Peptide Sequence - Click this link to produce an MS Product report of all the b/y product ion masses for the peptide.
Link from #DB entries - Click this link to produce an MS Edman report of all the proteins in the database that contain the peptide.
Link from #DB entries, summary line - Click this link to produce a Multiple Sequence Alignment of all the proteins in the database that contain at least one of the selected peptides. The alignment will highlight all of the selected peptides.
Link from Accession Number - Click this link to show a coverage map for the protein that highlights all the selected peptides.

To transfer the inclusion list into MassHunter software

Transfer to MassHunter Data Acquisition for Triple Quadrupole

To transfer the inclusion list from the text file into MassHunter Data Acquisition for Triple Quadrupole, do the following:

Do one of the following:

When the Peptide Selector results appear, click the blue link to the text file. (Do not click the link until you see the results.)
Open the saved text file. The file is in the results_msedman folder within your Spectrum Mill installation.

Select the contents of the file (Press CTRL + A on your keyboard).
Copy the contents of the file (Press CTRL + C on your keyboard).
Do one of the following:

Paste directly into MassHunter Data Acquisition and omit the rest of the steps.
Open Microsoft Excel.

Paste as text into Excel. (In Excel click Paste > Paste Special, then in the dialog box, click Text, then click OK.)
Make any necessary edits.
(Optional) Save as an Excel file, for future reference.
Copy the contents from the Excel file and paste into the MRM table in the MassHunter Data Acquisition for Triple Quadrupole.

Import to MassHunter Data Acquisition for Q-TOF

Follow these instructions to import a Q-TOF target list:

In Peptide Selector, mark Generate inclusion or MRM list file.
Select MassHunter Q-TOF MS/MS target list.
In the Filename text box, type a name of your choice with a .txt suffix.
From the List Type list, select m/z.
Click Select.
In MassHunter Data Acquisition, click the Targeted List tab.
Right-click the Targeted List table and click Import.
Import the TXT file from the \SpectrumMill\results_msedman folder.

Note: The Open dialog box for importing to MassHunter initially allows only selection of .csv files. To force the file name selection to list all files, type "*" and press enter, then select the .txt file.

Export to MassHunter Accurate Mass (AM) Database

Follow the instructions below to create a file with a .csv suffix containing peptide neutral mass formulas that can be searched in MassHunter Qualitative Analysis as an accurate mass database, or used with the Find By Formula algorithm. This database allows you to verify hits in MassHunter Qualitative Analysis, especially one-hit wonders.

In Peptide Selector, mark Generate inclusion or MRM list file.
Select MassHunter AM database.
In the Filename text box, type a name of your choice with a .csv suffix.
From the List Type list, select uncharged.
Click Select.
Copy the .csv file located in the \SpectrumMill\results_msedman folder to the MassHunter\PCDL folder or MassHunter\databases folder, whichever exists on your system.
Within MassHunter Qualitative Analysis select the file as the database to search or use it in Find By Formula.

To Use the MRM Selector Form

The MRM Selector uses Spectrum Mill MS/MS Search results to create multiple reaction monitoring (MRM) lists for triple quadrupole instruments. You select data that has been searched, then filter the results to include only those peptides that meet certain requirements. (The filters are similar to those in Protein/Peptide Summary.)

The Agilent Q-TOF Data Extractor has been enhanced to output the collision energy (CE), peak apex, and chromatographic peak width in the specFeatures.tsv file, so you can use these values in dynamic MRM (DMRM) generation.

You may instead choose to let the program calculate the collision energies based on an equation.
You have the option to type a chromatographic peak width that will apply for all peaks in the DMRM analysis. This peak width, which corresponds to the delta RT setting in the MassHunter Data Acquisition software, is the retention time window for which the MRM transitions are monitored. For example, if you have a peak apex at 2.5 min and a delta RT of 1.0 min, the MassHunter Data Acquisition software monitors the MRM transitions for that peak from 2.0 min until 3.0 min.
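The retention time window in that example is centered on the peak apex, so it spans half of delta RT on each side. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def dmrm_window(apex_min, delta_rt_min):
    """Return (start, end) of the monitoring window, in minutes."""
    half_width = delta_rt_min / 2.0
    return (apex_min - half_width, apex_min + half_width)
```

Calling dmrm_window(2.5, 1.0) reproduces the 2.0 min to 3.0 min window described above.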

Select Results for MRM Selection

Select MRMs - Click to select MRM transitions that meet your criteria. Click this button after you have either loaded the desired parameter file or manually set the parameters. The name of the current parameter file appears in red at the top of the form.
Save As - Click to save current MRM Selector settings in a parameter file. (MRM Selector allows the use of parameter files, but you do not use MRM Selector within a workflow.)
Load - Click to load a parameter file that contains settings for MRM Selector.
Format: Select the format for the output of MRM Selector. Choose one of these formats:

Agilent Triple Quad DMRM
Agilent Triple Quad MRM
Agilent Triple Quad Optimizer
ABI Triple Quad

Filter to distinct peptides: To select only the instance of a particular peptide with the highest MS/MS Search score, select one of the following:
- Off -- Disables the filtering.
- Case insensitive -- When collapsing to "distinct", a case-insensitive string compare is used, thus peptides with variable modifications (lowercase AA's) and unmodified peptides are combined.
- Case sensitive -- When collapsing to "distinct", a case-sensitive string compare is used, thus peptides with variable modifications (lowercase AA's), different localizations of those variable modifications, and unmodified peptides are kept separate.
- Charge file CS -- When collapsing to "distinct", a case-sensitive string compare is applied to both the sequence and spectrum filename prefix, thus peptides from different LC-MS/MS runs and those with different precursor charges are kept separate.
Data directories: Click the Select ... button to select a data directory or data directories. See Selecting Data Directories.
Search result files: Modify this list if you want to summarize only a subset of the files in the data directory. Wildcards (*) are supported. To see the names of your search result files, look in the results_mstag subdirectory under the directory where you placed your raw files.
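The Filter to distinct peptides modes differ only in the key used to decide whether two hits are the same peptide. The sketch below keeps the highest-scoring hit per key. It is an illustration, not the program's code: the dictionary fields and mode names are hypothetical, and it assumes that the spectrum filename prefix alone distinguishes runs and precursor charges in the Charge file CS mode.

```python
def filter_distinct(hits, mode="case_insensitive"):
    """Keep the best-scoring hit per distinct peptide.

    hits: list of dicts with 'sequence', 'score' and 'file_prefix' keys.
    """
    if mode == "off":
        return list(hits)

    def key(hit):
        if mode == "case_insensitive":
            # Modified (lowercase) and unmodified forms collapse together.
            return hit["sequence"].upper()
        if mode == "case_sensitive":
            return hit["sequence"]
        if mode == "charge_file_cs":
            return (hit["sequence"], hit["file_prefix"])
        raise ValueError("unknown mode: %s" % mode)

    best = {}
    for hit in hits:
        k = key(hit)
        if k not in best or hit["score"] > best[k]["score"]:
            best[k] = hit
    return list(best.values())
```

Moving from Case insensitive to Case sensitive to Charge file CS makes the key progressively more specific, so each mode keeps at least as many hits as the one before it.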

Validation and Sorting

Filter results by: Choose a filter. See Peptide Validation.
Protein grouping method: Determines how proteins are grouped in certain protein summary modes.
- 1 shared peptide - When a peptide sequence >8 residues long is contained in multiple protein entries in the sequence database, the software groups the proteins together and then reports the highest-scoring one and its accession number.
- 1 shared peptide, expand subgroups - The software initially groups the proteins as described for 1 shared peptide. In some cases when the protein sequences are grouped in this manner, there are distinct peptides that uniquely represent a lower-scoring member of the group (isoforms and family members). When you choose 1 shared peptide, expand subgroups, more than one member of the group is reported and counted towards the total number of proteins.
Sort proteins by: Determines how proteins are sorted in the text file.
Filter by protein score: Includes only proteins that match specified score criteria.
Top n peptides for MRM: Do one of the following:

If you want to limit the number of peptides per protein, select Limit to and type a number in the box.
Otherwise, select Take all.

Rank peptides by: Determines how peptides are ranked so the program can decide the top n peptides. You can choose either the database search score or the total intensity.
Sort MRM List by: Determines the order of the peptides in the MRM list.
Filter peptides by: Permits display of only peptides that match specified criteria.

Score: Filters by database search score.
% SPI: Filters by percent scored peak intensity. This is the percentage of the spectral peak-detected ion current explained by the search interpretation.
Required AAs: Filters search results so that peptides are shown only if they contain the required amino acid(s). To disable, select any. See Amino Acid Filtering.
Disallowed AAs: Filters search results so that peptides are not shown if they contain disallowed amino acid(s). To disable, select none. See Amino Acid Filtering.
Peptide pI: Filters search results by peptide pI. Fill in a range, or mark the check box for All. If you wish to use the pI filter for modified peptides, ask your server administrator to first verify that the pK of the modified amino acid is specified in smconfig.std.xml or smconfig.custom.xml. Spectrum Mill server administrators may set the pK values for modifications when they define modifications (only necessary if the pK values are different from those of the unmodified amino acid).
Accession #'s: Filters search results by accession numbers. You can type or paste a list of accession numbers in various formats (space-separated, separated by ‘|’, comma-separated, etc).
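A tolerant parser for a pasted accession list might look like the sketch below. It handles only the three separators named above (whitespace, '|' and commas); any other formats the form accepts are not covered, and the function name is hypothetical.

```python
import re


def parse_accessions(text):
    """Split a pasted list on whitespace, '|' or commas."""
    return [token for token in re.split(r"[\s,|]+", text.strip()) if token]
```

Runs of mixed separators, such as a '|' surrounded by spaces, are treated as a single separator.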

MRM Parameters

Text file export - Creates a file in the data folder, with the name MRMSelectorExport.#.txt, where # is a number that increments so that none of the files are overwritten.
Simple sequence - Mark this check box if you plan to use Dynamic MRM on the Agilent MassHunter QQQ.

If you do not mark the check box, the Compound Name column reports the peptide in an annotated form: <accession #>_sequence_<iontype>, which is properly handled by most other software, but not by Agilent QQQ DMRM.
When you mark Simple sequence, MRM Selector reports only the peptide sequence in the Compound Name column, and reports the accession # and ion type as additional columns. (Do not paste these additional columns into the QQQ Acquisition table.)

Top n transitions: Allows you to specify the number of MRM transitions to monitor for each peptide. The recommended setting is 5; you usually want to end up with 3 good transitions, and monitoring 5 initially allows you to later delete any transitions that show interferences.
≥ % of precursor - Restricts the MRM transitions to those where the product ion m/z values are greater than or equal to the precursor m/z values (for example, a doubly charged precursor and a singly charged product ion).
y ions only - Restricts MRM transitions to y-ions.
Z options: Select one of the following settings:

Observed precursor/fragment charge only - uses only the precursor and fragment charges that are available in the data file(s) that you selected on this form, and that pass the filtering criteria.
Exhaustive precursor and fragment charges - uses all possible precursor and fragment charges.
Highest allowed charge pre/frags only - uses only the precursor and fragment ions that have the maximum charge, based on the number of basic residues.

Dwell time (ms): Type the dwell time you want to use for the MRM transitions. This setting does not appear when you select the Format for Agilent Triple Quad DMRM.
Use peak width of: Mark the check box and type the value you want to use for the retention time window for dynamic MRM on an Agilent triple quadrupole system. This setting appears only when you select the Format for Agilent Triple Quad DMRM.
Collision Energy: Type the slopes and intercepts that the data acquisition software will use to calculate the collision energy to produce MS/MS fragmentation for each precursor ion.
Use actual CE if available - Mark this check box to have the data acquisition software use the collision energy that it used to fragment each precursor ion in the data file(s) that you selected in this form. This feature is especially useful when you generate a DMRM list for an Agilent Triple Quadrupole from Agilent Q-TOF data, because the collision cells are the same.
Declustering Potential: Type the m/z breakpoint and Potentials. These settings appear only when you select the Format for ABI Triple Quad.
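The Collision Energy slopes and intercepts describe a linear relation between precursor m/z and collision energy. The sketch below assumes the form CE = slope x (m/z) / 100 + intercept with one slope and intercept pair per precursor charge; that form, the per-charge table, and the function name are assumptions, so check the data acquisition software documentation for the exact equation and units.

```python
def collision_energy(precursor_mz, charge, coefficients):
    """Linear collision energy estimate.

    coefficients: dict mapping precursor charge to (slope, intercept).
    Assumed form: CE = slope * (m/z) / 100 + intercept.
    """
    slope, intercept = coefficients[charge]
    return slope * precursor_mz / 100.0 + intercept


# Placeholder values for illustration only, not recommended settings.
example_coefficients = {2: (4.0, -5.0), 3: (3.0, -2.0)}
```

With the placeholder table, a 2+ precursor at m/z 500 would be assigned 4.0 x 500 / 100 - 5.0 = 15.0.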

To transfer the MRM list into MassHunter Data Acquisition software

To transfer the MRM list from the text file into the MassHunter Data Acquisition software:

Do one of the following:

Observe the message bar at the bottom right of the browser window. When you no longer see "Waiting for http:\\<server name>/millscripts/MRMsummaryPP.pl," click the blue link to the text file. (For large data sets, allow enough time for the program to finish the file creation before you click the link. Wait until you see "Done" in the message bar at the bottom of the browser window.)
Open the saved text file. The file is in the data file folder within your Spectrum Mill installation. (If there is more than one folder, the saved file will be in the first one in the list.)

Select the appropriate contents of the file. If you will copy to Excel, press CTRL + A on your keyboard to select everything. If you will copy directly into the MassHunter Data Acquisition program, do not select the Score column. For Dynamic MRM on the Agilent Triple Quadrupole, select only the Compound Name column.
Copy the contents of the file (Press CTRL + C on your keyboard).
Do one of the following:

Paste directly into MassHunter Data Acquisition and omit the rest of the steps.
Open Microsoft Excel.

Paste as text into Excel. (In Excel click Paste > Paste Special, then in the dialog box, click Text.)
Make any necessary edits.
(Optional) Save as an Excel file, for future reference.
Copy the appropriate contents from the Excel file and paste into the MRM table (for Triple Quadrupole) in MassHunter Data Acquisition.

As of B.04.01, two additional fields are exported which can be used to filter the list in Excel:

#Proteins - indicates the number of proteins for the peptide
Accessions - list of protein accessions for the peptide
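The same filtering can be done outside Excel. The sketch below keeps only rows whose peptide maps to a single protein. It assumes the export is tab-separated with a header row that contains the #Proteins column; the export layout is not specified here, so treat the delimiter and the function name as assumptions.

```python
import csv
import io


def rows_for_unique_peptides(export_text):
    """Return rows whose #Proteins value is exactly 1."""
    reader = csv.DictReader(io.StringIO(export_text), delimiter="\t")
    return [row for row in reader if int(row["#Proteins"]) == 1]
```

To apply it to a saved export, read the file contents into a string and pass that string to the function.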

To Use the Multiple Sequence Aligner Form

The Multiple Sequence Aligner enables alignment and comparison of the amino acid sequences of proteins that are present in a database. The Spectrum Mill software highlights the amino acids that differ among the sequences.

The software accomplishes the alignment via a transparent interface to ClustalW, a program that is available from the European Bioinformatics Institute (EBI). Agilent licenses the ClustalW program, and the Spectrum Mill installation copies it to the millbin folder on the Spectrum Mill server.

Note: If the database is too large (> 4.2 GB), the alignment does not work properly. In that case, create a subset database before you do the alignment.

You can also access multiple sequence alignment from the Protein/Peptide Summary form. For more information about multiple sequence alignment, please see the help for that form.

Align

Align - Click to initiate the alignment with ClustalW. Click this button after you have set all parameters.

Database

Database - Select the name of the database that contains the proteins you wish to align.
Accession #'s - Type the accession numbers of the proteins that you wish to compare.

Introduction - MS Edman

MS Edman began as a simple utility for specifying a text string (protein name, sequence, accession number) and retrieving the database entries associated with that string. Since the algorithm used for accomplishing this is very similar to the way regular expressions are treated with the UNIX® grep command, the implementation lends itself well to describing the ambiguity often present in data obtained from an Edman degradation protein sequencing experiment. Additional features, such as peptide mass filtering and tolerance for mismatched amino acids, have since been added.

Search Mode - MS Edman

Sequence Only
MS Edman finds amino acid sequences in the selected database that match the regular expression entered.
In this mode the sequence should be in CAPITAL LETTERS.

Sequence and Mass
MS Edman first finds amino acid sequences in the selected database that match the regular expression entered, then filters out those sequences that do not contain one of the specified peptide masses WITHIN the sequence. Hence, not all of the specified sequence must be contained in the region defined by the mass. Thus, residues outside of the peptide in question could be specified (unless done when specifying No enzyme, since the cleavage rule may prevent matching in such cases).
In this mode the sequence should be in CAPITAL LETTERS.

Name, Accession Number or Species
If the search mode is set to Name, Accession Number or Species, the search only examines the relevant field of the database entry's FASTA-formatted comment line. In the Name mode, you should type one or more regular expressions; the case of letters is ignored. In the Accession Number mode, you should type one or more accession numbers (NOT regular expressions). Again, the case of letters is ignored. In the Species mode, you should type one or more species from the database .sl file. The output will be a list of the entries which match anything in the input list. The list of entries can be saved and searched by a different Spectrum Mill program such as PMF Search or MS/MS Search.

Regular Expressions - MS Edman

Square brackets have special meaning in a regular expression. The regular expressions used are of the form used by the UNIX grep facility. Examples (type man grep on a UNIX system for full details):

[EF]	The amino acid is either E or F.
[^EF]	The amino acid is anything but E or F.
.	Any single amino acid is possible.
.*	Used to represent a sequence of one or more unknown amino acids. Note that this is "dot-star", not just "star". This wildcard has some features that are not entirely obvious. A match is to the longest sequence fitting the condition (for example, FMQ.*K will find the last K in the sequence following FMQ). In Sequence and Mass mode, the sequence is matched first and then a mass WITHIN the sequence is found. Hence, not all of the specified sequence must be contained in the region defined by the mass. Thus, residues outside of the peptide in question could be specified (unless done when specifying No enzyme, since the cleavage rule may prevent matching in such cases).
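The behavior of these expressions can be tried out with Python's re module, which treats brackets, the dot and greedy dot-star in the same way as grep for these simple cases. This is only a way to experiment with patterns, not the MS Edman search itself; the helper name and the test sequence are hypothetical. Note that in Python, dot-star can also match an empty run of residues.

```python
import re


def first_match(pattern, sequence):
    """Return the text of the first match of pattern in sequence, or None."""
    match = re.search(pattern, sequence)
    return match.group() if match else None


sequence = "AAFMQGGKLLKEE"

bracket = first_match(r"[EF]M", sequence)    # E or F followed by M
negated = first_match(r"[^EF]M", sequence)   # anything but E or F, then M
greedy = first_match(r"FMQ.*K", sequence)    # extends to the last K
```

Because dot-star is greedy, the last pattern matches through the final K in the sequence rather than stopping at the first K after FMQ.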

Mismatched AA's - MS Edman

By setting the Max. # of Mismatched AA's parameter to a value other than 0, homologous sequences can be matched. This is done by allowing a number of positions, as determined by this parameter, not to match protein sequences in the database. This parameter is active in the following search modes:

Sequence Only
Sequence and Mass
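The idea of tolerating mismatched positions can be sketched as a sliding-window comparison. The example handles only a literal query sequence, not a regular expression, and is not the algorithm MS Edman uses; the function name is hypothetical.

```python
def find_with_mismatches(query, protein, max_mismatches):
    """Return (start, matched_text, mismatch_count) for each acceptable window."""
    hits = []
    for start in range(len(protein) - len(query) + 1):
        window = protein[start:start + len(query)]
        mismatches = sum(1 for q, p in zip(query, window) if q != p)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits
```

With max_mismatches set to 0 the function finds only exact occurrences; raising it to 1 also reports windows that differ from the query at a single position.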

To Use the MS Edman Form

MS Edman allows you to search text fields (such as sequence, name, accession number or species) in protein databases. MS Edman can help identify a protein if you know only the molecular weight of a tryptic fragment and some of the amino acid composition. MS Edman is also the first step when you want to create a specialized subset database of entries that match your text search criteria. For example, you could use MS Edman to find all proteins that contain a certain amino acid sequence. If you marked the Save hits to file check box on the MS Edman form, you could then use Protein Databases to create a subset database from these saved results.

The following topics describe options available on the MS Edman form.

To return to default settings on the MS Edman page, click the Spectrum Mill button to go to the Spectrum Mill home page. Then click the link on the home page to go back to the MS Edman page.

Search

Start Search - Click to initiate a text search of a sequence database. Click this button after you have set all parameters.
Maximum reported hits: Set to the maximum number of hits you want for each search.
Display combinatorial peptide output - Mark this check box if you have an ambiguous sequence and you want to extract from the database the distribution of amino acids at a particular position. For example, if you had a sequence SPXK (where X was unknown) and you wanted to know the distribution of possible values of X in a proteome, you would mark this check box and then search for SP.K. The results would list which of the residues could be substituted for X, along with the number of each found in the database.
Sample ID: Type your sample name or other identifier.

Search Parameters

Database: Select a database. See Databases.
Species: Choose a species if you want to narrow the search possibilities and to accelerate searches. Please see the list of species definitions that ship with the software, as some definitions do not encompass all possible members. Retain the default of All to search the entire database. Be aware that because of inconsistencies in the way species information is organized in different databases, the Spectrum Mill workbench cannot read about 10% of the species information in NCBInr, and cannot read any of the species information in trEMBL. See Species Filtering.
Search hits from file: Mark this check box to search hits saved from a previous search. Type the filename of the saved hits. See Saving Hits.
Save hits to file: Mark this check box to save your search hits. Type a filename.
DNA frame translation: See Frame Translation in DNA databases. Note that this setting appears only if you select a DNA database.
Digest: Select the enzyme used for the proteolytic digestion. See Enzyme Specificity / Missed Cleavages.
Maximum # of mismatched AA's: See Mismatched AA's - MS Edman.
MW of protein: Type the molecular weight range for your protein, or mark the All check box to search the entire database. See Intact Protein MW Filtering.

Modifications

Click the Choose... button to select modifications appropriate for your sample. See Choosing Modifications. The molecular weight of the protein includes any Fixed modifications. Remove any Fixed modifications if you do not want their masses included.

Search Mode

Search mode: See Search Mode - MS Edman. Your choice of search mode determines which options are displayed below.
Regular expression: If you chose Sequence Only or Sequence and Mass search mode, type text here. See Regular Expressions - MS Edman.
Filter by peptide mass: If you chose the Sequence and Mass search mode, type masses here to eliminate sequences that do not contain the specified peptide mass(es). See Search Mode - MS Edman.
Mass (m/z) - Type an m/z value. See Mass (m/z).
Charge (z) - Type a charge. See Mass (m/z).
Mass tolerance: See Mass Tolerance.
Mass(es) are: See Mass Type.
Name Regular Expression List, Accession Number List, Species List: If you chose the Name, Accession Number, or Species search mode, type the appropriate text in the box. See Search Mode - MS Edman.

Digestion of a User Supplied Sequence - MS Digest

To use MS Digest to digest a user supplied sequence:

Select User Protein as the Database option.
Read the instructions in the Protein sequence box.
Paste or type the sequence in the Protein sequence box.
Set the other MS Digest parameters as appropriate.
Click the Digest button.

Database Entry Retrieval Method - MS Digest

It is possible to retrieve entries from the database by specifying either the Accession Number or the Index Number. The accession number is a unique identifier for a protein within the database. It will not change between subsequent revisions of the database and is external to the Spectrum Mill package. The index number for a particular protein is internal to the Spectrum Mill package and is likely to change when you update the database. Both the index number and the accession number are reported in Spectrum Mill search results. Entries are generally more efficiently retrieved using index numbers.

To Use the MS Digest Form

MS Digest performs theoretical digestions and calculates masses of peptides that result. The program accepts both user proteins and sequences from databases.

The following topics describe options available on the MS Digest form.

To return to default settings on the MS Digest page, click the Spectrum Mill button to go to the Spectrum Mill home page. Then click the link on the home page to go back to the MS Digest page.

Digest

Digest - Click to perform a theoretical digestion and calculate the masses of peptides that result. Click the button after you have set all parameters.
Report multiple charges - The MS Digest output usually lists only singly-charged peptides. Mark this check box if you want the output to also include multiply charged peptides. The maximum number of charges a peptide can have is based on the number of basic amino acids in the peptide.
Hide protein sequence - The complete protein sequence is normally displayed in the MS Digest output. Mark this check box to disable this display.
Show only protein sequence - The list of peptides resulting from a digest is normally displayed in the MS Digest output. Mark this check box to disable this display.
Hide HTML links - The outputs from Spectrum Mill programs usually contain links to other Spectrum Mill programs and internet pages (general features of links from program output). Mark this check box to disable these links if you want to reduce network traffic.

Protein

Digest: Select the enzyme used for the proteolytic digestion. See Enzyme Specificity / Missed Cleavages.
Maximum # missed cleavages: Set the maximum number of missed enzymatic cleavages. See Enzyme Specificity / Missed Cleavages.
Reading frame: See Frame Translation in DNA databases. Note that this setting appears only if you select a DNA database.
Calculate masses as: See Mass Type.
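
A theoretical digestion of the kind MS Digest performs can be sketched in Python as follows. This is a minimal illustration, not the program's algorithm: it assumes trypsin cleaves after K or R except before P, uses standard monoisotopic residue masses, and reports singly protonated (MH+) masses. All function names are hypothetical.

```python
MONO = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def cleavage_fragments(sequence):
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def digest(sequence, max_missed=0):
    """List peptides containing up to max_missed missed cleavages."""
    frags = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(frags)):
        for j in range(i, min(i + max_missed, len(frags) - 1) + 1):
            peptides.append("".join(frags[i:j + 1]))
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


for pep in digest("AGKRPLMKTS", max_missed=1):
    print(pep, round(mh_plus(pep), 4))
```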

Modifications

Click the Choose... button to select modifications appropriate for your sample. See Choosing Modifications.

Database

Database: Select either User Protein or the name of a database. See Databases. If you select User Protein, type a sequence in the Protein sequence box, and ignore Retrieve database entry by and Database accession number. If you select a database name, select an option for Retrieve database entry by.
Retrieve database entry by: Select Accession Number and type an accession number below, or select Index Number and type an index number below. See Database Entry Retrieval Method - MS Digest.
Database accession number: Type an accession number.
MS Digest index number: Type an index number.
Protein sequence - Type the sequence as instructed in the form. If you wish to include a user-specified amino acid, use a lower-case "u" and fill in the elemental composition for User-specified amino acid (u). See User-Specified Amino Acid.
User-specified amino acid (u): If you wish to include a user-specified amino acid, fill in the elemental composition here. Specify it as a lower-case "u" in the Protein sequence box below.

Fragment Ion Types - MS Product

Check the boxes next to each ion type to list the corresponding fragment ions masses in the MS Product output. The default ion-types are those generally seen in MS/MS spectra. Supported ion types include:

Ion type	Restrictions
a, b, y	no restrictions
a-NH₃, b-NH₃, y-NH₃	ion contains R, K, or Q
b-H₂O	ion contains S or T
b+H₂O	ion contains R, H, or K; only b(n-1) and b(n-2) for a peptide of length n
a-H₃PO₄, b-H₃PO₄, y-H₃PO₄	ion contains phosphorylated S or T
b-SOCH₄, y-SOCH₄	ion contains oxidized M
internal b	<800 Da
internal a	<800 Da, internal b present
internal b-H₂O	<800 Da, internal b present, ion contains S or T
internal b-NH₃	<800 Da, ion contains R
N-term ladder	removal of N-term residues (y equiv.)
C-term ladder	removal of C term residues (b+H₂O equiv.)
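
For the unrestricted b and y ion types, the m/z values can be sketched in Python as follows. This is an illustration under standard assumptions (monoisotopic residue masses, singly charged ions, unmodified residues), not the MS Product code itself; the function name is hypothetical.

```python
MONO = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_y_ions(peptide):
    """Return singly charged b and y ion m/z lists (b1..b(n-1), y1..y(n-1))."""
    masses = [MONO[aa] for aa in peptide]
    # b ions: N-terminal residues plus a proton.
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    # y ions: C-terminal residues plus water plus a proton.
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y


b_ions, y_ions = b_y_ions("PEPTIDE")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```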

To Use the MS Product Form

MS Product calculates theoretical ion masses from peptides which undergo dissociation via post-source decay or high- or low-energy collision-induced dissociation.

The following topics describe options available on the MS Product form.

To return to default settings on the MS Product page, click the Spectrum Mill button to go to the Spectrum Mill home page. Then click the link on the home page to go back to the MS Product page.

Fragmentation

Fragment - Click to calculate theoretical ion masses of peptides. Click it after you have set all parameters.
Calculate masses as: See Mass Type.
Maximum reported charge: Set to the highest charge state you expect from your instrument.

Peptide Sequence

Enter sequence - Type the sequence as described in the form. If you wish to include a user-specified amino acid, use a lower-case "u" and fill in the User-specified AA elemental composition at the bottom of the form. See User-Specified Amino Acid.

Note that in addition to single-letter capitalized abbreviations for the 20 amino acids, you can type the following lower-case abbreviations for modified amino acids:

Designation	Modified amino acid
k	Carbamylated lysine
m	Methionine sulfoxide
q	Pyroglutamic acid (only at N-terminus of peptide)
s	Phosphorylated serine
t	Phosphorylated threonine
y	Phosphorylated tyrosine

The lower-case modifications k, m, q, s, t, and y are defined by default for MS Product. If you select a different variable modification of K, M, Q, S, T, or Y (for example, guanidination of K), the selected modification takes priority and is used for that residue instead. The default modifications remain defined for the residues you have not overridden.

Modifications

Click the Choose... button to select modifications appropriate for your sample. See Choosing Modifications.

Product Ion Types

Mark check boxes for the ion types appropriate for your instrument (some types require the presence of specific amino acids in the ion). The default ion types are those generally seen in MALDI PSD spectra. See Fragment Ion Types - MS Product.
User-specified AA elemental composition (u): If you wish to include a user-specified amino acid, fill in the elemental composition here. Specify it as a lower-case "u" in the Enter sequence box above.

Combination Type - MS Comp

Amino Acid
Lists the amino acid combinations consistent with the search conditions. Some of these will have identical elemental compositions.

Peptide Elemental
Lists the unique elemental compositions from the list of amino acid combinations reported by the Amino Acid option.

Elemental
Lists the elemental compositions consistent with the search conditions. The elemental compositions returned will obey the nitrogen rule and will have a double bond equivalent within the range expected for a peptide. The elemental compositions are, however, not guaranteed to have corresponding peptides. This option will work at much higher mass than the first two options.
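
The Amino Acid option can be pictured as an enumeration of residue combinations whose total mass matches the peptide mass. The Python sketch below is only an illustration under stated assumptions: monoisotopic masses, a singly protonated peptide, no modifications, and isoleucine omitted because it has the same mass as leucine. The function name and tolerance are hypothetical.

```python
MONO = {  # monoisotopic residue masses in Da (I omitted: same mass as L)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def compositions(mh, tol=0.01):
    """List residue combinations whose MH+ lies within tol Da of mh."""
    target = mh - WATER - PROTON
    residues = sorted(MONO.items(), key=lambda kv: kv[1])
    results = []

    def extend(start, remaining, chosen):
        if abs(remaining) <= tol:
            results.append("".join(sorted(chosen)))
        # Residues are tried in order of increasing mass, so the loop can stop
        # at the first one that no longer fits in the remaining mass.
        for idx in range(start, len(residues)):
            aa, mass = residues[idx]
            if mass > remaining + tol:
                break
            extend(idx, remaining - mass, chosen + [aa])

    extend(0, target, [])
    return results


found = compositions(147.07641)
print(found)
```

For an MH+ of 147.07641 the sketch reports two combinations, AG and Q. Both have the elemental composition C5 H8 N2 O2 as residues, which is an example of different amino acid combinations sharing one elemental composition.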

Nitrogen Rule - MS Comp

The nitrogen rule states that for an organic compound with an even number of nitrogens (including 0), the nominal mass of the molecular ion will be even. Note that this rule was first observed for EI spectra of small molecules, where the molecular ion is not protonated. Hence the rule for peptides is that the nominal mass for an MH+ equivalent must be odd when the peptide contains an even number of nitrogens, and even when it contains an odd number.

The nitrogen rule stems from the fact that most of the common elements that have even nominal masses have even valence:

¹²C, valence = 4;
¹⁶O, valence = 2;
²⁸Si, valence = 4;
³²S, valence = 2.

On the other hand most of the elements with odd nominal masses have odd valence:

¹H, valence = 1;
¹⁹F, valence = 1;
³¹P, valence = 3;
³⁵Cl, valence = 1.

Nitrogen is an exception in that it has an even nominal mass but an odd valence:

¹⁴N, valence = 3.
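
A check of the nitrogen rule for a protonated (MH+) composition can be sketched in Python as follows. This is an illustration, not the MS Comp code: the formula is passed as a dictionary of element counts for the MH+ ion, and the function names are hypothetical.

```python
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32, "P": 31}


def nominal_mass(formula):
    """Nominal mass from a dict of element counts."""
    return sum(NOMINAL[element] * count for element, count in formula.items())


def obeys_nitrogen_rule_mh(formula):
    """True if an MH+ composition is consistent with the nitrogen rule.

    For a protonated ion the parity is shifted by the extra hydrogen:
    an even nitrogen count requires an odd nominal mass, and vice versa.
    """
    nitrogen_even = formula.get("N", 0) % 2 == 0
    mass_odd = nominal_mass(formula) % 2 == 1
    return nitrogen_even == mass_odd


gly_gly_mh = {"C": 4, "H": 9, "N": 2, "O": 3}   # Gly-Gly plus one proton
print(nominal_mass(gly_gly_mh), obeys_nitrogen_rule_mh(gly_gly_mh))
```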

Double Bond Equivalent - MS Comp

The double bond equivalent (DBE) is the number of rings or double bonds that an ion contains. It can be calculated from the elemental formula as follows:

DBE = 1 - a/2 + c/2 + d

where:

a = number of atoms with a valence of 1 (H, F, Cl).

b = number of atoms with a valence of 2 (O, S).

c = number of atoms with a valence of 3 (N, P).

d = number of atoms with a valence of 4 (C, Si).

If the calculated value ends in 0.5, subtract 0.5 to get the true value.

Amino Acid	DBE	Elemental Formula	Calculation
A	1.0	C3 H5 N1 O1	3 - 5/2 + 1/2
C	1.0	C3 H5 N1 O1 S1	3 - 5/2 + 1/2
D	2.0	C4 H5 N1 O3	4 - 5/2 + 1/2
E	2.0	C5 H7 N1 O3	5 - 7/2 + 1/2
F	5.0	C9 H9 N1 O1	9 - 9/2 + 1/2
G	1.0	C2 H3 N1 O1	2 - 3/2 + 1/2
H	4.0	C6 H7 N3 O1	6 - 7/2 + 3/2
I	1.0	C6 H11 N1 O1	6 - 11/2 + 1/2
K	1.0	C6 H12 N2 O1	6 - 12/2 + 2/2
L	1.0	C6 H11 N1 O1	6 - 11/2 + 1/2
M	1.0	C5 H9 N1 O1 S1	5 - 9/2 + 1/2
N	2.0	C4 H6 N2 O2	4 - 6/2 + 2/2
P	2.0	C5 H7 N1 O1	5 - 7/2 + 1/2
Q	2.0	C5 H8 N2 O2	5 - 8/2 + 2/2
R	2.0	C6 H12 N4 O1	6 - 12/2 + 4/2
S	1.0	C3 H5 N1 O2	3 - 5/2 + 1/2
T	1.0	C4 H7 N1 O2	4 - 7/2 + 1/2
V	1.0	C5 H9 N1 O1	5 - 9/2 + 1/2
W	7.0	C11 H10 N2 O1	11 - 10/2 + 2/2
Y	5.0	C9 H9 N1 O2	9 - 9/2 + 1/2

The terminal groups and cation then contribute H₃O to the overall elemental formula, reducing the DBE by 1.5. The constant 1 from the DBE formula, which is omitted from the per-residue calculations in the table, is then added once for the whole ion.
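
The DBE formula can be checked with a short Python sketch. It is an illustration only: the function name and the add_one switch are hypothetical, with the switch turned off to reproduce the per-residue calculation used in the table, where the constant 1 is left out.

```python
VALENCE = {"H": 1, "F": 1, "Cl": 1, "O": 2, "S": 2, "N": 3, "P": 3, "C": 4, "Si": 4}


def dbe(formula, add_one=True):
    """DBE = 1 - a/2 + c/2 + d, where a, c, d count atoms of valence 1, 3, 4.

    Atoms of valence 2 (b in the text) do not contribute.
    Set add_one=False for a residue, where the constant 1 is omitted.
    """
    a = sum(n for el, n in formula.items() if VALENCE[el] == 1)
    c = sum(n for el, n in formula.items() if VALENCE[el] == 3)
    d = sum(n for el, n in formula.items() if VALENCE[el] == 4)
    return (1 if add_one else 0) - a / 2 + c / 2 + d


benzene = dbe({"C": 6, "H": 6})                                    # three double bonds and one ring
glycine_residue = dbe({"C": 2, "H": 3, "N": 1, "O": 1}, add_one=False)
gly_gly_mh = dbe({"C": 4, "H": 9, "N": 2, "O": 3})                 # two residues + H3O
print(benzene, glycine_residue, gly_gly_mh)
```

The protonated Gly-Gly value agrees with the bookkeeping in the text: two residues at 1.0 each, minus 1.5 for H₃O, plus the constant 1.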

To Use the MS Comp Form

MS Comp fills in possible amino acid compositions for a peptide, given a peptide mass and partial composition determined from immonium ions present in MS/MS spectra.

To return to default settings on the MS Comp page, click the Spectrum Mill button to go to the Spectrum Mill home page. Then click the link on the home page to go back to the MS Comp page.

Compositions

Composition - Click to calculate possible amino acid composition. Click it after you have set all parameters.
Maximum reported compositions: Type the maximum number of compositions you wish to report. If in the calculation this number is exceeded, you will see an error message rather than a partial list.
Combination type: See Combination Type - MS Comp.

Peptide

m/z: Type the peptide m/z. See Mass (m/z).
Da +/- - Type the error for the peptide m/z. Choose units of either Da or ppm. See Mass Tolerances.
Charge (z): Select the charge that corresponds with the peptide m/z.
m/z is: See Mass Type.
Ion types: You can select one or more possible ion types for your m/z value. The report will list all the possibilities for each ion type in turn.

AA Composition

Based on immonium and related ions - Mark check boxes for the masses you observe in your spectrum.
Based on loss from precursor ion - Mark check boxes for losses you observe in your spectrum.

Amino Acids

Absent amino acids (Check to prevent possible inclusion) - Mark check boxes for amino acids you know are absent from your sample.

Modifications

User defined amino acid: Mark the check box to define your own amino acid. Then fill in the elemental composition. See User-Specified Amino Acid.
Click the Choose... button to select modifications appropriate for your sample. See Choosing Modifications.

To Use the MS Isotope Form

MS Isotope calculates and displays isotope patterns of peptides. The following topics describe options available on the MS Isotope form.

To return to default settings on the MS Isotope page, click the Spectrum Mill button to go to the Spectrum Mill home page. Then click the link on the home page to go back to the MS Isotope page.

Isotope Distribution

Calculate - Click to calculate isotope distribution. Click it after you have set all parameters.
Calculate masses as: See Mass Type.
Show detailed report - Mark this check box to additionally display the contribution of individual isotopes to the overall percentage of each isotopic mass.
Peptide sequence - Click this option if you wish to calculate the isotope distribution for an amino acid sequence. Then type the information described below under Peptide Sequence. The output shows both the isotopic distribution of the amino acid sequence you typed, and that of "averagine" for the same precursor mass. The averagine cluster shows the mass distribution you get if you assume that the peptide is made up of "average" amino acids. The elemental composition of averagine is C 4.9384 H 7.7583 N 1.3577 O 1.4773 S 0.0417. (See Senko MW, Beu SC, McLafferty FW, "Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions," J Am Soc Mass Spectrom 1995, 6:229-233.)
Elemental composition - Click this option if you wish to calculate the isotope distribution for an elemental composition. Then type the information described below under Elemental Composition.
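
An isotope distribution for an elemental composition can be approximated by repeatedly convolving the isotope abundances of each atom. The Python sketch below is a simplified illustration, not the MS Isotope algorithm: it bins peaks by nominal mass offset from the monoisotopic peak, uses commonly tabulated natural abundances, and truncates the pattern at max_peaks. The function names are hypothetical.

```python
ISOTOPES = {  # nominal-mass offset from the lightest isotope -> natural abundance
    "C": {0: 0.9893, 1: 0.0107},
    "H": {0: 0.999885, 1: 0.000115},
    "N": {0: 0.99636, 1: 0.00364},
    "O": {0: 0.99757, 1: 0.00038, 2: 0.00205},
    "S": {0: 0.9499, 1: 0.0075, 2: 0.0425, 4: 0.0001},
}


def convolve(p, q, max_peaks):
    """Combine two abundance lists, keeping at most max_peaks entries."""
    out = [0.0] * min(len(p) + len(q) - 1, max_peaks)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j < len(out):
                out[i + j] += p_i * q_j
    return out


def isotope_pattern(formula, max_peaks=6):
    """Relative abundance of the M, M+1, M+2, ... peaks for a dict of element counts."""
    pattern = [1.0]
    for element, count in formula.items():
        abundances = ISOTOPES[element]
        single = [abundances.get(k, 0.0) for k in range(max(abundances) + 1)]
        for _ in range(count):
            pattern = convolve(pattern, single, max_peaks)
    return pattern


c2 = isotope_pattern({"C": 2})
dipeptide = isotope_pattern({"C": 5, "H": 8, "N": 2, "O": 2}, max_peaks=20)
print([round(x, 4) for x in dipeptide[:4]])
```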

Peptide Sequence

Sequence - Enter the sequence as instructed in the form. If you wish to include a user-specified amino acid, use a lower-case "u" and fill in the elemental composition at the bottom of the form. See User-Specified Amino Acid.
User-specified AA elemental composition (u): If you wish to include a user-specified amino acid, fill in the elemental composition here. Specify it as a lower-case "u" in the Sequence box above.

Elemental Composition

Enter the elemental composition. Fill in a number for each atomic species.

Modifications

Click the Choose... button to select modifications appropriate for your sample. See Choosing Modifications.

To Use the Peptide String Match Form

Peptide String Match finds peptides that contain a specific sequence of amino acids.

To use this tool:

Under Utilities, click Peptide String Match.
Under Database, select the database for which you want to find peptides that match a given peptide sequence.
Set options as described below.
Click Find Peptides.

Filters

Require Preceding Tryptic Site
Subset of Accession #'s to search - Type or paste the accession numbers. Leave the box empty if you want to find all proteins in the database that match your criteria.

Output Features

All gene symbols - Supported only for UniProt, SwissProt and IPI databases
All Matched Peptides - Select to show the peptide sequences that were matched.
Species
All accession numbers - Select to show the protein accession numbers for the peptides that were matched.
All Start AA - Select to show the locations of the peptide sequence in the proteins.
Link to ClustalW Multi-Aligner - Provides a link to align the sequences of the matched proteins with the input peptide highlighted.
Link to Database Website
Show flanking sequence around variable modification sites - Select this option to indicate the variable modification site motif within the number of residues indicated, and for the selected modifications. Output will center on the specified lower-case amino acid in the sequence and will show the designated number of leading and trailing residues in the protein sequence. The flanking residues need not be present in the input peptide. The output is intended for downstream use with motif logo generation programs.
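
The flanking-sequence output can be sketched in Python as follows. This is an illustration under assumptions that the text does not specify: only the first occurrence of the peptide in the protein is used, and windows that run past a protein terminus are padded with "-". The function name and example sequences are hypothetical.

```python
def flanking_windows(peptide, protein, flank):
    """Return one window per lower-case (modified) residue in the peptide.

    Each window is centered on the modified residue and shows `flank`
    residues of the protein sequence on either side.
    """
    start = protein.upper().find(peptide.upper())  # case-insensitive match
    if start < 0:
        return []
    windows = []
    for offset, aa in enumerate(peptide):
        if aa.islower():
            site = start + offset
            left = protein[max(0, site - flank):site]
            right = protein[site + 1:site + 1 + flank]
            windows.append(left.rjust(flank, "-") + aa + right.ljust(flank, "-"))
    return windows


windows = flanking_windows("SPTLsEEK", "MAGKRSPTLSEEK", 5)
print(windows)
```

In this example the window starts with R, a residue that lies outside the input peptide, which shows how flanking residues are taken from the protein rather than from the peptide.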

Enter Peptide Sequences

Type or paste the sequences you wish to match. Indicate variable modifications with a lower-case letter. String matching is case-insensitive. Regular expressions are also allowed.

To Use the Peptide List to Masses Form

Peptide List to Masses calculates the masses and formulas for a set of peptides that you specify.

Calculate Masses

Calculate masses as:

Monoisotopic - Select to calculate masses based on the most abundant isotope of each element
Monoisotopic no e- - Select to calculate masses based on one isotope with no charge
Average - Select to calculate masses based on the average mass of the isotopes present

Reported precursor charge Min: Max: - Type a minimum and maximum charge for the specified peptides
Limited actual charge by:

RKH present - Select if arginine, histidine and lysine are present.
RKHQN present - Select if glutamine and asparagine are present.
Above min/max - Select if the limited actual charge minimum and maximum are the same as the reported ones.

Calculate - Click to calculate the masses and formulas for the specified peptides with or without modifications.
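
The relationship between peptide mass, charge, and reported m/z can be sketched in Python as follows. The text does not say exactly how the charge limit is applied, so the cap used here (the number of listed basic residues, with a minimum of 1) is an assumption, as are the function name and parameter names.

```python
MONO = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def mz_table(peptide, min_z, max_z, limit_by="RKH"):
    """Return {charge: m/z} for charges between min_z and max_z.

    Assumption: when limit_by is given, the highest charge is capped at the
    number of those residues present in the peptide (at least 1).
    """
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    cap = max_z
    if limit_by:
        cap = min(max_z, max(1, sum(peptide.count(aa) for aa in limit_by)))
    return {z: (neutral + z * PROTON) / z for z in range(min_z, cap + 1)}


table = mz_table("HAGK", 1, 3)
for z, mz in table.items():
    print(z, round(mz, 4))
```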

Modifications

Click the Choose... button to select modifications appropriate for your sample. See Choosing Modifications.

Peptide Sequences

Enter the peptide sequences whose masses and formulas you want to calculate. Follow the instructions above the text box.
