Spectrum Mill Utility Programs


Table of Contents

Archive Data:

Peptide Selector:

MRM Selector:

Multiple Sequence Aligner:

MS Edman:

MS Digest:

MS Product:

MS Comp:

MS Isotope:

Peptide String Match:

Peptide List to Masses:



To Archive Data

This option lets you archive instrument-created files, search result files, spectral files and data directories.

Archiving data saves disk space and makes it easier and faster to copy data to a backup drive. Data archiving can also be automated in a workflow. Archiving folders with large numbers (tens of thousands) of hits (.spo files) can take some time.

  1. Under Utilities, click Archive Data.
  2. Under Data Directories, click Select and choose the directory whose files and/or folder you intend to archive or un-archive (zip or unzip).
  3. Select the general categories of files or directory you intend to archive and how you want to archive them.
    Below the options in each category, please read the description of what happens when you select an option. 
  4. Click Archive.


To Use the Peptide Selector Form

Peptide Selector performs theoretical digestions on each protein supplied by accession number or sequence, and then automatically selects from the theoretical peptides those that fit specific filtering criteria. The most common uses are:

  • Create MRM or Q-TOF inclusion lists based on selection criteria.
  • Create Q-TOF MS/MS target lists from prior results, filtered by unidentified and/or single-peptide-hit proteins.

In many ways, Peptide Selector is similar to MS Digest.  Both Peptide Selector and MS Digest perform automatic protein digestions. The difference is that Peptide Selector additionally creates limited lists based on specific criteria. You can use it to create an inclusion list for MS/MS analysis.

The following topics describe options available on the Peptide Selector form. If you see settings on the form in green font, that means that you have marked the check box for Penalize rather than exclude, and the green text indicates the settings to which that applies.

Selection

Saved File Parameters

  • IEF: This check box appears when you mark Generate inclusion or MRM list file. Mark the IEF check box if you do off-gel electrophoresis and you want to predict the fractions that will contain peptides of interest. This prediction reduces the number of fractions you must analyze. The IEF selector paper is published online in the proceedings of the IEEE BIBM 2014 conference: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6999136.
  • Filename: To avoid overwriting an existing file, type a new file name for the inclusion list. Otherwise, keep the default, lastPeptideSelectorResult.txt. This file is found in the \SpectrumMill\results_msedman folder.
  • Run time (min): Type an upper limit for the LC retention time you wish to include in the file. Use this setting to exclude unwanted peaks at the end of the run.
  • List Type: Choose either m/z or uncharged.
  • Precursor Charge: Choose a minimum and maximum precursor charge. For the ion trap CID fragmentation mode, the 2+ charge state is preferred for peptides because it is most likely to yield b- and y-ions of high abundance. For Q-TOF instruments, higher charge states can also yield good fragmentation. For the Agilent Q-TOF, the 2+ and 3+ charge states typically yield an equal number of peptide identifications.
  • Precursor m/z: Choose a minimum and maximum precursor m/z.
  • When you are ready to transfer the contents of the text file into the data acquisition software, see instructions below.
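The List Type, Precursor Charge, and Precursor m/z settings above can be pictured with a short Python sketch. It is only an illustration and not Spectrum Mill's own code: the function names and the example limits are assumptions made here. It converts an uncharged monoisotopic peptide mass to m/z for each charge in a range and keeps the values that fall inside an m/z window.

```python
# Illustrative sketch only; function names and limits are hypothetical.
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # H2O added for the peptide termini (Da)

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def neutral_mass(peptide):
    """Uncharged monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def mz(mass, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (mass + charge * PROTON) / charge

def precursor_list(peptides, min_charge, max_charge, min_mz, max_mz):
    """Return (peptide, charge, m/z) rows that fall inside the m/z window."""
    rows = []
    for pep in peptides:
        mass = neutral_mass(pep)
        for z in range(min_charge, max_charge + 1):
            value = mz(mass, z)
            if min_mz <= value <= max_mz:
                rows.append((pep, z, round(value, 4)))
    return rows

rows = precursor_list(["SAMPLER"], 2, 3, 300.0, 500.0)
```

With these example limits, only the 2+ precursor of SAMPLER is kept; the 3+ precursor falls below the lower m/z limit.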

    Digest Parameters

    Product Ion Parameters

    Criteria for Excluding Peptides

    Modifications

    Protein(s) to Select From

    Search Mode

    Scoring

    Output Features


    To transfer the inclusion list into MassHunter software

    Transfer to MassHunter Data Acquisition for Triple Quadrupole

    To transfer the inclusion list from the text file into MassHunter Data Acquisition for Triple Quadrupole, do the following:

    1. Do one of the following:
    2. Select the contents of the file (Press CTRL + A on your keyboard).
    3. Copy the contents of the file (Press CTRL + C on your keyboard).
    4. Do one of the following:
    5. Paste as text into Excel. (In Excel click Paste > Paste Special, then in the dialog box, click Text, then click OK.)
    6. Make any necessary edits.
    7. (Optional) Save as an Excel file, for future reference.
    8. Copy the contents from the Excel file and paste into the MRM table in the MassHunter Data Acquisition for Triple Quadrupole.

    Import to MassHunter Data Acquisition for Q-TOF

    Follow these instructions to import a Q-TOF target list:

    1. In Peptide Selector, mark Generate inclusion or MRM list file.
    2. Select MassHunter Q-TOF MS/MS target list.
    3. In the Filename text box, type a name of your choice with a .txt suffix.
    4. From the List Type list, select m/z.
    5. Click Select.
    6. In MassHunter Data Acquisition, click the Targeted List tab.
    7. Right-click the Targeted List table and click Import.
    8. Import the TXT file from the \\SpectrumMill\results_msedman folder.
    Note: The Open dialog box for importing to MassHunter initially allows only selection of .csv files.  To force the file name selection to list all files, type "*" and press enter, then select the .txt file.

    Export to MassHunter Accurate Mass (AM) Database

    Follow the instructions below to create a file with a .csv suffix containing peptide neutral mass formulas that can be searched in MassHunter Qualitative Analysis as an accurate mass database, or used with the Find By Formula algorithm.  This database allows you to verify hits in MassHunter Qualitative Analysis, especially one-hit wonders.
    1. In Peptide Selector, mark Generate inclusion or MRM list file.
    2. Select MassHunter AM database.
    3. In the Filename text box, type a name of your choice with a .csv suffix.
    4. From the List Type list, select uncharged.
    5. Click Select.
    6. Copy the .csv file located in the \\SpectrumMill\results_msedman folder to the MassHunter\PCDL folder or MassHunter\databases folder, whichever exists on your system.
    7. Within MassHunter Qualitative Analysis select the file as the database to search or use it in Find By Formula. 

    To Use the MRM Selector Form

    The MRM Selector uses Spectrum Mill MS/MS Search results to create multiple reaction monitoring (MRM) lists for triple quadrupole instruments. You select data that has been searched, then filter the results to include only those peptides that meet certain requirements. (The filters are similar to those in Protein/Peptide Summary.)

    The Agilent Q-TOF Data Extractor has been enhanced to output the collision energy (CE), peak apex, and chromatographic peak width in the specFeatures.tsv file, so you can use these values in dynamic MRM (DMRM) generation.

    Select Results for MRM Selection

    Validation and Sorting

    MRM Parameters

    To transfer the MRM list into MassHunter Data Acquisition software

    To transfer the MRM list from the text file into the MassHunter Data Acquisition software:

    1. Do one of the following:
    2. Select the appropriate contents of the file. If you will copy to Excel, press CTRL + A on your keyboard to select everything. If you will copy directly into the MassHunter Data Acquisition program, do not select the Score column. For Dynamic MRM on the Agilent Triple Quadrupole, select only the Compound Name column.
    3. Copy the contents of the file (Press CTRL + C on your keyboard).
    4. Do one of the following:
    5. Paste as text into Excel. (In Excel click Paste > Paste Special, then in the dialog box, click Text.)
    6. Make any necessary edits.
    7. (Optional) Save as an Excel file, for future reference.
    8. Copy the appropriate contents from the Excel file and paste into the MRM table (for Triple Quadrupole) in MassHunter Data Acquisition.

    As of B.04.01, two additional fields are exported which can be used to filter the list in Excel:


    To Use the Multiple Sequence Aligner Form

    The Multiple Sequence Aligner enables alignment and comparison of the amino acid sequences of proteins that are present in a database. The Spectrum Mill software highlights the amino acids that differ among the sequences.

    The software accomplishes the alignment via a transparent interface to ClustalW, a program that is available from the European Bioinformatics Institute (EBI). Agilent licenses the ClustalW program, and the Spectrum Mill installation copies it to the millbin folder on the Spectrum Mill server.

    Note: If the database is too large (> 4.2 Gb), the alignment does not work properly. In that case, create a subset database before you do the alignment.

    You can also access multiple sequence alignment from the Protein/Peptide Summary form. For more information about multiple sequence alignment, please see the help for that form.

    Align

    Database


    Introduction - MS Edman

    MS Edman began as a simple utility for specifying a text string (protein name, sequence, accession number) and retrieving the database entries associated with that string. Since the algorithm used for accomplishing this is very similar to the way regular expressions are treated with the UNIX® grep command, the implementation lends itself well to describing the ambiguity often present in data obtained from an Edman degradation protein sequencing experiment. Additional features, such as peptide mass filtering and tolerance for mismatched amino acids, have since been added.


    Search Mode - MS Edman

    Sequence Only
    MS Edman finds amino acid sequences in the selected database that match the regular expression entered.
    In this mode the sequence should be in CAPITAL LETTERS.

    Sequence and Mass
    MS Edman first finds amino acid sequences in the selected database that match the regular expression entered, then filters those sequences to eliminate those that do not contain one of the specified peptide masses WITHIN the sequence. Hence, not all of the specified sequence must be contained in the region defined by the mass. Thus, residues outside of the peptide in question could be specified (unless done when specifying No enzyme, since the cleavage rule may prevent matching in such cases).
    In this mode the sequence should be in CAPITAL LETTERS.

    Name, Accession Number or Species
    If the search mode is set to Name, Accession Number or Species, the search only examines the relevant field of the database entry's FASTA-formatted comment line. In the Name mode, you should type one or more regular expressions; the case of letters is ignored. In the Accession Number mode, you should type one or more accession numbers (NOT regular expressions). Again, the case of letters is ignored. In the Species mode, you should type one or more species from the database .sl file. The output will be a list of the entries which match anything in the input list. The list of entries can be saved and searched by a different Spectrum Mill program such as PMF Search or MS/MS Search.


    Regular Expressions - MS Edman

    Square brackets have special meaning in a regular expression. The regular expressions used are of the form used by the UNIX grep facility. Examples (type man grep on a UNIX system for full details):

    [EF]   The amino acid is either E or F.
    [^EF]  The amino acid is anything but E or F.
    .      Any single amino acid is possible.
    .*     Represents a sequence of zero or more unknown amino acids. Note that this is "dot-star", not just "star". This wildcard allows some not entirely obvious features. A match is to the longest sequence fitting the condition (for example, FMQ.*K finds the last K in the sequence following FMQ). In Sequence and Mass mode, the sequence is matched first and then a mass WITHIN the sequence is found. Hence, not all of the specified sequence must be contained in the region defined by the mass. Thus, residues outside of the peptide in question could be specified (unless done when specifying No enzyme, since the cleavage rule may prevent matching in such cases).
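The constructs listed above behave the same way in Python's re module, so they can be tried out in a short sketch. This is an assumption-laden illustration: the sequence and the helper name are made up, and the exact regular expression dialect used by MS Edman may differ in details from Python's.

```python
import re

def first_match(pattern, sequence):
    """Return the first substring of `sequence` matching `pattern`, or None."""
    found = re.search(pattern, sequence)
    return found.group() if found else None

# Hypothetical protein fragment used only for this demonstration.
seq = "AAFMQGHKLLKPE"

either = first_match(r"[EF]MQ", seq)     # E or F, then MQ
neither = first_match(r"[^EF]MQ", seq)   # anything but E or F, then MQ
any_one = first_match(r"F.Q", seq)       # F, any single residue, Q
longest = first_match(r"FMQ.*K", seq)    # dot-star is greedy: runs to the last K
empty_gap = first_match(r"FMQ.*G", seq)  # dot-star may also match nothing
```

Here `longest` is "FMQGHKLLK" rather than "FMQGHK", which shows the longest-match behavior, and `neither` is None because the only MQ in this sequence follows an F.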


    Mismatched AA's - MS Edman

    By setting the Max. # of Mismatched AA's parameter to a value other than 0, homologous sequences can be matched. This is done by allowing a number of positions, as determined by this parameter, not to match protein sequences in the database. This parameter is active in the following search modes:


    To Use the MS Edman Form

    MS Edman allows you to search text fields (such as sequence, name, accession number or species) in protein databases.  MS Edman can help identify a protein if you know only the molecular weight of a tryptic fragment and some of the amino acid composition. MS Edman is also the first step when you want to create a specialized subset database of entries that match your text search criteria. For example, you could use MS Edman to find all proteins that contain a certain amino acid sequence. If you marked the Save hits to file check box on the MS Edman form, you could then use Protein Databases to create a subset database from these saved results.

    The following topics describe options available on the MS Edman form.

    To return to default settings on the MS Edman page, click the Spectrum Mill button to go to the Spectrum Mill home page.  Then click the link on the home page to go back to the MS Edman page.

    Search

    Search Parameters

    Modifications

    Search Mode


    Digestion of a User Supplied Sequence - MS Digest

    To use MS Digest to digest a user supplied sequence:

    1. Select User Protein as the Database option.
    2. Read the instructions in the Protein sequence box.
    3. Paste or type the sequence in the Protein sequence box.

    4. Set the other MS Digest parameters as appropriate.
    5. Click the Digest button. 


    Database Entry Retrieval Method - MS Digest

    It is possible to retrieve entries from the database by specifying either the Accession Number or the Index Number. The accession number is a unique identifier for a protein within the database. It will not change between subsequent revisions of the database and is external to the Spectrum Mill package. The index number for a particular protein is internal to the Spectrum Mill package and is likely to change when you update the database. Both the index number and the accession number are reported in Spectrum Mill search results. Entries are generally more efficiently retrieved using index numbers.


    To Use the MS Digest Form

    MS Digest performs theoretical digestions and calculates masses of peptides that result. The program accepts both user proteins and sequences from databases.
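As a rough picture of what a theoretical digestion involves, the following Python sketch cleaves a sequence with a commonly used trypsin rule (cut after K or R unless the next residue is P). The rule, the function name, and the missed-cleavage handling are assumptions for illustration; the source does not describe how MS Digest implements its enzymes.

```python
import re

def tryptic_digest(sequence, missed_cleavages=0):
    """List peptides from cleaving after K or R, except before P."""
    # Positions just after each cleavable K/R.
    cut_sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Fragment boundaries: start, every internal cut site, end.
    bounds = [0] + [c for c in cut_sites if c < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` neighbouring fragments.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides

fully_cleaved = tryptic_digest("AKRPGKMR")
one_missed = tryptic_digest("AKRPGKMR", missed_cleavages=1)
```

For this made-up sequence, `fully_cleaved` is ["AK", "RPGK", "MR"]: the R followed by P is not cleaved.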

    The following topics describe options available on the MS Digest form.

    To return to default settings on the MS Digest page, click the Spectrum Mill button to go to the Spectrum Mill home page.  Then click the link on the home page to go back to the MS Digest page.

    Digest

    Protein

    Modifications

    Database


    Fragment Ion Types - MS Product

    Check the boxes next to each ion type to list the corresponding fragment ion masses in the MS Product output. The default ion types are those generally seen in MS/MS spectra. Supported ion types include:

     

    Ion type                     Restrictions
    a, b, y                      no restrictions
    a-NH3, b-NH3, y-NH3          ion contains R, K, or Q
    b-H2O                        ion contains S or T
    b+H2O                        ion contains R, H, or K; only bn-1, bn-2 (length n)
    a-H3PO4, b-H3PO4, y-H3PO4    ion contains phosphorylated S, T
    b-SOCH4, y-SOCH4             ion contains oxidized M
    internal b                   <800 Da
    internal a                   <800 Da, internal b present
    internal b-H2O               <800 Da, internal b present, ion contains S or T
    internal b-NH3               <800 Da, ion contains R
    N-term ladder                removal of N-term residues (y equiv.)
    C-term ladder                removal of C-term residues (b+H2O equiv.)


    To Use the MS Product Form

    MS Product calculates theoretical ion masses from peptides which undergo dissociation via post-source decay or high- or low-energy collision-induced dissociation.
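For the two most familiar ion series, the calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions (unmodified peptide, singly charged ions, monoisotopic masses, hypothetical function name); it does not cover the other ion types or reproduce MS Product's actual implementation.

```python
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # H2O (Da)

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, running = [], 0.0
    for m in masses[:-1]:               # b1 .. b(n-1), from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    y_ions, running = [], 0.0
    for m in reversed(masses[1:]):      # y1 .. y(n-1), from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions

b, y = fragment_ions("GAK")
```

A quick sanity check on such a list is that each complementary pair (b1 with y2, b2 with y1, and so on) sums to the same value, the peptide mass plus two protons.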

    The following topics describe options available on the MS Product form.

    To return to default settings on the MS Product page, click the Spectrum Mill button to go to the Spectrum Mill home page.  Then click the link on the home page to go back to the MS Product page.

    Fragmentation

    Peptide Sequence

    The variable modifications kmqsty are defined by default for MS Product. But if you select some other modification of K, M, Q, S, T, or Y (for example, guanidination of K), then that modification is used instead. That is, the default kmqsty modifications are defined in addition to whatever variable modifications you selected, but any selected variable modifications have priority.
     

    Modifications

    Product Ion Types


    Combination Type - MS Comp

    Amino Acid
    Lists the amino acid combinations consistent with the search conditions. Some of these will have identical elemental compositions.

    Peptide Elemental
    Lists the unique elemental compositions from the list of amino acid combinations reported by the Amino Acid option.

    Elemental
    Lists the elemental compositions consistent with the search conditions. The elemental compositions returned will obey the nitrogen rule and will have a double bond equivalent within the range expected for a peptide. The elemental compositions are, however, not guaranteed to have corresponding peptides. This option will work at much higher mass than the first two options.


    Nitrogen Rule - MS Comp

    The nitrogen rule states that for an organic compound with an even number of nitrogen atoms (including 0), the nominal mass of the molecular ion is even. Note that this rule was first observed for EI spectra of small molecules, where the molecular ion is not protonated. Hence, for a peptide with an even number of nitrogen atoms, the nominal mass of the MH+ equivalent must be odd; with an odd number of nitrogen atoms, it must be even.

    The nitrogen rule stems from the fact that most of the common elements that have even nominal masses have even valence:

    12C, valence = 4;
    16O, valence = 2;
    28Si, valence = 4;
    32S, valence = 2.

    On the other hand most of the elements with odd nominal masses have odd valence:

    1H, valence = 1;
    19F, valence = 1;
    31P, valence = 3;
    35Cl, valence = 1.

    Nitrogen is an exception in that it has an even nominal mass but an odd valence:

    14N, valence = 3.
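The parity check behind the nitrogen rule can be written as a small Python sketch. The function names and the dictionary representation of a formula are assumptions chosen for illustration; how MS Comp applies the rule internally is not described in the source.

```python
# Nominal (integer) masses of the most abundant isotopes.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32, "P": 31}

def nominal_mass(formula):
    """Nominal mass of a formula given as {element: count}."""
    return sum(NOMINAL[element] * count for element, count in formula.items())

def obeys_nitrogen_rule(formula, protonated=False):
    """Check the mass parity against the nitrogen count parity.

    For a neutral molecule, an even nitrogen count goes with an even
    nominal mass. For an MH+ formula (extra H included), the parity flips.
    """
    mass_even = nominal_mass(formula) % 2 == 0
    n_even = formula.get("N", 0) % 2 == 0
    if protonated:
        return mass_even != n_even
    return mass_even == n_even

gly_ala = {"C": 5, "H": 10, "N": 2, "O": 3}      # neutral Gly-Ala
gly_ala_mh = {"C": 5, "H": 11, "N": 2, "O": 3}   # its MH+ equivalent
```

Gly-Ala has two nitrogens and a nominal mass of 146 (even); its MH+ equivalent has a nominal mass of 147 (odd), as the rule for peptides predicts.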


    Double Bond Equivalent - MS Comp

    The double bond equivalent (DBE) is the number of rings or double bonds that an ion contains. It can be calculated from the elemental formula as follows:

    DBE = 1 - a/2 + c/2 + d

    where:

    a = number of atoms with a valence of 1 (H, F, Cl).

    b = number of atoms with a valence of 2 (O, S).

    c = number of atoms with a valence of 3 (N, P).

    d = number of atoms with a valence of 4 (C, Si).

    If the value calculated ends in 0.5, then this should be subtracted to get the true value.
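The formula and the rounding rule above translate directly into a short Python sketch. The function name and the dictionary form of the elemental formula are assumptions made for this example.

```python
import math

# Valence assigned to each element, as in the definitions above.
VALENCE = {"H": 1, "F": 1, "Cl": 1, "O": 2, "S": 2,
           "N": 3, "P": 3, "C": 4, "Si": 4}

def double_bond_equivalent(formula):
    """DBE = 1 - a/2 + c/2 + d for a formula given as {element: count}."""
    a = sum(n for el, n in formula.items() if VALENCE[el] == 1)
    c = sum(n for el, n in formula.items() if VALENCE[el] == 3)
    d = sum(n for el, n in formula.items() if VALENCE[el] == 4)
    value = 1 - a / 2 + c / 2 + d
    # A value ending in 0.5 has the 0.5 subtracted; floor does exactly that.
    return math.floor(value)

benzene = double_bond_equivalent({"C": 6, "H": 6})
gly_ala_mh = double_bond_equivalent({"C": 5, "H": 11, "N": 2, "O": 3})
```

Atoms with a valence of 2 (O, S) do not appear in the formula, so they are simply ignored. Benzene gives 4 (one ring plus three double bonds); the Gly-Ala MH+ formula gives 1.5 before rounding and 1 after.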

     

    Amino Acid  DBE  Elemental Formula  Calculation
    A           1.0  C3 H5 N1 O1        3 - 5/2 + 1/2
    C           1.0  C3 H5 N1 O1 S1     3 - 5/2 + 1/2
    D           2.0  C4 H5 N1 O3        4 - 5/2 + 1/2
    E           2.0  C5 H7 N1 O3        5 - 7/2 + 1/2
    F           5.0  C9 H9 N1 O1        9 - 9/2 + 1/2
    G           1.0  C2 H3 N1 O1        2 - 3/2 + 1/2
    H           4.0  C6 H7 N3 O1        6 - 7/2 + 3/2
    I           1.0  C6 H11 N1 O1       6 - 11/2 + 1/2
    K           1.0  C6 H12 N2 O1       6 - 12/2 + 2/2
    L           1.0  C6 H11 N1 O1       6 - 11/2 + 1/2
    M           1.0  C5 H9 N1 O1 S1     5 - 9/2 + 1/2
    N           2.0  C4 H6 N2 O2        4 - 6/2 + 2/2
    P           2.0  C5 H7 N1 O1        5 - 7/2 + 1/2
    Q           2.0  C5 H8 N2 O2        5 - 8/2 + 2/2
    R           2.0  C6 H12 N4 O1       6 - 12/2 + 4/2
    S           1.0  C3 H5 N1 O2        3 - 5/2 + 1/2
    T           1.0  C4 H7 N1 O2        4 - 7/2 + 1/2
    V           1.0  C5 H9 N1 O1        5 - 9/2 + 1/2
    W           7.0  C11 H10 N2 O1      11 - 10/2 + 2/2
    Y           5.0  C9 H9 N1 O2        9 - 9/2 + 1/2

    The terminal groups and the cation then contribute H3O to the overall elemental formula, reducing the DBE by 1.5. Finally, add the constant 1 from the DBE formula, which the per-residue calculations above omit.
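To see how the per-residue contributions combine for a whole peptide, here is a small worked example in Python. It is a sketch under assumptions: only a few residues are included, their C, H, and N counts are copied from the Elemental Formula column, and the function names are invented for the example.

```python
import math

# (C, H, N) counts for a few residues, from the Elemental Formula column.
RESIDUE_CHN = {"G": (2, 3, 1), "A": (3, 5, 1), "S": (3, 5, 1), "F": (9, 9, 1)}

def residue_dbe(aa):
    """Per-residue contribution: C - H/2 + N/2 (the constant 1 is left out)."""
    c, h, n = RESIDUE_CHN[aa]
    return c - h / 2 + n / 2

def peptide_dbe(peptide):
    """DBE of the MH+ ion built from residue contributions."""
    total = sum(residue_dbe(aa) for aa in peptide)
    total -= 1.5    # H3O from the terminal groups and the cation
    total += 1      # the constant 1 of the DBE formula, added once
    return math.floor(total)  # a trailing 0.5 is subtracted

gaf = peptide_dbe("GAF")
```

For Gly-Ala-Phe the residues contribute 1 + 1 + 5 = 7; subtracting 1.5 and adding 1 gives 6.5, which is reported as 6. The same result follows from applying the DBE formula to the full MH+ formula C14 H20 N3 O4.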


    To Use the MS Comp Form

    MS Comp fills in possible amino acid compositions for a peptide, given a peptide mass and partial composition determined from immonium ions present in MS/MS spectra.
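The idea of filling in compositions from a mass and a partial composition can be sketched as a small search in Python. Everything specific here is an assumption for illustration: the function name, the tolerance, and the decision to let L stand for both L and I (they have the same mass). The search strategy is a plain exhaustive enumeration, not necessarily what MS Comp does.

```python
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # H2O (Da)

# Monoisotopic residue masses (Da); L also stands for the isobaric I.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333, "W": 186.07931,
}

def compositions(mh_plus, tolerance=0.01, required=""):
    """Amino acid compositions matching an MH+ mass within `tolerance` Da.

    `required` lists residues already known to be present, for example
    from immonium ions.
    """
    target = mh_plus - WATER - PROTON - sum(RESIDUE_MASS[aa] for aa in required)
    residues = sorted(RESIDUE_MASS.items(), key=lambda item: item[1])
    results = []

    def extend(start, remaining, chosen):
        if abs(remaining) <= tolerance:
            results.append("".join(sorted(required + "".join(chosen))))
            return
        # Only add residues at or after `start`, so each combination
        # is generated once regardless of order.
        for i in range(start, len(residues)):
            aa, mass = residues[i]
            if mass > remaining + tolerance:
                break
            extend(i, remaining - mass, chosen + [aa])

    extend(0, target, [])
    return results

all_hits = compositions(234.1084)
with_serine = compositions(234.1084, required="S")
```

For this example mass, requiring S narrows the answer to the compositions AGS and QS. These two have the same elemental composition, which is why the Peptide Elemental option reports fewer entries than the Amino Acid option.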

    To return to default settings on the MS Comp page, click the Spectrum Mill button to go to the Spectrum Mill home page.  Then click the link on the home page to go back to the MS Comp page.

    Compositions

    Peptide

    AA Composition

    Amino Acids

    Modifications


    To Use the MS Isotope Form

    MS Isotope calculates and displays isotope patterns of peptides. The following topics describe options available on the MS Isotope form.
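One common way to compute such a pattern is to combine the natural isotope abundances of each atom by repeated convolution. The Python sketch below shows that approach at nominal-mass resolution; it is an assumption about the general technique, not a description of MS Isotope's algorithm, and the function names are invented here.

```python
# (extra nominal mass relative to the lightest isotope, natural abundance)
ISOTOPES = {
    "C": [(0, 0.9893), (1, 0.0107)],
    "H": [(0, 0.999885), (1, 0.000115)],
    "N": [(0, 0.99636), (1, 0.00364)],
    "O": [(0, 0.99757), (1, 0.00038), (2, 0.00205)],
    "S": [(0, 0.9499), (1, 0.0075), (2, 0.0425), (4, 0.0001)],
}

def convolve(p, q, max_peaks):
    """Combine two abundance lists, keeping at most `max_peaks` peaks."""
    out = [0.0] * min(len(p) + len(q) - 1, max_peaks)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < len(out):
                out[i + j] += a * b
    return out

def isotope_pattern(formula, max_peaks=6):
    """Relative intensities of M, M+1, M+2, ... for {element: count}."""
    pattern = [1.0]
    for element, count in formula.items():
        width = max(shift for shift, _ in ISOTOPES[element]) + 1
        single = [0.0] * width
        for shift, abundance in ISOTOPES[element]:
            single[shift] = abundance
        for _ in range(count):          # add one atom at a time
            pattern = convolve(pattern, single, max_peaks)
    top = max(pattern)
    return [p / top for p in pattern]   # scale the tallest peak to 1.0

pattern = isotope_pattern({"C": 5, "H": 10, "N": 2, "O": 3})  # Gly-Ala
```

For a small peptide such as Gly-Ala the monoisotopic peak is the tallest, and the M+1 peak is a few percent of it, mostly from 13C.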

    To return to default settings on the MS Isotope page, click the Spectrum Mill button to go to the Spectrum Mill home page.  Then click the link on the home page to go back to the MS Isotope page.

    Isotope Distribution

    Peptide Sequence

    Elemental Composition

    Modifications



    To Use the Peptide String Match Form

    Peptide String Match finds peptides that contain a specific sequence of amino acids.

    To use this tool:

    1. Under Utilities, click Peptide String Match.
    2. Under Database, select the database for which you want to find peptides that match a given peptide sequence.
    3. Set options as described below.
    4. Click Find Peptides.
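Conceptually, the search looks for every occurrence of a peptide sequence within the database entries. A minimal Python sketch follows, assuming the database is available as a dictionary of accession numbers to sequences; the accessions, sequences, and function name are made up for illustration, and the form's filters are not modeled.

```python
def find_peptide(database, peptide):
    """Return (accession, 1-based start position) for every occurrence."""
    hits = []
    for accession, sequence in database.items():
        start = sequence.find(peptide)
        while start != -1:
            hits.append((accession, start + 1))
            # Continue one residue later so overlapping matches are found.
            start = sequence.find(peptide, start + 1)
    return hits

# Hypothetical two-entry database.
database = {"P1": "MKTAYIAKQR", "P2": "AYIAYI"}
hits = find_peptide(database, "AYI")
```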

    Filters

    Output Features

    Enter Peptide Sequences



    To Use the Peptide List to Masses Form

    Peptide List to Masses calculates the masses and formulas for a set of peptides that you specify.
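The calculation can be pictured as summing residue formulas, adding one H2O for the termini, and converting the result to a mass. The Python sketch below does this for unmodified peptides with monoisotopic masses; the function names are assumptions, and modifications are not handled.

```python
# Monoisotopic element masses (Da).
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720707}

# Residue compositions as (C, H, N, O, S) counts.
RESIDUE_FORMULA = {
    "A": (3, 5, 1, 1, 0), "C": (3, 5, 1, 1, 1), "D": (4, 5, 1, 3, 0),
    "E": (5, 7, 1, 3, 0), "F": (9, 9, 1, 1, 0), "G": (2, 3, 1, 1, 0),
    "H": (6, 7, 3, 1, 0), "I": (6, 11, 1, 1, 0), "K": (6, 12, 2, 1, 0),
    "L": (6, 11, 1, 1, 0), "M": (5, 9, 1, 1, 1), "N": (4, 6, 2, 2, 0),
    "P": (5, 7, 1, 1, 0), "Q": (5, 8, 2, 2, 0), "R": (6, 12, 4, 1, 0),
    "S": (3, 5, 1, 2, 0), "T": (4, 7, 1, 2, 0), "V": (5, 9, 1, 1, 0),
    "W": (11, 10, 2, 1, 0), "Y": (9, 9, 1, 2, 0),
}
ELEMENTS = ("C", "H", "N", "O", "S")

def peptide_formula(peptide):
    """Elemental composition of the neutral peptide as {element: count}."""
    counts = dict.fromkeys(ELEMENTS, 0)
    for aa in peptide:
        for element, n in zip(ELEMENTS, RESIDUE_FORMULA[aa]):
            counts[element] += n
    counts["H"] += 2    # H2O for the N- and C-terminal groups
    counts["O"] += 1
    return counts

def formula_string(counts):
    """Format counts as text, for example C5H10N2O3; zero counts are omitted."""
    return "".join(f"{el}{counts[el]}" for el in ELEMENTS if counts[el])

def monoisotopic_mass(counts):
    return sum(ELEMENT_MASS[el] * n for el, n in counts.items())

table = [(pep, formula_string(peptide_formula(pep)),
          round(monoisotopic_mass(peptide_formula(pep)), 4))
         for pep in ["GA", "MK"]]
```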

    Calculate Masses

    Modifications

    Peptide Sequences