What's New in This Version of Spectrum Mill

New in BI.08.01
New in BI.07.11
New in BI.07.09
New in BI.07.08
New in BI.07.07
New in BI.07.00
New in B.06.00
New in B.05.00
New in B.04.01
New in B.04.00

New in BI.08.01 (December 2022)

The BI.08.01 version of Spectrum Mill is a major version release due to the addition of:

Bruker Data Extractor for TimsTof data
Report-to-Plots tool for making plots of metrics from SM tabular reports (PSM/Peptide score metrics, sample handling Quality Metrics, and Spectral Features)
Subset-specific FDR filter for extracting a subset of peptides/PTM-sites derived from a rare source and applying subset-specific FDR filtering
Overhaul of underlying data structures that diminishes by 2-fold the memory required for Protein/Peptide Summary reports for large TMT proteome cohorts.

Several other new features have also been included. The sections below describe additional differences between BI.08.01 and prior versions of Spectrum Mill.

Installation and Configuration
Workflow Automation
Protein Sequence Database Utilities
Data Extractors
Protein/Peptide Summary
Process Report
Lorikeet spectrum viewer
Report-to-Plots
Subset-specific FDR filter
Changes with the B.07.11 version

Installation and Configuration

Python v3.9

Anaconda Anaconda3-2022.05-Windows-x86_64.exe is supplied with this release.

Workflow Automation

New features motivated by the analysis of cohorts of immunopeptidomic samples where each sample is analyzed independently using an identical workflow except for a personalized sequence database with single amino acid variants (SAAVs) derived from whole exome sequencing (WES) and aberrant splice junctions derived from RNA-seq.

Override Workflow Database:
Apply all tasks to individual directories (no combined autovalidation or reports)

Protein Sequence Database Utilities

Individual Databases Instead Of Combined - option added to the Concatenate FASTA files utility, when the Make Proteogenomic Summary Tables feature is used.
Translate nucleotide FASTA to protein FASTA utility introduced.
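
For orientation, the kind of conversion this utility performs can be sketched in a few lines of Python. This is a minimal illustration, not the SM implementation: it assumes the standard genetic code, a single forward reading frame, and stop codons written as `*`. The function names `translate` and `translate_fasta` are hypothetical, and how the SM utility handles reading frames, stop codons, and ambiguous bases is not described here.

```python
# Minimal sketch (assumptions: standard genetic code, forward frame only).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(nt, frame=0):
    """Translate a nucleotide string in one forward frame (0, 1, or 2).

    Codons containing anything other than A/C/G/T become 'X'.
    """
    nt = nt.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(frame, len(nt) - 2, 3)
    )


def translate_fasta(text, frame=0):
    """Return a list of (header, protein) tuples from nucleotide FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, translate("".join(chunks), frame)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, translate("".join(chunks), frame)))
    return records


if __name__ == "__main__":
    for hdr, protein in translate_fasta(">seq1 example\nATGGCC\nTAA\n"):
        print(hdr)
        print(protein)
```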

Data Extractor

XtractorBruker .d TimsTof Extractor introduced with the following features akin to the SM Data Extractors for other instrument vendors:

2D MS/MS replicate merging (in both retention time and ion mobility dimensions)
Filtering by MH+, z, RT, Sequence tag length
MS/MS spectral feature calculation
SRM control of XtractorBruker enables Max CPU extraction

XtractorBruker does not yet read/process MS1 scans. Hence, the following spectral features are currently missing: MS1 precursor peak area, Chromatographic peak width, Precursor ion purity. However, an MS1 intensity is stored in the Bruker Precursor SQL table and is currently output as the SM precursor peak area.

Description from the Bruker SQL table documentation:
Intensity of this precursor in the corresponding MS^1 frame. The corresponding MS^1 frame in which this precursor was detected. In the case that MS^1 frames were repeatedly measured and averaged to improve SNR for precursor detection, the TDF stores those frames individually, and this field points to the last of that set of frames.

Protein/Peptide Summary

While not readily apparent to users, the underlying data structures for spectral features were overhauled to achieve two objectives:

Diminish by 2-fold the memory required for Protein/Peptide Summary reports for large TMT proteome cohorts, particularly when transitioning from TMT10/11 to TMT16/18.
RICF - Enable combined reports of multiple data directories using different TMT reagents for individual data directories. The reagent used for each directory is automatically diagnosed from the reporter ion correction factors (RICF) file. The particular control ion (ratio denominator) used for each data directory must be consistent (first, last, or MedianMulti).

Process Report

Auto-SA reporter ion label type. Reporter ion label can be automatically determined from the sample annotation (SA) file. Accompanies the P/P Summary RICF feature to enable combined reports of multiple data directories using different TMT reagents for individual data directories.
-precursorIntensity.gct file now produced for label type - Label free.
-reporterIntensity.gct file generated for iTRAQ/TMT label types when sample annotation (SA) file contains a ratioDenominator column.

Lorikeet spectrum viewer

Added support for internal ions.

When using the Microsoft Edge Browser in Internet Explorer mode, it remains possible to use the classic SM Java applet from the SPI link in SM HTML output. However, configuring IE mode is a considerable hassle. Lorikeet support for internal ions essentially renders the classic viewer obsolete. The feature of the proportional peptide sequence with gray bars at theoretical ion positions in the classic viewer may make it into a future revision of the Lorikeet viewer.

Report-to-Plots

Introduced Report-to-Plots tool for making plots of metrics from SM tabular reports (PSM/Peptide score metrics, sample handling Quality Metrics, and Spectral Features)

This standalone tool provides a convenient means for developing and generating plots from various SM reports. Feedback is welcome. Users are encouraged to use the underlying Python scripts as a starting point for generating their own customized plots. In future SM releases plot generation by RtP will be a Task that can be added to an SM workflow and some existing tasks (especially Quality Metrics and P/P Summary) will automatically generate particular plots.

Subset-specific FDR filter

Introduced Subset-specific FDR filter for extracting a subset of peptides/PTM-sites derived from a rare source and applying subset-specific FDR filtering

This tool was initially motivated by analysis of immunopeptidomics datasets with the objective of increasing confidence in the identification of HLA-bound peptides derived from non-canonical unannotated open reading frames (nuORFs). Following conventional aggregate FDR Filtering of a dataset with the SM Autovalidation module, the basic approach is:

Make Subset(s) - Peptides of Interest, based on values in the species column of an SM report or a user-provided column derived from the P/P Summary Category file option.
Calculate FDR of Subset(s)
Apply Fixed Thresholds - stricter thresholds to several scoring metrics
Re-calculate FDR of Subset(s)
Perform a dynamic Grid Search to optimize the backbone cleavage score (BCS), score, SPI, BCS% thresholds for Subset(s) to reach a target FDR of <1%.

New in BI.07.11 (April 2022)

The BI.07.11 version of Spectrum Mill is mostly a minor maintenance release intended to help new users get started. It is motivated by revised default saved parameter sets that no longer refer to Broad Institute project-specific category files in P/P Summary, which are not present on a newly installed SM server. Nonetheless, a few other new features have been included. The sections below describe additional differences between BI.07.11 and prior versions of Spectrum Mill.

Default Saved Parameter sets
Data Extractors
MS/MS Search
Process Report
Changes with the B.07.09 version

Default Saved Parameter sets

Revised several parameter sets to remove features specific to use at Broad Institute, like category files in P/P Summary that are not present on a newly installed SM server.

Data Extractor

Thermo .RAW Extractor revisions include:

Created instrument params for Orbitrap CID HLA v3, which for high resolution CID spectra provide more sensitive peak detection before calculating the spectral feature: max sequence tag length.

MS/MS Search

Exposed control of the feature: Skip QUILTS Unmutated Peptides. This was introduced in v7.09 in a faulty way which caused searches using Ensembl protein identifiers to report only a single protein identifier, from the longest protein containing the matched peptide. When the skip feature is checked in v7.11 it now performs as intended with all reference proteome identifiers present. Proteogenomic search revision to NOT report hits to wt peptides in proteins containing SAAVs, applies only to Ensembl sequence identifiers. Implemented by not allowing extended accession nums (mutant protein) with same peptide sequence matched to an unextended accession num.
Created instrument params for Orbitrap CID HLA v3, that for high resolution CID spectra provide more sensitive peak detection and fragment ion type scoring suited for HLA peptides.
Added the Spectral Quality Filter: Phospho Product Ion Score (PPIS) to craft subsets of spectra containing a phospho neutral-loss ion signature.

Process Report

Fixed bug preventing Plot Ratio Distributions from running. Extended plots to support .GCT format.
To Normalize Reporter Ratios added features:

Retain Species:
Combine replicate columns & omit QC.fail
Apply prior factors (for subsets to use aggregate)

New in BI.07.09 (Sept 2021)

The BI.07.09 version of Spectrum Mill includes the new key features:

Support for TMTpro-18 reagents
Integrated Lorikeet spectrum viewer both as a link from Protein Peptide Summary and as a standalone page when using browsers other than Internet Explorer (IE11).
Process Report now exports .GCT format reports in conjunction with *sample-annotation.csv files (vertical layout) for mapping samples to reporter ions.
Added the Spectral Quality Filter: Glyco Product Ion Score (GPIS) to craft subsets of spectra containing a glyco ion signature.

Below describes additional differences between the BI.07.09 and prior versions of Spectrum Mill.

Installation and Configuration
Data Extractors
MS/MS Search
Protein/Peptide Summary
Protein Sequence Database Utilities
Tool Belt
Process Report
Changes with the B.07.08 version

Installation and Configuration

R

R R-4.1.0-win is supplied with this release.

Data Extractor

Thermo .RAW Extractor revisions include:

Added support for TMTpro-18 reagents

MS/MS Search

Proteogenomic search revision to NOT report hits to wt peptides in proteins containing SAAVs, applies only to Ensembl sequence identifiers. Implemented by not allowing extended accession nums (mutant protein) with same peptide sequence matched to an unextended accession num.
Added support for prompt neutral loss mods like O-HexNAc, used to trigger calculations of peptide parent ion modified paired with fragment ions unmodified.
Added the Spectral Quality Filter: Glyco Product Ion Score (GPIS) to craft subsets of spectra containing a glyco ion signature.

Protein/Peptide Summary

Integrated Lorikeet spectrum viewer. Link from SPI in HTML output now launches Lorikeet viewer when using browsers other than Internet Explorer (IE11). Classic SM Java applet continues to be launched, when using IE11
Enabled VM-site reports to have annotated PG Features (Variants & Spliceforms) included along with all others
Revised PG features column headers so they are now valid R names.
Added support for TMTpro-18 reagents
Changes to behavior of the Control ion menu when selecting MedianMulti or MeanMulti:

If from the UI you choose MedianMulti but then select NO individual accompanying control ions the script (not the UI) will default to all of the ions for that label.
When all reporters or no reporter for a particular label are chosen, the denominator annotation in the output is shortened (ex: MedianMulti.all.18 for TMT18) instead of listing every single reporter ion.
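
The MedianMulti fallback and the shortened annotation can be illustrated with a small sketch. It is hedged throughout: it assumes the denominator is the median of the selected control reporter ion intensities, the function name `median_multi_denominator` is hypothetical, and the label written for a partial selection is invented for illustration. Only the `MedianMulti.all.18` style comes from the notes above.

```python
from statistics import median


def median_multi_denominator(intensities, control_ions=None):
    """Return (denominator, annotation) for a MedianMulti-style control.

    intensities  : dict mapping reporter ion name -> intensity
    control_ions : names chosen as controls; None or empty means the
                   script falls back to all reporter ions of the label.
    """
    channels = list(intensities)
    selected = list(control_ions) if control_ions else channels
    denominator = median(intensities[c] for c in selected)
    if set(selected) == set(channels):
        # Shortened form, as in the MedianMulti.all.18 example for TMT18.
        annotation = f"MedianMulti.all.{len(channels)}"
    else:
        # Hypothetical format for a partial selection.
        annotation = "MedianMulti." + ".".join(selected)
    return denominator, annotation


if __name__ == "__main__":
    tmt = {"126": 100.0, "127N": 200.0, "127C": 300.0, "128N": 400.0}
    denom, label = median_multi_denominator(tmt)
    ratios = {ion: value / denom for ion, value in tmt.items()}
    print(label, denom, ratios)
```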

Process Report

Added support for TMTpro-18 reagents
Added support for *sample-annotation.csv file (vertical layout) for mapping samples to reporter ions.
Added ability to generate .GCT formatted output reports.
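
As a rough illustration of how a vertical sample annotation file and a .GCT report fit together, the sketch below maps reporter ions to sample names and writes a table in the simple GCT v1.2 layout (version line, dimensions line, header line, then one row per feature). The column names `sampleID` and `reporterIon` are assumptions made for this example, and the GCT version and annotation fields that SM actually writes are not stated in these notes.

```python
import csv
import io


def read_sample_annotation(csv_text):
    """Map reporter ion -> sample name from a vertical (one row per sample)
    annotation table. Column names here are hypothetical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["reporterIon"]: row["sampleID"] for row in reader}


def to_gct(rows, channels, channel_to_sample):
    """Format rows as GCT v1.2 text.

    rows     : list of (name, description, {reporter ion: value})
    channels : reporter ions in the desired column order
    """
    samples = [channel_to_sample[c] for c in channels]
    lines = [
        "#1.2",
        f"{len(rows)}\t{len(samples)}",
        "\t".join(["Name", "Description"] + samples),
    ]
    for name, description, values in rows:
        cells = [f"{values[c]:g}" for c in channels]
        lines.append("\t".join([name, description] + cells))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    annotation = "sampleID,reporterIon\nS1,126\nS2,127N\n"
    mapping = read_sample_annotation(annotation)
    table = [("P1", "protein one", {"126": 1.5, "127N": -0.25})]
    print(to_gct(table, ["126", "127N"], mapping), end="")
```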

Protein Sequence Database Utilities

Added the utility Remove fragments - UniProt
Overhauled the Calculate statistics utility. Output now in table format and includes 9-mer redundancy factor.

Tool Belt

Added support for TMTpro-18 correction factors

Create Reporter Ion correction factors
Apply Reporter Ion correction factors

New in BI.07.08 (June 2021)

The BI.07.08 version of Spectrum Mill contains minor changes. The sections below describe additional differences between BI.07.08 and prior versions of Spectrum Mill.

Protein/Peptide Summary
Protein Sequence Database Utilities
de novo Sequencing
Spectrum Summary
Changes with the B.07.07 version

Protein/Peptide Summary

Updated default saved parameter sets from ~2015 to 2021
Revised to only calculate coverage maps when % coverage requested, to save memory on very large CPTAC reports.
Fixes to get PG features to report in Peptide/PSM report modes.

Protein Sequence Database Utilities

Overhauled the manual. Updated supported database descriptions from ~2007 to 2021

de novo Sequencing

Removed some residual software testing features from the UI.

Spectrum Summary

Updated default parameter sets for co-reporting de novo and DB search results.
Fixes to Rscript for making plots of performance metrics for de novo and DB search results.

New in BI.07.07 (March 2021 - first Broad Institute release)

The BI.07.07 version of Spectrum Mill contains updated JAVA applets for spectrum viewing with certificates valid until Jan 2024. Below describes additional differences between the BI.07.07 and prior versions of Spectrum Mill.

Installation and Configuration
Home Page
Data Extractors
MS/MS Search
MS/MS Autovalidation
Protein/Peptide Summary
Changes with the B.07.00 version

Installation and Configuration

Operating systems supported

Windows Server 2016 (encouraged)
Windows 10 (single-user environment)

Web browser support

Google Chrome (primary development, testing use at Broad Institute)
Microsoft Edge, Firefox, and Opera (tested occasionally)
Microsoft Internet Explorer 11 (required only for Spectrum Viewer applet use)

The following installation and configuration features are new in Spectrum Mill vBI.07.07. For details, see the Installation Guide.

Perl

Strawberry Perl v5.32.1.1 is supplied with this release. This SM release is backwards compatible with ActiveState Perl v5.18.4 supplied with prior SM releases.

R

R R-3.6.2-win is supplied with this release.

JRE

Java JRE v1.8u202 is supplied with this release. Install both the 32- and 64-bit JREs, even if you run the 64-bit Internet Explorer (IE 11). Later versions of JRE 8 may be available (see www.java.com), and in general should work with Spectrum Mill. The JRE is only required to support the Spectrum Viewer applets and the Sherenga de novo program. For the Spectrum Viewer applet be sure to install JREs on all client computers, as well as the Spectrum Mill server.

Thermo Fisher MSFileReader (for .raw files)

MSFileReader_3.1_SP4 is supplied with this release.

Default parameter sets

Nearly all default parameter sets have been updated since vB.06.00.

Home Page

Added links to:
- Slides: Spectrum Mill - Overview
- Getting Started Guide
Removed links to obsolete utilities
- MS Edman
- MS Comp
- MS Isotope

Data Extractors

Thermo Fisher .RAW Extractor revisions include:

Updates to use the most recent release of the Thermo Fisher API MSFileReader_3.1_SP4.
User control of peak detection through addition of an instrument menu to allow matching settings between Data Extractor and MS/MS search.

MS/MS Search

Updated peak detection to improve sensitivity originally implemented for HLA peptides now applied to other instrument definitions.

ESI QExactive HCD v4 35 (March 2020)
ESI QExactive HCD v4 30 (March 2020)

MS/MS Autovalidation

Updated the VM site grouping used in VM site polishing mode that is also shared with P/P Summary (see below).

Protein/Peptide Summary

Updated the VM site grouping used in Protein - Var Mod site reporting mode to more consistently handle grouping PSMs from confident localizations and multiple ambiguous localizations overlapping the same region of a protein. Ambiguous localizations are now grouped with consistent confident ones preferentially over other ambiguous localizations that are not consistent with the confident localization. Previously the grouping was preferentially combining PSMs with the most N-terminally positioned localization.

New in BI.07.00 (November 2016 - Feb 2020)

The BI.07.00 version of Spectrum Mill eliminates the dependency on Internet Explorer, and can now be used with most web browsers including Google Chrome, Firefox, and Microsoft Edge. Primary development and testing is now done with Google Chrome. Below describes additional differences between the BI.07.00 and prior versions of Spectrum Mill.

Home Page
Data Extractors
MS/MS Search
MS/MS Autovalidation
Protein/Peptide Summary
Quality Metrics and False Discovery Rate (FDR)
Spectrum Matcher
Spectrum Summary
de novo Sequencing
Protein Sequence Database Utilities
Tool Belt
Process Report
Changes with the B.06.00 version

Home Page

Added links to:
- Help Index
- Custom Modifications Guide
- Installation Guide
Removed obsolete tools
- PMF Summary
- de novo summary (functionality now included in Spectrum Summary)

Data Extractors

Thermo .RAW Extractor revisions include:

Added support for ETHCD dissociation.
Added support for FAIMS so that all MS/MS and MS XIC's are associated with the compensation voltage (CV) used.
Improved support for Lumos data MS1 chromatographic peak detection and monoisotopic m/z adjustment
Added support for TMT11 & TMTpro-16 reagents

MS/MS Search

Updated peak detection to improve sensitivity by revision of the noise level calculation performed for each spectrum prior to signal/noise based peak detection. This primarily affects spectra of low abundance peptides with very little noise, and not only leads to higher identification scores for low-signal spectra, but also allows more low-signal spectra to pass the sequence tag length based spectral quality threshold employed by the data extractor.
- ESI QExactive HLA v3 30 (June 2019)
Newly optimized MS/MS search scoring of fragment ion types for HLA class I peptides.
- ESI QExactive HLA v2 (December 2017)
No enzyme search efficiency optimized to no longer require using the Disable Skipping Repeat Peptides in Database
- Requires hardware with a Memory to CPU ratio of ~3 GB RAM / CPU
- For a typical sequence database the searches should be ~2X faster
Revised to allow variable modifications with accompanying sequence motifs to substantially reduce the search space to the most likely possibilities for some modifications. Primary examples include:
- Deamidated NG (only when preceding Gly)
- Hydroxylation of PG (only when preceding Gly), for hydroxyproline in protein collagen domains
- TMT10 contains His (Y,S,T) (only when peptide contains His), for overlabeling with TMT reagents
Added support for TMT11 & TMTpro-16 reagents
Revised to allow no enzyme mode support for protein N-terminal acetylation
Added the Full length digest option, which prevents making subsequences of a protein entry in the database thus attempting to match only the full length sequence. This is intended to handle very large databases of short HLA peptide candidate sequences.
Bug fix to allow modifications with a negative mass shift to match with multiple occurrences in a peptide. This was intended to handle Acetylated Lysines in TMT labeled peptides. The negative mass shift is due to in vivo acetylation preventing subsequent in vitro labeling with TMT
Support for ETHCD dissociation on the fragmentation mode menu to support LC-MS/MS runs with multiple fragmentation methods employed in a single run.
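
The motif-restricted variable modifications listed earlier in this section (Deamidated NG, Hydroxylation of PG, TMT10 contains His) reduce the search space by only considering sites that satisfy a sequence condition. A minimal sketch of that idea follows. The function `candidate_sites` and its parameters are hypothetical and do not reflect how SM encodes motifs internally.

```python
def candidate_sites(peptide, residues, followed_by=None, peptide_contains=None):
    """Return 1-based positions in peptide eligible for a variable modification.

    residues         : residues that can carry the modification, e.g. "N"
    followed_by      : if given, the residue must immediately precede this
                       amino acid (e.g. "G" for Deamidated NG)
    peptide_contains : if given, the peptide must contain this amino acid
                       somewhere (e.g. "H" for TMT overlabeling of Y, S, T)
    """
    if peptide_contains and peptide_contains not in peptide:
        return []
    sites = []
    for i, aa in enumerate(peptide):
        if aa not in residues:
            continue
        if followed_by and peptide[i + 1:i + 2] != followed_by:
            continue
        sites.append(i + 1)
    return sites


if __name__ == "__main__":
    # Deamidation considered only at N followed by G.
    print(candidate_sites("ANGNK", "N", followed_by="G"))
    # Hydroxyproline considered only at P followed by G.
    print(candidate_sites("GPPGPK", "P", followed_by="G"))
    # Overlabeling of Y/S/T considered only when the peptide contains His.
    print(candidate_sites("HSAYTK", "YST", peptide_contains="H"))
```

Without the motif, every N, P, or Y/S/T in every candidate peptide would be a potential site, so restricting sites this way shrinks the number of modified forms that must be scored.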

MS/MS Autovalidation

Added additional peptide level filter, backbone cleavage score (BCS) primarily for immunopeptidomics of HLA class I peptides. BCS is a peptide sequence coverage metric and the BCS threshold (typically 5) enforces a uniformly higher minimum sequence coverage for each PSM, at least 4-5 residues of unambiguous sequence. The BCS metric serves to decrease false positives associated with spectra having fragmentation in a limited portion of the peptide that yields multiple ion types.
Added mode VM site polishing to preferentially retain VM sites that are recurrent across multiple experiments, while invalidating low scoring VM sites observed in single experiments.
Added protein polishing features:
- Added protein grouping option, expand subgroups, top uses shared
- Added retain proteins above either threshold (less strict - retains recurrently observed proteins with scores below threshold)

Protein/Peptide Summary

Reorganized and revised protein grouping options:
- Added new protein grouping method: expand subgroups, top uses shared
- Existing protein grouping method, subgroup specific, moved from a checkbox to under the protein grouping menu: expand subgroups, ignore shared, SGS
- Revised Protein Comparison mode to make 2 reports for Excel Export. When either top shared, or ignore shared is selected the all shared report is also made.
Added support for TMT11 & TMTpro-16 reagents
afRICA - dynamic reporter ion correction algorithm implemented
Improved SILAC dataset support (DEQ ratios) in Protein Comparison mode
- Added ability for user to specify the threshold value of Exclude poor isotope quality Precursor XIC's: Chi2 vs. Averagine
- Fixed bugs so that Exclude outlier DEQ Ratios (> 2 std dev from mean) works properly.
Added multi species reporting in protein comparison mode reports for better support of xenograft datasets.
Added support for VMsiteFlankingSequence in Protein VMsite reports
Revised mode Protein Prot Genom Site Comparison reporting of variants and spliceforms to be more generic instead of CPTAC2-specific
Added support to report FAIMS compensation voltage (CV) for PSMs
Fixed bug so fill time is reported for PSMs
Added sequenceMulti column for peptide mode outputs.
Added reporting of fragmentation metrics/categories for MS/MS spectra to PSM/peptide reports

Quality Metrics and False Discovery Rate (FDR)

More accurate MS1 chromatographic peak widths for Thermo Lumos datasets, based on Data Extractor changes. Also stopped counting 0’s towards median peak width metric for all instrument types.
Added additional fill time metric medianTrapFillMsecUnmaxed to exclude spectra reaching maximum fill time, to better measure the performance of QExactive HFX model instruments that not only have a very rapid scan rate but can also be operated with the monoisotopic precursor recognition, peptide match set to preferred. With those settings many unproductive MS/MS scans with maximum fill times are taken when there are no remaining good quality precursors detected, hence retaining those in the median causes the metric to measure the sample composition more than the instrument performance.
Revised Isobaric label incorporation quality metrics to more clearly count blocked Nterms as underlabeled to enable better decision making about the prospects for re-labeling.
Added calculations and plots about reporter ion ratios across the LC gradient to enable recognition of problems with inconsistent recovery of early eluting peptides in some samples in a plex.
Added reporting of fragmentation metrics in MS/MS spectra
Revised Digestion stats to count percent tryptic, semitryptic, nontryptic.
Added peptide separation metric that makes:
- Peptide subset reports for each data directory derived from central peptide lists(seqdb/peptideQMlists/*.txt)
- Added support for making comparative retention time plots relative to a gold standard run. Requires Namrata’s python script (millpy/20180327_SM_Select_Peptide_QM.py), and python installed on the SM server.
Added columns for iTRAQ/TMT metrics: Median S/N All Reporters, allReportersDetectedPercent, controlIonDetectedPercent
Renamed Contaminant Product Ions score to Glyco Product Ions score and raised the threshold score from 2.0 to 4.5 for a spectrum for counting a PSM as containing glyco marker ions.
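
The reasoning behind the medianTrapFillMsecUnmaxed metric described above can be shown with a toy calculation. This sketch assumes a spectrum counts as "maxed" when its fill time equals or exceeds the configured maximum; how SM detects maxed scans is not stated, and the function name is hypothetical.

```python
from statistics import median


def median_fill_unmaxed(fill_times_ms, max_fill_ms):
    """Median trap fill time over spectra that did not reach the maximum.

    Returns None when every spectrum reached the maximum fill time.
    """
    unmaxed = [t for t in fill_times_ms if t < max_fill_ms]
    return median(unmaxed) if unmaxed else None


if __name__ == "__main__":
    # Made-up fill times: many unproductive scans sit at the 120 ms maximum.
    fills = [20, 35, 50, 120, 120, 120, 120]
    print("median of all scans:", median(fills))
    print("median of unmaxed scans:", median_fill_unmaxed(fills, 120))
```

In this made-up run the plain median is pinned at the maximum fill time, while the unmaxed median reflects the scans that actually accumulated enough ions.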

Spectrum Matcher

Updated to enable searching high quality left-over spectra against high-quality already identified spectra
Produces a histogram of frequently observed precursor mass shifts to help suggest presence of modifications not accounted for during a database search
Results reported in Spectrum Matcher
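
A histogram of precursor mass shifts of the kind mentioned above can be sketched as follows. It assumes each match is available as a pair of neutral precursor masses (unidentified spectrum, identified spectrum) and uses a fixed bin width. The function name, bin width, and minimum count are illustrative choices, not Spectrum Matcher settings.

```python
from collections import Counter


def mass_shift_histogram(pairs, bin_width=0.05, min_count=2):
    """Bin precursor mass differences and return the frequent ones.

    pairs : iterable of (query_mass, library_mass) neutral masses in Da
    Returns a list of (bin_centre_da, count), most frequent first.
    """
    counts = Counter(round((query - library) / bin_width) for query, library in pairs)
    frequent = [
        (round(index * bin_width, 4), n)
        for index, n in counts.items()
        if n >= min_count
    ]
    return sorted(frequent, key=lambda item: -item[1])


if __name__ == "__main__":
    # Made-up matches: three shifts near +16 Da, two near +80 Da, one stray.
    matches = [
        (1015.995, 1000.0),
        (1216.594, 1200.6),
        (915.496, 899.5),
        (1079.966, 1000.0),
        (1280.567, 1200.6),
        (1001.2, 1000.0),
    ]
    for shift, n in mass_shift_histogram(matches):
        print(f"{shift:+.2f} Da  x{n}")
```

Recurrent shifts such as these would suggest modifications worth adding to a subsequent database search.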

Spectrum Summary

Updated to combine PSM level results of DB search, de Novo, and Spectrum Matcher results
Added de novo performance metrics of accuracy and sequence recall that are calculated, tabulated, and graphed
Summarizes Spectrum Matcher results to help explain the Spectrum matcher Precursor Mass Shifts using Database search Id from Library Spectrum matched with metrics that include:
- Enriched for particular AA's present in peptide
- Frequent N-terminal AA's
- Compare precursor MS1 intensity
- Compare LC retention time (RT)
- Co-fractionate in bRP

de novo Sequencing

Updated Sherenga scoring and peak detection optimized for high resolution HCD spectra
Updated Sherenga automation that is ~100x faster than in SM v6.00
Can now be incorporated into workflows

Protein Sequence Database Utilities

Revised to handle very large databases of HLA peptides, associated with the MS/MS Search Full length digest option. Added DB indexing option to omit pI, MW, and Species Indices. This consumes much less disk space but also prevents pI, MW, and species subset searches.
Restored ability to detect duplicate accession numbers when indexing a database.
Added support for making a list of tryptic peptides in the entire database.
Enabled generic rather than CPTAC-specific support for concatenating personalized sequence databases from QUILTS variants and spliceforms along with the option to Make Proteogenomic Summary Tables for subsequent use in Protein/Peptide Summary reporting mode: Protein Prot Genom Site Comparison.

Tool Belt

Added support for TMT11 & TMTpro-16 correction factors
Revised PepXML export to handle a specified subset of search result files.

Process Report

Added the Process Report Module with the following capabilities:

Parse Report - Parses .ssv files generated by Protein/Peptide Summary to selectively extract the highest information value columns. Can also map sample identifiers to reporter ion masses.
Normalize Reporter Ratios - Normalizes distributions of reporter ion ratios within a dataset.
Plot Ratio Distributions - Plots histograms of iTRAQ/TMT ratio distributions in a dataset at the protein or VM-site level, both before and after normalization.

New in B.06.00.200 (October 2016 - last Agilent release)

The B.06.00 version of Agilent Spectrum Mill MS Proteomics Workbench supports 64-bit Windows operating systems and includes enhancements that increase the flexibility of the software and enable greater productivity. This document describes differences between the B.06.00 and prior versions of the Spectrum Mill workbench.

Installation and Configuration
MS/MS Search
Quality Metrics and False Discovery Rate (FDR)
Home Page
Protein Databases
Tool Belt
Spectrum Matcher
Protein/Peptide Summary
Spectrum Summary
Data Extractors
Changes with the B.05.00 version

Note: If you have upgraded from a prior version, you will see some differences in validated results compared to the earlier version. When comparing data sets, it is best to reprocess (extract, search, autovalidate) them using the B.06.00 version. See Quality Metrics & FDR

Installation and Configuration

The following installation and configuration features are new in the Spectrum Mill workbench version B.06.00. For details, see the Installation Guide and the Site Preparation Guide. Both are on your software disk.

Operating system

Spectrum Mill B.06.00 is supported on the following Microsoft Windows operating systems:

Windows 7 and Windows Server 2008 R2
Windows Server 2012 R2
Windows 10
Windows Server 2016

Only 64-bit versions of the above operating systems are supported.

Windows XP and Windows Server 2003 are no longer supported.

If you configure a new system, Agilent recommends that you provide adequate memory (16 GB or more) and adequate disk space (1 to 2 TB for data). For a new system, Agilent recommends 24 to 32 GB of memory.

Perl

ActiveState Perl 5.18.4 is supplied with this release. Install the 64-bit MSI. (If you are updating from Spectrum Mill B.04.01 or earlier, uninstall the prior version first.)

JRE

Java JRE 1.8u111 (1.8.0_111) is included in this release. Install both the 32- and 64-bit JREs, even if you run the 64-bit Internet Explorer (IE 11). Later versions of JRE 8 may be available (see www.java.com), and in general should work with Spectrum Mill.

The JRE is required to support the Spectrum Viewer and other applets.

Be sure to install JREs on all client browsers, as well as the Spectrum Mill server.

IIS

IIS must be installed before you install the Spectrum Mill workbench.

See the Installation Guide for details on installation and configuration of IIS.

MSFileReader support (for .raw files)

To process Thermo Scientific .raw files, Spectrum Mill B.06.00 can use the 64-bit Thermo Scientific MSFileReader.

If you update your Spectrum Mill installation from B.04.01 or prior, then you have the 32-bit MSFileReader installed, so you must install the 64-bit version. If you update your Spectrum Mill installation from B.05, you already have the 64-bit version.

Web browser support

The following Web browser is supported with this version:

Microsoft Internet Explorer 11

If you are upgrading from Spectrum Mill workbench B.04.01 or earlier, note that the prior release required Compatibility View for Internet Explorer 9 and later. For this release you must disable Compatibility View.

MS/MS Search

MS/MS Search is from two to five times faster, depending upon the peptide redundancy in the databases you search.
MS/MS Search no longer creates individual spo files in results_mstag. Instead, searches create temporary concatenated spo files (cpo files). The spo files are added to an spo.zip file. The program deletes the individual cpo files when a search completes.
- This improvement significantly reduces the load on the file system (both in space and number of files), and makes archiving and copying data folders much faster.
In the Spectral Quality Filtering section of the MS/MS Search page, the min# of peaks filter has been removed and Precursor isotope quality and Precursor isolation purity filters have been added.
HLA peptide motifs and related half-enzyme searches are now supported.
The C-terminal peptide can now be matched in half-enzyme digests that are built C-term to N-term. Previously, the C-terminal peptide would only be matched, if it matched the enzyme specificity.
By default, peptides with lengths less than five amino acids are no longer matched. This enhancement is especially valuable when you allow large precursor mass shifts for variable modifications. To enable matching of smaller peptides, mark the Dynamic peak thresholding check box.
If you search data with multiple fragmentation modes and you select All, an informative error message is generated to indicate which mode(s) should be selected for searching.

Quality Metrics and False Discovery Rate (FDR)

Additional metrics are now available.
The defect fix that was in B.05.00 SP1 is included. In Spectrum Mill B.05.00, the reversed hit with the second best score was sometimes reported instead of the one with the best score. This caused the search to underestimate FDR. If you search against a typical UniProt database, the defect fix reduces the total number of identifications by 1-5%, but it makes them more accurate.

Home Page

Spectrum Matcher is back (under Mass Spectral Interpretation Tools) and provides additional quality filtering features to assist in evaluating instrument performance or method changes.

Protein Databases

Searching of DNA FASTA databases such as dbEST or custom databases (DN or DA prefix) is no longer supported. The DNA sequences must be converted to protein sequences. The FASTA protein header lines must correspond to one of the supported formats. See Updating databases
Addition of tool to Concatenate FASTA files, which allows you to link together two or more existing databases. It includes two possible ways to concatenate:
- Select one or more existing database files to concatenate.
- Concatenate all files in a folder.
Support for change of NCBI FASTA header format:
- In September 2016, NCBI changed the FASTA header format to only supply the gb (GenBank) accession. The former gi accession is no longer indicated.
- Newly downloaded databases in the new format are supported and the gb accession is used by Spectrum Mill for those databases.
- For the Spectrum Mill workbench to properly recognize the format, these new databases require either an NCBIgb or gb prefix instead of the NCBInr prefix.
- Existing databases (NCBInr) are still supported. GenBank accessions (when present) can be reported in Protein/Peptide Summary by creating a Category file for the database.
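
The difference between the two NCBI header layouts can be sketched with a small parser. This is an illustration only: it assumes old-style headers of the form ">gi|number|db|accession| text" and new-style headers of the form ">accession text". The function name and the example headers are hypothetical, and this is not the parser Spectrum Mill uses.

```python
def parse_ncbi_header(header):
    """Return (gi, accession) from an NCBI-style FASTA header line.

    Assumed layouts (illustrative):
      old: >gi|<number>|<db>|<accession>| description
      new: ><accession> description
    """
    line = header.lstrip(">").strip()
    if line.startswith("gi|"):
        fields = line.split("|")
        gi = fields[1]
        accession = fields[3] if len(fields) > 3 else None
        return gi, accession
    # New-style header: the accession is the first whitespace-delimited token.
    tokens = line.split()
    return None, (tokens[0] if tokens else None)


old_style = parse_ncbi_header(">gi|12345|gb|AAA00001.1| hypothetical protein")
new_style = parse_ncbi_header(">AAA00001.1 hypothetical protein")
print(old_style, new_style)
```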

Tool Belt

Now supports TMT10 correction factors
When you remove prior search results, any applied correction factors remain.

Spectrum Matcher

Includes new spectral quality filters and homology/variable mode
Supports Load/Save parameter files (but not in workflows)

Protein/Peptide Summary

Enhancements to support ion mobility data:
- Includes new Ion Mobility review field to report DT (drift time in milliseconds) and CCS (collision cross section in square angstroms).
- Agilent Proteomics Results (APR) Export can now export DT and CCS values if they are present in ion mobility data.
Review Fields now include Correction Method (for Reporter Ratios). Select Apply to apply correction factors or None to skip correction.

Spectrum Summary

Includes spectral quality filters
Supports Load/Save parameter files
New Review Field for ion mobility: reports DT and CCS values (if available)

Data Extractors

Thermo .RAW Extractor now supports Thermo Fusion/Lumos data.
Generic Extractor now extracts to mzXML.
Generic Extractor now supports PKL files from IM-MS Browser. (IM is ion mobility.)
- The IM-MS Browser Rev 7.02 Build 209 or later can export concatenated PKL files that contain the retention time (RT), drift time (DT), and collision cross section (CCS) for a feature (precursor).
- To report the CCS value, the CCS Single Field calibration must be applied to the data.
- The names of the extracted spectra indicate RT (in minutes x 10) and DT (in msec x 10), so the name has the form myDataFile.<RTx10>.<DTx10>.<charge>.pkl.
- This name provides a way to more easily find the spectrum in IM-MS Browser.
- The RT, DT, and CCS values are stored in the specFeatures.tsv file, and reported in Protein/Peptide Summary when the Ion mobility review field is marked.
- DT and CCS values from ion mobility experiments are written to mzXML for import into Skyline.
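
The naming convention for extracted ion mobility spectra, myDataFile.<RTx10>.<DTx10>.<charge>.pkl, can be decoded with a short helper. The sketch below assumes the three numeric fields are integers. The function name and the example file name are hypothetical.

```python
def parse_spectrum_name(name):
    """Split 'base.<RTx10>.<DTx10>.<charge>.pkl' into its parts.

    RT is returned in minutes and DT in milliseconds (both stored x 10 in the name).
    """
    # rsplit from the right so that dots inside the data file name are preserved.
    base, rt10, dt10, charge, ext = name.rsplit(".", 4)
    if ext.lower() != "pkl":
        raise ValueError(f"not a pkl spectrum name: {name}")
    return {
        "data_file": base,
        "rt_min": int(rt10) / 10,
        "dt_ms": int(dt10) / 10,
        "charge": int(charge),
    }


info = parse_spectrum_name("myDataFile.253.187.2.pkl")
print(info)
```

Here an RT field of 253 corresponds to 25.3 minutes and a DT field of 187 to 18.7 msec.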

New in B.05.00

The B.05.00 version of Agilent Spectrum Mill MS Proteomics Workbench provides the following new features:

Installation and Configuration
Workflow Automation
Tool Belt and Metrics
MS/MS Autovalidation
Protein/Peptide Summary
Data Extractors
MS/MS Search
Utilities
Protein Databases
Changes with the B.04.01 version

Note: If you have upgraded from a prior version, due to improvements in data extraction, search scoring, and protein grouping, you will see some differences in validated results compared to the earlier version. When comparing data sets, it is best to reprocess (extract, search, autovalidate) them using the B.05.00 version.

Installation and Configuration

The following installation and configuration features are new in the Spectrum Mill workbench version B.05.00. For details, see the Installation Guide.

Operating system

Spectrum Mill B.05.00 is supported on the following Microsoft Windows operating systems:

Windows 7 and Windows Server 2008 R2
Windows Server 2012 R2

Only 64-bit versions of the above operating systems are supported.

If you configure a new system, Agilent recommends that you provide adequate memory (16 GB or more) and adequate disk space (1 to 2 TB for data).

Perl

ActiveState Perl 5.18.4 is supplied with this release. Install the 64-bit MSI. (Uninstall the prior version first.)

JRE

Java JRE 1.8u45 is included in this release. Install both the 32- and 64-bit JREs, even if you run the 64-bit Internet Explorer (IE 11).

Be sure to install JREs on all client browsers, as well as the Spectrum Mill server.

IIS

IIS must be installed before you install the Spectrum Mill workbench.

See the Installation Guide for details on configuration of IIS.

MSFileReader Support (for .raw files)

To process Thermo Scientific .raw files, Spectrum Mill B.05.00 can use either Xcalibur or the 64-bit Thermo Scientific MSFileReader. If you want to use one of these programs, install it on the server before you install the Spectrum Mill workbench. You can install it afterwards, but you must do some additional steps. (See the Installation Guide for details.)

If you update your Spectrum Mill installation, and you previously had installed MSFileReader, you must install the 64-bit version.

Web Browser Support

The following Web browser is supported with this version:

Microsoft Internet Explorer 11

Internet Explorer Compatibility View

If you are upgrading, note that the prior release required Compatibility View for Internet Explorer 9 and later. For this release you must disable Compatibility View.

Workflow Automation

New workflow automation tools are available.

Multi-core support for data extraction:
- The “Maximize CPUs” feature previously available for MS/MS Search is now available for data extraction.
- When multiple data files are present in a folder, extraction for each data file is assigned to an available core.
Automation support for the Quality Metrics feature as a workflow task:
- When you edit a workflow, you can load and save the Quality Metrics parameters.
- When you run a workflow, the Quality Metrics task is listed in the Request Queue. When it is finished, it is listed in the Completion Log.
Automation support for Data Archive as a workflow task:
- When you edit a workflow, you can load and save the Data Archive parameters.
- When you run a workflow, the Data Archive task is listed in the Request Queue. When it is finished, it is listed in the Completion Log.
- Addition of Data Archive as the last step in a workflow reduces the number of files and required disk storage space.

All default parameter files have been updated, and new ones have been added for Archive and Quality Metrics.
For Protein/Peptide Summary, new default parameter files named "_valid" specify a Valid filter and set the scores and SPI thresholds to 0. These starting values overcome a common mistake where default values are too high to see all the valid hits.

Tool Belt and Metrics

The Quality Metrics tool has been moved from the Tool Belt to its own page, accessible from the Spectrum Mill home page and the Tool Belt page.
New Quality Metrics support for CPTAC (Clinical Proteomic Tumor Analysis Consortium) requirements.
The Archive Data tool has been moved from the Tool Belt to its own page, accessible from the Spectrum Mill home page and the Tool Belt page.
New bidirectional conversion of mzXML and pkl files.
The Spectral Collector tool has been removed.
The File Collector has been added. It provides an easy way to organize Protein/Peptide Summary exports in a single folder.

MS/MS Autovalidation

New filter for minimum number of directories that include a protein group.
This filter exists in Auto Thresholds, Protein Polishing mode.
Optimize thresholds by directory is useful for low-frequency matches in complex samples.

Protein/Peptide Summary

New Protein-Protein Comparison format for export of Agilent Proteomics Results (APR) to Agilent Mass Profiler Professional (MPP).
- Export combined protein and peptide results to MPP with new format.
- Addition of peptide information allows you to eliminate incorrectly identified peptides (because they do not track the protein).
- Only Agilent Q-TOF (.d) data is supported with this feature.
- The MPP Generic export is also selectable as an option for customers who have not updated to MPP 14.0, which is required to import APR files.
- Protein-Protein Comparison mode allows you to select the type of export (No export, Excel, MPP Generic, MPP APR).
Consolidated Protein-Peptide Summary modes, with the option to configure the "classic" modes by editing smglobals.js.

Data Extractors

Agilent Q-TOF Extractor now uses latest (64-bit) MassHunter Data Access Component (MHDAC).
Agilent Q-TOF Extractor now determines nominal resolution from method in data file. Resolution is adjusted based on instrument used to acquire data. For data acquired with older versions of Q-TOF data acquisition programs, the nominal resolution defaults to 20,000.
Support for TMT10 isobaric mass tags (where instrument resolution is sufficient)
Support for automatic data file deletion after successful extraction
- Check box specifies to delete raw data file/folder after successful extraction.
- Placeholder file is created and used to indicate presence of extracted data.
- This feature keeps disk usage to a manageable level.
- Warning: If you mark the check box to Delete data files after extraction, make sure your data is archived somewhere else.
Extractions maximize usage of available CPUs. The request queue now shows two requests: the initial one to create the batch (of files) and the other to show the progress and extractor results.
The Thermo extractor is now twice as fast.
- XtractorFinnigan reads the centroided data from the .RAW file rather than Spectrum Mill doing the centroiding.
- The default choice is to use the Xcalibur centroiding. It does a better job of using appropriately narrow windows across the entire mass range (particularly important for the poorly resolved TMT-10 peaks).
- Because the intensities are scaled differently (10-100-fold), you should not mix Spectrum Mill centroiding and Xcalibur centroiding across multiple directories that will later be used for a combined report.

MS/MS Search

Capability to select variable protein N-terminal and C-terminal modifications.
- The variable modifications on the protein termini are treated slightly differently from other variable modifications; to see if a peptide terminus was modified, you must mark the N-term or C-term review field in Protein/Peptide Summary.
- Also, these modifications are not subject to VML scoring.
Protein pI and required/disallowed amino acids have been removed from the search form. When you load an old parameter file, you are warned if any of these parameters contain obsolete values. You must re-save the parameter file to avoid the warning and to ensure the search is processed properly in workflows.
The new default for Discriminant Scoring is “Off”, which prevents creation of the extra files required when you autovalidate with the Autothresholds-discriminant strategy.
- The “Default (same as score)” is now “Score only” and is not a recommended setting. (It is there for backwards compatibility.)
- Either use “Off” or select a coefficient set.
List of instruments to choose from in MS/MS Search has been significantly reduced. (You can show all by editing smglobals.js.)

Utilities

Addition of isoelectric focusing (IEF) in Peptide Selector, so you can predict which off-gel electrophoresis fractions are likely to have unique peptides.
New Peptide String Match utility (on home page). This utility allows you to search a database with a list of peptide sequences.

Protein Databases

Addition of tool to Create Category File, which allows you to create your own report fields to associate with a protein. You can then limit summaries to the set of proteins of interest.
Reordered the list of tools.
New Re-index existing database tool. After downloading a new version of a database, select the database from the list to re-index. Re-create any species subsets after you re-index the main database.

New in B.04.01

The B.04.01 version of Agilent Spectrum Mill MS Proteomics Workbench provides the following new features:

Data Extraction: Extract to mzXML for .raw and Agilent Q-ToF .d data
Automation: MS/MS Search supports multiple CPU cores
Protein/Peptide Summary changes
Autovalidation changes
Export results to Mass Profiler Professional (MPP)
Peptide Selector: Export Q-TOF Inclusion lists from results
Q-Exactive Support
TMT Correction Factors and TMT6 changes
Tool Belt: Enhanced quality metrics reporting
Protein Databases: Make non-redundant
Build TIC changes
Changes with the B.04.00 version

Data Extraction

The Agilent Q-ToF data extractor for B.04.01 now adjusts the tolerance based on m/z for finding related precursors with SILAC data. The differences are minor, but you will see some differences. The same adjustments may affect merging, where fewer spectra might be merged.

Extract to mzXML

The data extractors for Agilent Q-TOF (.d) and Thermo (.raw) data now extract to mzXML instead of multiple pkl files. The reduction in the number of files improves performance and simplifies archival of results. The mzXML may be combined with pepXML (from Tool Belt) to support Skyline's creation of peptide spectral libraries for MRM experiments and data independent analysis (DIA).
The extractor form has a new Create PKL files instead of mzXML check box field to have the extractor create PKL files instead of mzXML. Mark this check box if you use additional software to process Spectrum Mill results that has not been updated to accept the mzXML produced by Spectrum Mill.
To maintain backwards compatibility, many features in Spectrum Mill still refer to an individual spectrum by its pkl filename. The pkl filename is now an attribute in the mzXML. When the Spectrum Viewer or the MS/MS Search program needs to retrieve an individual spectrum, it is configured to look in both the old location (cpick_in/*.pkl) and the new location specified within the mzXML for the spectrum.

Automation: MS/MS Search supports multiple CPU cores

Spectrum Mill's MS/MS Search can now take advantage of multiple CPU cores. In the MS/MS Search page, select Maximize CPUs, and in the Workflows page, select Max CPUs per search. When enabled, batches from a single data directory will be assigned to available CPUs. The parallel searches are listed in the Request Queue as tasks with a "P". The search progress is indicated as the number of batches completed out of total batches. When using this search mode, it is recommended to increase the batch size to 150 or more, instead of the former default of 81.
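
The effect of batch size on the number of parallel work units can be sketched as follows. The example assumes that batch size counts spectra, which these notes do not state. The function name and the spectrum count are hypothetical.

```python
import math


def plan_batches(n_spectra, batch_size=150):
    """Return (start, end) index ranges, one per batch, covering n_spectra items."""
    n_batches = math.ceil(n_spectra / batch_size)
    return [
        (i * batch_size, min((i + 1) * batch_size, n_spectra))
        for i in range(n_batches)
    ]


batches_150 = plan_batches(1000, batch_size=150)
batches_81 = plan_batches(1000, batch_size=81)

# Progress in the style "batches completed out of total batches".
completed = 3
print(f"{completed} of {len(batches_150)} batches completed")
```

Under this assumption, 1000 spectra form 7 batches at a batch size of 150 and 13 batches at the former default of 81, so the larger size means fewer, longer-running units per CPU.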

To prevent a single user from unfairly monopolizing all CPUs on a Spectrum Mill server, all directories submitted to the queue with Maximize CPUs enabled compete for CPU access. When multiple directories are submitted to search, they all will share the available CPUs, with processing occurring in parallel. When Maximize CPUs is not enabled, each search on a directory retains exclusive access to a single processor until all of its batches have completed.

When you select a queued Maximize CPUs search to Remove that has begun processing, the Request Queue will continue to show the search in the queue. Wait a little while for any currently executing batch searches to complete, then click Request Queue to refresh the list.

Protein/Peptide Summary Changes

Review Fields

The Review Fields layout has been changed slightly to accommodate additional fields. The new spectral quality related fields include Prec Av Chi2 (Precursor Averagine Chi²) and Isol Pur (Isolation Purity). The pI field is now Prot pI, and the Peptide pI field is now just Pep pI. The Var mod sequence field is now VML sequence.

Protein Grouping

Protein Grouping was revised to disallow transitive closure. With transitive closure disallowed, all proteins belonging to a group must share at least 1 peptide. With transitive closure allowed (prior versions), if proteins A and B share peptide 1 and proteins B and C share peptide 2, then A, B, and C are grouped together.
A defect in subgroup subsuming was fixed. The subgroup subsuming could cause loss of some subgroups and the peptides which were unique to them. The most affected samples might be ones derived from multiple species, i.e. human/mouse xenografts.
The protein subgrouping logic was revised to allow a protein that shares peptides with 2 different groups to be taken away from a higher scoring group if membership in a lower scoring group is achieved with more shared peptides. That is, when the first protein in groups A and B share no peptides, and protein C shares 1 peptide with A1 and more than 1 with B1, the prior versions would have put C in group A. The B.04.01 version puts it in group B.
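
The difference between grouping with and without transitive closure can be illustrated with the A, B, C example above. The sketch makes a simplifying assumption that is not taken from these notes: without transitive closure, a protein joins a group only if it shares a peptide with the group's first (lead) protein, and proteins are processed in the order given. Function names and data are hypothetical. This is not Spectrum Mill's grouping algorithm.

```python
def group_transitive(proteins):
    """Group proteins linked by any chain of shared peptides (transitive closure).

    proteins: dict mapping protein name -> set of peptide identifiers.
    """
    groups = []  # list of (member names, union of member peptides)
    for name, peps in proteins.items():
        merged_names, merged_peps = [name], set(peps)
        remaining = []
        for g_names, g_peps in groups:
            if g_peps & merged_peps:
                merged_names += g_names
                merged_peps |= g_peps
            else:
                remaining.append((g_names, g_peps))
        groups = remaining + [(merged_names, merged_peps)]
    return sorted(sorted(names) for names, _ in groups)


def group_by_lead(proteins):
    """Group proteins only when they share a peptide with the group's lead protein."""
    groups = []  # list of (lead protein peptides, member names)
    for name, peps in proteins.items():
        for lead_peps, members in groups:
            if lead_peps & peps:
                members.append(name)
                break
        else:
            groups.append((set(peps), [name]))
    return [members for _, members in groups]


proteins = {"A": {"pep1"}, "B": {"pep1", "pep2"}, "C": {"pep2"}}
with_closure = group_transitive(proteins)
without_closure = group_by_lead(proteins)
print(with_closure, without_closure)
```

With transitive closure, A, B, and C end up in one group because B bridges A and C. Without it, C forms its own group because it shares no peptide with A.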

You may re-enable the grouping mode used in prior versions by selecting the Prot pI review field.

Export results to Mass Profiler Professional (MPP)

The Protein-Protein Comparison Columns and Protein-Protein Comparison Redundant summary modes now support exporting to the MPP generic format. To only report the top group protein, use Protein-Protein Comparison Columns with 1 shared peptide as the Protein grouping method. To report all proteins, use the 1 shared, expand subgroups method. The Protein-Protein Comparison Redundant mode will always report all proteins (the 1 shared, expand subgroups method is not allowed in this mode).

Peptide Mode changes

You can now Filter to distinct peptides in several different ways:
- Off
- Case insensitive -- When collapsing to "distinct", a case-insensitive string compare is used, thus peptides with variable modifications (lowercase AA's) and unmodified peptides are combined.
- Case sensitive -- When collapsing to "distinct", a case-sensitive string compare is used, thus peptides with variable modifications (lowercase AA's), different localizations of those variable modifications, and unmodified peptides are kept separate.
- Charge file CS -- When collapsing to "distinct", a case-sensitive string compare is applied to both the sequence and spectrum filename prefix, thus peptides from different LC-MS/MS runs and those with different precursor charges are kept separate.
- You can now create an inclusion list for Agilent Q-TOF instruments. Mark the Export inclusion list for a specified top peptides/protein check box and enter a value for the maximum number of peptides to target per protein.
  (This feature is only available if Agilent Q-TOF data has been selected.)
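
The distinct-peptide filter settings can be pictured as different choices of comparison key. In the sketch below, the tuple layout (sequence, spectrum filename prefix, charge), the mode names, and the example data are hypothetical. Including the charge in the Charge file CS key is an assumption based on the description above.

```python
def distinct_peptides(psms, mode="off"):
    """Collapse PSMs to distinct peptides, keeping the first occurrence of each key.

    psms: list of (sequence, file_prefix, charge); lowercase letters mark variable mods.
    """
    if mode == "off":
        return list(psms)
    key_functions = {
        "case_insensitive": lambda p: p[0].upper(),   # modified and unmodified combined
        "case_sensitive": lambda p: p[0],             # modified forms kept separate
        "charge_file_cs": lambda p: (p[0], p[1], p[2]),  # also split by run and charge
    }
    key = key_functions[mode]
    seen, kept = set(), []
    for psm in psms:
        k = key(psm)
        if k not in seen:
            seen.add(k)
            kept.append(psm)
    return kept


psms = [
    ("AAsPK", "run1", 2),
    ("AASPK", "run1", 2),
    ("AAsPK", "run2", 2),
    ("AAsPK", "run1", 3),
    ("AAsPK", "run1", 2),
]
counts = {m: len(distinct_peptides(psms, m))
          for m in ("off", "case_insensitive", "case_sensitive", "charge_file_cs")}
print(counts)
```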

Protein-Peptide Comparison Columns Mode: Enhanced VML Reporting and Excel Export

The Protein-Peptide Comparison Columns summary mode has been enhanced to enable Excel export and to allow peptide level or modification site level report organization. Specific changes include:

Excel export (.ssv semi-colon delimited) is now supported in this mode.
Multiple peptide spectrum matches can be collapsed to a single row in the table according to either variable modification site or peptide sequence. You can now specify to group rows by Sequence or by variable modification site (Var mod site). When Var mod site is selected, the following additional features are relevant:
- Mark the VML score check box and select the particular type of variable modification site in the menu next to the VML score check box (s|t|y for phosphorylation sites, k for ubiquitin or acetylation sites) to specify the variable modification site used to collapse PSM's.
- All PSM's collapsed to a single row must have the same number of modification sites. Singly and doubly phosphorylated forms of the same peptide will be on separate rows, due to the potential for differences in quantitative measurements between sites.
- The displayed representative PSM in each column of a single row is the one having the highest VML score. Spectra with ambiguous and confident site localizations in the same peptide will be collapsed together so long as they are not conflicting.
A new Spectrum Grouping Options section:
- The Group missed cleavages containing VM site(s) check box enables different missed cleavage forms of peptides containing the same modification site (AA position in the protein sequence) to be collapsed into a single row.
- The Show all grouped spectra check box allows one to inspect the collapsing behavior by reporting all the individual PSM's that are collapsed to an individual sequence or VM site. Because this results in a nested table with multiple rows in individual cells, Excel Export is not supported for this feature.
Reduced RAM consumption by 30% and increased protein grouping speed by 20% for Protein-Protein Comparison Columns reports to better handle large datasets.

Revised MS/MS Search Scoring

Revised MS/MS search scoring of Delta Rank1-Rank2 and Delta Forward-Reverse calculations to check for all isobaric characters against all the Rank1 hits, not just the 1st Rank1 hit, to help eliminate inordinately small values resulting from different indistinguishable localizations of variable modifications in the Rank1 hits.
Improved scoring of phosphopeptide MS/MS spectra from Agilent Q-TOF and Thermo HCD data by not requiring the presence of the precursor minus phosphate ion in order to enable the y-H₃PO₄ and b-H₃PO₄ fragment ion types.

Autovalidation Changes

The Autovalidation Auto thresholds strategy now supports a Min. Sequence Length filter. The default value for this field is 1 so as not to surprise users who upgrade from prior versions, but higher values are recommended.

Note: A value of 6 or higher is recommended for the Min. Sequence Length value. Sequences shorter than 6 tend not to be unique to a single protein.

In addition, the Peptide FDR for validated proteins has been removed from the Auto thresholds Protein polishing strategy. It is only available when using the Auto thresholds - discriminant strategy.

Note: It is recommended that you open and re-save all Autovalidation parameter files that use the Auto thresholds strategy that were created using B.04.00.

Peptide Selector: Export Q-TOF Inclusion lists from results

This feature is only available if Agilent Q-TOF data has been selected.
You can now generate Agilent Q-TOF target lists based on search results. The target list can include peptides from proteins that were unidentified from a prior list of proteins, that contain only one peptide, or a combination of both.

In Peptide Selector, first make sure the Protein(s) to Select From list (bottom left of the form) displays the list of accession numbers that were used to generate the original target list. In the Save File Parameters section, mark the Generate inclusion or MRM list file check box, and select MassHunter Q-TOF MS/MS target list. Then mark the From results check box to view the Valid Results to Filter selections. Use the Select... button to select the data folders containing the results. Select either Unidentified, Single peptide, or Unidentified + Single peptide as the proteins to select. Click Select to generate the target list.

Q-Exactive Support

The data extraction and MS/MS Search now support data acquired from Thermo Q-Exactive and Thermo Q-Exactive Plus instruments. You must either have installed Xcalibur on the Spectrum Mill server, or have downloaded and installed the MSFileReader (you must select the 32-bit version to install). See the Spectrum Mill Installation Guide for details.

TMT Correction Factors and TMT6 changes

The Tool Belt now supports creating and applying correction factors for TMT in addition to iTRAQ. In addition, the TMT modification definition has been changed to account for a prominent unmatched singly charged ion of mass 230 (cleavage of bond between reagent and peptide, with charge staying on the reagent side), and a prominent set of ions at (precursor mass - (155 to 159) ) /(parent_charge -1), from cleavage of the amide bond in the reagent group, with charge-1 staying with the peptide side of the bond.

Important TMT6 Changes

TMT6 reagents with lot numbers that begin with MJ and later modified the 127 and 129 reporter ion masses by almost 50 ppm. This release supports these changes. The prior TMT6 definition has been moved to smconfig.misc.xml and renamed with "pre-MJ" appended. Summaries, including ratios, with older data will still report correctly. However, File Details and Spectrum Summary may indicate different scores and ion labels if the 127 and 129 ions are not matched. If you need to extract and search data that was prepared with TMT6 lots prior to "MJ", please contact your Agilent representative to request a special data extractor and search program that will support the older definitions.
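
To give a sense of scale for a shift of almost 50 ppm, the sketch below converts a mass difference to parts per million. The two m/z values are illustrative inputs chosen to lie about 0.0063 apart near m/z 127. They are not lot-specific reagent masses taken from this document, and the function name is hypothetical.

```python
def ppm_difference(observed_mz, reference_mz):
    """Return the signed difference between two m/z values in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


# Illustrative values only: two reporter-region m/z values about 0.0063 apart.
shift_ppm = ppm_difference(127.131081, 127.124761)
print(round(shift_ppm, 1))
```

A shift of this size can exceed a narrow reporter ion matching tolerance, which is consistent with the note above that the 127 and 129 ions may go unmatched when older data is viewed with the new definition.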

Tool Belt: Enhanced quality metrics reporting

The Tool Belt tool "Report FDR and search statistics" is now "Report FDR and quality metrics", and has been enhanced to provide quality metrics in addition to the search statistics. The new metrics include sample handling, site localization, and isobaric label incorporation.

Protein Databases: Make non-redundant

The Protein Databases Utility page provides a new utility to remove redundant entries. When keeping/removing redundant entries (identical sequence), only the one nearest the top of the FASTA file is kept. However, for UniProt databases, SwissProt entries are preferentially kept over TrEMBL entries regardless of order in the FASTA file.
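
The keep-first rule with the UniProt exception can be sketched as follows. The sketch assumes SwissProt and TrEMBL entries are recognized by the standard UniProt header prefixes ">sp|" and ">tr|", and that a preferred SwissProt entry takes the position of the TrEMBL entry it replaces. The function name and the example entries are hypothetical. This is not the utility's actual implementation.

```python
def make_non_redundant(entries):
    """Remove entries with identical sequences from a list of (header, sequence).

    The entry nearest the top is kept, except that a SwissProt entry (>sp|)
    replaces a previously kept TrEMBL entry (>tr|) with the same sequence.
    """
    position = {}  # sequence -> index in result
    result = []
    for header, seq in entries:
        if seq not in position:
            position[seq] = len(result)
            result.append((header, seq))
            continue
        i = position[seq]
        if header.startswith(">sp|") and result[i][0].startswith(">tr|"):
            result[i] = (header, seq)
    return result


entries = [
    (">tr|A0A000|X_HUMAN", "MKT"),
    (">sp|P00001|X_HUMAN", "MKT"),  # same sequence, SwissProt preferred
    (">sp|P00002|Y_HUMAN", "MAA"),
    (">sp|P00003|Z_HUMAN", "MAA"),  # same sequence, first one kept
]
unique_entries = make_non_redundant(entries)
print(unique_entries)
```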

Build TIC changes

The Build TIC form has changed to provide more filtering options for data that has been extracted to mzXML. The "Neutral loss" selection has been removed, and the "Y axis" filtering selections have been enhanced to allow for building TIC's based on the following:

MS1 Precursor Intensity
MS1 Precursor Chromatographic Peak Width (sec)
MS1 Precursor Isolation Purity %
MS1 Precursor Averagine Chi2
MS2 Base Peak Intensity
MS2 Max Sequence Tag Length
MS2 Phosphate Precursor-Neutral loss (% of base peak)
MS2 Dissociated Intensity %

The data must be extracted with B.04.01 in order to fully support the above options, and only data which has been extracted to mzXML supports all of the above.

The "MS1 Precursor Intensity" corresponds to the former "Neutral loss" of "None". The "MS2 Phosphate Precursor-Neutral loss (% of base peak)" corresponds most closely (but not exactly) to the former "Neutral loss" of "H2PO4". The "MS2 Dissociated Intensity" corresponds most closely (but not exactly) to the former "Neutral loss" of "H2O". These are the only filtering selections that will work with pkl files.

New in B.04.00

The B.04.00 version of Agilent Spectrum Mill MS Proteomics Workbench included extensive new workflow automation capabilities, the ability to use Spectrum Mill identification results to create MRM lists for triple quadrupole instruments, as well as many other enhancements that increase the flexibility of the software and enable greater productivity. This section describes differences between the A.03.03 and the B.04.00 versions of the Spectrum Mill workbench.

Installation and Configuration
Process Automation Tools
Creation of MRM Lists
False Discovery Rate
Data Extractors
MS/MS Search
MS/MS Autovalidation
Protein/Peptide Summary
Tool Belt
Peptide Selector
Peptide List to Masses Utility
Protein Databases
Server Administration
What happened to Easy MS/MS Search?

Installation and Configuration

The following installation and configuration features are new in the Spectrum Mill workbench version B.04.00. For details, see the Installation Guide.

Operating system

Spectrum Mill B.04.00 is supported on the following operating systems:

Windows XP (SP3)
Windows Server 2003 (SP2 or later)
Windows 7
Windows Server 2008 R2

Both 32-bit and 64-bit versions of the above operating systems are supported. Spectrum Mill programs run as 32-bit applications but are able to address up to 4 GB of memory per process when running on 64-bit systems.

Windows Server 2000 is no longer supported.

If you configure a new system, Agilent recommends that you install a 64-bit operating system and provide adequate memory (8 GB or more) and adequate disk space (1–2 TB for data).

Perl

ActiveState Perl 5.14.2.1402 is supplied with this release. Both 32-bit (x86) and 64-bit (x64) versions are provided, so select the one that is appropriate for your operating system. See the Installation Guide for details on installation of Perl and implications for configuration of IIS.

Java (JRE)

Java JRE 1.6.29u is included in this release. Both 32-bit (i586) and 64-bit (x64) versions of JRE are provided.

Be sure to install JRE on all client browsers, as well as the Spectrum Mill server.

IIS

Install IIS before installing Spectrum Mill workbench.

See the Installation Guide for details on installation and configuration of IIS.

MSFileReader Support (for .raw files)

Prior versions of Spectrum Mill required that Xcalibur software be installed to process Thermo Scientific (.raw) data files. You no longer need Xcalibur, provided you install the Thermo Scientific MSFileReader, which you can download for free. Xcalibur is still supported. Refer to the Installation Guide for details.

Process Automation Tools

New process automation tools let you perform an entire data analysis, from spectral extraction to final results summary, in a completely unattended way. This new capability includes:

Parameter files that allow you to save settings for forms and use them in automated workflows. You can save parameter files for Data Extraction, MS/MS Search, Autovalidation, Protein/Peptide Summary, Sherenga de novo Sequencing, PMF Search, and PMF Summary. (You can also save parameter files for Peptide Selector, MRM Selector, and Sherenga de novo Summary, but these are not used in workflows.)
A new Edit Workflow page that lets you create and edit workflows. Workflows consist of a series of tasks that use parameter files. For example, you can create a workflow that does Data Extraction, MS/MS Search, Autovalidation, and Protein/Peptide Summary.
A new Workflows page that lets you execute workflows on single or multiple folders. Tasks assigned to the same data folder are processed sequentially. If multiple folders have been selected for a workflow, the same tasks in each folder are executed in parallel. You can also start multiple workflows, which can run in parallel if the Spectrum Mill server has multiple CPUs.
New Request Queue/Completion Log viewers that allow you to monitor the progress of workflows. The Request Queue makes sure that the tasks are processed in the correct order, and that multiple workflows are executed properly.
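
The ordering rule described above, sequential tasks within a folder and parallel execution across folders, can be sketched with a thread pool. The function names, folder names, and log format are hypothetical, and the task body is a stand-in that only records completion. This is not how the Request Queue is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(folder, tasks):
    """Run the tasks for one folder strictly in order and return its completion log."""
    log = []
    for task in tasks:
        # Stand-in for real work; each task starts only after the previous one ends.
        log.append(f"{folder}: {task} done")
    return log


def run_on_folders(folders, tasks, max_workers=4):
    """Run the same workflow on several folders, one worker per folder at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        logs = pool.map(lambda folder: run_workflow(folder, tasks), folders)
        return dict(zip(folders, logs))


tasks = ["Data Extraction", "MS/MS Search", "Autovalidation", "Protein/Peptide Summary"]
completion = run_on_folders(["folder1", "folder2"], tasks)
print(completion["folder1"])
```

Each folder's log preserves the task order, while different folders may progress at the same time when more than one worker is available.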

The Request Queue also accepts interactive task commands to be executed either as a single task or as part of interactive automation. For example, when you click Validate Files after marking the Request queue check box, this task is placed in the queue and executed either in parallel to other workflow tasks or consecutively in the order it appears in the queue. You can perform an interactive automation by manually clicking each action button in their respective pages, such as Extract, Start Search, Validate Files and Summarize, one after the other.

Creation of MRM Lists

The new MRM Selector streamlines protein quantitation workflows that use identification results to create multiple reaction monitoring (MRM) lists for triple quadrupole instruments.

In the new MRM Selector form, you select data that has been searched, then filter the results to include only those peptides that meet your requirements. (The filters are similar to those in Protein/Peptide Summary.) Then you can automatically generate a list of transitions for multiple reaction monitoring.
Output formats include Agilent Triple Quadrupole MRM, Agilent Triple Quadrupole DMRM, Agilent Triple Quadrupole Optimizer, and ABI Triple Quad.
The Agilent Q-TOF Data Extractor has been enhanced to output the collision energy (CE), peak apex, and chromatographic peak width in the specFeatures.tsv file, so that you can use these values for dynamic MRM (DMRM) generation. The collision energy is especially useful when you generate a DMRM list for an Agilent Triple Quadrupole from Agilent Q-TOF data, because the collision cells are the same.
You may instead choose to have the collision energies calculated based on an equation, and you have the option to type a chromatographic peak width that will apply for all peaks in the DMRM analysis.
You can save and load parameter files that contain settings for this form.
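
As a rough illustration of the DMRM options above, the following sketch builds transition rows from filtered peptides. It assumes a linear collision-energy equation in precursor m/z, which is a common form but is not confirmed here as the one the MRM Selector uses. The field names, the slope and offset values, and the row layout are all hypothetical.

```python
def collision_energy(precursor_mz, slope, offset):
    """Assumed linear form: CE = slope * (m/z) / 100 + offset."""
    return slope * precursor_mz / 100.0 + offset


def build_dmrm_rows(peptides, slope, offset, peak_width_min):
    """Create one row per transition (precursor/product pair).

    Uses the collision energy recorded by the extractor when present;
    otherwise falls back to the equation. A single chromatographic peak
    width is applied to every row.
    """
    rows = []
    for peptide in peptides:
        ce = peptide.get("measured_ce")
        if ce is None:
            ce = collision_energy(peptide["precursor_mz"], slope, offset)
        for product_mz in peptide["product_mzs"]:
            rows.append({
                "sequence": peptide["sequence"],
                "precursor_mz": peptide["precursor_mz"],
                "product_mz": product_mz,
                "collision_energy": round(ce, 1),
                "retention_time_min": peptide["rt_apex_min"],
                "peak_width_min": peak_width_min,
            })
    return rows


# Hypothetical input: one peptide without a measured collision energy.
example_peptides = [{
    "sequence": "PEPTIDEK",
    "precursor_mz": 500.0,
    "product_mzs": [600.3, 700.4],
    "rt_apex_min": 12.5,
    "measured_ce": None,
}]
example_rows = build_dmrm_rows(example_peptides, slope=3.6, offset=-4.8, peak_width_min=0.5)
```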

False Discovery Rate (FDR)

False discovery rate (FDR) is an independent measure of the likelihood that results are incorrect. Calculating FDR is important for ensuring the validity of results and is a requirement for publication in some journals. The Spectrum Mill workbench now provides tools to calculate FDR in Autovalidation and to report FDR in the Tool Belt at the protein, peptide, and spectrum levels. See "What's new" for Autovalidation and for the Tool Belt to learn more about these new options.

Data Extractors

You can now perform data extraction as part of a workflow, which saves time and minimizes your effort.

You can select multiple data folders for extraction. Each folder is a separate request that is queued and executed when a CPU becomes available.
Some data extractors (Thermo .raw and Agilent Q-TOF .d) determine a retention time (RT) apex and report it in specFeatures.tsv, which improves quantitation for techniques like SILAC and ICAT.
The Agilent Q-TOF extractor also reports the collision energy, which can be used in the MRM Selector to generate a DMRM list.
The Agilent Q-TOF .d and Thermo .raw extractors calculate some metrics that are used in the Tool Belt search statistics report, and they calculate an averagine chi squared value that is used in Protein/Peptide Summary.

Java is a U.S. trademark of Sun Microsystems, Inc.

MS/MS Search

MS/MS Search can now be part of an automated workflow.
You can select multiple data folders for MS/MS Search. Each folder is a separate request that is queued and executed when a CPU becomes available. A search is automatically dependent upon any extraction request for the same folder.
A new discriminant scoring field is available in MS/MS Search. This scoring option combines several metrics, including score, SPI, and backbone cleavage score, among others, and excludes more false positives than the “Score” metric. Discriminant coefficients determine the relative importance of each of the metrics in the calculation. Agilent provides coefficients for Agilent Q-TOF and Ion Trap instruments, but you must create your own coefficients in the Tool Belt if your study requires them or if you are using another vendor’s instrument.
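
To make the role of the coefficients concrete, here is a minimal sketch that assumes the discriminant score is a weighted sum of the metrics. The weighted-sum form, the metric names, and the coefficient values are assumptions for illustration only; they are not the coefficients Agilent provides.

```python
def discriminant_score(metrics, coefficients, intercept=0.0):
    """Combine several per-spectrum metrics into one score.

    Each coefficient sets the relative weight of its metric
    (assumed linear combination).
    """
    return intercept + sum(
        coefficients[name] * metrics[name] for name in coefficients
    )


# Hypothetical metrics for one search hit and hypothetical coefficients.
example_metrics = {"score": 12.0, "spi": 80.0, "backbone_cleavage_score": 9.0}
example_coefficients = {"score": 1.0, "spi": 0.05, "backbone_cleavage_score": 0.5}

example_value = discriminant_score(example_metrics, example_coefficients)
```

With these made-up numbers the three terms contribute 12.0, 4.0, and 4.5, so a larger coefficient on a metric makes that metric count for more in the final ranking.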

MS/MS Autovalidation

False discovery rate is incorporated into autovalidation, which makes it easy to validate only those hits that have a low chance of being false positives.
You can now use three strategies with different modes to autovalidate results: Fixed thresholds (uses Protein and Peptide Rules for thresholds whose values you can enter, and calculates an FDR using reversed hits); Auto thresholds (optimizes the score and delta R1-R2 score thresholds until a target FDR is reached); and Auto thresholds - discriminant (optimizes the discriminant score threshold until a target FDR is reached).
For each of the three strategies for autovalidation, you can perform two steps (called modes). See the Autovalidation section of Software Basics to learn more about these modes.
MS/MS Autovalidation can now be done either interactively or via an automated workflow.
Buttons allow you to remove the results of the last autovalidation you performed, or remove all autovalidation results.
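
The idea behind the auto-threshold strategies can be sketched as follows. This is a simplified illustration, not the Autovalidation algorithm: it uses one common FDR estimate (reversed hits divided by forward hits among the accepted results), optimizes a single score threshold rather than several, and the record fields are hypothetical.

```python
def estimate_fdr(hits, threshold):
    """Estimate FDR among hits scoring at or above `threshold`.

    Assumed formula: reversed (decoy) hits / forward (target) hits.
    """
    accepted = [h for h in hits if h["score"] >= threshold]
    reversed_count = sum(1 for h in accepted if h["is_reversed"])
    forward_count = len(accepted) - reversed_count
    if forward_count == 0:
        return 0.0
    return reversed_count / forward_count


def lowest_threshold_for_fdr(hits, target_fdr):
    """Return the lowest observed score that meets the target FDR, or None."""
    for threshold in sorted({h["score"] for h in hits}):
        if estimate_fdr(hits, threshold) <= target_fdr:
            return threshold
    return None


# Hypothetical search hits: five forward hits and two reversed hits.
example_hits = (
    [{"score": s, "is_reversed": False} for s in (10, 9, 8, 7, 5)]
    + [{"score": s, "is_reversed": True} for s in (6, 4)]
)
```

Choosing the lowest qualifying threshold keeps as many hits as possible while staying at or under the target FDR.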

Protein/Peptide Summary

You can now exclude spo files (search results files) in results summaries.
Protein-related summary modes that enable quantitation now allow you to exclude:

Precursor extracted ion chromatograms that have poor isotope quality and results that have poor precursor isolation purity. This feature applies to Agilent Q-TOF .d and Thermo .raw data.
Precursor isolation purity below a specified percentage

Review fields for differential expression quantitation (DEQ) now allow you to:

Display median, mean, or both for DEQ ratios
Exclude outlier DEQ ratios for protein quantitation

Additional modifications for DEQ are supported, including full support for iTRAQ and TMT.
You can now choose to report variable modification localization (VML) scores and sequences for modification sites.
Protein Summary Details mode now allows you to export to Microsoft® Excel.
The default size of the Spectrum Viewer (in the Protein/Peptide Summary results) has changed to allow for easier copy/paste into Microsoft PowerPoint or other documents.
Accurate Mass Retention Time (AMRT) export (Peptide mode) improves integration with feature assignment and annotation in Agilent MassHunter Mass Profiler Professional, by enabling you to more easily map features to identifications from the Spectrum Mill workbench.
The Review Fields section has been rearranged to use less space in all of the modes.
Protein/Peptide Summary can now be done either interactively or via an automated workflow.
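
The DEQ review options listed above (median, mean, and outlier exclusion) can be illustrated with a short sketch. The outlier rule used by Protein/Peptide Summary is not described here, so this example substitutes a modified z-score on log2 ratios with a cutoff of 3.5. That rule, the function names, and the sample ratios are all assumptions.

```python
import math
import statistics


def exclude_outliers(ratios, cutoff=3.5):
    """Drop ratios whose log2 value is far from the median (modified z-score).

    Ratios must be positive. If the median absolute deviation is zero,
    nothing is excluded.
    """
    logs = [math.log2(r) for r in ratios]
    center = statistics.median(logs)
    mad = statistics.median([abs(x - center) for x in logs])
    if mad == 0:
        return list(ratios)
    return [
        r for r, x in zip(ratios, logs)
        if 0.6745 * abs(x - center) / mad <= cutoff
    ]


def summarize_ratios(ratios, drop_outliers=True):
    """Report the median and mean of peptide ratios for one protein."""
    kept = exclude_outliers(ratios) if drop_outliers else list(ratios)
    return {
        "median": statistics.median(kept),
        "mean": statistics.mean(kept),
        "n_used": len(kept),
    }


# Hypothetical peptide-level ratios for one protein; 8.0 is an outlier.
example_ratios = [1.0, 1.1, 0.9, 1.05, 8.0]
example_summary = summarize_ratios(example_ratios)
```

Comparing `summarize_ratios(example_ratios, drop_outliers=False)` with the default shows why the median is less sensitive to a single extreme ratio than the mean.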

Tool Belt

Ability to report FDR on protein, peptide and spectral levels
New feature to create your own set of discriminant scoring coefficients
New capability to export an MS/MS search summary file to PepXML format. PepXML is a results exchange format that is supported by the Trans-Proteomic Pipeline from the Institute for Systems Biology and can be imported into other software packages, such as Skyline, an MRM data analysis package from the University of Washington.
Ability to convert data extractor pkl files into mzXML files, which let you import Spectrum Mill results into other software packages, such as Skyline. You can also use these files as links to the labeled MS/MS spectra required to identify post-translational modifications for papers published in MCP.
New feature to archive instrument-created files, search results files, spectral files and data directories

Peptide Selector

Ability to generate MRM lists and inclusion lists for Q-TOF that you can copy and paste directly into Excel or Agilent MassHunter Data Acquisition software, and to generate lists that can be exported to MassHunter Qualitative Analysis to create an Accurate Mass (AM) database
Capability to save and load parameter files
New features that assist in the targeted proteomics workflow
New Protein Position Filtering settings that enable you to restrict the selection based on the position of peptides within the protein.
New Scoring settings that let you penalize, rather than exclude, peptides that fail the selection criteria

Protein List to Masses Utility

A new utility is available to calculate the masses and formulas for a set of specified peptides.
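
For readers who want to see the kind of calculation involved, the sketch below computes monoisotopic peptide masses from standard amino acid residue masses. It covers unmodified peptides only and does not derive elemental formulas. The function names are hypothetical and do not reflect how the utility itself is implemented.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056   # H2O added for the free N- and C-termini
PROTON_MASS = 1.007276


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def peptide_mz(sequence, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(sequence) + charge * PROTON_MASS) / charge


def masses_for_peptides(sequences):
    """Map each peptide sequence to its neutral monoisotopic mass."""
    return {seq: peptide_mass(seq) for seq in sequences}
```

A sequence that contains a letter outside the 20 standard residues raises a `KeyError` in this sketch, which makes unsupported input visible instead of silently returning a wrong mass.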

Protein Databases

Ability to compare two databases to determine whether their content is different, which is useful when you need to remove redundant databases from the Spectrum Mill server.
Option to create a subset FASTA file from accession numbers that you type, which is useful for limiting searches to the set of proteins of particular interest.
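
The subset option above amounts to filtering FASTA records by accession. A minimal sketch follows, assuming that the accession is the first whitespace-delimited token after the ">" on each header line. Real databases use several header conventions, so that parsing rule and the function name are assumptions.

```python
def subset_fasta(fasta_text, accessions):
    """Return only the FASTA records whose accession is in `accessions`.

    The accession is taken to be the first token of the header line
    (assumed convention).
    """
    wanted = set(accessions)
    kept_lines = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in wanted
        if keep:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")


# Hypothetical three-protein database.
example_fasta = (
    ">P001 first protein\nMKTAYIAK\nQRQISFVK\n"
    ">P002 second protein\nMSDNELLK\n"
    ">P003 third protein\nMALWMRLL\n"
)
example_subset = subset_fasta(example_fasta, ["P001", "P003"])
```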

Server Administration

With the advent of automated workflows, administrators can limit the number of parallel workflow processes to less than the CPU count.
Administrators can start and stop the Spectrum Mill Workflow Manager Service to troubleshoot workflows.

What happened to Easy MS/MS Search?

Easy MS/MS Search has been replaced by new process automation tools that provide more power and flexibility.

Microsoft is a U.S. registered trademark of Microsoft Corporation.