What's New in This Version of Spectrum Mill
New in BI.08.01 (December 2022)
The BI.08.01 version of Spectrum Mill is a major version release due to the addition of:
- Bruker Data Extractor for TimsTof data
- Report-to-Plots tool for making plots of metrics from SM tabular reports (PSM/Peptide score metrics, sample handling Quality Metrics, and Spectral Features)
- Subset-specific FDR filter for extracting a subset of peptides/PTM-sites derived from a rare source and applying subset-specific FDR filtering
- Overhaul of underlying data structures that diminishes by 2-fold the memory required for Protein/Peptide Summary reports for large TMT proteome cohorts.
Several other new features have also been included.
Below describes additional differences between the BI.08.01 and prior versions of Spectrum Mill.
Installation and Configuration
Python v3.9
- Anaconda Anaconda3-2022.05-Windows-x86_64.exe is supplied with this release.
Workflow Automation
New features motivated by the analysis of cohorts of immunopeptidomic samples where each sample is analyzed independently
using an identical workflow except for a personalized sequence database with single amino acid variants (SAAVs)
derived from whole exome sequencing (WES) and aberrant splice junctions derived from RNA-seq.
- Override Workflow Database:
- Apply all tasks to individual directories (no combined autovalidation or reports)
Protein Sequence Database Utilities
- Individual Databases Instead Of Combined - option added to the Concatenate FASTA files utility,
when the Make Proteogenomic Summary Tables feature is used.
- Translate nucleotide FASTA to protein FASTA utility introduced.
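As a rough illustration of what a nucleotide-to-protein translation step involves, the following Python sketch translates one forward reading frame using the standard genetic code. It is not the Spectrum Mill utility: the function name translate_frame, the default of stopping at the first stop codon, and the use of X for ambiguous codons are all assumptions made for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate_frame(nucleotides, frame=0, stop_at_stop=True):
    """Translate one forward reading frame (hypothetical helper, not SM code)."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        # Codons containing ambiguous bases (e.g. N) are reported as X.
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, translate_frame("ATGGCCTAA") yields MA, and passing stop_at_stop=False keeps the stop codon as an asterisk.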
Data Extractor
XtractorBruker .d TimsTof Extractor introduced with the following features akin to the SM Data Extractors for other instrument vendors:
- 2D MS/MS replicate merging (in both retention time and ion mobility dimensions)
- Filtering by MH+, z, RT, Sequence tag length
- MS/MS spectral feature calculation
- SRM control of XtractorBruker enables Max CPU extraction
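The idea of merging replicate MS/MS spectra in two dimensions can be sketched as follows. This is only an illustration under stated assumptions and not the XtractorBruker algorithm: the Precursor fields, the greedy grouping strategy, and all tolerance values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float
    rt_sec: float   # retention time in seconds
    inv_k0: float   # ion mobility expressed as 1/K0


def group_replicates(precursors, mz_tol_ppm=20.0, rt_tol_sec=30.0, im_tol=0.05):
    """Greedily group precursors that agree in m/z, retention time, AND ion mobility.

    All tolerances are illustrative assumptions, not SM defaults.
    """
    groups = []
    for p in sorted(precursors, key=lambda q: (q.mz, q.rt_sec)):
        for group in groups:
            ref = group[-1]  # compare against the most recently added member
            if (abs(p.mz - ref.mz) / ref.mz * 1e6 <= mz_tol_ppm
                    and abs(p.rt_sec - ref.rt_sec) <= rt_tol_sec
                    and abs(p.inv_k0 - ref.inv_k0) <= im_tol):
                group.append(p)
                break
        else:
            groups.append([p])  # no compatible group found, start a new one
    return groups
```

Two precursors with the same m/z and retention time but clearly different ion mobility end up in separate groups, which is the point of adding the second dimension.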
XtractorBruker does not yet read/process MS1 scans. Hence, the following spectral features are currently missing:
MS1 precursor peak area, Chromatographic peak width, Precursor ion purity.
However, an MS1 intensity is stored in the Bruker Precursor SQL table and is currently output as the SM precursor peak area.
Description from the Bruker SQL table documentation:
Intensity of this precursor in the corresponding MS^1 frame. The corresponding MS^1 frame in which this precursor
was detected. In the case that MS^1 frames were repeatedly measured and averaged to improve SNR for precursor detection,
the TDF stores those frames individually, and this field points to the last of that set of frames.
Protein/Peptide Summary
While not readily apparent to users, the underlying data structures for spectral features were overhauled to achieve two objectives:
- Diminish by 2-fold the memory required for Protein/Peptide Summary reports for
large TMT proteome cohorts, particularly when transitioning from TMT10/11 to TMT16/18.
- RICF - Enable combined reports of multiple data directories using different TMT reagents
for individual data directories. The reagent used for each directory is automatically diagnosed from
the reporter ion correction factors (RICF) file. The particular control ion (ratio denominator) used for each
data directory must be consistent (first, last, or MedianMulti).
Process Report
- Auto-SA reporter ion label type. Reporter ion label can be automatically determined from the sample annotation (SA)
file. Accompanies the P/P Summary RICF feature to enable combined reports of multiple data directories using
different TMT reagents for individual data directories.
- -precursorIntensity.gct file now produced for label type - Label free.
- -reporterIntensity.gct file generated for iTRAQ/TMT label types when sample annotation (SA) file contains a ratioDenominator column.
Lorikeet spectrum viewer
- Added support for internal ions.
When using the Microsoft Edge Browser in Internet Explorer mode it remains possible to use the classic SM Java
applet from the SPI link in SM HTML output. However, configuring IE mode is a considerable hassle. Lorikeet support
for internal ions essentially renders the classic viewer obsolete. The feature of the proportional peptide sequence
with gray bars at theoretical ion positions in the classic viewer may make it into a future revision of the Lorikeet viewer.
Report-to-Plots
- Introduced Report-to-Plots tool for making plots of metrics from SM tabular reports (PSM/Peptide score metrics, sample handling Quality Metrics, and Spectral Features)
This standalone tool provides a convenient means for developing and generating plots from various SM reports. Feedback is welcome.
Users are encouraged to use the underlying Python scripts as a starting point for generating their own customized plots.
In future SM releases plot generation by RtP will be a Task that can be added to an SM workflow and
some existing tasks (especially Quality Metrics and P/P Summary) will automatically generate particular plots.
Subset-specific FDR filter
- Introduced Subset-specific FDR filter for extracting a subset of peptides/PTM-sites derived from a rare source and applying subset-specific FDR filtering
This tool was initially motivated for analysis of immunopeptidomics datasets with the objective of increasing confidence
in the identification of HLA-bound peptides derived from non-canonical unannotated open reading frames (nuORFs).
Following conventional aggregate FDR Filtering of a dataset with the SM Autovalidation module the basic approach is:
- Make Subset(s) - Peptides of Interest, based on values in the species column of an SM report or a user-provided column
derived from the P/P Summary Category file option.
- Calculate FDR of Subset(s)
- Apply Fixed Thresholds - stricter thresholds to several scoring metrics
- Re-calculate FDR of Subset(s)
- Perform a dynamic Grid Search to optimize the backbone cleavage score (BCS), score, SPI, BCS% thresholds
for Subset(s) to reach a target FDR of <1%.
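The steps above can be sketched in Python as follows. This is a minimal illustration and not the SM implementation: it assumes each PSM is a dict with hypothetical keys (species, score, bcs, is_decoy), estimates FDR as decoy hits divided by target hits, and searches over only two of the thresholds (score and BCS) to keep the example short.

```python
from itertools import product


def estimate_fdr(psms):
    """Target-decoy estimate (assumed here): decoy hits divided by target hits."""
    decoys = sum(1 for p in psms if p["is_decoy"])
    targets = len(psms) - decoys
    if targets == 0:
        return 1.0 if decoys else 0.0
    return decoys / targets


def make_subset(psms, column, value):
    """Keep only PSMs whose annotation column matches the subset of interest."""
    return [p for p in psms if p[column] == value]


def grid_search(psms, score_grid, bcs_grid, target_fdr=0.01):
    """Try every threshold pair; keep the pair that retains the most target PSMs
    while the estimated FDR stays below target_fdr. Returns None if no pair qualifies."""
    best = None
    for score_min, bcs_min in product(score_grid, bcs_grid):
        kept = [p for p in psms if p["score"] >= score_min and p["bcs"] >= bcs_min]
        n_targets = sum(1 for p in kept if not p["is_decoy"])
        fdr = estimate_fdr(kept)
        if fdr < target_fdr and (best is None or n_targets > best["n_targets"]):
            best = {"score_min": score_min, "bcs_min": bcs_min,
                    "n_targets": n_targets, "fdr": fdr}
    return best
```

A typical call would first build the subset, for example make_subset(psms, "species", "nuORF"), check its FDR with estimate_fdr, and then pass it to grid_search with candidate threshold lists. Choosing the combination that retains the most targets is one reasonable optimization criterion; the criterion used by the SM tool is not stated in these notes.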
New in BI.07.11 (April 2022)
The BI.07.11 version of Spectrum Mill is mostly a minor maintenance release intended to help new users get started. It is motivated by
revised default saved parameter sets that no longer refer to Broad Institute project-specific category files in P/P Summary, which are not present
on a newly installed SM server. Nonetheless, a few other new features have been included.
Below describes additional differences between the BI.07.11 and prior versions of Spectrum Mill.
Default Saved Parameter sets
- Revised several parameter sets to remove features specific to use at Broad Institute, like category files in P/P Summary that are not present on a newly installed SM server.
Data Extractor
Thermo .RAW Extractor revisions include:
- Created instrument params for Orbitrap CID HLA v3, that for high resolution CID spectra provide more sensitive peak detection before calculating the spectral feature: max sequence tag length.
MS/MS Search
- Exposed control of the feature: Skip QUILTS Unmutated Peptides. This was introduced in v7.09 in a faulty way which caused searches using Ensembl protein
identifiers to report only a single protein identifier, from the longest protein containing the matched peptide. When the skip feature is checked in v7.11
it now performs as intended with all reference proteome identifiers present. Proteogenomic search revision to NOT report hits to wt peptides in proteins containing SAAVs, applies only to Ensembl sequence identifiers.
Implemented by not allowing extended accession nums (mutant protein) with same peptide sequence matched to an unextended accession num.
- Created instrument params for Orbitrap CID HLA v3, that for high resolution CID spectra provide more sensitive peak detection and fragment ion type scoring suited for HLA peptides.
- Added the Spectral Quality Filter: Phospho Product Ion Score (PPIS) to craft subsets of spectra containing a phospho neutral-loss ion signature.
Process Report
- Fixed bug preventing Plot Ratio Distributions from running. Extended plots to support .GCT format.
- To Normalize Reporter Ratios added features:
- Retain Species:
- Combine replicate columns & omit QC.fail
- Apply prior factors (for subsets to use aggregate)
New in BI.07.09 (Sept 2021)
The BI.07.09 version of Spectrum Mill includes the new key features:
- Support for TMTpro-18 reagents
- Integrated Lorikeet spectrum viewer both as a link from Protein Peptide Summary and as a standalone page when using browsers other than Internet Explorer (IE11).
- Process Report now exports .GCT format reports in conjunction with *sample-annotation.csv files (vertical layout) for mapping samples to reporter ions.
- Added the Spectral Quality Filter: Glyco Product Ion Score (GPIS) to craft subsets of spectra containing a glyco ion signature.
Below describes additional differences between the BI.07.09 and prior versions of Spectrum Mill.
Installation and Configuration
R
- R R-4.1.0-win is supplied with this release.
Data Extractor
Thermo .RAW Extractor revisions include:
- Added support for TMTpro-18 reagents
MS/MS Search
- Proteogenomic search revision to NOT report hits to wt peptides in proteins containing SAAVs, applies only to Ensembl sequence identifiers.
Implemented by not allowing extended accession nums (mutant protein) with same peptide sequence matched to an unextended accession num.
- Added support for prompt neutral loss mods like O-HexNAc, used to trigger calculations of peptide parent ion modified paired with fragment ions unmodified.
- Added the Spectral Quality Filter: Glyco Product Ion Score (GPIS) to craft subsets of spectra containing a glyco ion signature.
Protein/Peptide Summary
- Integrated Lorikeet spectrum viewer. Link from SPI in HTML output now launches Lorikeet viewer when using browsers other than Internet Explorer (IE11). Classic SM Java
applet continues to be launched, when using IE11
- Enabled VM-site reports to have annotated PG Features (Variants & Spliceforms) included along with all others
- Revised PG features column headers so they are now valid R names.
- Added support for TMTpro-18 reagents
- Changes to behavior of the Control ion menu when selecting MedianMulti or MeanMulti:
- If from the UI you choose MedianMulti but then select NO individual accompanying control ions the script (not the UI) will default to all of the ions for that label.
- When all reporters or no reporter for a particular label are chosen, the denominator annotation in the output is shortened (ex: MedianMulti.all.18 for TMT18) instead of listing every single reporter ion.
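The MedianMulti fallback and the shortened denominator annotation described above can be sketched as follows. This is an illustration under assumptions, not the SM script: the function name, the ratio definition (reporter intensity divided by the median of the control-ion intensities), and the annotation format used when only some reporters are selected are hypothetical. Only the MedianMulti.all.N form comes from these notes.

```python
from statistics import median


def median_multi_ratios(intensities, control_ions=None):
    """intensities maps reporter label -> intensity; control_ions is an optional list of labels.

    Returns (ratios, annotation). Assumes the median denominator is non-zero.
    """
    # With no control ions selected, default to all reporters for the label.
    controls = list(control_ions) if control_ions else list(intensities)
    denominator = median(intensities[c] for c in controls)
    if set(controls) == set(intensities):
        # Shortened annotation when all (or no) reporters were chosen.
        annotation = f"MedianMulti.all.{len(intensities)}"
    else:
        # Hypothetical format for a partial selection.
        annotation = "MedianMulti." + ".".join(controls)
    ratios = {label: value / denominator for label, value in intensities.items()}
    return ratios, annotation
```

With three reporters and no control ions selected, the annotation comes out as MedianMulti.all.3, mirroring the MedianMulti.all.18 example for TMT18.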
Process Report
- Added support for TMTpro-18 reagents
- Added support for *sample-annotation.csv file (vertical layout) for mapping samples to reporter ions.
- Added ability to generate .GCT formatted output reports.
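For readers unfamiliar with the .GCT layout, the following Python sketch writes a small matrix in the GCT version 1.2 text format (a version line, a dimensions line, a header line, then one row per feature). Which GCT version and which columns Process Report actually writes are not specified in these notes, so the function name and the example row and sample names are purely illustrative.

```python
import io


def write_gct12(row_names, descriptions, sample_names, matrix, handle):
    """Write a tab-delimited GCT v1.2 table to an open text handle (illustrative helper)."""
    handle.write("#1.2\n")                                        # version line
    handle.write(f"{len(row_names)}\t{len(sample_names)}\n")      # rows, then data columns
    handle.write("\t".join(["Name", "Description", *sample_names]) + "\n")
    for name, desc, values in zip(row_names, descriptions, matrix):
        handle.write("\t".join([name, desc, *(str(v) for v in values)]) + "\n")


# Usage with an in-memory buffer; a real file opened in text mode works the same way.
buffer = io.StringIO()
write_gct12(
    ["P12345", "Q67890"],
    ["example protein A", "example protein B"],
    ["sample1", "sample2", "sample3"],
    [[0.1, -0.2, 0.3], [1.5, 0.0, -1.1]],
    buffer,
)
gct_lines = buffer.getvalue().splitlines()
```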
Protein Sequence Database Utilities
- Added the utility Remove fragments - UniProt
- Overhauled the Calculate statistics utility. Output now in table format and includes 9-mer redundancy factor.
Tool Belt
- Added support for TMTpro-18 correction factors
- Create Reporter Ion correction factors
- Apply Reporter Ion correction factors
New in BI.07.08 (June 2021)
The BI.07.08 version of Spectrum Mill contains minor changes. Below describes additional differences between the BI.07.08 and prior versions of Spectrum Mill.
Protein/Peptide Summary
- Updated default saved parameter sets from ~2015 to 2021
- Revised to only calculate coverage maps when % coverage requested, to save memory on very large CPTAC reports.
- Fixes to get PG features to report in Peptide/PSM report modes.
Protein Sequence Database Utilities
- Overhauled the manual. Updated supported database descriptions from ~2007 to 2021
de novo Sequencing
- Removed some residual software testing features from the UI.
Spectrum Summary
- Updated default parameter sets for co-reporting de novo and DB search results.
- Fixes to Rscript for making plots of performance metrics for de novo and DB search results.
New in BI.07.07 (March 2021 - first Broad Institute release)
The BI.07.07 version of Spectrum Mill contains updated JAVA applets for spectrum viewing with certificates
valid until Jan 2024. Below describes additional differences between the BI.07.07 and prior versions of Spectrum Mill.
Installation and Configuration
Operating systems supported
- Windows Server 2016 (encouraged)
- Windows 10 (single-user environment)
Web browser support
- Google Chrome (primary development, testing use at Broad Institute)
- Microsoft Edge, Firefox, and Opera (tested occasionally)
- Microsoft Internet Explorer 11 (required only for Spectrum Viewer applet use)
The following installation and configuration features are new in Spectrum Mill vBI.07.07. For details, see the Installation Guide.
Perl
- Strawberry Perl v5.32.1.1 is supplied with this release. This SM release is backwards compatible with ActiveState Perl v5.18.4 supplied with prior SM releases.
R
- R R-3.6.2-win is supplied with this release.
JRE
- Java JRE v1.8u202 is supplied with this release.
Install both the 32- and 64-bit JREs, even if you run the 64-bit Internet Explorer (IE 11).
Later versions of JRE 8 may be available (see www.java.com), and in general should work with Spectrum Mill.
The JRE is only required to support the Spectrum Viewer applets and the Sherenga de novo program.
For the Spectrum Viewer applet be sure to install JREs on all client computers, as well as the Spectrum Mill server.
Thermo Fisher MSFileReader (for .raw files)
- MSFileReader_3.1_SP4 is supplied with this release.
Default parameter sets
- Nearly all default parameter sets have been updated since vB.06.00.
Home Page
- Added links to:
- Slides: Spectrum Mill - Overview
- Getting Started Guide
- Removed links to obsolete utilities
- MS Edman
- MS Comp
- MS Isotope
Data Extractors
Thermo Fisher .RAW Extractor revisions include:
- Updates to use the most recent release of the Thermo Fisher API MSFileReader_3.1_SP4.
- User control of peak detection through addition of an instrument menu to allow matching settings between Data Extractor and MS/MS search.
MS/MS Search
- Updated peak detection to improve sensitivity originally implemented for HLA peptides now applied to other instrument definitions.
- ESI QExactive HCD v4 35 (March 2020)
- ESI QExactive HCD v4 30 (March 2020)
MS/MS Autovalidation
- Updated the VM site grouping used in VM site polishing mode that is also shared with P/P Summary (see below).
Protein/Peptide Summary
- Updated the VM site grouping used in Protein - Var Mod site reporting mode to more consistently handle grouping PSMs from
confident localizations and multiple ambiguous localizations overlapping the same region of a protein. Ambiguous localizations
are now grouped with consistent confident ones preferentially over other ambiguous localizations that are not consistent with the confident localization.
Previously the grouping was preferentially combining PSMs with the most N-terminally positioned localization.
New in BI.07.00 (November 2016 - Feb 2020)
The BI.07.00 version of Spectrum Mill eliminates the dependency on Internet Explorer, and can now be used with most
web browsers including Google Chrome, Firefox, and Microsoft Edge. Primary development and testing is now done with Google Chrome.
Below describes additional differences between the BI.07.00 and prior versions of Spectrum Mill.
Home Page
- Added links to:
- Help Index
- Custom Modifications Guide
- Installation Guide
- Removed obsolete tools
- PMF Summary
- de novo summary (functionality now included in Spectrum Summary)
Data Extractors
Thermo .RAW Extractor revisions include:
- Added support for ETHCD dissociation.
- Added support for FAIMS so that all MS/MS and MS XIC's are associated with the compensation voltage (CV) used.
- Improved support for Lumos data MS1 chromatographic peak detection and monoisotopic m/z adjustment
- Added support for TMT11 & TMTpro-16 reagents
MS/MS Search
- Updated peak detection to improve sensitivity by revision of the noise level calculation performed for each spectrum prior to signal/noise based peak detection.
This primarily affects spectra of low abundance peptides with very little noise, and not only leads to higher identification scores for low-signal spectra, but also
allows more low-signal spectra to pass the sequence tag length based spectral quality threshold employed by the data extractor.
- ESI QExactive HLA v3 30 (June 2019)
- Newly optimized MS/MS search scoring of fragment ion types for HLA class I peptides.
- ESI QExactive HLA v2 (December 2017)
- No enzyme search efficiency optimized to no longer require using the Disable Skipping Repeat Peptides in Database option
- Requires hardware with a Memory to CPU ratio of ~3 GB RAM / CPU
- For a typical sequence database the searches should be ~2X faster
- Revised to allow variable modifications with accompanying sequence motifs to substantially reduce the search space to the most likely
possibilities for some modifications. Primary examples include:
- Deamidated NG (only when preceding Gly)
- Hydroxylation of PG (only when preceding Gly), for hydroxyproline in protein collagen domains
- TMT10 contains His (Y,S,T) (only when peptide contains His), for overlabeling with TMT reagents
- Added support for TMT11 & TMTpro-16 reagents
- Revised to allow no enzyme mode support for protein N-terminal acetylation
- Added the Full length digest option, which prevents making subsequences of a protein entry in the database
thus attempting to match only the full length sequence. This is intended to handle very large databases
of short HLA peptide candidate sequences.
- Bug fix to allow modifications with a negative mass shift to match with multiple occurrences in a peptide.
This was intended to handle Acetylated Lysines in TMT labeled peptides. The negative mass shift is due to in vivo
acetylation preventing subsequent in vitro labeling with TMT
- Support for ETHCD dissociation on the fragmentation mode menu to support LC-MS/MS runs with multiple fragmentation methods
employed in a single run.
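The motif-restricted variable modifications listed earlier in this section (Deamidated NG, Hydroxylation of PG) reduce the search space by only considering a residue when the following residue matches. A minimal Python sketch of that site selection is shown below. The rule table, the function name, and the use of regular-expression lookaheads are assumptions for illustration and do not reflect how SM stores or evaluates its modification definitions.

```python
import re

# Hypothetical rule table: modification name -> pattern for eligible residues.
# The lookahead (?=G) requires the next residue to be Gly without consuming it.
MOTIF_RULES = {
    "Deamidated NG": re.compile(r"N(?=G)"),
    "Hydroxylation of PG": re.compile(r"P(?=G)"),
}


def candidate_sites(peptide, modification):
    """Return 0-based positions of residues eligible for the given modification."""
    return [m.start() for m in MOTIF_RULES[modification].finditer(peptide)]
```

For the peptide ANGKNPGN, only the first Asn (position 1) is a candidate for Deamidated NG, instead of all three Asn residues that an unrestricted deamidation would consider.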
MS/MS Autovalidation
- Added an additional peptide-level filter, backbone cleavage score (BCS), primarily for immunopeptidomics of HLA class I peptides.
BCS is a peptide sequence coverage metric and the BCS threshold (typically 5) enforces a uniformly higher minimum sequence coverage for each PSM,
at least 4-5 residues of unambiguous sequence. The BCS metric serves to decrease false positives associated with spectra having fragmentation
in a limited portion of the peptide that yields multiple ion types.
- Added mode VM site polishing to preferentially retain VM sites that are recurrent across multiple experiments, while invalidating low scoring VM sites
observed in single experiments.
- Added protein polishing features:
- Added protein grouping option, expand subgroups, top uses shared
- Added retain proteins above either threshold (less strict - retains recurrently observed proteins with scores below threshold)
Protein/Peptide Summary
- Reorganized and revised protein grouping options:
- Added new protein grouping method: expand subgroups, top uses shared
- Existing protein grouping method, subgroup specific moved from a checkbox to under the protein grouping menu: expand subgroups, ignored shared, SGS
- Revised Protein Comparison mode to make 2 reports for Excel Export. When either top shared, or ignore shared is selected the all shared report is also made.
- Added support for TMT11 & TMTpro-16 reagents
- afRICA - dynamic reporter ion correction algorithm implemented
- Improved SILAC dataset support (DEQ ratios) in Protein Comparison mode
- Added ability for user to specify the threshold value of
Exclude poor isotope quality Precursor XIC's: Chi2 vs. Averagine
- Fixed bugs so that Exclude outlier DEQ Ratios (> 2 std dev from mean) works properly.
- Added multi species reporting in protein comparison mode reports for better support of xenograft datasets.
- Added support for VMsiteFlankingSequence in Protein VMsite reports
- Revised mode Protein Prot Genom Site Comparison reporting of variants and spliceforms to be more generic instead of CPTAC2-specific
- Added support to report FAIMS compensation voltage (CV) for PSMs
- Fixed bug so fill time is reported for PSMs
- Added sequenceMulti column for peptide mode outputs.
- Added reporting of fragmentation metrics/categories for MS/MS spectra to PSM/peptide reports
Quality Metrics and False Discovery Rate (FDR)
- More accurate MS1 chromatographic peak widths for Thermo Lumos datasets, based on Data Extractor changes. Also stopped counting 0’s towards median peak width metric for all instrument types.
- Added additional fill time metric medianTrapFillMsecUnmaxed to exclude spectra reaching maximum fill time, to better measure the performance of QExactive HFX model
instruments that not only have a very rapid scan rate but can also be operated with the monoisotopic precursor recognition, peptide match set to preferred.
With those settings many unproductive MS/MS scans with maximum fill times are taken when there are no remaining good quality precursors detected, hence retaining
those in the median causes the metric to measure the sample composition more than the instrument performance.
- Revised Isobaric label incorporation quality metrics to more clearly count blocked Nterms as underlabeled
to enable better decision making about the prospects for re-labeling.
- Added calculations and plots about reporter ion ratios across the LC gradient to enable recognition of problems
with inconsistent recovery of early eluting peptides in some samples in a plex.
- Added reporting of fragmentation metrics in MS/MS spectra
- Revised Digestion stats to count percent tryptic, semitryptic, nontryptic.
- Added peptide separation metric that makes:
- Peptide subset reports for each data directory derived from central peptide lists(seqdb/peptideQMlists/*.txt)
- Added support for making comparative retention time plots relative to a gold standard run. Requires Namrata’s python script
(millpy/20180327_SM_Select_Peptide_QM.py), and python installed on the SM server.
- Added columns for iTRAQ/TMT metrics: Median S/N All Reporters, allReportersDetectedPercent, controlIonDetectedPercent
- Renamed Contaminant Product Ions score to Glyco Product Ions score and raised the threshold score from 2.0 to 4.5 for a spectrum for counting a PSM as
containing glyco marker ions.
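The medianTrapFillMsecUnmaxed metric described above excludes spectra that reached the maximum fill time before taking the median. A minimal Python sketch of that calculation follows. The function name and the way the maximum fill time is supplied are assumptions, and the SM implementation may identify maxed-out scans differently.

```python
from statistics import median


def median_fill_unmaxed(fill_times_msec, max_fill_msec):
    """Median fill time over spectra that did NOT reach the maximum fill time.

    Returns None when every spectrum reached the maximum.
    """
    unmaxed = [t for t in fill_times_msec if t < max_fill_msec]
    return median(unmaxed) if unmaxed else None
```

With fill times of 10, 20, 50, 50, and 50 msec and a 50 msec maximum, the plain median is 50 while the unmaxed median is 15, which shows how a run dominated by unproductive maximum-fill scans can mask the instrument's actual performance.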
Spectrum Matcher
- Updated to enable searching high quality left-over spectra against
high-quality already identified spectra
- Produces a histogram of frequently observed precursor mass shifts to help suggest presence of modifications
not accounted for during a database search
- Results reported in Spectrum Matcher
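A histogram of precursor mass shifts of the kind described above can be sketched in a few lines of Python. This is an illustration only: the input format (pairs of unidentified and identified precursor masses), the bin width, and the function name are assumptions, not details of Spectrum Matcher.

```python
from collections import Counter


def mass_shift_histogram(mass_pairs, bin_width=0.01):
    """mass_pairs: iterable of (left_over_mass, identified_mass) for matched spectra.

    Returns [(binned_shift, count), ...] sorted from most to least frequent.
    """
    counts = Counter(round((query - ref) / bin_width) for query, ref in mass_pairs)
    return [(round(bin_index * bin_width, 4), n) for bin_index, n in counts.most_common()]
```

A shift that recurs across many spectrum pairs, for example near +15.99 Da, would suggest a modification (such as oxidation) that was not allowed in the database search.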
Spectrum Summary
- Updated to combine PSM level results of DB search, de Novo, and Spectrum Matcher results
- Added de novo performance metrics of accuracy and sequence recall that are calculated, tabulated, and graphed
- Summarizes Spectrum Matcher results to help explain the Spectrum Matcher Precursor Mass Shifts using Database search
Id from Library Spectrum matched with metrics that include:
- Enriched for particular AA's present in peptide
- Frequent N-terminal AA's
- Compare precursor MS1 intensity
- Compare LC retention time (RT)
- Co-fractionate in bRP
de novo Sequencing
- Updated Sherenga scoring and peak detection optimized for high resolution HCD spectra
- Updated Sherenga automation that is ~100x faster than in SM v6.00
- Can now be incorporated into workflows
Protein Sequence Database Utilities
- Revised to handle very large databases of HLA peptides, associated with the MS/MS Search Full length digest option.
Added DB indexing option to omit pI, MW, and Species Indices. This consumes much less disk space but also prevents
pI, MW, and species subset searches.
- Restored ability to detect duplicate accession numbers when indexing a database.
- Added support for making a list of tryptic peptides in the entire database.
- Enabled generic rather than CPTAC-specific support for concatenating personalized sequence databases from QUILTS variants and spliceforms along with the option
to Make Proteogenomic Summary Tables for subsequent use in Protein/Peptide Summary reporting mode: Protein Prot Genom Site Comparison.
Tool Belt
- Added support for TMT11 & TMTpro-16 correction factors
- Revised PepXML export to handle a specified subset of search result files.
Process Report
Added the Process Report Module with the following capabilities:
- Parse Report - Parses .ssv files generated by Protein/Peptide Summary to selectively extract the highest information value columns. Can also map sample identifiers to reporter ion masses.
- Normalize Reporter Ratios - Normalizes distributions of reporter ion ratios within a dataset.
- Plot Ratio Distributions - Plots histograms of iTRAQ/TMT ratio distributions in a dataset at the protein or VM-site level, both before and after normalization.
New in B.06.00.200 (October 2016 - last Agilent release)
The B.06.00 version of Agilent Spectrum Mill MS Proteomics
Workbench supports 64-bit Windows operating systems and includes
enhancements that increase the flexibility of the software and enable
greater productivity. This document describes differences between the
B.06.00 and prior versions of the Spectrum Mill workbench.
Note:
If you have upgraded from a prior version,
you will see some differences in validated results compared to the
earlier version.
When comparing data sets, it is best to reprocess (extract, search,
autovalidate) them using the B.06.00 version. See Quality Metrics & FDR.
Installation and Configuration
The following installation and configuration features are new in the Spectrum
Mill workbench version B.06.00. For details, see the Installation Guide and
the Site Preparation Guide. Both are on your software disk.
Operating system
Spectrum Mill B.06.00 is supported on the following Microsoft
Windows operating systems:
- Windows 7 and Windows Server 2008 R2
- Windows Server 2012 R2
- Windows 10
- Windows Server 2016
Only 64-bit versions of the above operating systems are supported.
Windows XP and Windows Server 2003 are no longer supported.
If you configure a new system, Agilent recommends
that you provide adequate memory (16 GB or more) and adequate disk space
(1 to 2 TB for data). For a new system, Agilent recommends 24 to 32 GB of
memory.
Perl
ActiveState Perl 5.18.4
is supplied with this release. Install the 64-bit MSI. (If you are updating
from Spectrum Mill B.04.01 or earlier, uninstall the prior
version first.)
JRE
Java JRE 1.8u111 (1.8.0_111) is included in this release.
Install both the 32- and 64-bit JREs, even if you run the 64-bit Internet Explorer (IE 11).
Later versions of JRE 8 may be available (see www.java.com), and in general should work with Spectrum Mill.
The JRE is required to support the Spectrum Viewer and other applets.
Be sure to install JREs on all client browsers, as well as the
Spectrum Mill server.
IIS
IIS must be installed before you install the Spectrum Mill workbench.
See the Installation Guide for details on
installation and configuration of IIS.
MSFileReader support (for .raw files)
To process Thermo Scientific .raw files, Spectrum Mill B.06.00 can use
the 64-bit Thermo Scientific MSFileReader.
If you update your Spectrum Mill
installation from B.04.01 or prior, then you have the 32-bit MSFileReader
installed, so you must install
the 64-bit version. If you update your Spectrum Mill
installation from B.05, you already have
the 64-bit version.
Web browser support
The following Web browser is supported with this
version:
- Microsoft Internet Explorer 11
If you are upgrading from
Spectrum Mill workbench B.04.01 or earlier, note that the prior release
required Compatibility View for Internet Explorer 9 and later. For this
release you must disable Compatibility View.
MS/MS Search
- MS/MS Search is from two to five times faster, depending upon the peptide
redundancy in the databases you search.
- MS/MS Search no longer creates individual spo files in
results_mstag. Instead, searches create temporary
concatenated spo files (cpo files). The spo
files are added to an spo.zip file. The
program deletes the individual cpo files when a search
completes.
- This improvement significantly reduces the load on the file system (both
in space and number of files), and makes archiving and copying data
folders much faster.
- In the Spectral Quality Filtering section of the MS/MS Search page, the
min# of peaks filter has been removed and Precursor
isotope quality and Precursor isolation purity
filters have been added.
- HLA peptide motifs and related half-enzyme searches are now supported.
- The C-terminal peptide can now be matched in half-enzyme digests that
are built C-term to N-term. Previously, the C-terminal peptide would only
be matched, if it matched the enzyme specificity.
- By default, peptides with lengths less than five amino acids are no longer matched.
This enhancement is especially valuable when you allow large precursor
mass shifts for variable modifications. To enable
matching of smaller peptides, mark the Dynamic peak thresholding
check box.
- If you search data with multiple fragmentation modes and you select All,
an informative error message is generated to indicate which mode(s)
should be selected for searching.
Quality Metrics and False Discovery Rate (FDR)
- Additional metrics are now available.
- The defect fix that was in B.05.00 SP1 is included. In Spectrum Mill B.05.00,
the reversed hit with the second best score was sometimes reported instead
of the one with the best score. This caused the search to underestimate
FDR. If you search against a typical UniProt database, the defect fix reduces
the total number of identifications by 1-5%, but it makes them more
accurate.
Home Page
- Spectrum Matcher is back (under Mass Spectral Interpretation Tools) and
provides additional quality filtering features to assist in evaluating
instrument performance or method changes.
Protein Databases
- Searching of DNA FASTA databases such as dbEST or custom databases (DN or DA prefix) is no longer supported. The DNA sequences must be converted to protein sequences. The FASTA protein header lines must correspond to one of the supported formats. See Updating databases.
- Addition of tool to Concatenate FASTA files, which
allows you to link together two or more existing databases. It includes two possible ways to
concatenate:
- Select one or more existing database files to concatenate.
- Concatenate all files in a folder.
- Support for change of NCBI FASTA header format:
- In September 2016, NCBI changed the FASTA header format to only supply the gb (GenBank) accession. The former
gi accession is no longer indicated.
- Newly downloaded databases in the new format
are supported and the gb accession is used by Spectrum Mill
for those databases.
- For the Spectrum Mill workbench to properly
recognize the format, these new databases require either an NCBIgb or
gb prefix instead of the NCBInr prefix.
- Existing databases (NCBInr) are still supported. GenBank accessions (when present) can
be reported in Protein/Peptide Summary by creating a Category file
for the database.
Tool Belt
- Now supports TMT10 correction factors
- When you remove prior search results, any
applied correction factors remain.
Spectrum Matcher
- Includes new spectral quality filters and homology/variable mode
- Supports Load/Save parameter files (but not in workflows)
Protein/Peptide Summary
- Enhancements to support ion mobility data:
- Includes new Ion Mobility review field to report DT (drift time in
milliseconds) and CCS (collision cross section in square angstroms).
- Agilent Proteomics Results (APR) Export can now export DT and CCS
values if they are present in ion mobility data.
- Review Fields now include Correction Method (for Reporter Ratios).
Select Apply to apply correction factors or None
to skip correction.
Spectrum Summary
- Includes spectral quality filters
- Supports Load/Save parameter files
- New Review Field for ion mobility: reports DT and CCS values (if
available)
Data Extractors
- Thermo .RAW Extractor now supports Thermo Fusion/Lumos data.
- Generic Extractor now extracts to mzXML.
- Generic Extractor now supports PKL files from IM-MS Browser. (IM is ion
mobility.)
- The IM-MS Browser Rev 7.02 Build 209 or later can export concatenated PKL files
that contain the retention time (RT), drift time (DT), and collision
cross section (CCS) for a feature (precursor).
- To report the CCS value, the CCS Single Field calibration must be
applied to the data.
- The names of the extracted spectra indicate RT (in minutes x 10) and
DT (in msec x 10), so the name has the form
myDataFile.<RTx10>.<DTx10>.<charge>.pkl.
- This name provides a way to more easily find the spectrum in IM-MS Browser.
- The RT, DT, and CCS values are stored in the specFeatures.tsv file,
and reported in Protein/Peptide Summary when the Ion mobility review
field is marked.
- DT and CCS values from ion mobility experiments are written to mzXML
for import into Skyline.
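The naming scheme for spectra extracted from IM-MS Browser PKL files (myDataFile.<RTx10>.<DTx10>.<charge>.pkl) can be illustrated with a minimal sketch. The function names are hypothetical, and rounding to the nearest integer is an assumption; the notes above do not say whether Spectrum Mill rounds or truncates.

```python
def pkl_spectrum_name(data_file: str, rt_min: float, dt_msec: float, charge: int) -> str:
    """Build a name of the form myDataFile.<RTx10>.<DTx10>.<charge>.pkl."""
    rt_x10 = round(rt_min * 10)    # RT in minutes x 10 (rounding is an assumption)
    dt_x10 = round(dt_msec * 10)   # DT in msec x 10 (rounding is an assumption)
    return f"{data_file}.{rt_x10}.{dt_x10}.{charge}.pkl"


def parse_pkl_spectrum_name(name: str):
    """Recover (data_file, RT in minutes, DT in msec, charge) from such a name."""
    # Split from the right so that dots inside the data file name are preserved.
    data_file, rt_x10, dt_x10, charge, _ext = name.rsplit(".", 4)
    return data_file, int(rt_x10) / 10, int(dt_x10) / 10, int(charge)


name = pkl_spectrum_name("myDataFile", 12.3, 25.4, 2)
print(name)
print(parse_pkl_spectrum_name(name))
```

Parsing the name back into RT and DT is what makes it easier to locate the same feature in IM-MS Browser.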
New in B.05.00
The B.05.00 version of Agilent Spectrum Mill MS Proteomics Workbench
provides the following new features:
Note:
If you have upgraded from a prior version, due to
improvements in data extraction, search scoring, and protein grouping,
you will see some differences in validated results compared to the
earlier version.
When comparing data sets, it is best to reprocess (extract, search,
autovalidate) them using the B.05.00 version.
Installation and Configuration
The following installation and configuration features are new
in the Spectrum Mill workbench version B.05.00. For details, see the Installation
Guide.
Operating system
Spectrum Mill B.05.00 is supported on the following Microsoft
Windows operating systems:
- Windows 7 and Windows Server 2008 R2
- Windows Server 2012 R2
Only 64-bit versions of the above operating systems are supported.
If you configure a new system, Agilent recommends
that you provide adequate memory (16 GB or more) and adequate disk space
(1 to 2 TB for data).
Perl
ActiveState Perl 5.18.4
is supplied with this release. Install the 64-bit MSI. (Uninstall the prior
version first.)
JRE
Java JRE 1.8u45 is included in this release.
Install both the 32- and 64-bit JREs, even if you run the 64-bit Internet Explorer (IE 11).
Be sure to install JREs on all client browsers, as well as the
Spectrum Mill server.
IIS
IIS must be installed before you install the Spectrum Mill workbench.
See the Installation Guide for details on
configuration of IIS.
MSFileReader Support (for .raw files)
To process Thermo Scientific .raw files, Spectrum Mill B.05.00 can use either
Xcalibur or the 64-bit Thermo Scientific MSFileReader. If you want to use one of
these programs, install it on the server before you install the Spectrum Mill
workbench. You can install it afterwards, but you must do some additional steps.
(See the Installation Guide for details.)
If you update your Spectrum Mill
installation, and you previously had installed MSFileReader, you must install
the 64-bit version.
Web Browser Support
The following Web browser is supported with this version:
- Microsoft Internet Explorer 11
Internet Explorer Compatibility View
If you are upgrading, note that the prior release required Compatibility View
for Internet Explorer 9 and later. For this release you must disable
Compatibility View.
Workflow Automation
New workflow automation tools are available.
- Multi-core support for data extraction:
- The “Maximize CPUs” feature previously available for MS/MS Search is now available for data extraction.
- When multiple data files are present in a folder, extraction for each data file is assigned to an available core.
- Automation support for the Quality Metrics feature as a workflow task:
- When you edit a workflow, you can load and save the Quality Metrics parameters.
- When you run a workflow, the Quality Metrics task is listed in the Request
Queue. When it is finished, it is listed in the Completion Log.
- Automation support for Data Archive as a workflow task:
- When you edit a workflow, you can load and save the Data Archive parameters.
- When you run a workflow, the Data Archive task is listed
in the Request Queue. When it is finished, it is listed in the
Completion Log.
- Addition of Data Archive as the last step in a
workflow reduces the number of files and required disk storage space.
- All default parameter files have been updated, and new ones have been added for
Archive and Quality Metrics.
- For Protein/Peptide Summary, new default parameter files named "_valid"
specify a Valid filter and set the scores and SPI thresholds to 0. These
starting values overcome a common mistake where default values are too high
to see all the valid hits.
Tool Belt and Metrics
- The Quality Metrics tool has been moved from the Tool Belt
to its own page, accessible from the Spectrum Mill home page and the Tool
Belt page.
- New Quality Metrics support for CPTAC (Clinical Proteomic Tumor
Analysis Consortium) requirements.
- The Archive Data tool has been moved from the Tool Belt
to its own page, accessible from the Spectrum Mill home page and the Tool
Belt page.
- New bidirectional conversion of mzXML and pkl files.
- The Spectral Collector tool has been removed.
- The File Collector has been added. It provides an easy way to
organize Protein/Peptide Summary exports in a single folder.
MS/MS Autovalidation
- New filter for minimum number of directories that include a protein group.
- This filter exists in Auto Thresholds, Protein Polishing mode.
- Optimize thresholds by directory is useful for low-frequency matches in
complex samples.
Protein/Peptide Summary
- New Protein-Protein Comparison format for export of Agilent Proteomics
Results (APR) to Agilent Mass Profiler Professional (MPP).
- Export combined protein and peptide results to MPP with new format.
- Addition of peptide information allows you to eliminate incorrectly identified peptides
(because they do not track the protein).
- Only Agilent Q-TOF (.d) data is supported with this feature.
- The MPP Generic export is also selectable as an option for customers who
have not updated to MPP 14.0, which is required to import APR files.
- Protein-Protein Comparison mode allows you to select the type of export
(No export, Excel, MPP Generic, MPP APR).
- Consolidated Protein-Peptide Summary modes, with the option to
configure the
"classic" modes by editing smglobals.js.
Data Extractors
- Agilent Q-TOF Extractor now uses latest (64-bit) MassHunter Data Access Component (MHDAC).
- Agilent Q-TOF Extractor now determines nominal resolution from method in data file.
Resolution is adjusted based on the instrument used to acquire the data. For data acquired
with older versions of Q-TOF data acquisition programs, the
nominal resolution defaults to 20,000.
- Support for TMT10 isobaric mass tags (where instrument resolution is
sufficient)
- Support for automatic data file deletion after successful extraction
- Check box specifies to delete raw data file/folder after successful extraction.
- Placeholder file is created and used to indicate presence of extracted data.
- This feature keeps disk usage to a manageable level.
- Warning: If you mark the check box to Delete
data files after extraction, make sure your data is archived
somewhere else.
- Extractions maximize usage of available CPUs. The request queue now shows
two requests: the initial one to create the batch (of files) and the other to
show the progress and extractor results.
- The Thermo extractor is now twice as fast.
- XtractorFinnigan
reads the centroided data from the .RAW file rather than Spectrum Mill doing
the centroiding.
- The default choice is to use the Xcalibur centroiding. It does a
better job of using appropriately narrow windows across the entire mass range
(particularly important for the poorly resolved TMT-10 peaks).
- Because the intensities are scaled
differently (10-100-fold), you should not mix Spectrum Mill centroiding
and Xcalibur centroiding across multiple directories that will later be used
for a combined report.
MS/MS Search
- Capability to select variable protein N-terminal and C-terminal
modifications.
- The variable modifications on the protein termini are
treated slightly differently from other variable modifications; to see if a
peptide terminus was modified, you must mark the N-term or
C-term review field in Protein/Peptide Summary.
- Also, these
modifications are not subject to VML scoring.
- Protein pI and required/disallowed amino acids have been
removed from the search form. When you load an old parameter file, you are
warned if any of these parameters contain obsolete values. You must re-save the
parameter file to avoid the warning and to ensure the search is processed
properly in workflows.
- The new default for Discriminant Scoring is “Off”, which prevents
creation of the extra files required when you autovalidate with the
Autothresholds-discriminant strategy.
- The “Default (same as score)” is now
“Score only” and is not a recommended setting. (It is there for backwards
compatibility.)
- Either use “Off” or select a coefficient set.
- List of instruments to choose from in MS/MS Search has been
significantly reduced. (You can show all by editing smglobals.js.)
Utilities
- Addition of isoelectric focusing (IEF) in Peptide Selector, so you can predict which
off-gel electrophoresis fractions are likely to have unique peptides.
- New Peptide String Match utility (on home page). This utility allows you to
search a database with a list of peptide sequences.
Protein Databases
- Addition of tool to Create Category File, which allows you to
create your own report fields to associate with a protein. You can then limit
summaries to the set of proteins of interest.
- Reordered the list of tools.
- New Re-index existing database tool. After downloading
a new version of a database, select the database from the list to re-index.
Re-create any species subsets after you re-index the main database.
New in B.04.01
The B.04.01 version of Agilent
Spectrum Mill MS Proteomics Workbench provides the following new
features:
Data Extraction
The Agilent Q-TOF data extractor for B.04.01 now adjusts the tolerance
based on m/z for finding related precursors with SILAC data. You will
see some minor differences in results. The same
adjustments may affect merging, where fewer spectra might be merged.
Extract to mzXML
- The data extractors for Agilent Q-TOF (.d) and Thermo
(.raw) data now extract to mzXML instead of multiple pkl files. The
reduction in the number of files improves performance and simplifies
archival of results. The mzXML may be combined with pepXML (from Tool
Belt) to support Skyline's creation of peptide spectral libraries for
MRM experiments and data independent analysis (DIA).
- The extractor form has a new Create PKL files instead of mzXML
check box field to have the extractor create PKL files instead of
mzXML. Mark this check box if you use additional software to process
Spectrum Mill results that has not been updated to accept the mzXML
produced by Spectrum Mill.
- To maintain backwards compatibility, many features in
Spectrum Mill still refer to an individual spectrum by its pkl
filename. The pkl filename is now an attribute in the mzXML. When the
Spectrum Viewer or the MS/MS Search program needs to retrieve an
individual spectrum, it looks in both the old
location (cpick_in/*.pkl) and the new location specified within the
mzXML for the spectrum.
Automation: MS/MS Search supports multiple CPU cores
Spectrum Mill's MS/MS Search can now take advantage of multiple CPU
cores. In the MS/MS Search page, select Maximize CPUs,
and in the Workflows page, select Max CPUs per search.
When enabled, batches from a single data directory will be assigned to
available CPUs. The parallel searches are listed in the Request Queue
as tasks with a "P". The search progress is indicated as the number of
batches completed out of total batches. When using this search mode, it
is recommended to increase the batch size to 150 or more, instead of
the former default of 81.
To prevent a single user from monopolizing all CPUs
on a Spectrum Mill server, all directories submitted to the queue with Maximize
CPUs enabled compete for CPU access. When multiple
directories are submitted to search, they all will share the available
CPUs, with processing occurring in parallel.
When Maximize CPUs is not enabled, each search on a
directory retains exclusive access to a single processor until all of
its batches have completed.
When you select Remove for a queued Maximize CPUs search
that has already begun processing, the Request Queue
continues to show the search in the queue. Wait a little while for any
currently executing batch searches to complete, then click Request
Queue to refresh the list.
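The effect of batch size on a Maximize CPUs search can be sketched as follows. This is only an illustration of the batching idea, not Spectrum Mill's scheduler: the function names are hypothetical, search_batch is a placeholder that merely counts spectra, and a thread pool stands in for the server's CPU assignment.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split_into_batches(spectra, batch_size=150):
    """Split a list of spectra into consecutive batches of at most batch_size."""
    return [spectra[i:i + batch_size] for i in range(0, len(spectra), batch_size)]


def search_batch(batch):
    """Placeholder for a real search; here it just counts the spectra."""
    return len(batch)


def run_parallel(spectra, batch_size=150, max_cpus=4):
    """Run all batches on a pool of workers and track completed/total batches."""
    batches = split_into_batches(spectra, batch_size)
    completed = 0
    searched = 0
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        futures = [pool.submit(search_batch, b) for b in batches]
        for future in as_completed(futures):
            searched += future.result()
            completed += 1  # progress: batches completed out of total batches
    return completed, len(batches), searched


spectra = list(range(1000))  # stand-in for 1000 extracted spectra
print(len(split_into_batches(spectra, 81)), "batches at size 81")
print(len(split_into_batches(spectra, 150)), "batches at size 150")
print(run_parallel(spectra))
```

Larger batches mean fewer units to schedule per directory, which is consistent with the recommendation to raise the batch size when searches run in parallel.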
Protein/Peptide Summary Changes
Review Fields
The Review Fields layout has been changed slightly
to accommodate additional fields. The new spectral quality related
fields include Prec Av Chi2 (Precursor Averagine Chi2)
and Isol Pur (Isolation Purity). The pI
field is now Prot pI, and the Peptide pI
field is now just Pep pI. The Var mod
sequence field is now VML sequence.
Protein Grouping
- Protein Grouping was revised to disallow transitive
closure. With transitive closure disallowed, all proteins belonging to
a group must share at least 1 peptide. With transitive closure allowed
(prior versions), if proteins A and B share peptide 1 and
proteins B and C share peptide 2, then A, B,
and C would all be grouped together.
- A defect in subgroup subsuming was fixed. The subgroup
subsuming could cause loss of some subgroups and the peptides which
were unique to them. The most affected samples might be ones derived
from multiple species, e.g., human/mouse xenografts.
- The protein subgrouping logic was revised to allow a
protein that shares peptides with 2 different groups to be taken away
from a higher scoring group if membership in a lower scoring group is
achieved with more shared peptides. That is, when the first proteins in
groups A and B (A1 and B1) share no peptides, and protein C shares 1 peptide with
A1 and more than 1 with B1, the prior versions would have put C in
group A. The B.04.01 version puts it in group B.
You may re-enable the grouping mode used in prior versions by selecting
the Prot pI review field.
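The difference between grouping with and without transitive closure can be sketched with the A/B/C example above. This is a simplified illustration, not Spectrum Mill's grouping code: the function names are hypothetical, and the rule used for the non-transitive case (every member of a group must contain at least one peptide common to the whole group) is one assumed reading of "must share at least 1 peptide". Scores and subgroups are ignored.

```python
def group_with_transitive_closure(protein_peptides):
    """Prior behavior: connected components of the shared-peptide graph."""
    parent = {p: p for p in protein_peptides}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_owner = {}  # peptide -> first protein seen with that peptide
    for protein, peptides in protein_peptides.items():
        for pep in peptides:
            if pep in first_owner:
                parent[find(protein)] = find(first_owner[pep])
            else:
                first_owner[pep] = protein

    groups = {}
    for protein in protein_peptides:
        groups.setdefault(find(protein), []).append(protein)
    return sorted(sorted(g) for g in groups.values())


def group_without_transitive_closure(protein_peptides):
    """Assumed rule: a protein joins a group only if it contains a peptide
    that is common to every protein already in that group."""
    groups = []  # list of (members, peptides common to all members)
    for protein, peptides in protein_peptides.items():
        peptides = set(peptides)
        for members, common in groups:
            if common & peptides:
                members.append(protein)
                common &= peptides  # in-place: shrink to what all members share
                break
        else:
            groups.append(([protein], peptides))
    return sorted(sorted(members) for members, _ in groups)


example = {"A": {"peptide1"}, "B": {"peptide1", "peptide2"}, "C": {"peptide2"}}
print(group_with_transitive_closure(example))
print(group_without_transitive_closure(example))
```

In this sketch B is placed with A only because A is processed first; the real software decides such cases with its scoring and subgrouping logic.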
Export results to Mass Profiler Professional (MPP)
The Protein-Protein Comparison Columns and Protein-Protein
Comparison Redundant summary modes now support exporting to
the MPP generic format. To only report the top group protein, use Protein-Protein
Comparison Columns with 1 shared peptide
as the Protein grouping method. To report all
proteins, use the 1 shared, expand subgroups
method. The Protein-Protein Comparison Redundant
mode will always report all proteins (the 1 shared, expand
subgroups method is not allowed in this mode).
Peptide Mode changes
- You can now Filter to distinct peptides
in several different ways:
- Off
- Case insensitive -- When collapsing to "distinct", a case-insensitive
string compare is used, thus peptides
with variable modifications (lowercase AA's) and unmodified peptides
are combined.
- Case sensitive -- When collapsing to "distinct", a case-sensitive
string compare is used, thus peptides with
variable modifications (lowercase AA's), different localizations of
those variable modifications, and unmodified peptides are kept
separate.
- Charge file CS -- When collapsing to "distinct", a case-sensitive
string compare is applied to both the
sequence and spectrum filename prefix, thus peptides from different
LC-MS/MS runs and those with different precursor charges are kept
separate.
- You can now create an inclusion list for Agilent Q-TOF
instruments. Mark the Export inclusion list for a
specified top peptides/protein check box and enter
a value for the maximum number of peptides to target per protein.
(This feature is only available if Agilent Q-TOF data has been
selected.)
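The Filter to distinct peptides modes listed above differ only in the key used to decide whether two matches are "the same" peptide. A minimal sketch follows, with several assumptions: the record fields (sequence, file_prefix, charge, score) and mode names are hypothetical, the charge is treated as a separate field in the Charge file CS key, and the highest-scoring match is kept as the representative.

```python
def distinct_key(psm, mode):
    """Return the key under which a peptide-spectrum match is collapsed."""
    seq = psm["sequence"]  # lowercase letters mark variably modified residues
    if mode == "case_insensitive":
        return seq.upper()  # modified and unmodified forms are combined
    if mode == "case_sensitive":
        return seq          # modification state and localization kept separate
    if mode == "charge_file_cs":
        # also separate by LC-MS/MS run and precursor charge (assumed fields)
        return (seq, psm["file_prefix"], psm["charge"])
    raise ValueError(f"unknown mode: {mode}")


def filter_distinct(psms, mode):
    """Collapse matches to distinct peptides; 'off' returns every match."""
    if mode == "off":
        return list(psms)
    best = {}
    for psm in psms:
        key = distinct_key(psm, mode)
        # Keeping the highest-scoring match per key is an assumption.
        if key not in best or psm["score"] > best[key]["score"]:
            best[key] = psm
    return list(best.values())


psms = [
    {"sequence": "AAsTEK", "file_prefix": "run1", "charge": 2, "score": 10},
    {"sequence": "AASTEK", "file_prefix": "run1", "charge": 2, "score": 12},
    {"sequence": "AAStEK", "file_prefix": "run1", "charge": 2, "score": 11},
    {"sequence": "AAsTEK", "file_prefix": "run2", "charge": 2, "score": 9},
    {"sequence": "AAsTEK", "file_prefix": "run1", "charge": 3, "score": 8},
]
for mode in ("off", "case_insensitive", "case_sensitive", "charge_file_cs"):
    print(mode, len(filter_distinct(psms, mode)))
```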
Protein-Peptide Comparison Columns Mode: Enhanced VML Reporting and Excel Export
The Protein-Peptide Comparison Columns summary mode
has been enhanced to enable Excel export and to allow peptide level or
modification site level report organization. Specific changes include:
- Excel export (.ssv semi-colon delimited) is now supported
in this mode.
- Multiple peptide spectrum matches can be collapsed to a
single row in the table according to either variable modification site
or peptide sequence. You can now specify to group rows by Sequence
or by variable modification site (Var mod site).
When Var mod site is selected, the following
additional features are relevant:
- Mark the VML score check box and
select the particular type of variable modification site in the menu
next to the VML score check box (s|t|y for
phosphorylation sites, k for ubiquitin or acetylation sites) to specify
the variable modification site used to collapse PSM's.
- All PSM's collapsed to a single row must have the same
number of modification sites. Singly and doubly phosphorylated forms of
the same peptide will be on separate rows, due to the potential for
differences in quantitative measurements between sites.
- The displayed representative PSM in each column of a
single row is the one having the highest VML score.
Spectra with ambiguous and confident site localizations in the same
peptide will be collapsed together so long as they are not conflicting.
- A new Spectrum Grouping Options section:
- The Group missed cleavages containing VM
site(s) check box enables different missed cleavage forms of
peptides containing the same modification site (AA position in the
protein sequence) to be collapsed into a single row.
- The Show all grouped spectra check
box allows one to inspect the collapsing behavior by reporting all the
individual PSM's that are collapsed to an individual sequence or VM
site. Because this results in a nested table with multiple rows in
individual cells, Excel Export is not supported for
this feature.
- Reduced RAM consumption by 30% and increased protein
grouping speed by 20% for Protein-Protein Comparison Columns reports to
better handle large datasets.
Revised MS/MS Search Scoring
- Revised MS/MS search scoring of Delta Rank1-Rank2 and
Delta Forward-Reverse calculations to check for all isobaric characters
against all the Rank1 hits, not just the 1st Rank1 hit, to help
eliminate inordinately small values resulting from different
indistinguishable localizations of variable modifications in the Rank1
hits.
- Improved scoring of phosphopeptide MS/MS spectra from
Agilent Q-TOF and Thermo HCD data by not requiring the presence of
the precursor minus phosphate ion in order to enable the y-H3PO4
and b-H3PO4 fragment ion types.
Autovalidation Changes
The Autovalidation Auto thresholds strategy now
supports a Min. Sequence Length filter. The default
value for this field is 1 so as not to surprise users who upgrade from
prior versions, but higher values are recommended.
Note: A value of 6 or higher is recommended for the Min. Sequence
Length value. Sequences shorter than 6 tend not to be unique
to a single protein.
In addition, the Peptide FDR for validated proteins
has been removed from the Auto thresholds Protein polishing
strategy. It is only available when using the Auto thresholds
- discriminant strategy.
Note: It is recommended that you open and re-save all Autovalidation parameter
files that use the Auto thresholds strategy and
were created using B.04.00.
Peptide Selector: Export Q-TOF Inclusion lists from results
This feature is only available if Agilent Q-TOF data has been selected.
You can now generate Agilent Q-TOF target lists based on search
results. The target list can include peptides from proteins that were
unidentified from a prior list of proteins, that contain only one
peptide, or a combination of both.
In Peptide Selector, first make sure the Protein(s)
to Select From list (bottom left of the form) displays the
list of accession numbers that were used to generate the original target
list. In the Save File Parameters section, mark the
Generate inclusion or MRM list file check box, and
select MassHunter Q-TOF MS/MS target list.
Then mark the From results check box to view the Valid
Results to Filter selections. Use the Select...
button to select the data folders containing the results. Select either
Unidentified, Single peptide, or
Unidentified + Single peptide as the proteins to
select. Click Select to generate the target list.
Q-Exactive Support
Data extraction and MS/MS Search now support data acquired from
Thermo Q-Exactive and Thermo Q-Exactive Plus instruments. You must
either have installed Xcalibur on the Spectrum Mill server, or have
downloaded and installed the MSFileReader (you must select the 32-bit
version to install). See the Spectrum Mill Installation Guide for
details.
TMT Correction Factors and TMT6 changes
The Tool Belt now supports creating and applying correction factors for
TMT in addition to iTRAQ. In addition, the TMT modification definition
has been changed to account for a prominent unmatched singly charged
ion of mass 230 (cleavage of the bond between reagent and peptide, with
charge staying on the reagent side), and a prominent set of ions at
(precursor mass - (155 to 159)) / (parent_charge - 1), from cleavage of
the amide bond in the reagent group, with the remaining (parent_charge - 1)
charges staying with the peptide side of the bond.
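As a worked illustration of the expression above, the following sketch evaluates (precursor mass - loss) / (parent_charge - 1) for the nominal losses 155 to 159. The function name is hypothetical, and the exact definition of "precursor mass" (for example, whether it includes the charge-carrying protons) is not stated in these notes, so the input is used exactly as given.

```python
def tmt_reagent_loss_mz(precursor_mass, parent_charge, losses=range(155, 160)):
    """Evaluate (precursor mass - loss) / (parent_charge - 1) for each loss.

    The losses 155..159 are the nominal values quoted in the release notes.
    """
    if parent_charge < 2:
        # With a singly charged precursor no charge remains on the peptide side.
        raise ValueError("parent_charge must be 2 or greater")
    return [(precursor_mass - loss) / (parent_charge - 1) for loss in losses]


# Example: a precursor mass of 2000.0 with parent charge 3
print(tmt_reagent_loss_mz(2000.0, 3))
```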
Important TMT6 Changes
TMT6 reagents with lot numbers MJ and later have 127 and 129 reporter
ion masses that differ by almost 50 ppm from those of earlier lots. This release
supports these changes. The prior TMT6 definition has been moved to
smconfig.misc.xml and renamed with "pre-MJ" appended. Summaries,
including ratios, with older data will still report correctly. However,
File Details and Spectrum Summary may indicate different scores and
ion labels if the 127 and 129 ions are not matched. If you need to
extract and search data that was prepared with TMT6 lots prior to "MJ",
please contact your Agilent representative to request a special data
extractor and search program that will support the older definitions.
Tool Belt: Enhanced quality metrics reporting
The Tool Belt tool "Report FDR and search statistics" is now "Report
FDR and quality metrics", and has been enhanced to provide quality
metrics in addition to the search statistics. The new metrics include
sample handling, site localization, and isobaric label incorporation.
Protein Databases: Make non-redundant
The Protein Databases Utility page provides a new utility to remove
redundant entries.
When keeping/removing redundant entries (identical sequence), only the
one nearest the top of the FASTA file is kept. However, for UniProt
databases, SwissProt entries are preferentially kept over TrEMBL
entries regardless of order in the FASTA file.
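The keep/remove rule described above can be sketched as follows. This is an illustration under assumptions, not the utility's actual code: the function names are hypothetical, and SwissProt and TrEMBL entries are recognized by the standard UniProt FASTA header prefixes "sp|" and "tr|".

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def make_non_redundant(entries):
    """Keep one entry per identical sequence.

    The entry nearest the top of the file is kept, except that a SwissProt
    entry (sp|) replaces a previously kept TrEMBL entry (tr|).
    """
    kept = {}  # sequence -> header; dicts preserve first-seen order
    for header, seq in entries:
        if seq not in kept:
            kept[seq] = header
        elif header.startswith("sp|") and kept[seq].startswith("tr|"):
            kept[seq] = header
    return [(header, seq) for seq, header in kept.items()]


fasta = """>tr|Q1|X
MKT
AAA
>sp|P1|Y
MKTAAA
>sp|P2|Z
GGG
>tr|Q2|W
GGG
""".splitlines()
for header, seq in make_non_redundant(read_fasta(fasta)):
    print(f">{header}\n{seq}")
```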
Build TIC changes
The Build TIC form has changed to provide more filtering options for
data that has been extracted to mzXML. The "Neutral loss" selection has
been removed, and the "Y axis" filtering selections have been enhanced
to allow for building TIC's based on the following:
- MS1 Precursor Intensity
- MS1 Precursor Chromatographic Peak Width (sec)
- MS1 Precursor Isolation Purity %
- MS1 Precursor Averagine Chi2
- MS2 Base Peak Intensity
- MS2 Max Sequence Tag Length
- MS2 Phosphate Precursor-Neutral loss (% of base peak)
- MS2 Dissociated Intensity %
The data must be extracted with B.04.01 in order to fully
support the above options, and only data which has been extracted to
mzXML supports all of the above.
The "MS1 Precursor Intensity" corresponds to the former
"Neutral loss" of "None".
The "MS2 Phosphate Precursor-Neutral loss (% of base peak)" corresponds
most closely (but not exactly) to the former "Neutral loss" of "H2PO4".
The "MS2 Dissociated Intensity" corresponds most closely (but not
exactly) to the former "Neutral loss" of "H2O".
These three selections are the only filtering selections that work with pkl files.
New in B.04.00
The B.04.00 version of Agilent
Spectrum Mill MS Proteomics Workbench included extensive new
workflow automation capabilities,
the ability to use Spectrum Mill identification results to create MRM lists for triple
quadrupole instruments, as well as many other enhancements that
increase the flexibility of the software and enable greater
productivity. This section describes
differences between the A.03.03 and the B.04.00 versions of the
Spectrum Mill workbench.
Installation and Configuration
The following installation and configuration features are new in the Spectrum Mill workbench version B.04.00. For details, see the Installation Guide.
Operating system
Spectrum Mill B.04.00 is supported on the following operating systems:
- Windows XP (SP3)
- Windows Server 2003 (SP2 or later)
- Windows 7
- Windows Server 2008 R2
Both 32-bit and 64-bit versions of the above operating systems are supported. Spectrum Mill programs run as 32-bit applications but are able to address up to 4 GB of memory per process when running on 64-bit systems.
Windows Server 2000 is no longer supported.
If you configure a new system, Agilent recommends that you install a 64-bit operating system and provide adequate memory (8 GB or more) and adequate disk space (1–2 TB for data).
Perl
ActiveState Perl 5.14.2.1402 is supplied with this release. Both 32-bit (x86) and 64-bit (x64) versions are provided, so select the one that is appropriate for your operating system. See the Installation Guide for details on installation of Perl and implications for configuration of IIS.
Java (JRE)
Java JRE 1.6.29u is included in this release. Both 32-bit (i586) and 64-bit (x64) versions of JRE are provided.
Be sure to install JRE on all client browsers, as well as the Spectrum Mill server.
IIS
Install IIS before installing the Spectrum Mill workbench. See the Installation Guide for details on installation and configuration of IIS.
MSFileReader Support (for .raw files)
Prior versions of Spectrum Mill required that Xcalibur software be installed to process Thermo Scientific (.raw) data files. You no longer need Xcalibur, provided you install the Thermo Scientific MSFileReader, which you can download for free. Xcalibur is still supported. Refer to the Installation Guide for details.
Process Automation Tools
New process automation tools let you perform an entire data analysis, from spectral extraction to final results summary, in a completely unattended way. This new capability includes:
- Parameter files that allow you to save settings for forms and use them in automated workflows. You can save parameter files for Data Extraction, MS/MS Search, Autovalidation, Protein/Peptide Summary, Sherenga de novo Sequencing, PMF Search, and PMF Summary. (You can also save parameter files for Peptide Selector, MRM Selector, and Sherenga de novo Summary, but these are not used in workflows.)
- A new Edit Workflow page that lets you create and edit workflows. Workflows consist of a series of tasks that use parameter files. For example, you can create a workflow that does Data Extraction, MS/MS Search, Autovalidation, and Protein/Peptide Summary.
- A new Workflows page that lets you execute workflows on single or multiple folders. Tasks assigned to the same data folder are processed sequentially. If multiple folders have been selected for a workflow, the same tasks in each folder are executed in parallel. You can also start multiple workflows, which can run in parallel if the Spectrum Mill server has multiple CPUs.
- New Request Queue/Completion Log viewers that allow you to monitor the progress of workflows. The Request Queue makes sure that the tasks are processed in the correct order, and that multiple workflows are executed properly.
The Request Queue also accepts interactive task commands to be executed either as a single task or as part of interactive automation. For example, when you click Validate Files after marking the Request queue check box, this task is placed in the queue and executed either in parallel to other workflow tasks or consecutively in the order it appears in the queue. You can perform an interactive automation by manually clicking each action button in their respective pages, such as Extract, Start Search, Validate Files and Summarize, one after the other.
Creation of MRM Lists
The new MRM Selector streamlines protein quantitation workflows that use identification results to create multiple reaction monitoring (MRM) lists for triple quadrupole instruments.
- In the new MRM Selector form, you select data that has been searched, then filter the results to include only those peptides that meet your requirements. (The filters are similar to those in Protein/Peptide Summary.) Then you can automatically generate a list of transitions for multiple reaction monitoring.
- Output formats include Agilent Triple Quadrupole MRM, Agilent Triple Quadrupole DMRM, Agilent Triple Quadrupole Optimizer, and ABI Triple Quad.
- The Agilent Q-TOF Data Extractor has been enhanced to output the collision energy (CE), peak apex, and chromatographic peak width in the specFeatures.tsv file, so that you can use these values for dynamic MRM (DMRM) generation. The collision energy is especially useful when you generate a DMRM list for an Agilent Triple Quadrupole from Agilent Q-TOF data, because the collision cells are the same.
- You may instead choose to have the collision energies calculated based on an equation, and you have the option to type a chromatographic peak width that will apply for all peaks in the DMRM analysis.
- You can save and load parameter files that contain settings for this form.
False Discovery Rate (FDR)
False discovery rate (FDR) is an independent measure of the likelihood that the results are incorrect. Calculation of FDR is important to ensure the validity of results, and is a requirement for publication in some journals. Spectrum Mill workbench now provides tools to calculate FDR in Autovalidation and to report FDRs in the Tool Belt at the spectrum, peptide, and protein levels. See "What's new" for autovalidation, and the Tool Belt, to learn more about these new options.
Data Extractors
- You can now perform data extraction as part of a workflow, which saves time and minimizes your effort.
- You can select multiple data folders for extraction. Each folder is a separate request that is queued and executed when a CPU becomes available.
- Some data extractors (.raw and Q-TOF .d ) determine a retention time (RT) apex and report that in specFeatures.tsv, which improves quantitation for techniques like SILAC and ICAT.
- The Agilent Q-TOF extractor also reports the collision energy, which can be used in the MRM Selector to generate a DMRM list.
- The Agilent Q-TOF .d and Thermo .raw extractors calculate some metrics that are used in the Tool Belt search statistics report, and they calculate an averagine chi squared value that is used in Protein/Peptide Summary.
Java is a U.S. trademark of Sun Microsystems, Inc.
MS/MS
Search
- MS/MS Search can now be part of an automated workflow.
- You can select multiple data folders for MS/MS Search.
Each
folder is a separate request that is queued and executed when a CPU
becomes available. A search is automatically dependent upon any
extraction request for the same folder.
- A new discriminant scoring field is available in MS/MS Search. This scoring option combines several metrics, including score, SPI, and backbone cleavage score, and excludes more false positives than the "Score" metric alone. Discriminant coefficients determine the relative importance of each metric in the calculation. Agilent provides coefficients for Agilent Q-TOF and Ion Trap instruments, but you must create your own coefficients in the Tool Belt if your study requires them or if you are using another vendor's instrument.
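As a rough illustration of how discriminant coefficients can weight individual metrics, the following Python sketch computes a weighted sum. The linear form, the metric names, and the coefficient values are all assumptions made for illustration; they are not the coefficients that Agilent supplies, and the source does not state the exact formula that Spectrum Mill uses.

```python
from typing import Dict


def discriminant_score(metrics: Dict[str, float],
                       coefficients: Dict[str, float],
                       intercept: float = 0.0) -> float:
    """Weighted sum of per-spectrum metrics (hypothetical linear form)."""
    missing = set(coefficients) - set(metrics)
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    # Each coefficient sets the relative importance of its metric.
    return intercept + sum(coefficients[name] * metrics[name]
                           for name in coefficients)


# Hypothetical coefficients and metric values, for illustration only.
EXAMPLE_COEFFICIENTS = {"score": 1.0, "spi": 0.05, "backbone_cleavage_score": 0.5}
example_metrics = {"score": 12.0, "spi": 80.0, "backbone_cleavage_score": 6.0}
example_result = discriminant_score(example_metrics, EXAMPLE_COEFFICIENTS)
```

A larger coefficient makes the corresponding metric contribute more to the combined score, which is the sense in which coefficients "determine the relative importance" of each metric.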
MS/MS Autovalidation
- False discovery rate is incorporated into autovalidation, which makes it easy to validate only those hits that have a low chance of being false positives.
- You can now use three strategies, each with different modes, to autovalidate results: Fixed thresholds (uses Protein and Peptide Rules with threshold values that you enter, and calculates an FDR using reversed hits), Auto thresholds (optimizes the score and delta R1-R2 score thresholds until a target FDR is reached), and Auto thresholds - discriminant (optimizes the discriminant score threshold until a target FDR is reached).
- For each of the three strategies for autovalidation, you can perform two steps (called modes). See the Autovalidation section of Software Basics to learn more about these modes.
- MS/MS Autovalidation can now be done either interactively or via an automated workflow.
- Buttons allow you to remove the results of the last autovalidation you performed, or remove all autovalidation results.
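The idea behind estimating FDR from reversed hits and tuning a threshold to a target FDR can be sketched as follows. This is a minimal illustration, not Spectrum Mill's implementation: it assumes the common estimate FDR = reversed hits / forward hits above the threshold, and it tunes a single score threshold, whereas the Auto thresholds strategy described above also optimizes the delta R1-R2 score. The function names and example values are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Psm = Tuple[float, bool]  # (score, is_reversed_hit)


def estimate_fdr(psms: Iterable[Psm], threshold: float) -> float:
    """Estimate FDR as reversed hits / forward hits at or above threshold."""
    forward = reverse = 0
    for score, is_reversed in psms:
        if score >= threshold:
            if is_reversed:
                reverse += 1
            else:
                forward += 1
    if forward == 0:
        # Only reversed hits pass (or nothing passes): no usable estimate.
        return float("inf") if reverse else 0.0
    return reverse / forward


def lowest_threshold_for_fdr(psms: List[Psm],
                             target_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR meets the target."""
    for threshold in sorted({score for score, _ in psms}):
        if estimate_fdr(psms, threshold) <= target_fdr:
            return threshold
    return None  # no threshold reaches the target FDR


# Hypothetical hits: True marks a match to a reversed sequence.
example_psms = [(15.0, False), (14.0, False), (12.0, False), (11.0, True),
                (10.0, False), (9.0, False), (8.0, True), (7.0, False),
                (6.0, True), (5.0, True)]
example_threshold = lowest_threshold_for_fdr(example_psms, target_fdr=0.2)
```

Because the estimated FDR is not guaranteed to fall steadily as the threshold rises, the sketch scans every observed score from lowest to highest rather than assuming monotonic behavior.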
Protein/Peptide Summary
- You can now exclude spo files (search results files) in results summaries.
- Protein-related summary modes that enable quantitation now allow you to exclude:
- Precursor extracted ion chromatograms that have poor isotope quality, and results that have poor precursor isolation purity. This feature applies to Agilent Q-TOF .d and Thermo .raw data.
- Results with precursor isolation purity below a specified percentage
- Review fields for differential expression quantitation (DEQ) now allow you to:
- Display median, mean, or both for DEQ ratios
- Exclude outlier DEQ ratios for protein quantitation
- Additional modifications for DEQ are supported, including full support for iTRAQ and TMT.
- You can now select to report variable modification localization (VML) scores and sequences for modification sites.
- Protein Summary Details mode now allows you to export to Microsoft® Excel.
- The default size of the Spectrum Viewer (in the Protein/Peptide Summary results) has changed to allow for easier copy/paste into Microsoft PowerPoint or other documents.
- Accurate Mass Retention Time (AMRT) export (Peptide mode) improves integration with feature assignment and annotation in Agilent MassHunter Mass Profiler Professional, by enabling you to more easily map features to identifications from the Spectrum Mill workbench.
- The Review Fields section has been rearranged to use less space in all of the modes.
- Protein/Peptide Summary can now be done either interactively or via an automated workflow.
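The DEQ review options listed above (median, mean, and outlier exclusion for protein-level ratios) can be illustrated with a short Python sketch. The outlier rule used here, which drops ratios whose log2 value lies more than a chosen number of median absolute deviations from the median, is an assumption for illustration; the source does not say which rule Spectrum Mill applies. The function and parameter names are hypothetical.

```python
import math
import statistics
from typing import Dict, List


def summarize_deq_ratios(ratios: List[float],
                         exclude_outliers: bool = True,
                         max_mads: float = 3.0) -> Dict[str, float]:
    """Summarize peptide-level ratios for one protein as median and mean."""
    kept = list(ratios)
    if exclude_outliers and len(kept) >= 3:
        # Work in log2 space so 2-fold up and 2-fold down are symmetric.
        logs = [math.log2(r) for r in kept]
        center = statistics.median(logs)
        mad = statistics.median([abs(x - center) for x in logs])
        if mad > 0:
            kept = [r for r, x in zip(kept, logs)
                    if abs(x - center) <= max_mads * mad]
    return {
        "n": len(kept),
        "median": statistics.median(kept),
        "mean": statistics.fmean(kept),
    }


# Hypothetical peptide ratios for one protein; 64.0 is an obvious outlier.
example_ratios = [1.0, 2.0, 2.0, 2.0, 4.0, 64.0]
with_exclusion = summarize_deq_ratios(example_ratios)
without_exclusion = summarize_deq_ratios(example_ratios, exclude_outliers=False)
```

In this example the median is the same with and without exclusion, while the mean shifts substantially, which is one reason to display both statistics side by side.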
Tool Belt
- Ability to report FDR on protein, peptide and spectral levels
- New feature to create your own set of discriminant scoring coefficients
- New capability to export an MS/MS search summary file to PepXML format. PepXML is a results exchange format that is supported by the Trans-Proteomic Pipeline from the Institute for Systems Biology and can be imported into other software packages, such as Skyline, an MRM data analysis package from the University of Washington.
- Ability to convert data extractor pkl files into mzXML files, which let you import Spectrum Mill results into other software packages, such as Skyline. You can also use these files as links to the labeled MS/MS spectra that are required to identify post-translational modifications for papers published in Molecular & Cellular Proteomics (MCP).
- New feature to archive instrument-created files, search results files, spectral files and data directories
Peptide Selector
- Ability to generate MRM lists and inclusion lists for Q-TOF that you can copy and paste directly into Excel or Agilent MassHunter Data Acquisition software and to generate lists that can be exported to MassHunter Qualitative Analysis to create an Accurate Mass (AM) database
- Capability to save and load parameter files
- New features that assist in the targeted proteomics workflow
- New Protein Position Filtering settings that enable you to restrict the selection based on the position of peptides within the protein.
- New Scoring settings that let you penalize, rather than exclude, peptides that fail the selection criteria
Protein List to Masses Utility
A new utility is available to calculate the masses and formulas for a set of specified peptides.
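To show what such a calculation involves, the following Python sketch derives an elemental formula and a monoisotopic mass for unmodified peptides from standard residue compositions. It is an illustration only, not the utility's code: the function names are hypothetical, and modifications (for example, alkylated cysteine) are deliberately left out.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Monoisotopic masses of the elements, in Da.
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "S": 31.97207100}


def _c(c: int, h: int, n: int, o: int, s: int = 0) -> Dict[str, int]:
    return {"C": c, "H": h, "N": n, "O": o, "S": s}


# Elemental composition of each residue (amino acid minus one water).
RESIDUES = {
    "G": _c(2, 3, 1, 1), "A": _c(3, 5, 1, 1), "S": _c(3, 5, 1, 2),
    "P": _c(5, 7, 1, 1), "V": _c(5, 9, 1, 1), "T": _c(4, 7, 1, 2),
    "C": _c(3, 5, 1, 1, 1), "L": _c(6, 11, 1, 1), "I": _c(6, 11, 1, 1),
    "N": _c(4, 6, 2, 2), "D": _c(4, 5, 1, 3), "Q": _c(5, 8, 2, 2),
    "K": _c(6, 12, 2, 1), "E": _c(5, 7, 1, 3), "M": _c(5, 9, 1, 1, 1),
    "H": _c(6, 7, 3, 1), "F": _c(9, 9, 1, 1), "R": _c(6, 12, 4, 1),
    "Y": _c(9, 9, 1, 2), "W": _c(11, 10, 2, 1),
}


def peptide_composition(sequence: str) -> Counter:
    """Sum residue compositions and add one water for the free termini."""
    total = Counter({"H": 2, "O": 1})
    for aa in sequence.upper():
        if aa not in RESIDUES:
            raise ValueError(f"unsupported residue: {aa!r}")
        total.update(RESIDUES[aa])
    return total


def formula(composition: Counter) -> str:
    """Format a composition in C, H, N, O, S order, omitting counts of 1."""
    return "".join(f"{el}{composition[el] if composition[el] > 1 else ''}"
                   for el in "CHNOS" if composition[el])


def monoisotopic_mass(composition: Counter) -> float:
    return sum(ELEMENT_MASS[el] * n for el, n in composition.items())


def peptide_table(peptides: Iterable[str]) -> List[Tuple[str, str, float]]:
    """Return (sequence, formula, neutral monoisotopic mass) per peptide."""
    rows = []
    for seq in peptides:
        comp = peptide_composition(seq)
        rows.append((seq, formula(comp), monoisotopic_mass(comp)))
    return rows
```

Calling `peptide_table(["GA", "PEPTIDE"])` returns one row per peptide; the masses are for the neutral molecule, so a proton mass would need to be added to obtain MH+.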
Protein Databases
- Ability to compare two databases to determine whether their content is different, which is useful when you need to remove redundant databases from the Spectrum Mill server.
- Option to create a subset FASTA file from accession numbers that you type, which is useful for limiting searches to a set of proteins of particular interest.
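The subset FASTA option can be pictured with a small Python sketch that keeps only the entries whose accession appears in a given list. This is a hedged illustration rather than the utility itself: it assumes UniProt-style headers of the form `>db|ACCESSION|name` and otherwise falls back to the first token of the header, and all function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []  # type: ignore[var-annotated]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_of(header: str) -> str:
    """Extract the accession, assuming 'db|ACCESSION|name' or a bare ID."""
    tokens = header.split()
    first = tokens[0] if tokens else ""
    parts = first.split("|")
    return parts[1] if len(parts) >= 3 else first


def subset_fasta(lines: Iterable[str], accessions: Iterable[str]) -> str:
    """Return FASTA text containing only the requested accessions."""
    wanted = set(accessions)
    out: List[str] = []
    for header, sequence in read_fasta(lines):
        if accession_of(header) in wanted:
            out.append(f">{header}\n{sequence}\n")
    return "".join(out)
```

For a file on disk, the lines can be supplied directly from an open file handle, for example `subset_fasta(open(path), ["P12345"])`, where `path` and the accession are placeholders.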
Server Administration
- With the advent of automated workflows, administrators can limit the number of parallel workflow processes to fewer than the CPU count.
- Administrators can start and stop the Spectrum Mill Workflow Manager Service to troubleshoot workflows.
What happened to Easy MS/MS Search?
Easy MS/MS Search has been replaced by new process automation tools that provide more power and flexibility.
Microsoft is a U.S. registered trademark of Microsoft Corporation.