Package: RforProteomics
Authors: Authors: Laurent Gatto, Lisa Breckels and Sebastian Gibb
Last compiled: Thu Oct 15 12:55:22 2015
Last modified: 2015-10-13 16:26:50
This vignette illustrates existing and Bioconductor infrastructure for the visualisation of mass spectrometry and proteomics data. The code details the visualisations presented in
Gatto L, Breckels LM, Naake T, Gibb S. Visualisation of proteomics data using R and Bioconductor. Proteomics. 2015 Feb 18. doi: 10.1002/pmic.201400392. PubMed PMID: 25690415.
There are currently 79 Proteomics and 55 MassSpectrometry packages in Bioconductor version 3.2. Other non-Bioconductor packages are described in the RforProteomics vignette.
Package | Title | Version | |
---|---|---|---|
ASEB | ASEB | Predict Acetylated Lysine Sites | 1.14.0 |
bioassayR | bioassayR | R library for Bioactivity analysis | 1.8.0 |
biobroom | biobroom | Turn Bioconductor objects into tidy data frames | 1.0.0 |
BRAIN | BRAIN | Baffling Recursive Algorithm for Isotope distributioN calculations | 1.16.0 |
Cardinal | Cardinal | A mass spectrometry imaging toolbox for statistical analysis | 1.2.0 |
CellNOptR | CellNOptR | Training of boolean logic models of signalling networks using prior knowledge networks and perturbation data. | 1.16.0 |
ChemmineOB | ChemmineOB | R interface to a subset of OpenBabel functionalities | 1.8.0 |
ChemmineR | ChemmineR | Cheminformatics Toolkit for R | 2.22.0 |
cisPath | cisPath | Visualization and management of the protein-protein interaction networks. | 1.10.0 |
cleaver | cleaver | Cleavage of Polypeptide Sequences | 1.8.0 |
clippda | clippda | A package for the clinical proteomic profiling data analysis | 1.20.0 |
CNORdt | CNORdt | Add-on to CellNOptR: Discretized time treatments | 1.12.0 |
CNORfeeder | CNORfeeder | Integration of CellNOptR to add missing links | 1.10.0 |
CNORode | CNORode | ODE add-on to CellNOptR | 1.12.0 |
customProDB | customProDB | Generate customized protein database from NGS data, with a focus on RNA-Seq data, for proteomics search. | 1.10.0 |
DAPAR | DAPAR | Tools for the Differential Analysis of Proteins Abundance with R | 1.0.0 |
deltaGseg | deltaGseg | deltaGseg | 1.10.0 |
eiR | eiR | Accelerated similarity searching of small molecules | 1.10.0 |
fCI | fCI | f-divergence Cutoff Index | 1.0.0 |
fmcsR | fmcsR | Mismatch Tolerant Maximum Common Substructure Searching | 1.12.0 |
GraphPAC | GraphPAC | Identification of Mutational Clusters in Proteins via a Graph Theoretical Approach. | 1.12.0 |
hpar | hpar | Human Protein Atlas in R | 1.12.0 |
iPAC | iPAC | Identification of Protein Amino acid Clustering | 1.14.0 |
IPPD | IPPD | Isotopic peak pattern deconvolution for Protein Mass Spectrometry by template matching | 1.18.0 |
isobar | isobar | Analysis and quantitation of isobarically tagged MSMS proteomics data | 1.16.0 |
LPEadj | LPEadj | A correction of the local pooled error (LPE) method to replace the asymptotic variance adjustment with an unbiased adjustment based on sample size. | 1.30.0 |
MassSpecWavelet | MassSpecWavelet | Mass spectrum processing by wavelet-based algorithms | 1.36.0 |
MSGFgui | MSGFgui | A shiny GUI for MSGFplus | 1.4.0 |
MSGFplus | MSGFplus | An interface between R and MS-GF+ | 1.4.0 |
msmsEDA | msmsEDA | Exploratory Data Analysis of LC-MS/MS data by spectral counts | 1.8.0 |
msmsTests | msmsTests | LC-MS/MS Differential Expression Tests | 1.8.0 |
MSnbase | MSnbase | Base Functions and Classes for MS-based Proteomics | 1.18.0 |
MSnID | MSnID | Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications. | 1.4.0 |
mzID | mzID | An mzIdentML parser for R | 1.8.0 |
mzR | mzR | parser for netCDF, mzXML, mzData and mzML and mzIdentML files (mass spectrometry data) | 2.4.0 |
PAA | PAA | PAA (Protein Array Analyzer) | 1.4.0 |
PAnnBuilder | PAnnBuilder | Protein annotation data package builder | 1.34.0 |
Path2PPI | Path2PPI | Prediction of pathway-specific protein-protein interaction networks | 1.0.0 |
pathview | pathview | a tool set for pathway based data integration and visualization | 1.10.0 |
Pbase | Pbase | Manipulating and exploring protein and proteomics data | 0.10.0 |
PCpheno | PCpheno | Phenotypes and cellular organizational units | 1.32.0 |
PECA | PECA | Probe-level Expression Change Averaging | 1.6.0 |
pepXMLTab | pepXMLTab | Parsing pepXML files and filter based on peptide FDR. | 1.4.0 |
PGA | PGA | An package for identification of novel peptides by customized database derived from RNA-Seq | 1.0.0 |
plgem | plgem | Detect differential expression in microarray and proteomics datasets with the Power Law Global Error Model (PLGEM) | 1.42.0 |
PLPE | PLPE | Local Pooled Error Test for Differential Expression with Paired High-throughput Data | 1.30.0 |
ppiStats | ppiStats | Protein-Protein Interaction Statistical Package | 1.36.0 |
proBAMr | proBAMr | Generating SAM file for PSMs in shotgun proteomics data | 1.4.0 |
PROcess | PROcess | Ciphergen SELDI-TOF Processing | 1.46.0 |
procoil | procoil | Prediction of Oligomerization of Coiled Coil Proteins | 1.20.0 |
ProCoNA | ProCoNA | Protein co-expression network analysis (ProCoNA). | 1.8.0 |
pRoloc | pRoloc | A unifying bioinformatics framework for spatial proteomics | 1.10.0 |
pRolocGUI | pRolocGUI | Interactive visualisation of spatial proteomics data | 1.4.0 |
Prostar | Prostar | Provides a GUI for DAPAR | 1.0.0 |
prot2D | prot2D | Statistical Tools for volume data from 2D Gel Electrophoresis | 1.8.0 |
ProteomicsAnnotationHubData | ProteomicsAnnotationHubData | Transform public proteomics data resources into Bioconductor Data Structures | 1.0.0 |
proteoQC | proteoQC | An R package for proteomics data quality control | 1.6.0 |
ProtGenerics | ProtGenerics | S4 generic functions for Bioconductor proteomics infrastructure | 1.2.0 |
Pviz | Pviz | Peptide Annotation and Data Visualization using Gviz | 1.4.0 |
qcmetrics | qcmetrics | A Framework for Quality Control | 1.8.0 |
QuartPAC | QuartPAC | Identification of mutational clusters in protein quaternary structures. | 1.2.0 |
rain | rain | Rhythmicity Analysis Incorporating Non-parametric Methods | 1.4.0 |
RCASPAR | RCASPAR | A package for survival time prediction based on a piecewise baseline hazard Cox regression model. | 1.16.0 |
Rchemcpp | Rchemcpp | Similarity measures for chemical compounds | 2.8.0 |
Rcpi | Rcpi | Toolkit for Compound-Protein Interaction in Drug Discovery | 1.6.0 |
ropls | ropls | PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data | 1.2.0 |
RpsiXML | RpsiXML | R interface to PSI-MI 2.5 files | 2.12.0 |
rpx | rpx | R Interface to the ProteomeXchange Repository | 1.6.0 |
rTANDEM | rTANDEM | Interfaces the tandem protein identification algorithm in R | 1.10.0 |
sapFinder | sapFinder | A package for variant peptides detection and visualization in shotgun proteomics. | 1.8.0 |
ScISI | ScISI | In Silico Interactome | 1.42.0 |
shinyTANDEM | shinyTANDEM | Provides a GUI for rTANDEM | 1.8.0 |
SLGI | SLGI | Synthetic Lethal Genetic Interaction | 1.30.0 |
SpacePAC | SpacePAC | Identification of Mutational Clusters in 3D Protein Space via Simulation. | 1.8.0 |
specL | specL | specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics | 1.4.0 |
spliceSites | spliceSites | Manages align gap positions from RNA-seq data | 1.8.0 |
SWATH2stats | SWATH2stats | Transform and Filter SWATH Data for Statistical Packages | 1.0.0 |
synapter | synapter | Label-free data analysis pipeline for optimal identification and quantitation | 1.12.0 |
TPP | TPP | Analyze thermal proteome profiling (TPP) experiments | 2.0.0 |
Package | Title | Version | |
---|---|---|---|
apComplex | apComplex | Estimate protein complex membership using AP-MS protein data | 2.36.0 |
BRAIN | BRAIN | Baffling Recursive Algorithm for Isotope distributioN calculations | 1.16.0 |
CAMERA | CAMERA | Collection of annotation related methods for mass spectrometry data | 1.26.0 |
Cardinal | Cardinal | A mass spectrometry imaging toolbox for statistical analysis | 1.2.0 |
cosmiq | cosmiq | cosmiq - COmbining Single Masses Into Quantities | 1.4.0 |
customProDB | customProDB | Generate customized protein database from NGS data, with a focus on RNA-Seq data, for proteomics search. | 1.10.0 |
cytofkit | cytofkit | cytofkit: an integrated analysis pipeline for mass cytometry data | 1.1.1 |
DAPAR | DAPAR | Tools for the Differential Analysis of Proteins Abundance with R | 1.0.0 |
flagme | flagme | Analysis of Metabolomics GC/MS Data | 1.26.0 |
gaga | gaga | GaGa hierarchical model for high-throughput data analysis | 2.16.0 |
iontree | iontree | Data management and analysis of ion trees from ion-trap mass spectrometry | 1.16.0 |
isobar | isobar | Analysis and quantitation of isobarically tagged MSMS proteomics data | 1.16.0 |
MAIT | MAIT | Statistical Analysis of Metabolomic Data | 1.4.0 |
MassArray | MassArray | Analytical Tools for MassArray Data | 1.22.0 |
MassSpecWavelet | MassSpecWavelet | Mass spectrum processing by wavelet-based algorithms | 1.36.0 |
Metab | Metab | Metab: An R Package for a High-Throughput Analysis of Metabolomics Data Generated by GC-MS. | 1.4.0 |
metabomxtr | metabomxtr | A package to run mixture models for truncated metabolomics data with normal or lognormal distributions | 1.4.0 |
metaMS | metaMS | MS-based metabolomics annotation pipeline | 1.6.0 |
metaX | metaX | An R package for metabolomic data analysis | 1.0.0 |
MSGFgui | MSGFgui | A shiny GUI for MSGFplus | 1.4.0 |
MSGFplus | MSGFplus | An interface between R and MS-GF+ | 1.4.0 |
msmsEDA | msmsEDA | Exploratory Data Analysis of LC-MS/MS data by spectral counts | 1.8.0 |
msmsTests | msmsTests | LC-MS/MS Differential Expression Tests | 1.8.0 |
MSnbase | MSnbase | Base Functions and Classes for MS-based Proteomics | 1.18.0 |
MSnID | MSnID | Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications. | 1.4.0 |
mzID | mzID | An mzIdentML parser for R | 1.8.0 |
mzR | mzR | parser for netCDF, mzXML, mzData and mzML and mzIdentML files (mass spectrometry data) | 2.4.0 |
PAPi | PAPi | Predict metabolic pathway activity based on metabolomics data | 1.10.0 |
Pbase | Pbase | Manipulating and exploring protein and proteomics data | 0.10.0 |
pepXMLTab | pepXMLTab | Parsing pepXML files and filter based on peptide FDR. | 1.4.0 |
PGA | PGA | An package for identification of novel peptides by customized database derived from RNA-Seq | 1.0.0 |
plgem | plgem | Detect differential expression in microarray and proteomics datasets with the Power Law Global Error Model (PLGEM) | 1.42.0 |
proBAMr | proBAMr | Generating SAM file for PSMs in shotgun proteomics data | 1.4.0 |
PROcess | PROcess | Ciphergen SELDI-TOF Processing | 1.46.0 |
pRoloc | pRoloc | A unifying bioinformatics framework for spatial proteomics | 1.10.0 |
Prostar | Prostar | Provides a GUI for DAPAR | 1.0.0 |
proteoQC | proteoQC | An R package for proteomics data quality control | 1.6.0 |
ProtGenerics | ProtGenerics | S4 generic functions for Bioconductor proteomics infrastructure | 1.2.0 |
qcmetrics | qcmetrics | A Framework for Quality Control | 1.8.0 |
Rdisop | Rdisop | Decomposition of Isotopic Patterns | 1.30.0 |
Risa | Risa | Converting experimental metadata from ISA-tab into Bioconductor data structures | 1.12.0 |
RMassBank | RMassBank | Workflow to process tandem MS files and build MassBank records | 1.12.0 |
rols | rols | An R interface to the Ontology Lookup Service | 1.12.0 |
ropls | ropls | PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data | 1.2.0 |
rpx | rpx | R Interface to the ProteomeXchange Repository | 1.6.0 |
rTANDEM | rTANDEM | Interfaces the tandem protein identification algorithm in R | 1.10.0 |
sapFinder | sapFinder | A package for variant peptides detection and visualization in shotgun proteomics. | 1.8.0 |
shinyTANDEM | shinyTANDEM | Provides a GUI for rTANDEM | 1.8.0 |
SIMAT | SIMAT | GC-SIM-MS data processing and alaysis tool | 1.2.0 |
specL | specL | specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics | 1.4.0 |
SWATH2stats | SWATH2stats | Transform and Filter SWATH Data for Statistical Packages | 1.0.0 |
synapter | synapter | Label-free data analysis pipeline for optimal identification and quantitation | 1.12.0 |
TargetSearch | TargetSearch | A package for the analysis of GC-MS metabolite profiling data | 1.26.0 |
TPP | TPP | Analyze thermal proteome profiling (TPP) experiments | 2.0.0 |
xcms | xcms | LC/MS and GC/MS Data Analysis | 1.46.0 |
x1 | x2 | x3 | x4 | y1 | y2 | y3 | y4 |
---|---|---|---|---|---|---|---|
10 | 10 | 10 | 8 | 8.04 | 9.14 | 7.46 | 6.58 |
8 | 8 | 8 | 8 | 6.95 | 8.14 | 6.77 | 5.76 |
13 | 13 | 13 | 8 | 7.58 | 8.74 | 12.74 | 7.71 |
9 | 9 | 9 | 8 | 8.81 | 8.77 | 7.11 | 8.84 |
11 | 11 | 11 | 8 | 8.33 | 9.26 | 7.81 | 8.47 |
14 | 14 | 14 | 8 | 9.96 | 8.10 | 8.84 | 7.04 |
6 | 6 | 6 | 8 | 7.24 | 6.13 | 6.08 | 5.25 |
4 | 4 | 4 | 19 | 4.26 | 3.10 | 5.39 | 12.50 |
12 | 12 | 12 | 8 | 10.84 | 9.13 | 8.15 | 5.56 |
7 | 7 | 7 | 8 | 4.82 | 7.26 | 6.42 | 7.91 |
5 | 5 | 5 | 8 | 5.68 | 4.74 | 5.73 | 6.89 |
tab <- matrix(NA, 5, 4)
colnames(tab) <- 1:4
rownames(tab) <- c("var(x)", "mean(x)",
"var(y)", "mean(y)",
"cor(x,y)")
for (i in 1:4)
tab[, i] <- c(var(anscombe[, i]),
mean(anscombe[, i]),
var(anscombe[, i+4]),
mean(anscombe[, i+4]),
cor(anscombe[, i], anscombe[, i+4]))
1 | 2 | 3 | 4 | |
---|---|---|---|---|
var(x) | 11.0000000 | 11.0000000 | 11.0000000 | 11.0000000 |
mean(x) | 9.0000000 | 9.0000000 | 9.0000000 | 9.0000000 |
var(y) | 4.1272691 | 4.1276291 | 4.1226200 | 4.1232491 |
mean(y) | 7.5009091 | 7.5009091 | 7.5000000 | 7.5009091 |
cor(x,y) | 0.8164205 | 0.8162365 | 0.8162867 | 0.8165214 |
While the residuals of the linear regression clearly indicate fundamental differences in these data, the most simple and straightforward approach is visualisation to highlight the fundamental differences in the datasets.
ff <- y ~ x
mods <- setNames(as.list(1:4), paste0("lm", 1:4))
par(mfrow = c(2, 2), mar = c(4, 4, 1, 1))
for (i in 1:4) {
ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
plot(ff, data = anscombe, pch = 19, xlim = c(3, 19), ylim = c(3, 13))
mods[[i]] <- lm(ff, data = anscombe)
abline(mods[[i]])
}
lm1 | lm2 | lm3 | lm4 |
---|---|---|---|
0.0390000 | 1.1390909 | -0.5397273 | -0.421 |
-0.0508182 | 1.1390909 | -0.2302727 | -1.241 |
-1.9212727 | -0.7609091 | 3.2410909 | 0.709 |
1.3090909 | 1.2690909 | -0.3900000 | 1.839 |
-0.1710909 | 0.7590909 | -0.6894545 | 1.469 |
-0.0413636 | -1.9009091 | -1.1586364 | 0.039 |
1.2393636 | 0.1290909 | 0.0791818 | -1.751 |
-0.7404545 | -1.9009091 | 0.3886364 | 0.000 |
1.8388182 | 0.1290909 | -0.8491818 | -1.441 |
-1.6807273 | 0.7590909 | -0.0805455 | 0.909 |
0.1794545 | -0.7609091 | 0.2289091 | -0.111 |
The following code chunk connects to the PXD000001
data set on the ProteomeXchange repository and fetches the mzTab
file. After missing values filtering, we extract relevant data (log2 fold-changes and log10 mean expression intensities) into data.frames
.
library("rpx")
px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
## Downloading 1 file
library("MSnbase")
## here, we need to specify the (old) mzTab version 0.9
qnt <- readMzTabData(mztab, what = "PEP", version = "0.9")
## Version 0.9 is deprecated. Please see '?readMzTabData' and '?MzTab' for details.
## Detected a metadata section
## Detected a peptide section
sampleNames(qnt) <- reporterNames(TMT6)
qnt <- filterNA(qnt)
## may be combineFeatuers
spikes <- c("P02769", "P00924", "P62894", "P00489")
protclasses <- as.character(fData(qnt)$accession)
protclasses[!protclasses %in% spikes] <- "Background"
madata42 <- data.frame(A = rowMeans(log(exprs(qnt[, c(4, 2)]), 10)),
M = log(exprs(qnt)[, 4], 2) - log(exprs(qnt)[, 2], 2),
data = rep("4vs2", nrow(qnt)),
protein = fData(qnt)$accession,
class = protclasses)
madata62 <- data.frame(A = rowMeans(log(exprs(qnt[, c(6, 2)]), 10)),
M = log(exprs(qnt)[, 6], 2) - log(exprs(qnt)[, 2], 2),
data = rep("6vs2", nrow(qnt)),
protein = fData(qnt)$accession,
class = protclasses)
madata <- rbind(madata42, madata62)
par(mfrow = c(1, 2))
plot(M ~ A, data = madata42, main = "4vs2",
xlab = "A", ylab = "M", col = madata62$class)
plot(M ~ A, data = madata62, main = "6vs2",
xlab = "A", ylab = "M", col = madata62$class)
library("lattice")
latma <- xyplot(M ~ A | data, data = madata,
groups = madata$class,
auto.key = TRUE)
print(latma)
library("ggplot2")
ggma <- ggplot(aes(x = A, y = M, colour = class), data = madata,
colour = class) +
geom_point() +
facet_grid(. ~ data)
print(ggma)
library("RColorBrewer")
bcols <- brewer.pal(4, "Set1")
cls <- c("Background" = "#12121230",
"P02769" = bcols[1],
"P00924" = bcols[2],
"P62894" = bcols[3],
"P00489" = bcols[4])
ggma2 <- ggplot(aes(x = A, y = M, colour = class),
data = madata) + geom_point(shape = 19) +
facet_grid(. ~ data) + scale_colour_manual(values = cls) +
guides(colour = guide_legend(override.aes = list(alpha = 1)))
print(ggma2)
MAplot
method for MSnSet
instancesMAplot(qnt, cex = .8)
This app is based on Mike Love’s shinyMA application, adapted for a proteomics data. A screen shot is displayed below. To start the application:
shinyMA()
The application is also available online at https://lgatto.shinyapps.io/shinyMA/.
Below, using the msmsTest package, we load a example MSnSet
data with spectral couting data (from the r Biocpkg("msmsEDA")
package) and run a statistical test to obtain (adjusted) p-values and fold-changes.
library("msmsEDA")
library("msmsTests")
data(msms.dataset)
## Pre-process expression matrix
e <- pp.msms.data(msms.dataset)
## Models and normalizing condition
null.f <- "y~batch"
alt.f <- "y~treat+batch"
div <- apply(exprs(e),2,sum)
## Test
res <- msms.glm.qlll(e,alt.f,null.f,div=div)
lst <- test.results(res,e,pData(e)$treat,"U600","U200",div,
alpha=0.05,minSpC=2,minLFC=log2(1.8),
method="BH")
Here, we produce the volcano plot by hand, with the plot
function. In the second plot, we limit the x axis limits and add grid lines.
plot(lst$tres$LogFC, -log10(lst$tres$p.value))
plot(lst$tres$LogFC, -log10(lst$tres$p.value),
xlim = c(-3, 3))
grid()
Below, we use the res.volcanoplot
function from the r Biocpkg("msmsTests")
package. This functions uses the sample annotation stored with the quantitative data in the MSnSet
object to colour the samples according to their phenotypes.
## Plot
res.volcanoplot(lst$tres,
max.pval=0.05,
min.LFC=1,
maxx=3,
maxy=NULL,
ylbls=4)
Using the counts.pca
function from the msmsEDA package:
library("msmsEDA")
data(msms.dataset)
msnset <- pp.msms.data(msms.dataset)
lst <- counts.pca(msnset, wait=FALSE)
It is also possible to generate the PCA data using the prcomp
. Below, we extract the coordinates of PC1 and PC2 from the counts.pca
result and plot them using the plot
function.
pcadata <- lst$pca$x[, 1:2]
head(pcadata)
## PC1 PC2
## U2.2502.1 -120.26080 -53.55270
## U2.2502.2 -99.90618 -53.89979
## U2.2502.3 -127.35928 -49.29906
## U2.2502.4 -166.04611 -39.27557
## U6.2502.1 -127.18423 37.11614
## U6.2502.2 -117.97016 47.03702
plot(pcadata[, 1], pcadata[, 2],
xlab = "PCA1", ylab = "PCA2")
grid()
kable(plotfuns)
plot type | traditional | lattice | ggplot2 |
---|---|---|---|
scatterplots | plot | xyplot | geom_point |
histograms | hist | histgram | geom_histogram |
density plots | plot(density()) | densityplot | geom_density |
boxplots | boxplot | bwplot | geom_boxplot |
violin plots | vioplot::vioplot | bwplot(…, panel = panel.violin) | geom_violin |
line plots | plot, matplot | xyploy, parallelplot | geom_line |
bar plots | barplot | barchart | geom_bar |
pie charts | pie | geom_bar with polar coordinates | |
dot plots | dotchart | dotplot | geom_point |
stip plots | stripchart | stripplot | goem_point |
dendrogramms | plot(hclust()) | latticeExtra package | ggdendro package |
heatmaps | image, heatmap | levelplot | geom_tile |
Below, we are going to use a data from the r Biocexptpkg("pRolocdata")
to illustrate the plotting functions.
library("pRolocdata")
data(tan2009r1)
See the MA and volcano plot examples.
The default plot type
is p
, for points. Other important types are l
for lines and h
for histogram (see below).
We extract the (normalised) intensities of the first sample
x <- exprs(tan2009r1)[, 1]
and plot the distribution with a histogram and a density plot next to each other on the same figure (using the mfrow
par
plotting paramter)
par(mfrow = c(1, 2))
hist(x)
plot(density(x))
we first extract the 888 proteins by r ncol(tan2009r1)
samples data matrix and plot the sample distributions next to each other using boxplot
and beanplot
(from the beanplot package).
library("beanplot")
x <- exprs(tan2009r1)
par(mfrow = c(2, 1))
boxplot(x)
beanplot(x[, 1], x[, 2], x[, 3], x[, 4], log = "")
below, we produce line plots that describe the protein quantitative profiles for two sets of proteins, namely er and mitochondrial proteins using matplot
.
we need to transpose the matrix (with t
) and set the type to both (b
), to display points and lines, the colours to red and steel blue, the point characters to 1 (an empty point) and the line type to 1 (a solid line).
er <- fData(tan2009r1)$markers == "ER"
mt <- fData(tan2009r1)$markers == "mitochondrion"
par(mfrow = c(2, 1))
matplot(t(x[er, ]), type = "b", col = "red", pch = 1, lty = 1)
matplot(t(x[mt, ]), type = "b", col = "steelblue", pch = 1, lty = 1)
In the last section, about spatial proteomics, we use the specialised plotDist
function from the pRoloc package to generate such figures.
To illustrate bar and dot charts, we cound the number of proteins in the respective class using table.
x <- table(fData(tan2009r1)$markers)
x
##
## Cytoskeleton ER Golgi Lysosome Nucleus
## 7 28 13 8 21
## PM Peroxisome Proteasome Ribosome 40S Ribosome 60S
## 34 4 15 20 32
## mitochondrion unknown
## 29 677
par(mfrow = c(1, 2))
barplot(x)
dotchart(x)
## Warning in dotchart(x): 'x' is neither a vector nor a matrix: using
## as.numeric(x)
The easiest to produce a complete heatmap is with the heatmap
function:
heatmap(exprs(tan2009r1))
To produce the a heatmap without the dendrograms, one can use the image function on a matrix or the specialised version for MSnSet
objects from the MSnbase package.
par(mfrow = c(1, 2))
x <- matrix(1:9, ncol = 3)
image(x)
image(tan2009r1)
See also gplots’s heatmap.2
function and the Heatplus Bioconductor package for more advanced heatmaps and the corrplot package for correlation matrices.
The easiest way to produce and plot a dendrogram is:
d <- dist(t(exprs(tan2009r1))) ## distance between samples
hc <- hclust(d) ## hierarchical clustering
plot(hc) ## visualisation
See also dendextend and this post to illustrate latticeExtra and ggdendro.
library("mzR")
mzf <- pxget(px1, 6)
## Downloading 1 file
ms <- openMSfile(mzf)
hd <- header(ms)
ms1 <- which(hd$msLevel == 1)
rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35
library("MSnbase")
(M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd))
## 1
## Object of class "MSmap"
## Map [75, 401]
## [1] Retention time: 30:1 - 34:58
## [2] M/Z: 521 - 523 (res 0.005)
library("lattice")
ff <- colorRampPalette(c("yellow", "steelblue"))
trellis.par.set(regions=list(col=ff(100)))
plot(M, aspect = 1, allTicks = FALSE)
M@map[msMap(M) == 0] <- NA
plot3D(M, rgl = FALSE)
To produce a version that can be reoriented interactively on the screen discplay, use the rgl
library("rgl")
plot3D(M, rgl = TRUE)
lout <- matrix(NA, ncol = 10, nrow = 8)
lout[1:2, ] <- 1
for (ii in 3:4)
lout[ii, ] <- c(2, 2, 2, 2, 2, 2, 3, 3, 3, 3)
lout[5, ] <- rep(4:8, each = 2)
lout[6, ] <- rep(4:8, each = 2)
lout[7, ] <- rep(9:13, each = 2)
lout[8, ] <- rep(9:13, each = 2)
i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
ms2 <- (i+1):(j-1)
layout(lout)
par(mar=c(4,2,1,1))
chromatogram(ms)
abline(v = hd[i, "retentionTime"], col = "red")
par(mar = c(3, 2, 1, 0))
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
legend = paste0(
"Acquisition ", hd[i, "acquisitionNum"], "\n",
"Retention time ", formatRt(hd[i, "retentionTime"])))
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"],
col = c("#FF000080",
rep("#12121280", 9)))
par(mar = c(3, 0.5, 1, 1))
plot(peaks(ms, i), type = "l", xlim = c(521, 522.5),
yaxt = "n")
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"], col = "#FF000080")
##par(mar = omar)
par(mar = c(2, 2, 0, 1))
for (ii in ms2) {
p <- peaks(ms, ii)
plot(p, xlab = "", ylab = "", type = "h", cex.axis = .6)
legend("topright", legend = paste0("Prec M/Z\n",
round(hd[ii, "precursorMZ"], 2)),
bty = "n", cex = .8)
}
M2 <- MSmap(ms, i:j, 100, 1000, 1, hd)
## 1
plot3D(M2)
par(mar=c(4,1,1,1))
image(t(matrix(hd$msLevel, 1, nrow(hd))),
xlab="Retention time",
xaxt="n", yaxt="n", col=c("black","steelblue"))
k <- round(range(hd$retentionTime) / 60)
nk <- 5
axis(side=1, at=seq(0,1,1/nk), labels=seq(k[1],k[2],k[2]/nk))
The following animation scrolls over 5 minutes of retention time for a MZ range between 521 and 523.
library("animation")
an1 <- function() {
for (i in seq(0, 5, 0.2)) {
rtsel <- hd$retentionTime[ms1] / 60 > (30 + i) &
hd$retentionTime[ms1] / 60 < (35 + i)
M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd)
M@map[msMap(M) == 0] <- NA
print(plot3D(M, rgl = FALSE))
}
}
saveGIF(an1(), movie.name = "msanim1.gif")
The code chunk below scrolls of a slice of retention times while keeping the retention time constant between 30 and 35 minutes.
an2 <- function() {
for (i in seq(0, 2.5, 0.1)) {
rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35
mz1 <- 520 + i
mz2 <- 522 + i
M <- MSmap(ms, ms1[rtsel], mz1, mz2, .005, hd)
M@map[msMap(M) == 0] <- NA
print(plot3D(M, rgl = FALSE))
}
}
saveGIF(an2(), movie.name = "msanim2.gif")
library("MSnbase")
data(itraqdata)
itraqdata2 <- pickPeaks(itraqdata, verbose = FALSE)
plot(itraqdata[[25]], full=TRUE, reporters = iTRAQ4)
par(oma = c(0, 0, 0, 0))
par(mar = c(4, 4, 1, 1))
plot(itraqdata2[[25]], itraqdata2[[28]], sequences = rep("IMIDLDGTENK", 2))
library("protViz")
data(msms)
fi <- fragmentIon("TAFDEAIAELDTLNEESYK")
fi.cyz <- as.data.frame(cbind(c=fi[[1]]$c, y=fi[[1]]$y, z=fi[[1]]$z))
p <- peakplot("TAFDEAIAELDTLNEESYK",
spec = msms[[1]],
fi = fi.cyz,
itol = 0.6,
ion.axes = FALSE)
The peakplot
function return the annotation of the MSMS spectrum that is plotted:
str(p)
## List of 7
## $ mZ.Da.error : num [1:57] 215.3 144.27 -2.8 -17.06 2.03 ...
## $ mZ.ppm.error: num [1:57] 1808046 758830 -8306 -37724 3501 ...
## $ idx : num [1:57] 1 1 1 3 16 24 41 52 67 88 ...
## $ label : chr [1:57] "c1" "c2" "c3" "c4" ...
## $ score : num -1
## $ sequence : chr "TAFDEAIAELDTLNEESYK"
## $ fragmentIon :'data.frame': 19 obs. of 3 variables:
## ..$ c: num [1:19] 119 190 337 452 581 ...
## ..$ y: num [1:19] 147 310 397 526 655 ...
## ..$ z: num [1:19] 130 293 380 509 638 ...
The following code chunks demonstrate the usage of the mass spectrometry preprocessing and plotting routines in the r CRANpkg("MALDIquant")
package. MALDIquant uses the traditional graphics system. Therefore MALDIquant overloads the traditional functions plot
, lines
and points
for its own data types. These data types represents spectrum and peak lists as S4 classes. Please see the MALDIquant vignette and the corresponding website for more details.
After loading some example data a simple plot
draws the raw spectrum.
library("MALDIquant")
data("fiedler2009subset", package="MALDIquant")
plot(fiedler2009subset[[14]])
After some preprocessing, namely variance stabilization and smoothing, we use lines
to draw our baseline estimate in our processed spectrum.
transformedSpectra <- transformIntensity(fiedler2009subset, method = "sqrt")
smoothedSpectra <- smoothIntensity(transformedSpectra, method = "SavitzkyGolay")
plot(smoothedSpectra[[14]])
lines(estimateBaseline(smoothedSpectra[[14]]), lwd = 2, col = "red")
After removing the background removal we could use plot
again to draw our baseline corrected spectrum.
rbSpectra <- removeBaseline(smoothedSpectra)
plot(rbSpectra[[14]])
detectPeaks
returns a MassPeaks
object that offers the same traditional graphics functions. The next code chunk demonstrates how to mark the detected peaks in a spectrum.
cbSpectra <- calibrateIntensity(rbSpectra, method = "TIC")
peaks <- detectPeaks(cbSpectra, SNR = 5)
plot(cbSpectra[[14]])
points(peaks[[14]], col = "red", pch = 4, lwd = 2)
Additional there is a special function labelPeaks
that allows to draw the M/Z values above the corresponding peaks. Next we mark the 5 top peaks in the spectrum.
top5 <- intensity(peaks[[14]]) %in% sort(intensity(peaks[[14]]),
decreasing = TRUE)[1:5]
labelPeaks(peaks[[14]], index = top5, avoidOverlap = TRUE)
Often multiple spectra have to be recalibrated to be comparable. Therefore MALDIquant warps the spectra according to so called reference or landmark peaks. For debugging the determineWarpingFunctions
function offers some warping plots. Here we show only the last 4 plots:
par(mfrow = c(2, 2))
warpingFunctions <-
determineWarpingFunctions(peaks,
tolerance = 0.001,
plot = TRUE,
plotInteractive = TRUE)
par(mfrow = c(1, 1))
warpedSpectra <- warpMassSpectra(cbSpectra, warpingFunctions)
warpedPeaks <- warpMassPeaks(peaks, warpingFunctions)
In the next code chunk we visualise the need and the effect of the recalibration.
sel <- c(2, 10, 14, 16)
xlim <- c(4180, 4240)
ylim <- c(0, 1.9e-3)
lty <- c(1, 4, 2, 6)
par(mfrow = c(1, 2))
plot(cbSpectra[[1]], xlim = xlim, ylim = ylim, type = "n")
for (i in seq(along = sel)) {
lines(peaks[[sel[i]]], lty = lty[i], col = i)
lines(cbSpectra[[sel[i]]], lty = lty[i], col = i)
}
plot(cbSpectra[[1]], xlim = xlim, ylim = ylim, type = "n")
for (i in seq(along = sel)) {
lines(warpedPeaks[[sel[i]]], lty = lty[i], col = i)
lines(warpedSpectra[[sel[i]]], lty = lty[i], col = i)
}
par(mfrow = c(1, 1))
The code chunks above generate plots that are very similar to the figure 7 in the corresponding paper “Visualisation of proteomics data using R”. Please find the code to exactly reproduce the figure at: https://github.com/sgibb/MALDIquantExamples/blob/master/R/createFigure1_color.R
These visualisations originate from the Pbase
Pbase-data
and mapping
vignettes.
The following code chunk downloads a MALDI imaging dataset from a mouse kidney shared by Adrien Nyakas and Stefan Schurch and generates a plot with the mean spectrum and three slices of interesting M/Z regions.
library("MALDIquant")
library("MALDIquantForeign")
spectra <- importBrukerFlex("http://files.figshare.com/1106682/MouseKidney_IMS_testdata.zip", verbose = FALSE)
spectra <- smoothIntensity(spectra, "SavitzkyGolay", halfWindowSize = 8)
spectra <- removeBaseline(spectra, method = "TopHat", halfWindowSize = 16)
spectra <- calibrateIntensity(spectra, method = "TIC")
avgSpectrum <- averageMassSpectra(spectra)
avgPeaks <- detectPeaks(avgSpectrum, SNR = 5)
avgPeaks <- avgPeaks[intensity(avgPeaks) > 0.0015]
oldPar <- par(no.readonly = TRUE)
layout(matrix(c(1,1,1,2,3,4), nrow = 2, byrow = TRUE))
plot(avgSpectrum, main = "mean spectrum",
xlim = c(3000, 6000), ylim = c(0, 0.007))
lines(avgPeaks, col = "red")
labelPeaks(avgPeaks, cex = 1)
par(mar = c(0.5, 0.5, 1.5, 0.5))
plotMsiSlice(spectra,
center = mass(avgPeaks),
tolerance = 1,
plotInteractive = TRUE)
par(oldPar)
)]
There is also an interactive MALDIquant IMS shiny app for demonstration purposes. A screen shot is displayed below. To start the application:
library("shiny")
runGitHub("sgibb/ims-shiny")
library("pRoloc")
library("pRolocdata")
data(tan2009r1)
## these params use class weights
fn <- dir(system.file("extdata", package = "pRoloc"),
full.names = TRUE, pattern = "params2.rda")
load(fn)
setStockcol(NULL)
setStockcol(paste0(getStockcol(), 90))
w <- table(fData(tan2009r1)[, "pd.markers"])
(w <- 1/w[names(w) != "unknown"])
##
## Cytoskeleton ER Golgi Lysosome Nucleus
## 0.14285714 0.05000000 0.16666667 0.12500000 0.05000000
## PM Peroxisome Proteasome Ribosome 40S Ribosome 60S
## 0.06666667 0.25000000 0.09090909 0.07142857 0.04000000
## mitochondrion
## 0.07142857
tan2009r1 <- svmClassification(tan2009r1, params2,
class.weights = w,
fcol = "pd.markers")
ptsze <- exp(fData(tan2009r1)$svm.scores) - 1
lout <- matrix(c(1:4, rep(5, 4)), ncol = 4, nrow = 2)
layout(lout)
cls <- getStockcol()
par(mar = c(4, 4, 1, 1))
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "mitochondrion"), ],
markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "mitochondrion")],
mcol = cls[5])
legend("topright", legend = "mitochondrion", bty = "n")
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "ER/Golgi"), ],
markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "ER")],
mcol = cls[2])
legend("topright", legend = "ER", bty = "n")
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "ER/Golgi"), ],
markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "Golgi")],
mcol = cls[3])
legend("topright", legend = "Golgi", bty = "n")
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "PM"), ],
markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "PM")],
mcol = cls[8])
legend("topright", legend = "PM", bty = "n")
plot2D(tan2009r1, fcol = "svm", cex = ptsze, method = "kpca")
addLegend(tan2009r1, where = "bottomleft", fcol = "svm", bty = "n")
See the pRoloc-tutorial
vignette (pdf) from the pRoloc package for details about spatial proteomics data analysis and visualisation.
print(sessionInfo(), locale = FALSE)
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.3 LTS
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] protViz_0.2.9 beanplot_1.2 ggplot2_1.0.1
## [4] lattice_0.20-33 msmsTests_1.8.0 msmsEDA_1.8.0
## [7] pRolocdata_1.8.0 pRoloc_1.10.0 MLInterfaces_1.50.0
## [10] cluster_2.0.3 annotate_1.48.0 XML_3.98-1.3
## [13] AnnotationDbi_1.32.0 IRanges_2.4.0 S4Vectors_0.8.0
## [16] MALDIquantForeign_0.9 MALDIquant_1.13 RColorBrewer_1.1-2
## [19] xtable_1.7-4 rpx_1.6.0 knitr_1.11
## [22] BiocInstaller_1.20.0 RforProteomics_1.8.0 MSnbase_1.18.0
## [25] ProtGenerics_1.2.0 BiocParallel_1.4.0 mzR_2.4.0
## [28] Rcpp_0.12.1 Biobase_2.30.0 BiocGenerics_0.16.0
## [31] BiocStyle_1.8.0
##
## loaded via a namespace (and not attached):
## [1] plyr_1.8.3 GSEABase_1.32.0
## [3] splines_3.2.2 ggvis_0.4.2
## [5] digest_0.6.8 foreach_1.4.3
## [7] htmltools_0.2.6 gdata_2.17.0
## [9] magrittr_1.5 doParallel_1.0.10
## [11] sfsmisc_1.0-28 limma_3.26.0
## [13] rda_1.0.2-2 R.utils_2.1.0
## [15] lpSolve_5.6.13 colorspace_1.2-6
## [17] dplyr_0.4.3 RCurl_1.95-4.7
## [19] readBrukerFlexData_1.8.2 graph_1.48.0
## [21] genefilter_1.52.0 lme4_1.1-10
## [23] impute_1.44.0 survival_2.38-3
## [25] iterators_1.0.8 gtable_0.1.2
## [27] zlibbioc_1.16.0 readMzXmlData_2.8.1
## [29] MatrixModels_0.4-1 car_2.1-0
## [31] kernlab_0.9-22 prabclus_2.2-6
## [33] DEoptimR_1.0-3 SparseM_1.7
## [35] scales_0.3.0 futile.options_1.0.0
## [37] vsn_3.38.0 mvtnorm_1.0-3
## [39] edgeR_3.12.0 DBI_0.3.1
## [41] proxy_0.4-15 mclust_5.0.2
## [43] preprocessCore_1.32.0 htmlwidgets_0.5
## [45] sampling_2.7 threejs_0.2.1
## [47] gplots_2.17.0 FNN_1.1
## [49] fpc_2.1-10 modeltools_0.2-21
## [51] R.methodsS3_1.7.0 flexmix_2.3-13
## [53] nnet_7.3-11 RJSONIO_1.3-0
## [55] caret_6.0-57 labeling_0.3
## [57] reshape2_1.4.1 munsell_0.4.2
## [59] mlbench_2.1-1 biocViews_1.38.0
## [61] tools_3.2.2 downloader_0.4
## [63] RSQLite_1.0.0 pls_2.5-0
## [65] evaluate_0.8 stringr_1.0.0
## [67] mzID_1.8.0 yaml_2.1.13
## [69] robustbase_0.92-5 caTools_1.17.1
## [71] rgl_0.95.1367 randomForest_4.6-12
## [73] RBGL_1.46.0 nlme_3.1-122
## [75] mime_0.4 quantreg_5.19
## [77] formatR_1.2.1 R.oo_1.19.0
## [79] biomaRt_2.26.0 pbkrtest_0.4-2
## [81] interactiveDisplayBase_1.8.0 e1071_1.6-7
## [83] affyio_1.40.0 stringi_0.5-5
## [85] highr_0.5.1 futile.logger_1.4.1
## [87] trimcluster_0.1-2 Matrix_1.2-2
## [89] nloptr_1.0.4 gbm_2.1.1
## [91] RUnit_0.4.30 bitops_1.0-6
## [93] qvalue_2.2.0 httpuv_1.3.3
## [95] R6_2.1.1 pcaMethods_1.60.0
## [97] affy_1.48.0 hwriter_1.3.2
## [99] KernSmooth_2.23-15 gridSVG_1.4-3
## [101] codetools_0.2-14 lambda.r_1.1.7
## [103] MASS_7.3-44 gtools_3.5.0
## [105] assertthat_0.1 interactiveDisplay_1.8.0
## [107] proto_0.3-10 Category_2.36.0
## [109] diptest_0.75-7 mgcv_1.8-7
## [111] grid_3.2.2 rpart_4.1-10
## [113] class_7.3-14 minqa_1.2.4
## [115] rmarkdown_0.8.1 shiny_0.12.2
## [117] base64enc_0.1-3