removeBatchEffect {limma} | R Documentation |
Remove batch effects from expression data.
removeBatchEffect(x, batch=NULL, batch2=NULL, covariates=NULL, design=matrix(1,ncol(x),1), ...)
x | numeric matrix, or any data object that can be processed by
batch | factor or vector indicating batches.
batch2 | optional factor or vector indicating a second series of batches.
covariates | matrix or vector of numeric covariates to be adjusted for.
design | optional design matrix relating to treatment conditions to be preserved
... | other arguments are passed to
This function is useful for removing batch effects, associated with hybridization time or other technical variables, prior to clustering or unsupervised analysis such as PCA, MDS or heatmaps. The design matrix is used to describe comparisons between the samples, for example treatment effects, which should not be removed. The function (in effect) fits a linear model to the data, including both batches and regular treatments, then removes the component due to the batch effects.
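The fit-then-subtract idea described above can be sketched in base R. This is an illustration under stated assumptions, not necessarily how limma implements it: the treatment factor is hypothetical, lm.fit stands in for the model fitting, and the sum-to-zero coding of the batch factor is an assumed choice so that batch effects are expressed as deviations from the overall level.

```r
set.seed(1)
y <- matrix(rnorm(10 * 9), 10, 9)                 # 10 genes x 9 samples
y[, 1:3] <- y[, 1:3] + 5                          # shift the first batch upwards
batch <- factor(rep(c("A", "B", "C"), each = 3))
treatment <- factor(rep(c("ctl", "trt", "trt"), times = 3))  # hypothetical factor

# Treatment effects to be preserved
design <- model.matrix(~treatment)

# Batch columns with sum-to-zero coding (assumed), intercept column dropped
contrasts(batch) <- contr.sum(levels(batch))
X.batch <- model.matrix(~batch)[, -1, drop = FALSE]

# Fit treatment and batch together; each column of t(y) is one gene
fit <- lm.fit(cbind(design, X.batch), t(y))

# Keep only the batch coefficients and subtract their fitted contribution
beta.batch <- fit$coefficients[-seq_len(ncol(design)), , drop = FALSE]
y.corrected <- y - t(X.batch %*% beta.batch)
```

Refitting the same model to y.corrected should give batch coefficients of essentially zero while leaving the treatment coefficients unchanged.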
In most applications, only the first batch argument will be needed. This covers the situation where the data has been collected in a series of separate batches. The batch2 argument is used when there is a second series of batch effects, independent of the first series. For example, batch might correspond to time of data collection while batch2 might correspond to operator or some other change in operating characteristics. If batch2 is included, then the effects of batch and batch2 are assumed to be additive.
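As a rough base-R sketch of what additivity means here: each batch series contributes its own block of columns to the model, with no interaction terms between them. The operator labels, the simulated shifts, the sum-to-zero coding and the use of lm.fit are all assumptions made for illustration.

```r
set.seed(2)
y <- matrix(rnorm(10 * 9), 10, 9)                          # 10 genes x 9 samples
batch  <- factor(rep(c("A", "B", "C"), each = 3))          # e.g. time of collection
batch2 <- factor(rep(c("op1", "op2", "op3"), times = 3))   # hypothetical operators
y[, batch == "A"] <- y[, batch == "A"] + 5                 # simulated batch shift
y[, batch2 == "op3"] <- y[, batch2 == "op3"] - 2           # simulated operator shift

contrasts(batch)  <- contr.sum(levels(batch))
contrasts(batch2) <- contr.sum(levels(batch2))

# Additive model: one block of columns per batch series, no interaction
X.batch <- cbind(model.matrix(~batch)[, -1, drop = FALSE],
                 model.matrix(~batch2)[, -1, drop = FALSE])

fit <- lm.fit(cbind(Intercept = 1, X.batch), t(y))
beta <- fit$coefficients[-1, , drop = FALSE]               # drop the intercept row
y.corrected <- y - t(X.batch %*% beta)
```

In this balanced layout, the corrected mean of every gene is the same within each level of batch and of batch2.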
The covariates argument allows correction for one or more continuous numeric effects, similar to the analysis of covariance method in statistics. If covariates contains more than one column, then the columns are assumed to have additive effects.
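The covariate adjustment can be pictured as regressing each gene on the covariate columns and subtracting the fitted linear trend. The sketch below uses base R only; the covariate names rin and age are hypothetical, and centring the covariates (so that the overall level of each gene is kept) is an assumption of this sketch.

```r
set.seed(3)
n <- 9
y <- matrix(rnorm(10 * n), 10, n)                      # 10 genes x 9 samples
covariates <- cbind(rin = rnorm(n), age = rnorm(n))    # hypothetical covariates
y <- y + outer(rep(2, 10), covariates[, "rin"])        # simulated linear dependence

# Centre each covariate column (assumed) so gene means are not shifted
covariates <- scale(covariates, center = TRUE, scale = FALSE)

# Additive effects: one slope per covariate column, fitted jointly
fit <- lm.fit(cbind(Intercept = 1, covariates), t(y))
beta <- fit$coefficients[-1, , drop = FALSE]           # slopes only

y.corrected <- y - t(covariates %*% beta)
```

After this step the corrected values should be uncorrelated with both covariate columns.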
The data object x can be of any class for which lmFit works. If x contains weights, then these will be used in estimating the batch effects.
A numeric matrix of log-expression values with batch and covariate effects removed.
This function is not intended to be used prior to linear modelling. For linear modelling, it is better to include the batch factors in the linear model.
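A minimal sketch of the recommended alternative: the batch factor enters the design matrix alongside the treatment, and the uncorrected data are modelled directly. The treatment factor is hypothetical, and only the construction of the design matrix is shown.

```r
batch <- factor(rep(c("A", "B", "C"), each = 3))
treatment <- factor(rep(c("ctl", "trt", "trt"), times = 3))  # hypothetical factor

# Batch appears as extra columns of the design; the data are left uncorrected
design <- model.matrix(~treatment + batch)
colnames(design)
# "(Intercept)" "treatmenttrt" "batchB" "batchC"

# With limma, this design would then be passed to lmFit together with the
# uncorrected expression matrix, e.g. lmFit(y, design)
```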
Gordon Smyth and Carolyn de Graaf
y <- matrix(rnorm(10*9),10,9)
y[,1:3] <- y[,1:3] + 5
batch <- c("A","A","A","B","B","B","C","C","C")
y2 <- removeBatchEffect(y, batch)
par(mfrow=c(1,2))
boxplot(as.data.frame(y),main="Original")
boxplot(as.data.frame(y2),main="Batch corrected")