ebayes {limma} R Documentation

## Empirical Bayes Statistics for Differential Expression

### Description

Given a microarray linear model fit, compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a common value.

### Usage

```ebayes(fit, proportion=0.01, stdev.coef.lim=c(0.1,4),
trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))
eBayes(fit, proportion=0.01, stdev.coef.lim=c(0.1,4),
trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))
treat(fit, lfc=0, trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))
```

### Arguments

 `fit` an `MArrayLM` fitted model object produced by `lmFit` or `contrasts.fit`. For `ebayes` only, `fit` can alternatively be an unclassed list produced by `lm.series`, `gls.series` or `mrlm` containing components `coefficients`, `stdev.unscaled`, `sigma` and `df.residual`. `proportion` numeric value between 0 and 1, assumed proportion of genes which are differentially expressed `stdev.coef.lim` numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes `trend` logical, should an intensity-trend be allowed for the prior variance? Default is that the prior variance is constant. `robust` logical, should the estimation of `df.prior` and `var.prior` be robustified against outlier sample variances? `winsor.tail.p` numeric vector of length 1 or 2, giving left and right tail proportions of `x` to Winsorize. Used only when `robust=TRUE`. `lfc` the minimum log2-fold-change that is considered scientifically meaningful

### Details

These functions are used to rank genes in order of evidence for differential expression. They use an empirical Bayes method to shrink the probe-wise sample variances towards a common value and to augmenting the degrees of freedom for the individual variances (Smyth, 2004). The functions accept as input argument `fit` a fitted model object from the functions `lmFit`, `lm.series`, `mrlm` or `gls.series`. The fitted model object may have been processed by `contrasts.fit` before being passed to `eBayes` to convert the coefficients of the design matrix into an arbitrary number of contrasts which are to be tested equal to zero. The columns of `fit` define a set of contrasts which are to be tested equal to zero.

The empirical Bayes moderated t-statistics test each individual contrast equal to zero. For each probe (row), the moderated F-statistic tests whether all the contrasts are zero. The F-statistic is an overall test computed from the set of t-statistics for that probe. This is exactly analogous the relationship between t-tests and F-statistics in conventional anova, except that the residual mean squares and residual degrees of freedom have been moderated between probes.

The estimates `s2.prior` and `df.prior` are computed by `fitFDist`. `s2.post` is the weighted average of `s2.prior` and `sigma^2` with weights proportional to `df.prior` and `df.residual` respectively. The `lods` is sometimes known as the B-statistic. The F-statistics `F` are computed by `classifyTestsF` with `fstat.only=TRUE`.

`eBayes` doesn't compute ordinary (unmoderated) t-statistics by default, but these can be easily extracted from the linear model output, see the example below.

`ebayes` is the earlier and leaner function, kept for backwards compatibility, while `eBayes` is the later more object-orientated version. The difference is that `ebayes` outputs only the empirical Bayes statistics whereas `eBayes` adds them to the fitted model object `fit`. `eBayes` is recommended for routine use as it produces objects containing all the necessary components for downstream analysis

`treat` computes empirical Bayes moderated-t p-values relative to a minimum required fold-change threshold. Use `topTreat` to summarize output from `treat`. Instead of testing for genes which have log-fold-changes different from zero, it tests whether the log2-fold-change is greater than `lfc` in absolute value (McCarthy and Smyth, 2009). `treat` is concerned with p-values rather than posterior odds, so it does not compute the B-statistic `lods`. The idea of thresholding doesn't apply to F-statistics in a straightforward way, so moderated F-statistics are also not computed.

If `trend=TRUE` then an intensity-dependent trend is fitted to the prior variances `s2.prior`. Specifically, `squeezeVar` is called with the `covariate` equal to `Amean`, the average log2-intensity for each gene. See `squeezeVar` for more details.

If `robust=TRUE` then the robust empirical Bayes procedure of Phipson et al (2016) is used. See `squeezeVar` for more details.

### Value

`eBayes` produces an object of class `MArrayLM` (see `MArrayLM-class`) containing everything found in `fit` plus the following added components:

 `t` numeric vector or matrix of moderated t-statistics `p.value` numeric vector of p-values corresponding to the t-statistics `s2.prior` estimated prior value for `sigma^2`. A vector if `covariate` is non-`NULL`, otherwise a scalar. `df.prior` degrees of freedom associated with `s2.prior` `df.total` numeric vector of total degrees of freedom associated with t-statistics and p-values. Equal to `df.prior+df.residual` or `sum(df.residual)`, whichever is smaller. `s2.post` numeric vector giving the posterior values for `sigma^2` `lods` numeric vector or matrix giving the log-odds of differential expression (natural log scale). `var.prior` estimated prior value for the variance of the log2-fold-change for differentially expressed gene `F` numeric vector of moderated F-statistics for testing all contrasts defined by the columns of `fit` simultaneously equal to zero `F.p.value` numeric vector giving p-values corresponding to `F`

`treat` a produces an `MArrayLM` object similar to `eBayes` but without `lods`, `var.prior`, `F` or `F.p.value`. `ebayes` produces an ordinary list containing the above components except for `F` and `F.p.value`.

### Note

The algorithm used by `eBayes` and `treat` with `robust=TRUE` was revised slightly in limma 3.27.6. The minimum `df.prior` returned may be slightly smaller than previously.

### Author(s)

Gordon Smyth and Davis McCarthy

### References

McCarthy, D. J., and Smyth, G. K. (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25, 765-771. http://bioinformatics.oxfordjournals.org/content/25/6/765

Loennstedt, I., and Speed, T. P. (2002). Replicated microarray data. Statistica Sinica 12, 31-46.

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. http://projecteuclid.org/euclid.aoas/1469199900

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, Article 3. http://www.statsci.org/smyth/pubs/ebayes.pdf

`squeezeVar`, `fitFDist`, `tmixture.matrix`.

An overview of linear model functions in limma is given by 06.LinearModels.

### Examples

```#  See also lmFit examples

#  Simulate gene expression data,
#  6 microarrays and 100 genes with one gene differentially expressed
set.seed(2016)
sigma2 <- 0.05 / rchisq(100, df=10) * 10
y <- matrix(rnorm(100*6,sd=sqrt(sigma2)),100,6)
design <- cbind(Intercept=1,Group=c(0,0,0,1,1,1))
y[1,4:6] <- y[1,4:6] + 1
fit <- lmFit(y,design)

#  Moderated t-statistic
fit <- eBayes(fit)
topTable(fit,coef=2)

#  Ordinary t-statistic
ordinary.t <- fit\$coef[,2] / fit\$stdev.unscaled[,2] / fit\$sigma

#  Q-Q plots of t statistics
#  Points off the line may be differentially expressed
par(mfrow=c(1,2))
qqt(ordinary.t, df=fit\$df.residual, main="Ordinary t")
abline(0,1)
qqt(fit\$t[,2], df=fit\$df.total,main="Moderated t")
abline(0,1)
par(mfrow=c(1,1))

# Treat
tfit <- treat(fit,lfc=log2(1.1))
topTreat(tfit,coef=2)
```

[Package limma version 3.34.5 Index]