ebayes {limma} | R Documentation |

Given a microarray linear model fit, compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a common value.

ebayes(fit, proportion=0.01, stdev.coef.lim=c(0.1,4), trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1)) eBayes(fit, proportion=0.01, stdev.coef.lim=c(0.1,4), trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1)) treat(fit, lfc=0, trend=FALSE, robust=FALSE, winsor.tail.p=c(0.05,0.1))

`fit` |
an |

`proportion` |
numeric value between 0 and 1, assumed proportion of genes which are differentially expressed |

`stdev.coef.lim` |
numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes |

`trend` |
logical, should an intensity-trend be allowed for the prior variance? Default is that the prior variance is constant. |

`robust` |
logical, should the estimation of |

`winsor.tail.p` |
numeric vector of length 1 or 2, giving left and right tail proportions of |

`lfc` |
the minimum log2-fold-change that is considered scientifically meaningful |

These functions are used to rank genes in order of evidence for differential expression.
They use an empirical Bayes method to shrink the probe-wise sample variances towards a common value and to augmenting the degrees of freedom for the individual variances (Smyth, 2004).
The functions accept as input argument `fit`

a fitted model object from the functions `lmFit`

, `lm.series`

, `mrlm`

or `gls.series`

.
The fitted model object may have been processed by `contrasts.fit`

before being passed to `eBayes`

to convert the coefficients of the design matrix into an arbitrary number of contrasts which are to be tested equal to zero.
The columns of `fit`

define a set of contrasts which are to be tested equal to zero.

The empirical Bayes moderated t-statistics test each individual contrast equal to zero. For each probe (row), the moderated F-statistic tests whether all the contrasts are zero. The F-statistic is an overall test computed from the set of t-statistics for that probe. This is exactly analogous the relationship between t-tests and F-statistics in conventional anova, except that the residual mean squares and residual degrees of freedom have been moderated between probes.

The estimates `s2.prior`

and `df.prior`

are computed by `fitFDist`

.
`s2.post`

is the weighted average of `s2.prior`

and `sigma^2`

with weights proportional to `df.prior`

and `df.residual`

respectively.
The `lods`

is sometimes known as the B-statistic.
The F-statistics `F`

are computed by `classifyTestsF`

with `fstat.only=TRUE`

.

`eBayes`

doesn't compute ordinary (unmoderated) t-statistics by default, but these can be easily extracted from
the linear model output, see the example below.

`ebayes`

is the earlier and leaner function, kept for backwards compatibility, while
`eBayes`

is the later more object-orientated version.
The difference is that `ebayes`

outputs only the empirical Bayes statistics whereas `eBayes`

adds them to the fitted model object `fit`

.
`eBayes`

is recommended for routine use as it produces objects containing all the necessary components for downstream analysis

`treat`

computes empirical Bayes moderated-t p-values relative to a minimum required fold-change threshold.
Use `topTreat`

to summarize output from `treat`

.
Instead of testing for genes which have log-fold-changes different from zero, it tests whether the log2-fold-change is greater than `lfc`

in absolute value (McCarthy and Smyth, 2009).
`treat`

is concerned with p-values rather than posterior odds, so it does not compute the B-statistic `lods`

.
The idea of thresholding doesn't apply to F-statistics in a straightforward way, so moderated F-statistics are also not computed.

If `trend=TRUE`

then an intensity-dependent trend is fitted to the prior variances `s2.prior`

.
Specifically, `squeezeVar`

is called with the `covariate`

equal to `Amean`

, the average log2-intensity for each gene.
See `squeezeVar`

for more details.

If `robust=TRUE`

then the robust empirical Bayes procedure of Phipson et al (2016) is used.
See `squeezeVar`

for more details.

`eBayes`

produces an object of class `MArrayLM`

(see `MArrayLM-class`

) containing everything found in `fit`

plus the following added components:

`t` |
numeric vector or matrix of moderated t-statistics |

`p.value` |
numeric vector of p-values corresponding to the t-statistics |

`s2.prior` |
estimated prior value for |

`df.prior` |
degrees of freedom associated with |

`df.total` |
numeric vector of total degrees of freedom associated with t-statistics and p-values. Equal to |

`s2.post` |
numeric vector giving the posterior values for |

`lods` |
numeric vector or matrix giving the log-odds of differential expression (natural log scale). |

`var.prior` |
estimated prior value for the variance of the log2-fold-change for differentially expressed gene |

`F` |
numeric vector of moderated F-statistics for testing all contrasts defined by the columns of |

`F.p.value` |
numeric vector giving p-values corresponding to |

`treat`

a produces an `MArrayLM`

object similar to `eBayes`

but without `lods`

, `var.prior`

, `F`

or `F.p.value`

.
`ebayes`

produces an ordinary list containing the above components except for `F`

and `F.p.value`

.

The algorithm used by `eBayes`

and `treat`

with `robust=TRUE`

was revised slightly in limma 3.27.6.
The minimum `df.prior`

returned may be slightly smaller than previously.

Gordon Smyth and Davis McCarthy

McCarthy, D. J., and Smyth, G. K. (2009).
Testing significance relative to a fold-change threshold is a TREAT.
*Bioinformatics* 25, 765-771.
http://bioinformatics.oxfordjournals.org/content/25/6/765

Loennstedt, I., and Speed, T. P. (2002). Replicated microarray data. *Statistica Sinica* **12**, 31-46.

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016).
Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression.
*Annals of Applied Statistics* 10, 946-963.
http://projecteuclid.org/euclid.aoas/1469199900

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.
*Statistical Applications in Genetics and Molecular Biology* 3, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf

`squeezeVar`

, `fitFDist`

, `tmixture.matrix`

.

An overview of linear model functions in limma is given by 06.LinearModels.

# See also lmFit examples # Simulate gene expression data, # 6 microarrays and 100 genes with one gene differentially expressed set.seed(2016) sigma2 <- 0.05 / rchisq(100, df=10) * 10 y <- matrix(rnorm(100*6,sd=sqrt(sigma2)),100,6) design <- cbind(Intercept=1,Group=c(0,0,0,1,1,1)) y[1,4:6] <- y[1,4:6] + 1 fit <- lmFit(y,design) # Moderated t-statistic fit <- eBayes(fit) topTable(fit,coef=2) # Ordinary t-statistic ordinary.t <- fit$coef[,2] / fit$stdev.unscaled[,2] / fit$sigma # Q-Q plots of t statistics # Points off the line may be differentially expressed par(mfrow=c(1,2)) qqt(ordinary.t, df=fit$df.residual, main="Ordinary t") abline(0,1) qqt(fit$t[,2], df=fit$df.total,main="Moderated t") abline(0,1) par(mfrow=c(1,1)) # Treat tfit <- treat(fit,lfc=log2(1.1)) topTreat(tfit,coef=2)

[Package *limma* version 3.34.5 Index]