roast {limma} | R Documentation |

Rotation gene set testing for linear models.

## Default S3 method: roast(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL, set.statistic = "mean", gene.weights = NULL, var.prior = NULL, df.prior = NULL, nrot = 999, approx.zscore = TRUE, ...) ## Default S3 method: mroast(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL, set.statistic = "mean", gene.weights = NULL, var.prior = NULL, df.prior = NULL, nrot = 999, approx.zscore = TRUE, adjust.method = "BH", midp = TRUE, sort = "directional", ...) ## Default S3 method: fry(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL, standardize = "posterior.sd", sort = "directional", ...)

`y` |
numeric matrix giving log-expression or log-ratio values for a series of microarrays, or any object that can coerced to a matrix including |

`index` |
index vector specifying which rows (probes) of |

`design` |
design matrix |

`contrast` |
contrast for which the test is required.
Can be an integer specifying a column of |

`geneid` |
gene identifiers corresponding to the rows of |

`set.statistic` |
summary set statistic. Possibilities are |

`gene.weights` |
numeric vector of directional (positive or negative) probewise weights.
For |

`var.prior` |
prior value for residual variances. If not provided, this is estimated from all the data using |

`df.prior` |
prior degrees of freedom for residual variances. If not provided, this is estimated using |

`nrot` |
number of rotations used to compute the p-values. |

`approx.zscore` |
logical, if |

`adjust.method` |
method used to adjust the p-values for multiple testing. See |

`midp` |
logical, should mid-p-values be used in instead of ordinary p-values when adjusting for multiple testing? |

`sort` |
character, whether to sort output table by directional p-value ( |

`standardize` |
how to standardize for unequal probewise variances. Possibilities are |

`...` |
any argument that would be suitable for |

These functions implement the ROAST gene set tests proposed by Wu et al (2010).
They perform *self-contained* gene set tests in the sense defined by Goeman and Buhlmann (2007).
For *competitive* gene set tests, see `camera`

.
For a gene set enrichment analysis style analysis using a database of gene sets, see `romer`

.

`roast`

and `mroast`

test whether any of the genes in the set are differentially expressed.
They can be used for any microarray experiment which can be represented by a linear model.
The design matrix for the experiment is specified as for the `lmFit`

function, and the contrast of interest is specified as for the `contrasts.fit`

function.
This allows users to focus on differential expression for any coefficient or contrast in a linear model.
If `contrast`

is not specified, then the last coefficient in the linear model will be tested.

The argument `index`

is often made using ids2indices.

The argument `gene.weights`

allows directional weights to be set for individual genes in the set.
This is often useful, because it allows each gene to be flagged as to its direction and magnitude of change based on prior experimentation.
A typical use is to make the `gene.weights`

`1`

or `-1`

depending on whether the gene is up or down-regulated in the pathway under consideration.

The arguments `array.weights`

, `block`

and `correlation`

have the same meaning as for the `lmFit`

function.
The arguments `df.prior`

and `var.prior`

have the same meaning as in the output of the `eBayes`

function.
If these arguments are not supplied, they are estimated exactly as is done by `eBayes`

.

The gene set statistics `"mean"`

, `"floormean"`

, `"mean50"`

and `msq`

are defined by Wu et al (2010).
The different gene set statistics have different sensitivities to small number of genes.
If `set.statistic="mean"`

then the set will be statistically significantly only when the majority of the genes are differentially expressed.
`"floormean"`

and `"mean50"`

will detect as few as 25% differentially expressed.
`"msq"`

is sensitive to even smaller proportions of differentially expressed genes, if the effects are reasonably large.

The output gives p-values three possible alternative hypotheses,
`"Up"`

to test whether the genes in the set tend to be up-regulated, with positive t-statistics,
`"Down"`

to test whether the genes in the set tend to be down-regulated, with negative t-statistics,
and `"Mixed"`

to test whether the genes in the set tend to be differentially expressed, without regard for direction.

`roast`

estimates p-values by simulation, specifically by random rotations of the orthogonalized residuals (Langsrud, 2005), so p-values will vary slightly from run to run.
To get more precise p-values, increase the number of rotations `nrot`

.
The p-value is computed as `(b+1)/(nrot+1)`

where `b`

is the number of rotations giving a more extreme statistic than that observed (Phipson and Smyth, 2010).
This means that the smallest possible p-value is `1/(nrot+1)`

.

`mroast`

does roast tests for multiple sets, including adjustment for multiple testing.
By default, `mroast`

reports ordinary p-values but uses mid-p-values (Routledge, 1994) at the multiple testing stage.
Mid-p-values are probably a good choice when using false discovery rates (`adjust.method="BH"`

) but not when controlling the family-wise type I error rate (`adjust.method="holm"`

).

`fry`

is a fast approximation to `mroast`

.
In the special case that `df.prior`

is large and `set.statistic="mean"`

, `fry`

gives the same result as `mroast`

with an infinite number of rotations.
In other circumstances, when genes have different variances, `fry`

uses a standardization strategy to approximate the `mroast`

results.
Using `fry`

may be advisable when performing tests for a large number of sets, because it is fast and because the `fry`

p-values are not limited by the number of rotations performed.

`roast`

produces an object of class `"Roast"`

.
This consists of a list with the following components:

`p.value` |
data.frame with columns |

`var.prior` |
prior value for residual variances. |

`df.prior` |
prior degrees of freedom for residual variances. |

`mroast`

produces a data.frame with a row for each set and the following columns:

`NGenes` |
number of genes in set |

`PropDown` |
proportion of genes in set with |

`PropUp` |
proportion of genes in set with |

`Direction` |
direction of change, |

`PValue` |
two-sided directional p-value |

`FDR` |
two-sided directional false discovery rate |

`PValue.Mixed` |
non-directional p-value |

`FDR.Mixed` |
non-directional false discovery rate |

`fry`

produces the same output format as `mroast`

but without the columns `PropDown`

and `ProbUp`

.

The default setting for the set statistic was changed in limma 3.5.9 (3 June 2010) from `"msq"`

to `"mean"`

.

Gordon Smyth and Di Wu

Goeman, JJ, and Buhlmann, P (2007).
Analyzing gene expression data in terms of gene sets: methodological issues.
*Bioinformatics* 23, 980-987.

Langsrud, O (2005).
Rotation tests.
*Statistics and Computing* 15, 53-60.

Phipson B, and Smyth GK (2010).
Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn.
*Statistical Applications in Genetics and Molecular Biology*, Volume 9, Article 39.
http://www.statsci.org/smyth/pubs/PermPValuesPreprint.pdf

Routledge, RD (1994).
Practicing safe statistics with the mid-p.
*Canadian Journal of Statistics* 22, 103-110.

Wu, D, Lim, E, Francois Vaillant, F, Asselin-Labat, M-L, Visvader, JE, and Smyth, GK (2010). ROAST: rotation gene set tests for complex microarray experiments.
*Bioinformatics* 26, 2176-2182.
http://bioinformatics.oxfordjournals.org/content/26/17/2176

See 10.GeneSetTests for a description of other functions used for gene set testing.

y <- matrix(rnorm(100*4),100,4) design <- cbind(Intercept=1,Group=c(0,0,1,1)) # First set of 5 genes contains 3 that are genuinely differentially expressed index1 <- 1:5 y[index1,3:4] <- y[index1,3:4]+3 # Second set of 5 genes contains none that are DE index2 <- 6:10 roast(y,index1,design,contrast=2) fry(y,list(set1=index1,set2=index2),design,contrast=2)

[Package *limma* version 3.30.8 Index]