genas {limma} | R Documentation |
Calculates biological correlation between two gene expression profiles.
genas(fit, coef=c(1,2), subset="all", plot=FALSE, alpha=0.4)
fit |
an |
coef |
numeric vector of length 2 indicating which columns in the fit object are to be correlated. |
subset |
character string indicating which subset of genes to include in the correlation analysis.
Choices are |
plot |
logical, should a scatterplot be produced summarizing the correlation analysis? |
alpha |
numeric value between 0 and 1 determining the transparency of the technical and biological ellipses if a plot is produced.
|
The function estimates the biological correlation between two different contrasts in a linear model. By biological correlation, we mean the correlation that would exist between the log2-fold changes (logFC) for the two contrasts, if measurement error could be eliminated and the true log-fold-changes were known. This function is motivated by the fact that different contrasts for a linear model are often strongly correlated in a technical sense. For example, the estimated logFC for multiple treatment conditions compared back to the same control group will be positively correlated even in the absence of any biological effect. This function aims to separate the biological from the technical components of the correlation. The method is explained briefly in Majewski et al (2010) and in full detail in Phipson (2013).
The subset
argument specifies whether and how the fit object should be subsetted.
Ideally, only genes that are truly differentially expressed for one or both of the contrasts should be used estimate the biological correlation.
The default is "all"
, which uses all genes in the fit object to estimate the biological correlation.
The option "Fpval"
chooses genes based on how many F-test p-values are estimated to be truly significant using the function propTrueNull
.
This should capture genes that display any evidence of differential expression in either of the two contrasts.
The options "p.union"
and "p.int"
are based on the moderated t p-values from both contrasts.
From the propTrueNull
function an estimate of the number of p-values truly significant in either of the two contrasts can be obtained.
"p.union" takes the union of these genes and "p.int"
takes the intersection of these genes.
The other options, "logFC"
and "predFC"
subsets on genes that attain a logFC or predFC at least as large as the 90th percentile of the log fold changes or predictive log fold changes on the absolute scale.
The plot
option is a logical argument that specifies whether or not to plot a scatter plot of log-fold-changes for the two contrasts.
The biological and technical correlations are overlaid on the scatterplot using semi-transparent ellipses.
library(ellipse)
is required to enable the plotting of ellipses.
genas
produces a list with the following components:
technical.correlation |
estimate of the technical correlation |
biological.correlation |
estimate of the biological correlation |
covariance.matrix |
estimate of the covariance matrix from which the biological correlation is obtained |
deviance |
the likelihood ratio test statistic used to test whether the biological correlation is equal to 0 |
p.value |
the p.value associated with |
n |
the number of genes used to estimate the biological correlation |
As present, genas
assumes that technical correlations between coefficients are the same for all genes, and hence it only works with fit objects that were created without observation weights or missing values.
It does not work with voom
pipelines, because these involve observation weights.
Belinda Phipson and Gordon Smyth
Majewski, IJ, Ritchie, ME, Phipson, B, Corbin, J, Pakusch, M, Ebert, A, Busslinger, M, Koseki, H, Hu, Y, Smyth, GK, Alexander, WS, Hilton, DJ, and Blewitt, ME (2010). Opposing roles of polycomb repressive complexes in hematopoietic stem and progenitor cells. Blood 116, 731-739. http://www.bloodjournal.org/content/116/5/731
Phipson, B. (2013). Empirical Bayes modelling of expression profiles and their associations. PhD Thesis. University of Melbourne, Australia. http://repository.unimelb.edu.au/10187/17614
Ritchie, ME, Phipson, B, Wu, D, Hu, Y, Law, CW, Shi, W, and Smyth, GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47. http://nar.oxfordjournals.org/content/43/7/e47
# Simulate gene expression data # Three conditions (Control, A and B) and 1000 genes ngene <- 1000 mu.A <- mu.B <- mu.ctrl <- rep(5,ngene) # 200 genes are differentially expressed. # All are up in condition A and down in B # so the biological correlation is negative. mu.A[1:200] <- mu.ctrl[1:200]+2 mu.B[1:200] <- mu.ctrl[1:200]-2 # Two microarrays for each condition mu <- cbind(mu.ctrl,mu.ctrl,mu.A,mu.A,mu.B,mu.B) y <- matrix(rnorm(6000,mean=mu,sd=1),ngene,6) # two experimental groups and one control group with two replicates each group <- factor(c("Ctrl","Ctrl","A","A","B","B"), levels=c("Ctrl","A","B")) design <- model.matrix(~group) # fit a linear model fit <- lmFit(y,design) fit <- eBayes(fit) # Estimate biological correlation between the logFC profiles # for A-vs-Ctrl and B-vs-Ctrl genas(fit, coef=c(2,3), plot=TRUE, subset="F")