vsn2 {vsn}    R Documentation
vsn2 fits the vsn model to the data in x and returns a vsn object with the fit parameters and the transformed data matrix.
The data are, typically, feature intensity readings from a
microarray, but this function may also be useful for other kinds of
intensity data that obey an additive-multiplicative error model.
To obtain an object of the same class as x, containing the normalised data and the same metadata as x, use

    fit = vsn2(x, ...)
    nx = predict(fit, newdata=x)

or the wrapper justvsn.
Please see the vignette Introduction to vsn.
    vsnMatrix(x, reference, strata, lts.quantile = 0.9, subsample = 0L,
      verbose = interactive(), returnData = TRUE, calib = "affine", pstart,
      minDataPointsPerStratum = 42L, optimpar = list(),
      defaultpar = list(factr=5e7, pgtol=2e-4, maxit=60000L,
                        trace=0L, cvg.niter=7L, cvg.eps=0))

    ## S4 method for signature 'ExpressionSet'
    vsn2(x, reference, strata, ...)

    ## S4 method for signature 'AffyBatch'
    vsn2(x, reference, strata, subsample, ...)

    ## S4 method for signature 'NChannelSet'
    vsn2(x, reference, strata, backgroundsubtract=FALSE,
         foreground=c("R","G"), background=c("Rb", "Gb"), ...)

    ## S4 method for signature 'RGList'
    vsn2(x, reference, strata, ...)
x: An object containing the data to which the model is fitted.
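reference: Optional, a vsn object from a previous fit. If it is specified, the model parameters μ_k and σ are taken from it rather than fitted to x; see Details.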
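strata: Optional, a factor or integer vector of length nrow(x) that assigns each row (feature) of x to a stratum; the offsets a and scaling parameters b are fitted separately for each stratum (see Details).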
lts.quantile: Numeric of length 1. The quantile used for the resistant least trimmed sum of squares regression. Allowed values are between 0.5 and 1. A value of 1 corresponds to ordinary least squares regression.
subsample: Integer of length 1. If its value is greater than 0, the model parameters are estimated from a subsample of the data of size subsample only; the fitted transformation is then applied to all data.
backgroundsubtract: Logical of length 1: should local background estimates be subtracted before fitting vsn?
foreground, background: Aligned character vectors of the same length, naming the channels of x that contain the foreground and the background intensities, respectively.
verbose: Logical. If TRUE, some messages are printed.
returnData: Logical. If TRUE, the transformed data are returned in a slot of the resulting vsn object.
calib: Character of length 1 selecting the calibration model; the default is "affine".
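pstart: Optional, a three-dimensional numeric array that specifies start values for the iterative parameter estimation algorithm. If not specified, the function tries to guess useful start values. The first dimension corresponds to the levels of strata, the second to the arrays (columns of x), and the third to the two transformation parameters, the offset a and the scaling parameter b.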
minDataPointsPerStratum: The minimum number of data points per stratum. Normally there is no need for the user to change this; refer to the vignette for further documentation.
optimpar: Optional, a list with parameters for the likelihood optimisation algorithm. Default parameters are taken from defaultpar; see Details.
defaultpar: The default parameters for the likelihood optimisation algorithm. Values in optimpar take precedence over these; see Details.
...: Arguments that get passed on to vsnMatrix.
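To illustrate the idea behind lts.quantile and cvg.niter, here is a generic least trimmed squares sketch for a straight-line fit in base R: it repeatedly refits on the fraction lts.quantile of points with the smallest squared residuals (so-called concentration steps). This is only an illustration of the technique under simplifying assumptions, not the vsn implementation, which applies the trimming to its own model fit; the function name lts_line and the data are hypothetical.

```r
# Hypothetical helper: least trimmed squares for y ~ intercept + slope * x,
# refitting niter times on the h points with the smallest squared residuals.
lts_line <- function(x, y, quantile = 0.9, niter = 7L) {
  stopifnot(quantile >= 0.5, quantile <= 1)
  X <- cbind(1, x)
  h <- floor(quantile * length(y))   # number of points kept in each fit
  keep <- seq_along(y)               # first pass: ordinary least squares
  for (i in seq_len(niter)) {
    beta <- lm.fit(X[keep, , drop = FALSE], y[keep])$coefficients
    r2 <- as.vector((y - X %*% beta)^2)   # squared residuals for all points
    keep <- order(r2)[seq_len(h)]         # retain the h smallest
  }
  # final refit so that the coefficients correspond to the retained subset
  beta <- lm.fit(X[keep, , drop = FALSE], y[keep])$coefficients
  names(beta) <- c("intercept", "slope")
  beta
}

x <- 1:100
y <- 1 + 2 * x
y[91:100] <- y[91:100] + 1000   # 10 % gross outliers

coef(lm(y ~ x))                  # ordinary least squares, pulled by the outliers
lts_line(x, y, quantile = 0.9)   # trimmed fit recovers intercept 1 and slope 2
```

With quantile = 1 every point is kept in every step, so the result coincides with ordinary least squares, matching the description of lts.quantile above.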
An object of class vsn.
The data are returned on a glog scale to base 2. More precisely, the transformed data are subject to the transformation $\mathrm{glog}_2(f(b)\,x + a) + c$, where the function $\mathrm{glog}_2(u) = \log_2\bigl(u + \sqrt{u^2 + 1}\bigr) = \operatorname{asinh}(u)/\ln 2$ is called the generalised logarithm, the offset $a$ and the scaling parameter $b$ are the fitted model parameters (see references), and $f(b) = \exp(b)$ is a parameter transformation that ensures the factor in front of $x$ is positive while allowing an unconstrained optimisation over $b$ [4].
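A minimal base-R sketch of this transformation, assuming a single array and stratum with scalar parameters a, b and c; the names glog2 and vsn_trsf are hypothetical helpers, not part of the vsn API. It checks that the asinh form matches the explicit logarithm and that, unlike log2, glog2 is finite at zero and for negative values.

```r
# Generalised logarithm to base 2: log2(u + sqrt(u^2 + 1)) = asinh(u) / log(2)
glog2 <- function(u) asinh(u) / log(2)

# Hypothetical helper (not the vsn API): glog2(f(b) * x + a) + c with f = exp,
# so any real b yields a positive scaling factor exp(b)
vsn_trsf <- function(x, a, b, c = 0) glog2(exp(b) * x + a) + c

u <- c(-10, -1, 0, 1, 10, 1e4)
# The asinh form agrees with the explicit logarithm form
stopifnot(isTRUE(all.equal(glog2(u), log2(u + sqrt(u * u + 1)))))

# For large u, glog2(u) is close to log2(2 * u)
print(glog2(1e4) - log2(2e4))
print(vsn_trsf(c(0, 10, 1000), a = 0.5, b = -1))
```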
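The overall offset $c$ is computed from the $b$'s such that for large $x$ the transformation approximately corresponds to the $\log_2$ function. This is done separately for each stratum, but with the same value across arrays. More precisely, if the element b[s,i] of the array b is the scaling parameter for the $s$-th stratum and the $i$-th array, then $c_s$ is computed as

$$c_s = -\log_2\bigl(2\, f(\bar b_s)\bigr), \qquad \bar b_s = \mathrm{mean}(b[s,]),$$

because $\mathrm{glog}_2(u) \approx \log_2(2u)$ for large $u$, so that $\mathrm{glog}_2(f(b)\,x + a) \approx \log_2 x + \log_2(2 f(b))$.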
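The offset c is inconsequential for all differential expression calculations, since it is the same for all arrays within a stratum and therefore cancels in between-array differences; but many users like to see the data in a range they are familiar with.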
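The following base-R sketch computes the per-stratum offset as described, assuming b is stored as a strata × arrays matrix (the b[s,i] indexing above), f = exp, and a = 0; the parameter values are hypothetical. It checks that the result is close to log2(x) for large x when an array's b equals the stratum mean, and that the offset drops out of between-array differences.

```r
glog2 <- function(u) asinh(u) / log(2)   # generalised logarithm, base 2

# Hypothetical scaling parameters b[s, i]: rows = strata, columns = arrays
b <- matrix(c(0.0, 0.2, -0.2,
              1.0, 1.1,  0.9),
            nrow = 2, byrow = TRUE)

# One overall offset per stratum, shared by all arrays: average b over the
# arrays of stratum s, then remove the large-x shift log2(2 * f(b)), f = exp
c_s <- -log2(2 * exp(rowMeans(b)))

# Stratum 1 has mean b = 0. For array 1 (b = 0) with a = 0 and large x,
# glog2(f(b) * x + a) + c is close to log2(x)
x <- 1e6
h11 <- glog2(exp(b[1, 1]) * x) + c_s[1]
print(c(h11 = h11, log2x = log2(x)))

# c_s is identical for all arrays of a stratum, so it cancels in
# between-array differences
h12 <- glog2(exp(b[1, 2]) * x) + c_s[1]
d_with    <- h12 - h11
d_without <- glog2(exp(b[1, 2]) * x) - glog2(exp(b[1, 1]) * x)
```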
vsn2 methods exist for ExpressionSet, NChannelSet, AffyBatch (from the affy package), RGList (from the limma package), matrix and numeric.
If x is an NChannelSet, then vsn2 is applied to the matrix obtained by horizontally concatenating the color channels. Optionally, available background estimates can be subtracted beforehand (argument backgroundsubtract).
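As a rough illustration of the NChannelSet case, the sketch below uses plain matrices standing in for the R, G, Rb and Gb channels (the default foreground and background names) and builds the single matrix to which the model would be fitted. The helper concat_channels, the data, and the column order (all arrays of the first channel, then all arrays of the second) are assumptions for illustration; the actual method operates on NChannelSet objects.

```r
# Hypothetical helper: combine colour channels into one matrix,
# optionally subtracting the aligned background channel first.
concat_channels <- function(fg, bg = NULL, backgroundsubtract = FALSE) {
  if (backgroundsubtract) {
    stopifnot(!is.null(bg), length(fg) == length(bg))
    fg <- Map(`-`, fg, bg)   # foreground minus its aligned background
  }
  do.call(cbind, fg)         # horizontal concatenation of the channels
}

set.seed(1)
n <- 5; arrays <- 2
R  <- matrix(rpois(n * arrays, 1000), n, arrays)
G  <- matrix(rpois(n * arrays, 800),  n, arrays)
Rb <- matrix(50, n, arrays)
Gb <- matrix(40, n, arrays)

m <- concat_channels(list(R = R, G = G), list(Rb = Rb, Gb = Gb),
                     backgroundsubtract = TRUE)
dim(m)  # 5 rows, 4 columns (2 arrays x 2 channels)
```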
If x is an RGList, it is converted into an NChannelSet using a copy of Martin Morgan's code for RGList to NChannelSet coercion, and then the NChannelSet method is called.
If the reference argument is not specified, then the model parameters μ_k and σ are fitted from the data in x. This is the mode of operation described in [1], and it was the only option in versions 1.x of this package. If reference is specified, the model parameters μ_k and σ are taken from it. This allows for 'incremental' normalization [4].
L-BFGS-B [3] uses three termination criteria:

- $(f_k - f_{k+1}) / \max(|f_k|, |f_{k+1}|, 1) \le \mathtt{factr} \cdot \epsilon_{\mathrm{mch}}$, where $\epsilon_{\mathrm{mch}}$ (epsmch) is the machine precision;
- $|\text{gradient}| < \mathtt{pgtol}$;
- number of iterations $> \mathtt{maxit}$.

These are set by the elements factr, pgtol and maxit of optimpar. The remaining elements are:

trace: An integer between 0 and 6 indicating the verbosity level of L-BFGS-B; higher values create more output.
cvg.niter: The number of iterations used in the least trimmed sum of squares regression.
cvg.eps: Numeric. A convergence threshold for the least trimmed sum of squares regression.
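R's stats::optim exposes the same L-BFGS-B controls under the same names (factr, pgtol, maxit, trace), which makes it a convenient way to see what the defaultpar values mean; whether vsn calls optim or the underlying routine directly is not stated here. The sketch below computes the relative-reduction tolerance factr * epsmch (about 1.1e-8 for factr = 5e7) and runs L-BFGS-B on a toy quadratic that is hypothetical and unrelated to the vsn likelihood.

```r
# Relative-reduction tolerance of the first termination criterion
factr <- 5e7
tol <- factr * .Machine$double.eps   # about 1.1e-8

# Toy objective with its minimum at (1, -2); not the vsn likelihood
fn <- function(p) sum((p - c(1, -2))^2)
gr <- function(p) 2 * (p - c(1, -2))

res <- optim(par = c(0, 0), fn = fn, gr = gr, method = "L-BFGS-B",
             control = list(factr = factr, pgtol = 2e-4,
                            maxit = 60000L, trace = 0L))
res$par          # close to c(1, -2)
res$convergence  # 0 indicates successful termination
```

For comparison, optim's own default for factr is 1e7, so the vsn default of 5e7 stops somewhat earlier on the relative-reduction criterion.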
Wolfgang Huber
[1] Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka and Martin Vingron; Bioinformatics (2002) 18 Suppl. 1, S96-S104.
[2] Parameter estimation for the calibration and variance stabilization of microarray data, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin Vingron; Statistical Applications in Genetics and Molecular Biology (2003) Vol. 2 No. 1, Article 3. http://www.bepress.com/sagmb/vol2/iss1/art3.
[3] L-BFGS-B: Fortran Subroutines for Large-Scale Bound Constrained Optimization, C. Zhu, R.H. Byrd, P. Lu and J. Nocedal, Technical Report, Northwestern University (1996).
[4] Package vignette: Likelihood Calculations for vsn
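    library("vsn")                        ## provides vsn2, predict method and the kidney data
    data("kidney")
    fit = vsn2(kidney)                    ## fit
    nkid = predict(fit, newdata=kidney)   ## apply fit
    plot(exprs(nkid), pch=".")            ## the two arrays against each other
    abline(a=0, b=1, col="red")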