RangedData-class {IRanges} | R Documentation |
IMPORTANT NOTE: RangedData objects will be deprecated in BioC 3.7!
The use of RangedData objects has been discouraged in favor of GRanges or GRangesList objects since BioC 2.12, that is, since 2014. The GRanges and GRangesList classes are defined in the GenomicRanges package. See ?GRanges and ?GenomicRanges (after loading the GenomicRanges package) for more information about these classes.
PLEASE MIGRATE YOUR CODE TO USE GRanges OR GRangesList OBJECTS INSTEAD OF RangedData OBJECTS AS SOON AS POSSIBLE. Don't hesitate to ask on the bioc-devel mailing list (https://bioconductor.org/help/support/#bioc-devel) if you need help with this.
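For readers following the migration advice above, a GRanges counterpart of a small multi-space dataset might be sketched as follows. This assumes the GenomicRanges package is installed; the chromosome names and score values are made up for illustration, and the variable names are not part of any API.

```r
library(GenomicRanges)

# Illustrative data: three ranges on two chromosomes (values are made up).
ranges <- IRanges(start = c(1, 2, 3), end = c(4, 5, 6))
chrom  <- c("chr1", "chr2", "chr1")
score  <- c(10L, 2L, NA)

# seqnames plays the role of the RangedData "space"; extra named arguments
# become metadata columns, much like RangedData value columns.
gr <- GRanges(seqnames = chrom, ranges = ranges, score = score)

starts <- start(gr)        # integer starts, in input order
scores <- mcols(gr)$score  # the "score" metadata column

# Splitting by chromosome yields a GRangesList, a list-like per-space view.
grl <- split(gr, seqnames(gr))
```

Unlike RangedData, a GRanges object keeps the ranges in the order they were supplied; the per-chromosome grouping only appears once the object is split into a GRangesList.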
RangedData supports storing data, i.e. a set of variables, on a set of ranges spanning multiple spaces (e.g. chromosomes). Although the data is split across spaces, it can still be treated as one cohesive dataset when desired. RangedData extends DataTable.
A RangedData object consists of two primary components: a RangesList holding the ranges over multiple spaces and a parallel SplitDataFrameList holding the split data. There is also a universe slot for denoting the source (e.g. the genome) of the ranges and/or data.
There are two different modes of interacting with a RangedData. The first mode treats the object as a contiguous "data frame" annotated with range information. The accessors start, end, and width get the corresponding fields in the ranges as atomic integer vectors, undoing the division over the spaces. The [[ and matrix-style [, extraction and subsetting functions unroll the data in the same way. [[<- does the inverse. The number of rows is defined as the total number of ranges and the number of columns is the number of variables in the data. It is often convenient and natural to treat the data this way, at least when the data is small and there is no need to distinguish the ranges by their space.
The other mode is to treat the RangedData as a list, with an element (a virtual Ranges/DataFrame pair) for each space. The length of the object is defined as the number of spaces and the value returned by the names accessor gives the names of the spaces. The list-style [ subset function behaves analogously.
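The two modes can be contrasted on a tiny object. The following is a minimal sketch, assuming an installed IRanges version that still provides RangedData; the space names and values are invented for illustration.

```r
library(IRanges)  # assumes a release that still defines RangedData

# Three ranges, one value column, split over two spaces (made-up data).
rd <- RangedData(
  IRanges(start = c(1, 2, 3), end = c(4, 5, 6)),
  score = c(10L, 2L, NA),
  space = c("chr1", "chr2", "chr1")
)

# Table view: the division into spaces is ignored.
n_rows <- nrow(rd)    # total number of ranges
n_cols <- ncol(rd)    # number of value columns
starts <- start(rd)   # one start per range, across all spaces

# List view: one element per space.
n_spaces    <- length(rd)
space_names <- names(rd)
```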
In the code snippets below, x is a RangedData object.
The following accessors treat the data as a contiguous dataset, ignoring the division into spaces:
Array accessors:
nrow(x): The number of ranges in x.
ncol(x): The number of data variables in x.
dim(x): An integer vector of length two, essentially c(nrow(x), ncol(x)).
rownames(x), rownames(x) <- value: Gets or sets the names of the ranges in x.
colnames(x), colnames(x) <- value: Gets or sets the names of the variables in x.
dimnames(x): A list with two elements, essentially list(rownames(x), colnames(x)).
dimnames(x) <- value: Sets the row and column names, where value is a list as described above.
columnMetadata(x): Get the DataFrame of metadata along the value columns, i.e., where each column in x is represented by a row in the metadata. Note that calling mcols(x) returns the metadata on each space in x.
columnMetadata(x) <- value: Set the DataFrame of metadata for the columns.
within(data, expr, ...): Evaluates expr within data, a RangedData. Any values assigned in expr will be stored as value columns in data, unless they match one of the reserved names: ranges, start, end, width and space. Behavior is undefined if any of the range symbols are modified inconsistently. Modifications to space are ignored.
Range accessors: The type of the return value depends on the type of Ranges; for IRanges, it is an integer vector. Regardless, the number of elements is always equal to nrow(x).
start(x), start(x) <- value: Get or set the starts of the ranges. When setting the starts, value can be an integer vector of length sum(elementNROWS(ranges(x))) or an IntegerList object of length length(ranges(x)) and names names(ranges(x)).
end(x), end(x) <- value: Get or set the ends of the ranges. When setting the ends, value can be an integer vector of length sum(elementNROWS(ranges(x))) or an IntegerList object of length length(ranges(x)) and names names(ranges(x)).
width(x), width(x) <- value: Get or set the widths of the ranges. When setting the widths, value can be an integer vector of length sum(elementNROWS(ranges(x))) or an IntegerList object of length length(ranges(x)) and names names(ranges(x)).
These accessors make the object seem like a list along the spaces:
length(x): The number of spaces (e.g. chromosomes) in x.
names(x), names(x) <- value: Get or set the names of the spaces (e.g. "chr1"). value may be NULL or a character vector of the same length as x.
Other accessors:
universe(x), universe(x) <- value: Get or set the scalar string identifying the scope of the data in some way (e.g. genome, experimental platform, etc). The universe may be NULL.
ranges(x), ranges(x) <- value: Gets or sets the ranges in x as a RangesList.
space(x): Gets the spaces from ranges(x).
values(x), values(x) <- value: Gets or sets the data values in x as a SplitDataFrameList.
score(x), score(x) <- value: Gets or sets the column representing a "score" in x, as a vector. This is the column named score or, if no such column exists, the first column, provided it is numeric. The get method returns NULL if no suitable score column is found. The set method takes a numeric vector as its value.
RangedData(ranges = IRanges(), ..., space = NULL, universe = NULL): Creates a RangedData with the ranges in ranges and variables given by the ... arguments. See the constructor DataFrame for how the ... arguments are interpreted.
If ranges is a Ranges object, the space argument is used to split the data into spaces. If space is NULL, all of the ranges and values are placed into the same space, resulting in a single-space (length one) RangedData object. Otherwise, the ranges and values are split into spaces according to space, which is treated as a factor, like the f argument in split.
If ranges is a RangesList object, then the supplied space argument is ignored and its value is derived from ranges.
If ranges is not a Ranges or RangesList object, this function calls as(ranges, "RangedData") and returns the result if successful.
The universe may be specified as a scalar string by the universe argument.
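The effect of the space argument can be seen by building the same data with and without it. This is a minimal sketch, assuming an installed IRanges version that still provides RangedData; the base R split call is included only as an analogy for how space groups the rows.

```r
library(IRanges)  # assumes a release that still defines RangedData

ranges <- IRanges(start = c(1, 2, 3), end = c(4, 5, 6))
score  <- c(10L, 2L, NA)
chrom  <- c("chr1", "chr2", "chr1")  # made-up space labels

# space = NULL (the default): everything lands in a single space.
one_space <- RangedData(ranges, score)

# space given: ranges and values are grouped by space, as with a factor.
two_spaces <- RangedData(ranges, score, space = chrom)

# Base R analogy: split() groups a plain vector by the same factor.
grouped_scores <- split(score, chrom)
```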
as.data.frame(x, row.names=NULL, optional=FALSE, ...): Copy the start, end, width of the ranges and all of the variables as columns in a data.frame. This is a bridge to existing functionality in R, but of course care must be taken if the data is large. Note that optional and ... are ignored.
as(from, "DataFrame"): Like as.data.frame above, except the result is a DataFrame and it probably involves less copying, especially if there is only a single space.
as(from, "RangedData"): Coerce from to a RangedData, according to the type of from:
Rle, RleList: Converts each run to a range and stores the run values in a column named "score".
RleViewsList: Creates a RangedData using the ranges given by the runs of subject(from) in each of the windows, with a value column score taken as the corresponding subject values.
Ranges: Creates a RangedData with only the ranges in from; no data columns.
RangesList: Creates a RangedData with the ranges in from. Also propagates the inner metadata columns of the RangesList (accessed with mcols(unlist(from))) to the data columns (aka values) of the RangedData. This makes it a lossless coercion and the exact reverse of the coercion from RangedData to RangesList.
data.frame or DataTable: Constructs a RangedData, using the “start”, “end”, and, optionally, “space” columns in from. The other columns become data columns in the result. Any “width” column is ignored.
as(from, "RangesList"): Creates a CompressedIRangesList (a subclass of RangesList) made of the ranges in from. Also propagates the data columns (aka values) of the RangedData to the inner metadata columns of the RangesList. This makes it a lossless coercion and the exact reverse of the coercion from RangesList to RangedData.
as.env(x, enclos = parent.frame()): Creates an environment with a symbol for each variable in the frame, as well as a ranges symbol for the ranges. This is efficient, as no copying is performed.
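A round trip through some of the coercions described above might look like the following sketch. It assumes an installed IRanges version that still provides RangedData; the data frame contents and the column name score are invented for illustration.

```r
library(IRanges)  # assumes a release that still defines RangedData

# A plain data.frame with the columns the coercion looks for
# ("start", "end", optional "space"); "score" becomes a data column.
df <- data.frame(
  start = c(1L, 10L, 5L),
  end   = c(4L, 20L, 9L),
  space = c("chr1", "chr1", "chr2"),
  score = c(0.5, 1.5, 2.5)
)

rd <- as(df, "RangedData")

# Back to flat tables: a base data.frame and an S4 DataFrame.
flat_df <- as.data.frame(rd)
flat_DF <- as(rd, "DataFrame")
```

The flat results carry the range fields alongside the value columns, so they hold more columns than the single score variable stored in rd.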
In the code snippets below, x is a RangedData object.
x[i]: Subsets x by indexing into its spaces, so the result is of the same class, with a different set of spaces. i can be numerical, logical, NULL or missing.
x[i,j]: Subsets x by indexing into its rows and columns. The result is of the same class, with a different set of rows and columns. The row index i can either treat x as a flat table by being a character, integer, or logical vector or treat x as a partitioned table by being a RangesList, LogicalList, or IntegerList of the same length as x.
x[[i]]: Extracts a variable from x, where i can be a character, numeric, or logical scalar that indexes into the columns. The variable is unlisted over the spaces. For convenience, values of "space" and "ranges" are equivalent to space(x) and unlist(ranges(x)) respectively.
x$name: similar to above, where name is taken literally as a column name in the data.
x[[i]] <- value: Sets value as column i in x, where i can be a character, numeric, or logical scalar that indexes into the columns. The length of value should equal nrow(x). x[[i]] should be identical to value after this operation. For convenience, i="ranges" is equivalent to ranges(x) <- value.
x$name <- value: similar to above, where name is taken literally as a column name in the data.
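The list-style and matrix-style forms described above can be compared side by side. The following is a minimal sketch, assuming an installed IRanges version that still provides RangedData; the column names score, filter, and flag are illustrative only.

```r
library(IRanges)  # assumes a release that still defines RangedData

rd <- RangedData(
  IRanges(start = c(1, 2, 3), end = c(4, 5, 6)),
  score  = c(10L, 2L, NA),
  filter = c(1L, 0L, 1L),
  space  = c("chr1", "chr2", "chr1")
)

first_space <- rd[1]            # list style: keep only the first space
rows_cols   <- rd[1:2, "score"] # matrix style: two rows, one column
scores      <- rd[["score"]]    # one variable, unlisted over the spaces

# Add a derived column; its length must equal nrow(rd).
rd$flag <- rd$filter == 1L
```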
In the code snippets below, x is a RangedData object.
rbind(...): Matches the spaces from the RangedData objects in ... by name and combines them row-wise.
c(x, ..., recursive = FALSE): Combines x with arguments specified in ..., which must all be RangedData objects. This combination acts as if x is a list of spaces, meaning that the result will contain the spaces of the first concatenated with the spaces of the second, and so on. This function is useful when creating RangedData objects on a space-by-space basis and then needing to combine them.
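The difference between c and rbind can be sketched on a two-space object. This assumes an installed IRanges version that still provides RangedData; the data are invented for illustration.

```r
library(IRanges)  # assumes a release that still defines RangedData

rd <- RangedData(
  IRanges(start = c(1, 2, 3), end = c(4, 5, 6)),
  score = c(10L, 2L, NA),
  space = c("chr1", "chr2", "chr1")
)

# c(): concatenate along the spaces, here rebuilding rd from its pieces.
rebuilt <- c(rd[1], rd[2])

# rbind(): match spaces by name and stack the rows within each space.
stacked <- rbind(rd, rd)
```

With c the number of spaces adds up, while with rbind the spaces are matched by name and only the rows accumulate.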
An lapply method is provided to apply a function over the spaces of a RangedData:
lapply(X, FUN, ...): Applies FUN to each space in X with extra parameters in ... .
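As an illustration of applying a function per space, the sketch below counts the ranges in each space. It assumes an installed IRanges version that still provides RangedData and that FUN receives each space as a single-space RangedData, which is how the examples on this page use it.

```r
library(IRanges)  # assumes a release that still defines RangedData

rd <- RangedData(
  IRanges(start = c(1, 2, 3), end = c(4, 5, 6)),
  score = c(10L, 2L, NA),
  space = c("chr1", "chr2", "chr1")
)

# One result per space: the number of ranges it holds.
rows_per_space <- lapply(rd, nrow)

# Extra arguments are passed on to FUN: extract column 1 from each space.
first_columns <- lapply(rd, `[[`, 1)
```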
Author(s): Michael Lawrence
See also: DataTable, the parent of this class, with more utilities.
ranges <- IRanges(c(1,2,3),c(4,5,6))
filter <- c(1L, 0L, 1L)
score <- c(10L, 2L, NA)

## constructing RangedData instances

## no variables
rd <- RangedData()
rd <- RangedData(ranges)
ranges(rd)

## one variable
rd <- RangedData(ranges, score)
rd[["score"]]

## multiple variables
rd <- RangedData(ranges, filter, vals = score)
rd[["vals"]] # same as rd[["score"]] above
rd$vals
rd[["filter"]]
rd <- RangedData(ranges, score + score)
rd[["score...score"]] # names made valid

## split some data over chromosomes
range2 <- IRanges(start=c(15,45,20,1), end=c(15,100,80,5))
both <- c(ranges, range2)
score <- c(score, c(0L, 3L, NA, 22L))
filter <- c(filter, c(0L, 1L, NA, 0L))
chrom <- paste("chr", rep(c(1,2), c(length(ranges), length(range2))), sep="")
rd <- RangedData(both, score, filter, space = chrom)
rd[["score"]] # identical to score
rd[1][["score"]] # identical to score[1:3]

## subsetting

## list style: [i]
rd[numeric()] # these three are all empty
rd[logical()]
rd[NULL]
rd[] # missing, full instance returned
rd[FALSE] # logical, supports recycling
rd[c(FALSE, FALSE)] # same as above
rd[TRUE] # like rd[]
rd[c(TRUE, FALSE)]
rd[1] # numeric index
rd[c(1,2)]
rd[-2]

## matrix style: [i,j]
rd[,NULL] # no columns
rd[NULL,] # no rows
rd[,1]
rd[,1:2]
rd[,"filter"]
rd[1,] # now by the rows
rd[c(1,3),]
rd[1:2, 1] # row and column
rd[c(1:2,1,3),1] ## repeating rows

## dimnames
colnames(rd)[2] <- "foo"
colnames(rd)
rownames(rd) <- head(letters, nrow(rd))
rownames(rd)

## space names
names(rd)
names(rd)[1] <- "chr1"

## variable replacement
count <- c(1L, 0L, 2L)
rd <- RangedData(ranges, count, space = c(1, 2, 1))
## adding a variable
score <- c(10L, 2L, NA)
rd[["score"]] <- score
rd[["score"]] # same as 'score'
## replacing a variable
count2 <- c(1L, 1L, 0L)
rd[["count"]] <- count2
## numeric index also supported
rd[[2]] <- score
rd[[2]] # gets 'score'
## removing a variable
rd[[2]] <- NULL
ncol(rd) # is only 1
rd$score2 <- score

## combining
rd <- RangedData(ranges, score, space = c(1, 2, 1))
c(rd[1], rd[2]) # equal to 'rd'
rd2 <- RangedData(ranges, score)

## applying
lapply(rd, `[[`, 1) # get first column in each space