IPos-class {IRanges} | R Documentation |
The IPos class is a container for storing a set of integer positions where most of the positions are typically (but not necessarily) adjacent. Because integer positions can be seen as integer ranges of width 1, the IPos class extends the Ranges virtual class. Note that even though an IRanges object can be used for storing integer positions, using an IPos object will be much more memory-efficient, especially when the object contains long runs of adjacent positions in ascending order.
IPos(pos_runs) # constructor function
pos_runs |
An IRanges object (or any other Ranges derivative) where
each range is interpreted as a run of adjacent ascending positions.
If |
An IPos object.
IPos objects support the same set of getters as other Ranges
derivatives (i.e. start()
, end()
, mcols()
, etc...),
plus the pos()
getter which is equivalent to start()
or end()
. See ?Ranges
for the list of getters
supported by Ranges derivatives.
IMPORTANT NOTE: An IPos object cannot hold names i.e. names()
always returns NULL
on it.
IPos objects support the mcols()
and metadata()
setters
only.
From Ranges to IPos:
A Ranges derivative x
in which all the ranges have a
width of 1 can be coerced to an IPos object with as(x, "IPos")
.
The names on x
are not propagated (a warning is issued if x
has names on it).
From IPos to IRanges:
An IPos object x
can be coerced to an IRanges object
with as(x, "IRanges")
. However be aware that the resulting object
can use thousands times (or more) memory than x
!
See "MEMORY USAGE" in the Examples section below.
From IPos to ordinary R objects:
Like with any other Ranges derivative, as.character()
,
as.factor()
, and as.data.frame()
work on an IPos object
x
. Note however that as.data.frame(x)
returns a data frame
with a pos
column (containing pos(x)
) instead of the
start
, end
, and width
columns that one gets with other
Ranges derivatives.
An IPos object can be subsetted exactly like an IRanges object.
IPos objects can be combined (a.k.a. appended) with c()
or
append()
.
Like with an IRanges object, split()
and relist()
work
on an IPos object.
Like for any Vector derivative, the length of an
IPos object cannot exceed .Machine$integer.max
(i.e. 2^31 on
most platforms). IPos()
will return an error if pos_runs
contains too many integer positions.
Hervé Pagès; based on ideas borrowed from Georg Stricker georg.stricker@in.tum.de and Julien Gagneur gagneur@in.tum.de
The GPos class in the GenomicRanges package for a memory-efficient representation of genomic positions (i.e. genomic ranges of width 1).
Ranges-comparison for comparing and ordering integer ranges and/or positions.
findOverlaps-methods for finding overlapping integer ranges and/or positions.
nearest-methods for finding the nearest integer range and/or position.
## --------------------------------------------------------------------- ## BASIC EXAMPLES ## --------------------------------------------------------------------- ## Example 1: ipos1 <- IPos(c("44-53", "5-10", "2-5")) ipos1 length(ipos1) pos(ipos1) # same as 'start(ipos1)' and 'end(ipos1)' as.character(ipos1) as.data.frame(ipos1) as(ipos1, "IRanges") as.data.frame(as(ipos1, "IRanges")) ipos1[9:17] ## Example 2: pos_runs <- IRanges(c(1, 6, 12, 17), c(5, 10, 16, 20)) ipos2 <- IPos(pos_runs) ipos2 ## Example 3: ipos3A <- ipos3B <- IPos(c("1-15000", "15400-88700")) npos <- length(ipos3A) mcols(ipos3A)$sample <- Rle("sA") sA_counts <- sample(10, npos, replace=TRUE) mcols(ipos3A)$counts <- sA_counts mcols(ipos3B)$sample <- Rle("sB") sB_counts <- sample(10, npos, replace=TRUE) mcols(ipos3B)$counts <- sB_counts ipos3 <- c(ipos3A, ipos3B) ipos3 ## --------------------------------------------------------------------- ## MEMORY USAGE ## --------------------------------------------------------------------- ## Coercion to IRanges works... ipos4 <- IPos(c("1-125000", "135000-575000")) ir4 <- as(ipos4, "IRanges") ir4 ## ... but is generally not a good idea: object.size(ipos4) object.size(ir4) # 1739 times bigger than the IPos object! ## Shuffling the order of the positions impacts memory usage: ipos4s <- sample(ipos4) object.size(ipos4s) ## AN IMPORTANT NOTE: In the worst situations, IPos still performs as ## good as an IRanges object. object.size(as(ipos4s, "IRanges")) # same size as 'ipos4s' ## Best case scenario is when the object is strictly sorted (i.e. ## positions are in strict ascending order). ## This can be checked with: is.unsorted(ipos4, strict=TRUE) # 'ipos4' is strictly sorted ## --------------------------------------------------------------------- ## USING MEMORY-EFFICIENT METADATA COLUMNS ## --------------------------------------------------------------------- ## In order to keep memory usage as low as possible, it is recommended ## to use a memory-efficient representation of the metadata columns that ## we want to set on the object. Rle's are particularly well suited for ## this, especially if the metadata columns contain long runs of ## identical values. This is the case for example if we want to use an ## IPos object to represent the coverage of sequencing reads along a ## chromosome. ## Example 5: library(pasillaBamSubset) library(Rsamtools) # for the BamFile() constructor function bamfile1 <- BamFile(untreated1_chr4()) bamfile2 <- BamFile(untreated3_chr4()) ipos5 <- IPos(IRanges(1, seqlengths(bamfile1)[["chr4"]])) library(GenomicAlignments) # for "coverage" method for BamFile objects cov1 <- coverage(bamfile1)$chr4 cov2 <- coverage(bamfile2)$chr4 mcols(ipos5) <- DataFrame(cov1, cov2) ipos5 object.size(ipos5) # lightweight ## Keep only the positions where coverage is at least 10 in one of the ## 2 samples: ipos5[mcols(ipos5)$cov1 >= 10 | mcols(ipos5)$cov2 >= 10]