R: Comparing and ordering ranges

Ranges-comparison {IRanges}

R Documentation

Comparing and ordering ranges

Description

Methods for comparing and/or ordering Ranges objects.

Usage

## match() & selfmatch()
## ---------------------

## S4 method for signature 'Ranges,Ranges'
match(x, table, nomatch=NA_integer_, incomparables=NULL,
      method=c("auto", "quick", "hash"))

## S4 method for signature 'Ranges'
selfmatch(x, method=c("auto", "quick", "hash"))

## order() and related methods
## ----------------------------

## S4 method for signature 'Ranges'
is.unsorted(x, na.rm=FALSE, strictly=FALSE)

## S4 method for signature 'Ranges'
order(..., na.last=TRUE, decreasing=FALSE, method=c("auto", "shell", "radix"))

## Generalized parallel comparison of 2 Ranges objects
## ---------------------------------------------------

## S4 method for signature 'Ranges,Ranges'
pcompare(x, y)

rangeComparisonCodeToLetter(code)

Arguments

`x, table, y`	Ranges objects.
`nomatch`	The value to be returned in the case when no match is found. It is coerced to an `integer`.
`incomparables`	Not supported.
`method`	For `match` and `selfmatch`: Use a Quicksort-based (`method="quick"`) or a hash-based (`method="hash"`) algorithm. The latter tends to give better performance, except maybe for some pathological input that we've not encountered so far. When `method="auto"` is specified, the most efficient algorithm will be used, that is, the hash-based algorithm if `length(x) <= 2^29`, otherwise the Quicksort-based algorithm. For `order`: The `method` argument is ignored.
`na.rm`	Ignored.
`strictly`	Logical indicating if the check should be for strictly increasing values.
`...`	One or more Ranges objects. The Ranges objects after the first one are used to break ties.
`na.last`	Ignored.
`decreasing`	`TRUE` or `FALSE`.
`code`	A vector of codes as returned by `pcompare`.

Details

Two elements of a Ranges derivative (i.e. two integer ranges) are considered equal iff they share the same start and width. duplicated() and unique() on a Ranges derivative are conforming to this.

Note that with this definition, 2 empty ranges are generally not equal (they need to share the same start to be considered equal). This means that, when it comes to comparing ranges, an empty range is interpreted as a position between its end and start. For example, a typical usecase is comparison of insertion points defined along a string (like a DNA sequence) and represented as empty ranges.

The "natural order" for the elements of a Ranges derivative is to order them (a) first by start and (b) then by width. This way, the space of integer ranges is totally ordered.

pcompare(), ==, !=, <=, >=, < and > on Ranges derivatives behave accordingly to this "natural order".

is.unsorted(), order(), sort(), rank() on Ranges derivatives also behave accordingly to this "natural order".

Finally, note that some inter range transformations like reduce or disjoin also use this "natural order" implicitly when operating on Ranges derivatives.

pcompare(x, y): Performs element-wise (aka "parallel") comparison of 2 Ranges objects of x and y, that is, returns an integer vector where the i-th element is a code describing how x[i] is qualitatively positioned with respect to y[i].

Here is a summary of the 13 predefined codes (and their letter equivalents) and their meanings:

      -6 a: x[i]: .oooo.......         6 m: x[i]: .......oooo.
            y[i]: .......oooo.              y[i]: .oooo.......

      -5 b: x[i]: ..oooo......         5 l: x[i]: ......oooo..
            y[i]: ......oooo..              y[i]: ..oooo......

      -4 c: x[i]: ...oooo.....         4 k: x[i]: .....oooo...
            y[i]: .....oooo...              y[i]: ...oooo.....

      -3 d: x[i]: ...oooooo...         3 j: x[i]: .....oooo...
            y[i]: .....oooo...              y[i]: ...oooooo...

      -2 e: x[i]: ..oooooooo..         2 i: x[i]: ....oooo....
            y[i]: ....oooo....              y[i]: ..oooooooo..

      -1 f: x[i]: ...oooo.....         1 h: x[i]: ...oooooo...
            y[i]: ...oooooo...              y[i]: ...oooo.....

                      0 g: x[i]: ...oooooo...
                           y[i]: ...oooooo...

Note that this way of comparing ranges is a refinement over the standard ranges comparison defined by the ==, !=, <=, >=, < and > operators. In particular a code that is < 0, = 0, or > 0, corresponds to x[i] < y[i], x[i] == y[i], or x[i] > y[i], respectively.

The pcompare method for Ranges objects is guaranteed to return predefined codes only but methods for other objects (e.g. for GenomicRanges objects) can return non-predefined codes. Like for the predefined codes, the sign of any non-predefined code must tell whether x[i] is less than, or greater than y[i].

rangeComparisonCodeToLetter(x): Translate the codes returned by pcompare. The 13 predefined codes are translated as follow: -6 -> a; -5 -> b; -4 -> c; -3 -> d; -2 -> e; -1 -> f; 0 -> g; 1 -> h; 2 -> i; 3 -> j; 4 -> k; 5-> l; 6 -> m. Any non-predefined code is translated to X. The translated codes are returned in a factor with 14 levels: a, b, ..., l, m, X.

match(x, table, nomatch=NA_integer_, method=c("auto", "quick", "hash")): Returns an integer vector of the length of x, containing the index of the first matching range in table (or nomatch if there is no matching range) for each range in x.

selfmatch(x, method=c("auto", "quick", "hash")): Equivalent to, but more efficient than, match(x, x, method=method).

duplicated(x, fromLast=FALSE, method=c("auto", "quick", "hash")): Determines which elements of x are equal to elements with smaller subscripts, and returns a logical vector indicating which elements are duplicates. duplicated(x) is equivalent to, but more efficient than, duplicated(as.data.frame(x)) on a Ranges object. See duplicated in the base package for more details.

unique(x, fromLast=FALSE, method=c("auto", "quick", "hash")): Removes duplicate ranges from x. unique(x) is equivalent to, but more efficient than, unique(as.data.frame(x)) on a Ranges object. See unique in the base package for more details.

x %in% table: A shortcut for finding the ranges in x that match any of the ranges in table. Returns a logical vector of length equal to the number of ranges in x.

findMatches(x, table, method=c("auto", "quick", "hash")): An enhanced version of match that returns all the matches in a Hits object.

countMatches(x, table, method=c("auto", "quick", "hash")): Returns an integer vector of the length of x containing the number of matches in table for each element in x.

order(...): Returns a permutation which rearranges its first argument (a Ranges object) into ascending order, breaking ties by further arguments (also Ranges objects).

sort(x): Sorts x. See sort in the base package for more details.

rank(x, na.last=TRUE, ties.method=c("average", "first", "random", "max", "min")): Returns the sample ranks of the ranges in x. See rank in the base package for more details.

Author(s)

Hervé Pagès

Examples

## ---------------------------------------------------------------------
## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 Ranges OBJECTS
## ---------------------------------------------------------------------
x0 <- IRanges(1:11, width=4)
x0
y0 <- IRanges(6, 9)
pcompare(x0, y0)
pcompare(IRanges(4:6, width=6), y0)
pcompare(IRanges(6:8, width=2), y0)
pcompare(x0, y0) < 0   # equivalent to 'x0 < y0'
pcompare(x0, y0) == 0  # equivalent to 'x0 == y0'
pcompare(x0, y0) > 0   # equivalent to 'x0 > y0'

rangeComparisonCodeToLetter(-10:10)
rangeComparisonCodeToLetter(pcompare(x0, y0))

## Handling of zero-width ranges (a.k.a. empty ranges):
x1 <- IRanges(11:17, width=0)
x1
pcompare(x1, x1[4])
pcompare(x1, IRanges(12, 15))

## Note that x1[2] and x1[6] are empty ranges on the edge of non-empty
## range IRanges(12, 15). Even though -1 and 3 could also be considered
## valid codes for describing these configurations, pcompare()
## considers x1[2] and x1[6] to be *adjacent* to IRanges(12, 15), and
## thus returns codes -5 and 5:
pcompare(x1[2], IRanges(12, 15))  # -5
pcompare(x1[6], IRanges(12, 15))  #  5

x2 <- IRanges(start=c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L),
              width=c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L))
x2

which(width(x2) == 0)  # 3 empty ranges
x2[2] == x2[2]  # TRUE
x2[2] == x2[5]  # FALSE
x2 == x2[4]
x2 >= x2[3]

## ---------------------------------------------------------------------
## B. match(), selfmatch(), %in%, duplicated(), unique()
## ---------------------------------------------------------------------
table <- x2[c(2:4, 7:8)]
match(x2, table)

x2 %in% table

duplicated(x2)
unique(x2)

## ---------------------------------------------------------------------
## C. findMatches(), countMatches()
## ---------------------------------------------------------------------
findMatches(x2, table)
countMatches(x2, table)

x2_levels <- unique(x2)
countMatches(x2_levels, x2)

## ---------------------------------------------------------------------
## D. order() AND RELATED METHODS
## ---------------------------------------------------------------------
is.unsorted(x2)
order(x2)
sort(x2)
rank(x2, ties.method="first")

[Package IRanges version 2.12.0 Index]