XStringSet-comparison {Biostrings} | R Documentation |
Methods for comparing and ordering the elements in one or more XStringSet objects.
Element-wise (aka "parallel") comparison of 2 XStringSet objects is based on the lexicographic order between 2 BString, DNAString, RNAString, or AAString objects.
For DNAStringSet and RNAStringSet objects, the letters in the respective alphabets (i.e. DNA_ALPHABET and RNA_ALPHABET) are ordered based on a predefined code assigned to each letter. The code assigned to each letter can be retrieved with:
  dna_codes <- as.integer(DNAString(paste(DNA_ALPHABET, collapse="")))
  names(dna_codes) <- DNA_ALPHABET

  rna_codes <- as.integer(RNAString(paste(RNA_ALPHABET, collapse="")))
  names(rna_codes) <- RNA_ALPHABET
Note that this order does NOT depend on the locale in use. Also note that comparing DNA sequences with RNA sequences is supported and in that case T and U are considered to be the same letter.
For BStringSet and AAStringSet objects, the alphabetical order is defined by the C collation. Note that, at the moment, AAStringSet objects are treated like BStringSet objects i.e. the alphabetical order is NOT defined by the order of the letters in AA_ALPHABET. This might change at some point.
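The ordering rules above can be checked on a few toy sequences. This is only a minimal sketch: the one-letter sequences and the variable names (base_codes, sorted_dna, dna_vs_rna, sorted_b) are made up for illustration and assume that Biostrings is installed.

```r
library(Biostrings)

# Codes assigned to the 4 DNA bases (a subset of the dna_codes vector above)
base_codes <- as.integer(DNAString("ACGT"))
names(base_codes) <- c("A", "C", "G", "T")
base_codes

# DNA sequences are sorted by these codes, independently of the locale
sorted_dna <- as.character(sort(DNAStringSet(c("T", "G", "C", "A"))))
sorted_dna

# T and U are considered the same letter when comparing DNA with RNA
dna_vs_rna <- DNAStringSet("ACGT") == RNAStringSet("ACGU")
dna_vs_rna

# BStringSet objects follow the C collation: uppercase sorts before lowercase
sorted_b <- as.character(sort(BStringSet(c("b", "A", "a", "B"))))
sorted_b
```

In a locale-dependent sort of ordinary character vectors, "a" would typically come before "B"; with the C collation used here, all uppercase letters come first.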
pcompare() and related methods
In the code snippets below, x and y are XStringSet objects.
pcompare(x, y): Performs element-wise (aka "parallel") comparison of x and y, that is, returns an integer vector where the i-th element is less than, equal to, or greater than zero if the i-th element in x is considered to be respectively less than, equal to, or greater than the i-th element in y. If x and y don't have the same length, then the shorter is recycled to the length of the longer (the standard recycling rules apply).
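A minimal sketch of pcompare() on toy sequences, assuming Biostrings is installed (the sequences and the names res and res_recycled are illustrative only). Since only the sign of each returned value is documented, the sketch looks at sign() rather than at the raw values.

```r
library(Biostrings)

x <- DNAStringSet(c("AAA", "TC", "AA"))
y <- DNAStringSet(c("AAA", "CAAC", "AAA"))

# Element-wise comparison; only the sign of each value is meaningful
res <- pcompare(x, y)
sign(res)

# A length-1 second argument is recycled to the length of x
res_recycled <- pcompare(x, DNAStringSet("AAA"))
sign(res_recycled)
```

Here "TC" is greater than "CAAC" because T comes after C, and "AA" is less than "AAA" because a proper prefix sorts before the longer sequence in lexicographic order.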
x == y, x != y, x <= y, x >= y, x < y, x > y: Equivalent to pcompare(x, y) == 0, pcompare(x, y) != 0, pcompare(x, y) <= 0, pcompare(x, y) >= 0, pcompare(x, y) < 0, and pcompare(x, y) > 0, respectively.
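The equivalence between the comparison operators and pcompare() can be checked directly. This is a small sketch on made-up sequences, assuming Biostrings is installed; eq and lt are illustrative names.

```r
library(Biostrings)

x <- DNAStringSet(c("AAA", "TC", "G"))
y <- DNAStringSet(c("AAA", "CAAC", "T"))

eq <- x == y
lt <- x < y

# The operators give the same answers as thresholding pcompare() at zero
stopifnot(all(eq == (pcompare(x, y) == 0)))
stopifnot(all(lt == (pcompare(x, y) < 0)))

eq
lt
```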
order() and related methods
In the code snippets below, x is an XStringSet object.
is.unsorted(x, strictly=FALSE): Returns a single logical value indicating whether x is unsorted. The strictly argument takes a logical value indicating whether the check should be for strictly increasing values.
order(x, decreasing=FALSE): Return a permutation which rearranges x into ascending or descending order.
rank(x, ties.method=c("first", "min")): Rank x in ascending order.
sort(x, decreasing=FALSE): Sort x into ascending or descending order.
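The ordering methods fit together as sketched below on a toy DNAStringSet (illustrative data and variable names, assuming Biostrings is installed).

```r
library(Biostrings)

x <- DNAStringSet(c("TC", "AAA", "G", "AAA"))

# order() gives the permutation; subsetting with it sorts x
o <- order(x)
sorted <- x[o]
sorted_chr <- as.character(sorted)
sorted_chr

# sort() gives the same sequences directly
stopifnot(identical(as.character(sort(x)), sorted_chr))

# Ranks in ascending order, ties sharing the smallest rank
r <- rank(x, ties.method="min")
r

unsorted_before <- is.unsorted(x)
unsorted_after <- is.unsorted(sorted)
unsorted_strict <- is.unsorted(sorted, strictly=TRUE)  # duplicates break strict order
```

Because "AAA" appears twice, the sorted object passes the default check but not the strict one.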
duplicated() and unique()
In the code snippets below, x is an XStringSet object.
duplicated(x): Returns a logical vector indicating which elements of x are duplicates of an earlier element.
unique(x): Return the subset of x made of its unique elements.
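A short sketch of how duplicated() and unique() relate, using a small made-up DNAStringSet and assuming Biostrings is installed (dup and uniq are illustrative names).

```r
library(Biostrings)

dna <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))

# TRUE marks an element already seen earlier in dna
dup <- duplicated(dna)
dup

# unique() keeps the first occurrence of each sequence
uniq <- unique(dna)
uniq

# Same sequences as dropping the duplicated elements by hand
stopifnot(identical(as.character(uniq), as.character(dna[!dup])))
```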
match() and %in%
In the code snippets below, x and table are XStringSet objects.
match(x, table, nomatch=NA_integer_): Returns an integer vector containing the first positions of an identical match in table for the elements in x.
x %in% table: Returns a logical vector indicating which elements in x match identically with an element in table.
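As a minimal sketch of match() and %in% on toy data (assuming Biostrings is installed; tbl, m, and found are illustrative names):

```r
library(Biostrings)

tbl <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))
x <- DNAStringSet(c("", "G", "AA", "TC"))

# Position of the first identical match in tbl, or NA if there is none
m <- match(x, tbl)
m

# TRUE where an identical sequence exists in tbl
found <- x %in% tbl
found

# The two results agree: an element is found exactly when match() is not NA
stopifnot(all(found == !is.na(m)))
```

Note that "AA" is not matched even though it is a prefix of "AAA": only identical sequences count as matches.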
is.na() and related methods
In the code snippets below, x is an XStringSet object. An XStringSet object never contains missing values (these methods exist for compatibility).
is.na(x)
: Returns FALSE
for every element.
anyNA(x)
: Returns FALSE
.
H. Pagès
XStringSet-class, ==, is.unsorted, order, rank, sort, duplicated, unique, match, %in%
## ---------------------------------------------------------------------
## A. SIMPLE EXAMPLES
## ---------------------------------------------------------------------

dna <- DNAStringSet(c("AAA", "TC", "", "TC", "AAA", "CAAC", "G"))
match(c("", "G", "AA", "TC"), dna)

library(drosophila2probe)
fly_probes <- DNAStringSet(drosophila2probe)
sum(duplicated(fly_probes))  # 481 duplicated probes

is.unsorted(fly_probes)  # TRUE
fly_probes <- sort(fly_probes)
is.unsorted(fly_probes)  # FALSE
is.unsorted(fly_probes, strictly=TRUE)  # TRUE, because of duplicates
is.unsorted(unique(fly_probes), strictly=TRUE)  # FALSE

## Nb of probes that are the reverse complement of another probe:
nb1 <- sum(reverseComplement(fly_probes) %in% fly_probes)
stopifnot(identical(nb1, 455L))  # 455 probes

## Probes shared between drosophila2probe and hgu95av2probe:
library(hgu95av2probe)
human_probes <- DNAStringSet(hgu95av2probe)
m <- match(fly_probes, human_probes)
stopifnot(identical(sum(!is.na(m)), 493L))  # 493 shared probes

## ---------------------------------------------------------------------
## B. AN ADVANCED EXAMPLE
## ---------------------------------------------------------------------
## We want to compare the first 5 bases with the 5 last bases of each
## probe in drosophila2probe. More precisely, we want to compute the
## percentage of probes for which the first 5 bases are the reverse
## complement of the 5 last bases.

library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)

first5 <- narrow(probes, end=5)
last5 <- narrow(probes, start=-5)
nb2 <- sum(first5 == reverseComplement(last5))
stopifnot(identical(nb2, 17L))

## Percentage:
100 * nb2 / length(probes)  # 0.0064 %

## If the probes were random DNA sequences, a probe would have 1 chance
## out of 4^5 to have this property so the percentage would be:
100 / 4^5  # 0.098 %

## With randomly generated probes:
set.seed(33)
random_dna <- sample(DNAString(paste(DNA_BASES, collapse="")),
                     sum(width(probes)), replace=TRUE)
random_probes <- successiveViews(random_dna, width(probes))
random_probes
random_probes <- as(random_probes, "XStringSet")
random_probes

random_first5 <- narrow(random_probes, end=5)
random_last5 <- narrow(random_probes, start=-5)

nb3 <- sum(random_first5 == reverseComplement(random_last5))
100 * nb3 / length(random_probes)  # 0.099 %