character-utils {S4Vectors} | R Documentation
Some low-level string utilities that operate on ordinary character vectors. For more advanced string manipulations, see the Biostrings package.
unstrsplit(x, sep="")  # 'sep' default is "" (empty string)
strsplitAsListOfIntegerVectors(x, sep=",")  # 'sep' default is ","
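x | For unstrsplit: a list in which each element is a character vector. For strsplitAsListOfIntegerVectors: a character vector in which each string holds integer values separated by sep.
sep | A single string containing the separator character.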
unstrsplit(x, sep) is equivalent to (but much faster than) sapply(x, paste0, collapse=sep). It performs the reverse transformation of strsplit( , fixed=TRUE), that is, if x is a character vector with no NAs and sep is a single string, then unstrsplit(strsplit(x, split=sep, fixed=TRUE), sep) is identical to x. A notable exception is when strsplit finds a match at the end of a string, in which case the last element of the output (which should normally be an empty string) is not returned (see ?strsplit for the details).
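The trailing-match exception can be reproduced with base R alone. The sketch below uses sapply(x, paste0, collapse=sep) as a stand-in for unstrsplit, relying on the equivalence stated above; the input strings are made up for illustration.

```r
# Illustrative input (not from the package): the second string ends
# with the separator.
x <- c("a,b,c", "a,b,")

# strsplit() drops the empty string that would follow a trailing match.
parts <- strsplit(x, split=",", fixed=TRUE)
n_parts <- lengths(parts)

# Base-R stand-in for unstrsplit(parts, sep=",").
rejoined <- sapply(parts, paste0, collapse=",")

# TRUE where the round trip recovered the original string.
round_trip_ok <- rejoined == x
print(data.frame(x, rejoined, n_parts, round_trip_ok))
```

Only the first string survives the round trip. The second loses its trailing separator because strsplit returned two pieces rather than three.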
strsplitAsListOfIntegerVectors is similar to the strsplitAsListOfIntegerVectors2 function shown in the Examples section below, except that the former generally raises an error where the latter would have inserted an NA in the returned object. More precisely:
The latter accepts NAs in the input, the former doesn't (raises an error).
The latter introduces NAs by coercion (with a warning), the former doesn't (raises an error).
The latter supports "inaccurate integer conversion in coercion" when the value to coerce is > INT_MAX (then it's coerced to INT_MAX), the former doesn't (raises an error).
The latter coerces non-integer values (e.g. 10.3) to an int by truncating them, the former doesn't (raises an error).
When it fails, strsplitAsListOfIntegerVectors will print an informative error message.
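To make the contrast concrete, here is what the lenient base-R version does with ill-formed input. This sketch only runs strsplitAsListOfIntegerVectors2 (copied from the Examples section); the input vector is invented for illustration, and according to the list above strsplitAsListOfIntegerVectors is expected to raise an error on each of these cases instead. The > INT_MAX case is left out because its outcome may depend on the R version.

```r
# Lenient reference implementation, as given in the Examples section.
strsplitAsListOfIntegerVectors2 <- function(x, sep=",") {
    tmp <- strsplit(x, sep, fixed=TRUE)
    lapply(tmp, as.integer)
}

# Hypothetical ill-formed input: an NA, a non-numeric field, and a
# non-integer value.
bad <- c("1,2", NA, "3,abc", "10.3,4")

# as.integer() warns about "abc"; the warning is silenced here so that
# only the returned object is shown.
y2 <- suppressWarnings(strsplitAsListOfIntegerVectors2(bad))
str(y2)
```

The NA input and the "abc" field both come back as NA, and 10.3 is truncated to 10, which is exactly the silent data loss the stricter function is meant to prevent.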
Finally, strsplitAsListOfIntegerVectors is faster and uses much less memory than strsplitAsListOfIntegerVectors2.
unstrsplit returns a character vector with one string per list element in x.
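As a rough illustration of that shape, the base-R expression described above as equivalent, sapply(x, paste0, collapse=sep), can be run on a small list. This is a stand-in rather than a call to unstrsplit itself, and whether unstrsplit handles names exactly as sapply does is an assumption here.

```r
# A list with a two-element vector, an empty (NULL) element, and a
# four-element vector.
x <- list(A=c("abc", "XY"), B=NULL, C=letters[1:4])

# Base-R stand-in for unstrsplit(x, sep=","): one string per list element.
joined <- sapply(x, paste0, collapse=",")
print(joined)

# The result has the same length as the input list.
stopifnot(length(joined) == length(x))
```

Note that the NULL element yields an empty string, not a missing value, so the output stays aligned with the input.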
strsplitAsListOfIntegerVectors returns a list where each list element is an integer vector. There is one list element per string in x.
Hervé Pagès
The strsplit function in the base package.
## ---------------------------------------------------------------------
## unstrsplit()
## ---------------------------------------------------------------------
x <- list(A=c("abc", "XY"), B=NULL, C=letters[1:4])
unstrsplit(x)
unstrsplit(x, sep=",")
unstrsplit(x, sep=" => ")

data(islands)
x <- names(islands)
y <- strsplit(x, split=" ", fixed=TRUE)
x2 <- unstrsplit(y, sep=" ")
stopifnot(identical(x, x2))

## But...
names(x) <- x
y <- strsplit(x, split="in", fixed=TRUE)
x2 <- unstrsplit(y, sep="in")
y[x != x2]
## In other words: strsplit() behavior sucks :-/

## ---------------------------------------------------------------------
## strsplitAsListOfIntegerVectors()
## ---------------------------------------------------------------------
x <- c("1116,0,-19",
       " +55291 , 2476,",
       "19184,4269,5659,6470,6721,7469,14601",
       "7778889, 426900, -4833,5659,6470,6721,7096",
       "19184 , -99999")

y <- strsplitAsListOfIntegerVectors(x)
y

## In normal situations (i.e. when the input is well-formed),
## strsplitAsListOfIntegerVectors() does actually the same as the
## function below but is more efficient (both in speed and memory
## footprint):
strsplitAsListOfIntegerVectors2 <- function(x, sep=",")
{
    tmp <- strsplit(x, sep, fixed=TRUE)
    lapply(tmp, as.integer)
}
y2 <- strsplitAsListOfIntegerVectors2(x)
stopifnot(identical(y, y2))