rbdf {mvbutils} | R Documentation |
rbind concatenates its arguments by row; see cbind for basic documentation. There is an rbind method for data frames which mvbutils overrides, and rbdf calls the override directly. The mvbutils version should behave exactly as the base-R version, with two exceptions:
- Zero-row arguments are not ignored, so that e.g. factor levels that never appear are not dropped.
- Dimensioned (array or matrix) elements do not lose any extra attributes (such as class).
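The first exception is easiest to see from the base-R side. The following is a minimal sketch, assuming a session in which mvbutils is not loaded, so that the stock base-R method runs; the variable names are illustrative only. The factors are built explicitly so that the result does not depend on the stringsAsFactors default of the R version in use.

```r
# Base-R behaviour only; assumes mvbutils is NOT loaded in this session.
template <- data.frame(x = factor("yes"), y = 1)[-1, ]  # zero rows, two columns
new_row  <- data.frame(x = factor("no"), y = 0)

nrow(template)        # zero rows...
levels(template$x)    # ...but the level "yes" is still recorded

# Base-R drops the zero-row argument before combining,
# so the level "yes" does not survive into the result.
res <- rbind(template, new_row)
levels(res$x)
```

With the mvbutils override in place, the text above says that both levels would be retained instead.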
I find the zero-row behaviour more logical, and useful because e.g. it lets me create an empty.data.frame with the correct type/class/levels for all columns, then subsequently add rows to it. The behaviour for matrix (array) elements allows e.g. the rbinding of data frames that contain matrices of POSIXct elements without losing the POSIXct class (as in my package nicetime).
When rbinding data frames, best practice is to make sure all the arguments really are data frames. Lists and matrices also work OK (they are first coerced to data frames), but scalars are dangerous (even though base-R will process them without complaint). rbind is quirky around data frames; unless all the arguments are data frames, sometimes rbind.data.frame will not be called even when you'd expect it to be, and the coercion of scalars is frankly potty; see Details and Examples. mvbutils:::rbind.data.frame tries to mimic the base-R scalar coercion, but I'm not sure it's 100% compatible. Again, the safest way to ensure a predictable outcome is to make sure all arguments really are data frames, and/or to call rbdf directly.
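As a minimal sketch of that advice, using base-R functions only: a named vector can be turned into a one-row data frame before binding, so that the columns are aligned by name. The objects df, v and row are hypothetical names chosen for illustration.

```r
# Make every argument a data frame before binding.
df <- data.frame(x = 1, y = 2)

# A named vector is ambiguous for rbind; convert it explicitly.
v   <- c(y = 5, x = 4)                 # note: names in a different order
row <- as.data.frame(as.list(v))       # one-row data frame with columns y, x

res <- rbind(df, row)                  # columns are matched by name
res$x                                  # 1 4
res$y                                  # 2 5
```

Going through as.list first is a deliberate choice here: each named element becomes its own column, which is not what data.frame() does with a bare vector.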
rbind(..., deparse.level = 1) # generic
rbind(..., deparse.level = 1) # S3 method for data frames
rbdf(..., deparse.level = 1) # explicitly call S3 method for data frames (circumvent rbind dispatch)
... | Data frames, or things that will be coerced to data frames. NULLs are ignored. |
deparse.level | not used by |
See cbind documentation in base-R.
R's dispatch mechanism for rbind is as follows [my paraphrasing of base-R documentation]. Mostly, if any argument is a data frame then rbind.data.frame will be used. However, if one argument is a data frame but another argument is a scalar/matrix of a class that has an rbind method, then "default rbind" will be called instead. Although the latter still returns a data frame, it stuffs up e.g. class attributes, so that POSIXct objects will be turned into huge numbers. Again, if you really want a data frame result, make sure all the arguments are data frames.
In mvbutils:::rbind.data.frame (and AFAIK in the base-R version), arguments that are not data frames are coerced to data frames, by calling data.frame() on them. AFAICS this works predictably for list and matrix arguments; note that lists need names, and matrices need column names, that match the names of the real data frame arguments, because column alignment is done by name not position. Behaviour for scalars is IMO weird; see Examples. The idea seems to be to turn each scalar into a single-row data frame, coercing its names and truncating/replicating it to match the columns of the first real data frame argument; any names of the scalar itself are disregarded, and alignment is by position not name. Although mvbutils:::rbind.data.frame tries to mimic this coercion, it seems to me unnecessary (the user should just turn the scalar into something less ambiguous), confusing, and dangerous, so mvbutils issues a warning. Whether I have duplicated every quirk, I'm not sure.
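One way to "turn the scalar into something less ambiguous" is sketched below with base-R functions only. It assumes the vector carries names that are a subset of the data frame's column names; missing_cols and the other object names are hypothetical. Missing columns are padded with NA explicitly rather than being filled by replication.

```r
df <- data.frame(x = 1, y = 2, z = 3)
v  <- c(x = 4, y = 5)                 # shorter than the data frame is wide

# Convert to a one-row data frame, keeping the vector's own names.
row <- as.data.frame(as.list(v))

# Pad any columns the vector did not supply, instead of relying on recycling.
missing_cols <- setdiff(names(df), names(row))
row[missing_cols] <- NA

res <- rbind(df, row)
res$z                                 # 3 NA
```

The point of the padding step is that the new row's z is visibly missing, rather than silently borrowed from another element of the vector.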
Note also that R's accursed drop=TRUE default means that things you might reasonably think should be data frames, might not be. Under some circumstances, this might result in rbind.data.frame being bypassed. See Examples.
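A short base-R check of the drop issue, as a sketch: row-subsetting a single-column data frame silently returns a plain vector unless drop=FALSE is given. The object names are illustrative.

```r
one_col <- data.frame(x = 1)

dropped <- one_col[-1, ]                 # drop defaults to TRUE for one column
kept    <- one_col[-1, , drop = FALSE]   # explicitly keep the data frame

is.data.frame(dropped)   # FALSE: a zero-length numeric vector
is.data.frame(kept)      # TRUE: zero rows, one column
```

Testing is.data.frame() on each argument before calling rbind is a cheap way to notice when this has happened.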
Short of rewriting data.frame and rbind, there's nothing mvbutils can do to fix these quirks. Whether base-R should consider any changes is another story, but back-compatibility probably suggests not.
[Taken from the base-R documentation, modified to fit the mvbutils version]
The rbind data frame method first drops any NULL arguments, then coerces all others to data frames (see Details for how it does this with scalars). Then it drops all zero-column arguments. (If that leaves none, it returns a zero-column zero-row data frame.) It then takes the classes of the columns from the first argument, and matches columns by name (rather than by position). Factors have their levels expanded as necessary (in the order of the levels of the levelsets of the factors encountered) and the result is an ordered factor if and only if all the components were ordered factors. (The last point differs from S-PLUS.) Old-style categories (integer vectors with levels) are promoted to factors. Zero-row arguments are kept, so that in particular their column classes and factor levels are taken account of.
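The factor rules can be checked with a small sketch. It uses only non-empty data frames, for which the description above is shared with base-R; the object names are illustrative.

```r
a       <- data.frame(f = factor("lo", levels = c("lo", "hi"), ordered = TRUE))
b       <- data.frame(f = factor("mid", ordered = TRUE))
b_plain <- data.frame(f = factor("mid"))      # same value, but not ordered

both_ordered <- rbind(a, b)$f
mixed        <- rbind(a, b_plain)$f

levels(both_ordered)       # "lo" "hi" "mid": levels in order of first appearance
is.ordered(both_ordered)   # TRUE: every component was ordered
is.ordered(mixed)          # FALSE: one component was a plain factor
```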
Because the class of each column is set by the first data frame, rather than "by consensus", numeric/character/factor conversions can be a bit surprising especially where NAs are involved. See the final bit of EXAMPLES.
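A version-independent sketch of how argument order changes the outcome: stringsAsFactors and factor() are spelled out so that the result does not depend on the default of the R version in use. The object names are illustrative.

```r
# A logical NA column first, then a character column.
na_first <- rbind(data.frame(x = NA),
                  data.frame(x = "yes", stringsAsFactors = FALSE))$x

# A factor column first, then a logical NA column.
fac_first <- rbind(data.frame(x = factor("yes")),
                   data.frame(x = NA))$x

class(na_first)    # "character"
class(fac_first)   # "factor"
```

The two calls combine much the same values, yet the column type of the result depends on which data frame comes first.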
cbind and data.frame in base-R; empty.data.frame
## Why base-R dropping of zero rows is odd
#rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0))$x # mvbutils
##[1] no
##Levels: yes no # two levels
#base::rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0))$x # base-R
##[1] no
##Levels: no # lost level
#rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0, stringsAsFactors=FALSE))$x
##[1] no
##Levels: yes no
#base::rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0, stringsAsFactors=FALSE))$x
##[1] "no" # x has turned into a character

## Quirks of scalar coercion
#evalq( rbind( data.frame( x=1), x=2, x=3), baseenv()) # OK I guess
## x
##1 1
##x 2
##x1 3
#evalq( rbind( data.frame( x=1), x=2:3), baseenv()) # NB lost element
## x
##1 1
##x 2
#evalq( rbind( data.frame( x=1, y=2, z=3), c( x=4, y=5)), baseenv()) # NB gained element! Try predicting z[2]...
## x y z
##1 1 2 3
##2 4 5 4
#evalq( rbind( data.frame( x='cat', y='dog'), cbind( x='flea', y='goat')), baseenv()) # OK
## x y
##1 cat dog
##2 flea goat
#evalq( rbind( data.frame( x='cat', y='dog'), c( x='flea', y='goat')), baseenv()) # Huh?
##Warning in `[<-.factor`(`*tmp*`, ri, value = "flea") :
## invalid factor level, NAs generated
##Warning in `[<-.factor`(`*tmp*`, ri, value = "goat") :
## invalid factor level, NAs generated
## x y
##1 cat dog
##2 <NA> <NA>
#evalq( rbind( data.frame( x='cat', y='dog'), c( x='flea')), baseenv()) # Hmmm...
##Warning in `[<-.factor`(`*tmp*`, ri, value = "flea") :
## invalid factor level, NAs generated
##Warning in `[<-.factor`(`*tmp*`, ri, value = "flea") :
## invalid factor level, NAs generated
## x y
##1 cat dog
##2 <NA> <NA>
#try( evalq( rbind( data.frame( x='cat', y='dog'), cbind( x='flea')), baseenv())) # ...mmmm...
##Error in rbind(deparse.level, ...) :
## numbers of columns of arguments do not match

## Data frames that aren't:
#data.frame( x=1,y=2)[-1,] # a zero-row DF-- OK
## [1] x y
## <0 rows> (or 0-length row.names)
#data.frame( x=1)[-1,] # not a DF!?
## numeric(0)
#data.frame( x=1)[-1,,drop=FALSE] # OK, but exceeeeeedingly cumbersome
## <0 rows> (or 0-length row.names)

## Implications for rbind:
#rbind( data.frame( x='yes')[-1,], x='no')
## [,1]
## x "no" # rbind.data.frame not called!
#rbind( data.frame( x='yes')[-1,,drop=FALSE], x='no')
##Warning in rbind(deparse.level, ...) :
## risky to supply scalar argument(s) to 'rbind.data.frame'
## x
##x no

## Quirks of ordering and character/factor conversion:
#rbind( data.frame( x=NA), data.frame( x='yes'))$x
##[1] NA "yes" # character
#rbind( data.frame( x=NA_character_), data.frame( x='yes'))$x
##[1] <NA> yes
##Levels: yes # factor!
#rbind( data.frame( x='yes'), data.frame( x=NA))$x[2:1]
##[1] <NA> yes
##Levels: yes # factor again