rbdf {mvbutils}

R Documentation

rbind Concatenate data frames by row, keeping any zero-row arguments

Description

rbind concatenates its arguments by row; see cbind for basic documentation. There is an rbind method for data frames which mvbutils overrides, and rbdf calls the override directly. The mvbutils version should behave exactly like the base-R version, with two exceptions:

zero-row arguments are not ignored, e.g. so that factor levels that never appear are not dropped.
dimensioned (array or matrix) elements do not lose any extra attributes (such as class).

I find the zero-row behaviour more logical, and useful because e.g. it lets me create an empty.data.frame with the correct type/class/levels for all columns, then subsequently add rows to it. The behaviour for matrix (array) elements allows e.g. the rbinding of data frames that contain matrices of POSIXct elements without losing the POSIXct class (as in my package nicetime).
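As a rough sketch of that workflow in plain base R (the column names `id` and `grp` are made up for illustration): declare the column types and factor levels once in a zero-row template, then rbind real rows onto it. Both pieces declare the same levels here, so the result should not depend on which rbind method is in force; with the mvbutils method, the template's levels would be kept even if the new rows did not declare them.

```r
# Zero-row template that fixes the type and factor levels of every column.
template <- data.frame(
  id  = integer(0),
  grp = factor(character(0), levels = c("a", "b"))
)

# New rows declare the same levels as the template (an assumption of this
# sketch, so that it behaves the same with or without mvbutils loaded).
new_rows <- data.frame(
  id  = 1:2,
  grp = factor(c("b", "b"), levels = c("a", "b"))
)

result <- rbind(template, new_rows)
str(result)
```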

When rbinding data frames, best practice is to make sure all the arguments really are data frames. Lists and matrices also work OK (they are first coerced to data frames), but scalars are dangerous (even though base-R will process them without complaint). rbind is quirky around data frames; unless all the arguments are data frames, sometimes rbind.data.frame will not be called even when you'd expect it to be, and the coercion of scalars is frankly potty; see Details and Examples. mvbutils:::rbind.data.frame tries to mimic the base-R scalar coercion, but I'm not sure it's 100% compatible. Again, the safest way to ensure a predictable outcome is to make sure all arguments really are data frames, and/or to call rbdf directly.
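One way to follow that advice is to convert a named vector into a one-row data frame yourself before calling rbind, so that columns are aligned by name. The helper `as_row_df` below is hypothetical (it is not part of mvbutils or base R) and is only a minimal sketch.

```r
# Hypothetical helper: turn a named atomic vector into a one-row data frame,
# so that rbind can align its columns by name rather than by position.
as_row_df <- function(v) {
  stopifnot(!is.null(names(v)))
  as.data.frame(as.list(v), stringsAsFactors = FALSE)
}

base_df <- data.frame(x = 1, y = 2)
extra   <- c(y = 5, x = 4)   # deliberately in a different order

combined <- rbind(base_df, as_row_df(extra))
combined
```

Because both arguments are now genuine data frames, the value 4 should land in column x and 5 in column y, regardless of the order in which the vector's elements were given.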

Usage

rbind(..., deparse.level = 1) # generic
rbind(..., deparse.level = 1) # S3 method for data frames
rbdf(..., deparse.level = 1) # explicitly call S3 method for data frames (circumvent rbind dispatch)

Arguments

`...`	Data frames, or things that will be coerced to data frames. NULLs are ignored.
`deparse.level`	Not used by `rbind.data.frame`; it applies only to the default method and the generic.

Details

See cbind documentation in base-R.

R's dispatch mechanism for rbind is as follows [my paraphrasing of base-R documentation]. Mostly, if any argument is a data frame then rbind.data.frame will be used. However, if one argument is a data frame but another argument is a scalar/matrix of a class that has an rbind method, then "default rbind" will be called instead. Although the latter still returns a data frame, it stuffs up e.g. class attributes, so that POSIXct objects will be turned into huge numbers. Again, if you really want a data frame result, make sure all the arguments are data frames.

In mvbutils:::rbind.data.frame (and AFAIK in the base-R version), arguments that are not data frames are coerced to data frames, by calling data.frame() on them. AFAICS this works predictably for list and matrix arguments; note that lists need names, and matrices need column names, that match the names of the real data frame arguments, because column alignment is done by name not position. Behaviour for scalars is IMO weird; see Examples. The idea seems to be to turn each scalar into a single-row data frame, coercing its names and truncating/replicating it to match the columns of the first real data frame argument; any names of the scalar itself are disregarded, and alignment is by position not name. Although mvbutils:::rbind.data.frame tries to mimic this coercion, it seems to me unnecessary (the user should just turn the scalar into something less ambiguous), confusing, and dangerous, so mvbutils issues a warning. Whether I have duplicated every quirk, I'm not sure.

Note also that R's accursed drop=TRUE default means that things you might reasonably think should be data frames, might not be. Under some circumstances, this might result in rbind.data.frame being bypassed. See Examples.
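If an accidentally dropped data frame is a worry, a small guard can catch it before rbind is reached. The wrapper `rbind_strict` below is a hypothetical name, not an mvbutils function; it is a sketch that simply refuses any non-NULL argument that is not a data frame.

```r
# Hypothetical wrapper: insist that every non-NULL argument is a data frame,
# so an accidental drop=TRUE result is reported instead of silently coerced.
rbind_strict <- function(...) {
  args <- list(...)
  args <- args[!vapply(args, is.null, logical(1))]
  ok <- vapply(args, is.data.frame, logical(1))
  if (!all(ok)) {
    stop("non-data-frame argument(s) at position(s): ",
         paste(which(!ok), collapse = ", "))
  }
  do.call(rbind, args)
}

good <- rbind_strict(data.frame(x = 1), data.frame(x = 2))

# data.frame(x = 1)[-1, ] has collapsed to a bare vector, so this call fails.
bad <- tryCatch(
  rbind_strict(data.frame(x = 1)[-1, ], data.frame(x = 2)),
  error = function(e) e
)
```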

Short of rewriting data.frame and rbind, there's nothing mvbutils can do to fix these quirks. Whether base-R should consider any changes is another story, but back-compatibility probably suggests not.

Value

[Taken from the base-R documentation, modified to fit the mvbutils version] The rbind data frame method first drops any NULL arguments, then coerces all others to data frames (see Details for how it does this with scalars). Then it drops all zero-column arguments. (If that leaves none, it returns a zero-column zero-row data frame.) It then takes the classes of the columns from the first argument, and matches columns by name (rather than by position). Factors have their levels expanded as necessary (in the order of the levels of the level sets of the factors encountered) and the result is an ordered factor if and only if all the components were ordered factors. (The last point differs from S-PLUS.) Old-style categories (integer vectors with levels) are promoted to factors. Zero-row arguments are kept, so that in particular their column classes and factor levels are taken into account. Because the class of each column is set by the first data frame, rather than "by consensus", numeric/character/factor conversions can be a bit surprising, especially where NAs are involved. See the final part of Examples.
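A short illustration of the name matching and factor rules just described, using explicit factors so that it does not depend on the stringsAsFactors default. The column names `n`, `f` and `g` are arbitrary; this is a sketch of the documented behaviour, not an exhaustive check.

```r
first  <- data.frame(n = 1, f = factor("a"))
second <- data.frame(f = factor("b"), n = 2)   # columns in a different order

out <- rbind(first, second)
names(out)      # column order follows the first argument
levels(out$f)   # levels expanded in the order encountered

# The result is ordered only if every component is an ordered factor.
o1 <- data.frame(g = factor("lo", levels = c("lo", "hi"), ordered = TRUE))
o2 <- data.frame(g = factor("hi", levels = c("lo", "hi")))   # unordered

both_ordered <- rbind(o1, o1)$g
mixed        <- rbind(o1, o2)$g
```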

Examples

## NB the outputs shown assume stringsAsFactors=TRUE, the data.frame default before R 4.0.0
## Why base-R dropping of zero rows is odd
#rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0))$x # mvbutils
##[1] no
##Levels: yes no # two levels
#base::rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0))$x # base-R
##[1] no
##Levels: no # lost level
#rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0, stringsAsFactors=FALSE))$x
##[1] no
##Levels: yes no
#base::rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0, stringsAsFactors=FALSE))$x
##[1] "no" # x has turned into a character
## Quirks of scalar coercion
#evalq( rbind( data.frame( x=1), x=2, x=3), baseenv()) # OK I guess
##   x
##1  1
##x  2
##x1 3
#evalq( rbind( data.frame( x=1), x=2:3), baseenv()) # NB lost element
##  x
##1 1
##x 2
#evalq( rbind( data.frame( x=1, y=2, z=3), c( x=4, y=5)), baseenv()) # NB gained element! Try predicting z[2]...
##  x y z
##1 1 2 3
##2 4 5 4
#evalq( rbind( data.frame( x='cat', y='dog'), cbind( x='flea', y='goat')), baseenv()) # OK
##     x    y
##1  cat  dog
##2 flea goat
#evalq( rbind( data.frame( x='cat', y='dog'), c( x='flea', y='goat')), baseenv()) # Huh?
##Warning in `[<-.factor`(`*tmp*`, ri, value = "flea") :
##  invalid factor level, NAs generated
##Warning in `[<-.factor`(`*tmp*`, ri, value = "goat") :
##  invalid factor level, NAs generated
##     x    y
##1  cat  dog
##2 <NA> <NA>
#evalq( rbind( data.frame( x='cat', y='dog'), c( x='flea')), baseenv()) # Hmmm...
##Warning in `[<-.factor`(`*tmp*`, ri, value = "flea") :
##  invalid factor level, NAs generated
##Warning in `[<-.factor`(`*tmp*`, ri, value = "flea") :
##  invalid factor level, NAs generated
##     x    y
##1  cat  dog
##2 <NA> <NA>
#try( evalq( rbind( data.frame( x='cat', y='dog'), cbind( x='flea')), baseenv())) # ...mmmm...
##Error in rbind(deparse.level, ...) :
##  numbers of columns of arguments do not match
## Data frames that aren't:
#data.frame( x=1,y=2)[-1,] # a zero-row DF-- OK
## [1] x y
## <0 rows> (or 0-length row.names)
#data.frame( x=1)[-1,] # not a DF!?
## numeric(0)
#data.frame( x=1)[-1,,drop=FALSE] # OK, but exceeeeeedingly cumbersome
## [1] x
## <0 rows> (or 0-length row.names)
## Implications for rbind:
#rbind( data.frame( x='yes')[-1,], x='no')
##  [,1]
## x "no" # rbind.data.frame not called!
#rbind( data.frame( x='yes')[-1,,drop=FALSE], x='no')
##Warning in rbind(deparse.level, ...) :
##  risky to supply scalar argument(s) to 'rbind.data.frame'
##   x
##x no
## Quirks of ordering and character/factor conversion:
#rbind( data.frame( x=NA), data.frame( x='yes'))$x
##[1] NA    "yes" # character
#rbind( data.frame( x=NA_character_), data.frame( x='yes'))$x
##[1] <NA> yes
##Levels: yes # factor!
#rbind( data.frame( x='yes'), data.frame( x=NA))$x[2:1]
##[1] <NA>  yes
##Levels: yes # factor again

[Package mvbutils version 2.7.4.1 Index]