DataFrame-class {S4Vectors} | R Documentation |
The DataFrame
class extends the DataTable virtual
class and supports the storage of any type of object (with length
and [
methods) as columns.
On the whole, the DataFrame
behaves very similarly to
data.frame
, in terms of construction, subsetting, splitting,
combining, etc. The most notable exception is that the row names are
optional. This means calling rownames(x)
will return
NULL
if there are no row names. Of course, it could return
seq_len(nrow(x))
, but returning NULL
informs, for
example, combination functions that no row names are desired (they are
often a luxury when dealing with large data).
As DataFrame
derives from Vector
, it is
possible to set an annotation
string. Also, another
DataFrame
can hold metadata on the columns.
For a class to be supported as a column, it must have length
and [
methods, where [
supports subsetting only by
i
and respects drop=FALSE
. Optionally, a method may be
defined for the showAsCell
generic, which should return a
vector of the same length as the subset of the column passed to
it. This vector is then placed into a data.frame
and converted
to text with format
. Thus, each element of the vector should be
some simple, usually character, representation of the corresponding
element in the column.
DataFrame(..., row.names = NULL, check.names = TRUE)
:
Constructs a DataFrame
in similar fashion to
data.frame
. Each argument in ...
is coerced to
a DataFrame
and combined column-wise. No special effort is
expended to automatically determine the row names from the
arguments. The row names should be given in
row.names
; otherwise, there are no row names. This is by
design, as row names are normally undesirable when data is
large. If check.names
is TRUE
, the column names will
be checked for syntactic validity and made unique, if necessary.
To store an object of a class that does not support coercion to
DataFrame
, wrap it in I()
. The class must still have
methods for length
and [
.
In the following code snippets, x
is a DataFrame
.
dim(x)
:
Get the length two integer vector indicating in the first and
second element the number of rows and columns, respectively.
dimnames(x)
, dimnames(x) <- value
:
Get and set the two element list containing the row names
(character vector of length nrow(x)
or NULL
)
and the column names (character vector of length ncol(x)
).
as(from, "DataFrame")
:
By default, constructs a new DataFrame
with from
as
its only column. If from
is a matrix
or
data.frame
, all of its columns become columns in the new
DataFrame
. If from
is a list, each element becomes a
column, recycling as necessary. Note that for the DataFrame
to behave correctly, each column object must support element-wise
subsetting via the [
method and return the number of elements with
length
. It is recommended to use the DataFrame
constructor, rather than this interface.
as.list(x)
: Coerces x
, a DataFrame
,
to a list
.
as.data.frame(x, row.names=NULL, optional=FALSE)
:
Coerces x
, a DataFrame
, to a data.frame
.
Each column is coerced to a data.frame
and then column
bound together. If row.names
is NULL
, they are
retrieved from x
, if it has any. Otherwise, they are
inferred by the data.frame
constructor.
NOTE: conversion of x
to a data.frame
is not
supported if x
contains any list
, SimpleList
,
or CompressedList
columns.
as(from, "data.frame")
: Coerces a DataFrame
to a data.frame
by calling as.data.frame(from)
.
as.matrix(x)
: Coerces the DataFrame
to a
matrix
, if possible.
In the following code snippets, x
is a DataFrame
.
x[i,j,drop]
: Behaves very similarly to the
[.data.frame
method, except i
can be a
logical Rle
object and subsetting by matrix
indices
is not supported. Indices containing NA
's are also not
supported.
x[i,j] <- value
: Behaves very similarly to the
[<-.data.frame
method.
x[[i]]
: Behaves very similarly to the
[[.data.frame
method, except arguments j
and exact
are not supported. Column name matching is
always exact. Subsetting by matrices is not supported.
x[[i]] <- value
: Behaves very similarly to the
[[<-.data.frame
method, except argument j
is not supported.
In the following code snippets, x
is a DataFrame
.
rbind(...)
: Creates a new DataFrame
by
combining the rows of the DataFrame
objects in
...
. Very similar to rbind.data.frame
, except
in the handling of row names. If all elements have row names, they
are concatenated and made unique. Otherwise, the result does not
have row names. Currently, factors are not handled well (their
levels are dropped). This is not a high priority until there is an
XFactor
class.
cbind(...)
: Creates a new DataFrame
by
combining the columns of the DataFrame
objects in
...
. Very similar to cbind.data.frame
, except
row names, if any, are dropped. Consider the DataFrame
as an alternative that allows one to specify row names.
Michael Lawrence
DataTable and SimpleList which DataFrame extends directly.
score <- c(1L, 3L, NA) counts <- c(10L, 2L, NA) row.names <- c("one", "two", "three") df <- DataFrame(score) # single column df[["score"]] df <- DataFrame(score, row.names = row.names) #with row names rownames(df) df <- DataFrame(vals = score) # explicit naming df[["vals"]] # arrays ary <- array(1:4, c(2,1,2)) sw <- DataFrame(I(ary)) # a data.frame sw <- DataFrame(swiss) as.data.frame(sw) # swiss, without row names # now with row names sw <- DataFrame(swiss, row.names = rownames(swiss)) as.data.frame(sw) # swiss # subsetting sw[] # identity subset sw[,] # same sw[NULL] # no columns sw[,NULL] # no columns sw[NULL,] # no rows ## select columns sw[1:3] sw[,1:3] # same as above sw[,"Fertility"] sw[,c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)] ## select rows and columns sw[4:5, 1:3] sw[1] # one-column DataFrame ## the same sw[, 1, drop = FALSE] sw[, 1] # a (unnamed) vector sw[[1]] # the same sw[["Fertility"]] sw[["Fert"]] # should return 'NULL' sw[1,] # a one-row DataFrame sw[1,, drop=TRUE] # a list ## duplicate row, unique row names are created sw[c(1, 1:2),] ## indexing by row names sw["Courtelary",] subsw <- sw[1:5,1:4] subsw["C",] # partially matches ## row and column names cn <- paste("X", seq_len(ncol(swiss)), sep = ".") colnames(sw) <- cn colnames(sw) rn <- seq(nrow(sw)) rownames(sw) <- rn rownames(sw) ## column replacement df[["counts"]] <- counts df[["counts"]] df[[3]] <- score df[["X"]] df[[3]] <- NULL # deletion