Creating Tabular Arrays

The tabarray has a unified constructor that lets you load data from a variety of different sources.

To make a tabarray, pass the constructor either an existing Python data object (e.g. a list of records or columns) or a path to tabular data stored on disk (e.g. a CSV file or NumPy binary file). Keyword arguments control how the constructor interprets the data you give it. By default, column data types are inferred automatically and default column names are chosen, but you can set names and types by hand with the “names”, “dtype”, and “formats” arguments. The constructor takes a variety of other keyword arguments for controlling the resulting data object (see tabular.tabarray.tabarray.__new__() for details).
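As a rough illustration of what "inferred automatically" and "default column names" mean, the plain-Python sketch below guesses a kind for each column and generates names in the f0, f1, ... style. It is not Tabular's implementation: the helpers default_names and infer_kind are hypothetical and exist only for this example.

```python
def default_names(ncols):
    """Return default column names in the 'f0', 'f1', ... style."""
    return ['f%d' % i for i in range(ncols)]


def infer_kind(values):
    """Pick the narrowest of 'int', 'float', 'str' that fits every value."""
    def is_int(v):
        return isinstance(v, int) and not isinstance(v, bool)

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_int(v) for v in values):
        return 'int'
    if all(is_number(v) for v in values):
        return 'float'
    return 'str'


# Three columns of two values each (hypothetical sample data).
columns = [['bork', 'stork'], [1, 2], [3.5, -4.0]]
names = default_names(len(columns))
kinds = [infer_kind(col) for col in columns]
print(list(zip(names, kinds)))
```

Passing explicit “names”, “dtype”, or “formats” arguments replaces this kind of guessing with the values you supply.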

To see how the tabarray constructor works, start by importing Tabular in an interactive Python session:

>>> import tabular as tb

From data in memory

You can construct a tabarray from data in a Python object (e.g. a list of records).

Records

Pass a list of records (rows) to the records argument. Construction is fastest from a list of tuples rather than a list of lists, though this only matters for large tabarrays.

>>> tb.tabarray(records=[('bork', 1, 3.5), ('stork', 2, -4.0)], names=['a','b','c'])
tabarray([('bork', 1, 3.5), ('stork', 2, -4.0)],
      dtype=[('a', '|S5'), ('b', '<i4'), ('c', '<f8')])

If you do not give the “names” argument, the column names ‘f0’, ‘f1’, etc. are chosen by default:

>>> tb.tabarray(records=[('mork', 1, 2.5)]*100000)
tabarray([('mork', 1, 2.5), ('mork', 1, 2.5), ('mork', 1, 2.5), ...,
       ('mork', 1, 2.5), ('mork', 1, 2.5), ('mork', 1, 2.5)],
      dtype=[('f0', '|S4'), ('f1', '<i4'), ('f2', '<f8')])

Columns

Pass a list of columns to the columns argument. Construction is fastest from a list of NumPy arrays rather than a list of lists, though this only matters for large tabarrays.

>>> tb.tabarray(columns=[['bork', 'stork'], [1, 2], [3.5, -4.0]], names=['a','b','c'])
tabarray([('bork', 1, 3.5), ('stork', 2, -4.0)],
      dtype=[('a', '|S5'), ('b', '<i4'), ('c', '<f8')])
>>> import numpy as np
>>> x = np.array(['mork']*10**5);  y = np.ones(10**5, int);  z = np.ones(10**5)*2.5
>>> tb.tabarray(columns=[x,y,z])
tabarray([('mork', 1, 2.5), ('mork', 1, 2.5), ('mork', 1, 2.5), ...,
       ('mork', 1, 2.5), ('mork', 1, 2.5), ('mork', 1, 2.5)],
      dtype=[('f0', '|S4'), ('f1', '<i4'), ('f2', '<f8')])
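The records and columns arguments describe the same table in two layouts, one the transpose of the other. If your data is in the layout you do not want, a plain-Python conversion along these lines should be enough. The variable names are chosen for this sketch only.

```python
# Row-oriented data: one tuple per record.
records = [('bork', 1, 3.5), ('stork', 2, -4.0)]

# Transpose rows into columns: one list per column.
columns = [list(col) for col in zip(*records)]

# Transposing again recovers the original records.
round_trip = list(zip(*columns))

print(columns)
print(round_trip)
```

Either layout can then be handed to the matching keyword argument of the constructor.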

Arrays

Pass a NumPy array to the array argument. The NumPy array can have uniform or structured dtype.

NumPy array (uniform type)

>>> import numpy as np
>>> tb.tabarray(array=np.array([np.linspace(0,i,5) for i in range(4)]))
tabarray([(0.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.25, 0.5, 0.75, 1.0),
       (0.0, 0.5, 1.0, 1.5, 2.0), (0.0, 0.75, 1.5, 2.25, 3.0)],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])
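To see how a two-dimensional array of uniform type maps onto named columns, the sketch below does the equivalent conversion with NumPy alone. It assumes the tabarray result corresponds to treating each column of the 2-D array as one field; np.rec.fromarrays is a NumPy function, not part of Tabular.

```python
import numpy as np

# A 4 x 5 array of floats, the same data as in the example above.
arr = np.array([np.linspace(0, i, 5) for i in range(4)])

# Treat each column of the 2-D array as one field of a record array.
# With no names given, NumPy uses the defaults f0, f1, ...
rec = np.rec.fromarrays(list(arr.T))

print(rec.dtype.names)
print(rec['f4'])
```

The resulting record array has one row per row of the original array and one field per column.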

NumPy array (structured dtype)

Pass a NumPy ndarray or recarray with structured dtype (e.g. named columns, each with its own data type) to the array argument. When doing this, also pass the dtype of that object to the dtype argument.

>>> x = np.rec.fromrecords([('bork', 1, 3.5), ('stork', 2, -4.0)], names=['a','b','c'])
>>> tb.tabarray(array=x, dtype=x.dtype)
tabarray([('bork', 1, 3.5), ('stork', 2, -4.0)],
      dtype=[('a', '|S5'), ('b', '<i4'), ('c', '<f8')])

NumPy arrays can themselves be constructed in many ways – see NumPy array creation routines.

Empty arrays

Empty tabular arrays can be constructed using the “shape” parameter:

>>> tb.tabarray(shape=(10,),names=['a','b','c','d'],formats='f8,|S4,i4,i4')
tabarray([(-2.0000071525573744, '\x18', 0, 0), (0.0, '', 0, 0),
       (0.0, '', 0, 0), (6.590560383656763e-304, '\xe0\xee\xa4\x01', 0, 0),
       (0.0, '', 0, 1941674056),
       (7.3861605135730513e-304, '\t\t\x08*', 14157280, 27576368),
       (0.0, '', -897189557, 16575680),
       (-5.5559695710920376e-46, '`\xe3\xfc', 27576368, 0),
       (0.0, '', 0, 0), (0.0, '', -1778927632, 16592112)],
      dtype=[('a', '<f8'), ('b', '|S4'), ('c', '<i4'), ('d', '<i4')])

The empty array does not contain zeros or blanks; it contains whatever values happened to be in the memory allocated for it, so fill it before reading from it.
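The same distinction exists in NumPy itself, which may help when you need predictable starting values. The sketch below uses only NumPy and assumes the field layout from the example above. Whether a zero-filled array suits your use case is a judgment call, and nothing here is specific to Tabular.

```python
import numpy as np

dt = np.dtype([('a', 'f8'), ('b', 'S4'), ('c', 'i4'), ('d', 'i4')])

# Uninitialized: the contents are whatever was already in memory.
uninit = np.empty((10,), dtype=dt)

# Zero-initialized: numeric fields are 0 and string fields are empty.
zeroed = np.zeros((10,), dtype=dt)

# An uninitialized array becomes safe to read once every field is filled.
for name in dt.names:
    uninit[name] = zeroed[name]

print(zeroed[:2])
```

A zero-filled structured array like this could then be passed to the array argument together with its dtype, as described under "NumPy array (structured dtype)".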

The dtype object

As is implied by the examples above, tabarrays have an associated dtype object, containing information about the names and formats of the columns in the tabular array. For instance, the dtype of a tabarray with column ‘A’ (length-5 string), column ‘B’ (integer-valued), and column ‘C’ (floating point) is:

dtype([('A', '|S5'), ('B', '<i4'), ('C', '<f8')])

The dtype object can be queried to find relevant information about the tabarray: the length of the dtype is the number of columns of the array, while dtype.names is the tuple of column names. Tabarray dtypes are inherited directly from NumPy recarray dtypes.

Have a look at NumPy dtypes for more information about dtype objects.
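Because tabarray dtypes come from NumPy, these queries can be tried on a bare NumPy dtype. The sketch below builds the dtype described above and inspects it; it uses only NumPy calls and does not construct a tabarray.

```python
import numpy as np

dt = np.dtype([('A', '|S5'), ('B', '<i4'), ('C', '<f8')])

ncols = len(dt)          # number of columns (fields)
names = dt.names         # tuple of column names, in order
b_type = dt['B']         # dtype of a single column
row_bytes = dt.itemsize  # bytes per record: 5 + 4 + 8 when unaligned

print(ncols, names, b_type, row_bytes)
```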

When loading from a list of records or columns, the dtype is inferred by default. You can override the inference of data types by specifying a “formats” argument or a “dtype” argument directly:

>>> tb.tabarray(records=[('bork', 1, 3.5), ('stork', 2, -4.0)], formats='|S5,f8,f8')
tabarray([('bork', 1.0, 3.5), ('stork', 2.0, -4.0)],
      dtype=[('f0', '|S5'), ('f1', '<f8'), ('f2', '<f8')])
>>> tb.tabarray(records=[('bork', 1, 3.5), ('stork', 2, -4.0)], dtype=[('a','|S5'),('b','f8'),('c','f8')])
tabarray([('bork', 1.0, 3.5), ('stork', 2.0, -4.0)],
      dtype=[('a', '|S5'), ('b', '<f8'), ('c', '<f8')])
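A comma-separated formats string and an explicit dtype list are two spellings of the same information. The NumPy-only sketch below shows the correspondence and the effect of forcing an integer column to floating point. It assumes Tabular's “formats” strings follow NumPy's comma-separated convention, as the examples in this section suggest. Byte-string literals are used so the sketch also runs on Python 3.

```python
import numpy as np

# A formats-style string; NumPy supplies the default names f0, f1, f2.
from_formats = np.dtype('S5,f8,f8')

# The same layout written out as an explicit list of (name, format) pairs.
from_list = np.dtype([('f0', 'S5'), ('f1', 'f8'), ('f2', 'f8')])

# The middle column holds integers, but the dtype forces it to float.
rows = np.array([(b'bork', 1, 3.5), (b'stork', 2, -4.0)], dtype=from_formats)

print(from_formats == from_list)
print(rows['f1'])
```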

From data on disk

You can also construct a tabarray by loading data from a variety of file formats (e.g. delimited text file, NumPy binary file). Have a look at Input and Output for more information.
