tabular.io

Functions supporting tabular.tab.tabarray I/O, including reading and writing separated-value text files (CSV, e.g. .tsv, .csv), other text files, binary files, and the hierarchical separated-value (HSV) format.

tabular.io.loadSV(fname, shape=None, titles=None, aligned=False, byteorder=None, renamer=None, **kwargs)

Load a delimited text file to a numpy record array.

Basically, this function calls loadSVcols and combines the columns returned by that function into a numpy ndarray with structured dtype. It also uses and returns metadata, including column names, formats, coloring, &c., if these items are determined during the loading process.

Parameters

fname : string or file object

Path (or file object) corresponding to a separated variable (CSV) text file.

names : list of strings

Sets the names of the columns of the resulting tabarray. If not specified, the names are determined first by looking for metadata in the header of the file; if none is found, they are assigned by NumPy's f0, f1, ..., fn convention. See the namesinheader parameter below.

formats : string or list of strings

Sets the datatypes of the columns. The value of formats can be a list or comma-delimited string of format descriptions, one per column (e.g. 'str,str,int,float' or ['str', 'str', 'int', 'float']), a single value to apply to all columns, or anything that can be used in the numpy.rec.array constructor.

If neither the formats nor the dtype parameter is specified, typing is done by inference. (See also the typer parameter below.)

dtype : numpy dtype object

Sets the numpy dtype of the resulting tabarray, combining column format and column name information. If dtype is set, any names and formats specifications are overridden. If neither the dtype nor the formats parameter is specified, typing is done by inference. (See also the typer parameter below.)

The names, formats and dtype parameters duplicate parameters of the NumPy record array creation interface. Additional parameters of the NumPy interface that are passed through are shape, titles, byteorder and aligned (see the NumPy documentation for more information).
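The relationship among these three parameters can be illustrated with plain NumPy (a sketch independent of tabular; the column names and values are invented for illustration):

```python
import numpy as np

recs = [("alice", 1, 2.5), ("bob", 2, 3.5)]

# names plus formats, as loadSV accepts them
a = np.rec.fromrecords(recs, names=["name", "id", "score"],
                       formats=["U10", "i8", "f8"])

# a single dtype carries the same information and, when given,
# overrides separate names/formats specifications
dt = np.dtype([("name", "U10"), ("id", "i8"), ("score", "f8")])
b = np.rec.fromrecords(recs, dtype=dt)
```

Both calls produce record arrays with identical structured dtypes.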

kwargs: keyword argument dictionary of variable length

Contains various parameters to be passed down to loadSVcols. These may include skiprows, comments, delimiter, lineterminator, uselines, usecols, excludecols, metametadata, namesinheader, headerlines, valuefixer, linefixer, colfixer, delimiter_regex, inflines, typer, missingvalues, fillingvalues, verbosity, and various CSV module parameters like escapechar, quoting, quotechar, doublequote, skipinitialspace.

Returns

R : numpy record array

record array constructed from data in the SV file

metadata : dictionary

Metadata read and constructed during process of reading file.
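As a rough sketch of the shape of the result (using only the stdlib and NumPy, not tabular's implementation; the sample data is invented):

```python
import csv
import io
import numpy as np

# In-memory stand-in for a small CSV file
text = "name,age\nalice,3\nbob,4\n"
rows = list(csv.reader(io.StringIO(text)))
names, data = rows[0], rows[1:]

# Transpose rows into columns, type each column by simple inference,
# and combine them into a record array with structured dtype
cols = list(zip(*data))
typed = [np.array(c, dtype=int) if all(v.lstrip("-").isdigit() for v in c)
         else np.array(c)
         for c in cols]
R = np.rec.fromarrays(typed, names=names)
metadata = {"names": names}
```

Here R plays the role of the returned record array, and metadata the returned dictionary.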

See Also:

tabular.io.loadSVcols(), tabular.io.saveSV(), tabular.io.DEFAULT_TYPEINFERER()
tabular.io.loadSVcols(fname, usecols=None, excludecols=None, valuefixer=None, colfixer=None, missingvalues=None, fillingvalues=None, typeinferer=None, **kwargs)

Load a separated value text file to a list of column arrays.

Basically, this function calls loadSVrecs and transposes the string-valued row data returned by that function into a Python list of numpy arrays corresponding to columns, each of a uniform Python type (int, float, str). It also uses and returns metadata, including column names, formats, coloring, &c., if these items are determined during the loading process.

Parameters

fname : string or file object

Path (or file object) corresponding to a separated variable (CSV) text file.

usecols : sequence of non-negative integers or strings, optional

Only the columns in usecols are loaded and processed. Columns can be described by number, with 0 being the first column; by name, if name metadata is present; or by color group name, if color group information is present in the file. (Default is None, i.e. all columns are loaded.)

excludecols : sequence of non-negative integers or strings, optional

Converse of usecols, i.e. all columns EXCEPT those listed will be loaded.

valuefixer : callable, or list or dictionary of callables, optional

These callable(s) are applied to every value in each field. The application is done after line strings are loaded and split into fields, but before any typing or missing-value imputation is done. The purpose of the valuefixer is to prepare column values for typing and imputation. The valuefixer callable can return a string or a python object. If valuefixer is a single callable, that same callable is applied to values in all columns; if it is a dictionary, the keys can be either numbers or names, and the value for a key is applied to values in the corresponding column with that name or number; if it is a list, the list elements must be in 1-1 correspondence with the loaded columns and are applied to each respectively.
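For example, a valuefixer might normalize currency strings so that the column can subsequently be typed as numeric (a sketch; the data and the dollar_fixer helper are invented for illustration):

```python
# A valuefixer is applied to each field value after line-splitting
# but before typing or missing-value imputation.
def dollar_fixer(value):
    """Strip '$' and thousands separators so the value types as a number."""
    return value.replace("$", "").replace(",", "")

raw = ["$1,200", "$85", "$3,040"]
fixed = [dollar_fixer(v) for v in raw]   # ["1200", "85", "3040"]
```

In dictionary form, something like {"price": dollar_fixer} would apply the fixer only to the hypothetical "price" column.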

colfixer : callable, or list or dictionary of callables, optional

Same as valuefixer, but instead of being applied to individual values, are applied to whole columns (and must return columns or numpy arrays of identical length). Like valuefixer, colfixer callable(s) are applied before typing and missing-value imputation.

missingvalues : string, callable returning string, or list or dictionary of strings or string-valued callable

String value(s) to consider as "missing data", to be replaced before typing is done. If specified as a callable, the callable will be applied to the column(s) to determine the missing value. If specified as a dictionary, keys are expected to be numbers or names of columns, and values are individual missing values for those columns (like the valuefixer interface).

fillingvalues : string, pair of strings, callable returning string, or list or dictionary of strings or string-valued callable

Values to be used to replace missing data before typing is done. If specified as a single non-callable, non-tuple value, this value is used to replace all missing data. If specified as a callable, the callable is applied to the column and returns the fill value (e.g. to allow the value to depend on the column type). If specified as a pair of values, the first value acts as the missing value and the second as the value to replace with. If a dictionary or list of values, then values are applied to corresponding columns.

NOTE: all of the missingvalues and fillingvalues functionality can be replicated (and generalized) using the valuefixer or colfixer parameters, by specifying function(s) which identify and replace missing values. While more limited, the missingvalues and fillingvalues interface is easier to use and gives better performance.
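The semantics can be sketched in plain Python (an illustration of the intended behavior, not tabular's code; the sentinel and fill values are invented):

```python
# missingvalues names the sentinel string; fillingvalues supplies the
# replacement, applied before typing so the column types uniformly.
missing, fill = "NA", "0"

col = ["3", "NA", "7", "NA"]
imputed = [fill if v == missing else v for v in col]   # ["3", "0", "7", "0"]
```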

typer : callable taking a python list of strings (or other values) and returning a 1-d numpy array; or list or dictionary of such callables

Function used to infer type and convert string lists into typed numpy arrays, if no format information has been provided. When applied at all, this function is applied after strings have been loaded and split into fields. This function is expected to impute missing values as well, and will override any setting of missingvalues or fillingvalues. If a callable is passed, it is used as the typer for all columns, while if a dictionary (or list) of callables is passed, they're used on corresponding columns. If a typer is needed (e.g. because formatting information hasn't been supplied) but isn't specified (at least, for a given column), the constructor defaults to using the utils.DEFAULT_TYPEINFERER function.
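A minimal sketch of the kind of try-int, try-float, fall-back-to-string inference such a typer might perform (my approximation, not the source of utils.DEFAULT_TYPEINFERER):

```python
import numpy as np

def simple_typer(values):
    """Infer a uniform type for a list of strings: int, then float, then str."""
    for caster, dtype in ((int, np.int64), (float, np.float64)):
        try:
            return np.array([caster(v) for v in values], dtype=dtype)
        except ValueError:
            continue
    return np.array(values)   # fall back to a string array
```

Passed as typer, such a callable is applied per column; a dictionary like {"age": simple_typer} would restrict it to one hypothetical column.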

kwargs: keyword argument dictionary of variable length

Contains various parameters to be passed on to loadSVrecs, including skiprows, comments, delimiter, lineterminator, uselines, metametadata, namesinheader, headerlines, linefixer, delimiter_regex, inflines, verbosity, and various CSV module parameters like escapechar, quoting, quotechar, doublequote, skipinitialspace.

Returns

columns : list of numpy arrays

List of arrays corresponding to columns of data.

metadata : dictionary

Metadata read and constructed during process of reading file.

See Also:

tabular.io.loadSV(), tabular.io.saveSV(), tabular.io.DEFAULT_TYPEINFERER()
tabular.io.loadSVrecs(fname, uselines=None, skiprows=0, linefixer=None, delimiter_regex=None, verbosity=5, **metadata)

Load a separated value text file to a list of lists of strings of records.

Takes a tabular text file with a specified delimiter and end-of-line character, and returns data as a list of lists of strings corresponding to records (rows). Also uses and returns metadata (including column names, formats, coloring, &c.) if these items are determined during the loading process.

Parameters

fname : string or file object

Path (or file object) corresponding to a separated variable
(CSV) text file.

delimiter : single-character string

When reading a text file, the character to use as the delimiter to split fields. If not specified, the delimiter is determined first by looking for special-format metadata specifying the delimiter, and then, if no specification is found, attempts are made to infer the delimiter from file contents. (See the inflines parameter below.)

delimiter_regex : regular expression (compiled or in string format)

Regular expression to use to recognize delimiters, in place of a single character. (For instance, for whitespace delimiting, use delimiter_regex = '[\s*]+'.)
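Operationally this amounts to splitting each line with a regular expression rather than a single character (a stdlib sketch, not tabular's implementation):

```python
import re

# Split on runs of whitespace instead of a single delimiter character
line = "alice   3\tNY"
fields = re.split(r"\s+", line)   # ["alice", "3", "NY"]
```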

lineterminator : single-character string

Line terminator to use when reading in the SV file.

skipinitialspace : boolean

If true, strips whitespace following the delimiter from each field.

The delimiter, lineterminator and skipinitialspace parameters are passed on as parameters to the python CSV module, which is used for reading in delimited text files. Additional parameters from that interface that are replicated in this constructor include quotechar, escapechar, quoting, doublequote and dialect (see the CSV module documentation for more information).

skiprows : non-negative integer, optional

When reading from a text file, the first skiprows lines are ignored. Default is 0, i.e. no rows are skipped.

uselines : pair of non-negative integers, optional

When reading from a text file, the range of lines of data to load. (In contrast to skiprows, which specifies file rows to ignore before looking for header information, uselines specifies which data (non-header) lines to use, after the header has been stripped and processed.) See headerlines below.

comments : single-character string, optional

When reading from a text file, character used to distinguish header lines. If specified, any lines beginning with this character at the top of the file are assumed to contain header information and not row data.

headerlines : integer, optional

When reading from a text file, the number of lines at the top of the file (after the first skiprows lines) corresponding to the header of the file, where metadata can be found. Lines after headerlines are assumed to contain row contents. If not specified, value is determined first by looking for special metametadata in first line of file (see Tabular reference documentation for more information about this), and if no such metadata is found, is inferred by looking at file contents.

namesinheader : Boolean, optional

When reading from a text file, if namesinheader == True, then assume the column names are in the last header line (unless overridden by existing metadata or metametadata directive). Default is True.

linefixer : callable, optional

This callable is applied to every line in the file. If specified, the callable is applied directly to the strings in the file, after the file is split into lines but before the lines are split into fields. The purpose is to make lines with errors or mistakes amenable to delimiter inference and field-splitting.

inflines : integer, optional

Number of lines of file to use as sample data when inferring delimiter and header.

metametadata : dictionary of integers or pairs of integers

Specifies supplementary metametadata information for use with SVfile loading. See Tabular reference documentation for more information

Returns

records : list of lists of strings

List of lists corresponding to records (rows) of data.

metadata : dictionary

Metadata read and constructed during process of reading file.

See Also:

tabular.io.loadSV(), tabular.io.saveSV(), tabular.io.DEFAULT_TYPEINFERER()
tabular.io.saveSV(fname, X, comments=None, metadata=None, printmetadict=None, dialect=None, delimiter=None, doublequote=True, lineterminator='\n', escapechar=None, quoting=0, quotechar='"', skipinitialspace=False, stringifier=None, verbosity=5)

Save a tabarray to a separated-variable (CSV) file.

Parameters

fname : string

Path to a separated variable (CSV) text file.

X : tabarray

The actual data in a tabular.tab.tabarray.

comments : string, optional

The character to be used to denote the start of a header (non-data) line, e.g. '#'. If not specified, it is determined according to the following rule: '#' if the metadata argument is set, otherwise ''.

delimiter : string, optional

The character to be used to separate values in each line of text, e.g. ','. If not specified, this is inferred from the file extension: if the file ends in .csv, the delimiter is ','; otherwise it is '\t'.

lineterminator : string, optional

The string separating lines of text. By default, this is assumed to be '\n', and can also be set to '\r' or '\r\n'.

metadata : list of strings or Boolean, optional

Allowed values are True, False, or any sublist of the list ['names', 'formats', 'types', 'coloring', 'dialect']. These keys indicate what special metadata is printed in the header.

  • If a sublist of ['names', 'formats', 'types', 'coloring', 'dialect'], then the indicated types of metadata are written out.
  • If True, this is the same as metadata = ['coloring', 'types', 'names', 'dialect'], i.e. as many types of metadata as this algorithm currently knows how to write out.
  • If False, no metadata is printed at all, i.e. just the data.
  • If metadata is not specified, the default is ['names'], that is, just column names are written out.

printmetadict : Boolean, optional

Whether or not to print a string representation of the metadatadict in the first line of the header.

If printmetadict is not specified, then:

  • If metadata is specified and is not False, then printmetadict defaults to True.
  • Otherwise (metadata is False or not specified), printmetadict defaults to False.

See tabular.io.loadSV() for more information about the metadatadict.

stringifier : callable taking a 1-d numpy array and returning a python list of strings of the same length, or list or dictionary of such callables

If specified, the callable will be applied to each column, and the resulting list of strings will be written to the file. If specified as a list or dictionary of callables, the functions will be applied to corresponding columns. The default used if stringifier is not specified is tb.utils.DEFAULT_STRINGIFIER, which merely passes through string-type columns, and converts numerical-type columns directly to corresponding strings with NaNs replaced with blank values. The main purpose of specifying a non-default value is to encode numerical values in various string encodings that might be required for other applications like databases.
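The described default behavior can be sketched as follows (my approximation of the documented contract, not the source of tb.utils.DEFAULT_STRINGIFIER):

```python
import numpy as np

def stringify(col):
    """Pass string columns through; render numeric columns as strings, NaN -> ''."""
    if col.dtype.kind in "US":   # already a string column
        return [str(v) for v in col]
    return ["" if isinstance(v, float) and np.isnan(v) else str(v)
            for v in col]

out = stringify(np.array([1.5, np.nan, 3.0]))   # ["1.5", "", "3.0"]
```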

NOTE: In certain special circumstances (e.g. when the lineterminator or delimiter character appears in a field of the data), the python CSV writer is used to write out data. To allow for control of the operation of the writer in these circumstances, the following other parameters replicating the interface of the CSV module are also valid, and values will be passed through: doublequote, escapechar, quoting, quotechar, and skipinitialspace. (See python CSV module documentation for more information.)

See Also:

tabular.io.loadbinary(fname)

Load a numpy binary file or archive created by tabular.io.savebinary.

Load a numpy binary file (.npy) or archive (.npz) created by tabular.io.savebinary().

The data and associated data type (e.g. dtype, including if given, column names) are loaded and reconstituted.

If fname is a numpy archive, it may contain additional data giving hierarchical column-oriented structure (e.g. coloring). See tabular.tab.tabarray.__new__() for more information about coloring.

The .npz file is a zipped archive created using numpy.savez() and containing one or more .npy files, which are NumPy binary files created by numpy.save().

Parameters

fname : string or file-like object

File name or open numpy binary file (.npy) or archive (.npz) created by tabular.io.savebinary().

  • When fname is a .npy binary file, it is reconstituted as a flat ndarray of data, with structured dtype.

  • When fname is a .npz archive, it contains at least one .npy binary file and optionally another:

    • data.npy must be in the archive, and is reconstituted as X, a flat ndarray of data whose structured dtype is returned as dtype.
    • coloring.npy, if present, is reconstituted as coloring, a dictionary.

Returns

X : numpy ndarray with structured dtype

The data, where each column is named and is of a uniform NumPy data type.

dtype : numpy dtype object

The data type of X, e.g. X.dtype.

coloring : dictionary, or None

Hierarchical structure on the columns given in the header of the file; an attribute of tabarrays.

See tabular.tab.tabarray.__new__() for more information about coloring.

See Also:

tabular.io.savebinary(), numpy.load(), numpy.save(), numpy.savez()
tabular.io.savebinary(fname, X, savecoloring=True)

Save a tabarray to a numpy binary file or archive.

Save a tabarray to a numpy binary file (.npy) or archive (.npz) that can be loaded by tabular.io.loadbinary().

The .npz file is a zipped archive created using numpy.savez() and containing one or more .npy files, which are NumPy binary files created by numpy.save().

Parameters

fname : string or file-like object

File name or open numpy binary file (.npy) or archive (.npz) created by tabular.io.savebinary().

X : tabarray

The actual data in a tabular.tab.tabarray:

  • if fname is a .npy file, then this is the same as:

    numpy.save(fname, X)

  • otherwise, if fname is a .npz file, then X is zipped inside of fname as data.npy

savecoloring : boolean

Whether or not to save the coloring attribute of X. If savecoloring is True, then fname must be a .npz archive and X.coloring is zipped inside of fname as coloring.npy

See tabular.tab.tabarray.__new__() for more information about coloring.
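The underlying mechanics can be sketched directly with NumPy (the file path, sample data, and coloring dictionary here are invented; this is not savebinary's source):

```python
import os
import tempfile
import numpy as np

# A small structured array standing in for a tabarray's data
X = np.array([("alice", 3), ("bob", 4)],
             dtype=[("name", "U10"), ("age", "i8")])
coloring = {"info": ["name", "age"]}   # hypothetical coloring attribute

path = os.path.join(tempfile.mkdtemp(), "data.npz")
# Storing under the keys data/coloring yields data.npy and coloring.npy
# inside the .npz archive
np.savez(path, data=X, coloring=np.array([coloring], dtype=object))

arch = np.load(path, allow_pickle=True)
Y = arch["data"]   # round-trips with the same structured dtype
```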

See Also:

tabular.io.loadbinary(), numpy.load(), numpy.save(), numpy.savez()
tabular.io.loadHSV(path, X=None, names=None, rootpath=None, rootheader=None, coloring=None, toload=None, Nrecs=None)

Load a list of columns (numpy arrays) from a HSV directory.

Load a list of numpy arrays, corresponding to columns of data, from a hierarchical separated variable (HSV) directory (.hsv) created by tabular.io.saveHSV().

This function is used by the tabarray constructor tabular.tab.tabarray.__new__() when passed the HSV argument.

Each column of data inside of the .hsv directory is a separate comma-separated variable text file (.csv), whose name includes the column name and data type of the column (e.g. name.int.csv, name.float.csv, name.str.csv). An ordered list of columns, if provided, is stored in a separate file, header.txt.

A .hsv directory can contain .hsv subdirectories. This allows for hierarchical structure on the columns, which is mapped to a coloring dictionary. For example, a subdirectory named color.hsv contains .csv files corresponding to columns of data grouped by that color. Note that when the file structure is not flat, tabular.io.loadHSV() calls itself recursively.

Parameters

path : string

Path to a .hsv directory or individual .csv text files, corresponding to individual columns of data inside of a .hsv directory.

X : list of numpy arrays, optional

List of numpy arrays, corresponding to columns of data. Typically, the X argument is only passed when tabular.io.loadHSV() calls itself recursively, in which case each element is a column of data that has already been loaded.

names : list of strings, optional

List of strings giving column names. Typically, the names argument is only passed when tabular.io.loadHSV() calls itself recursively, in which case each element gives the name of the corresponding array in X.

rootpath : string, optional

Path to the top-level file (directory), i.e. the value of path the first time tabular.io.loadHSV() is called. Typically, the rootpath argument is only passed when tabular.io.loadHSV() calls itself recursively.

rootheader : list of strings, optional

Ordered list of column names. Typically, the rootheader argument is only passed when tabular.io.loadHSV() calls itself recursively, in which case rootheader is filled by parsing the (optional) header.txt file in rootpath, if it exists.

coloring : dictionary, optional

Hierarchical structure on the columns given in the header of the file; an attribute of tabarrays.

Typically, the coloring argument is only passed when tabular.io.loadHSV() calls itself recursively, in which case it contains coloring, i.e. hierarchical structure information, on the arrays in X.

See tabular.tab.tabarray.__new__() for more information about coloring.

See Also: tabular.io.infercoloring()

toload : list of strings, optional

List of strings corresponding to a subset of column names and/or color names; only these columns are loaded.

See Also: tabular.io.thresholdcoloring()

Nrecs : non-negative integer

The number of records in X. Typically, the Nrecs argument is only passed when tabular.io.loadHSV() calls itself recursively, in which case it is set by the first .csv file loaded. Subsequent columns must have the same number of records; when any subsequent column disagrees, it is not loaded and a warning is issued.

Returns

X : list of numpy arrays

List of numpy arrays, corresponding to columns of data, each loaded from one .csv file.

names : list of strings

List of strings giving column names.

coloring : dictionary

Hierarchical structure on the columns given in the header of the file; an attribute of tabarrays.

See tabular.tab.tabarray.__new__() for more information about coloring.

See Also:

tabular.io.saveHSV(fname, X, printheaderfile=True)

Save a tabarray to a hierarchical separated variable (HSV) directory.

The tabarray can later be loaded back from the .hsv by passing fname to the HSV argument of the tabarray constructor tabular.tab.tabarray.__new__().

This function is used by the tabarray method tabular.tab.tabarray.saveHSV().

Each column of data in the tabarray is stored inside of the .hsv directory to a separate comma-separated variable text file (.csv), whose name includes the column name and data type of the column (e.g. name.int.csv, name.float.csv, name.str.csv).

Coloring information, i.e. hierarchical structure on the columns, is stored in the file directory structure of the .hsv, where .hsv subdirectories correspond to colors in the coloring dictionary:

X.coloring.keys()

e.g. a subdirectory named color.hsv contains .csv files corresponding to columns of data grouped by that color:

X['color']

See tabular.tab.tabarray.__new__() for more information about coloring.

Note that when the file structure is not flat, tabular.io.saveHSV() calls itself recursively.

Parameters

fname : string

Path to a .hsv directory or individual .csv text files, corresponding to individual columns of data inside of a .hsv directory.

X : tabarray

The actual data in a tabular.tab.tabarray.

printheaderfile : boolean, optional

Whether or not to print an ordered list of column names in an additional file header.txt in all .hsv directories. The order is given by:

X.dtype.names

The header.txt file is used by tabular.io.loadHSV() to load the columns of data in the proper order, but is not required.

See Also:

tabular.io.savecolumns(fname, X)

Save columns of a tabarray to an existing HSV directory.

Save columns of tabarray X to an existing HSV directory fname (e.g. a .hsv directory created by tabular.io.saveHSV()).

Each column of data in the tabarray is stored inside of the .hsv directory to a separate comma-separated variable text file (.csv), whose name includes the column name and data type of the column (e.g. name.int.csv, name.float.csv, name.str.csv).

Coloring is lost.

This function is used by the tabarray method tabular.tab.tabarray.savecolumns().

Parameters

fname : string

Path to a hierarchical separated variable (HSV) directory (.hsv).

X : tabarray

The actual data in a tabular.tab.tabarray.

See Also:

tabular.io.loadHSVlist(flist)

Load tabarrays from a list of hierarchical separated variable directories.

Loads tabarrays from a list of hierarchical separated variable (HSV) paths, assuming they have disjoint columns and identical numbers of rows; then stacks them horizontally, e.g. adding columns side-by-side, aligning the rows.

Colorings can be lost.

Parameters

flist : list of strings

List of paths to hierarchical separated variable (HSV) directories (.hsv) and/or individual .csv text files, corresponding to individual columns of data inside of a .hsv directory.

See Also:

tabular.io.appendHSV(fname, RecObj, order=None)

Append records to an on-disk tabarray, e.g. HSV directory.

Function for appending records to an on-disk tabarray, used when one wants to write a large tabarray that is not going to be kept in memory at once.

If the tabarray is not already there, the function initializes the tabarray using the tabarray __new__ method, and saves it out.

Parameters

fname : string

Path of hierarchical separated variable (.hsv) file to which to append records in RecObj.

RecObj : array or dictionary

  • Either an array with complex dtype (e.g. tabarray, recarray or ndarray), or

  • a dictionary where

    • keys are names of columns to append to, and
    • the value on a column is a list of values to be appended to that column.

order : list of strings

List of column names specifying order in which the columns should be written; only used when the HSV does not exist and the header specifying order needs to be written.

See Also:

tabular.io.appendcolumns(fname, RecObj, order=None)

Append records to a flat on-disk tabarray, e.g. HSV without subdirectories.

Function for appending columns to a flat on-disk tabarray (e.g. no colors), used when one wants to write a large tabarray that is not going to be kept in memory at once.

If the tabarray is not already there, the function initializes the tabarray using the tabarray __new__ method, and saves it out.

See tabular.io.appendHSV() for a more general method.

Parameters

fname : string

Path of hierarchical separated variable (.hsv) file to which to append.

RecObj : array or dictionary

  • Either an array with complex dtype (e.g. tabarray, recarray or ndarray), or

  • a dictionary where

    • keys are names of columns to append to, and
    • the value on a column is a list of values to be appended to that column.

order : list of strings

List of column names specifying order in which the columns should be written; only used when the HSV does not exist and the header specifying order needs to be written.

See Also:

tabular.io.inferdelimiterfromname(fname)

Infer delimiter from file extension.

  • If fname ends with '.tsv', return '\t'.
  • If fname ends with '.csv', return ','.
  • Otherwise, return '\t'.
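The rule is simple enough to restate as code (a sketch of the documented behavior, not the function's source):

```python
def infer_delimiter_from_name(fname):
    """Return ',' for .csv files, otherwise a tab character."""
    return "," if fname.endswith(".csv") else "\t"
```

So infer_delimiter_from_name("data.csv") gives ',' while "data.tsv" and "data.txt" both give a tab.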

Parameters

fname : string

File path assumed to be for a separated-variable file.

Returns

delimiter : string

String in ['\t', ','], the inferred delimiter.
tabular.io.inferdialect(fname=None, datalines=None, delimiter_regex=None, verbosity=5)

Attempts to infer the dialect from CSV file lines.

Essentially a small extension of the "sniff" function from the Python CSV module. csv.Sniffer().sniff attempts to infer the delimiter from a putative delimited text file by analyzing character frequencies. This function adds additional analysis in which guesses are checked against the number of entries in each line that would result from splitting relative to that guess. If no plausible guess is found, the delimiter is inferred from the file name ('.csv' yields ',', everything else yields '\t').
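The stdlib primitive it builds on can be called directly (a sketch of csv.Sniffer alone, without tabular's extra per-line checks; the sample text is invented):

```python
import csv

sample = "name,age,city\nalice,3,NY\nbob,4,LA\n"
dialect = csv.Sniffer().sniff(sample)
# The sniffed dialect records the inferred delimiter
```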

Parameters

fname : pathstring

Name of the file.

datalines : list of strings

List of lines in the data file.

lineterminator : single-character string

Line terminator used to join/split the line strings.

Returns

csv.Dialect object
tabular.io.processmetadata(metadata, items=None, comments=None, delimiter_regex=None, ncols=None, verbosity=5)

Process Metadata from stored (or “packed”) state to functional state.

Metadata can be read from a file "packed" in various ways, e.g. with a string representation of a dialect or coloring dictionary. This function "unpacks" the stored metadata into usable Python objects. It consists of a list of quasi-modular parts, one for each type of recognized metadata.

Parameters

metadata : dictionary

This argument is a dictionary whose keys are strings denoting different kinds of metadata (e.g. “names” or “formats”) and whose values are the metadata of that type. The metadata dictionary is modified IN-PLACE by this function.

items : string or list of strings, optional

The items argument specifies which metadata keys are to be processed. E.g. if items = 'names,formats', then the "names" metadata and "formats" metadata will be processed, but no others. Note, however, that sometimes the processing of one type of metadata requires that another be processed first, e.g. "dialect" must be processed into an actual CSV.dialect object before "names" is processed. (The processing of "names" metadata involves splitting the names metadata string into a list, using the delimiter. This delimiter is part of the dialect object.) In these cases, if you call processmetadata on one item before its requirements are processed, nothing will happen.

comments : single-character string, optional

The comments character is used to process many pieces of metadata, e.g. it is stripped from the left side of the names and formats strings before splitting on the delimiter.

verbosity : integer, optional

Determines the level of verbosity in the printout of messages during the running of the procedure.

Returns

Nothing
tabular.io.inferheader(lines, comments=None, metadata=None, verbosity=5)

Infers the header from a CSV or other delimited text file.

This is essentially a small extension of the csv.Sniffer.has_header algorithm provided in the Python csv module. First, it checks to see whether a metametadata dictionary is present, specifying the line numbers of metadata lines in the header, and if so, sets the header lines to include at least those lines. Then it looks to see if a comments character is present, and if so, includes those lines as well. If either of the above yields a non-zero number of header lines, the function returns that number; otherwise, it uses the csv.Sniffer module, checking each line in succession, stopping at the first line where the sniffer module finds no evidence of a header, and returning that line number.
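The stdlib algorithm it extends can be exercised directly (a sketch; the sample text is invented):

```python
import csv

# has_header votes by comparing the candidate header row against the
# data rows (e.g. a string heading atop an otherwise numeric column)
sample = "name,age\nalice,3\nbob,4\n"
result = csv.Sniffer().has_header(sample)
```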

Parameters

lines : list of strings

The list of lines representing lines in the file.

comments : single-character string, optional

Comments character specification.

metadata : metadata dictionary, optional

Used to determine a comments character and metametadata dictionary, if present.

Returns

Integer, representing the number of (inferred) header lines at the top of the file
tabular.io.readstoredmetadata(fname, skiprows=0, linenumber=None, comments='#', metametadata=None, verbosity=5)

Read metadata from a delimited text file.
