Miscellaneous utilities: uniqify, listunion, listintersection, perminverse

tabular.utils.uniqify(seq, idfun=None)

Relatively fast pure Python uniqification function that preserves ordering.


seq : sequence

Sequence object to uniqify.

idfun : function, optional

Optional collapse function to identify items as the same.


result : list

Python list with the first occurrence of each item in seq, in order.
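The behaviour described above can be illustrated with a minimal sketch. This is not the library's source: the name uniqify_sketch is hypothetical, and the sketch assumes that the values returned by idfun are hashable.

```python
def uniqify_sketch(seq, idfun=None):
    """Hypothetical stand-in: keep the first occurrence of each item, in order."""
    if idfun is None:
        # Assumption: with no collapse function, items are compared by value.
        idfun = lambda x: x
    seen = set()
    result = []
    for item in seq:
        marker = idfun(item)  # key deciding whether two items count as the same
        if marker in seen:
            continue
        seen.add(marker)
        result.append(item)
    return result


print(uniqify_sketch([3, 1, 3, 2, 1]))                 # [3, 1, 2]
print(uniqify_sketch(["a", "A", "b"], idfun=str.lower))  # ['a', 'b']
```

Passing str.lower as idfun collapses "a" and "A" into one item, and the first one seen is the one kept.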

Take the union of a list of lists.

Take a Python list of Python lists:

[[l11,l12, ...], [l21,l22, ...], ... , [ln1, ln2, ...]]

and return the aggregated list:

[l11,l12, ..., l21, l22 , ...]

For a list of two lists, e.g. [a, b], this is like:



ListOfLists : Python list

Python list of Python lists.


u : Python list

Python list created by taking the union of the lists in ListOfLists.
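Read literally, the aggregated list shown above is a concatenation of the inner lists in order. A minimal sketch under that reading follows; the name listunion_sketch is hypothetical, and whether the library removes duplicates is not stated here, so the sketch keeps them.

```python
def listunion_sketch(list_of_lists):
    """Hypothetical stand-in: concatenate the inner lists, in order."""
    u = []
    for sublist in list_of_lists:
        u.extend(sublist)  # append every element of this inner list
    return u


print(listunion_sketch([[1, 2], [2, 3], []]))  # [1, 2, 2, 3]
```

The input lists are left unmodified, because the elements are copied into a new list.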

Fast inverse of a (numpy) permutation.


s : sequence

Sequence of indices giving a permutation.


inv : numpy array

Sequence of indices giving the inverse of permutation s.
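One common way to invert a permutation in NumPy is a single scatter assignment, which runs in linear time. The sketch below assumes that approach (the library may do it differently) and uses the hypothetical name perminverse_sketch.

```python
import numpy as np


def perminverse_sketch(s):
    """Hypothetical stand-in: return inv such that inv[s[i]] == i for all i."""
    s = np.asarray(s)
    inv = np.empty(len(s), dtype=int)
    # Position i of s holds the value s[i], so the inverse maps s[i] back to i.
    inv[s] = np.arange(len(s))
    return inv


s = np.array([2, 0, 3, 1])
inv = perminverse_sketch(s)
print(inv)     # [1 3 0 2]
print(s[inv])  # [0 1 2 3]
```

Indexing s with inv yields the identity ordering, which is a quick check that inv really is the inverse.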

Returns a null value for each of the various kinds of numpy formats.

Default null value function used in tabular.spreadsheet.join().


format : string

Numpy format descriptor, e.g. '<i4', '|S5'.


null : element in [0, 0.0, '']

Null value corresponding to the given format:

  • if format.startswith(('<i', '|b')), i.e. format corresponds to an integer or Boolean, return 0
  • else if format.startswith('<f'), i.e. format corresponds to a float, return 0.0
  • else, i.e. format corresponds to a string, return ''
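The three rules above translate directly into code. The following is a sketch of that rule list only; the function name null_for_format_sketch and the parameter name fmt are hypothetical.

```python
def null_for_format_sketch(fmt):
    """Hypothetical stand-in: map a numpy format descriptor to a null value."""
    if fmt.startswith(('<i', '|b')):  # integer or Boolean
        return 0
    elif fmt.startswith('<f'):        # float
        return 0.0
    else:                             # anything else is treated as a string
        return ''


for fmt in ('<i4', '|b1', '<f8', '|S5'):
    print(fmt, repr(null_for_format_sketch(fmt)))
```

Note that the checks are prefix matches on the literal strings listed, so a descriptor with a different byte-order character (for example '>i4') would fall through to the string case in this sketch.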

Returns a null value for each of various kinds of test values.


test : bool, int, float or string

Value to test.

null : element in [False, 0, 0.0, '']

Null value corresponding to the given test value:

  • if test is a bool, return False
  • else if test is an int, return 0
  • else if test is a float, return 0.0
  • else (test is a str), return ''
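The order of the checks above matters in Python, because bool is a subclass of int. A minimal sketch of the rule list, using the hypothetical name null_for_value_sketch:

```python
def null_for_value_sketch(test):
    """Hypothetical stand-in: return the null value matching the type of test."""
    if isinstance(test, bool):    # must come before int: True is also an int
        return False
    elif isinstance(test, int):
        return 0
    elif isinstance(test, float):
        return 0.0
    else:                         # treated as a string
        return ''


for value in (True, 7, 2.5, 'abc'):
    print(repr(value), '->', repr(null_for_value_sketch(value)))
```

If the int check came first, True and False would both map to 0 rather than False.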

Infer the data type (int, float, str) of a list of strings.

Takes a list of strings and attempts to infer a numeric data type that fits them all.

If the strings are all integers, returns a NumPy array of integers.

If the strings are all floats, returns a NumPy array of floats.

Otherwise, returns a NumPy array of the original list of strings.

Used to determine the data type of a column read from a separated-variable (SV) text file (e.g. .tsv, .csv) of data where columns are expected to be of uniform Python type.

This function is used by tabular load functions for SV files when type information is not provided in the header.


column : list of strings

List of strings corresponding to a column of data.


out : numpy array

Numpy array of data from column, with data type int, float or str.
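The int, then float, then string fallback described above can be sketched with a try/except cascade. This is an illustration under that assumption, not the library's implementation; the name infer_column_sketch is hypothetical.

```python
import numpy as np


def infer_column_sketch(column):
    """Hypothetical stand-in: try int, then float, else keep the strings."""
    for caster in (int, float):
        try:
            # Succeeds only if every string in the column parses with this caster.
            return np.array([caster(x) for x in column])
        except ValueError:
            continue
    return np.array(column)


print(infer_column_sketch(['1', '2', '3']).dtype.kind)  # i
print(infer_column_sketch(['1', '2.5']).dtype.kind)     # f
print(infer_column_sketch(['1', 'abc']).dtype.kind)     # U
```

A single non-numeric entry is enough to make the whole column fall back to strings, which matches the expectation that each column has one uniform type.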
