12.4. C Language Functions

User-defined functions can be written in C (or a language that can be made compatible with C, such as C++). Such functions are compiled into dynamically loadable objects (also called shared libraries) and are loaded by the server on demand. This distinguishes them from internal functions.

Two different calling conventions are currently used for C functions. The newer "version 1" calling convention is indicated by writing a PG_FUNCTION_INFO_V1() macro call for the function, as illustrated below. Lack of such a macro indicates an old-style ("version 0") function. The language name specified in CREATE FUNCTION is 'C' in either case. Old-style functions are now deprecated because of portability problems and lack of functionality, but they are still supported for compatibility reasons.

12.4.1. Dynamic Loading

The first time a user-defined function in a particular loadable object file is called in a backend session, the dynamic loader loads that object file into memory so that the function can be called. The CREATE FUNCTION for a user-defined C function must therefore specify two pieces of information for the function: the name of the loadable object file, and the C name (link symbol) of the specific function to call within that object file. If the C name is not explicitly specified then it is assumed to be the same as the SQL function name.

Note: After it is used for the first time, a dynamically loaded user function is retained in memory, and future calls to the function in the same session will only incur the small overhead of a symbol table lookup.

The following algorithm is used to locate the shared object file based on the name given in the CREATE FUNCTION command:

  1. If the name is an absolute file name, the given file is loaded.

  2. If the name starts with the string $libdir, that part is replaced by the PostgreSQL library directory, which is determined at build time.

  3. If the name does not contain a directory part, the file is searched the path specified by the configuration variable dynamic_library_path.

  4. Otherwise (the file was not found in the path, or it contains a non-absolute directory part), the dynamic loader will try to take the name as given, which will most likely fail. (It is unreliable to depend on the current working directory.)

If this sequence does not work, the platform-specific shared library file name extension (often .so) is appended to the given name and this sequence is tried again. If that fails as well, the load will fail.

Note: The user id the PostgreSQL server runs as must be able to traverse the path to the file you intend to load. Making the file or a higher-level directory not readable and/or not executable by the "postgres" user is a common mistake.

In any case, the file name that is specified in the CREATE FUNCTION command is recorded literally in the system catalogs, so if the file needs to be loaded again the same procedure is applied.

It is recommended to locate shared libraries either relative to $libdir or through the dynamic library path. This simplifies version upgrades if the new installation is at a different location.

Note: PostgreSQL will not compile a function automatically; it must be compiled before it is used in a CREATE FUNCTION command. See Section 12.4.7 for additional information.

12.4.2. Base Types in C-Language Functions

The following table gives the C type required for parameters in the C functions that will be loaded into Postgres. The "Defined In" column gives the actual header file (in the .../src/backend/ directory) that the equivalent C type is defined. Note that you should always include postgres.h first, and that in turn includes c.h.

Table 12-1. Equivalent C Types for Built-In Postgres Types

Built-In Type C Type Defined In
abstimeAbsoluteTimeutils/nabstime.h
boolboolinclude/c.h
box(BOX *)utils/geo-decls.h
bytea(bytea *)include/postgres.h
"char"charN/A
cidCIDinclude/postgres.h
datetime(DateTime *)include/c.h or include/postgres.h
int2int2 or int16include/postgres.h
int2vector(int2vector *)include/postgres.h
int4int4 or int32include/postgres.h
float4(float4 *)include/c.h or include/postgres.h
float8(float8 *)include/c.h or include/postgres.h
lseg(LSEG *)include/geo-decls.h
name(Name)include/postgres.h
oidoidinclude/postgres.h
oidvector(oidvector *)include/postgres.h
path(PATH *)utils/geo-decls.h
point(POINT *)utils/geo-decls.h
regprocregproc or REGPROCinclude/postgres.h
reltimeRelativeTimeutils/nabstime.h
text(text *)include/postgres.h
tidItemPointerstorage/itemptr.h
timespan(TimeSpan *)include/c.h or include/postgres.h
tintervalTimeIntervalutils/nabstime.h
xid(XID *)include/postgres.h

Internally, Postgres regards a base type as a "blob of memory." The user-defined functions that you define over a type in turn define the way that Postgres can operate on it. That is, Postgres will only store and retrieve the data from disk and use your user-defined functions to input, process, and output the data. Base types can have one of three internal formats:

By-value types can only be 1, 2 or 4 bytes in length (also 8 bytes, if sizeof(Datum) is 8 on your machine). You should be careful to define your types such that they will be the same size (in bytes) on all architectures. For example, the long type is dangerous because it is 4 bytes on some machines and 8 bytes on others, whereas int type is 4 bytes on most Unix machines (though not on most personal computers). A reasonable implementation of the int4 type on Unix machines might be:

/* 4-byte integer, passed by value */
typedef int int4;

On the other hand, fixed-length types of any size may be passed by-reference. For example, here is a sample implementation of a Postgres type:

/* 16-byte structure, passed by reference */
typedef struct
{
    double  x, y;
} Point;

Only pointers to such types can be used when passing them in and out of Postgres functions. To return a value of such a type, allocate the right amount of memory with palloc(), fill in the allocated memory, and return a pointer to it. (Alternatively, you can return an input value of the same type by returning its pointer. Never modify the contents of a pass-by-reference input value, however.)

Finally, all variable-length types must also be passed by reference. All variable-length types must begin with a length field of exactly 4 bytes, and all data to be stored within that type must be located in the memory immediately following that length field. The length field is the total length of the structure (i.e., it includes the size of the length field itself). We can define the text type as follows:

typedef struct {
    int4 length;
    char data[1];
} text;

Obviously, the data field shown here is not long enough to hold all possible strings; it's impossible to declare such a structure in C. When manipulating variable-length types, we must be careful to allocate the correct amount of memory and initialize the length field. For example, if we wanted to store 40 bytes in a text structure, we might use a code fragment like this:

#include "postgres.h"
...
char buffer[40]; /* our source data */
...
text *destination = (text *) palloc(VARHDRSZ + 40);
destination->length = VARHDRSZ + 40;
memmove(destination->data, buffer, 40);
...

Now that we've gone over all of the possible structures for base types, we can show some examples of real functions.

12.4.3. Version-0 Calling Conventions for C-Language Functions

We present the "old style" calling convention first --- although this approach is now deprecated, it's easier to get a handle on initially. In the version-0 method, the arguments and result of the C function are just declared in normal C style, but being careful to use the C representation of each SQL data type as shown above.

Here are some examples:

#include "postgres.h"
#include <string.h>

/* By Value */
         
int
add_one(int arg)
{
    return arg + 1;
}

/* By Reference, Fixed Length */

float8 *
add_one_float8(float8 *arg)
{
    float8    *result = (float8 *) palloc(sizeof(float8));

    *result = *arg + 1.0;
       
    return result;
}

Point *
makepoint(Point *pointx, Point *pointy)
{
    Point     *new_point = (Point *) palloc(sizeof(Point));

    new_point->x = pointx->x;
    new_point->y = pointy->y;
       
    return new_point;
}

/* By Reference, Variable Length */

text *
copytext(text *t)
{
    /*
     * VARSIZE is the total size of the struct in bytes.
     */
    text *new_t = (text *) palloc(VARSIZE(t));
    VARATT_SIZEP(new_t) = VARSIZE(t);
    /*
     * VARDATA is a pointer to the data region of the struct.
     */
    memcpy((void *) VARDATA(new_t), /* destination */
           (void *) VARDATA(t),     /* source */
           VARSIZE(t)-VARHDRSZ);    /* how many bytes */
    return new_t;
}

text *
concat_text(text *arg1, text *arg2)
{
    int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
    text *new_text = (text *) palloc(new_text_size);

    VARATT_SIZEP(new_text) = new_text_size;
    memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
    memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
           VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
    return new_text;
}

Supposing that the above code has been prepared in file funcs.c and compiled into a shared object, we could define the functions to Postgres with commands like this:

CREATE FUNCTION add_one(int4) RETURNS int4
     AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c'
     WITH (isStrict);

-- note overloading of SQL function name add_one()
CREATE FUNCTION add_one(float8) RETURNS float8
     AS 'PGROOT/tutorial/funcs.so',
        'add_one_float8'
     LANGUAGE 'c' WITH (isStrict);

CREATE FUNCTION makepoint(point, point) RETURNS point
     AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c'
     WITH (isStrict);
                         
CREATE FUNCTION copytext(text) RETURNS text
     AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c'
     WITH (isStrict);

CREATE FUNCTION concat_text(text, text) RETURNS text
     AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c'
     WITH (isStrict);

Here PGROOT stands for the full path to the Postgres source tree. Note that depending on your system, the filename for a shared object might not end in .so, but in .sl or something else; adapt accordingly.

Notice that we have specified the functions as "strict", meaning that the system should automatically assume a NULL result if any input value is NULL. By doing this, we avoid having to check for NULL inputs in the function code. Without this, we'd have to check for NULLs explicitly, for example by checking for a null pointer for each pass-by-reference argument. (For pass-by-value arguments, we don't even have a way to check!)

Although this calling convention is simple to use, it is not very portable; on some architectures there are problems with passing smaller-than-int data types this way. Also, there is no simple way to return a NULL result, nor to cope with NULL arguments in any way other than making the function strict. The version-1 convention, presented next, overcomes these objections.

12.4.4. Version-1 Calling Conventions for C-Language Functions

The version-1 calling convention relies on macros to suppress most of the complexity of passing arguments and results. The C declaration of a version-1 function is always

Datum funcname(PG_FUNCTION_ARGS)
In addition, the macro call
PG_FUNCTION_INFO_V1(funcname);
must appear in the same source file (conventionally it's written just before the function itself). This macro call is not needed for "internal"-language functions, since Postgres currently assumes all internal functions are version-1. However, it is required for dynamically-loaded functions.

In a version-1 function, each actual argument is fetched using a PG_GETARG_xxx() macro that corresponds to the argument's datatype, and the result is returned using a PG_RETURN_xxx() macro for the return type.

Here we show the same functions as above, coded in version-1 style:

#include "postgres.h"
#include <string.h>
#include "fmgr.h"

/* By Value */

PG_FUNCTION_INFO_V1(add_one);
         
Datum
add_one(PG_FUNCTION_ARGS)
{
    int32   arg = PG_GETARG_INT32(0);

    PG_RETURN_INT32(arg + 1);
}

/* By Reference, Fixed Length */

PG_FUNCTION_INFO_V1(add_one_float8);

Datum
add_one_float8(PG_FUNCTION_ARGS)
{
    /* The macros for FLOAT8 hide its pass-by-reference nature */
    float8   arg = PG_GETARG_FLOAT8(0);

    PG_RETURN_FLOAT8(arg + 1.0);
}

PG_FUNCTION_INFO_V1(makepoint);

Datum
makepoint(PG_FUNCTION_ARGS)
{
    /* Here, the pass-by-reference nature of Point is not hidden */
    Point     *pointx = PG_GETARG_POINT_P(0);
    Point     *pointy = PG_GETARG_POINT_P(1);
    Point     *new_point = (Point *) palloc(sizeof(Point));

    new_point->x = pointx->x;
    new_point->y = pointy->y;
       
    PG_RETURN_POINT_P(new_point);
}

/* By Reference, Variable Length */

PG_FUNCTION_INFO_V1(copytext);

Datum
copytext(PG_FUNCTION_ARGS)
{
    text     *t = PG_GETARG_TEXT_P(0);
    /*
     * VARSIZE is the total size of the struct in bytes.
     */
    text     *new_t = (text *) palloc(VARSIZE(t));
    VARATT_SIZEP(new_t) = VARSIZE(t);
    /*
     * VARDATA is a pointer to the data region of the struct.
     */
    memcpy((void *) VARDATA(new_t), /* destination */
           (void *) VARDATA(t),     /* source */
           VARSIZE(t)-VARHDRSZ);    /* how many bytes */
    PG_RETURN_TEXT_P(new_t);
}

PG_FUNCTION_INFO_V1(concat_text);

Datum
concat_text(PG_FUNCTION_ARGS)
{
    text  *arg1 = PG_GETARG_TEXT_P(0);
    text  *arg2 = PG_GETARG_TEXT_P(1);
    int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
    text *new_text = (text *) palloc(new_text_size);

    VARATT_SIZEP(new_text) = new_text_size;
    memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
    memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
           VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
    PG_RETURN_TEXT_P(new_text);
}

The CREATE FUNCTION commands are the same as for the version-0 equivalents.

At first glance, the version-1 coding conventions may appear to be just pointless obscurantism. However, they do offer a number of improvements, because the macros can hide unnecessary detail. An example is that in coding add_one_float8, we no longer need to be aware that float8 is a pass-by-reference type. Another example is that the GETARG macros for variable-length types hide the need to deal with fetching "toasted" (compressed or out-of-line) values. The old-style copytext and concat_text functions shown above are actually wrong in the presence of toasted values, because they don't call pg_detoast_datum() on their inputs. (The handler for old-style dynamically-loaded functions currently takes care of this detail, but it does so less efficiently than is possible for a version-1 function.)

One big improvement in version-1 functions is better handling of NULL inputs and results. The macro PG_ARGISNULL(n) allows a function to test whether each input is NULL (of course, doing this is only necessary in functions not declared "strict"). As with the PG_GETARG_xxx() macros, the input arguments are counted beginning at zero. To return a NULL result, execute PG_RETURN_NULL(); this works in both strict and non-strict functions.

The version-1 function call conventions make it possible to return "set" results and implement trigger functions and procedural-language call handlers. Version-1 code is also more portable than version-0, because it does not break ANSI C restrictions on function call protocol. For more details see src/backend/utils/fmgr/README in the source distribution.

12.4.5. Composite Types in C-Language Functions

Composite types do not have a fixed layout like C structures. Instances of a composite type may contain null fields. In addition, composite types that are part of an inheritance hierarchy may have different fields than other members of the same inheritance hierarchy. Therefore, Postgres provides a procedural interface for accessing fields of composite types from C. As Postgres processes a set of rows, each row will be passed into your function as an opaque structure of type TUPLE. Suppose we want to write a function to answer the query

SELECT name, c_overpaid(emp, 1500) AS overpaid
FROM emp
WHERE name = 'Bill' OR name = 'Sam';
In the query above, we can define c_overpaid as:
#include "postgres.h"
#include "executor/executor.h"  /* for GetAttributeByName() */

bool
c_overpaid(TupleTableSlot *t, /* the current row of EMP */
           int32 limit)
{
    bool isnull;
    int32 salary;

    salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
    if (isnull)
        return (false);
    return salary > limit;
}

/* In version-1 coding, the above would look like this: */

PG_FUNCTION_INFO_V1(c_overpaid);

Datum
c_overpaid(PG_FUNCTION_ARGS)
{
    TupleTableSlot  *t = (TupleTableSlot *) PG_GETARG_POINTER(0);
    int32            limit = PG_GETARG_INT32(1);
    bool isnull;
    int32 salary;

    salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
    if (isnull)
        PG_RETURN_BOOL(false);
    /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary */

    PG_RETURN_BOOL(salary > limit);
}

GetAttributeByName is the Postgres system function that returns attributes out of the current row. It has three arguments: the argument of type TupleTableSlot* passed into the function, the name of the desired attribute, and a return parameter that tells whether the attribute is null. GetAttributeByName returns a Datum value that you can convert to the proper datatype by using the appropriate DatumGetXXX() macro.

The following query lets Postgres know about the c_overpaid function:

CREATE FUNCTION c_overpaid(emp, int4) 
RETURNS bool
AS 'PGROOT/tutorial/obj/funcs.so' 
LANGUAGE 'c';

While there are ways to construct new rows or modify existing rows from within a C function, these are far too complex to discuss in this manual.

12.4.6. Writing Code

We now turn to the more difficult task of writing programming language functions. Be warned: this section of the manual will not make you a programmer. You must have a good understanding of C (including the use of pointers and the malloc memory manager) before trying to write C functions for use with Postgres. While it may be possible to load functions written in languages other than C into Postgres, this is often difficult (when it is possible at all) because other languages, such as FORTRAN and Pascal often do not follow the same calling convention as C. That is, other languages do not pass argument and return values between functions in the same way. For this reason, we will assume that your programming language functions are written in C.

The basic rules for building C functions are as follows:

12.4.7. Compiling and Linking Dynamically-Loaded Functions

Before you are able to use your PostgreSQL extension function written in C they need to be compiled and linked in a special way in order to allow it to be dynamically loaded as needed by the server. To be precise, a shared library needs to be created.

For more information you should read the documentation of your operating system, in particular the manual pages for the C compiler, cc, and the link editor, ld. In addition, the PostgreSQL source code contains several working examples in the contrib directory. If you rely on these examples you will make your modules dependent on the availability of the PostgreSQL source code, however.

Creating shared libraries is generally analoguous to linking executables: first the source files are compiled into object files, then the object files are linked together. The object files need to be created as position-independent code (PIC), which conceptually means that they can be placed at an arbitrary location in memory when they are loaded by the executable. (Object files intended for executables are not compiled that way.) The command to link a shared library contains special flags to distinguish it from linking an executable. --- At least this is the theory. On some systems the practice is much uglier.

In the following examples we assume that your source code is in a file foo.c and we will create an shared library foo.so. The intermediate object file will be called foo.o unless otherwise noted. A shared library can contain more than one object file, but we only use one here.

BSD/OS

The compiler flag to create PIC is -fpic. The linker flag to create shared libraries is -shared.

gcc -fpic -c foo.c
ld -shared -o foo.so foo.o
This is applicable as of version 4.0 of BSD/OS.

FreeBSD

The compiler flag to create PIC is -fpic. To create shared libraries the compiler flag is -shared.

gcc -fpic -c foo.c
gcc -shared -o foo.so foo.o
This is applicable as of version 3.0 of FreeBSD.

HP-UX

The compiler flag of the system compiler to create PIC is +z. When using GCC it's -fpic. The linker flag for shared libraries is -b. So

cc +z -c foo.c
or
gcc -fpic -c foo.c
and then
ld -b -o foo.sl foo.o
HP-UX uses the extension .sl for shared libraries, unlike most other systems.

Irix

PIC is the default, no special compiler options are necessary. The linker option to produce shared libraries is -shared.

cc -c foo.c
ld -shared -o foo.so foo.o

Linux

The compiler flag to create PIC is -fpic. On some platforms in some situations -fPIC must be used if -fpic does not work. Refer to the GCC manual for more information. The compiler flag to create a shared library is -shared. A complete example looks like this:

cc -fpic -c foo.c
cc -shared -o foo.so foo.o

NetBSD

The compiler flag to create PIC is -fpic. For ELF systems, the compiler with the flag -shared is used to link shared libraries. On the older non-ELF systems, ld -Bshareable is used.

gcc -fpic -c foo.c
gcc -shared -o foo.so foo.o

OpenBSD

The compiler flag to create PIC is -fpic. ld -Bshareable is used to link shared libraries.

gcc -fpic -c foo.c
ld -Bshareable -o foo.so foo.o

Digital Unix/Tru64 UNIX

PIC is the default, so the compilation command is the usual one. ld with special options is used to do the linking:

cc -c foo.c
ld -shared -expect_unresolved '*' -o foo.so foo.o
The same procedure is used with GCC instead of the system compiler; no special options are required.

Solaris

The compiler flag to create PIC is -KPIC with the Sun compiler and -fpic with GCC. To link shared libraries, the compiler option is -G with either compiler or alternatively -shared with GCC.

cc -KPIC -c foo.c
cc -G -o foo.so foo.o
or
gcc -fpic -c foo.c
gcc -G -o foo.so foo.o

Unixware

The compiler flag to create PIC is -K PIC with the SCO compiler and -fpic with GCC. To link shared libraries, the compiler option is -G with the SCO compiler and -shared with GCC.

cc -K PIC -c foo.c
cc -G -o foo.so foo.o
or
gcc -fpic -c foo.c
gcc -shared -o foo.so foo.o

Tip: If you want to package your extension modules for wide distribution you should consider using GNU Libtool for building shared libraries. It encapsulates the platform differences into a general and powerful interface. Serious packaging also requires considerations about library versioning, symbol resolution methods, and other issues.

The resulting shared library file can then be loaded into Postgres. When specifying the file name to the CREATE FUNCTION command, one must give it the name of the shared library file (ending in .so) rather than the intermediate object file.

Note: Actually, Postgres does not care what you name the file as long as it is a shared library file.

Refer back to Section 12.4.1 about where the server expects to find the shared library files.