cgiemail 2.0 User Guide

March, 2003 -- lcs

This is a tutorial and reference manual for the Web application developer. It assumes you already have a Web server with the cgiemail program installed on it, and that you are ready to start creating cgiemail applications.

This document describes cgiemail Release 2.0. It is upwardly compatible with previous releases (which is the reason for some inconsistent naming practices and odd syntax artifacts). It also contains many new features not found in any previous release.

Overview

A cgiemail application consists of an HTML form page to collect inputs from the user, and a template file which describes how the resulting email message (sent by cgiemail) is to be constructed. The cgiemail program itself runs on the Web server. It is invoked by the server when it receives the right kind of request. The "cgi" in its name is an acronym for Common Gateway Interface, a standard mechanism used by Web servers to run special application programs to satisfy Web requests.

The form page is a regular HTML page containing a FORM tag that presents a fill-in form to the user. This, in turn, contains INPUT tags to collect values that are passed to cgiemail as CGI variables, or inputs. The form page can be static or dynamic; the only part that matters, from cgiemail's point of view, is the ACTION attribute which invokes cgiemail.

The template is a plain text file which is stored on the Web server or is available to it through a network filesystem. It contains the text of the message interspersed with variable invocations which are replaced with the values of CGI variables as the message is generated.

Creating an Application

To build a simple application, all you have to do is write the HTML form page, write the template file, and then test the application by loading the form page into your Web browser and submitting the form. Submitting the form should invoke cgiemail on the Web server to send the email message.

Since the names of the INPUT tags in your form must match the "variables" in your template, it is important to keep them synchronized. You may want to start by writing the template first, or by just listing the variables. We describe the HTML form first here, because that is what the user sees first.

How it Works

Understanding the mechanism of cgiemail may help you make sense of the following instructions. You can skip this section if you don't care about the technical details, or if you know so much about CGI applications that it's obvious.

  1. Load the application's HTML page into your browser. This page contains a FORM tag with an ACTION attribute that invokes the cgiemail program. However, cgiemail is not even in the picture until you click the "Submit" button. You can do whatever you like to the form's inputs.

  2. Click "Submit", and your browser invokes the ACTION URL, passing it all of the inputs from the form as CGI arguments. Now, cgiemail is in control, running on the Web server. It is responsible for the next two events:

  3. If the inputs are acceptable (i.e. if no required inputs were left blank), and there are no fatal errors, it creates a message from the email template file and sends it off. If the inputs are not acceptable, no mail is sent.

  4. Finally, cgiemail sends its response page back to the browser. When the email was sent off successfully, this will be either the default success page (which simply displays the message that was sent), or a custom success page (see below) if one was configured. If it failed, the result is a failure page describing the error or, optionally, a custom failure page.
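
To make step 2 more concrete, the following sketch uses Python's standard library to show roughly how a browser encodes form inputs into the body of a POST request, and how a CGI program could decode them again. The input names are taken from the Simple Example later in this guide; the values are made up for illustration, and the code is not part of cgiemail.

```python
from urllib.parse import urlencode, parse_qs

# Inputs as the user might have typed them into the form (illustrative values).
form_inputs = {
    "email": "gollum@shire.me",
    "yourname": "Gollum",
    "colour": "gold",
}

# The browser sends the inputs as one URL-encoded string in the POST body.
post_body = urlencode(form_inputs)

# A CGI program decodes that string back into names and lists of values.
decoded = parse_qs(post_body)

print(post_body)
print(decoded["email"])
```

Each name maps to a list of values after decoding, which is why an input name that appears more than once in a form can carry multiple values.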

Writing the HTML Form

This page assumes you already know some basic HTML and have some exposure to forms and the Common Gateway Interface (CGI). If you need to know more about these topics, please turn to the World Wide Web Consortium (W3C)'s HTML Home Page.

The FORM tag is the only part of the HTML code dictated by cgiemail:

  1. It must contain the attribute METHOD=POST.

  2. The ACTION attribute is constructed as follows (e.g.):
    http://web.mit.edu/bin/cgiemail/florey/www/questions3.txt
    |                             ||                        |
    |<--URL of cgiemail program-->||<--- URI of template -->|
    
    The "URI of the template" is the path to the template file on your Web server, relative to the server's document root. This means that if you were to append the URI of the template to "http://" and the hostname of your Web server, you would get a URL pointing to the template page itself, e.g.
    http://web.mit.edu/florey/www/questions3.txt
    |                ||                        |
    |<-- Hostname -->||<--- URI of template -->|
    
    This is how cgiemail finds your template -- it reads the path from the rest of the URL that invoked it.
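
The construction above can be sketched in a few lines of Python. The function names build_action_url and template_url are hypothetical helpers invented for this illustration; they only join strings the way the diagrams show.

```python
def build_action_url(cgiemail_url, template_uri):
    """Append the template URI to the URL of the cgiemail program."""
    return cgiemail_url.rstrip("/") + "/" + template_uri.lstrip("/")


def template_url(hostname, template_uri):
    """URL at which the template itself should be visible in a browser."""
    return "http://" + hostname + "/" + template_uri.lstrip("/")


action = build_action_url("http://web.mit.edu/bin/cgiemail",
                          "/florey/www/questions3.txt")
direct = template_url("web.mit.edu", "/florey/www/questions3.txt")
print(action)
print(direct)
```

If the second URL does not display your template in a browser, the first one is unlikely to work as an ACTION value.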

Now create the contents of your form. Each tag that defines an input (such as INPUT, SELECT, etc) must have a NAME attribute, so you can refer to it by name in the template. We recommend enclosing the NAME attribute's value in quotes, and using a descriptive variable name, e.g.

  <b>Month:</b> <INPUT TYPE=TEXT NAME="birthday-month" SIZE=20>
This tag defines a variable named birthday-month in the template. Variable names are case-sensitive, and may include alphanumeric characters, the dash ('-'), and underscore ('_').

Creating a Simple Template

This section describes how to create a very simple template, which just inserts the inputs entered in the form into the email message. The template "language" includes some powerful features that let you manipulate and process the inputs. They are discussed later, in the "Advanced Features" section.

Template Syntax

The template is a plain text file. The contents of the template are copied verbatim into the email message, except for variable invocations. These are always surrounded by square brackets ([, ]). To insert the value of a variable, just write its name surrounded by square brackets. For example, this line includes two values:
  The sender's birthday is [birthday-month] / [birthday-day]
The result in the message will look like:
  The sender's birthday is 09 / 02

If you need to put a literal square bracket into the message, just precede it with a backslash (\) character, like so:

  Did you misunderestimate \[sic\] the total? [answer]
The backslash tells cgiemail to ignore the special powers of that square bracket character.
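
As a rough mental model of the substitution rules above, here is a minimal Python sketch. It is not cgiemail's implementation: it handles only simple [name] invocations and backslash-escaped brackets, and it assumes that an undefined variable is replaced by the empty string. The function name render_simple is hypothetical.

```python
import re

# Matches either an escaped bracket (\[ or \]) or a simple [name] invocation.
_TOKEN = re.compile(r"\\([\[\]])|\[([A-Za-z0-9_-]+)\]")


def render_simple(template, values):
    """Replace [name] with values[name]; turn \\[ and \\] into literal brackets."""
    def replace(match):
        if match.group(1) is not None:
            return match.group(1)                 # escaped bracket: emit it literally
        return values.get(match.group(2), "")     # assumption: undefined -> empty

    return _TOKEN.sub(replace, template)


line1 = render_simple(
    "The sender's birthday is [birthday-month] / [birthday-day]",
    {"birthday-month": "09", "birthday-day": "02"},
)
line2 = render_simple(
    r"Did you misunderestimate \[sic\] the total? [answer]",
    {"answer": "42"},
)
print(line1)
print(line2)
```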

The Structure of an Email Message

The product of your template becomes an electronic mail message, so it must follow strict rules of formatting to be acceptable to the electronic mail system. The complete rules for simple text messages are in the Internet standards document RFC 2822.

For most applications, these rules are all you need to know:

  1. The text message is divided into lines. A line is defined by the newline character or characters that end it; which characters these are depends on your platform. A blank line is a line with no characters other than spaces before the newline.

    If you create the template file with a document editor such as MS Word, make sure it inserts newlines where you want them. You may find it easier to use a simple text editor (such as Emacs) on template files.

  2. Begin the message with email header lines. All of the lines before the first blank line are considered headers. Do not allow a blank line to be inserted in the headers, since it will end them prematurely.

    Headers are like the envelope of a physical letter. Each header line describes something about the message, such as the sender, recipient, or subject.

    Each header line starts with a label that ends in a colon (':'). The rest of the line, after the colon, is the value. To send a simple email message, you only need to be concerned with these headers:

    From: email-address
    This identifies the sender of the message. When the recipient replies to it, the reply goes here. Also, if there is any difficulty delivering the message, the mail system returns the message to the From: address. You should use your own email address or a mailing list concerned with the Web page, e.g.
      From: python-fans@mit.edu
    
    To: email-address [, email-address ..]
    The recipient(s) of the message. If you want to send it to multiple addresses, separate them with commas.

    Cc: email-address [, email-address ..]
    This is the "carbon copy" header, which acts like another "To:" line, although it has the implication that the "Cc:" recipients are just getting a copy -- the message is primarily intended for the "To:" recipients. You can use this to give the user filling out your form a copy of the mail; for example, if they were asked to enter their email address in an input named "my-email", add a header like:
      Cc: [my-email]
    
    Subject: text..
    Some text describing the purpose and content of the message. You can use this to identify the messages coming from your cgiemail application with obvious keywords, e.g.
      Subject: THREE QUESTIONS RESPONSE
    

  3. Always include a blank line after the last header line. This separates the headers from the body (which is defined as "the rest of the message"). Without a blank line, your message may be rejected by the mail system or cause unexpected results.

  4. The body of the message contains whatever you want it to. Please include real line breaks so the message has the correct appearance to recipients with text-based mail readers.

  5. You can add other kinds of header lines, if you like, but be sure to follow the rules in RFC 2822.
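
To see why the blank line matters, the sketch below uses the parser in Python's standard email package to split a message into headers and body. It is only an illustration of the header/body rule, not something cgiemail does; the helper name split_headers_and_body is hypothetical.

```python
from email import message_from_string


def split_headers_and_body(message_text):
    """Return (headers as a dict, body text) for a simple text message."""
    msg = message_from_string(message_text)
    return dict(msg.items()), msg.get_payload()


# A well-formed message: headers, one blank line, then the body.
good = (
    "To: strangeman@big-chasm.org\n"
    "From: strangeman@big-chasm.org\n"
    "Subject: questions three\n"
    "\n"
    "What is your name? Gollum\n"
)

# A stray blank line after the first header ends the headers prematurely.
bad = (
    "To: strangeman@big-chasm.org\n"
    "\n"
    "Subject: questions three\n"
    "\n"
    "What is your name? Gollum\n"
)

good_headers, good_body = split_headers_and_body(good)
bad_headers, bad_body = split_headers_and_body(bad)
print(good_headers)
print(bad_headers)
```

In the second message the Subject line is treated as part of the body, which is the kind of unexpected result rule 2 warns about.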

Simple Example

Here is a simple example of a cgiemail application. First, here is the source for the HTML page:

    <FORM METHOD="POST" ACTION="http://web.mit.edu/bin/cgiemail/florey/www/questions3.txt">
    Your e-mail address: <INPUT NAME="email"><p>
    Your name: <INPUT NAME="yourname"><p>
    Your quest: <INPUT NAME="quest"><p>
    Your favourite colour: <INPUT NAME="colour"><p>
    <INPUT TYPE="submit" value="Send e-mail">
    </FORM>

This is the corresponding template file. This example presumes it is stored under the path /mit/florey/www/questions3.txt on the Web server, so the URL that invokes cgiemail is http://web.mit.edu/bin/cgiemail/florey/www/questions3.txt.


    To: strangeman@big-chasm.org
    From: strangeman@big-chasm.org
    Cc: [email]
    Subject: questions three

    What is your name? [yourname]
    What is your quest? [quest]
    What is your favourite colour? [colour]

In a Web browser, the form looks like this:


Your e-mail address:
Your name:
Your quest:
Your favourite colour:

...And it sends this email message:


    To: strangeman@big-chasm.org
    From: strangeman@big-chasm.org
    Cc: gollum@shire.me
    Subject: questions three

    What is your name? Gollum
    What is your quest? Reclaim my preciousssss!
    What is your favourite colour? gold! ..no, red..aiiEEEE!

Try copying the example HTML page and template to your own system. You will have to change the value of the ACTION attribute of the FORM tag in the HTML to get it to work, but that is all you'll need to change.

Then, you can experiment with adding different kinds of inputs and changing the template to get used to designing with cgiemail.

Advanced cgiemail Features

The following sections describe all of the powerful features of cgiemail. Most features are controlled either by specially named CGI variables, or by extra syntax elements in the template page.

Built-In CGI Variables

Cgiemail defines some variables for you. You can use them in the same way as you use CGI input variables, except they are not defined in the HTML page. In fact, you may not define these names in your HTML page.

The pre-defined variables are:

Name Description of Contents
cgierrmsg The error message describing the most recent failure in cgiemail. It is intended to be used in a failure page template. See the Success and Failure Templates for more information.
cgierrinfo This is set to the extra information (if any) about the last error that occurred in cgiemail. It is intended to be used in a failure page template (see above).
cgidate The current date and time in the local timezone. The format is fixed: "day-of-week, date Month year hh:mm:ss", followed by the timezone; for example, Sat, 01 Feb 2003 14:38:39 EST. If you want more control over the date format, consider using Server Side Includes on your HTML page to generate the value of a hidden INPUT tag, which then becomes a value you can insert in the template.
cgiuniqueid A string of numbers and a period (.) which is unique to this invocation of cgiemail. The string is made up of the current timestamp (including a 4-digit year) and the process ID, so even two cgiemail processes running at the same time will have different values of cgiuniqueid.

It is intended for applications where the message generated by a Web page is fed into a record management system (such as a database or spreadsheet). Even if a user submits the same form several times, the value of cgiuniqueid is different in each one, so you can distinguish them in your spreadsheet or database.

The cgiuniqueid value can also be passed along to subsequent Web pages (through the success template) to identify them as well, and allow you to "link" their data to the first page.

cgirelease The release (version) of the cgiemail program executing your form, e.g. "2.0".

Special Input CGI Variables

Some input variable names have special meaning to cgiemail. Note that the names are case-sensitive.

Name Description
required-name
  or
requiredname
Any input whose name begins with the special word required is considered mandatory: if it does not have a value, the form is rejected with the error "Required field was left blank", followed by the variable name. (See the cgilabel section for a way to customize this error message.)

An input is considered blank if its value is the empty string.

Required inputs are only checked when they are used in the template. If a required input only appears in a section of the template which is conditionalized out, then it will not be checked.

cgilabel-name An input whose name begins with the special prefix cgilabel- defines a label for another input variable. The label is used in error messages instead of that variable's name.

Labels are intended to be used with required variables. When your required variable has a cryptic or unintuitive name, use the label to make the error message for its missing value more understandable.

For example, your form has an input required-choice2. If the user leaves this blank, she gets a message like, Required field was left blank: required-choice2. It is not obvious what part of the form this refers to!

Now, add the HTML:
<INPUT TYPE=HIDDEN NAME="cgilabel-required-choice2" VALUE="What is your Favorite Color?">
The error will now appear as Required field was left blank: What is your Favorite Color?.

addendum The value of addendum is displayed at the end of the automatically-generated result page, the page returned when the cgiemail form is submitted.

If your HTML page defines a custom failure or success page (see next two variables), the automatic page is not used, so this variable is ignored.

success
required-success
Setting this to a URL or the filename of a template file changes the page sent to the requestor's browser when the cgiemail request completes successfully.

See the Alternate "Success" (and "Failure") Page section for instructions on how to use this feature.

failure
required-failure
Setting this to a URL or the filename of a template file changes the page sent to the requestor's browser when the cgiemail request fails with a fatal error.

See the Alternate "Success" (and "Failure") Page section for instructions on how to use this feature.

cgiemail-mailopt When set to "sync", the mail-sending program is run in synchronous mode so its diagnostic output is captured. Useful for debugging mail problems. By default, mail is sent asynchronously so some diagnostic information is not available.

This feature is intended for use by system administrators, so if the preceding paragraph makes no sense to you, just ignore this variable.

debug-source When this is set to a non-empty value, a crude rendition of the CGI argument source (i.e. POST arguments or query string) is shown at the end of the default success page. This is not useful for most application developers and users, unless they are debugging a Web server.
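
The interaction between the required-name and cgilabel-name inputs described above can be sketched as follows. This is an illustration written for this guide, not cgiemail's code: the function name check_required is hypothetical, and the list of names used in the template is passed in by hand rather than extracted from a template.

```python
def check_required(inputs, names_used_in_template):
    """Return an error message for the first blank required input, else None."""
    for name in names_used_in_template:
        # Only inputs whose name begins with "required" are mandatory.
        if name.startswith("required") and inputs.get(name, "") == "":
            # A matching cgilabel- input replaces the name in the message.
            label = inputs.get("cgilabel-" + name, name)
            return "Required field was left blank: " + label
    return None


plain = check_required({"required-choice2": ""}, ["required-choice2"])
labeled = check_required(
    {
        "required-choice2": "",
        "cgilabel-required-choice2": "What is your Favorite Color?",
    },
    ["required-choice2"],
)
filled = check_required({"required-choice2": "red"}, ["required-choice2"])
print(plain)
print(labeled)
print(filled)
```

Because only names that are actually used are checked, a required input that never appears in the names list is never reported, which mirrors the rule that required inputs are only checked when they are used in the template.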

Accessing Environment Variables

In a template, any variable name beginning with "$" (dollar sign) is evaluated as a Unix environment variable, as if it were passed to the getenv(3) C library function. This gives your template access to all of the information passed through environment variables in the Common Gateway Interface (CGI). For example, [$REMOTE_USER] in a template will be replaced by the value of the REMOTE_USER variable in the CGI environment.

The CGI Specification, at http://www.w3.org/CGI/, lists all of these variables.

Some Web servers set some additional environment variables not mentioned in the CGI specification. For example, the Apache-SSL server puts the client certificate's subject information in environment variables when the request comes over an SSL connection. You can also access these in your cgiemail template.
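
The lookup described in this section behaves much like the following Python sketch, which replaces [$NAME] with the value of the environment variable NAME. It is a simplified illustration only: the helper name expand_env is hypothetical, and treating an unset variable as the empty string is an assumption, not documented cgiemail behavior.

```python
import os
import re

# Matches a simple environment-variable invocation such as [$REMOTE_USER].
_ENV_REF = re.compile(r"\[\$([A-Za-z_][A-Za-z0-9_]*)\]")


def expand_env(template, environ=None):
    """Replace [$NAME] with the value of NAME from the environment."""
    env = os.environ if environ is None else environ
    # Assumption: an unset variable becomes the empty string.
    return _ENV_REF.sub(lambda m: env.get(m.group(1), ""), template)


# A stand-in for the CGI environment the Web server would provide.
fake_cgi_env = {"REMOTE_USER": "florey", "REMOTE_ADDR": "192.0.2.10"}
expanded = expand_env("Submitted by [$REMOTE_USER] from [$REMOTE_ADDR]", fake_cgi_env)
print(expanded)
```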

Formatting of Values

You can change the way a value appears in the template by adding formatting options inside the square brackets of the variable invocation. All of the previous examples have shown simple variable invocations, with no formatting. The complete, general syntax of a variable invocation is:
        [ { format,  { argument,.. } }  variable-name ]
The format and arguments appear within braces to show they are optional. If included, the format and any arguments are each followed by a comma separator to distinguish them from the rest of the invocation. If you need to include a literal comma in a format or argument, "escape" it with a backslash ("\"), e.g.
    [%s\, ,multi-foo]

The following formats are available:

Format Args Description
%H n/a HTML-encoded: The variable's value is encoded so that any special characters appear literally in an HTML context, instead of having their special effect. For example, the metacharacters "<" and ">" are replaced by the appropriate character entities. This is useful in templates that generate an HTML page.
%U n/a URL-encoded: Encode the value for inclusion in a URL. Spaces and metacharacters are encoded as "%" followed by the hexadecimal character code.
%s n/a Printf string: Format the value as a printf-style string. The format may include any fixed width and precision specifiers accepted by the standard C library function printf(3); e.g. "%-12s" left-justifies the value in a field 12 characters wide. The format string may also include trailing characters after the format directive which are output after each value. The trailing text can include special characters like "\n" (newline).
%f n/a Printf floating-point: Format the value as a printf-style floating point number. The format may include any fixed width and precision specifiers accepted by the standard C library function printf(3); e.g. "%.2f" displays a number with two digits to the right of the decimal point, such as a dollar amount. The format string may include trailing text, as with the %s format.
%d, %x, %o n/a Printf integer: Format the value as a printf-style integer number. The format may include any fixed width and precision specifiers accepted by the standard C library function printf(3); e.g. "%04d" displays a number at least four digits wide with leading zeroes if needed. The format string may include trailing text, as with the %s format.
@separators index Token: This format tokenizes the variable's value and selects one "word", or token, designated by the index number. The first word is numbered 1.

Tokenizing breaks the value into "words" separated by any one or more of the separator characters (which appear directly after the @). See the examples below to help understand how this works.

~/regexp/ n/a Regular Expression: Extracts the portion of the variable's value that matches the regular expression regexp. The regular expression follows the rules for Extended Standard Regular Expressions. If the regular expression includes any subexpressions (enclosed in parentheses), the text matching the first subexpression is extracted instead of the text matching the entire expression. If the variable's value is not a match, this format returns the empty string.

Remember to \-escape the square bracket, backslash, and comma characters ( [ ] \ , ) within a variable invocation. You must thus use two backslashes to escape any regular expression metacharacters.

= expression n/a Evaluate Arithmetic Expression: Outputs the result of the arithmetic expression following the "=". See the section on Arithmetic and Logical Expressions for more details.

To apply special formatting, e.g. a floating-point printf format, to the result of an expression, you can assign it to a CGI variable first, and then format that variable in a separate invocation.

"string literal" n/a String Literal: This format outputs whatever string is between the quotes. The closing quote is actually optional, but it is allowed because experienced programmers will feel uncomfortable leaving an "unclosed" quote.

Although this format may seem redundant in the context of a template, it has a purpose in the variable-assignment pseudo-format.

Examples

These examples of variable invocations are shown in a template environment with the following CGI variables and values:
        oneword  =   "Example<TAG>"
        multiword =  "one, two, three"
        purchase1 =  "Widget @ $13.59"
        purchase1qty = "3"

    Variable Invocation                   Output
    [%H,oneword]                          Example&lt;TAG&gt;
    [%U,oneword]                          Example%3CTAG%3E
    [%-20s,oneword]                       Example<TAG>        
    [@ ,2,multiword]                      two,
    [@\, ,2,multiword]                    two
    [@w,1,multiword]                      one, t
    [~/\[0-9\\.\]+/,purchase1]            13.59
    [:amt, ~/\[0-9\\.\]+/,purchase1]
    [= amt * purchase1qty]                40.77000000
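
Several of the example outputs above can be reproduced with Python's standard library, which may help when predicting what a format will produce. This is a sketch under stated assumptions: the helper names are hypothetical, and Python's re module stands in for the Extended Regular Expressions that cgiemail uses, which agree for simple patterns like the one shown here.

```python
import html
import re
from urllib.parse import quote


def html_encode(value):
    """Rough equivalent of %H: replace HTML metacharacters with entities."""
    return html.escape(value, quote=False)


def url_encode(value):
    """Rough equivalent of %U: percent-encode everything that is not URL-safe."""
    return quote(value, safe="")


def select_token(value, separators, index):
    """Rough equivalent of @separators,index: pick the index-th word (from 1)."""
    tokens, current = [], ""
    for ch in value:
        if ch in separators:
            if current:                 # runs of separators count as one break
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens[index - 1] if 1 <= index <= len(tokens) else ""


def regex_extract(value, pattern):
    """Rough equivalent of ~/regexp/: first subexpression, else whole match."""
    match = re.search(pattern, value)
    if match is None:
        return ""
    found = match.group(1) if match.re.groups else match.group(0)
    return found or ""


oneword = "Example<TAG>"
multiword = "one, two, three"
purchase1 = "Widget @ $13.59"

print(html_encode(oneword))
print(url_encode(oneword))
print(repr("%-20s" % oneword))
print(select_token(multiword, " ", 2))
print(select_token(multiword, ", ", 2))
print(select_token(multiword, "w", 1))
print(regex_extract(purchase1, r"[0-9.]+"))
```

Note that the Python pattern needs no extra backslashes; the doubled escapes in the table are required only inside a cgiemail variable invocation.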

Variables with Multiple Values

If your HTML page contains more than one INPUT tag with the same NAME attribute, the corresponding CGI variable has multiple values. When cgiemail evaluates a variable invocation that generates output, it repeats the invocation for each value. Format invocations such as %s can control how the values are spliced together in the generated text.

Most formats will output the formatted result for each value, with a space character as a separator after each value but the last. The printf-based formats, however, do not append extra spaces -- so you have total control over the way values are separated.

In the following example, the brother input has three values: "Groucho", "Harpo", and "Chico".

    Variable Invocation       Output
    [%s\n,brother]            Groucho
                              Harpo
                              Chico
    [%s\, ,brother]           Groucho, Harpo, Chico
    [@o,1,brother]            Gr Harp Chic

Arithmetic and Logical Expressions

The pseudo-format "evaluate" introduced by the equals-sign (=), and the conditionalization invocations both employ arithmetic and logical expressions. An expression is a description of a computation that returns a value, conforming to a language with precise rules. Spreadsheet programs and computer programming languages use expressions, and an expression language, this way too.

A cgiemail expression is written in the familiar algebraic notation employed by widespread computer languages like C and Java. An expression is a sequence of values and operators. Values are either variable names or literal (constant) expressions such as numbers and strings. Operators are special characters representing a computation, like the plus sign (+), which says to add the values on either side of it.

Values

Operators

The following table lists all of the supported operators by order of precedence: higher-precedence calculations are done before lower-precedence ones, although sub-expressions within parentheses are done first. Arithmetic operators return a floating point value. Logical operators return either 1.0 for true or 0.0 for false.

Operator Type Description
( ) -- Parentheses: These define a subexpression whose value is computed first, regardless of the precedence of surrounding operators. The only effect of parentheses is to ensure the expression they enclose has the highest precedence.
* Arith. Multiplication: Result is the product of left-hand and right-hand values.
/ Arith. Division: Result is left-hand value divided by right-hand value.
+ Arith. Addition: Result is the sum of the left-hand and right-hand values.
- Arith. Subtraction: Result is the left-hand value minus the right-hand value.
! Log. Logical Inverse: Invert the logical sense of the value to the right of the operator. This is the only unary (takes one operand) operator.
<
<=
Log. Less Than: Returns true when left-hand value is less than (or equal, in the case of <=) the right-hand value. Strings are compared lexically (and case-sensitive), while numbers are compared arithmetically. Strings may not be compared to numbers.
>
>=
Log. Greater Than: Returns true when left-hand value is greater than (or equal, in the case of >=) the right-hand value. Strings are compared lexically (and case-sensitive), while numbers are compared arithmetically. Strings may not be compared to numbers.
==
!=
Log. Equality: Returns true when left-hand value is the same as the right-hand value, or different in the case of !=. Strings are compared lexically (and case-sensitive), while numbers are compared arithmetically. Strings may not be compared to numbers.
~
!~
Log. Regular Expression: Returns true when left-hand string value matches the regular expression in the right-hand argument. The regular expression is a string value, such as a literal, and may include optional slash (/) delimiters at the start and end. It must conform to the rules for Extended Standard Regular Expressions. The ~ operator tests for a match, while the !~ operator is true if the operands do not match.

Remember that you must escape the square bracket, backslash, and comma characters ( [ ] \ , ) within the variable invocation. You must thus use two backslashes to escape any regular expression metacharacters.

Example: purchase1 ~ "/\[0-9\\.\]+/" will return true if the variable purchase1 contains a number.

&& Log. Logical And: Returns true when the left-hand value and the right-hand value are both logically true.
|| Log. Logical Or: Returns true when either the left-hand value or the right-hand value is logically true.

These rules also apply to expressions:

Examples

    Expression                      Result
    50.00 + 13 * guests             76.00 (when guests == 2)
    'abc' > "def"                   0.0
    !"" && ("" || "x" == "x")       1.0
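
The example results above can be checked against the same expressions written in Python, whose operators behave similarly for these particular cases. This is only a cross-check, not a description of cgiemail's evaluator: the helper logical is hypothetical, and the sketch assumes that an empty string counts as false, as the third example implies.

```python
def logical(value):
    """Map a truth value to the 1.0 / 0.0 convention used by logical operators."""
    return 1.0 if value else 0.0


guests = 2

# Multiplication is done before addition, so this is 50.00 + (13 * 2).
total = 50.00 + 13 * guests

# Strings are compared lexically: "abc" sorts before "def".
string_compare = logical("abc" > "def")

# Assumption: the empty string is logically false, so !"" is true.
combined = logical((not "") and ("" or "x" == "x"))

print(total, string_compare, combined)
```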

Setting Variable Values

The colon (:) pseudo-format assigns a new value to a CGI variable. It can also add a new variable, when you set a variable that was not defined already. The general syntax is:

[:variable, invocation]

Where: variable is the name of the variable to be set, and invocation is the inside part (within the []) of a variable invocation that provides the new value of the variable. The variable is set to whatever would normally be output by that invocation. The "nested" invocation may use any of the formatting invocations described above. The string-literal format (") is specifically provided so variables can be set to a literal value.

The values of environment variables cannot be set.

Examples:

    [:total, = 30 * guests]   -- Set total to result of calculation
    [:firstname, @ ,1,name]   -- Extract first word of "name"
    [:lastpage, "true"]       -- Set variable to the literal string "true"

Conditionalization

Conditionalization pseudo-invocations let you selectively disable and enable regions of the template as it is processed. When a section of the template is conditionalized "off", cgiemail acts as if that section were deleted. The variable invocations in such an "off" section are not evaluated, so, for example, a required input that appears only in an "off" section is not checked.

The syntax of a conditional is:

    [#if  expression  ]
    [#elif  expression  ]
    [#else]
    [#endif]
The expression syntax is described in the section Arithmetic and Logical Expressions.

Conditionalizations may be nested. An outer conditional in an off (disabled) state also disables all of the conditionals within it.

Here is how each operator works:

[#if expression ]
If the expression is true, the template is processed normally until the next conditional.

If the expression is false, the template is ignored -- except for nested conditional operators -- until the next matching #else, #elif, or #endif conditional.

[#elif expression ]
This must follow an #if or #elif at the same level. When the preceding conditional was false, it acts just like #if; otherwise, if the preceding block was enabled, it acts like #else and disables everything after it until the closing #endif.

[#else]
This must follow an #if or #elif. When the preceding conditional was false, it re-enables the template and starts interpreting the template normally, until its matching #endif. When the preceding conditional was true, the template is ignored (including nested conditionals) until the closing #endif.

[#endif]
This must follow one of the other operators at the same level. The #endif operator "closes" the conditional and returns control to the next-outermost conditional, or no conditional if the top level was closed.
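
The rules for the four operators can be modelled with a small stack, as in the Python sketch below. It is an illustration of the described behavior rather than cgiemail's implementation. The template is represented as an already-parsed list of hypothetical tokens, and each condition is a function so that conditions in disabled regions are never evaluated.

```python
def render_conditionals(tokens):
    """Process ("text", s), ("if", f), ("elif", f), ("else",), ("endif",) tokens."""
    output = []
    stack = []  # one frame per open #if: [parent_on, branch_taken, currently_on]

    def currently_on():
        return stack[-1][2] if stack else True

    for token in tokens:
        kind = token[0]
        if kind == "if":
            parent_on = currently_on()
            # An #if inside a disabled region stays off and is not evaluated.
            on = parent_on and bool(token[1]())
            stack.append([parent_on, on, on])
        elif kind == "elif":
            frame = stack[-1]
            if frame[0] and not frame[1] and token[1]():
                frame[1] = frame[2] = True   # first true branch is enabled
            else:
                frame[2] = False             # earlier branch taken, or false
        elif kind == "else":
            frame = stack[-1]
            frame[2] = frame[0] and not frame[1]
            frame[1] = True
        elif kind == "endif":
            stack.pop()                      # back to the next-outermost level
        elif currently_on():
            output.append(token[1])
    return "".join(output)


success = "check.html"
payment = render_conditionals([
    ("text", "Payment Mechanism:\n"),
    ("if", lambda: success == "ccard.html"), ("text", "Credit Card\n"),
    ("elif", lambda: success == "check.html"), ("text", "Check in US dollars\n"),
    ("elif", lambda: success == "none.html"), ("text", "No payment required\n"),
    ("else",), ("text", "SANITY CHECK\n"),
    ("endif",),
])

nested = render_conditionals([
    ("if", lambda: False),
    ("if", lambda: True), ("text", "hidden"), ("endif",),
    ("endif",),
    ("text", "shown"),
])
print(payment)
print(nested)
```

The second call shows that an outer conditional in the off state also disables a true conditional nested inside it.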

About Newlines:

The conditional operators only affect the text of the template between them. A newline character after the closing conditional is not controlled by that conditional. This can lead to extraneous newlines appearing in the template output: for example, this template demonstrates the problem.
    ***************************
    [#if required-color ~ "/\[rR\]ed/"]Good choice!
    [#else]Ick, why do you like [required-color]?
    [#endif]
    ***************************
The output will look like, e.g.:
    ***************************
    Ick, why do you like Puce?
    
    ***************************
..since the newline after the "[#endif]" always gets into the output. One solution is to begin the next line right after [#endif]:
    ***************************
    [#if required-color ~ "/\[rR\]ed/"]Good choice!
    [#else]Ick, why do you like [required-color]?
    [#endif]***************************

Examples

Conditional text lets you remove an entire line from the template when the user omits an optional input, e.g.
    [#if favorite-flavor]Favorite Ice-Cream Flavor: [favorite-flavor]
    [#endif]
You can use conditionalizations to "translate" the value of an input from cryptic keywords to understandable phrases. For example, if the application has a menu to select the success input from a set of filenames, to control the result page, conditionalizations can translate the filename into a meaningful description:
    Payment Mechanism:
    [#if required-success == "ccard.html"]Credit Card
    [#elif required-success == "check.html"]Check in US dollars
    [#elif required-success == "none.html"]No payment required
    [#else]SANITY CHECK: unknown success value [required-success][#endif]
Since conditionalizations can be nested, a block of the template that includes conditionalizations can be controlled by one outer test:
    [#if use-billing-addr != "checked"]
        Name: [shipping-name]
        Street: [shipping-street][#if shipping-state]
        State: [shipping-state][#if shipping-zip] [shipping-zip][#endif][#endif]
    [#endif]

Stopping with an Error

The error pseudo-format lets your template stop processing and quit cgiemail with a fatal error. The general syntax is:
[! message ]
The message is the error message returned to the user in the error page.

This is mainly useful within a conditionalization, so you can add a test that makes the form fail with a fatal error if the inputs do not pass a validation test.

Example:

    [#if color !~ "/\[Rr\]ed/"][! Try choosing a color closer to red.][#endif]

Reference Table of Variable Invocations

Here are all of the formatting instructions and commands that may be written inside a variable invocation in a cgiemail template. They are listed together here to serve as a convenient reference for the experienced developer who just needs a reminder of the syntax of each command:

Syntax Description
[variable] Simple Variable Reference: The value of the variable is output with no alterations. Multiple values are separated by spaces.
[%H, variable] HTML-encoded: This format command outputs the variable's value with HTML metacharacters translated into character entities. Multiple values are separated by spaces.
[%U, variable] URL-encoded: This format command outputs the variable's value with special characters translated so it can appear in a URL. Multiple values are separated by spaces.
[%s, variable] Printf as string: The variable's value is formatted according to the specified printf string directive. See the Formatting section for more details. Multiple values are concatenated without intervening spaces.
[%f, variable] Printf as floating-point value: The variable's numeric value is formatted according to the specified printf float directive. See the Formatting section for more details.
[%d, variable], [%x, variable], [%o, variable] Printf as integer value: The variable's numeric value is formatted according to the specified printf integer directive. See the Formatting section for more details.
[@sep,i,variable] Select a Token: Tokenize the variable's value using separator characters sep, and output token number i. Multiple values are separated by spaces. See the Formatting section for more details.
[~/regexp/, variable] Regular Expression Match: Output the portion of the variable's value that matches the regular expression, or its first subexpression if any. Multiple values are separated by spaces. See the Formatting section for more details.
[= expression] Evaluate Expression: Outputs the result of the arithmetic expression. See the Arithmetic and Logical Expressions section for more details.
["string literal"] Literal: The characters in string literal are output verbatim. This format is provided for use with the variable-assignment pseudo-format.
[:variable, invocation] Assign Variable: The template variable is set to the text that would be output by invocation if it were a variable invocation. See the section on Setting Variable Values for more details.
[#if expression], [#elif expression], [#else], [#endif] Conditionalizations: These are special pseudo-invocations that let you "turn off" output of the template (and also disable side effects such as setting of variables). See the Conditionalization section for more details.
[! message] Error Message: When this invocation is encountered in the template, cgiemail stops processing with a fatal error, returning the given message. See the section Stopping with an Error for more details.

Expert Techniques with cgiemail

Here are some ways to use the advanced features of cgiemail to enhance your Web applications.

Testing with cgiecho

cgiecho is a companion program to cgiemail that just shows the output generated by the template file without sending any mail. It can help you test and debug templates more rapidly since you don't have to wait for an email message to see the latest results.

Bear in mind that cgiecho only shows the template output when it is allowed to send the default, automatically-generated "success" page. If you set the success or required-success CGI variable, the output from the template file will never be seen. A Web designer could take advantage of this feature, using cgiecho as a mechanism to get CGI arguments into the text of a Web page, e.g. to present a form with custom default values in its inputs.

You use cgiecho by changing the ACTION attribute of the FORM tag in your HTML page: substitute cgiecho for cgiemail in the URL. For example,

 ACTION="http://web.mit.edu/bin/cgiecho/florey/www/questions3.txt">
Try putting cgiecho in your own example page to see how it works.

Diagnosing Mail Problems

If your Web application seems to be working (i.e. cgiemail displays the success page), but you never get any email, here are some things to try:

  1. Check the headers in the message generated from the template. Is the To: line correct, and does it have your fully qualified email address (e.g. florey@mit.edu, not just "florey")? Does the template file start with the header lines, and not a blank line? (If you are using a custom "success" page, disable it temporarily to get the default success page which shows the filled-in template.)

  2. Normally, the cgiemail program dispatches email asynchronously, meaning it just starts the message on its way and then terminates. This saves time and gets your "success" page displayed more promptly, but if there is a problem sending the mail you will not find out about it.

    You can temporarily force cgiemail to wait for the mail to be delivered, and report its success. Add the following line to your HTML file, to supply a special extra input variable which is read by cgiemail itself:

    <INPUT TYPE="hidden" NAME="cgiemail-mailopt" VALUE="sync">
    
    Be sure to remove this line once you have solved the problem since it will cause needless delays for all your users.

Making Inputs Mandatory

Use this technique when your HTML form has inputs where you want to be sure the user has entered something or actively made a choice. For example, your form might have a checkbox that users must check to indicate they have read an agreement.

When a mandatory input is left out, cgiemail stops with a fatal error indicating the missing input and does not send mail. This is an effective means of enforcing the rule.

To create a mandatory input, choose a CGI variable name that begins with the word "required". This is the variable name in the template, and in the NAME attribute of the HTML input tag. Variables named "requiredzipcode", "required-firstname", and "required_LastName" are all implicitly mandatory inputs.

See the Special Input CGI Variables section for more details about this, and about the cgilabel convention which you can use to improve the error messages for missing inputs.

Any type of INPUT tag can be made mandatory, by ensuring the default value is an empty string and giving it a name that starts with "required". For example, this menu choice has a default selection whose label instructs the user to make a choice, and whose value will be rejected by cgiemail:

  <B>I am a:</B>
  <SELECT name="required-ptype">
  <OPTION value="">--Choose One--
  <OPTION value="student">Student
  <OPTION value="faculty">Faculty Member
  <OPTION value="staff">Admin. &amp; Teaching Staff
  </SELECT>
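
The mandatory-input rule described in this section can be modeled with a short Python sketch, which may help when reasoning about which submissions will be rejected. This is only an illustration, not cgiemail's own (C) implementation. The function name `missing_required`, the list of template variable names, and the dictionary of submitted inputs are all assumptions made for the example, as is the case-sensitive match on the "required" prefix.

```python
def missing_required(template_vars, inputs):
    """Return the mandatory variables that are absent or empty.

    template_vars: variable names used in the template (hypothetical list).
    inputs: submitted CGI variables, mapping name -> string value.
    A variable is treated as mandatory when its name starts with "required".
    """
    return sorted(
        name
        for name in template_vars
        # An input left out of the request counts the same as an empty one.
        if name.startswith("required") and inputs.get(name, "") == ""
    )


# The menu above, submitted with its default "--Choose One--" option:
print(missing_required(["required-ptype", "comment"],
                       {"required-ptype": "", "comment": ""}))
```

Note that the optional `comment` input is never reported, even when it is empty, because its name lacks the prefix.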

Alternate Methods

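Although the "required" prefix on variable names forces the user to enter some value, it accepts anything that is not an empty string. To apply more sophisticated validation tests, you can put the error pseudo-format inside a conditionalization, as described in the section Stopping with an Error.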

Add Message to Default "Success" Page

This feature lets you add your own message to the end of cgiemail's default "success" page. Normally, cgiemail just displays "The following email message was sent", and a copy of the message. If you provide a value for the CGI variable "addendum", it will be added to the Web page, but not the email message.

For example, add this line to the HTML in the simple example, to add a note to the success page:

  <INPUT TYPE=HIDDEN NAME="addendum" VALUE="Thank you for your answers!">
You may not include HTML markup in the value, since that would cause a security problem. See the Security Issues section for more details.

Alternate "Success" (and "Failure") Page

You can designate your own web page to be shown upon success, or failure, of the cgiemail script. Set the success or required-success CGI variable to change the success page, and the failure or required-failure variable to customize the failure page. For example:

  <INPUT TYPE=HIDDEN NAME="success" VALUE="result.html">
The value of the success (or failure) variable may be either a URL or a filename.

Since the success and failure pages are chosen by CGI variables, they can be set by any kind of INPUT tags in the HTML form, user-selected as well as hidden. For example, the form could offer a menu of different follow-on pages (e.g. for different audiences like student and teacher). It is implemented by using the menu's value to choose a template filename for success or required-success:

  <SELECT NAME="required-success">
  <OPTION VALUE="" SELECTED>-- Choose One --
  <OPTION VALUE="student.html">I am a Student
  <OPTION VALUE="teacher.html">I am a Teacher
  </SELECT>
The HTML template offers some subtly powerful possibilities: the success (or failure) page can include another FORM tag, driving cgiemail or some other Web application. Using a template for the success page, you can give it pre-loaded default values or hidden inputs based on inputs to the current form.

Using cgiemail with Server Side Includes

Server Side Includes (SSIs) are special tags in your HTML page that get evaluated by the web server when it delivers the original HTML page, that is, the form that later drives your cgiemail application. The SSIs are active before cgiemail comes onto the scene. They can interact in useful ways, however. For more details about SSIs, see http://httpd.apache.org/docs/mod/mod_include.html

SSIs can insert dynamic content anywhere in your HTML page, even in the quoted string that is the VALUE attribute of an INPUT tag. So, you can use them to pre-load the value of an interactive input, or generate the value of a hidden input. These values are then passed on to cgiemail.

For example, you can include the date the HTML page was last modified in your template output by adding this tag to your FORM (the SSI directive is the <!--#flastmod ... --> comment inside the VALUE attribute):

    <INPUT TYPE=HIDDEN NAME="last-mod-time"
           VALUE="<!--#flastmod file="my-cgiemail-form.shtml" -->" >
In your template, it is just another variable:
    HTML page last modified on: [last-mod-time]

SSIs are available on the MIT web servers.

Confidential Information and cgiemail

There is one word to describe how to handle confidential data with cgiemail: don't. Even if you use an SSL-equipped Web server to protect the data as it travels from the user's browser to cgiemail, it will be vulnerable when it is sent in email. Never ask for credit card numbers, Social Security numbers, or other unpublished data in a cgiemail application.

Plain-text electronic mail is an inherently insecure medium: The message passes through many post office servers on its way to the recipient. It is stored temporarily on each server, and possibly intercepted or observed. Since it probably travels unencrypted between servers it might be observed on the network by packet sniffers as well.

Why not encrypt the email? It is possible to adapt cgiemail to send S/MIME or PGP-encoded mail, but it requires a lot of extra administrative effort and risk. The web server has to store key pairs that match keys held by the recipient of the mail, and cgiemail has to use a type of encrypted mail all the recipients can decode.

It is preferable to keep confidential information, such as credit card numbers, away from cgiemail in the first place. You may be able to use it in concert with a "second page", e.g. a credit-card payment application: Create a cgiemail form for the initial page where the user chooses goods to buy. The form's template computes the price and passes it on to a custom "success" page, which is a form that collects the user's credit card number and then passes it directly to the credit-card payment application.

MIT Web Server Features and cgiemail

This section describes how cgiemail interacts with special features of the MIT Web servers maintained by MIT Information Systems.

You can skip this section if your pages are not hosted on the web.mit.edu server. Consult your system administrator and webmaster about any relevant features of your web server.

Athena lockers and AFS

In the Athena environment, personal and shared files are stored in the network file system AFS (Andrew File System). The Web server also has access to AFS, so any files you let it read are implicitly part of its "website".

Athena also implements a concept called lockers. A locker is a simple, one-word name for a place in the AFS hierarchy. Every user's home directory is a locker named by their username. Other lockers are used for shared resources, courses, and some even for the sole purpose of hosting a website on web.mit.edu. See this Web Publishing document for instructions on building a website in your locker.

The web.mit.edu servers run a special module that lets you refer to any Athena locker by placing its name in the first element of the path. For example, the "bar.html" file in the foo locker is accessed through the URL http://web.mit.edu/foo/bar.html.

URLs starting with a locker name are known as short form, since they are a lot more compact than full AFS paths. You can also use the short form of the template file URI in the URL that invokes cgiemail.

For example, consider a user named florey who develops a form and template in the "webstuff" subdirectory of his locker. On an Athena Unix system, the template would be in /mit/florey/webstuff/form1.txt. The URL to invoke cgiemail on this form would thus be:

    http://web.mit.edu/bin/cgiemail/florey/webstuff/form1.txt

Of course, you can also construct the cgiemail URL with an absolute AFS path such as:

    http://web.mit.edu/bin/cgiemail/afs/athena.mit.edu/user/f/l/florey/webstuff/form1.txt
The locker paths are neater, more compact, and more robust if the locker is ever moved.
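
As a rough illustration of how the short-form URL is assembled, the Python sketch below joins a locker name and a template path onto the cgiemail program URL. The helper name `cgiemail_url` and its parameters are hypothetical, introduced only for this example; the base URL is taken from the examples in this guide.

```python
BIN_BASE = "http://web.mit.edu/bin"  # program directory used in this guide's examples


def cgiemail_url(locker, template_path, program="cgiemail"):
    """Build a short-form URL that invokes `program` on a template.

    locker: Athena locker name, e.g. "florey".
    template_path: path of the template file inside the locker.
    program: "cgiemail", or "cgiecho" when testing.
    """
    # Drop any leading slash so the pieces join with exactly one separator.
    return f"{BIN_BASE}/{program}/{locker}/{template_path.lstrip('/')}"


print(cgiemail_url("florey", "webstuff/form1.txt"))
print(cgiemail_url("florey", "webstuff/form1.txt", program="cgiecho"))
```

The second call shows the same template addressed through cgiecho, which is the only change needed in the ACTION attribute when testing.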

Security Issues with cgiemail

The web is a computing platform of great power and subtlety, which inspires the creation of tools like cgiemail. It also contains inherent vulnerabilities which are often exploited by programmers with malicious intent. A vulnerability is a loophole which may be used by an attacker to gain access to services or data to which he or she is not normally entitled.

All of the known and reported vulnerabilities in earlier versions of cgiemail have been addressed in Release 2.0. Some, like cross-site scripting, cannot be completely prevented by cgiemail and also require vigilance on the part of the application author. These are enumerated in the next sections.

The Cross-Site Scripting Vulnerability

Cross-site scripting (XSS) allows an attacker to hide malicious scripts in an innocent-looking Web page loaded from a trusted source. Please see CERT's page CERT Advisory CA-2000-02 Malicious HTML Tags Embedded in Client Web Requests for a concise description of the mechanism and risks.

To prevent XSS attacks, a Web application must never allow the input provided by a user to be echoed directly on a dynamic Web page. In the context of cgiemail this is only a possibility if you use custom "failure" or "success" pages. Those pages should use the %H format in any variable invocations that display the value of a CGI input variable that may have been set by the user. This includes any variable that was not set within the template itself -- even variables supplied by the form page in <INPUT TYPE=HIDDEN ..> tags may be overridden by POST or GET arguments.

If you want to do other editing on the variable in an invocation, assign it to an intermediate variable which is then invoked with %H, e.g.:

    Choice: [:ed-choice, @ ,2,choice][%H,ed-choice]
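
To see why HTML-encoding defuses this kind of attack, the following Python sketch uses the standard library's `html.escape` as a stand-in for the %H format. It is an approximation under stated assumptions: the exact set of characters that cgiemail translates is not listed here, and the function name `html_encode` is hypothetical.

```python
import html


def html_encode(value):
    """Translate HTML metacharacters into character entities.

    Stand-in for the effect of the %H format; html.escape converts
    &, <, > and, with quote=True, both kinds of quotation mark.
    """
    return html.escape(value, quote=True)


# A value an attacker might submit in place of an innocent input:
attack = '<script>alert("x")</script>'
print(html_encode(attack))
```

Once encoded, the browser displays the submitted text literally instead of interpreting it as markup or script.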

Email Relay Vulnerability

Since cgiemail generates and sends an email message, there is always the possibility that it can be abused to inject a malicious message into the mail system. One possible vulnerability is the use of your website as a mail relay, a mechanism used by senders of objectionable mail such as SPAM. See the article How SPAM is sent for an introduction to the problem.

If your cgiemail application has a mail template which invokes a variable in the mail headers, e.g. to set the From: address, previous versions of cgiemail allowed the Web client to insert a value containing newline characters into the headers. This let the requestor insert a complete mail message with whatever headers he desired, including the blank line and message body, leaving the remains of your template at the very end of the message.

With Release 2.0, cgiemail does not allow values containing newlines to be inserted into the header section of an email message.
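
The check involved can be pictured with a small Python sketch. It is not cgiemail's code, only a model of the rule; the function name `header_value_is_safe` is hypothetical, and treating carriage returns like newlines is an assumption of this example rather than something stated above.

```python
def header_value_is_safe(value):
    """Return True when `value` cannot break out of a header line.

    Rejects line feeds and (by assumption) carriage returns, since
    either could start a new header or end the header section.
    """
    return "\n" not in value and "\r" not in value


# A From: value carrying its own headers, blank line, and body:
injected = "spammer@example.com\nSubject: Buy now\n\nUnwanted body text"
print(header_value_is_safe(injected))          # False
print(header_value_is_safe("florey@mit.edu"))  # True
```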

Email Header Vulnerabilities

The sendmail program that handles much of the mail traffic on the Internet is prone to having new vulnerabilities discovered, some of which can be exploited by carefully crafted email addresses.

Since cgiemail hands its email messages off to sendmail, it can be used to attack sendmail on your web server (even if it is not otherwise listening to the network).

To protect against this possibility, we recommend that your email templates do not invoke any variables in the header section. If you must include a variable in the header (such as the From: address), at least have the template test that it only contains "allowed" characters and is not longer than some maximum (100 characters, in this example). This ought to rule out many sendmail attacks:

    [#if email-from !~ "/^\[-a-zA-Z0-9\\.+_@%\]{1,100}$/"]
      [! Illegal email address format.]
    [#endif]From: [email-from]
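
If you want to try candidate addresses against this kind of test before deploying a template, the same idea can be sketched in Python. The pattern below is a transcription of the character class in the template test, not a guaranteed equivalent: how cgiemail's regular-expression engine treats the escaped dot is an assumption, and the names `ALLOWED_ADDRESS` and `is_allowed_address` are hypothetical.

```python
import re

# Letters, digits, and - . + _ @ % only, between 1 and 100 characters.
ALLOWED_ADDRESS = re.compile(r"[-a-zA-Z0-9.+_@%]{1,100}")


def is_allowed_address(value):
    """Return True when the whole value consists of allowed characters.

    fullmatch is used so that nothing, not even a trailing newline,
    may follow the allowed characters.
    """
    return ALLOWED_ADDRESS.fullmatch(value) is not None


print(is_allowed_address("florey@mit.edu"))  # True
print(is_allowed_address("|/bin/sh"))        # False
```

This is a character and length filter, not a test that the address is deliverable or even well-formed.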

Other Email Vulnerabilities

Never forget, and be sure to remind anyone receiving messages generated by a cgiemail application, that electronic mail is an inherently insecure medium. Remember these principles:
  1. Presence of a message is not proof that the form was submitted. It is easy to spoof an email message (that is, create your own message that looks as if it came from the cgiemail application).

  2. Absence of a message is not proof that the form was not submitted. Messages are sometimes lost, for example when the recipient's mailbox is full or disabled, or when some piece of mail software fails.

File Exposure

Since the template mechanism lets cgiemail and cgiecho potentially read any file under the Web server's document root, there is a small risk of allowing files to be exposed inadvertently. There are several mitigating factors, however.

To gain access to a file:

These constraints make it unlikely, though not impossible, that cgiemail or cgiecho could be exploited to expose files that would otherwise be protected from view.

Other Risks

cgiemail is not particularly prone to a number of common sources of vulnerability in Web applications:

  - It does not execute arbitrary commands, nor does it include user input in any command.
  - It does not expose information about the machine, except for its own version, which is a useful diagnostic tool.
  - Denial of service: it adds little if any extra vulnerability, since the Web and email are already open to abuse. It does not cause extraordinary load on the server and involves no loops.
  - Eavesdroppers: it adds nothing above the normal vulnerability of a Web server.
  - Software errors allowing stack-smashing attacks: cgiemail is a simple C program, carefully audited, with minimal memory management, and it runs once. (Please alert cgiemail@mit of any new discoveries.)
  - Spoofing: its vulnerability is no worse than that of ordinary email.