---
--- Description of the cp and wcsp formats
--- 

Plan:
    A. Introduction
    B. Description of the cp format
    C. Description of the wcsp format
    D. How to translate cp to wcsp format


    A. Introduction
    ---------------

The  cp  and wcsp  formats  are  used  to encode  Weighted  Constraint
Satisfaction Problems  (WCSPs). The  CSP framework has  been augmented
with so-called soft  constraints with which it is  possible to express
preferences among  solutions. The  WCSP framework associates  costs to
tuples  and the goal  is to  find a  complete assignment  with minimum
cost. A reference to what a WCSP is can be found in:

In the quest of the best form of local consistency for Weighted CSP
Javier Larrosa and Thomas Schiex
In Proc. of the 18th IJCAI
Acapulco, Mexico, August 2003
www.inra.fr/bia/T/schiex/Export/ijcai03.pdf

In order to encode WCSPs, two formats have been defined. The cp format
is a high level format that  can be translated into the low level wcsp
format.

    B. Description of the cp format
    -------------------------------

The goal of  this format is to simplify the description  of a WCSP.  A
constraint is described either by a list of tuples with their costs or
by a formula (any C-code expression which returns an integer).

The syntax of the cp format  is line oriented: each line defines a new
variable or a  new constraint (except for comment  lines, problem name
definition, and  constraints in extension with their  list of tuples).
Any line  starting with  a '#' character is ignored  (it is  a comment
line). Blank lines are ignored too.

The first non-comment line contains  the problem name and if necessary
a global upper  bound of the problem (optional). If  there is no upper
bound, then the sum of all  the maximum costs for each soft constraint
plus one is  taken: ub = 1 + sum_{c  in constraints}(max_{t in tuples}
(max(0,cost(c,t)))).  Negative  cost values  are used to  express hard
constraints,  and   will  be  automatically  replaced   by  ub  during
translation into the wcsp format.

Next lines define variables and  constraints. A variable is defined by
its name and its domain, as  a list of integer values (can be positive
or negative, not necessarily  consecutive or increasing numbers).  The
name  of  a  variable  must  be  a sequence  of  letters,  digits,  or
underscores, and it  may not begin with a  digit.  Case is significant
in variable names;  a and A are distinct  variables.  The variables in
the scope of a constraint should  be defined before they appear in the
constraint.  A domain should contain at least one value.

A constraint can be defined  by a formula (constraint in intension) or
by  a list of  tuples with  their costs  (constraint in  extension). A
constraint in  intension is defined  by any expression  involving some
already-defined variable names and  which returns an integer (positive
or  negative   for  infinity)  when   it  is  evaluated  by   the  awk
language. Note that awk uses the same syntax as the C language. During
translation,  all the possible  tuples are  assigned to  the variables
involved in the constraint before the expression is evaluated in order
to get the  cost.  The cost value which occurs most  often is taken as
the  default  cost  and  only  tuples with  different  costs  will  be
generated.

Expressions should  be self-content and perform no  side effects. They
may  call some awk  functions. A  special awk  library, whose  name is
libcp.awk, is loaded before the evaluation of expressions. So the user
can  add  his/her  own  functions  to  this  library.   Two  important
predefined functions  are hard  and soft functions.   Function hard(e)
returns -1  (infinite cost) if e  is equal to 0,  otherwise it returns
zero. Function soft(v,  e) returns v if e is equal  to 0, otherwise it
returns zero. Look at libcp.awk to see the other predefined functions.
Moreover, the special variable ub  (global upper bound, equal to -1 if
not defined) may be used inside the awk functions and expressions.

A constraint  in extension is defined  by a first  line which contains
the  variable names in  its scope  and the  default cost  value.  Each
following line defines a tuple (domain values assigned to variables in
the same order as in the first line) and its corresponding cost value.

A typical  example is the  4-Queens problem.  Place 4 queens on  a 4x4
chessboard in such a way that no one queen could take another queen:

# problem name (and possibly global upper bound)
4-QUEENS

# variables with their explicit domains
queen_row1 1 2 3 4
queen_row2 1 2 3 4
queen_row3 1 2 3 4
queen_row4 1 2 3 4

# constraints defined by a formula or by a list of tuples

hard( alldiff(queen_row1, queen_row2, queen_row3, queen_row4) )

hard( abs(queen_row1 - queen_row2) != 1 )
hard( abs(queen_row1 - queen_row3) != 2 )

# hard( abs(queen_row1 - queen_row4) != 3 )
# is equivalent to:
queen_row1 queen_row4 0
1 4 -1
4 1 -1

hard( abs(queen_row2 - queen_row3) != 1 )
hard( abs(queen_row2 - queen_row4) != 2 )
hard( abs(queen_row3 - queen_row4) != 1 )

<end-of-file benchs/academics/4queens.cp>

Note that  the alldiff  and abs functions  are already defined  in the
original libcp.awk library.


    C. Description of the wcsp format
    ---------------------------------

The wcsp  format is a simple format  which should be easy  to parse by
WCSP solvers.  It is composed of a list of numerical terms (except for
the  first one  which defines  the problem  name) separated  by space,
tabulation or  end of line (see  isspace and fscanf function  in the C
language).  Instead of using  names for making reference to variables,
variable  indexes are  employed.   The same  for  domain values.   All
indexes start at  zero. All the constraints are  defined in extension,
by  their  list  of  tuples.  A  default cost  value  is  defined  per
constraint in order to reduce the size of the list. Only tuples with a
different  cost value should  be given.  All the  cost values  must be
positive.  The structure of the format is: first, the problem name and
dimensions,  then  the definition  of  the  variables,  and last,  the
definition of the constraints.

a) Problem name and dimensions:

<Problem name>
<Number of variables (N)>
<Maximum domain size>
<Number of constraints>
<Global upper bound of the problem (UB)>

b) Definition of the variables:

<Domain size of variable with index 0>
...
<Domain size of variable with index N - 1>

c) Definition of a constraint:

<Arity of the constraint>
<Index of the first variable in the scope of the constraint>
...
<Index of the last variable in the scope of the constraint>
<Default cost value>
<Number of tuples with a cost different than the default cost>

 and for every tuple:

<Index of the value assigned to the first variable in the scope>
...
<Index of the value assigned to the last variable in the scope>
<Cost of the tuple>

There  can be  several constraints  with  the same  scope (the  solver
should combine  them into one  constraint). The arity of  a constraint
may  be equal  to zero.   In this  case, there  is no  tuples  and the
default cost value  is added to the total solution  cost.  This can be
used to represent a global lower bound of the problem.  The goal is to
find an  assignment of all  the variables with minimum  cost, strictly
lower than the global upper bound  UB. Tuples with a cost greater than
or equal to UB are forbidden (hard constraint).

See the 4-Queens example in the next section.


    D. How to translate cp to wcsp format
    -------------------------------------

You need to have the following software installed on your computer:

gawk version 3.1 or later (previous versions or other awk programs will not work!)
       GNU/Linux systems: http://www.gnu.org/software/gawk/gawk.html
	        (sources) http://ftp.gnu.org/gnu/gawk/gawk-3.1.3.tar.gz

       Windows systems: http://gnuwin32.sourceforge.net/packages/gawk.htm
	       (binary) http://gnuwin32.sourceforge.net/downlinks/gawk-bin.php

Moreover, you need the following files in your working directory:
- cp2wcsp.awk (in benchs/translators)
- libcp.awk (in benchs/translators)

The command line usage is the following:

gawk -f cp2wcsp.awk problem.cp > problem.wcsp

If libcp.awk  is placed in a  different directory, you  should set the
AWKPATH environment variable: set AWKPATH = mydirectory.

For a faster  evaluation of the constraint formulae,  mawk can be used
instead   of    gawk   (see   ftp://ftp.whidbey.net/pub/brennan/   and
http://gnuwin32.sourceforge.net/packages/mawk.htm).  If you  have mawk
installed,  just  uncomment the  two  lines  using  mawk in  the  file
cp2wcsp.awk.

The 4-Queens problem in cp format  is translated in the wcsp format by
typing this command: awk -f cp2wcsp.awk ../academics/4queens.cp

4-QUEENS 4 4 7 1
4 4 4 4
4 0 1 2 3 1 24
0 1 2 3 0
0 1 3 2 0
0 2 1 3 0
0 2 3 1 0
0 3 1 2 0
0 3 2 1 0
1 0 2 3 0
1 0 3 2 0
1 2 0 3 0
1 2 3 0 0
1 3 0 2 0
1 3 2 0 0
2 0 1 3 0
2 0 3 1 0
2 1 0 3 0
2 1 3 0 0
2 3 0 1 0
2 3 1 0 0
3 0 1 2 0
3 0 2 1 0
3 1 0 2 0
3 1 2 0 0
3 2 0 1 0
3 2 1 0 0
2 0 1 0 6
0 1 1
1 0 1
1 2 1
2 1 1
2 3 1
3 2 1
2 0 2 0 4
0 2 1
1 3 1
2 0 1
3 1 1
2 0 3 0 2
0 3 1
3 0 1
2 1 2 0 6
0 1 1
1 0 1
1 2 1
2 1 1
2 3 1
3 2 1
2 1 3 0 4
0 2 1
1 3 1
2 0 1
3 1 1
2 2 3 0 6
0 1 1
1 0 1
1 2 1
2 1 1
2 3 1
3 2 1
<end-of-file benchs/academics/4queens.wcsp>

The awk  script will choose  the best default  cost value in  order to
minimize  the number  of  tuples  for every  constraint  defined by  a
formula only.  Moreover, variables with  only one value per domain are
constants which disapear after translation into the wcsp format. Also,
negative  cost   values  are  replaced  by  the   global  upper  bound
ub. Finally, the  awk script is able to detect  possible errors in the
input file.


Any  comment  or contribution  to  these  formats  should be  sent  to
simon.degivry@toulouse.inra.fr

S. de Givry, 2004.
