
<title>The Jasmin User Guide</title>

<center>
<img src="jasmin.gif">
<h1>Jasmin Guide</h1>
<i>Jonathan Meyer, July '96</i>
</center>

<h2>1.1 Introduction</h2>

Welcome to Jasmin version 1.0.<p>

Jasmin is a Java Assembler Interface. It takes ASCII descriptions for Java
classes written in a simple assembler-like syntax using the Java Virtual
Machine instruction set. It converts them into binary Java class files, 
suitable for loading into a Java interpreter.<p>

Jasmin was written as a companion to the book "Java Virtual Machine",
soon to be published by O'Reilly, written by myself and Troy Downing.<p>

This document covers the rules and syntax used in Jasmin. It does not
explain the Java Virtual Machine itself. For more on the VM, see:
<p>

<dl>
<dd>
<a href="http://java.sun.com/doc/language_vm_specification.html">
Sun's Java Virtual Machine Specification
</a>
</dl>

For more on Jasmin, see:<p>
<dl>
<dt><a href="instructions.html">Jasmin Instruction Syntax</a> 
<dd>for a description of the syntax of Java VM instructions in Jasmin.<p>

<dt><a href="about.html">About Jasmin</a> 
<dd>for answers to questions like "what is Jasmin?", "why should
I use Jasmin?", "why did you write Jasmin?".<p>
</dl>

Jasmin is designed as a simple assembler, and has a clean easy-to-learn syntax
with very few bells and whistles.<p>

Where possible, I have adopted a 1:1 mapping between Jasmin constructs and the
conventions followed by Java class files. For example, package names in Jasmin
are delimited with the '/' character (e.g. "java/lang/String") used by the
class file format, instead of the '.' character (java.lang.String) used in the
Java language.<p>

The Jasmin assembler does little compile-time processing and
checking of the input code. For example, it doesn't check that
classes you reference actually exist, or that your type signatures are
well formed. Jasmin also lacks many of the features
found in full macro assemblers. For example, it doesn't
inline mathematical expressions, perform variable
substitutions, or support macros.<p>

On the other hand, using Jasmin you can quickly try out nearly
all of the features of the Java Virtual Machine, including
methods, fields, subroutines, exception handlers, and so on.
The Jasmin syntax is also readable and compact.<p>

<h2>1.2 The Jasmin Tokenizer</h2>

A Jasmin file is a sequence of statements, each separated by a
newline. There are three types of statements: directives, instructions
and labels. Jasmin files also contain comments, type signatures,
class names, and numbers. The rules for these items are described
below.<p>

<h4>Directives</h4>

A directive is a keyword prefixed with a '.', followed by a number
of parameters, separated by spaces. The directives in Jasmin are:<p>

<pre>
    .catch .class .end .field .implements .interface .limit .line 
    .method .source .super .throws .var
</pre>

Some example usages of directives are shown below:<p>

<pre>
    .method public myMethod()V

    .limit stack 10

    .end method
</pre>

<h4>Instructions</h4>

Instructions are mnemonics for Java Virtual Machine opcodes. Each instruction
takes zero or more parameters, separated by spaces and terminated with a
newline. The number of parameters depends on the instruction: see <a
href="instructions.html">Jasmin Instruction Syntax</a> for a description
of the syntax of instructions in Jasmin.<p>

Below you can see some examples:<p>

<pre>
     ldc    "Hello World"
     iinc   1 -1
     bipush 10
</pre>

<h4>Labels</h4>

A Jasmin label is a name followed by a ':', for example:<p>

<pre>
    Foo:

    Label:
</pre>

Label names cannot start with a numeric digit, and cannot contain
any of the special characters:<p>

<pre>
   = : . " -
</pre>

Labels also cannot be one of the reserved words (i.e. an instruction, a
keyword or a directive name). Other than that, there are few restrictions on
label names. For example, you could use the label:<p>

<pre>
   #_1:
</pre>

Label names are scoped to the method they are declared within. Labels can 
only be declared within method definitions.
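<p>
The naming rules above can be sketched as a small Java check. This is only an
illustration, not code from Jasmin: the names LabelNames and isValidLabelName
are hypothetical, rejecting the empty name is an assumption, and the
reserved-word rule is left out because it needs the full list of instructions
and directives.<p>

```java
// Hypothetical helper, not part of Jasmin: checks the label-name rules
// described above, except for the reserved-word rule.
class LabelNames {
    // Characters that may not appear anywhere in a label name.
    private static final String FORBIDDEN = "=:.\"-";

    static boolean isValidLabelName(String name) {
        // Assumption: an empty name is rejected.
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            if (FORBIDDEN.indexOf(name.charAt(i)) >= 0) {
                return false;   // contains one of the special characters
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidLabelName("#_1"));
        System.out.println(isValidLabelName("1abc"));
        System.out.println(isValidLabelName("my-label"));
    }
}
```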

<h4>Comments</h4>

A comment starts with a ';' character, and extends to the end of
the line. Note that the semicolon must be preceded by a whitespace
character (a space, tab or newline), since the tokenizer treats
a sequence of characters containing an embedded semicolon as a single
token. For example,<p>

<pre>
   abc;def
</pre>

is the single token "abc;def", and<p>

<pre>
   Ljava/lang/String;
</pre>

is the token "Ljava/lang/String;", whereas<p>

<pre>
   foo ; baz ding
</pre>

is the token "foo" followed by a comment "baz ding".<p>
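The whitespace rule can be expressed in a few lines of Java. The sketch below
is not taken from the Jasmin sources. The names CommentStripper and
stripComment are hypothetical, and as a simplification it does not treat a ';'
inside a quoted string specially.<p>

```java
// Hypothetical helper, not taken from the Jasmin sources: removes a
// trailing comment from one line using the whitespace rule described above.
// Simplification: ';' inside quoted strings is not handled specially.
class CommentStripper {
    static String stripComment(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) != ';') {
                continue;
            }
            // A ';' starts a comment only at the start of the line or
            // directly after a whitespace character.
            if (i == 0 || Character.isWhitespace(line.charAt(i - 1))) {
                return line.substring(0, i).trim();
            }
        }
        return line.trim();
    }

    public static void main(String[] args) {
        System.out.println(stripComment("abc;def"));
        System.out.println(stripComment("   Ljava/lang/String;"));
        System.out.println(stripComment("foo ; baz ding"));
    }
}
```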

<h4>Numbers and Strings</h4>

In Jasmin, only simple decimal and integer numeric formats are
recognized. Floats in scientific or exponent format are not yet
supported. Character codes and octal aren't currently supported either. This
means you can have:<p>

<pre>
    1, 123, .25, 0.03, 0xA
</pre>

but not<p>

<pre>
    01, 1e-10, 'a'
</pre>

Quoted strings are also very basic. The full range of backslash escape
sequences is not supported yet, although "\n" and "\t" are.<p>

<h4>Class Names, Method Names, Fields and Signatures</h4>

Class names in Jasmin should be written using the Java class file format
conventions, so java.lang.String becomes java/lang/String. Type signatures are
also written as they appear in class files (e.g. "I" specifies an integer,
"[Ljava/lang/Thread;" is an array of Threads).<p>

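If you are producing Jasmin code from a Java program, the conversion is a
simple character replacement. The sketch below is illustrative only and the
names InternalNames and toInternalName are hypothetical. It relies on the
standard behaviour of Class.getName(), which for array classes returns a
signature-like string that uses '.' as the delimiter.<p>

```java
// Illustrative only: the class and method names here are hypothetical.
class InternalNames {
    // Converts a Java-language class name to the '/'-delimited form
    // used by class files and by Jasmin.
    static String toInternalName(String dottedName) {
        return dottedName.replace('.', '/');
    }

    public static void main(String[] args) {
        // An ordinary class name.
        System.out.println(toInternalName("java.lang.String"));
        // An array class: getName() gives the signature with '.' delimiters.
        System.out.println(toInternalName(Thread[].class.getName()));
    }
}
```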
Methods are specified using a single token, e.g.<p>

<pre>
     invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
</pre>

invokes the method called "println" in the class java.io.PrintStream, which
has the signature "(Ljava/lang/String;)V". In general, a method specification
is formed of three parts: the characters before the last '/' that precedes
the '(' form the class name. The characters between that '/' and the '(' are
the method name. The rest of the string, from the '(' onwards, is the signature.<p>

<pre>
     foo/baz/Myclass/myMethod(Ljava/lang/String;)V
     ---------------         ---------------------
           |         --------         |
           |            |             |
         class        method       signature

</pre>
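The same split can be sketched in Java. This is an illustration of the rule,
not Jasmin's own parser, and the class name MethodSpec and its field names are
hypothetical. Note that the search for '/' must stop at the '(' because the
signature itself usually contains '/' characters.<p>

```java
// Hypothetical sketch of the three-way split described above;
// not Jasmin's own parser.
class MethodSpec {
    final String className;
    final String methodName;
    final String signature;

    MethodSpec(String spec) {
        int paren = spec.indexOf('(');
        if (paren < 0) {
            throw new IllegalArgumentException("no signature in: " + spec);
        }
        // Search backwards from the '(' so that '/' characters inside
        // the signature are ignored.
        int slash = spec.lastIndexOf('/', paren);
        if (slash < 0) {
            throw new IllegalArgumentException("no class name in: " + spec);
        }
        className = spec.substring(0, slash);
        methodName = spec.substring(slash + 1, paren);
        signature = spec.substring(paren);
    }

    public static void main(String[] args) {
        MethodSpec m = new MethodSpec("foo/baz/Myclass/myMethod(Ljava/lang/String;)V");
        System.out.println(m.className + " " + m.methodName + " " + m.signature);
    }
}
```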

As a final example, you would call the Java method: <p>

<pre>
   class mypackage.MyClass {
       int foo(Object a, int b[]) { ... }
   }
</pre>

using:<p>

<pre>
   invokevirtual mypackage/MyClass/foo(Ljava/lang/Object;[I)I
</pre>
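If you are unsure how to write a signature by hand, the JDK can produce one
for you. The sketch below uses the standard java.lang.invoke.MethodType class
(part of the JDK since Java 7) to build the signature for the foo method
above. The class name SignatureDemo and the method name fooSignature are
hypothetical.<p>

```java
import java.lang.invoke.MethodType;

// Illustrative only: derives the signature of
//     int foo(Object a, int b[])
// using the JDK's MethodType class.
class SignatureDemo {
    static String fooSignature() {
        // Return type first, then the parameter types in order.
        MethodType type = MethodType.methodType(int.class, Object.class, int[].class);
        return type.toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Prefix with the class and method name to form the Jasmin token.
        System.out.println("invokevirtual mypackage/MyClass/foo" + fooSignature());
    }
}
```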

Field names are specified in Jasmin using two tokens, one giving the name
and class of the field, the other giving its signature. For example:<p>

<pre>
    getstatic mypackage/MyClass/my_font   Ljava/awt/Font;
</pre>

gets the value of the field called "my_font" in the class mypackage.MyClass.
The type of the field is "Ljava/awt/Font;" (i.e. a Font object).<p>

<h2>1.3 Jasmin File Structure</h2>

The first things that appear in a Jasmin file define information
about the class that the file contains - such as the name of the
class, the name of the source file that the class was defined in,
and the name of its superclass. This section describes the
format of this header information.<p>

A Jasmin file typically starts with three directives:<p>

<pre>
    .source &lt;source-file&gt;
    .class  &lt;access-spec&gt; &lt;class-name&gt;
    .super  &lt;class-name&gt;
</pre>

For example, the file defining MyClass might start with the directives:<p>

<pre>
    .source MyClass.j
    .class  public MyClass
    .super  java/lang/Object
</pre>

This declares that MyClass is a public class, and that it inherits
from java.lang.Object.<p>

Note that the .source directive is optional. It specifies the
value of the "SourceFile" attribute that is placed in the class
file. (This is used by Java to print out debugging info
if something goes wrong in one of the methods in the class).<p>

If you generated the Jasmin file automatically (e.g. as the result of 
compiling a file written in another syntax) you should use the .source
directive to tell Java the name of the originating file. Note that
the source file name should not include any pathname. So use "foo.src"
but not "/home/user/foo.src".<p>

If no .source directive is given, the name of the Jasmin
file you are compiling is used as the SourceFile attribute
instead.<p>

The .class and .super directives tell the JVM the name of this
class and its superclass. These directives take parameters as
follows: 

<dl>
<dt>&lt;class-name&gt;
<dd>is the name of the class, including
any packages. For example mypackage/MyClass.<p>

<dt>&lt;access-spec&gt;
<dd>defines access permissions and other attributes for
the class. This is a list of zero or more of the following
keywords:<p>

<dl><dd>
   public, private, protected, static, final,
   synchronized, volatile, transient, native,
   interface, abstract
</dl>
</dl>

Note that, instead of using the directive .class,
you can alternatively use the directive .interface. This uses
the same syntax as .class, but indicates that the Jasmin file 
is defining a Java interface, not a Java class. Writing:
<pre>
    .interface public foo
</pre>
is in fact equivalent to writing:
<pre>
    .class public interface foo
</pre>
except that the former is clearer to the reader.<p>

After .source, .class and .super, you can optionally list the
interfaces that are implemented by the class you are defining, using 
zero or more .implements directives. The syntax of .implements is:

<pre>
    .implements &lt;class-name&gt;
</pre>
where &lt;class-name&gt; has the same format as was used by .class and .super.
For example:

<pre>
    .class foo
    .super java/lang/Object
    .implements Edible
    .implements java/lang/Runnable
</pre>

<p>

After this header information, there follows zero or more field definitions 
and method definitions, as described in the following sections.<p>

<h2>1.4 Field Definitions</h2>

A field is defined using the .field directive:<p>

<pre>
    .field &lt;access-spec&gt; &lt;field-name&gt; &lt;signature&gt; [ = &lt;value&gt; ]
</pre>

where:<p>

<dl>
<dt>&lt;access-spec&gt;
<dd>is as for the .class directive (see above).<p>

<dt>&lt;field-name&gt;
<dd>is the name of the field.<p>

<dt>&lt;signature&gt;
<dd>is its type signature.<p>

<dt>&lt;value&gt;
<dd>is an integer, a quoted string or a decimal number.<p>
</dl>

For example, the Java field definition:<p>

<pre>
    public int foo;
</pre>

becomes<p>

<pre>
    .field public foo I
</pre>

whereas the constant:<p>

<pre>
    public static final float PI = 3.14f;
</pre>

becomes<p>

<pre>
    .field public static final PI F = 3.14
</pre>

<h2>1.5 Method Definitions</h2>

A method is defined using the basic form:<p>

<pre>
    .method &lt;access-spec&gt; &lt;method-name&gt;&lt;method-signature&gt;
        &lt;statements&gt;
    .end method
</pre>

where:

<dl>
<dt>&lt;access-spec&gt;
<dd>is as for the .field and .class directives.<p>

<dt>&lt;method-name&gt;
<dd>is the name of the method.<p>

<dt>&lt;method-signature&gt;
<dd>gives the method's argument types and return type.<p>

<dt>&lt;statements&gt;
<dd>is the code defining the body of the method.<p>
</dl>

Method definitions cannot be nested. Also note that Jasmin does not
insert an implicit 'return' instruction at the end of a method,
so the most basic Jasmin method is:<p>

<pre>
   .method foo()V
       return
   .end method
</pre>

<h4>Method Directives</h4>

The following directives can be used inside method definitions:<p>

<dl>
<dt><pre>.limit stack &lt;integer&gt;</pre><p>
<dd>Sets the maximum size of the operand stack
required by the method.

<dt><pre>.limit vars &lt;integer&gt;</pre><p>
<dd>Sets the number of local variables
required by the method.

<dt><pre>.line &lt;integer&gt;</pre><p>
<dd>This is used to tag the subsequent
instruction(s) with a line number. Debuggers use this information,
together with the name of the source file (see .source above) 
to show at what line in a method things went wrong. If you are
generating Jasmin files by compiling a source file which uses
another syntax, this directive lets you indicate what line 
numbers in the source file produced corresponding JVM 
instructions. For example:

<pre>
    .method foo()V
    .line 5    
        bipush 10    ; these instructions generated from line 5
        istore_2     ; of the source file.
    .line 6
        ... 
</pre>

<dt><pre>.var &lt;var-number&gt; is  &lt;name&gt; &lt;signature&gt; from &lt;label1&gt; to &lt;label2&gt;</pre><p>
<dd>The .var directive is used to define the name, signature and scope of
a local variable number. This information is used by debuggers 
so that they can be more helpful when printing out the values of local
variables (rather than printing just a local variable number, the
debugger can actually print out the name of the variable). For example:

<pre>
    .method foo()V
        .limit vars 1

        ; declare variable 0 as an "int Count;"
        ; whose scope is the code between Label1 and Label2
        .var 0 is Count I from Label1 to Label2

    Label1:
        bipush 10
        istore_0
    Label2:

        return
    .end method
</pre>
   
<dt><pre>.throws &lt;classname&gt;</pre><p>
<dd>Indicates that this method can throw
exceptions of the type indicated by &lt;classname&gt;. This
information isn't used by the Java runtime system (as far as
I can tell), but it is used by the Java compiler to enforce the
convention that methods must either catch exceptions they
can cause, or declare that they throw them.

<dt><pre>.catch &lt;classname&gt; from &lt;label1&gt; to &lt;label2&gt; using &lt;label3&gt;</pre><p>
<dd>Adds an entry at the end of the exceptions table for the
method. The entry indicates that if an exception which is
an instance of &lt;classname&gt; or one of its subclasses is raised
while executing the code between &lt;label1&gt; and &lt;label2&gt;, then
the interpreter should jump to &lt;label3&gt;.<p>

If classname is the keyword "all", then exceptions of any
class are caught by the handler.<p>

</dl>

<h4>Abstract Methods</h4>

Abstract methods should contain no statements, other than .throws
directives. So<p>

<pre>
    .method abstract myAbstract()V
    .end method
</pre>

and

<pre>
    .method abstract anotherAbstract()V
        .throws java/io/IOException
    .end method
</pre>

are both legal abstract methods, whereas<p>

<pre>
    .method abstract anotherAbstract()V
        .limit stack 10
    .end method
</pre>

is illegal.<p>

<h2>1.6 Java VM Instructions</h2>

Java VM instructions are placed between the <code>.method</code> and
<code>.end method</code> directives. VM instructions can take zero or more
parameters, depending on the type of instruction used. Some example
instructions are shown below:

<pre>
    iinc 1 -3    ; decrement local variable 1 by 3

    bipush 10    ; push the integer 10 onto the stack

    pop          ; remove the top item from the stack.

</pre>

See <a href="instructions.html">Jasmin Instruction Syntax</a> for more
details on the syntax of instructions in Jasmin.<p>

<h4>Wide Instructions</h4>

The Java VM has several instructions which come in two forms - a standard
form and a 'wide' form (with a name ending in the suffix '_w') which
uses more bytes in the bytecode and works with a greater range of values.<p>

For example, the ldc instruction uses a one-byte index (and can address constants
in the constant pool whose indices are in the range 0-255), whereas ldc_w uses a two-byte 
index, and can address any of the 65535 possible entries in the constant pool.<p>

In addition, there are about a dozen instructions for referencing local variables.
All of these instructions come in two forms - one which uses a single byte to
identify which variable to use (giving you access to local variables 0 to 255), and 
one which uses two bytes (giving you access to all the local variables from 0 to 65535).
The 'wide' opcode is used to widen one of these instructions so that it uses the two-byte
form rather than the one-byte form.<p>

In Jasmin, the assembler automatically decides which instruction should be used,
always opting for the instruction that takes fewer bytes in the class file if there
is a choice.<p>

Hence the assembler automatically switches from ldc to ldc_w when addressing a constant
whose constant pool index is greater than 255. Jasmin also determines when the wide 
instruction is needed to widen a local variable index to 16 bits.<p>

This means that, in Jasmin, you can simply write:<p>

<pre>
   ldc "Hello World"
</pre>

and the assembler will decide whether to use ldc or
ldc_w. Similarly, if you write<p>

<pre>
   iload 300
</pre>

Jasmin will automatically insert a 'wide' opcode before the iload opcode.<p>
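The choice described above comes down to two range checks. The Java sketch
below shows the idea only. It is not Jasmin's actual source, the names
WideSelection, loadConstantOpcode and needsWidePrefix are hypothetical, and it
ignores other cases (such as an iinc whose increment does not fit in one
byte).<p>

```java
// Hypothetical sketch of the size-based choice described above;
// not Jasmin's actual implementation.
class WideSelection {
    // Largest value that fits in a one-byte index.
    private static final int MAX_ONE_BYTE_INDEX = 255;

    // Picks the mnemonic for loading the constant at the given
    // constant pool index.
    static String loadConstantOpcode(int poolIndex) {
        return poolIndex <= MAX_ONE_BYTE_INDEX ? "ldc" : "ldc_w";
    }

    // True when an instruction that refers to the given local variable
    // needs the 'wide' prefix.
    static boolean needsWidePrefix(int localIndex) {
        return localIndex > MAX_ONE_BYTE_INDEX;
    }

    public static void main(String[] args) {
        System.out.println(loadConstantOpcode(12));
        System.out.println(loadConstantOpcode(300));
        System.out.println(needsWidePrefix(300));
    }
}
```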

<h2>1.7 Running Jasmin</h2>

The <code>jasmin</code> command runs Jasmin on a file.
For example:

<pre><strong>    % jasmin myfile.j</strong></pre>

assembles the file "myfile.j". Jasmin looks at the
<code>.class</code> directive contained in the file to
decide where to place the output class file. So if myfile.j starts
with:<p>

<pre>
    .class mypackage/MyClass
</pre>

then Jasmin will place the output class file "MyClass.class" in the
subdirectory "mypackage" of the current directory. It will create the
mypackage directory if it doesn't exist.<p>

You can use the "-d" option to tell jasmin to place the output
in an alternative directory. For example,

<pre><strong>    % jasmin -d /tmp myfile.j </strong></pre>

will place the output in /tmp/mypackage/MyClass.class.<p>
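The output location is therefore the output directory followed by the class
name with ".class" appended. The Java sketch below illustrates that rule. It
is not Jasmin's own code, and the names OutputLocation and classFilePath are
hypothetical.<p>

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the output-location rule described above;
// not Jasmin's own code.
class OutputLocation {
    // outputDir is the "-d" directory ("." when no -d option is given);
    // className is the '/'-delimited name from the .class directive.
    static Path classFilePath(String outputDir, String className) {
        return Paths.get(outputDir, className + ".class");
    }

    public static void main(String[] args) {
        System.out.println(classFilePath("/tmp", "mypackage/MyClass"));
        System.out.println(classFilePath(".", "mypackage/MyClass"));
    }
}
```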

Finally, you can use the "-g" option to tell Jasmin to include
line number information (used by debuggers) in the resulting 
.class file. Jasmin will number the lines in the Jasmin source
file that JVM instructions appear on. Then, if an error occurs,
you can see what instruction in the Jasmin source caused the error. 
Note that specifying "-g" causes any .line directives within the 
Jasmin file to be ignored.
<p>

<hr><address>Copyright (c) Jonathan Meyer, July 1996</address>
<hr>
<a href="http://mrl.nyu.edu/meyer/jvm/jasmin.html">Jasmin Home</a> |
<a href="http://mrl.nyu.edu/meyer/">Jon Meyer's Home</a>