Jasmin Guide

Jonathan Meyer, July '96

1.1 Introduction

Welcome to Jasmin version 1.0.

Jasmin is a Java Assembler Interface. It takes ASCII descriptions of Java classes, written in a simple assembler-like syntax using the Java Virtual Machine instruction set, and converts them into binary Java class files suitable for loading into a Java interpreter.

Jasmin was written as a companion to the book "Java Virtual Machine", soon to be published by O'Reilly, written by myself and Troy Downing.

This document covers the rules and syntax used in Jasmin. It doesn't really explain much about the Java Virtual Machine. For more on the VM itself, see:

Sun's Java Virtual Machine Specification
For more on Jasmin, see:

Jasmin Instruction Syntax
for a description of the syntax of Java VM instructions in Jasmin.

About Jasmin
for answers to questions like "what is Jasmin?", "why should I use Jasmin?", "why did you write Jasmin?".

Jasmin is designed as a simple assembler, and has a clean easy-to-learn syntax with very few bells and whistles.

Where possible, I have adopted a 1:1 mapping between Jasmin constructs and the conventions followed by Java class files. For example, package names in Jasmin are delimited with the '/' character (e.g. "java/lang/String") used by the class file format, instead of the '.' character (java.lang.String) used in the Java language.

The Jasmin assembler does little compile-time processing and checking of the input code. For example, it doesn't check that classes you reference actually exist, or that your type signatures are well formed. Jasmin also lacks many of the features found in full macro assemblers. For example, it doesn't inline mathematical expressions, perform variable substitutions, or support macros.

On the other hand, using Jasmin you can quickly try out nearly all of the features of the Java Virtual Machine, including methods, fields, subroutines, exception handlers, and so on. The Jasmin syntax is also readable and compact.

1.2 The Jasmin Tokenizer

A Jasmin file is a sequence of statements, each separated by a newline. There are three types of statements: directives, instructions and labels. Jasmin files also contain comments, type signatures, class names, and numbers. The rules for these items are described below.

Directives

A directive is a keyword prefixed with a '.', followed by a number of parameters, separated by spaces. The directives in Jasmin are:

    .catch .class .end .field .implements .interface .limit .line 
    .method .source .super .throws .var
Some example usages of directives are shown below:

    .method public myMethod()V

    .limit stack 10

    .end method

Instructions

Instructions are mnemonics for Java Virtual Machine opcodes. Each instruction has zero or more parameters (depending on the type of instruction - see Jasmin Instruction Syntax for a description of the syntax of instructions in Jasmin) separated by spaces and terminated with a newline.

Below you can see some examples:

     ldc    "Hello World"
     iinc   1 -1
     bipush 10

Labels

A Jasmin label is a name followed by a ':', for example:

    Foo:

    Label:
Label names cannot start with a numeric digit, and cannot contain any of the special characters:

   = : . " -
Labels also cannot be one of the reserved words (i.e. an instruction, a keyword or a directive name). Other than that, there are few restrictions on label names. For example, you could use the label:

   #_1:
Label names are scoped to the method they are declared within. Labels can only be declared within method definitions.
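
The label rules above can be sketched as a small Java check. This is only an illustration, not Jasmin's own code: the class name LabelNames, the method isValidLabel and the sample reserved-word set are all made up for this example, and the real reserved set is every instruction, keyword and directive name. The name is passed without its trailing ':'.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class LabelNames {
    // Assumption: a tiny sample of reserved words, for illustration only.
    static final Set<String> SAMPLE_RESERVED = new HashSet<>(Arrays.asList(
            "bipush", "ldc", "iinc", "return", "public", "from", "to", "using"));

    // Returns true if 'name' (without the trailing ':') obeys the label rules.
    static boolean isValidLabel(String name, Set<String> reserved) {
        // A label cannot be empty or start with a numeric digit.
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            return false;
        }
        // A label cannot contain any of the characters = : . " -
        for (char c : "=:.\"-".toCharArray()) {
            if (name.indexOf(c) >= 0) {
                return false;
            }
        }
        // A label cannot be a reserved word.
        return !reserved.contains(name);
    }

    public static void main(String[] args) {
        System.out.println(isValidLabel("#_1", SAMPLE_RESERVED));
        System.out.println(isValidLabel("1abc", SAMPLE_RESERVED));
    }
}
```

With this sketch, "#_1" and "Foo" are accepted, while "1abc", "a-b" and "bipush" are rejected.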

Comments

A comment starts with a ';' character, and extends to the end of the line. Note that the semicolon must be preceded by a whitespace character (a space, tab or newline), since the tokenizer treats a sequence of characters containing an embedded semicolon as a single token. For example,

   abc;def
is the single token "abc;def", and

   Ljava/lang/String;
is the token "Ljava/lang/String;", whereas

   foo ; baz ding
is the token "foo" followed by a comment "baz ding".
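
If you are writing a tool that reads Jasmin files, the comment rule can be sketched in Java roughly as follows. This is an illustration under assumptions, not the tokenizer Jasmin uses: the names CommentStripper and stripComment are invented here, and the sketch does not treat semicolons inside quoted strings specially.

```java
final class CommentStripper {
    // Removes a trailing comment from one line of Jasmin source.
    // A ';' starts a comment only at the start of the line or when
    // the character before it is whitespace.
    static String stripComment(String line) {
        for (int i = 0; i < line.length(); i++) {
            boolean atBoundary = i == 0 || Character.isWhitespace(line.charAt(i - 1));
            if (line.charAt(i) == ';' && atBoundary) {
                // Drop the comment and any whitespace just before it.
                return line.substring(0, i).replaceAll("\\s+$", "");
            }
        }
        return line; // no comment on this line
    }

    public static void main(String[] args) {
        System.out.println(stripComment("abc;def"));
        System.out.println(stripComment("foo ; baz ding"));
    }
}
```

Here "abc;def" and "Ljava/lang/String;" come back unchanged, and "foo ; baz ding" is reduced to "foo".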

Numbers and Strings

In Jasmin, only simple decimal and integer numeric formats are recognized. Floats in scientific or exponent format are not yet supported. Character codes and octal aren't currently supported either. This means you can have:

    1, 123, .25, 0.03, 0xA
but not

    01, 1e-10, 'a'
Quoted strings are also very basic. The full range of backslash escape sequences are not supported yet, although "\n" and "\t" are.

Class Names, Method Names, Fields and Signatures

Class names in Jasmin should be written using the Java class file format conventions, so java.lang.String becomes java/lang/String. Type signatures are also written as they appear in class files (e.g. "I" specifies an integer, "[Ljava/lang/Thread;" is an array of Threads).

Methods are specified using a single token, e.g.

     invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
invokes the method called "println" in the class java.io.PrintStream, which has the signature "(Ljava/lang/String;)V". In general, a method specification is formed of three parts: the characters before the last '/' that precedes the '(' form the class name. The characters between that '/' and the '(' are the method name. The rest of the string, starting at the '(', is the signature.

     foo/baz/Myclass/myMethod(Ljava/lang/String;)V
     ---------------         ---------------------
           |         --------         |
           |            |             |
         class        method       signature
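
The three-part split can be sketched in Java as below. The class MethodSpec and its parse method are hypothetical names chosen for this example. Note that the '/' is searched for backwards from the '(', because the signature itself usually contains '/' characters.

```java
final class MethodSpec {
    final String className;
    final String methodName;
    final String signature;

    private MethodSpec(String className, String methodName, String signature) {
        this.className = className;
        this.methodName = methodName;
        this.signature = signature;
    }

    // Splits e.g. "foo/baz/Myclass/myMethod(Ljava/lang/String;)V" into
    // class name, method name and signature.
    static MethodSpec parse(String spec) {
        int paren = spec.indexOf('(');
        if (paren < 0) {
            throw new IllegalArgumentException("no '(' in: " + spec);
        }
        // Last '/' at or before the '(' separates class from method name.
        int slash = spec.lastIndexOf('/', paren);
        if (slash < 0) {
            throw new IllegalArgumentException("no class name in: " + spec);
        }
        return new MethodSpec(
                spec.substring(0, slash),
                spec.substring(slash + 1, paren),
                spec.substring(paren));
    }

    public static void main(String[] args) {
        MethodSpec m = parse("foo/baz/Myclass/myMethod(Ljava/lang/String;)V");
        System.out.println(m.className + " | " + m.methodName + " | " + m.signature);
    }
}
```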

As a final example, you would call the Java method:

   class mypackage.MyClass {
       int foo(Object a, int b[]) { ... }
   }
using:

   invokevirtual mypackage/MyClass/foo(Ljava/lang/Object;[I)I
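
If you want to double-check a signature such as "(Ljava/lang/Object;[I)I" against the Java types you had in mind, one option is to let the JDK decode it. The sketch below uses the standard java.lang.invoke.MethodType class; the wrapper class DescriptorCheck is just a name made up for this example, and this only works for types the running JVM can load.

```java
import java.lang.invoke.MethodType;

final class DescriptorCheck {
    // Decodes a class-file style method signature into Java types.
    // Passing null as the loader means the system class loader is used.
    static MethodType decode(String descriptor) {
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        MethodType t = decode("(Ljava/lang/Object;[I)I");
        System.out.println("parameters: " + t.parameterList());
        System.out.println("returns:    " + t.returnType());
    }
}
```

For the foo example, the decoded type has two parameters (Object and int[]) and returns int, matching the Java declaration.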
Field names are specified in Jasmin using two tokens, one giving the name and class of the field, the other giving its signature. For example:

    getstatic mypackage/MyClass/my_font   Ljava/awt/Font;
gets the value of the field called "my_font" in the class mypackage.MyClass. The type of the field is "Ljava/awt/Font;" (i.e. a Font object).

1.3 Jasmin File Structure

The first things that appear in a Jasmin file define information about the class that the file contains - such as the name of the class, the name of the source file that the class was defined in, and the name of its superclass. This section describes the format of this header information.

A Jasmin file typically starts with three directives:

    .source <source-file>
    .class  <access-spec> <class-name>
    .super  <class-name>
For example, the file defining MyClass might start with the directives:

    .source MyClass.j
    .class  public MyClass
    .super  java/lang/Object
This declares that MyClass is a public class, and that it inherits from java.lang.Object.

Note that the .source directive is optional. It specifies the value of the "SourceFile" attribute that is placed in the class file. (This is used by Java to print out debugging info if something goes wrong in one of the methods in the class).

If you generated the Jasmin file automatically (e.g. as the result of compiling a file written in another syntax) you should use the .source directive to tell Java the name of the originating file. Note that the source file name should not include any pathname. So use "foo.src" but not "/home/user/foo.src".

If no .source directive is given, the name of the Jasmin file you are compiling is used as the SourceFile attribute instead.
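
For a tool that generates Jasmin files, the header rules might be applied along these lines. This is a sketch only: JasminHeader, baseName and header are hypothetical names, and the sketch simply strips any directory part from the source name so that only "foo.src" ends up in the .source directive.

```java
final class JasminHeader {
    // Strips any directory part, so "/home/user/foo.src" becomes "foo.src".
    static String baseName(String path) {
        int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(cut + 1);
    }

    // Builds the three header directives for a class.
    static String header(String sourcePath, String accessSpec,
                         String className, String superName) {
        StringBuilder sb = new StringBuilder();
        sb.append(".source ").append(baseName(sourcePath)).append('\n');
        sb.append(".class  ");
        if (!accessSpec.isEmpty()) {
            sb.append(accessSpec).append(' ');
        }
        sb.append(className).append('\n');
        sb.append(".super  ").append(superName).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(header("/home/user/foo.src", "public",
                "MyClass", "java/lang/Object"));
    }
}
```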

The .class and .super directives tell the JVM the name of this class and its superclass. These directives take parameters as follows:

<class-name>
is the name of the class, including any packages. For example mypackage/MyClass.

<access-spec>
defines access permissions and other attributes for the class. This is a list of zero or more of the following keywords:

public, private, protected, static, final, synchronized, volatile, transient, native, interface, abstract
Note that, instead of using the directive .class, you can alternatively use the directive .interface. This uses the same syntax as .class, but indicates that the Jasmin file is defining a Java interface, not a Java class. Writing:
    .interface public foo
is in fact equivalent to writing:
    .class public interface foo
except that the former is clearer to the reader.

After .source, .class and .super, you can optionally list the interfaces that are implemented by the class you are defining, using zero or more .implements directives. The syntax of .implements is:

    .implements <class-name>
where <class-name> has the same format as was used by .class and .super. For example:
    .class foo
    .super java/lang/Object
    .implements Edible
    .implements java/lang/Runnable

After this header information, there follow zero or more field definitions and method definitions, as described in the following sections.

1.4 Field Definitions

A field is defined using the .field directive:

    .field <access-spec> <field-name> <signature> [ = <value> ]
where:

<access-spec>
is as for the .class directive (see above).

<field-name>
is the name of the field.

<signature>
is its type signature.

<value>
is an integer, a quoted string or a decimal number.

For example, the Java field definition:

    public int foo;
becomes

    .field public foo I
whereas the constant:

    public static final float PI = 3.14f;
becomes

    .field public static final PI F = 3.14
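
The mapping from a Java field to a .field directive can be sketched with Java reflection. Everything named here (FieldDirectives, Sample, descriptor, directive) is invented for the illustration, and the sketch leaves out the optional "= <value>" part.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class FieldDirectives {
    // Hypothetical sample class whose fields we describe.
    static class Sample {
        public int foo;
        public static final float PI = 3.14f;
        String[] names;
    }

    // Converts a Java type to its class-file type signature.
    static String descriptor(Class<?> type) {
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        // Array names already look like "[I" or "[Ljava.lang.String;".
        if (type.isArray()) return type.getName().replace('.', '/');
        return "L" + type.getName().replace('.', '/') + ";";
    }

    // Builds ".field <access-spec> <field-name> <signature>".
    static String directive(Class<?> owner, String fieldName) {
        try {
            Field f = owner.getDeclaredField(fieldName);
            String access = Modifier.toString(f.getModifiers());
            return ".field " + (access.isEmpty() ? "" : access + " ")
                    + f.getName() + " " + descriptor(f.getType());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(directive(Sample.class, "foo"));
        System.out.println(directive(Sample.class, "PI"));
    }
}
```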

1.5 Method Definitions

A method is defined using the basic form:

    .method <access-spec> <method-name><method-signature>
        <statements>
    .end method
where:
<access-spec>
is as for the .field and .class directives.

<method-name>
is the name of the method.

<method-signature>
gives the method's argument types and return type.

<statements>
is the code defining the body of the method.

Method definitions cannot be nested. Also note that Jasmin does not insert an implicit 'return' instruction at the end of a method, so the most basic Jasmin method is:

   .method foo()V
       return
   .end method

Method Directives

The following directives can be used inside method definitions:

.limit stack <integer>

Sets the maximum size of the operand stack required by the method.
.limit vars <integer>

Sets the number of local variables required by the method.
.line <integer>

This is used to tag the subsequent instruction(s) with a line number. Debuggers use this information, together with the name of the source file (see .source above) to show at what line in a method things went wrong. If you are generating Jasmin files by compiling a source file which uses another syntax, this directive lets you indicate what line numbers in the source file produced corresponding JVM instructions. For example:
    .method foo()V
    .line 5    
        bipush 10    ; these instructions generated from line 5
        istore_2     ; of the source file.
    .line 6
        ... 
.var <var-number> is  <name> <signature> from <label1> to <label2>

The .var directive is used to define the name, signature and scope of a local variable number. This information is used by debuggers so that they can be more helpful when printing out the values of local variables (rather than printing just a local variable number, the debugger can actually print out the name of the variable). For example:
    .method foo()V
        .limit vars 1

        ; declare variable 0 as an "int Count;"
        ; whose scope is the code between Label1 and Label2
        .var 0 is Count I from Label1 to Label2

    Label1:
        bipush 10
        istore_0
    Label2:

        return
    .end method
.throws <classname>

Indicates that this method can throw exceptions of the type indicated by <classname>. This information isn't used by the Java runtime system (as far as I can tell), but it is used by the Java compiler to enforce the convention that methods must either catch exceptions they can cause, or declare that they throw them.
.catch <classname> from <label1> to <label2> using <label3>

Adds an entry at the end of the exceptions table for the method. The entry indicates that if an exception which is an instance of <classname> or one of its subclasses is raised while executing the code between <label1> and <label2>, then the interpreter should jump to <label3>.

If classname is the keyword "all", then exceptions of any class are caught by the handler.
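
To make the behaviour of the exceptions table more concrete, here is a small Java model of the lookup. It is a simplified sketch, not how any particular interpreter is written: ExceptionTable, addCatch and findHandler are made-up names, code positions are plain integers standing in for labels, and a null type stands in for the keyword "all". Following the class file format, the range includes the start position and excludes the end position, and entries are searched in the order they were added.

```java
import java.util.ArrayList;
import java.util.List;

final class ExceptionTable {
    static final class Entry {
        final Class<?> type; // null models the keyword "all"
        final int from, to, handler;

        Entry(Class<?> type, int from, int to, int handler) {
            this.type = type;
            this.from = from;
            this.to = to;
            this.handler = handler;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Models ".catch <classname> from <label1> to <label2> using <label3>":
    // the new entry goes at the end of the table.
    void addCatch(Class<?> type, int from, int to, int handler) {
        entries.add(new Entry(type, from, to, handler));
    }

    // Returns the handler position for an exception raised at 'pc',
    // or -1 if no entry in this method applies.
    int findHandler(int pc, Throwable thrown) {
        for (Entry e : entries) {
            boolean inRange = pc >= e.from && pc < e.to;
            boolean matches = e.type == null || e.type.isInstance(thrown);
            if (inRange && matches) {
                return e.handler;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ExceptionTable table = new ExceptionTable();
        table.addCatch(java.io.IOException.class, 0, 10, 20);
        table.addCatch(null, 0, 10, 30);
        System.out.println(table.findHandler(5, new java.io.FileNotFoundException()));
    }
}
```

Because the first matching entry wins, a specific handler should be declared before an "all" handler that covers the same code.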

Abstract Methods

Abstract methods should contain no statements, other than .throws directives. So

    .method abstract myAbstract()V
    .end method
and
    .method abstract anotherAbstract()V
        .throws java/io/IOException
    .end method
are both legal abstract methods, whereas

    .method abstract anotherAbstract()V
        .limit stack 10
    .end method
is illegal.

1.6 Java VM Instructions

Java VM instructions are placed between the .method and .end method directives. VM instructions can take zero or more parameters, depending on the type of instruction used. Some example instructions are shown below:
    iinc 1 -3    ; decrement local variable 1 by 3

    bipush 10    ; push the integer 10 onto the stack

    pop          ; remove the top item from the stack.

See Jasmin Instruction Syntax for more details on the syntax of instructions in Jasmin.

Wide Instructions

The Java VM has several instructions which come in two forms - a standard form and a 'wide' form (with a name ending in the suffix '_w') which uses more bytes in the bytecode and works with a greater range of values.

For example, the ldc instruction uses a one-byte index (and can address constants in the constant pool whose indices are in the range 0-255), whereas ldc_w uses a two-byte index, and can address any of the 65535 possible entries in the constant pool.

In addition, there are about a dozen instructions for referencing local variables. All of these instructions come in two forms - one which uses a single byte to identify which variable to use (giving you access to local variables 0 to 255), and one which uses two bytes (giving you access to all the local variables from 0 to 65535). The 'wide' opcode is used to widen one of these instructions so that it uses the two-byte form rather than the one-byte form.

In Jasmin, the assembler automatically decides which instruction should be used, always opting for the instruction that takes fewer bytes in the class file if there is a choice.

Hence the assembler automatically switches from ldc to ldc_w when addressing a constant whose constant pool index is greater than 255. Jasmin also determines when the wide instruction is needed to widen a local variable index to 16 bits.

This means that, in Jasmin, you can simply write:

   ldc "Hello World"
and the assembler will decide whether to use ldc or ldc_w. Similarly, if you write

   iload 300
Jasmin will automatically insert a 'wide' opcode before the iload opcode.
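
The choice between the short and wide forms can be sketched in Java as follows. This illustrates the rule rather than reproducing Jasmin's implementation: the class WideEncoding and its methods are hypothetical, and the sketch ignores the one-byte iload_0 to iload_3 shortcuts. The opcode values are the ones defined by the JVM specification.

```java
final class WideEncoding {
    // Opcode values from the JVM instruction set.
    static final int LDC = 0x12, LDC_W = 0x13, ILOAD = 0x15, WIDE = 0xc4;

    private static void checkRange(int value) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("out of range: " + value);
        }
    }

    // Encodes a constant load: ldc for indices up to 255, ldc_w otherwise.
    static int[] encodeLdc(int poolIndex) {
        checkRange(poolIndex);
        if (poolIndex <= 0xFF) {
            return new int[] {LDC, poolIndex};
        }
        return new int[] {LDC_W, poolIndex >> 8, poolIndex & 0xFF};
    }

    // Encodes an int load: plain iload for variables up to 255,
    // otherwise a 'wide' prefix followed by a two-byte variable index.
    static int[] encodeIload(int var) {
        checkRange(var);
        if (var <= 0xFF) {
            return new int[] {ILOAD, var};
        }
        return new int[] {WIDE, ILOAD, var >> 8, var & 0xFF};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(encodeIload(300)));
    }
}
```

So "iload 300" becomes the four bytes wide, iload, 0x01, 0x2c, whereas "iload 5" needs only two bytes.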

1.7 Running Jasmin

The jasmin command runs Jasmin on a file. For example:
    % jasmin myfile.j
assembles the file "myfile.j". Jasmin looks at the .class directive contained in the file to decide where to place the output class file. So if myfile.j starts with:

    .class mypackage/MyClass
then Jasmin will place the output class file "MyClass.class" in the subdirectory "mypackage" of the current directory. It will create the mypackage directory if it doesn't exist.

You can use the "-d" option to tell jasmin to place the output in an alternative directory. For example,

    % jasmin -d /tmp myfile.j 
will place the output in /tmp/mypackage/MyClass.class.
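
The way the output location follows from the class name and the "-d" option can be sketched in Java like this. The names OutputPath and classFilePath are invented for the example, and the sketch only builds the path string; it does not create directories.

```java
final class OutputPath {
    // Returns where the class file for 'className' (e.g. "mypackage/MyClass")
    // would be written. 'destDir' is the "-d" directory, or null if the
    // option was not given.
    static String classFilePath(String destDir, String className) {
        String relative = className + ".class";
        if (destDir == null || destDir.isEmpty()) {
            return relative; // relative to the current directory
        }
        // Avoid a doubled separator if destDir already ends with '/'.
        String dir = destDir.endsWith("/")
                ? destDir.substring(0, destDir.length() - 1)
                : destDir;
        return dir + "/" + relative;
    }

    public static void main(String[] args) {
        System.out.println(classFilePath("/tmp", "mypackage/MyClass"));
        System.out.println(classFilePath(null, "mypackage/MyClass"));
    }
}
```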

Finally, you can use the "-g" option to tell Jasmin to include line number information (used by debuggers) in the resulting .class file. Jasmin will number the lines in the Jasmin source file that JVM instructions appear on. Then, if an error occurs, you can see what instruction in the Jasmin source caused the error. Note that specifying "-g" causes any .line directives within the Jasmin file to be ignored.


Copyright (c) Jonathan Meyer, July 1996
