Core Java Technologies Technical Tips
January 10, 2003

In this Issue

Welcome to the Core Java(TM) Technologies Tech Tips, January 10, 2003. Here you'll get tips on using core Java technologies and APIs, such as those in Java 2 Platform, Standard Edition (J2SE(TM)).

This issue covers:

Using Charsets and Encodings
Using Reflection To Create Class Instances

These tips were developed using Java 2 SDK, Standard Edition, v 1.4.

This issue of the Core Java Technologies Tech Tips is written by Glen McCluskey.

USING CHARSETS AND ENCODINGS

Suppose that you're doing some Java programming and need to write characters to a file:

    import java.io.*;
    
    public class Encode1 {
        public static void main(String args[]) 
            throws IOException {
                Writer writer = new FileWriter("out");
                writer.write("testing");
                writer.close();
        }
    }

When you run this program in the United States in the Solaris(TM) Operating Environment or on the Windows platform, the result is a text file "out" of 7 bytes. This is what you would expect.

But there is an important issue here. Java characters are 16-bit, that is, each character is two bytes long. The Encode1 program writes a 7-character string to a file, and the result is a 7-byte file. You might ask: what happened to the other bytes, shouldn't there be 14 bytes written?

This issue falls under the title "character encodings". The problem is how to map between 16-bit characters representing Java data, and 8-bit bytes stored in data files. And in fact, it's trickier than simply "widening" or "narrowing" the character between 8 and 16 bits because there are literally hundreds of different character encoding schemes in use around the world. This means that the specific sequence of 8-bit bytes needed to represent a particular Java string changes from platform to platform and from locale to locale.

The Java system solves this problem by allowing you to choose the particular encoding scheme that's required when writing out characters. It also provides a reasonable default encoding based on your platform and locale, which is what the example above uses for its I/O. In addition, you can specify other named encodings ("charsets"). These encodings are described by string names, such as "UTF-8", and by instances of the java.nio.charset.Charset class. Charset is abstract, so the actual instances are objects of subclasses of Charset.
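
To see that the same character really does turn into different bytes under different charsets, here is a minimal sketch. The class name CharsetBytes and its two helper methods are made up for this illustration. It assumes the Charset.forName and Charset.encode methods of java.nio.charset, and it encodes the single character \u00e9 with three standard charsets:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class CharsetBytes {
    // Encode a string with the named charset and return the raw bytes.
    static byte[] encode(String s, String charsetname) {
        ByteBuffer buf = Charset.forName(charsetname).encode(s);
        byte bytes[] = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    // Format bytes as two-digit hex values separated by spaces.
    static String hex(byte bytes[]) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            if (i > 0) {
                sb.append(' ');
            }
            if (b < 0x10) {
                sb.append('0');
            }
            sb.append(Integer.toHexString(b));
        }
        return sb.toString();
    }

    public static void main(String args[]) {
        String s = "\u00e9";  // lowercase e with an acute accent
        System.out.println(hex(encode(s, "ISO-8859-1")));  // e9
        System.out.println(hex(encode(s, "UTF-8")));       // c3 a9
        System.out.println(hex(encode(s, "UTF-16BE")));    // 00 e9
    }
}
```

One Java character produces one byte in ISO-8859-1 and two bytes in each of the other two charsets, and the two-byte sequences are not the same.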

In the Encode1 example, one way of solving the encoding problem is to always write two bytes out for each character. However, the file will then have null bytes interspersed. Another approach is to throw away the high byte of each Java character. This will work in the example above, but it wouldn't work if you tried to write a string of Greek or Japanese instead.

What actually happens in this example is that the second approach is used -- the high byte is discarded. If you change the output line in the Encode1 program from:

    writer.write("testing");

to:

    writer.write("testing\u1234");

the total output length will be 8 bytes instead of 7, even though the Unicode character \u1234 cannot be represented using a single byte.
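
If you want to check the 8-byte result without creating a file, the following sketch may help. The class name Substitute and the helper toLatin1 are hypothetical. Because the default charset varies from system to system, the sketch names ISO-8859-1 explicitly, on the assumption that it behaves like a typical single-byte default for this input:

```java
import java.io.UnsupportedEncodingException;

class Substitute {
    // Encode as ISO-8859-1, which has no mapping for U+1234.
    static byte[] toLatin1(String s) {
        try {
            return s.getBytes("ISO-8859-1");
        }
        catch (UnsupportedEncodingException exc) {
            // not expected: ISO-8859-1 is a required charset
            throw new IllegalStateException(exc.toString());
        }
    }

    public static void main(String args[]) {
        byte bytes[] = toLatin1("testing\u1234");

        // 7 ASCII bytes plus 1 byte standing in for U+1234
        System.out.println("length = " + bytes.length);

        // show the value of the final byte in hex
        int last = bytes[bytes.length - 1] & 0xFF;
        System.out.println(
            "last byte = 0x" + Integer.toHexString(last));
    }
}
```

The final byte printed is 0x3f, which is the ASCII code for "?".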

"Discard" in the previous discussion can have a couple of meanings. If the high byte of a Java character is 0, as is the case for characters representing 7-bit ASCII, then discard means to omit the high byte. However, another meaning applies to the situation where you have a Java character that is not mappable using a particular encoding. In such a case the character (two bytes) may be replaced by a default substitution byte. In the case above, \u1234 is replaced with 0x3f.

Let's now look at how to use charsets, which are mappings between characters and bytes. One basic question you might have is: what charsets are available? Here's a program that displays a list:

    import java.nio.charset.*;
    import java.util.*;
    
    public class Encode2 {
        public static void main(String args[]) {
            Map availcs = Charset.availableCharsets();
            Set keys = availcs.keySet();
            for (Iterator iter = 
                keys.iterator();iter.hasNext();) {
                    System.out.println(iter.next());
                }
        }
    }

The output should look something like this (but without the "*" character):

    ISO-8859-1*
    ISO-8859-15
    US-ASCII*
    UTF-16*
    UTF-16BE*
    UTF-16LE*
    UTF-8*
    windows-1252

The "*" is shown here to identify charsets that must be supported on all Java platforms.
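
If you only care about the six required charsets, you don't need to walk the whole list. Here is a small sketch that asks about each one by name using Charset.isSupported. The class name RequiredCharsets and the REQUIRED array are invented for this example:

```java
import java.nio.charset.Charset;

class RequiredCharsets {
    // The charsets that every Java platform must support.
    static final String REQUIRED[] = {
        "US-ASCII", "ISO-8859-1", "UTF-8",
        "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Return true if every required charset can be looked up.
    static boolean allSupported() {
        for (int i = 0; i < REQUIRED.length; i++) {
            if (!Charset.isSupported(REQUIRED[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String args[]) {
        for (int i = 0; i < REQUIRED.length; i++) {
            System.out.println(REQUIRED[i] + ": " +
                Charset.isSupported(REQUIRED[i]));
        }
        System.out.println("all supported: " + allSupported());
    }
}
```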

Another basic question: what is the default charset on my local system? Here's a program that displays the name of the default:

    import java.io.*;
    import java.nio.charset.*;
    
    public class Encode3 {
        public static void main(String args[]) 
            throws IOException {
                FileWriter filewriter = 
                    new FileWriter("out");
                String encname = 
                    filewriter.getEncoding();
                filewriter.close();
                System.out.println(
                    "default charset is: " + encname);
    
                /*
                Charset charset1 = 
                    Charset.forName(encname);
                Charset charset2 = 
                    Charset.forName("windows-1252");
                if (charset1.equals(charset2)) {
                    System.out.println(
                        "Cp1252/windows-1252 equal");
                }
                else {
                    System.out.println(
                        "Cp1252/windows-1252 unequal");
                }
                */
        }
    }

When you run this program, you might see a result like this:

    default charset is: Cp1252

Notice that this charset is not on the list of required charsets that every Java implementation must support. There is no requirement that the default charset must be one of the required charsets. This example also has some commented-out logic that shows how you can determine whether two charsets are equal or not. It turns out that "windows-1252" and "Cp1252" are in fact names for a single charset. The logic is commented out because there is no requirement that the Cp1252 charset be supported, and so the logic here might not be meaningful to you.
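
Because Cp1252 might not exist on your system, here is a sketch of the same equality test that sticks to required charsets. The class name SameCharset and the method same are hypothetical. It relies on two documented properties of Charset: names are not case-sensitive, and two charsets are equal when their canonical names match. The set of aliases it prints is implementation-specific, so no particular output is promised for that line:

```java
import java.nio.charset.Charset;

class SameCharset {
    // Return true if both names refer to the same charset.
    static boolean same(String name1, String name2) {
        Charset charset1 = Charset.forName(name1);
        Charset charset2 = Charset.forName(name2);
        return charset1.equals(charset2);
    }

    public static void main(String args[]) {
        // two spellings of one name
        System.out.println(same("utf-8", "UTF-8"));     // true

        // two different charsets
        System.out.println(same("UTF-8", "US-ASCII"));  // false

        // canonical name plus whatever aliases this
        // implementation defines
        Charset charset = Charset.forName("UTF-8");
        System.out.println(charset.name() +
            " aliases: " + charset.aliases());
    }
}
```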

You may have seen other ways to get the default local charset name, such as querying the "file.encoding" system property. This approach might work, but this property is not guaranteed to be defined on all Java platforms.
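
Here is one way to compare the two approaches without leaving a file behind. This is only a sketch: the class name DefaultEncoding and the method defaultEncodingName are made up, and it assumes that an OutputStreamWriter created without a charset name reports the same default that FileWriter would use:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

class DefaultEncoding {
    // Ask a writer on an in-memory stream which encoding it
    // picked up by default.
    static String defaultEncodingName() {
        OutputStreamWriter writer =
            new OutputStreamWriter(new ByteArrayOutputStream());
        return writer.getEncoding();
    }

    public static void main(String args[]) {
        System.out.println(
            "writer default: " + defaultEncodingName());

        // may print null if the property is not defined
        System.out.println("file.encoding:  " +
            System.getProperty("file.encoding"));
    }
}
```

The two names can be spelled differently even when they refer to the same charset, because getEncoding may return a historical name such as "Cp1252" or "UTF8".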

In the Encode3 program, Charset.forName is used to find the Charset object for a string name such as "US-ASCII". Here's another example that uses this technique:

    import java.nio.charset.*;
    
    public class Encode4 {
        public static void main(String args[]) {
            if (args.length != 1) {
                System.out.println(
                    "missing charset name");
                System.exit(1);
            }
    
            String charsetname = args[0];
            Charset charset;
    
            try {
                charset = Charset.forName(charsetname);
                System.out.println(
                    "charset lookup successful");
            }
            catch (UnsupportedCharsetException exc) {
                System.out.println(
                    "unknown charset: " + charsetname);
            }
        }
    }

If you run the program like this:

    $ java Encode4 XYZ

it will check whether "XYZ" is a supported Charset on the local system, and if so, obtain the Charset object.
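
Charset.forName can also throw IllegalCharsetNameException, for a name that contains characters not allowed in charset names. A lookup helper that handles both cases might look like the following sketch. The class name CharsetLookup and the method lookup are hypothetical, and returning null for a failed lookup is just one possible design:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

class CharsetLookup {
    // Return the Charset for a name, or null if the name is
    // malformed or no such charset is available.
    static Charset lookup(String charsetname) {
        try {
            return Charset.forName(charsetname);
        }
        catch (IllegalCharsetNameException exc) {
            return null;  // name contains illegal characters
        }
        catch (UnsupportedCharsetException exc) {
            return null;  // legal name, but not supported here
        }
    }

    public static void main(String args[]) {
        String names[] = {"UTF-8", "XYZ", "bad name!"};
        for (int i = 0; i < names.length; i++) {
            Charset charset = lookup(names[i]);
            System.out.println(names[i] + " -> " +
                (charset == null ?
                    "not available" : charset.name()));
        }
    }
}
```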

Given all this background, how do you actually make use of charsets? Here's a rework of the first example, Encode1:

    import java.io.*;
    
    public class Encode5 {
        public static void main(String args[]) 
            throws IOException {
                FileOutputStream fileoutstream =
                    new FileOutputStream("out");
                Writer writer = new OutputStreamWriter(
                    fileoutstream, "UTF-8");
                writer.write("testing");
                writer.close();
        }
    }

The Encode1 program is not portable. It applies the default charset, which can vary based on platform and locale. By contrast, the Encode5 program uses a standard charset (UTF-8). As mentioned earlier, the default encoding used in the Encode1 example discards the high byte of Java characters. Using the UTF-8 encoding solves this problem. If you change the output line in the Encode5 program from:

    writer.write("testing");

to:

    writer.write("testing\u1234");

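the program still works: \u1234 is written as a multi-byte UTF-8 sequence instead of being replaced by a substitution byte. UTF-8 also has the advantage of handling 7-bit ASCII in a graceful way, because each ASCII character is written as a single byte with the same value.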
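
To convince yourself that nothing is lost, you can write the string through a UTF-8 writer and read it back through a UTF-8 reader. The sketch below does this in memory. The class name Utf8RoundTrip and its helper methods are invented for the example, and the byte array streams stand in for the file used by Encode5:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class Utf8RoundTrip {
    // Write a string through a UTF-8 writer into a byte array.
    static byte[] writeUtf8(String s) {
        try {
            ByteArrayOutputStream out =
                new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(out, "UTF-8");
            writer.write(s);
            writer.close();
            return out.toByteArray();
        }
        catch (IOException exc) {
            // not expected for an in-memory stream and UTF-8
            throw new IllegalStateException(exc.toString());
        }
    }

    // Read all characters back through a UTF-8 reader.
    static String readUtf8(byte bytes[]) {
        try {
            Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), "UTF-8");
            StringBuffer sb = new StringBuffer();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            reader.close();
            return sb.toString();
        }
        catch (IOException exc) {
            throw new IllegalStateException(exc.toString());
        }
    }

    public static void main(String args[]) {
        String str = "testing\u1234";
        byte bytes[] = writeUtf8(str);
        // 7 one-byte characters plus one three-byte character
        System.out.println("bytes = " + bytes.length);
        System.out.println(
            "round trip ok = " + str.equals(readUtf8(bytes)));
    }
}
```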

Here's another example. It shows how you can convert Java strings to byte vectors, specifying an encoding:

    import java.io.*;
    
    public class Encode6 {
        public static void main(String args[])
        throws UnsupportedEncodingException {
            String str = "testing";
    
            byte bytevec1[] = str.getBytes();
            byte bytevec2[] = str.getBytes("UTF-16");
    
            System.out.println("bytevec1 length = " +
                bytevec1.length);
            System.out.println("bytevec2 length = " +
                bytevec2.length);
        }
    }

The output on your system should look something like this:

    bytevec1 length = 7
    bytevec2 length = 16

The first conversion applies the default charset. The second conversion uses the UTF-16 charset.
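
The conversion also works in the other direction, using the String constructor that takes a byte vector and an encoding name. Here is a sketch that encodes and decodes the same string with several named charsets. The class name RoundTrip and the helpers encode and decode are hypothetical:

```java
import java.io.UnsupportedEncodingException;

class RoundTrip {
    // Convert a string to bytes using the named encoding.
    static byte[] encode(String s, String encname) {
        try {
            return s.getBytes(encname);
        }
        catch (UnsupportedEncodingException exc) {
            throw new IllegalArgumentException(
                "unsupported encoding: " + encname);
        }
    }

    // Convert bytes back to a string using the named encoding.
    static String decode(byte bytes[], String encname) {
        try {
            return new String(bytes, encname);
        }
        catch (UnsupportedEncodingException exc) {
            throw new IllegalArgumentException(
                "unsupported encoding: " + encname);
        }
    }

    public static void main(String args[]) {
        String str = "testing";
        String names[] =
            {"UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"};
        for (int i = 0; i < names.length; i++) {
            byte bytes[] = encode(str, names[i]);
            String back = decode(bytes, names[i]);
            System.out.println(names[i] + ": " +
                bytes.length + " bytes, round trip ok = " +
                str.equals(back));
        }
    }
}
```

The byte counts are 7, 16, 14, and 14. UTF-16 produces two more bytes than UTF-16BE or UTF-16LE because it writes a byte-order mark in front of the 14 bytes of character data.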

There's one final thing to discuss about character encodings. You might wonder what a typical mapping or encoding algorithm really looks like. Here is some actual code taken from DataOutputStream.writeUTF. It's used to map a character vector into a byte vector:

    for (int i = 0; i < strlen; i++) {
        c = charr[i];
        if ((c >= 0x0001) && (c <= 0x007F)) {
            bytearr[count++] = (byte) c;
        }
        else if (c > 0x07FF) {
            bytearr[count++] = 
                (byte) (0xE0 | ((c >> 12) & 0x0F));
            bytearr[count++] = 
                (byte) (0x80 | ((c >>  6) & 0x3F));
            bytearr[count++] = 
                (byte) (0x80 | ((c >>  0) & 0x3F));
        }
        else {
            bytearr[count++] = 
                (byte) (0xC0 | ((c >>  6) & 0x1F));
            bytearr[count++] = 
                (byte) (0x80 | ((c >>  0) & 0x3F));
        }
    }

Characters are taken from charr, converted into 1-3 bytes, and written into bytearr. Characters in the range 0x1 - 0x7f (7-bit ASCII) are mapped into themselves. Characters with value 0x0 and in the range 0x80 - 0x7ff are mapped into two bytes. All other characters are mapped into three bytes.
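
You can try the three cases yourself with the following sketch, which repeats the same tests in a standalone method and compares the result with what DataOutputStream.writeUTF actually writes. The class name ModifiedUtf8 and both helper methods are made up for this illustration. Note that writeUTF puts a two-byte length in front of the encoded characters, so the sketch skips those two bytes before comparing:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class ModifiedUtf8 {
    // Encode each character into 1, 2, or 3 bytes, using the
    // same three cases as the writeUTF loop.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                out.write(c);
            }
            else if (c > 0x07FF) {
                out.write(0xE0 | ((c >> 12) & 0x0F));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
            else {
                out.write(0xC0 | ((c >> 6) & 0x1F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Return what writeUTF produces, minus its 2-byte length.
    static byte[] viaWriteUTF(String s) {
        try {
            ByteArrayOutputStream out =
                new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(out);
            data.writeUTF(s);
            data.close();
            byte all[] = out.toByteArray();
            byte body[] = new byte[all.length - 2];
            System.arraycopy(all, 2, body, 0, body.length);
            return body;
        }
        catch (IOException exc) {
            throw new IllegalStateException(exc.toString());
        }
    }

    public static void main(String args[]) {
        // one character from each range, plus the 0x0 case
        String s = "A\u0000\u00e9\u1234";
        byte bytes[] = encode(s);
        System.out.println("bytes = " + bytes.length);  // 8
        System.out.println("same as writeUTF = " +
            Arrays.equals(bytes, viaWriteUTF(s)));
    }
}
```

The four characters need 1 + 2 + 2 + 3 = 8 bytes.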

For more information about charsets and encodings, see section 9.7.1, Character Encodings, in "The Java(TM) Programming Language, Third Edition" by Arnold, Gosling, and Holmes. Also see the documentation for Supported Encodings and Charset. The document Unicode Transformation Formats: UTF-8 & Co. is another good place to learn about charsets and encodings.

USING REFLECTION TO CREATE CLASS INSTANCES

Imagine that you're doing some Java programming, and you need to create a new instance of the A class. You write some code like this:

    A aref = new A();

Pretty obvious, right?

Suppose, however, you take a step further and specify that the name of the class is found in a string made available at run time. It's still possible to proceed, like this:

    String classname; // can be either A, B, or C

    A aref = null;
    B bref = null;
    C cref = null;

    if (classname.equals("A"))
        aref = new A();
    else if (classname.equals("B"))
        bref = new B();
    else
        cref = new C();

This code works, but it's cumbersome. Also, it can't be expanded much further without major effort.

There's another approach that works much better in this kind of situation. The basic idea is that you use Class.forName to obtain a java.lang.Class object for a class whose string name you specify. java.lang.Class is a class whose instances represent Java types, such as classes and interfaces and arrays. After you obtain a java.lang.Class instance, you can call newInstance to create a new object of the class represented by the java.lang.Class instance. The code looks like this:

    Class cls = Class.forName(classname);

    Object obj = cls.newInstance();

This sequence creates an object of the class whose string name is classname.
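
Here is a small, complete sketch of the idea that you can run against classes in the standard library. The class name CreateByName and the method create are hypothetical. One deliberate difference from the two lines above: the sketch looks up the public no-argument constructor and calls newInstance on it, because Class.newInstance is deprecated in later Java releases. It also assumes a release that supports generics and varargs:

```java
class CreateByName {
    // Create an object of the named class using its public
    // no-argument constructor.
    static Object create(String classname) {
        try {
            Class<?> cls = Class.forName(classname);
            return cls.getConstructor().newInstance();
        }
        catch (Exception exc) {
            // class not found, no suitable constructor, and
            // so on; report them all the same way here
            throw new IllegalArgumentException(
                "cannot create " + classname + ": " + exc);
        }
    }

    public static void main(String args[]) {
        Object obj1 = create("java.lang.StringBuffer");
        Object obj2 = create("java.util.ArrayList");
        System.out.println(obj1.getClass().getName());
        System.out.println(obj2.getClass().getName());
    }
}
```

The program prints the two class names, java.lang.StringBuffer and java.util.ArrayList, which shows that each object really is an instance of the class named in the string.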

After you have a java.lang.Class instance, you can also find out other things about the represented class, for example, what methods and fields it contains. You can look up methods by name, and use reflection to call these methods.
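
As a quick illustration of looking up a method by name and calling it, consider this sketch. The class name MethodLookup and the method call are invented for the example, and it handles only public methods that take no arguments. It assumes a Java release with varargs, so that getMethod and invoke can be called without explicit empty arrays:

```java
import java.lang.reflect.Method;

class MethodLookup {
    // Call a public no-argument method by name and return
    // its result.
    static Object call(Object target, String methodname) {
        try {
            Method meth =
                target.getClass().getMethod(methodname);
            return meth.invoke(target);
        }
        catch (Exception exc) {
            throw new IllegalArgumentException(
                "cannot call " + methodname + ": " + exc);
        }
    }

    public static void main(String args[]) throws Exception {
        // look up and call two String methods by name
        System.out.println(call("testing", "length"));
        System.out.println(call("testing", "toUpperCase"));

        // fields can be looked up by name as well
        System.out.println(
            Integer.class.getField("MAX_VALUE").getInt(null));
    }
}
```

The output is 7, TESTING, and 2147483647.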

Let's look at an example to make these ideas a little more concrete. The example uses java.lang.Class and reflection to implement a class and method exerciser. The idea is that you have some classes and methods, and you'd like to write a driver program to test them. For example, for this input:

    $ java NewDemo A string1 string2 @ f2 string3 string4 string5

the driver creates an object of class A, using string1/string2 as string arguments to the A constructor. The driver then calls A.f2 for the created object, using string3/string4/string5 as arguments to the f2 method.

Note that the driver program doesn't know anything about the A class. It's written in a general way to work with any class. The driver looks up and manipulates class and method names using java.lang.Class and reflection.

Here's what the code looks like:

    import java.lang.reflect.*;
    
    public class NewDemo {
        Class cls;
    
        Object obj;
    
        Constructor ctor;
        Object ctorargs[];
    
        Method meth;
        Object methargs[];
    
        String args[];
        int divpos;
    
        // parse input of the form:
        //
        //  classname arg1 arg2 ... 
        //   @ methodname arg1 arg2 ...
    
        public NewDemo(String a[]) throws 
            ClassNotFoundException, 
            NoSuchMethodException {
    
            args = a;
    
            // search for @ divider in input
    
            divpos = -1;
            for (int i = 0; i < args.length; i++) {
                if (args[i].equals("@")) {
                    divpos = i;
                    break;
                }
            }
            if (divpos < 1 || divpos + 1 == args.length) {
                throw new IllegalArgumentException(
                    "bad syntax");
            }
    
            // load appropriate class 
            // and get Class object
    
            String classname = args[0];
            cls = Class.forName(classname);
    
            // find the constructor, 
            // if arguments specified for it
    
            if (divpos > 1) {
                Class ptypes[] = new Class[divpos - 1];
                for (int i = 0; i < ptypes.length; i++) {
                    ptypes[i] = String.class;
                }
                ctor = cls.getConstructor(ptypes);
    
                // set up the constructor arguments
    
                ctorargs = new Object[divpos - 1];
                for (int i = 0; i < ctorargs.length; i++) {
                    ctorargs[i] = args[i+1];
                }
            }
    
            // find the right method
    
            String methodname = args[divpos + 1];
            int firstarg = divpos + 2;
            Class ptypes[] = 
                new Class[args.length - firstarg];
            for (int i = 0; i < ptypes.length; i++) {
                ptypes[i] = String.class;
            }
            meth = cls.getMethod(methodname, ptypes);
    
            // set up the method arguments
    
            methargs = new Object[ptypes.length];
            for (int i = 0; i < methargs.length; i++) {
                methargs[i] = args[firstarg + i];
            }
        }
    
        // create an object of the specified class
    
        public void createObject() throws 
            InstantiationException,
            IllegalAccessException, 
            InvocationTargetException {
        
            // if no constructor arguments were given,
            // use the no-arg constructor
    
            if (ctor == null) {
                obj = cls.newInstance();
            }
    
            // otherwise use constructor with arguments
    
            else {
                obj = ctor.newInstance(ctorargs);
            }
        }
    
        // call the method and display its return value
    
        public void callMethod() throws 
            IllegalAccessException,
            InvocationTargetException {
    
            Object ret = meth.invoke(obj, methargs);
            System.out.println("return value: " + ret);
        }
    
        public static void main(String args[]) {
    
            // create a NewDemo instance 
            // and call the method
    
            try {
                NewDemo nd;
    
                nd = new NewDemo(args);
                nd.createObject();
                nd.callMethod();
            }
    
            // display any resulting exception
    
            catch (Exception e) {
                System.out.println(e);
                System.exit(1);
            }
        }
    }

Here is a test class you can use with the demo:

    public class A {
        public A() {
            System.out.println("call: A.A()");
        }
        public A(String s1, String s2) {
            System.out.println(
                "call: A.A(" + s1 + "," + s2 + ")");
        }
        public void f1() {
            System.out.println("call: A.f1()");
        }
        public double f2(
            String s1, String s2, String s3) {
            System.out.println("call: A.f2(" + s1 + "," + s2 +
                    "," + s3 + ")");
            return 12.34;
        }
    }

You need to compile this class in the usual way.

The NewDemo constructor is used to parse the input line, to find the java.lang.Class object for the specified class, and to find the appropriate constructor and method. Then createObject is called to create an instance of the class. Finally, callMethod is used to actually call the method for the class instance.

The constructor and method are found by creating a java.lang.Class vector that contains the types of each parameter to the constructor or method. This example takes the liberty of assuming that all parameters are of String type, and thus the corresponding java.lang.Class object is "String.class". Then getConstructor and getMethod are used to find the actual constructor or method to use.
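
The lookup matches the declared parameter types exactly, so the number of String entries in the vector matters. The following sketch isolates that step and tries it on java.lang.StringBuffer. The class name StringSignature and its two methods are hypothetical, and the code assumes a Java release with generics:

```java
class StringSignature {
    // Build a parameter-type vector of n String types.
    static Class<?>[] stringTypes(int n) {
        Class<?> ptypes[] = new Class<?>[n];
        for (int i = 0; i < ptypes.length; i++) {
            ptypes[i] = String.class;
        }
        return ptypes;
    }

    // Return true if cls has a public constructor that takes
    // exactly n String parameters.
    static boolean hasStringConstructor(Class<?> cls, int n) {
        try {
            cls.getConstructor(stringTypes(n));
            return true;
        }
        catch (NoSuchMethodException exc) {
            return false;
        }
    }

    public static void main(String args[]) {
        Class<?> cls = StringBuffer.class;
        for (int n = 0; n <= 2; n++) {
            System.out.println(n + " String parameter(s): " +
                hasStringConstructor(cls, n));
        }
    }
}
```

StringBuffer has a no-argument constructor and a constructor that takes one String, but none that takes two, so the program prints true, true, and false.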

If you run the driver by saying:

    java NewDemo A @ f1

the output is:

    call: A.A()
    call: A.f1()
    return value: null

Here are additional driver runs:

    java NewDemo A @ f2 str1 str2 str3 

    java NewDemo A str4 str5 @ f1

    java NewDemo A str6 str7 @ f2 str8 str9 str10

And here are their respective results:

    call: A.A()
    call: A.f2(str1,str2,str3)
    return value: 12.34

    call: A.A(str4,str5)
    call: A.f1()
    return value: null

    call: A.A(str6,str7)
    call: A.f2(str8,str9,str10)
    return value: 12.34

Some examples of driver runs with bad input are:

    java NewDemo
    java NewDemo A
    java NewDemo A @
    java NewDemo A str1 @ f1
    java NewDemo A @ f1 str1
    java NewDemo B @ f1
    java NewDemo A str11 str12 @
    java NewDemo A @ f3 str1

The results are:

    java.lang.IllegalArgumentException: bad syntax
    java.lang.IllegalArgumentException: bad syntax
    java.lang.IllegalArgumentException: bad syntax
    java.lang.NoSuchMethodException
    java.lang.NoSuchMethodException: f1
    java.lang.ClassNotFoundException: B
    java.lang.IllegalArgumentException: bad syntax
    java.lang.NoSuchMethodException: f3

The techniques illustrated here are extremely powerful, and allow you to manipulate types and methods by name at run time. These techniques are used by tools such as interpreters, debuggers, and object exercisers.

For more information about using reflection to create class instances, see section 11.2.1, The Class class, and section 11.2.6, The Method Class, in "The Java(TM) Programming Language, Third Edition" by Arnold, Gosling, and Holmes.

IMPORTANT: Please read our Terms of Use, Privacy, and Licensing policies:
http://www.sun.com/share/text/termsofuse.html
http://www.sun.com/privacy/
http://developer.java.sun.com/berkeley_license.html


Comments? Send your feedback on the Core Java(TM) Technologies Tech Tips to: jdc-webmaster@sun.com

Subscribe to other Java developer Tech Tips:

- Enterprise Java Technologies Tech Tips. Get tips on using enterprise Java technologies and APIs, such as those in the Java 2 Platform, Enterprise Edition (J2EE(TM)).
- Wireless Developer Tech Tips. Get tips on using wireless Java technologies and APIs, such as those in the Java 2 Platform, Micro Edition (J2ME(TM)).

To subscribe to these and other JDC publications:
- Go to the JDC Newsletters and Publications page, choose the newsletters you want to subscribe to and click "Update".
- To unsubscribe, go to the subscriptions page, uncheck the appropriate checkbox, and click "Update".


ARCHIVES: You'll find the Core Java Technologies Tech Tips archives at:
http://java.sun.com/jdc/TechTips/index.html


Copyright 2003 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.


This document is protected by copyright. For more information, see:
http://java.sun.com/jdc/copyright.html


Sun, Sun Microsystems, Java, Java Developer Connection, J2SE, J2EE, and J2ME are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
