Technical Tips

June 4, 2002

Using the CharSequence Interface
Programming With Buffers

These tips were developed using Java(TM) 2 SDK, Standard Edition, v 1.4.

This issue of the JDC Tech Tips is written by Glen McCluskey.

Using the CharSequence Interface

Suppose that you're writing a method in the Java programming language, and you want one of its parameters to accept a string or other sequence of characters. An obvious way to do this is to use a String parameter. But what happens if the method caller has a StringBuffer or CharBuffer object instead of a String? Or what if the caller wants to pass a character array to your method? In those cases, a String parameter will not work. Instead, you can use the java.lang.CharSequence interface, which provides uniform, read-only access to many different kinds of character sequences.

A class that implements the CharSequence interface must define the following methods:

length - returns the number of characters in the sequence

charAt - returns the character at a given index

subSequence - returns a subsequence of the sequence, from a start index (inclusive) to an end index (exclusive)

toString - returns a String containing the sequence characters

The String, StringBuffer, and CharBuffer classes implement the CharSequence interface, so passing an object of one of these classes to a method with a CharSequence parameter will work. Let's look at an example:

    import java.nio.*;
    
    public class CSDemo1 {
    
        // dump out information about a CharSequence
    
        public static void dumpSeq(CharSequence cs) {
            System.out.println(
                            "length = " + cs.length());
            System.out.println(
                       "first char = " + cs.charAt(0));
            System.out.println("string = " + cs);
            System.out.println();
        }
   
        public static void main(String args[]) {
    
            // String
    
            String s = "test";
            dumpSeq(s);
    
            // StringBuffer
    
            StringBuffer sb = new StringBuffer("ing");
            dumpSeq(sb);
    
            // CharBuffer
    
            CharBuffer cb = CharBuffer.allocate(3);
            cb.put("123");
            cb.rewind();
            dumpSeq(cb);
        }
    }

In the CSDemo1 program, dumpSeq is a method with a CharSequence parameter. The program passes it instances of String, StringBuffer, and CharBuffer. When you run the program, the output is:

    length = 4
    first char = t
    string = test

    length = 3
    first char = i
    string = ing

    length = 3
    first char = 1
    string = 123

This example makes clear that you can use CharSequence as a generalization of character sequences. It shows that you can write methods that accept objects of any class type that implements the CharSequence interface.
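Because dumpSeq relies only on CharSequence methods, the same approach works for any utility method you write. Here's a minimal sketch; countChar is a hypothetical helper written for this illustration, not part of the Java platform. It uses only length and charAt, so it accepts a String, a StringBuffer, or a CharBuffer unchanged:

```java
import java.nio.CharBuffer;

class CountDemo {

    // count occurrences of c in cs, using only the
    // length and charAt methods of CharSequence

    static int countChar(CharSequence cs, char c) {
        int count = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (cs.charAt(i) == c) {
                count++;
            }
        }
        return count;
    }

    public static void main(String args[]) {
        System.out.println(countChar("banana", 'a'));    // 3
        System.out.println(
            countChar(new StringBuffer("banana"), 'n')); // 2
        System.out.println(
            countChar(CharBuffer.wrap("banana"), 'b'));  // 1
    }
}
```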

Let's look at another example, one that defines a CharArrayWrapper class that implements the CharSequence interface. The class is used to wrap a character array such that the wrapper object can be passed as an argument to a method expecting a CharSequence. Here's what the code looks like:

    class CharArrayWrapper implements CharSequence {
        private char vec[];
        private int off;
        private int len;
    
        // length of sequence
    
        public int length() {
            return len;
        }
    
        // character at a given index
    
        public char charAt(int index) {
            if (index < 0 || index >= len) {
                throw new IndexOutOfBoundsException(
                    "invalid index");
            }
            // index is relative to the start (off)
            // of this view of the array
            return vec[off + index];
        }
    
        // subsequence from start (inclusive) 
        // to end (exclusive)
    
        public CharSequence subSequence(
                                  int start, int end) {
            if (start < 0 || end < 0 || end > len || 
                                         start > end) {
                throw new IndexOutOfBoundsException(
                    "invalid start/end");
            }
            // the new view begins off + start elements
            // into the same array; nothing is copied
            return new CharArrayWrapper(
                        vec, off + start, end - start);
        }
    
        // convert to string
    
        public String toString() {
            return new String(vec, off, len);
        }
    
        // construct a CharArrayWrapper
        // from a portion of a char array 
    
        public CharArrayWrapper(
                        char vec[], int off, int len) {
            if (vec == null) {
                throw new NullPointerException(
                                      "null vec");
            }
            if (off < 0 || len < 0 || 
                              off > vec.length - len) {
                throw new IllegalArgumentException(
                                        "bad off/len");
            }
            this.vec = vec;
            this.off = off;
            this.len = len;
        }
   
        // construct a CharArrayWrapper 
        // from a char array
    
        public CharArrayWrapper(char vec[]) {
            this(vec, 0, vec.length);
        }
    }
    
    public class CSDemo2 {
        public static void main(String args[]) {
    
            // create array and wrap it
    
            char vec[] = {'a', 'b', 'c', 'd', 'e'};
            CharSequence cs = 
                             new CharArrayWrapper(vec);
    
            // test the CharArrayWrapper 
            // interface implementation
    
            System.out.println("string = " + cs);
            System.out.println(
                            "length = " + cs.length());
            System.out.println("subSequence(2,4) = " +
                cs.subSequence(2, 4));
            System.out.println(
                        "charAt(0) = " + cs.charAt(0));
        }
    }

In the CharArrayWrapper class, the subSequence method is implemented by creating another CharArrayWrapper object, one that has a view of a portion of the original character array. The toString method is implemented by calling a String constructor with the array, offset, and length as arguments. Creating a string this way is fairly expensive, because the String constructor copies the characters. By contrast, wrapping an array in a CharArrayWrapper (or taking a subsequence of one) involves no copying.

The CharArrayWrapper class and CharSequence provide a read-only interface: there's no way to modify the character sequence through this interface. If you're familiar with the java.nio.CharBuffer class, you'll notice that wrapping a character array with CharBuffer.wrap achieves a similar effect to CharArrayWrapper. That's because CharBuffer also implements the CharSequence interface.

If you run the CSDemo2 program, the results look like this:

    string = abcde
    length = 5
    subSequence(2,4) = cd
    charAt(0) = a
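As mentioned above, CharBuffer can also wrap a character array. The following sketch (the class and method names are illustrative) wraps part of an array with CharBuffer.wrap(char[], int, int). Like CharArrayWrapper, the resulting CharSequence is a view of the array rather than a copy, so a later change to the array shows through:

```java
import java.nio.CharBuffer;

class WrapDemo {

    // view chars [off, off + len) of vec without copying;
    // CharBuffer.wrap sets position = off, limit = off + len

    static CharSequence wrap(char vec[], int off, int len) {
        return CharBuffer.wrap(vec, off, len);
    }

    public static void main(String args[]) {
        char vec[] = {'a', 'b', 'c', 'd', 'e'};
        CharSequence cs = wrap(vec, 1, 3);
        System.out.println("string = " + cs);              // bcd
        System.out.println("length = " + cs.length());     // 3
        System.out.println("charAt(0) = " + cs.charAt(0)); // b

        // the buffer shares the array, so a change to the
        // array is visible through the CharSequence
        vec[1] = 'X';
        System.out.println("charAt(0) = " + cs.charAt(0)); // X
    }
}
```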

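The CharSequence interface is also used in the regular expression package (java.util.regex). For example, the Pattern.matcher method takes a CharSequence argument, so it can search a StringBuffer or CharBuffer directly. For another example of CharSequence in use, see the BufferDemo7 program in the Programming With Buffers tip that follows this tip.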
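As a sketch of what a CharSequence-based regular expression search can look like, here is a small program; countNumbers is a hypothetical helper written for this illustration. It passes a String, a StringBuffer, and a CharBuffer to the same Matcher loop without converting any of them to a String first:

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexDemo {

    // count the runs of digits in any CharSequence

    static int countNumbers(CharSequence cs) {
        Matcher m = Pattern.compile("[0-9]+").matcher(cs);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String args[]) {
        System.out.println(countNumbers("a1b22c333"));      // 3
        System.out.println(countNumbers(
            new StringBuffer("x = 10, y = 20")));          // 2
        System.out.println(countNumbers(
            CharBuffer.wrap("no digits")));                // 0
    }
}
```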

For more information about the CharSequence interface, see the interface description.

Programming With Buffers

If you've done any I/O programming in the Java programming language, you've probably had a situation where you're reading from an input stream into an array, and then writing the array contents to an output stream. Typically, you might have an array that's 1024 bytes long, and each read of the stream puts anywhere from 0 to 1024 bytes into the array.
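For reference, the traditional array-based copy loop looks something like the following. It's a sketch: the copy method and the 1024-byte array size are illustrative choices, and in-memory streams stand in for real files:

```java
import java.io.*;

class CopyDemo {

    // copy everything from in to out through a
    // 1024-byte array used as a buffer

    static long copy(InputStream in, OutputStream out)
                                  throws IOException {
        byte buf[] = new byte[1024];
        long total = 0;
        int n;

        // read returns the number of bytes read (possibly
        // fewer than 1024), or -1 at end of stream
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String args[]) throws IOException {
        byte data[] = new byte[3000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte)i;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(data), out);
        System.out.println(n + " bytes copied");  // 3000 bytes copied
    }
}
```

Notice that all the bookkeeping (how much was read, where to write from) lives in local variables; the buffer classes described next move that state into the buffer itself.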

An array used in this way is also called a buffer. Version 1.4 of the Java 2 Platform, Standard Edition adds a major new set of buffer classes. These classes are found in the java.nio package. The classes are based on the superclass java.nio.Buffer. Formally, a buffer is a linear, finite sequence of elements of a specific primitive type such as char (CharBuffer) or double (DoubleBuffer). Buffers are typically used in I/O operations, but you can also use them in other areas.

This tip examines some uses of buffers. Let's start by looking at an example that shows how buffers are allocated:

    import java.nio.*;
    
    public class BufferDemo1 {
        public static void main(String args[]) {
    
            // allocate non-direct buffer
    
            ByteBuffer bb1 = ByteBuffer.allocate(100);
            System.out.println(bb1);
    
            // allocate direct buffer
    
            ByteBuffer bb2 = 
                        ByteBuffer.allocateDirect(100);
            System.out.println(bb2);
        }
    }

Buffers have a fixed size, in this example 100 bytes. Here, the program allocates one buffer as a direct buffer, and another as non-direct. What does this mean? The documentation for ByteBuffer explains it this way:

A byte buffer is either direct or non-direct. Given a direct byte buffer, the Java virtual machine* will make a best effort to perform native I/O operations directly upon it. That is, it will attempt to avoid copying the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the underlying operating system's native I/O operations.

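Direct buffers typically have somewhat higher allocation and deallocation costs than non-direct buffers, so they're not always the right choice. However, they are worth considering to improve I/O performance, particularly for large, long-lived buffers that are used in native I/O operations.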

When you run the BufferDemo1 program, the result is:

    java.nio.HeapByteBuffer[pos=0 lim=100 cap=100]
    java.nio.DirectByteBuffer[pos=0 lim=100 cap=100]

HeapByteBuffer and DirectByteBuffer are internal classes used in the implementation of ByteBuffer.
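Because those class names are implementation details, it's better not to rely on them. The isDirect method tells you which kind of buffer you have. A short sketch (describe is a hypothetical helper for this illustration):

```java
import java.nio.ByteBuffer;

class DirectDemo {

    // describe a buffer as direct or non-direct

    static String describe(ByteBuffer bb) {
        return (bb.isDirect() ? "direct" : "non-direct")
            + ", capacity=" + bb.capacity();
    }

    public static void main(String args[]) {
        ByteBuffer heap = ByteBuffer.allocate(100);
        ByteBuffer direct = ByteBuffer.allocateDirect(100);
        System.out.println(describe(heap));   // non-direct, capacity=100
        System.out.println(describe(direct)); // direct, capacity=100

        // a buffer from allocate is backed by an accessible
        // array; whether a direct buffer has one is unspecified
        System.out.println(heap.hasArray());  // true
    }
}
```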

You might wonder what the pos, lim, and cap values represent in the result. These are basic properties of a buffer: its position, limit, and capacity, respectively. A buffer's position is the zero-based index of the next element to be read or written. A buffer's limit is the index of the first element that should not be read or written. (The difference between the limit and the position is the number of elements remaining in the buffer, which the remaining method returns.) A buffer's capacity is its fixed number of elements. In the BufferDemo1 example, the capacity is 100 elements, and each element is a single byte.

These basic properties of a buffer represent one answer to the question "how does a buffer differ from an array of the same primitive type?" Conceptually, a buffer is a wrapper around an array of elements. It adds state information about how the array is being used, for example, the index of the next location to be read or written.
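You can see this wrapper relationship directly with ByteBuffer.wrap, which creates a buffer backed by an existing array without copying it. In the following sketch (fillViaBuffer is a hypothetical helper), the buffer tracks the position while the values land in the original array:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class WrapArrayDemo {

    // wrap an existing array (no copy is made) and fill it
    // through the buffer; each relative put stores a value
    // at the current position and then advances it

    static void fillViaBuffer(byte arr[]) {
        ByteBuffer bb = ByteBuffer.wrap(arr);
        while (bb.hasRemaining()) {
            bb.put((byte)bb.position());
        }
    }

    public static void main(String args[]) {
        byte arr[] = new byte[4];
        fillViaBuffer(arr);

        // the values written through the buffer are in the array
        System.out.println(Arrays.toString(arr)); // [0, 1, 2, 3]
    }
}
```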

Let's solidify these concepts with another example:

    import java.nio.*;
    
    public class BufferDemo2 {
    
        // dump state of a buffer
    
        static void dumpState(Buffer b) {
            System.out.println(
                           "position=" + b.position());
            System.out.println("limit=" + b.limit());
            System.out.println(
                           "capacity=" + b.capacity());
            System.out.println(
                         "remaining=" + b.remaining());
            System.out.println();
        }
    
        public static void main(String args[]) {
    
            // allocate buffer
    
            IntBuffer ib = IntBuffer.allocate(2);
            dumpState(ib);
    
            // add a value to it
    
            ib.put(37);
            dumpState(ib);
    
            // add another value to it
    
            ib.put(47);
            dumpState(ib);
        }
    }

The BufferDemo2 program allocates an int buffer with a capacity of two elements. It then adds two values to the buffer using relative put operations, which operate at the buffer's current position. Each time the program adds an element, the position is incremented, and the number of remaining elements decreases by one. If you run the program, the output looks like this:

    position=0
    limit=2
    capacity=2
    remaining=2
    
    position=1
    limit=2
    capacity=2
    remaining=1
    
    position=2
    limit=2
    capacity=2
    remaining=0
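What happens if you try to add a third value? A relative put throws BufferOverflowException when the position has already reached the limit. A minimal sketch (the class and method names are illustrative):

```java
import java.nio.BufferOverflowException;
import java.nio.IntBuffer;

class OverflowDemo {

    // fill a two-element buffer, then try one more
    // relative put and report what happened

    static String fillPastCapacity() {
        IntBuffer ib = IntBuffer.allocate(2);
        ib.put(37);
        ib.put(47);
        try {
            ib.put(57);    // position == limit: no room left
            return "no overflow";
        } catch (BufferOverflowException e) {
            // the failed put leaves the position at 2;
            // absolute gets read by index without moving it
            return "overflow at position=" + ib.position()
                + ", values=" + ib.get(0) + "," + ib.get(1);
        }
    }

    public static void main(String args[]) {
        // prints: overflow at position=2, values=37,47
        System.out.println(fillPastCapacity());
    }
}
```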

Let's look at a more complicated example. Here buffers are used for I/O, to copy one file to another. However, the example has a couple of unusual aspects. First, the input and output "files" are byte arrays. Second, the example simulates I/O requests that are only partially fulfilled. For example, you might request that two bytes be read from a file, but zero, one, or two bytes might actually be read.

Here's the code:

    import java.nio.ByteBuffer;
    import java.util.Random;
    
    public class BufferDemo3 {
        static Random rn = new Random(0);
        
        // size of input "file" (buffer)
    
        static final int FILESIZE = 10;
    
        // input and output "files" (arrays of bytes)
    
        static byte infile[] = new byte[FILESIZE];
        static int inptr = 0;
        static byte outfile[] = new byte[FILESIZE];
        static int outptr = 0;
    
        // initialize input
    
        static {
            for (int i = 0; i < infile.length; i++) {
                infile[i] = (byte)(i + 1);
            }
        }
    
        // read 0-2 bytes from input file
    
        static int read(ByteBuffer bb) {
    
            // at end of file?
    
            if (inptr == infile.length) {
                return -1;
            }
    
            // read 0-2 bytes from input 
            // and put into buffer
    
            int n = rn.nextInt(3);
            int cnt = 0;
            while (bb.hasRemaining() && n > 0 &&
            inptr < infile.length) {
                bb.put(infile[inptr++]);
                cnt++;
                n--;
            }
    
            // return the amount read
    
            return cnt;
        }
    
        // write to output file
    
        static int write(ByteBuffer bb) {
    
            // read 0-2 bytes from buffer 
            // and add to output
    
            int n = rn.nextInt(3);
            int cnt = 0;
            while (bb.hasRemaining() && n > 0) {
                outfile[outptr++] = bb.get();
                cnt++;
                n--;
            }
    
            // return amount written
    
            return cnt;
        }
    
        public static void main(String args[]) {
    
            // allocate buffer
    
            ByteBuffer bb = ByteBuffer.allocate(2);
            bb.clear();
    
            // transfer from input to output file,
            // using buffer as intermediary
    
            int cnt = 0;
            while (cnt < FILESIZE) {
    
                // handle case where at end of file
                // but buffer not yet empty
    
                if (
                  read(bb) < 0 && bb.position() == 0) {
                    break;
                }
                bb.flip();
                cnt += write(bb);
    
                // handle case where write only
                // partially successful
    
                bb.compact();
            }
    
            // check results of copy
    
            for (int i = 0; i < outfile.length; i++) {
                if (outfile[i] != i + 1) {
                    System.out.println(
                                     "compare error");
                }
            }
        }
    }

The main method of the BufferDemo3 program allocates a two-byte buffer, clears it, and then uses the buffer to copy from one file (byte array) to another. Clearing the buffer makes it ready for use by setting the limit to the capacity and the position to zero. (A newly allocated buffer is already in this state, so here the call to clear is not strictly necessary.)

Each time it calls the read method, the program transfers zero, one, or two bytes from the input into the buffer. Each write call transfers zero, one, or two bytes from the buffer into the output.

Notice that for each copy iteration, the program reads from the input, flips the buffer, and writes to the output. What does it mean to flip the buffer? Flipping sets the buffer limit to the current position, and then sets the position to zero. What's the benefit of flipping? Consider that as you write into a buffer, its position keeps going up. At some point, when you're done writing, you want to read from that buffer. Flipping captures the position (the amount you've written), and then sets the position to zero. This allows you to read up to the limit.
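Here's the flip step in isolation, as a minimal sketch (writeAndFlip is a hypothetical helper). After three bytes are written into an eight-byte buffer, flip sets the limit to 3 and the position to 0, so reading stops after exactly the bytes that were written:

```java
import java.nio.ByteBuffer;

class FlipDemo {

    // put data into a new buffer, then flip it so the
    // same bytes can be read back

    static ByteBuffer writeAndFlip(byte data[], int capacity) {
        ByteBuffer bb = ByteBuffer.allocate(capacity);
        bb.put(data);   // position = data.length, limit = capacity
        bb.flip();      // limit = data.length, position = 0
        return bb;
    }

    public static void main(String args[]) {
        ByteBuffer bb = writeAndFlip(new byte[] {1, 2, 3}, 8);
        System.out.println("pos=" + bb.position()
            + " lim=" + bb.limit());            // pos=0 lim=3

        // read up to the limit: exactly what was written
        while (bb.hasRemaining()) {
            System.out.println(bb.get());       // 1, 2, 3
        }
    }
}
```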

The read and write methods have to be sensitive to the fact that the buffer they're passed might not be empty. That's why the read and write methods call the hasRemaining method. Also, the loop in the main method has to handle the case where an end of file has been reached, but the buffer is not yet empty. That's why the main method includes the test:

    if (read(bb) < 0 && bb.position() == 0)

If you omit the latter part of this test, the program will work most of the time, but it occasionally does an incorrect copy.

The main loop also has to handle the case where a write is only partially successful. In this case, the buffer still contains data to be written after the write method returns. This situation is handled by the compact method. This method copies the bytes between the buffer's current position and its limit to the beginning of the buffer. It then sets the position just past the last byte copied and the limit to the capacity, so that the next read adds data after the bytes that are still waiting to be written. If you omit the call to the compact method, the copy hangs.
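The effect of compact can be seen in a small sketch that simulates a partial write (partialWriteThenCompact is a hypothetical helper): the buffer holds three bytes, only the first is consumed, and compact moves the remaining two to the front:

```java
import java.nio.ByteBuffer;

class CompactDemo {

    // simulate a partially successful write: the buffer
    // holds 1, 2, 3 but only one byte is consumed

    static ByteBuffer partialWriteThenCompact() {
        ByteBuffer bb = ByteBuffer.allocate(4);
        bb.put(new byte[] {1, 2, 3});
        bb.flip();     // ready to drain: pos=0 lim=3
        bb.get();      // "write" one byte: pos=1
        bb.compact();  // move 2, 3 to indexes 0, 1;
                       // then pos=2 lim=4 (the capacity)
        return bb;
    }

    public static void main(String args[]) {
        ByteBuffer bb = partialWriteThenCompact();
        System.out.println("pos=" + bb.position()
            + " lim=" + bb.limit());                  // pos=2 lim=4
        System.out.println(bb.get(0) + " " + bb.get(1)); // 2 3
    }
}
```

The buffer is left ready for the next read, which appends after the two unwritten bytes. That is the state BufferDemo3's main loop relies on at the top of each iteration.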

Let's look at some additional buffer features. Suppose that you have a byte array containing the bytes 0x39 and 0x30. These bytes represent the quantity 12345 as a little-endian 16-bit value (least significant byte first). How can you extract the value from this byte array? Here's one way:

    import java.nio.*;
    
    public class BufferDemo4 {
    
        // byte value for the short value 12345 
        // (little-endian)
    
        static final byte buf[] = {0x39, 0x30};
    
        public static void main(String args[]) {
    
            // allocate buffer
    
            ByteBuffer bb = ByteBuffer.allocate(2);
    
            // add contents of buf to it
    
            bb.put(buf);
    
            // prepare buffer for reading
    
            bb.flip();
    
            // make read-only, change order to
            // little-endian, and create view
    
            ShortBuffer sb = bb.asReadOnlyBuffer().
                order(ByteOrder.LITTLE_ENDIAN).
                asShortBuffer();
    
            // dump out original two bytes
    
            System.out.println(Integer.toHexString(bb.get(0)));
            System.out.println(Integer.toHexString(bb.get(1)));
    
            // dump out original two bytes as a short
    
            System.out.println(bb.getShort());
    
            // dump out little-endian view of short
    
            System.out.println(sb.get());
        }
    }

The BufferDemo4 program creates a two-byte buffer, and then does a bulk put of the byte array containing the two values.

The program then creates a read-only view of the buffer and changes the byte order of that view from big-endian (the default) to little-endian; the original buffer keeps its big-endian order. Finally, the program creates a short view of the read-only buffer. This process is an example of "invocation chaining": each method returns a buffer, so method invocations can be chained together. The net result is a view of the byte buffer as a sequence of read-only short values, each composed of two bytes in little-endian order.

The original two bytes are dumped out using absolute get methods that do not affect the buffer position. Then the short value is displayed from the original buffer (which defaults to big-endian order). Last, the short value is displayed from the short view buffer, using little-endian order. The result is:

    39
    30
    14640
    12345

The short values are different because two different byte orderings are applied. In other words, the bytes 0x39 and 0x30 have the value 14640 when viewed as a big-endian short, and the value 12345 when viewed as a little-endian short.
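The arithmetic behind these two numbers is 0x39 * 256 + 0x30 = 57 * 256 + 48 = 14640 for big-endian, and 0x30 * 256 + 0x39 = 48 * 256 + 57 = 12345 for little-endian. The following sketch does the same combination by hand with shifts and compares it with ByteBuffer (the helper names are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class OrderDemo {

    // combine two bytes into a 16-bit value by hand;
    // & 0xff avoids sign extension of negative bytes

    static short bigEndian(byte b0, byte b1) {
        return (short)(((b0 & 0xff) << 8) | (b1 & 0xff));
    }

    static short littleEndian(byte b0, byte b1) {
        return (short)(((b1 & 0xff) << 8) | (b0 & 0xff));
    }

    // the same conversion done by a ByteBuffer
    static short viaBuffer(byte buf[], ByteOrder order) {
        return ByteBuffer.wrap(buf).order(order).getShort();
    }

    public static void main(String args[]) {
        byte buf[] = {0x39, 0x30};
        System.out.println(bigEndian(buf[0], buf[1]));    // 14640
        System.out.println(littleEndian(buf[0], buf[1])); // 12345
        System.out.println(
            viaBuffer(buf, ByteOrder.BIG_ENDIAN));        // 14640
        System.out.println(
            viaBuffer(buf, ByteOrder.LITTLE_ENDIAN));     // 12345
    }
}
```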

The buffer classes include two methods, duplicate and slice, that you can use to create new buffers that share an existing buffer's contents. These methods make shallow copies: each new buffer has its own position, limit, and capacity, but the underlying contents are shared, so changes to the contents of one buffer are visible in the other. The duplicate method copies the capacity, limit, position, and mark values of a buffer. The slice method creates a buffer whose content starts at the original buffer's current position; the new buffer's position is zero, and its limit and capacity are the number of elements remaining in the original buffer. Let's look at an example of how duplicate and slice work:

    import java.nio.*;
    
    public class BufferDemo5 {
        public static void main(String args[]) {
    
            // allocate a buffer and add values to it
    
            IntBuffer ib = IntBuffer.allocate(2);
            ib.put(37);
            ib.put(47);
    
            // create duplicate and slice of buffer
    
            ib.position(1);
            IntBuffer dup = ib.duplicate();
            IntBuffer slc = ib.slice();
    
            // display buffer details
    
            System.out.println(dup);
            System.out.println(slc);
    
            // change index 0 of duplicate and then
            // display value of original
    
            dup.put(0, 57);
            System.out.println(ib.get(0));
        }
    }

If you run this program, the result is:

    java.nio.HeapIntBuffer[pos=1 lim=2 cap=2]
    java.nio.HeapIntBuffer[pos=0 lim=1 cap=1]
    57

The first line of output is the duplicate of the original buffer. The duplicate contains the same position, limit, and capacity as the original. The second line of output is the slice of the original. The position was set to 1 when the slice was created. So the slice buffer has a smaller limit and capacity, and contains a view of only the last element of the original buffer.

The last output line illustrates the idea that changes to the duplicate are reflected in the original. When the value at index 0 in the duplicate is changed, the original buffer is changed as well. That's because the original and the duplicate share the same contents.
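A slice shares contents with its original in the same way. In this sketch (writeThroughSlice is a hypothetical helper), index 0 of the slice corresponds to index 1 of the original buffer, so a write through the slice shows up there:

```java
import java.nio.IntBuffer;

class SliceDemo {

    // write through a slice and return the original
    // buffer's element at the corresponding index

    static int writeThroughSlice() {
        IntBuffer ib = IntBuffer.allocate(2);
        ib.put(37).put(47);

        ib.position(1);
        IntBuffer slc = ib.slice();  // slc index 0 == ib index 1

        slc.put(0, 99);              // absolute put on the slice
        return ib.get(1);            // the original sees 99
    }

    public static void main(String args[]) {
        System.out.println(writeThroughSlice());   // 99
    }
}
```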

A final example illustrates the use of MappedByteBuffer, a subclass of ByteBuffer. You obtain a MappedByteBuffer by mapping a region of a file into memory with the FileChannel.map method. A MappedByteBuffer was also used in the May 7, 2002 Tech Tip "File Channels".

Recall that the earlier Tech Tip in this issue, "Using the CharSequence Interface," showed how to implement the CharSequence interface in an array wrapper class. Here's another example of implementing CharSequence, this time with the character sequence in a file.

Suppose you have the following program:

    import java.io.*;
    
    public class BufferDemo6 {
        public static void main(String args[]) 
                                   throws IOException {
    
            // write "testing" as a string to the 
            // output file, each char encoded as 
            // two bytes
    
            FileOutputStream fos = 
                          new FileOutputStream("data");
            DataOutputStream dos = 
                             new DataOutputStream(fos);
            dos.writeChars("testing");
            dos.close();
            fos.close();
        }
    }

The BufferDemo6 program writes the string "testing" to a file named data. The writeChars method of DataOutputStream writes each char as two bytes, high byte first.
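Because a ByteBuffer's default byte order is big-endian, a CharBuffer view decodes these high-byte-first pairs correctly. The following sketch checks this in memory, using a ByteArrayOutputStream in place of the data file (an assumption made so the example doesn't touch the file system; encode and decode are hypothetical helpers):

```java
import java.io.*;
import java.nio.ByteBuffer;

class CharsDemo {

    // encode s the way BufferDemo6 does (two bytes per
    // char, high byte first), but into memory

    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeChars(s);
        dos.close();
        return bos.toByteArray();
    }

    // decode with a CharBuffer view; the wrapped buffer
    // uses the default big-endian order, matching writeChars

    static CharSequence decode(byte bytes[]) {
        return ByteBuffer.wrap(bytes).asCharBuffer();
    }

    public static void main(String args[]) throws IOException {
        byte bytes[] = encode("testing");
        System.out.println(bytes.length);               // 14
        System.out.println(bytes[0] + " " + bytes[1]);  // 0 116
        System.out.println(decode(bytes));              // testing
    }
}
```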

What if you could make the contents of this file "implement" the CharSequence interface, so that the contents could be passed to a method that expects a CharSequence parameter? Is there any way to do this? Here's one approach, which extends the CSDemo1 program from the Tech Tip "Using the CharSequence Interface":

    import java.io.*;
    import java.nio.*;
    import java.nio.channels.*;
    
    public class BufferDemo7 {
    
        // dump out information about a CharSequence
    
        public static void dumpSeq(CharSequence cs) {
            System.out.println(
                            "length = " + cs.length());
            System.out.println(
                       "first char = " + cs.charAt(0));
            System.out.println("string = " + cs);
            System.out.println();
        }

        public static void main(String args[]) 
                                   throws IOException {
    
            // String
    
            String s = "test";
            dumpSeq(s);
    
            // StringBuffer
    
            StringBuffer sb = new StringBuffer("ing");
            dumpSeq(sb);
    
            // CharBuffer
    
            CharBuffer cb = CharBuffer.allocate(3);
            cb.put("123");
            cb.rewind();
            dumpSeq(cb);
    
            // mapped file
        
            FileInputStream fis = 
                           new FileInputStream("data");
            FileChannel fc = fis.getChannel();
            MappedByteBuffer mbb =
                fc.map(FileChannel.MapMode.READ_ONLY, 
                                         0, fc.size());
            cb = mbb.asCharBuffer();
            dumpSeq(cb);
            fc.close();
            fis.close();
        }
    }

The first part of the BufferDemo7 program is the same as the CSDemo1 program. The difference is the last part of BufferDemo7. In that part, the program opens the data file written by BufferDemo6, gets a channel, and then maps the file contents to a MappedByteBuffer. It then creates a CharBuffer view of the MappedByteBuffer.

Because CharBuffer implements the CharSequence interface, the CharBuffer object can be passed to the dumpSeq method, and is treated just like a String or StringBuffer. The CharBuffer view in this case represents a region of a file, so accessing characters through the CharSequence interface means that characters are read from the mapped file. The view decodes each pair of bytes using the buffer's default big-endian byte order, which matches the high-byte-first order used by writeChars.

If you run the BufferDemo7 program, the output looks like this:

    length = 4
    first char = t
    string = test

    length = 3
    first char = i
    string = ing

    length = 3
    first char = 1
    string = 123

    length = 7
    first char = t
    string = testing

For more information about programming with buffers, see "New I/O Functionality for Java 2 Standard Edition 1.4."

* As used in this document, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.

IMPORTANT: Please read our Terms of Use, Privacy, and Licensing policies:
http://www.sun.com/share/text/termsofuse.html
http://www.sun.com/privacy/
http://developer.java.sun.com/berkeley_license.html

Comments? Send your feedback on the Java(TM) Developer Technical Tips to: jdc-webmaster@sun.com

ARCHIVES: You'll find the Java Developer Connection Technical Tips archives at:
http://developer.java.sun.com/developer/JDCTechTips/

Sun, Sun Microsystems, Java, Java Developer Connection, J2ME, JAIN, and PersonalJava are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
