Core Java Technologies Technical Tips

Return-Path: Received: from pacific-carrier-annex.mit.edu by po10.mit.edu (8.9.2/4.7) id NAA17130; Tue, 5 Nov 2002 13:56:53 -0500 (EST) Received: from hermes.sun.com (hermes.sun.com [64.124.140.169]) by pacific-carrier-annex.mit.edu (8.9.2/8.9.2) with SMTP id NAA02443 for ; Tue, 5 Nov 2002 13:56:52 -0500 (EST) Date: 5 Nov 2002 08:51:03 -0800 From: "JDC Tech Tips" To: alexp@mit.edu Message-Id: <248348962122823785@hermes.sun.com> Subject: Core Java Technologies Tech Tips, Nov. 5, 2002 (Sets, Expression Evaluation Order) Mime-Version: 1.0 Content-Type: text/html; charset=us-ascii Content-Transfer-Encoding: 7bit X-Mailer: SunMail 1.0 Core Java Technologies Technical Tips

View this issue as simple text November 5, 2002

In this Issue

Welcome to the Core Java^TM Technologies Tech Tips, November 5, 2002. Here you'll get tips on using core Java technologies and APIs, such as those in Java 2 Platform, Standard Edition (J2SE^TM).

This issue covers:

Using HashSet, LinkedHashSet, and TreeSet

Understanding Expression Evaluation Order

These tips were developed using Java 2 SDK, Standard Edition, v 1.4.

This issue of the Core Java Technologies Tech Tips is written by Glen McCluskey.

See the Subscribe/Unsubscribe note at the end of this newsletter to subscribe to Tech Tips that focus on technologies and products in other Java platforms.

USING HASHSET, LINKEDHASHSET, AND TREESET

A set is a collection of elements without any duplicates. The Set and SortedSet interfaces describe the properties of sets, and the HashSet, LinkedHashSet, and TreeSet classes implement these interfaces. This tip examines these classes, and illustrates where each might be appropriate for use in your Java applications.

Let's start with a simple example:

    import java.util.*;
    
    public class SetDemo1 {
        static final int MIN = 1;
        static final int MAX = 10;
    
        public static void main(String args[]) {
            Set sethash = new HashSet();
            for (int i = MAX; i >= MIN; i--) {
                sethash.add(new Integer(i*i));
            }
            System.out.println("HashSet = " + sethash);
    
            Set setlink = new LinkedHashSet();
            for (int i = MAX; i >= MIN; i--) {
                setlink.add(new Integer(i*i));
            }
            System.out.println(
                         "LinkedHashSet = " + setlink);
    
            Set settree = new TreeSet();
            for (int i = MAX; i >= MIN; i--) {
                settree.add(new Integer(i*i));
            }
            System.out.println("TreeSet = " + settree);
        }
    }

This SetDemo1 program creates an instance of each kind of set, and adds the squares of the values from 10 to 1 to the set. It wraps each integer value in an Integer object.

When you run the program, the result is:

    HashSet = [9, 25, 4, 36, 100, 1, 49, 81, 16, 64]
    LinkedHashSet = [100, 81, 64, 49, 36, 25, 16, 9, 4, 1]
    TreeSet = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

The results illustrate that elements in a HashSet are retrieved (iteration ordering) in apparent random order. The elements of a LinkedHashSet are retrieved in the order that they are inserted into the set. The elements of a TreeSet are retrieved in ascending sorted order.

HashSet is a good choice for representing sets if you don't care about element ordering. But if ordering is important, then LinkedHashSet or TreeSet are better choices. However, LinkedHashSet or TreeSet come with an additional speed and space cost.

Let's examine that cost by running another program:

    import java.util.*;
    
    public class SetDemo2 {
        static final int N = 500000;
    
        static List numlist = new ArrayList(N);
    
        // fill up List with random Integer values
    
        static {
            Random rn = new Random();
            for (int i = 0; i < N; i++) {
                numlist.add(new Integer(
                                rn.nextInt()));
            }
        }
    
        static void dotime(Set set, String which) {
    
            long start = System.currentTimeMillis();
    
            // add all List values to Set
    
            set.addAll(numlist);
    
            // look up all List values in Set
    
            for (int i = 0; i > N; i++) {
                if (!set.contains(numlist.get(i))) {
                    System.out.println(
                        "contains error");
                }
            }
    
            long elapsed = 
                    System.currentTimeMillis() - start;
    
            System.out.println(which + " " + elapsed);
        }
    
        public static void main(String args[]) {
            if (args.length != 1) {
                System.out.println(
                    "Usage: java SetDemo2 " +
                    "HashSet|LinkedHashSet|TreeSet");
                System.exit(1);
            }
    
            // do timing of HashSet / 
            // LinkedHashSet / TreeSet
    
            if (args[0].equals("HashSet")) {
                dotime(new HashSet(), "HashSet");
            }
            else if (args[0].equals("LinkedHashSet")) {
                dotime(
                   new LinkedHashSet(), "LinkedHashSet");
            }
            else {
                dotime(new TreeSet(), "TreeSet");
            }
        }
    }

The SetDemo2 program creates a set, adds a list of Integer values to it, and then looks up each value in the set.

To compare the performance of the three kinds of sets, you need to run the SetDemo2 program three times, like this:

    java SetDemo2 HashSet

    java SetDemo2 LinkedHashSet

    java SetDemo2 TreeSet

It's done this way so that each type of set will start with a clean slate for timing purposes.

The output of the three runs will be something like this:

    HashSet 2688

    LinkedHashSet 4078

    TreeSet 6469

Your results might differ, perhaps significantly, depending on your operating environment. But the results should indicate that TreeSet takes more than twice as long as HashSet.

A HashSet is implemented with a backing HashMap structure. In other words, a HashSet uses a hash table as the underlying data structure to represent a set. A LinkedHashSet is similar, but it also superimposes a doubly-linked list on the elements to keep track of the insertion order. A TreeSet is based on a TreeMap, which, in turn, is based on red-black trees. These trees are something like binary trees, with a collection of nodes containing left and right subtrees, and so on. However the trees have additional functionality to keep them balanced, that is, to keep their shape from becoming skewed and thereby degrading tree lookup performance.

LinkedHashSet and TreeSet cost more than HashSet, but do more. A LinkedHashSet is useful in making set copies, such that the original set's iteration ordering is preserved:

    void f(Set s) {
        Set copy = new LinkedHashSet(s);
    }

This technique works regardless of how the original set is implemented.

Note too that the cost of TreeSet relative to HashSet and LinkedHashSet varies with the size of the set. Tree-based data structures get slower as the number of elements get larger. The cost to look up an element in a hash-based set or map is independent of the set's size, assuming a reasonable hash function. By comparison, the cost to look up an element in a tree-based set or map is proportional to the log of the number of elements in the collection.

There is one exception to the rule that LinkedHashSets are slower than HashSets: iteration over a LinkedHashSet is generally faster than iteration over a HashSet.

TreeSet is useful if you want to retrieve the set elements in sorted order. A LinkedHashSet does this only if the elements were inserted in sorted order.

You can also use TreeSet to compute a subset of a set, with the subset backed by the original set. Let's look at how this works:

    import java.util.*;
    
    public class SetDemo3 {
    
        static final int MIN = -5;
        static final int MAX = 5;
    
        public static void main(String args[]) {
    
            // create a SortedSet with 
            // odd MIN-MAX Integer values
    
            SortedSet set = new TreeSet();
            for (int i = MIN; i <= MAX; i += 2) {
                set.add(new Integer(i));
            }
            System.out.println("set = " + set);
    
            // create a subset from 0 (inclusive) 
            // to MAX (exclusive)
    
            SortedSet subset =
                set.subSet(
                    new Integer(0), new Integer(MAX));
            System.out.println("subset = " + subset);
    
            // add a new element to the subset
    
            subset.add(new Integer(0));
            System.out.println(
                "set after add = " + set);
        }
    }

The SetDemo3 program creates a set [-5, -3, -1, 1, 3, 5]. Then it creates a subset. The subset starts at 0 (inclusive) and goes up to 5 (exclusive), so that it contains [1, 3]. The program then adds value 0 to the subset. Finally the program prints the original set. The printed original set contains the 0 value, which demonstrates that the subset is backed by the original set.

When you run the SetDemo3 program, you should see the following result:

    set = [-5, -3, -1, 1, 3, 5]
    subset = [1, 3]
    set after add = [-5, -3, -1, 0, 1, 3, 5]

The key to creating a subset in this way is the ability to specify a range of element values.

HashSet and LinkedHashSet do not represent their elements in sorted order. So it would be difficult to use them in a way that TreeSet was used in the SetDemo3 example.

Because TreeSet keeps its elements sorted, it can offer other features such as the first and last methods, that are used to retrieve the lowest and highest elements in a set, respectively.

For more information about HashSet, LinkedHashSet, and TreeSet, see section 16.5, Set and SortedSet, in The Java(tm) Programming Language Third Edition by Arnold, Gosling, and Holmes. Also, see the tutorial Introduction to the Collections Framework.

UNDERSTANDING EXPRESSION EVALUATION ORDER

Suppose you are reading some Java code, and you come across an example like this:

    public class EvalDemo1 {
        public static void main(String args[]) {
            int x = 10;
            x = x + (x = 5);
            System.out.println(x);
        }
    }

What value is printed when this program is run? There are two plausible answers. For the expression:

    x + (x = 5)

the Java compiler might evaluate the left-hand operand (10), and then the right-hand operand (5), before adding them together. If it did that, the result would be the value 15. Or the compiler might evaluate the "(x = 5)" operand first -- this gives a value of 5. When the compiler evaluates the left-hand operand, it picks up the value of 5 for x. The result in this case will be 10.

So which of these two possible results is the actual result in the Java programming language? The answer is 15. What drives this result is the fact that the Java language has a rule that the left-hand operand of a binary operator is evaluated before the right-hand one. Interestingly, there is no equivalent rule in the C language. For example, a C program that corresponds to the EvalDemo1 program:

    #include <stdio.h>
    
    int main() {
        int x = 10;
        x = x + (x = 5);
        printf("%d\n", x);
    }

might produce different results. For example, running the C program with a number of popular C compilers, produces the output 10 rather than 15.

Here's another example that looks at the order in which method arguments are evaluated:

    public class EvalDemo2 {
        static void f(int a, int b) {
            System.out.println(a + " " + b);
        }
    
        public static void main(String args[]) {
            int x = 10;
            f(x, x = 5);
        }
    }

Method arguments are evaluated left-to-right, so when you run this program the output is:

    10 5

Let's look at a more complicated example:

    public class EvalDemo3 {
        static int f1() {
            System.out.println("called f1");
            return 37;
        }
    
        static int f2() throws Exception {
            System.out.println("called f2");
            throw new Exception();
        }
    
        public static void main(String args[]) 
            throws Exception {
                int x = f1() / (0 * f2());
        }
    }

The EvalDemo3 program calls the f1 method to get the value of the left-hand operand for the / operator. Then it evaluates the right-hand operand before it does the division. How is the right-hand operand evaluated?

It's important to understand that the f2 method is called. You might wonder why. In fact, you might assume that an expression "0 * f2()" will always be equal to zero -- that is, no matter what f2 returns. So why the need to call f2? The answer is that the assumption overlooks possible side effects when f2 is called.

In a similar way, the compiler cannot assume that an expression:

    f1() / (0 * f2())

reduces to:

    f1() / 0

and arrange for an ArithmeticException to be thrown, as is normally done for integer division by zero.

The f2 method is called, and ArithmeticException is not thrown. The result of running the EvalDemo3 program is:

    called f1
    called f2
    Exception in thread "main" java.lang.Exception
        at EvalDemo3.f2(EvalDemo3.java:9)
        at EvalDemo3.main(EvalDemo3.java:14)

This result indicates that the f2 method call is not optimized away, that is, the call is actually made. Then another exception (one different than ArithmeticException) is thrown before evaluation ever reaches the division operation.

In this example, both operands of the / operator are evaluated before the division is done. Because evaluation of the right-hand operand triggers an exception, the division never takes place.

But you should understand that there are three operators (&&, ||, ?:) where operands are not always evaluated first. Let's look at an example:

    public class EvalDemo4 {
        static int f1() {
            System.out.println("called f1");
            return 37;
        }
    
        static int f2() {
            System.out.println("called f2");
            return 47;
        }
    
        public static void main(String args[]) {
            boolean b = true;
            int x = b ? f1() : f2();
            System.out.println(x);
        }
    }

When you run the EvalDemo4 program, the result is:

    called f1
    37

Here the f2 method is never called. When the expression:

    b ? f1() : f2()

is evaluated, b has a true value, f1 is called, and f2 is not evaluated. In a similar way, if you have some code like this:

    boolean b = false;

    if (b && f())
        ...

then f will not be called. That's because b is false. The right-hand operand of && is not evaluated if the left-hand one is false -- that's why the && operator (as well as the || operator) is called a "short circuit" operator.

Let's look at a final example. Suppose you've got some code that does floating-point arithmetic, and at one point in the code, there is usage like this:

    4.0 * n * 0.5

and at another point, there is some code like this:

    2.0 * n

Are these two expressions the same, given that 4.0 * 0.5 is equivalent to 2.0?

Not necessarily. Here's a demonstration:

    public strictfp class EvalDemo5 {
        public static void main(String args[]) {
            double d = 8e+307;
            System.out.println(2.0 * d);
            System.out.println(4.0 * d * 0.5);
        }
    }

The output of this demo is:

    1.6E308
    Infinity

In the EvalDemo5 code, multiplying 8e+307 by 4.0 causes an overflow. Multiplying the value by 0.5 comes too late to do any good.

People tend to think of multiplication as associative, but this assumption does not necessarily hold when doing floating-point calculations. You need to be careful when applying what appear to be commonsense algebraic identities and similar rules to Java floating-point arithmetic.

For example, the standard Double.isNaN method is implemented as:

    static public boolean isNaN(double v) {
        return (v != v);
    }

In other words, a value is NaN (not a number) if it is not equal to itself. You might think that it would be impossible for a value not to be equal to itself, but this is so for NaNs.

For more information about expression evaluation order, see section 6.8.1, Order of Evaluation, in The Java(tm) Programming Language Third Edition by Arnold, Gosling, and Holmes. Also see section 15.7, Evaluation Order, in The Java(tm) Language Specification Second Edition by Gosling, Joy, Steele, and Bracha.

IMPORTANT: Please read our Terms of Use, Privacy, and Licensing policies:
http://www.sun.com/share/text/termsofuse.html
http://www.sun.com/privacy/
http://developer.java.sun.com/berkeley_license.html

Comments? Send your feedback on the Core Java^TM Technologies Tech Tips to: jdc-webmaster@sun.com

Subscribe to other Java developer Tech Tips:

- Enterprise Java Technologies Tech Tips. Get tips on using enterprise Java technologies and APIs, such as those in the Java 2 Platform, Enterprise Edition (J2EE^TM).
- Wireless Developer Tech Tips. Get tips on using wireless Java technologies and APIs, such as those in the Java 2 Platform, Micro Edition (J2ME^TM).

To subscribe to these and other JDC publications:
- Go to the JDC Newsletters and Publications page, choose the newsletters you want to subscribe to and click "Update".
- To unsubscribe, go to the subscriptions page, uncheck the appropriate checkbox, and click "Update".

ARCHIVES: You'll find the Core Java Technologies Tech Tips archives at:
http://java.sun.com/jdc/TechTips/index.html

Copyright 2002 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.

This document is protected by copyright. For more information, see:
http://java.sun.com/jdc/copyright.html

Core Java Technologies Tech Tips
November 5, 2002

Sun, Sun Microsystems, Java, Java Developer Connection, J2SE, J2EE, and J2ME are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

Please send me newsletters in text.
Please unsubscribe me from this newsletter.