Date: Fri, 24 Sep 1999 01:42:45 GMT
From: JDCTechTips@sun.com
Subject: JDC Tech Tips  Sept. 23, 1999


 J  D  C    T  E  C  H    T  I  P  S

                      TIPS, TECHNIQUES, AND SAMPLE CODE


WELCOME to the Java Developer Connection(sm) (JDC) Tech Tips, 
September 23, 1999. This issue covers:

         * Extracting Links from an HTML File
         * Sorting Arrays

This issue of the JDC Tech Tips is written by Patrick Chan,
the author of the publication "The Java(tm) Developers Almanac"
(http://www.amazon.com/exec/obidos/ASIN/0201432986/xeo).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
EXTRACTING LINKS FROM AN HTML FILE

There are many applications that fetch an HTML page from the Web 
and then extract the links from the page.  For example, a
link-checker application fetches a page, extracts the links, and 
then checks the links to see if they refer to actual pages.  

The HTML 3.2 support in the Java(tm) 2 platform makes it fairly easy 
to find and parse links. This tip demonstrates how to use that 
support.

The first step is to create an editor kit.  The purpose of an editor
kit is to parse data in some format, such as HTML or RTF, and store
the information in a data structure that fully represents the data.
This data structure, called a Document, allows you to examine and 
modify the data in a convenient way.

Let's look at an example. In the following example program, we're 
going to examine the HTML data in a Document object. The program 
looks for A (anchor) tags and extracts the HREF attribute information 
from these tags.


import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

class GetLinks {
    public static void main(String[] args) {
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();

        // The Document class does not yet handle charsets properly.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            // Create a reader on the HTML content.
            Reader rd = getReader(args[0]);

            // Parse the HTML.
            kit.read(rd, doc, 0);

            // Iterate through the elements of the HTML document.
            ElementIterator it = new ElementIterator(doc);
            javax.swing.text.Element elem;
            while ((elem = it.next()) != null) {
                SimpleAttributeSet s = (SimpleAttributeSet)
                    elem.getAttributes().getAttribute(HTML.Tag.A);
                if (s != null) {
                    System.out.println(s.getAttribute(HTML.Attribute.HREF));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }

    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri) throws IOException {
        if (uri.startsWith("http:")) {
            // Retrieve from Internet.
            URLConnection conn = new URL(uri).openConnection();
            return new InputStreamReader(conn.getInputStream());
        } else {
            // Retrieve from file.
            return new FileReader(uri);
        }
    }
}


This program takes one parameter from the command line.  If the
parameter starts with "http:", the program treats the parameter as 
a URL and fetches the HTML from that URL.  Otherwise, the parameter 
is treated as a filename and the HTML is fetched from that file.

For example, 

    $ java GetLinks http://java.sun.com

retrieves the HTML from the main page at java.sun.com.

The editor kit is an HTMLEditorKit object that contains an HTML
parser. It creates a Document object that can represent HTML. And 
it's the editor kit's read() method that parses the HTML and stores 
the information in the Document.

Once the HTML data is saved in the Document object, we're ready to
look for links. This is done by creating an iterator (using
ElementIterator) that visits every element in the document. The
elements at the lowest level represent the individual pieces of text
in the HTML. For each element, we check to see if it has been
formatted for linking, in other words, whether the text is formatted
with the A (anchor) tag. We do this by calling
getAttributes().getAttribute(HTML.Tag.A). If the text piece has been
formatted with the A tag, the method call returns the set of
attributes of the A tag used to format that text piece. Otherwise
the method call simply returns null.

Note: The name getAttributes() is a little confusing because it has
nothing to do with HTML attributes; the "attributes" in this case 
are all the HTML tags (such as an A tag) that were used to format 
that text piece.

Now we have the set of attributes of the A tag used to format 
a piece of text; it's stored in a SimpleAttributeSet object. So we 
just need to get the value of the HREF attribute and we're done.  
We can do this by calling getAttribute(HTML.Attribute.HREF) on the 
A tag's attribute set.
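
The steps described above can be condensed into a small helper. The
following is only a sketch: the class name LinkSketch and the method
extractLinks() are hypothetical, and it parses HTML held in a string
(through a StringReader) instead of fetching a page, so it's easy to
try out. It also assumes a Java release with generics (5.0 or later),
which is newer than the release this tip was written for.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

class LinkSketch {
    // Hypothetical helper: parses the HTML in 'html' and returns the HREF
    // value of every element that is formatted with an A tag.
    static List<String> extractLinks(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            // Parse the HTML into the Document.
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }

        List<String> links = new ArrayList<String>();
        ElementIterator it = new ElementIterator(doc);
        Element elem;
        while ((elem = it.next()) != null) {
            // The A tag, if present, is stored as an attribute whose
            // value is the set of the tag's own HTML attributes.
            Object a = elem.getAttributes().getAttribute(HTML.Tag.A);
            if (a instanceof AttributeSet) {
                Object href = ((AttributeSet) a).getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    links.add(href.toString());
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><p><a href=\"http://java.sun.com/\">Java</a>"
            + " and <a href=\"docs/index.html\">docs</a></p></body></html>";
        List<String> links = extractLinks(html);
        for (int i = 0; i < links.size(); i++) {
            System.out.println(links.get(i));
        }
    }
}
```

Note that if the text of a single link is split into several pieces,
for example because part of it is bold, the same HREF value can show 
up once for each piece. A link checker would probably want to drop 
such duplicates.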


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
SORTING ARRAYS

This tip discusses how you can sort data in arrays.  Sorting arrays 
of primitive types is easy. The class java.util.Arrays provides a
sort() method for each of the seven primitive types that can be
ordered: byte, char, double, float, int, long, and short. (There is
no sort() method for boolean arrays.) Here's an example that sorts
an array of doubles.

import java.util.*;
 
class Sort1 {
    // Sorts an array of random double values.
    public static void main(String[] args) {
        double[] dblarr = new double[10];
        for (int i=0; i<dblarr.length; i++) {
            dblarr[i] = Math.random();
        }
        
        // Sort the array.
        Arrays.sort(dblarr);
        // Print the array.
        for (int i=0; i<dblarr.length; i++) {
            System.out.println(dblarr[i]);
        }
    }
}
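
The other primitive types are sorted the same way, and each sort()
method also has a variant that sorts only part of an array. Here's a
short sketch; the class name SortRangeSketch and the helper
sortRange() are made up for illustration, and Arrays.toString()
assumes Java 5.0 or later.

```java
import java.util.Arrays;

class SortRangeSketch {
    // Sorts only the elements from index 'from' (inclusive) to index
    // 'to' (exclusive) and returns the same array for convenience.
    static long[] sortRange(long[] values, int from, int to) {
        Arrays.sort(values, from, to);
        return values;
    }

    public static void main(String[] args) {
        int[] ints = { 42, 7, 19, 3, 25 };
        Arrays.sort(ints);
        System.out.println(Arrays.toString(ints));   // [3, 7, 19, 25, 42]

        char[] chars = { 'd', 'a', 'c', 'b' };
        Arrays.sort(chars);
        System.out.println(new String(chars));       // abcd

        // Only the middle three elements are put in order.
        long[] longs = sortRange(new long[] { 9, 8, 7, 6, 5 }, 1, 4);
        System.out.println(Arrays.toString(longs));  // [9, 6, 7, 8, 5]
    }
}
```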


Sorting an array of objects is just as easy if the objects implement
the Comparable interface, java.lang.Comparable. This interface gives 
a natural ordering for a class so that objects of that class can be 
sorted. Here's an example that sorts an array of String objects;
the String class implements Comparable.

import java.util.*;

class Sort2 {
    // Sorts the arguments in args.
    public static void main(String[] args) {
        Arrays.sort(args);
        // Print the arguments in args.
        for (int i=0; i<args.length; i++) {
            System.out.println(args[i]);
        }
    }
}


What if the objects do not implement Comparable? Well, you've got
two choices: you can modify the objects to implement Comparable, or
you can supply a Comparator to the sort method. Let's look at the 
first option first.

To make an object comparable you need to add Comparable to the
object's implements list.  You then need to modify the object to
implement the compareTo() method.  The compareTo() method compares 
the object with another object of the same type. If the object should 
appear before the other object, compareTo() should return a negative
number. If the object should appear after the other object,
compareTo() should return a positive number. Zero should be
returned if the two objects are equal in sort order.
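
To see the convention in action, you can call compareTo() on classes
that already implement Comparable, such as String and Integer. Only
the sign of the result matters, not its size. The sketch below uses
a hypothetical helper, describe(), and assumes Java 5.0 or later for
the generic method signature.

```java
class CompareToSketch {
    // Translates the sign of a compareTo() result into words.
    static <T extends Comparable<T>> String describe(T a, T b) {
        int result = a.compareTo(b);
        if (result < 0) {
            return a + " sorts before " + b;
        } else if (result > 0) {
            return a + " sorts after " + b;
        }
        return a + " and " + b + " are equal in sort order";
    }

    public static void main(String[] args) {
        // prints "apple sorts before banana"
        System.out.println(describe("apple", "banana"));
        // prints "7 sorts after 3"
        System.out.println(describe(Integer.valueOf(7), Integer.valueOf(3)));
        // prints "java and java are equal in sort order"
        System.out.println(describe("java", "java"));
    }
}
```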

Point is an AWT class that is not comparable. The following example 
creates a version of Point that is comparable. It sorts points by 
distance from the origin.

import java.util.*;
import java.awt.*;

class MyPoint extends java.awt.Point implements Comparable {
    MyPoint(int x, int y) {
        super(x, y);
    }
    public int compareTo(Object o) {
        MyPoint p = (MyPoint)o;
        double d1 = Math.sqrt(x*x + y*y);
        double d2 = Math.sqrt(p.x*p.x + p.y*p.y);
        if (d1 < d2) {
            return -1;
        } else if (d2 < d1) {
            return 1;
        }
        return 0;
    }
}

class Sort3 {
    public static void main(String[] args) {
        Random rnd = new Random();
        MyPoint[] points = new MyPoint[10];
        for (int i=0; i<points.length; i++) {
            points[i] = new MyPoint(rnd.nextInt(100), rnd.nextInt(100));
        }
        Arrays.sort(points);
        // Print the points.
        for (int i=0; i<points.length; i++) {
            System.out.println(points[i]);
        }
    }
}
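
Because only the order of the distances matters, the square roots
aren't strictly needed: comparing the squared distances gives the
same ordering. The following variation is a sketch under that
assumption. The names SquaredPoint and distanceSquared() are
hypothetical, and the typed Comparable<SquaredPoint> assumes Java 5.0
or later.

```java
import java.awt.Point;
import java.util.Arrays;

class SquaredPoint extends Point implements Comparable<SquaredPoint> {
    SquaredPoint(int x, int y) {
        super(x, y);
    }

    // Squared distance from the origin, computed as a long so that
    // large coordinates don't overflow an int.
    long distanceSquared() {
        return (long) x * x + (long) y * y;
    }

    public int compareTo(SquaredPoint other) {
        long d1 = distanceSquared();
        long d2 = other.distanceSquared();
        if (d1 < d2) {
            return -1;
        } else if (d2 < d1) {
            return 1;
        }
        return 0;
    }
}

class SortSquaredSketch {
    public static void main(String[] args) {
        SquaredPoint[] points = {
            new SquaredPoint(3, 4), new SquaredPoint(1, 1), new SquaredPoint(0, 2)
        };
        Arrays.sort(points);
        // Prints 1,1 then 0,2 then 3,4 (squared distances 2, 4, and 25).
        for (int i = 0; i < points.length; i++) {
            System.out.println(points[i].x + "," + points[i].y);
        }
    }
}
```

One thing to keep in mind with either version: two different points
at the same distance from the origin, such as (3,4) and (4,3),
compare as zero even though Point's equals() method treats them as
different.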


If you can't or don't want to make an object Comparable, you can
supply a Comparator object to the Arrays.sort() method.  The
Comparator interface declares two methods: compare() and equals().
The behavior of the compare() method is almost identical to that of
the compareTo() method of the Comparable interface, except that both
objects to be compared are passed as arguments. The equals() method
takes a single argument and indicates whether some other object is
a comparator that imposes the same ordering as this one. Because
every class inherits equals() from Object, you only have to
implement compare().

The next example is similar to the one above.  However, instead of
creating a special kind of Point, we create a comparator that can 
sort Point objects.


import java.util.*;
import java.awt.*;

class PointComparator implements Comparator {
    // Orders points by their distance from the origin.
    public int compare(Object o1, Object o2) {
        Point p1 = (Point)o1;
        Point p2 = (Point)o2;
        double d1 = Math.sqrt(p1.x*p1.x + p1.y*p1.y);
        double d2 = Math.sqrt(p2.x*p2.x + p2.y*p2.y);
        if (d1 < d2) {
            return -1;
        } else if (d2 < d1) {
            return 1;
        }
        return 0;
    }
}
class Sort4 {
    public static void main(String[] args) {
        Random rnd = new Random();
        Point[] points = new Point[10];
        for (int i=0; i<points.length; i++) {
            points[i] = new Point(rnd.nextInt(100), rnd.nextInt(100));
        }
        Arrays.sort(points, new PointComparator());
        // Print the points.
        for (int i=0; i<points.length; i++) {
            System.out.println(points[i]);
        }
    }
}
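
A comparator is also handy when a class already has a natural
ordering but you want a different one. The sketch below sorts the
same strings three ways: with a made-up comparator (BY_LENGTH), with
String.CASE_INSENSITIVE_ORDER, and with Collections.reverseOrder().
The class name ComparatorSketch and the helper sorted() are
hypothetical, and the typed Comparator<String> assumes Java 5.0 or
later.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

class ComparatorSketch {
    // Orders strings by length; strings of equal length fall back
    // to their natural ordering.
    static final Comparator<String> BY_LENGTH = new Comparator<String>() {
        public int compare(String s1, String s2) {
            if (s1.length() != s2.length()) {
                return s1.length() < s2.length() ? -1 : 1;
            }
            return s1.compareTo(s2);
        }
    };

    // Returns a sorted copy, leaving the caller's array untouched.
    static String[] sorted(String[] words, Comparator<? super String> order) {
        String[] copy = words.clone();
        Arrays.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = { "pear", "apple", "Fig", "banana" };

        // [Fig, pear, apple, banana]
        System.out.println(Arrays.toString(sorted(words, BY_LENGTH)));
        // [apple, banana, Fig, pear]
        System.out.println(Arrays.toString(
            sorted(words, String.CASE_INSENSITIVE_ORDER)));
        // [pear, banana, apple, Fig]
        System.out.println(Arrays.toString(
            sorted(words, Collections.reverseOrder())));
    }
}
```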

.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .


- COPYRIGHT
Copyright 1999 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.

This document is protected by copyright.  For more information, see:

http://developer.java.sun.com/developer/copyright.html

