Date: Wed, 17 Feb 1999 13:05:23 GMT
From: JDCTechTips@sun.com
Subject: JDC Tech Tips Vol. 2 No. 7

J D C   T E C H   T I P S

TIPS, TECHNIQUES, AND SAMPLE CODE

WELCOME to the Java Developer Connection(sm) Tech Tips, Vol. 2 No. 7.
This issue covers:

    * Converting Pathnames to URLs
    * Using Vector in the Collection Framework
    * Reading/Writing Unicode Using I/O Stream Encodings

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

CONVERTING PATHNAMES TO URLS

A new feature of the Java(tm) 2 Platform is the File.toURL method,
which converts a pathname specification to a URL (Uniform Resource
Locator, the address format used on the Web). A simple example that
illustrates this method is:

    import java.io.*;
    import java.net.*;

    public class url {
        public static void main(String args[]) {
            // Exactly one argument is expected: the pathname.
            if (args.length != 1) {
                System.err.println("missing filename");
                System.exit(1);
            }

            File f = new File(args[0]);
            try {
                // Convert the pathname to a file: URL and print it.
                URL u = f.toURL();
                System.out.println(u);
            }
            catch (MalformedURLException e) {
                System.err.println(e);
            }
        }
    }

For input of:

    $ java url paper.txt

with t:\tmp as the current directory, the output is:

    file:/T:/tmp/paper.txt

This URL can be given to a Netscape or Microsoft web browser to view
the local file. Such a method is useful in applications that have to
treat local pathnames and web-based resources in a uniform way.

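In later releases of the platform, File.toURL is deprecated because it
does not escape characters that are illegal in URLs (a space, for
instance); the documented replacement is to go through File.toURI
first. A minimal sketch of that route follows. It assumes a newer JDK
than the one this tip was written for, and the class name UrlSketch,
the helper toFileUrl, and the default file name are made up for
illustration:

```java
import java.io.File;
import java.net.MalformedURLException;

public class UrlSketch {
    // Hypothetical helper: converts a pathname to a file: URL string.
    // File.toURI resolves the path against the current directory and
    // escapes characters such as spaces before the URL is built.
    static String toFileUrl(String pathname) {
        try {
            return new File(pathname).toURI().toURL().toString();
        } catch (MalformedURLException e) {
            // Not expected for file: URIs; rethrown unchecked for brevity.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Falls back to a sample name containing a space.
        String name = args.length > 0 ? args[0] : "my paper.txt";
        System.out.println(toFileUrl(name));
    }
}
```

The leading part of the result depends on the current directory, so
only the escaped file name at the end (my%20paper.txt for the sample
name) is predictable.
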
USING VECTOR IN THE COLLECTION FRAMEWORK

Collections are a new feature of the Java 2 Platform, and are
described in detail in various articles available on the Java
Developer Connection. Collections are used to organize and operate on
groups of data elements. For example, ArrayList is a replacement for
Vector, and HashMap is similar to Hashtable.

The old classes such as Vector are still available, but the new ones
are preferred. So an obvious question is how to convert between old
and new. You might, say, have a Vector object in an application and
want to call a method that takes an ArrayList argument. One way of
doing such a conversion is illustrated by the following example:

    import java.util.*;

    public class convert {
        // Prints each element of the list on its own line.
        public static void process(ArrayList al) {
            for (int i = 0; i < al.size(); i++)
                System.out.println(al.get(i));
        }

        public static void main(String args[]) {
            // Build a Vector holding elements of mixed types.
            Vector vec = new Vector();
            vec.addElement("123");
            vec.addElement(new Integer(456));
            vec.addElement(new Double(789));

            // Copy the Vector's elements into a new ArrayList.
            process(new ArrayList(vec));
        }
    }

A Vector is created, and several elements are added to it. Then the
process method is called with an ArrayList that is constructed from
the Vector. More precisely, ArrayList has a constructor that takes a
"Collection" interface argument, and Vector has been retrofitted to
implement the Collection interface, so an ArrayList can be created
from a Vector via this constructor.

There are a number of other conversion mechanisms available in the
collection framework for hooking together old and new code.

READING/WRITING UNICODE USING I/O STREAM ENCODINGS

The Java programming language uses two-byte Unicode characters, while
one-byte characters are common in other languages such as C (which
typically uses ASCII).
An obvious question that comes up is therefore: how are Java
characters stored in disk files, and how can the Java language make
use of the huge quantity of existing data that is encoded in ASCII?

When early versions of the JDK(tm) software, such as 1.0.2, first
became available, this problem hadn't been solved. For example,
DataInputStream.readLine is a method for reading lines of input, but
it fails to properly convert bytes to characters, and is now
deprecated. You won't necessarily notice this failure until you start
to use more of the Unicode character set.

This problem has been solved by means of the Reader and Writer I/O
classes. These sit on top of a byte stream (such as FileInputStream)
and apply an encoding, converting bytes to characters on input or
characters to bytes on output. There's an encoding that is applied by
default, and you can determine its name via a small program:

    public class encode {
        public static void main(String args[]) {
            // The default encoding is exposed as a system property.
            String p = System.getProperty("file.encoding");
            System.out.println(p);
        }
    }

On my machine, running Java 2 software, this prints out "Cp1252",
which is a code for:

    Windows Western Europe / Latin-1

A table of encodings can be found at:

    http://java.sun.com/products/jdk/1.1/intl/html/intlspec.doc7.html

If you want to specify an encoding directly, one way of doing so is
illustrated by the following program, which writes every lowercase
letter in the Unicode character set to a file. Some of these
characters have a non-zero high byte (that is, they are greater in
value than '\u00ff'), and preserving both bytes of the character is
therefore important.

The encoding used is one called UTF-8, which represents each ASCII
character as itself (one byte) and every other two-byte character as
two or three bytes.

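The one-, two-, and three-byte property of UTF-8 can be checked
directly. The following is a small sketch, separate from the enc1
program that follows. It assumes a JDK recent enough to provide
java.nio.charset.StandardCharsets (Java 7 or later), which this tip
predates; the class name Utf8Lengths and the helper utf8Length are
hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Hypothetical helper: number of bytes UTF-8 needs to encode a
    // single (non-surrogate) char.
    static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('a'));       // ASCII letter: prints 1
        System.out.println(utf8Length('\u00e9'));  // U+00E9, e with acute: prints 2
        System.out.println(utf8Length('\u20ac'));  // U+20AC, euro sign: prints 3
    }
}
```
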
    import java.io.*;

    public class enc1 {
        public static void main(String args[]) {
            try {
                // Wrap the byte stream in a Writer that encodes
                // characters as UTF-8.
                FileOutputStream fos = new FileOutputStream("out");
                OutputStreamWriter osw =
                    new OutputStreamWriter(fos, "UTF8");

                // Write every lowercase letter in the two-byte range.
                for (int c = '\u0000'; c <= '\uffff'; c++) {
                    if (!Character.isLowerCase((char)c))
                        continue;
                    osw.write(c);
                }
                osw.close();
            }
            catch (IOException e) {
                System.err.println(e);
            }
        }
    }

This program reverses the process:

    import java.io.*;

    public class enc2 {
        public static void main(String args[]) {
            try {
                // Wrap the byte stream in a Reader that decodes
                // UTF-8 bytes back into characters.
                FileInputStream fis = new FileInputStream("out");
                InputStreamReader isr =
                    new InputStreamReader(fis, "UTF8");

                // Each character read back should match the next
                // lowercase letter, in the order enc1 wrote them.
                for (int c = '\u0000'; c <= '\uffff'; c++) {
                    if (!Character.isLowerCase((char)c))
                        continue;
                    int ch = isr.read();
                    if (c != ch)
                        System.err.println("error");
                }
                isr.close();
            }
            catch (IOException e) {
                System.err.println(e);
            }
        }
    }

InputStreamReader and OutputStreamWriter are the classes where byte
streams are converted to character streams and vice versa.

Getting encodings right is important if you write applications that
operate in an international context.

. . . . . . . . . . . . . . . . . . . . . . . .

-- FEEDBACK

Comments? Send your feedback on the JDC Tech Tips to:

    JDCTechTips@Sun.com

-- ARCHIVES

You'll find the JDC Tech Tips archives at:

    http://developer.java.sun.com/developer/javaInDepth/TechTips/index.html

-- COPYRIGHT

Copyright 1999 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.

This document is protected by copyright. For more information, see:

    http://developer.java.sun.com/developer/copyright.html

The JDC Tech Tips are written by Glen McCluskey.

JDC Tech Tips Vol. 2 No. 7
February 16, 1999