J D C    T E C H    T I P S

TIPS, TECHNIQUES, AND SAMPLE CODE

WELCOME to the Java Developer Connection(sm) (JDC) Tech Tips,
August 29, 2000. This issue is about bytecode. Programmers coding in
the Java(tm) programming language rarely view the compiled output of
their programs. This is unfortunate, because the output, Java
bytecode, can provide valuable insight when debugging or
troubleshooting performance problems. Moreover, the JDK makes viewing
bytecode easy. This tip shows you how to view and interpret Java
bytecode. It presents the following topics related to bytecode:

    * Getting Started With javap
    * How Bytecode Protects You From Memory Bugs
    * Analyzing Bytecode to Improve Your Code

This tip was developed using Java(tm) 2 SDK, Standard Edition,
v 1.3.

This issue of the JDC Tech Tips is written by Stuart Halloway,
a Java specialist at DevelopMentor (http://www.develop.com/java).

You can view this issue of the Tech Tips on the Web at
http://developer.java.sun.com/developer/TechTips/2000/tt0829.html

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
GETTING STARTED WITH JAVAP

Most Java programmers know that their programs are not typically
compiled into native machine code. Instead, the programs are compiled
into an intermediate bytecode format that is executed by the Java(tm)
Virtual Machine*. However, relatively few programmers have ever seen
bytecode because their tools do not encourage them to look. Most Java
debugging tools do not allow step-by-step execution of bytecode; they
either show source code lines or nothing.

Fortunately, the JDK(tm) provides javap, a command-line tool that
makes it easy to view bytecode. Let's see an example:

    public class ByteCodeDemo {
        public static void main(String[] args) {
            System.out.println("Hello world");
        }
    }

After you compile this class, you could open the .class file in a hex
editor and translate the bytecodes by referring to the virtual
machine specification. There is an easier way: javap is a
disassembler that converts the bytecodes into human-readable
mnemonics. You can get a bytecode listing by passing the '-c' flag to
javap as follows:

    javap -c ByteCodeDemo

You should see output similar to this:

    public class ByteCodeDemo extends java.lang.Object {
        public ByteCodeDemo();
        public static void main(java.lang.String[]);
    }

    Method ByteCodeDemo()
       0 aload_0
       1 invokespecial #1
       4 return

    Method void main(java.lang.String[])
       0 getstatic #2
       3 ldc #3
       5 invokevirtual #4
       8 return

From just this short listing, you can learn a lot about bytecode.
Begin with the first instruction in the main method:

    0 getstatic #2

The initial integer is the offset of the instruction in the method,
so the first instruction begins at offset 0. The mnemonic for the
instruction follows the offset.
In this example, the 'getstatic' instruction pushes the value of a
static field onto a data structure called the operand stack. Later
instructions can reference the field in this data structure.
Following the getstatic mnemonic is the field to be pushed, in this
case "#2". If you examined the bytecode directly, you would see that
the field information is not embedded directly in the instruction.
Instead, like all constants used by a Java class, the field
information is stored in a shared table called the constant pool.
Storing field information in the constant pool reduces the size of
the bytecode instructions, because each instruction has to store only
an integer index into the constant pool instead of the entire
constant. In this example, the field information is at location #2
in the constant pool. The order of items in the constant pool is
compiler dependent, so you might see a number other than '#2'.

After analyzing the first instruction, it's easy to guess the meaning
of the other instructions. The 'ldc' (load constant) instruction
pushes the constant "Hello world" onto the operand stack. The
'invokevirtual' instruction invokes the println method, which pops
its two arguments from the operand stack. Don't forget that an
instance method such as println has two arguments: the obvious string
argument, plus the implicit 'this' reference.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HOW BYTECODE PROTECTS YOU FROM MEMORY BUGS

The Java programming language is frequently touted as a "secure"
language for internet software. Given that the code looks so much
like C++ on the surface, where does this security come from? It turns
out that an important aspect of security is the prevention of
memory-related bugs. Computer criminals exploit memory bugs to sneak
malicious code into otherwise safe programs.
Java bytecode is a first line of defense against this sort of attack,
as the following example demonstrates:

    public float add(float f, int n) {
        return f + n;
    }

If you add this method to the previous example, recompile it, and
run javap, you should see bytecode similar to this:

    Method float add(float, int)
       0 fload_1
       1 iload_2
       2 i2f
       3 fadd
       4 freturn

At the beginning of a Java method, the virtual machine places method
parameters in a data structure called the local variable table. As
its name suggests, the local variable table also contains any local
variables that you declare. In this example, the method begins with
three local variable table entries, one for each of the three
arguments to the add method. Slot 0 holds the 'this' reference, while
slots 1 and 2 hold the float and int arguments, respectively.

To manipulate the variables, the method must first load (push) them
onto the operand stack. The first instruction, fload_1, pushes the
float at slot 1 onto the operand stack. The second instruction,
iload_2, pushes the int at slot 2 onto the operand stack. The i2f
instruction then converts the int on top of the stack to a float,
fadd adds the two floats, and freturn returns the result. The
interesting thing about these instructions is the 'i' and 'f'
prefixes, which illustrate that Java bytecode instructions are
strongly typed. If the type of an argument does not match the type of
the bytecode, the VM will reject the bytecode as unsafe. Better
still, the bytecodes are designed so that these type-safety checks
need only be performed once, at class load time.

How does this type safety enhance security? If an attacker could
trick the virtual machine into treating an int as a float, or vice
versa, it would be easy to corrupt calculations in a predictable way.
If these calculations involved bank balances, the security
implications would be obvious. More dangerous still would be tricking
the VM into treating an int as an Object reference. In most
scenarios, this would crash the VM, but an attacker needs to find
only one loophole.
And don't forget that the attacker doesn't have to search by hand--it
would be pretty easy to write a program that generated billions of
permutations of bad bytecodes, trying to find the lucky one that
compromised the VM.

Another case where bytecode safeguards memory is array manipulation.
The 'aastore' and 'aaload' bytecodes store to and load from arrays of
object references, and they always check array bounds. These
bytecodes throw an ArrayIndexOutOfBoundsException if the index is
outside the bounds of the array.

Perhaps the most important checks of all apply to the branching
instructions, for example, the bytecodes that begin with 'if'. In
bytecode, branching instructions can only branch to another
instruction within the same method. The only way to transfer control
outside a method is to return, throw an exception, or execute one of
the 'invoke' instructions. Not only does this close the door on many
attacks, it also prevents nasty bugs caused by dangling references or
stack corruption. If you have ever had a system debugger open your
program to a random location in code, you're familiar with these
bugs.

The critical point to remember about all of these checks is that they
are made by the virtual machine at the bytecode level, not just by
the compiler. A compiler for a language such as C++ might prevent
some of the memory errors discussed above, but its protection applies
only at the source code level. Operating systems will happily load
and execute any machine code, whether the code was generated by a
careful C++ compiler or a malicious attacker. In short, C++ is
object-oriented only at the source code level; Java's object-oriented
features, however, extend down to the compiled code.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ANALYZING BYTECODE TO IMPROVE YOUR CODE

The memory and security protections of Java bytecode are there for
you whether you notice them or not, so why bother looking at the
bytecode?
In many cases, knowing how the compiler translates your code into
bytecode can help you write more efficient code, and can sometimes
even prevent insidious bugs. Consider the following example:

    //return the concatenation str1+str2
    String concat1(String str1, String str2) {
        return str1 + str2;
    }

    //append str2 to str1
    void concat2(StringBuffer str1, String str2) {
        str1.append(str2);
    }

Try to guess how many method calls each method requires to execute.
Now compile the methods and run javap. You should see output like
this:

    Method java.lang.String concat1(java.lang.String, java.lang.String)
       0 new #5
       3 dup
       4 invokespecial #6
       7 aload_1
       8 invokevirtual #7
      11 aload_2
      12 invokevirtual #7
      15 invokevirtual #8
      18 areturn

    Method void concat2(java.lang.StringBuffer, java.lang.String)
       0 aload_1
       1 aload_2
       2 invokevirtual #7
       5 pop
       6 return

The concat1 method performs five expensive operations: a new (an
object allocation), an invokespecial (the constructor call), and
three invokevirtuals. The compiler has translated str1 + str2 into
code that creates a temporary StringBuffer, appends both strings to
it, and converts the result back to a String; note that both methods
invoke the same constant pool entry, #7, to do the appending. That is
quite a bit more work than the concat2 method, which makes only a
single invokevirtual call. Most Java programmers have been warned
that because Strings are immutable it is more efficient to use
StringBuffers for concatenation. Using javap to analyze this makes
the point in dramatic fashion. If you are unsure whether two language
constructs are equivalent in performance, you should use javap to
analyze the bytecode. Beware of the just-in-time (JIT) compiler,
though. Because the JIT compiler recompiles the bytecodes into native
machine code, it can apply additional optimizations that your javap
analysis does not reveal. Unless you have the source code for your
virtual machine, you need to supplement your bytecode analysis with
performance benchmarks.

A final example illustrates how examining bytecode can help prevent
bugs in your application. Create two classes as follows. Make sure
they are in separate files.

    public class ChangeALot {
        public static final boolean debug=false;
        public static boolean log=false;
    }

    public class EternallyConstant {
        public static void main(String [] args) {
            System.out.println("EternallyConstant beginning execution");
            if (ChangeALot.debug)
                System.out.println("Debug mode is on");
            if (ChangeALot.log)
                System.out.println("Logging mode is on");
        }
    }

If you run the class EternallyConstant, you should get the message:

    EternallyConstant beginning execution

Now try editing the ChangeALot file, setting both the debug and log
variables to true. Recompile only the ChangeALot file. Run
EternallyConstant again, and you will see the following output:

    EternallyConstant beginning execution
    Logging mode is on

What happened to the debugging mode? Even though you set debug to
true, the message "Debug mode is on" didn't appear. The answer is in
the bytecode. Run javap on the EternallyConstant class, and you will
see this:

    Method void main(java.lang.String[])
       0 getstatic #2
       3 ldc #3
       5 invokevirtual #4
       8 getstatic #5
      11 ifeq 22
      14 getstatic #2
      17 ldc #6
      19 invokevirtual #4
      22 return

Surprise! While there is an 'ifeq' check on the log field, the code
does not check the debug field at all. Because the debug field was
marked final and initialized with a constant, the compiler knew that
the debug field could never change at runtime. Therefore, it
optimized the 'if' statement branch by removing it. This is a very
useful optimization indeed, because it allows you to embed debugging
code in your application and pay no runtime penalty when the switch
is set to false.

Unfortunately, this optimization can lead to major compile-time
confusion. If you change a final field, you have to remember to
recompile any other class that might reference the field. That's
because the 'reference' might have been optimized away. Java
development environments do not always detect this subtle dependency,
something that can lead to very odd bugs. So, the old C++ adage is
still true for the Java environment:
"When in doubt, rebuild all."

Knowing a little bytecode is a valuable asset to any programmer
coding in the Java programming language. The javap tool makes it easy
to view bytecodes. Occasionally checking your code with javap can be
invaluable in improving performance and catching particularly elusive
bugs.

There is substantially more complexity to bytecode and the VM than
this tip can cover. To learn more, read Inside the Java Virtual
Machine by Bill Venners.

. . . . . . . . . . . . . . . . . . . . . . .

- ARCHIVES

You'll find the JDC Tech Tips archives at:

http://developer.java.sun.com/developer/TechTips/index.html

- COPYRIGHT

Copyright 2000 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.

This document is protected by copyright.
For more information, see:
http://developer.java.sun.com/developer/copyright.html

* As used in this document, the terms "Java virtual machine" or
  "JVM" mean a virtual machine for the Java platform.
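
As a follow-up to the ChangeALot example under ANALYZING BYTECODE TO
IMPROVE YOUR CODE, here is a minimal sketch of one way to keep a
final flag from being copied into other classes: initialize it with a
method call, so that it is no longer a compile-time constant. The
class name ConstantFlags and the property name jdc.demo.log are
hypothetical and are not part of the original tip.

```java
public class ConstantFlags {
    // Compile-time constant: final, primitive, and initialized with a
    // literal. The compiler may copy the value into any class that
    // reads it, as it did with ChangeALot.debug in the tip.
    public static final boolean DEBUG = false;

    // Not a compile-time constant: the initializer is a method call,
    // so other classes must read the field when they run.
    // "jdc.demo.log" is a made-up system property name.
    public static final boolean LOG = Boolean.getBoolean("jdc.demo.log");

    // Builds a short report of which flags are on.
    public static String describe() {
        StringBuffer sb = new StringBuffer();
        if (DEBUG) {
            sb.append("Debug mode is on; ");
        }
        if (LOG) {
            sb.append("Logging mode is on; ");
        }
        sb.append("DEBUG=").append(DEBUG);
        sb.append(" LOG=").append(LOG);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If you compile a separate class that tests both flags and run
javap -c on it, you would expect a getstatic for LOG but none for
DEBUG; it is worth checking this against your own compiler. The
trade-off is that code guarded by LOG is no longer removed at compile
time, so "when in doubt, rebuild all" still applies to true
constants such as DEBUG.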