XML.com: XML From the Inside Out

Tags: xhtml xml html web mime

The Road to XHTML 2.0: MIME Types

by Mark Pilgrim
March 19, 2003

XHTML's Dirty Little Secret

Let's pretend that you've migrated to XHTML -- probably XHTML 1.0 Transitional, unless you're one of those weird geek alpha designers who insist on doing everything with Strict DOCTYPEs. It wasn't that hard, right? Lowercase all your tags; add some end tags to match your <p> and <li> tags; add some slashes to <br /> and <img />; update your DOCTYPE; get on with your life!

Let's also pretend, for the sake of argument, that you're validating your spiffy new XHTML markup on a regular basis. You might even have one of those sporty "valid XHTML" badges lurking at the bottom of pages. Good for you.

Now here's a dirty little secret: browsers aren't actually treating your XHTML as XML. Your validated, correctly DOCTYPE'd, completely standards compliant XHTML markup is being treated as if it were still HTML with a few weird slashes in places they don't belong (like <br /> and <img />).

Why? The answer is MIME types.

MIME types are as old as the Web; in fact, they're older. Every page you browse, every image you download, every stylesheet and JavaScript and PDF and silly little Flash movie you view through your browser, all have a MIME type associated with them. For HTML pages, the MIME type is text/html. For XHTML, the MIME type is supposed to be application/xhtml+xml.

(Tip: If you use the advanced page of the W3C validator with the "verbose output" option checked, it will validate your page and show you what MIME type your server is sending for that page.)
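The same check can be scripted. What follows is a minimal sketch, not part of the original article, that requests a page and reports the MIME type found in the Content-Type response header. The function name served_mime_type and the default Accept value are illustrative assumptions; the code relies only on Python 3's standard urllib.

```python
import urllib.request


def served_mime_type(url, accept="application/xhtml+xml,text/html;q=0.9"):
    """Return the MIME type the server sends for url.

    The accept argument is sent as the Accept request header, so the same
    URL can be probed as an XHTML-aware client and as a legacy one.
    """
    request = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(request) as response:
        # get_content_type() returns only "type/subtype", dropping
        # parameters such as charset.
        return response.headers.get_content_type()
```

Calling served_mime_type on one of your own URLs twice, once with the default Accept value and once with accept="text/html", shows whether the server answers the two kinds of client differently.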

The current MIME type situation is a bit of a mess. According to the W3C's Note on XHTML Media Types:

  • HTML 4 should be served as text/html. This is what everybody does, so no problem there.
  • "HTML compatible" XHTML (as defined in appendix C of the XHTML 1.0 specification) may be served as text/html, but it should be served as application/xhtml+xml. This is probably the sort of XHTML you're writing now, so you could go either way.
  • XHTML 1.1 should not be served as text/html.
  • Although the spec is not finalized yet, all indications are that XHTML 2.0 must not be served as text/html.

So the first step on the road to XHTML 2.0 is conquering the XHTML MIME type, application/xhtml+xml.

A Messy Transition

You can start using the application/xhtml+xml MIME type immediately for your existing XHTML pages, but there are a few serious caveats you need to consider first:

  1. All of your pages must be well-formed XML. Technically, they don't need to be valid XHTML (you could have a <div> element inside a <span> element and be well-formed but invalid). But all your end tags must match all your start tags, no overlaps, none missing.

    When I say must, I mean must. Mozilla and its derivatives are the only major browsers that can handle the XHTML MIME type (more on that in a minute), and they are ultra-strict about it. If a single end tag is missing, Mozilla users won't see your page at all; they'll see an XML debugging message instead.

  2. Most current browsers don't handle the application/xhtml+xml MIME type correctly, so you'll need to make provisions for serving up your XHTML the old-fashioned way (as text/html) to these browsers. (The list of non-XHTML-aware browsers includes Internet Explorer 6 for Windows, so it's not as if you can skip this step.) If your pages are dynamically generated, you can alter the Content-type programmatically. If you're serving up static files, you'll need to resort to mod_rewrite or a similar solution. More on this in a minute, too.

  3. Cascading stylesheets are parsed slightly differently in the XML world. When attached to HTML pages, CSS selectors are case-insensitive. But when attached to XML pages (including XHTML pages served with the proper XHTML MIME type), CSS selectors are case-sensitive. This shouldn't come as too much of a surprise; everything in XML is case-sensitive. Keep all your CSS selectors lowercase and you'll be okay.

  4. Also on the subject of CSS, the <body> element is somewhat magical in HTML, but not in XML. The technical background is not worth delving into; the upshot is that if you define CSS styles on body, you should define them on html as well. For example, if you define a background color on body, it will apply to the entire page in HTML, but it may not in XML. You'll need to define the background on html as well.

  5. Your JavaScript may need some tweaking for case-sensitivity as well. Whereas the HTML DOM is case-insensitive (and tag names are returned from functions like getElementsByTagName() in uppercase), the XML DOM is case-sensitive and tag names are returned in lowercase. To quote the W3C on XHTML and the HTML DOM:

    Developers need to take two things into account when writing code that works on both HTML and XHTML documents. When comparing element or attribute names to strings, the string compare needs to be case insensitive, or the element or attribute name needs to be converted into lowercase before comparing against a lowercase string. Second, when calling methods that are case insensitive when used on a HTML document (such as getElementsByTagName() and namedItem()), the string that is passed in should be lowercase.

  6. Also on the JavaScript front, methods like document.write do not work; you will need to use document.createElementNS and friends instead. For example, if your XHTML-as-HTML document currently uses this script to insert a linked stylesheet:

    if (document.getElementById) {
      // A string literal cannot span lines, so join the two halves.
      document.write("<link rel=\"stylesheet\" " +
         "type=\"text/css\" href=\"/css/js.css\" media=\"screen\" />");
    }

    Your XHTML-as-XML document would need to use something like this instead (thanks to Experts Exchange for this code):

    if (document.getElementById) {
      var l=document.createElementNS("http://www.w3.org/1999/xhtml","link");
      l.setAttribute("rel", "stylesheet");
      l.setAttribute("type", "text/css");
      l.setAttribute("href", "/css/js.css");
      l.setAttribute("media", "screen");
      document.getElementsByTagName("head")[0].appendChild(l);
    }

  7. Still on the JavaScript front, collections like document.images, document.applets, document.links, document.forms, and document.anchors do not exist when serving XHTML as XML. You'll need to use the more generic document.getElementsByTagName() method and weed out the elements you're actually interested in. Mozilla bug 111514 has a long discussion on this issue.
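
Caveat 1 is the one that will blank a page for Mozilla users, so it is worth checking before switching the MIME type. The following is a rough sketch, not from the original article, that uses Python 3's standard xml.etree.ElementTree parser to test a string of markup for well-formedness. The function name first_xml_error and the sample strings are made up for illustration. It checks well-formedness only, not validity, and because this parser does not read the DTD, named entities such as &nbsp; may be reported as errors even in valid XHTML.

```python
import xml.etree.ElementTree as ET


def first_xml_error(markup):
    """Return None if markup is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # The message includes the line and column of the first problem.
        return str(err)
    return None


# Hypothetical sample pages: the second one is missing its </p> end tag.
GOOD_PAGE = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>'
             '</head><body><p>one<br /></p></body></html>')
BAD_PAGE = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>'
            '</head><body><p>one<br /></body></html>')

if __name__ == "__main__":
    for name, page in (("good", GOOD_PAGE), ("bad", BAD_PAGE)):
        print(name, first_xml_error(page) or "is well-formed")
```

A <div> inside a <span> passes this check, which matches the distinction drawn in caveat 1: such a page is invalid XHTML but still well-formed XML.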

It can be difficult to get JavaScript to work properly in both HTML and XML modes. This is a short-term problem (XHTML 2.0 only has one mode: XML), but it's a serious one. If you use any JavaScript on your pages now, you may be better off waiting to make the jump to XHTML 2.0 all at once, rather than migrating slowly.

Accommodating legacy browsers

As I mentioned, Mozilla is the only major browser that currently handles application/xhtml+xml correctly; for all other browsers, you'll need to serve your XHTML as HTML, using the old text/html MIME type. (Here's a complete list of browser conformance tests.) But you can't browser-sniff for Mozilla (for instance, by searching for "Gecko" in the User-Agent), because some browsers (OmniWeb, Opera) have options to lie about who they are, and other browsers (Safari) include the magic word "Gecko" in their User-Agent string by default. Luckily for us, HTTP has a specific solution for this problem, one which is so elegant (compared to the rest of this mess) that I didn't believe it would actually work until I tried it.

Mozilla, in its infinite wisdom, will tell a server that it accepts application/xhtml+xml in the Accept header that it sends with every request. (CGI-style environments expose this header to scripts as HTTP_ACCEPT.)

That's it. All scripting environments provide access to these HTTP headers; so, armed with this nugget of information, we can devise a variety of ways to serve up the same page as application/xhtml+xml to browsers that claim to support it and as text/html to everyone else.

PHP

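<?php
// HTTP_ACCEPT is unset when the client sends no Accept header
// (the W3C validator, for one), so fall back to an empty string.
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
if ( stristr($accept, "application/xhtml+xml") ) {
  header("Content-type: application/xhtml+xml");
}
else {
  header("Content-type: text/html");
}
?>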

Python (CGI script)

import os

# The Accept request header reaches a CGI script as HTTP_ACCEPT.
accept = os.environ.get('HTTP_ACCEPT', '')
if 'application/xhtml+xml' in accept:
    print('Content-type: application/xhtml+xml')
else:
    print('Content-type: text/html')

And, finally, if you're just serving up static HTML files, you can use Apache's mod_rewrite module to dynamically change the MIME type for conforming browsers by putting these rules in your .htaccess file:

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
RewriteCond %{REQUEST_URI} \.html$
RewriteCond %{THE_REQUEST} HTTP/1\.1
RewriteRule .* - [T=application/xhtml+xml]
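
The second RewriteCond above declines to send XHTML to a client that lists application/xhtml+xml with q=0, a case the substring tests in the PHP and Python snippets do not catch. A scripted equivalent might look like the sketch below. The function names xhtml_quality and response_headers are hypothetical, and the Vary: Accept header is an addition not mentioned in the article, included on the assumption that caches should be told the response depends on the request's Accept header.

```python
XHTML = "application/xhtml+xml"


def xhtml_quality(accept_header):
    """Return the q-value a client gives application/xhtml+xml (0.0 if absent).

    Only an exact match counts. Wildcards such as */* are deliberately not
    treated as XHTML support, since a client may send them without being
    able to render XHTML.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML:
            continue
        quality = 1.0  # a media type listed without q defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not accepted"
        return quality
    return 0.0


def response_headers(accept_header):
    """Choose the Content-Type for a page based on the Accept header."""
    if xhtml_quality(accept_header) > 0:
        content_type = XHTML
    else:
        content_type = "text/html"
    return [("Content-Type", content_type), ("Vary", "Accept")]
```

In a CGI script, the argument would come from os.environ.get('HTTP_ACCEPT', ''), and each returned pair would be printed as a header line before the blank line that ends the headers.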

Next month: The Road to XHTML 2.0, part 2. "What happened to my IMG tag?"


Comments on this article
  • WTF XHTML and createElement
    2005-06-08 14:24:16 wewereright1054

    OK, I have been trying to use the createElement function, and all works well until I try to use it for images. Using a Firefox extension, I am able to see the rendered source. It shows me the image with all the attributes that I had set for it, and it all looks nice. However, it isn't in XHTML format: it just closes the tag without putting in the space and / as we are told to for XHTML. Since this is the road that we are going down, how do we solve this problem? When in XHTML and the image tag is like HTML 4.0, you get a space under the image, which blows. You get the same for embed tags, so forget using this method for writing in Flash movies or images. I guess we will have to stick with document.write, which again blows. Is there a way to override the native code for this function through JavaScript so it appears correctly? Any other ideas?

  • Apache correction
    2004-08-22 12:12:45 AdamBull

    Using the mod_rewrite method, the code given above doesn't affect directory indexes, since the URI then doesn't contain .html.


    Changing REQUEST_URI to REQUEST_FILENAME fixes the problem.


    Adam Bull

  • Validating PHP
    2004-01-24 13:22:45 Richard Allsebrook

    As the W3C validator doesn't send an HTTP_ACCEPT header, you must modify the PHP sample above to read:


    if ( isset($_SERVER["HTTP_ACCEPT"]) and stristr($_SERVER["HTTP_ACCEPT"],"application/xhtml+xml") ) {
    header("Content-type: application/xhtml+xml");
    }
    else {
    header("Content-type: text/html");
    }


    Otherwise an error message is squirted out before the header. This causes further problems as headers MUST be sent before any content.

  • validator.w3.org still shows text/html?
    2003-05-13 03:01:00 Frank Farm

    Hey Mark, thanks for the informative article! I'm trying to implement your PHP code snippet on my sites. However, when I insert the code and then validate at validator.w3.org using "verbose output" as you suggest, I still get "text/html" rather than "application/xhtml+xml". If I just say 'header("Content-type: application/xhtml+xml");' then validator.w3.org reports "application/xhtml+xml" as we want, but then since that header is being served for all browsers my sites break in some of them. When I do more testing, it seems that the HTTP_ACCEPT value being sent by the validator is a null string, but that can't be right, right? I figure I must be doing something wrong. Any suggestions?


    • validator.w3.org still shows text/html?
      2004-11-30 18:00:32 zcorpan

      The validator's HTTP_ACCEPT is empty. You can see it by outlining this with the validator:
      <h1>Your UA Accepts <?php echo $_SERVER['HTTP_ACCEPT']; ?></h1>


      http://lists.w3.org/Archives/Public/www-validator/2004May/0159.html

  • createElement
    2003-05-03 06:42:07 Garrett Smith

    The mime-type is not tricky, but the cross browser problems are.


    JS DOM bindings are faster w/XHTML, but if I use createElementNS, it won't work in IE.


    So I can use a branch:


    if (document.createElementNS) {
    }
    else {
    }


    But this introduces overhead, especially in a lengthy loop of creating a table of many rows.


    Also, document.write doesn't work with XHTML in Mozilla. document.write works with HTML in Mozilla, but not properly (it is buggy).

  • Using XML mime types in IE
    2003-03-31 04:53:52 David Carlisle

    As an alternative to sending XHTML with an HTML MIME type, one can send it as text/xml, so long as you add a stylesheet PI specifying an identity transform, which IE will accept as an XHTML-to-HTML conversion.


    This is particularly useful if you are using XHTML modularisation to use features that are not available in HTML, for example MathML. See http://www.w3.org/Math/XSL for an implementation of this. The stylesheets there are complicated by trying to detect available MathML renderers, but the same basic mechanism with a one-line identity XSLT transform would work for basic XHTML.


    Of course you can then extend to do an XHTML2 - HTML XSLT conversion and experiment with XHTML2 in IE6...



    David


    • Using XML mime types in IE
      2003-04-07 06:57:40 Steven Pemberton

      "Of course you can then extend to do an XHTML2 - HTML XSLT conversion and experiment with XHTML2 in IE6..."


      Actually, you don't need to use XSLT to experiment with XHTML 2.0 in IE; you can do it out of the box, just using some CSS extensions. See


      http://w3future.com/weblog/gems/xhtml2.xml


      Works with Opera and Mozilla too.

  • Browsers that accept the XHTML MIME type
    2003-03-27 02:20:01 Steven Pemberton

    "Mozilla and its derivatives are the only major browsers that can handle the XHTML MIME type"


    Mark is a little hard here. The link he gives (http://www.w3.org/People/mimasa/test/xhtml/media-types/results) shows that iCab and Opera also handle the XHTML MIME type. They do one thing incorrectly: uppercase selectors in CSS match lowercase element names. But if your CSS only uses lowercase selectors (and there's little reason not to), then those browsers will perform perfectly well. In other words, correct documents will be processed correctly.


    Steven Pemberton

    • Browsers that accept the XHTML MIME type
      2003-04-03 21:23:58 Masayasu Ishikawa

      Actually, Opera 7.10b1 has fixed this problem; it passed all tests.


      http://www.w3.org/People/mimasa/test/xhtml/media-types/Opera7.10b1


      I'm pleased to know that the situation is being improved.

  • mod_rewrite suggestion...
    2003-03-27 00:17:26 Kevin Hanna

    If you choose to use the Apache mod_rewrite method, you may want to change the file extension from html to xhtml:


    RewriteCond %{REQUEST_URI} \.xhtml$


    If you have non-XHTML files in the same directory (or if you choose to add this rule to your whole site), some browsers, I'm sure, will choke if they are told the document is "application/xhtml+xml".


    Kevin Hanna

  • Doubt
    2003-03-20 14:34:05 Sérgio Giraldo

    So, how should I test my page for MIME type? I didn't get it. Can I put the 'application/xhtml+xml' inside the page, or is this an issue for my ISP? You know, I've been serving my pages with 'application/xhtml+xml' in META tags for a while and I never heard of this problem.
    Furthermore, my page is validated as text/xml no matter which browser I submit it from (Mozilla, IE, or Opera).


    Thanks for the great article
    Sergio
    www.milrumos.com.br

    • Doubt
      2003-03-21 08:01:47 Ted Pibil

      You can use mod_rewrite to detect whether or not the UA can handle application/xhtml+xml. Visit here to see how: http://schneegans.de/tips/apache-xhtml.html
      It is in German, but you should get the gist from the htaccess commands.


      This is a hack, and if you are a purist, you can save your documents as .xhtml and set the mime type for them, but then your only audience will be Gecko users.

    • Testing for MIME type
      2003-03-21 06:54:20 Mark Pilgrim

      As stated in the article, you can use the "verbose" option of the W3C validator to see the MIME type your web server is sending with your page. The MIME type can not be faked with META tags; it must be sent in the "Content-type" HTTP header by your web server, before it sends the content of your page. There are a number of ways of setting this up, but META tags are not one of them.


Copyright © 2005 O'Reilly Media, Inc.