Last time we read about this stuff was around 2002, and HTML was dead. HTML 4.01 was the end - the future was the stricter world of XHTML with XML - the Holy Grail of markup languages - and we had all better get used to it. So while the majority of our pages remained HTML 4.01, we started limbering up with a few XHTML 1.0 Transitional pages using this definition:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
At the same time we started to use lower-case tags everywhere and got into the habit of closing empty tags with /> kluges like <br /> and <img />, even in HTML 4.01 pages. We were trying to adopt the good habits defined by the W3C XHTML documentation, slowly and incrementally. Whether we liked it or not. All pain for no demonstrable gain. But strict, man, really strict rules.
Now (mid 2007) it seems that HTML is alive and well, with HTML5 being worked on (somewhat contentiously) by the W3C as an incremental approach - though it will perhaps not reach standard status until 2010 or later. HTML5 is apparently trying to reconcile HTML with XHTML as well as to provide a pragmatic evolution. There are some proposed new tags and features which look kinda interesting, but the key point is that instead of threatening complete page failure if you get something wrong (which always scared the devil out of us), HTML5 focuses on a standard way of rendering bad pages - taming the tag soup rather than punishing it with a blank page.
So we are dropping all our XHTML DOCTYPEs and reverting to HTML 4.01 Transitional. Without any guilt. Instead perhaps we'll focus on trying to get rid of those align, target, width and border attributes and fix the Google advertising markup that prevents our pages from validating as strict HTML 4.01. Or perhaps we'll do nothing until the W3C publishes the draft paper defining the changes from HTML4 to HTML5, which it has so far declined to do. No way we're going down a blind alley again.
And if HTML5 goes off in some bizarre direction, like banning all /> closures (as this validator does), we'll just ignore it like we did (mostly) with XHTML. There is too much HTML out there, and sadly too many pages without any DOCTYPE at all (according to this research only 40% of pages have one), and only 2.6% of all pages are valid (X)HTML.
And in the meantime, due to the lack of standards development in the HTML world, we have emerging movements such as Microformats. Not sure if these are aberrant or useful.
Seems a simple enough task. Click the link on the right to the W3C's DOCTYPE recommendations. Scan the list, copy-and-paste into your web page and that's that.
So you're being good and you use an XHTML DOCTYPE, and you get warnings about character sets from the W3C's page validator. But you get the green bar, so you ignore them. Then you hit a page that uses strange characters and this time the character set message really hurts - no green bar. So you delete the first <?xml line before the DOCTYPE, the problem goes away and you get the green bar again. But wait a minute - why was it there in the first place, are there consequences if it's deleted, and if so what are they? Seems like you need to know stuff, not just blindly cut and paste. And then you stumble across this page and it gets more confusing - because the recommendations are very different. Just what's a working stiff supposed to do?
Here are the four main (non-frameset) DOCTYPES and our view of them:
# for masochists or those using CMS systems
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

# most of us can get away with this one as long as you are
# into /> kluges for br, hr, img, meta etc.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

# apart from align, border, width, target attributes and Google AdSense
# code this one is achievable
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

# ritual suicide may be best if you can't get here
# apart from the occasional page bug that is
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
First, the DOCTYPE is only one part of the puzzle; the meta Content-Type tag within the <head></head> section also plays a significant role. An example of the tag is shown below.
<meta http-equiv="Content-Type" content="text/html">

The content value determines how the browser treats the page:

text/html
Irrespective of the DOCTYPE these pages use the normal tag soup HTML parser, with a few extra rules thrown in to make life thoroughly miserable. Even if your DOCTYPE is XHTML - no XML treatment.

application/xhtml+xml
Or application/xml. With an XHTML DOCTYPE you get XML treatment.
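Putting the two pieces together, a minimal HTML 4.01 page head might look like this (a sketch only - the title and body are obviously placeholders):

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
  <!-- charset declared in the meta tag; an HTTP header can override it -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Minimal page</title>
</head>
<body>
  <p>Hello.</p>
</body>
</html>
```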
Every time we see the word charset we get a headache. Our pages historically all have this charset:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
Charset=windows-1252 defines Microsoft's CP 1252 character set and is widely supported. It differs from ISO 8859-1 (and -15) by defining an additional 27 printable characters. But the <?xml declaration uses utf-8, hence the W3C's (mostly) gentle warning when it sees both. UTF-8 is an encoding scheme and implies the Unicode character set. So the fix is to either delete the <?xml line, change the charset to read:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Or delete the charset value (since it is already defined in the <?xml line in this case).
Note: If a browser receives a document without any charset value (and no <?xml declaration) it may default to CP-1252 (MSIE) or to ISO 8859-1, which is the default HTTP protocol character set; it may also guess the character set based on any meta http-equiv="Content-Language" value present, or attempt to sniff the document. It is an extremely good idea to have either a charset or a language, and preferably both, in all pages. Charsets supported by Microsoft's IE. Firefox's default charset is ISO-8859-1; to get a full list of supported values use Menu->Tools->Options, select the Content tab, Fonts & Colors, click the Advanced button - the Character Encoding combo list defines the supported values. Opera 9 works in Unicode (UTF-16) by default but attempts to sniff the page character encoding if none is present, and supports multiple legacy encodings which include windows-1252, ISO 8859-1 etc.
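The charset can also be set in the HTTP Content-Type header itself, which takes precedence over any meta tag in the page. A minimal sketch, assuming an Apache web server:

```apache
# httpd.conf or .htaccess - appends charset=UTF-8 to the HTTP
# Content-Type header of all text/html and text/plain responses
AddDefaultCharset UTF-8
```

If the header and the meta tag disagree, the header wins - worth checking before blaming the page.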
There is a lot of opinion-based usability and accessibility information out there, which we tend to avoid like the plague. There are links on the right of this page to research-based material. But what about good old-fashioned, predominantly text sites, like ours, and how to make them more accessible? Here is what we have done:
Page Structure: We lay our pages out so that the central (content) pane loads before the side menus. This should allow screen readers to get to the meat of the content quicker. Though we still have concerns that the banner section with a ton of drop-down menu HTML loads first.
Aural Stylesheet: We are considering adding an aural style sheet, though the W3C's accessibility page does not even mention this; we would first need to fix the menu problem (see previous point), which is pretty structural. If there had been some decent advice when we built the pages we could have avoided the problem. Finally, we note that there is no button or similar visual clue to indicate that an aural style sheet exists. We also see very little difference between a print style sheet and an aural one, since both should remove non-essential clutter.
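For the record, an aural style sheet sketch using the CSS 2 aural properties - the class names here are hypothetical (not from our actual markup), and browser support was always patchy:

```css
/* CSS 2 aural media: trim clutter from speech output */
@media aural {
  .banner, .menu { speak: none; }       /* skip the drop-down menu soup */
  h1, h2         { pause-before: 1s; }  /* short pause before headings */
  pre, code      { speech-rate: slow; } /* slow down for code samples */
}
```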
Character Sizes: Graphic designers want to control the layout of their pages for aesthetic reasons, which means using pt or px sizes for font rendering - and that can work against the user's browser settings in many cases. We have recently moved to using em sizes in the central pane - the content - while continuing to use pt sizes in the left- and right-hand menus due to space constraints. The theory is that em is defined relative to the user's selected point size, rather than being an absolute value.
We have changed our line spacing (leading) in the central content pane from the browser's default (roughly 1.1 times the point size) to 1.3em, to increase readability by adding more whitespace.
We have increased our central column margins from 10px to 20px to improve readability and reduce the amount of text per line.
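Taken together, the changes above look something like this (a sketch only - the selector names and the menu point size are hypothetical, not lifted from our actual style sheet):

```css
/* central content pane: relative sizes, so the user's browser
   settings stay in control */
#content {
  font-size: 1em;      /* tracks the user's chosen size */
  line-height: 1.3em;  /* up from the ~1.1 browser default */
  margin-left: 20px;   /* was 10px - shorter, more readable lines */
  margin-right: 20px;
}

/* side menus: absolute pt sizes, forced by space constraints */
#menu-left, #menu-right { font-size: 9pt; }
```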
There is less of this than one might expect. Here are the general conclusions that we found and have - generally - adopted:
Readers prefer black text on a white background and, in general, any dark-on-light combination is preferred to inverse text (light text on a dark background - for example, white text on black).
Sans-serif fonts for screen reading. Serif for printing.
12 point Arial is preferred to 12pt Verdana but 9/10pt Verdana is preferred to 9/10pt Arial.
White space is good. And more is better.
Yeah, that's all we found.
Problems, comments, suggestions, corrections (including broken links) or something to add? Please take the time from a busy life to 'mail us' (at top of screen), the webmaster (below) or info-support at zytrax. You will have a warm inner glow for the rest of the day.
If you are happy with it, that's OK - but your browser is giving you a less than optimal experience on our site. You could, at no charge, upgrade to a W3C standards-compliant browser such as Firefox.