This page summarises what, at face value, seems a remarkably simple concept - character representation. Turns out it's more like a nightmare. The column marked Relationship tries to define the relationships between the various standards.
Name | Standard | Aliases | Description | Relationship |
ASCII | ANSI X3.4-1986, ISO 646, ITU-T T.50 | US-ASCII, IA5, IRA5, ISO 646 | ASCII is encoded as an 8 bit field but only uses 7 bits (values 00 to 7F, 0 to 127 decimal). What is generically called ASCII is normally US-ASCII, but various national variants exist which replace a few of the printable characters (the UK variant, for example, has £ in place of #). | ASCII is the same as IA5, or more properly now International Reference Alphabet No. 5 (IRA5), previously International Alphabet No. 5 (defined in ITU-T T.50), and ISO 646. It has the same character values as the first 128 entries in ISO 8859-1 (Latin-1), ISO 8859-15 (Latin-9) and CP1252. The first 128 characters in Unicode and ISO 10646 (UCS) have the same values; UTF-8 encodes them as the same single octets, but UCS-2, UCS-4 and UTF-16 encode them differently. |
IA5 | ITU-T T.50 | IRA5, ASCII, ISO 646 | International Alphabet No. 5 (ISO 646), now renamed International Reference Alphabet No. 5 (IRA5). | |
IRA5 | ITU-T T.50 | IA5, ISO 646, ASCII | International Reference Alphabet No. 5 (IRA5), formerly International Alphabet No. 5 (IA5), is the ITU equivalent of ASCII and ISO 646. IRA5 is encoded as an 8 bit field but only uses 7 bits (values 00 to 7F, 0 to 127 decimal). | IRA5 is almost the same as ISO 646 and ASCII (national variants typically differ from the international version in a few positions). The character values are the same as the first 128 entries in ISO 8859-1 (Latin-1), ISO 8859-15 (Latin-9) and CP1252. The first 128 characters in Unicode and ISO 10646 (UCS) have the same values; UTF-8 encodes them as the same single octets, but UCS-2, UCS-4 and UTF-16 encode them differently. |
ISO 646 | ISO 646 | IA5, IRA5, ASCII | ISO 646 is encoded as an 8 bit field but only uses 7 bits (values 00 to 7F, 0 to 127 decimal). | ISO 646 is the same as IRA5 (IA5) and ASCII. The character values are the same as the first 128 entries in ISO 8859-1 (Latin-1), ISO 8859-15 (Latin-9) and CP1252. The first 128 characters in Unicode and ISO 10646 (UCS) have the same values; UTF-8 encodes them as the same single octets, but UCS-2, UCS-4 and UTF-16 encode them differently. |
ISO 8859-1 | ISO 8859-1 | Latin-1 | ISO 8859-1 is part of a large family (ISO 8859-1 to 8859-16) and is encoded as an 8 bit field which uses all 8 bits (values 00 to FF, 0 to 255 decimal). | The first 128 character values are the same as IRA5, ISO 646, ASCII, ISO 8859-15 (Latin-9) and CP1252. All 256 values are the same as the first 256 code points in Unicode and ISO 10646 (UCS), U+0000 to U+00FF, but the encoding differs; UTF-8, for example, uses two octets for each character from 80 to FF. |
ISO 8859-15 | ISO 8859-15 | Latin-9 | ISO 8859-15 is part of a large family (ISO 8859-1 to 8859-16) and is encoded as an 8 bit field which uses all 8 bits (values 00 to FF, 0 to 255 decimal). It differs from 8859-1 in 8 positions, one of which is now the euro sign. | The first 128 character values are the same as IRA5, ISO 646, ASCII, ISO 8859-1 (Latin-1) and CP1252. The first 128 characters in Unicode and ISO 10646 (UCS) have the same values; UTF-8 encodes them as the same single octets, but UCS-2, UCS-4 and UTF-16 encode them differently. |
ISO 10646 | ISO 10646 | UCS | ISO 10646 (Universal Character Set) is designed to replace all previous character sets by providing a single family of standards for the encoding of all possible characters and symbols in all written languages. It defines two encoding forms: UCS-2 (16 bit) and UCS-4 (32 bit). | The first 128 characters in ISO 10646 (but not necessarily their encoding) are the same as ASCII, IA5, IRA5, ISO 646, ISO 8859-1 and ISO 8859-15. From version 1.1, Unicode is synchronised with ISO 10646 (same characters and code points). |
Unicode | Unicode Consortium | - | Unicode (version 3.0 when this table was written; the standard has been revised many times since). | From version 1.1, fully compatible with ISO 10646. |
CP1252 | Microsoft | code page 1252 | Microsoft's version of ISO 8859-1, using an 8 bit encoding. It has 27 differences from 8859-1 (including the euro), all in the range 80 to 9F. | The first 128 character values are the same as IRA5, ISO 646, ASCII, ISO 8859-1 (Latin-1) and ISO 8859-15 (Latin-9). The first 128 characters in Unicode and ISO 10646 (UCS) have the same values; UTF-8 encodes them as the same single octets, but UCS-2, UCS-4 and UTF-16 encode them differently. |
Transformations | | | | |
These define how the underlying code set of Unicode/ISO 10646 is sent over the wire. They are encoding formats, not character sets. | | | | |
UTF-7 | RFC 2152 | - | UCS Transformation Format-7. Defines how ISO 10646 (UCS) is transformed into 7 bit safe text for email transports that cannot carry 8 bit data. May use from 1 to 9 octets for a single ISO 10646/Unicode character. | |
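UTF-8 | RFC 3629 | UTF-2, FSS-UTF | UCS Transformation Format-8. Defines how ISO 10646 (UCS) is transformed for 8 bit data communications, including MIME. Uses from 1 to 4 octets for a single ISO 10646/Unicode character; characters 00 to 7F are encoded as the same single octet as in ASCII. (The earlier RFC 2279 allowed up to 6 octets; RFC 3629 restricts UTF-8 to U+0000 to U+10FFFF.) | |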
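UTF-16 | RFC 2781 | - | UCS Transformation Format-16. Defines how ISO 10646 (UCS) is transformed for data communications. Uses one or two 16 bit code units (2 or 4 octets) for a single ISO 10646/Unicode character: characters up to U+FFFF take one unit, and characters from U+10000 to U+10FFFF are encoded as a surrogate pair, which lets code points beyond the UCS-2 range be carried in a 16 bit format. | |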
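The octet counts in the UTF-8 and UTF-16 rows can be checked with a short Python sketch. It relies on Python's built-in codecs, using `utf-16-be` and `utf-32-be` (big-endian, no byte order mark) as stand-ins for UTF-16 and UCS-4; the sample characters and the `octet_counts` and `surrogate_pair` helpers are illustrative choices, not part of any standard. The surrogate calculation follows the scheme described in RFC 2781.

```python
from __future__ import annotations

# Sample characters covering each size class: ASCII, Latin-1, BMP, beyond the BMP.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def octet_counts(ch: str) -> dict[str, int]:
    """Octets needed for `ch` in UTF-8, UTF-16 and UCS-4 (no byte order mark)."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-be")),
        # Assumption: UTF-32 stands in for UCS-4 (same 32 bit units, capped at U+10FFFF).
        "UCS-4": len(ch.encode("utf-32-be")),
    }


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000 to U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000  # a 20 bit value
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", octet_counts(ch))

# ASCII is unchanged in UTF-8 but gains a zero octet in UTF-16.
print("A:", "A".encode("utf-8"), "A".encode("utf-16-be"))

high, low = surrogate_pair(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")
```

Running it should show ASCII characters taking one octet in UTF-8 but two in UTF-16, and U+1F600 taking four octets in every format, with UTF-16 splitting it into the surrogates D83D and DE00.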
Standard | Name | Coverage |
ISO 8859-1 | Latin alphabet No. 1 | West European |
ISO 8859-2 | Latin alphabet No. 2 | Central and East European |
ISO 8859-3 | Latin alphabet No. 3 | South European, Maltese & Esperanto |
ISO 8859-4 | Latin alphabet No. 4 | North European |
ISO 8859-5 | Latin/Cyrillic alphabet | Slavic languages |
ISO 8859-6 | Latin/Arabic alphabet | Arabic |
ISO 8859-7 | Latin/Greek alphabet | modern Greek |
ISO 8859-8 | Latin/Hebrew alphabet | Hebrew and Yiddish |
ISO 8859-9 | Latin alphabet No. 5 | Turkish |
ISO 8859-10 | Latin alphabet No. 6 | Nordic (Sámi, Inuit, Icelandic) |
ISO 8859-11 | Latin/Thai alphabet | Thai |
ISO 8859-12 | (not defined) | |
ISO 8859-13 | Latin alphabet No. 7 | Baltic Rim |
ISO 8859-14 | Latin alphabet No. 8 | Celtic |
ISO 8859-15 | Latin alphabet No. 9 | 8859-1 with 8 changes, including the euro |
ISO 8859-16 | Latin alphabet No. 10 | South-Eastern Europe |
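The relationships between the single-byte sets (identical 00 to 7F range, 8 changes in ISO 8859-15, 27 CP1252 differences in 80 to 9F) can be checked with a small Python sketch. It decodes every byte value with Python's `ascii`, `latin-1`, `iso8859-15` and `cp1252` codecs, assuming their tables match the published mappings. Python leaves five CP1252 positions (81, 8D, 8F, 90 and 9D) undefined, so the illustrative helper `decode_byte` returns `None` for those and they are not counted; `decode_byte` and `differences` are hypothetical names chosen for this example.

```python
from __future__ import annotations

CODECS = ["ascii", "latin-1", "iso8859-15", "cp1252"]


def decode_byte(value: int, codec: str) -> str | None:
    """Decode one byte with `codec`; return None if the codec leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


def differences(codec_a: str, codec_b: str) -> list[int]:
    """Byte values that both codecs define but map to different characters."""
    result = []
    for value in range(256):
        a, b = decode_byte(value, codec_a), decode_byte(value, codec_b)
        if a is not None and b is not None and a != b:
            result.append(value)
    return result


# 1. The lower half (00 to 7F) decodes identically in all four code sets.
ascii_range_identical = all(
    len({decode_byte(v, c) for c in CODECS}) == 1 for v in range(128)
)
print("00-7F identical:", ascii_range_identical)

# 2. ISO 8859-15 replaces a few ISO 8859-1 characters (the euro at A4 among them).
latin9_diffs = differences("latin-1", "iso8859-15")
print("8859-1 vs 8859-15:", len(latin9_diffs), [f"{v:02X}" for v in latin9_diffs])

# 3. CP1252 assigns printable characters where ISO 8859-1 has C1 controls.
cp1252_diffs = differences("latin-1", "cp1252")
print("8859-1 vs CP1252:", len(cp1252_diffs),
      f"range {min(cp1252_diffs):02X}-{max(cp1252_diffs):02X}")
```

It should report that 00 to 7F decode identically, that ISO 8859-15 differs from ISO 8859-1 at 8 positions (A4, A6, A8, B4, B8, BC, BD, BE), and that CP1252 differs from ISO 8859-1 at 27 positions, all between 80 and 9F.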
Copyright © 1994 - 2024 ZyTrax, Inc. All rights reserved.
Page modified: January 20 2022.