Basic Latin (Unicode block)
| C0 Controls and Basic Latin | |
|---|---|
| Range | U+0000..U+007F (128 code points) |
| Plane | BMP |
| Scripts | Latin (52 char.), Common (76 char.) |
| Symbol sets | Arabic numerals, Punctuation |
| Major alphabets | English, French, Spanish, German, Vietnamese |
| Assigned | 128 code points (33 Control or Format) |
| Unused | 0 reserved code points |
| Source standards | ISO/IEC 8859, ISO 646 |
| Unicode version history | |
| 1.0.0 | 128 (+128) |
| Note: [1][2] | |
The Basic Latin or C0 Controls and Basic Latin Unicode block is the first block of the Unicode standard, and the only block whose code points are each encoded as a single byte in UTF-8. The block contains all the characters and control codes of the ASCII encoding.
The Basic Latin block has been included in its present form since version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.[3]
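The one-byte property can be checked directly. The following is a minimal Python sketch rather than anything taken from the Unicode Standard; the helper name `utf8_length` is purely illustrative.

```python
BASIC_LATIN = range(0x0000, 0x0080)  # U+0000..U+007F, 128 code points


def utf8_length(code_point):
    """Return the number of bytes UTF-8 uses for one code point."""
    return len(chr(code_point).encode("utf-8"))


# Every code point in the block occupies exactly one byte in UTF-8,
# and that byte has the same numeric value as the code point itself.
assert all(utf8_length(cp) == 1 for cp in BASIC_LATIN)
assert all(chr(cp).encode("utf-8") == bytes([cp]) for cp in BASIC_LATIN)

print(utf8_length(0x007F))  # last code point of Basic Latin: 1
print(utf8_length(0x0080))  # first code point after the block: 2
```

Because the byte value equals the code point, ASCII-only text is byte-for-byte identical when read as UTF-8.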
Table of characters
| Code | Result | Description | Acronym |
|---|---|---|---|
| C0 controls | |||
| U+0000 | | Null character | NUL |
| U+0001 | | Start of Heading | SOH |
| U+0002 | | Start of Text | STX |
| U+0003 | | End-of-text character | ETX |
| U+0004 | | End-of-transmission character | EOT |
| U+0005 | | Enquiry character | ENQ |
| U+0006 | | Acknowledge character | ACK |
| U+0007 | | Bell character | BEL |
| U+0008 | | Backspace | BS |
| U+0009 | | Horizontal tab | HT |
| U+000A | | Line feed | LF |
| U+000B | | Vertical tab | VT |
| U+000C | | Form feed | FF |
| U+000D | | Carriage return | CR |
| U+000E | | Shift Out | SO |
| U+000F | | Shift In | SI |
| U+0010 | | Data Link Escape | DLE |
| U+0011 | | Device Control 1 | DC1 |
| U+0012 | | Device Control 2 | DC2 |
| U+0013 | | Device Control 3 | DC3 |
| U+0014 | | Device Control 4 | DC4 |
| U+0015 | | Negative-acknowledge character | NAK |
| U+0016 | | Synchronous Idle | SYN |
| U+0017 | | End of Transmission Block | ETB |
| U+0018 | | Cancel character | CAN |
| U+0019 | | End of Medium | EM |
| U+001A | | Substitute character | SUB |
| U+001B | | Escape character | ESC |
| U+001C | | File Separator | FS |
| U+001D | | Group Separator | GS |
| U+001E | | Record Separator | RS |
| U+001F | | Unit Separator | US |
| ASCII punctuation and symbols | |||
| U+0020 | | Space | SP |
| U+0021 | ! | Exclamation mark | |
| U+0022 | " | Quotation mark | |
| U+0023 | # | Number sign | |
| U+0024 | $ | Dollar sign | |
| U+0025 | % | Percent sign | |
| U+0026 | & | Ampersand | |
| U+0027 | ' | Apostrophe | |
| U+0028 | ( | Left parenthesis | |
| U+0029 | ) | Right parenthesis | |
| U+002A | * | Asterisk | |
| U+002B | + | Plus sign | |
| U+002C | , | Comma | |
| U+002D | - | Hyphen-minus | |
| U+002E | . | Full stop | |
| U+002F | / | Slash | |
| ASCII digits | |||
| U+0030 | 0 | Digit Zero | |
| U+0031 | 1 | Digit One | |
| U+0032 | 2 | Digit Two | |
| U+0033 | 3 | Digit Three | |
| U+0034 | 4 | Digit Four | |
| U+0035 | 5 | Digit Five | |
| U+0036 | 6 | Digit Six | |
| U+0037 | 7 | Digit Seven | |
| U+0038 | 8 | Digit Eight | |
| U+0039 | 9 | Digit Nine | |
| ASCII punctuation and symbols | |||
| U+003A | : | Colon | |
| U+003B | ; | Semicolon | |
| U+003C | < | Less-than sign | |
| U+003D | = | Equal sign | |
| U+003E | > | Greater-than sign | |
| U+003F | ? | Question mark | |
| U+0040 | @ | At sign | |
| Uppercase Latin alphabet | |||
| U+0041 | A | Latin Capital letter A | |
| U+0042 | B | Latin Capital letter B | |
| U+0043 | C | Latin Capital letter C | |
| U+0044 | D | Latin Capital letter D | |
| U+0045 | E | Latin Capital letter E | |
| U+0046 | F | Latin Capital letter F | |
| U+0047 | G | Latin Capital letter G | |
| U+0048 | H | Latin Capital letter H | |
| U+0049 | I | Latin Capital letter I | |
| U+004A | J | Latin Capital letter J | |
| U+004B | K | Latin Capital letter K | |
| U+004C | L | Latin Capital letter L | |
| U+004D | M | Latin Capital letter M | |
| U+004E | N | Latin Capital letter N | |
| U+004F | O | Latin Capital letter O | |
| U+0050 | P | Latin Capital letter P | |
| U+0051 | Q | Latin Capital letter Q | |
| U+0052 | R | Latin Capital letter R | |
| U+0053 | S | Latin Capital letter S | |
| U+0054 | T | Latin Capital letter T | |
| U+0055 | U | Latin Capital letter U | |
| U+0056 | V | Latin Capital letter V | |
| U+0057 | W | Latin Capital letter W | |
| U+0058 | X | Latin Capital letter X | |
| U+0059 | Y | Latin Capital letter Y | |
| U+005A | Z | Latin Capital letter Z | |
| ASCII punctuation and symbols | |||
| U+005B | [ | Left Square Bracket | |
| U+005C | \ | Backslash [A] | |
| U+005D | ] | Right Square Bracket | |
| U+005E | ^ | Circumflex accent | |
| U+005F | _ | Low line | |
| U+0060 | ` | Grave accent | |
| Lowercase Latin alphabet | |||
| U+0061 | a | Latin Small Letter A | |
| U+0062 | b | Latin Small Letter B | |
| U+0063 | c | Latin Small Letter C | |
| U+0064 | d | Latin Small Letter D | |
| U+0065 | e | Latin Small Letter E | |
| U+0066 | f | Latin Small Letter F | |
| U+0067 | g | Latin Small Letter G | |
| U+0068 | h | Latin Small Letter H | |
| U+0069 | i | Latin Small Letter I | |
| U+006A | j | Latin Small Letter J | |
| U+006B | k | Latin Small Letter K | |
| U+006C | l | Latin Small Letter L | |
| U+006D | m | Latin Small Letter M | |
| U+006E | n | Latin Small Letter N | |
| U+006F | o | Latin Small Letter O | |
| U+0070 | p | Latin Small Letter P | |
| U+0071 | q | Latin Small Letter Q | |
| U+0072 | r | Latin Small Letter R | |
| U+0073 | s | Latin Small Letter S | |
| U+0074 | t | Latin Small Letter T | |
| U+0075 | u | Latin Small Letter U | |
| U+0076 | v | Latin Small Letter V | |
| U+0077 | w | Latin Small Letter W | |
| U+0078 | x | Latin Small Letter X | |
| U+0079 | y | Latin Small Letter Y | |
| U+007A | z | Latin Small Letter Z | |
| ASCII punctuation and symbols | |||
| U+007B | { | Left Curly Bracket | |
| U+007C | \| | Vertical bar | |
| U+007D | } | Right Curly Bracket | |
| U+007E | ~ | Tilde | |
| Control character | |||
| U+007F | | Delete | DEL |
- A The character U+005C (\) may be displayed as a yen or won sign by Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which the backslash was replaced by these signs.[4]
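The rows of this table can be reproduced from the character database that ships with Python. The sketch below is illustrative only: the function name `describe` and the `<control>` placeholder are assumptions, and the formal names it returns differ in places from the descriptions used in the table (for example, U+005C is formally REVERSE SOLIDUS).

```python
import unicodedata


def describe(code_point):
    """Return (code, result, formal name) for one code point."""
    char = chr(code_point)
    # Control characters have no formal name in the character database,
    # so fall back to a placeholder instead of raising ValueError.
    name = unicodedata.name(char, "<control>")
    # Only show the character itself when it is printable.
    result = char if char.isprintable() else ""
    return (f"U+{code_point:04X}", result, name)


for cp in (0x0000, 0x0041, 0x005C, 0x007F):
    print(describe(cp))
```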
Subheadings
The C0 Controls and Basic Latin block contains six subheadings.[5]
C0 controls
The C0 Controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.[5]
ASCII punctuation and symbols
This subheading covers standard punctuation characters, simple mathematical operators, and symbols such as the dollar sign, percent sign, ampersand, underscore, and vertical bar.[5]
ASCII digits
The ASCII Digits subheading contains the ten standard European digits, 0–9.[5]
Uppercase Latin alphabet
The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in majuscule (uppercase) form.[5]
Lowercase Latin alphabet
The Lowercase Latin Alphabet subheading contains the standard 26-letter unaccented Latin alphabet in minuscule (lowercase) form.[5]
Control character
The Control Character subheading contains the "Delete" character.[5]
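The six subheadings partition the block into contiguous ranges, with ASCII punctuation and symbols split across four of them. As a rough sketch, assuming the ranges shown in the table of characters, a code point can be mapped to its subheading as follows; the names `SUBHEADINGS` and `subheading` are hypothetical.

```python
from collections import Counter

# (first, last, subheading) triples in code point order,
# copied from the ranges in the table of characters.
SUBHEADINGS = [
    (0x00, 0x1F, "C0 controls"),
    (0x20, 0x2F, "ASCII punctuation and symbols"),
    (0x30, 0x39, "ASCII digits"),
    (0x3A, 0x40, "ASCII punctuation and symbols"),
    (0x41, 0x5A, "Uppercase Latin alphabet"),
    (0x5B, 0x60, "ASCII punctuation and symbols"),
    (0x61, 0x7A, "Lowercase Latin alphabet"),
    (0x7B, 0x7E, "ASCII punctuation and symbols"),
    (0x7F, 0x7F, "Control character"),
]


def subheading(code_point):
    """Return the subheading that a Basic Latin code point falls under."""
    for first, last, name in SUBHEADINGS:
        if first <= code_point <= last:
            return name
    raise ValueError(f"U+{code_point:04X} is outside the Basic Latin block")


# Tally how many of the 128 code points fall under each subheading.
counts = Counter(subheading(cp) for cp in range(0x80))
for name, count in counts.items():
    print(f"{name}: {count}")
```

The tally gives 32 C0 controls, 33 punctuation and symbol characters (including the space), 10 digits, 26 uppercase and 26 lowercase letters, and the single Delete character, for 128 in total.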
Compact table
C0 Controls and Basic Latin[1] (official Unicode Consortium code chart, PDF)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+000x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| U+001x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| U+002x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| U+003x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| U+004x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| U+005x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| U+006x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| U+007x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
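The compact table arranges the block as eight rows of sixteen columns, so a cell's row label comes from the high bits of the code point and its column from the low four bits. A small sketch of that arithmetic, with the hypothetical helper name `locate`:

```python
def locate(code_point):
    """Return the (row label, column label) of a code point in the grid."""
    if not 0 <= code_point <= 0x7F:
        raise ValueError("code point is outside the Basic Latin block")
    # Row index is the code point divided by 16; column is the remainder.
    row, column = divmod(code_point, 16)
    return (f"U+{row:03X}x", f"{column:X}")


print(locate(0x0041))  # ('U+004x', '1'), the cell holding A
print(locate(0x007F))  # ('U+007x', 'F'), the cell holding DEL
```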
Emoji
The Basic Latin block contains twelve emoji: U+0023, U+002A and U+0030–U+0039.[6][7] These are the base characters of the keycap sequences, for example #️⃣ (U+0023 NUMBER SIGN, U+FE0F VS16, U+20E3 COMBINING ENCLOSING KEYCAP).
A standardized variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0︀).
The block has 24 standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the following twelve base characters: U+0023, U+002A and U+0030–U+0039.[8]
All of these base characters default to a text presentation.
| U+ | 0023 | 002A | 0030 | 0031 | 0032 | 0033 | 0034 | 0035 | 0036 | 0037 | 0038 | 0039 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| base code point | # | * | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| base+VS15 (text) | #︎ | *︎ | 0︎ | 1︎ | 2︎ | 3︎ | 4︎ | 5︎ | 6︎ | 7︎ | 8︎ | 9︎ |
| base+VS16 (emoji) | #️ | *️ | 0️ | 1️ | 2️ | 3️ | 4️ | 5️ | 6️ | 7️ | 8️ | 9️ |
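The sequences described in this section are plain concatenations of code points. The sketch below builds them in Python; the constant and function names (`VS15`, `VS16`, `KEYCAP`, `keycap`, `code_points`) are illustrative choices, not identifiers from any Unicode library.

```python
VS15 = "\uFE0E"    # VARIATION SELECTOR-15, requests text presentation
VS16 = "\uFE0F"    # VARIATION SELECTOR-16, requests emoji presentation
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

# The twelve base characters: U+0023, U+002A and U+0030..U+0039.
BASES = tuple("#*0123456789")


def keycap(base):
    """Build the keycap sequence: base, then VS16, then the keycap mark."""
    if base not in BASES:
        raise ValueError(f"{base!r} is not a keycap base character")
    return base + VS16 + KEYCAP


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The 24 presentation variants: each base with VS15 and with VS16.
variants = [base + vs for base in BASES for vs in (VS15, VS16)]

print(code_points(keycap("#")))  # U+0023 U+FE0F U+20E3
print(len(variants))             # 24
```

Whether a given sequence is actually drawn as an emoji or as text depends on the font and rendering system, which this sketch does not model.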
References
- "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
- "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
- The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
- "When is a backslash not a backslash?". Sorting it all Out.
- "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 2013-04-01.
- "UTR #51: Unicode Emoji". Unicode Consortium. 2016-06-03.
- "UCD: Emoji Data for UTR #51". Unicode Consortium. 2016-06-02.
- "Unicode Character Database: Standardized Variation Sequences". The Unicode Consortium.