
Using Unicode: Difference between revisions

From dankwiki
Line 1: Line 1:
Unicode 14.0 is scheduled for release September 2021. There is no "Unicode 13.1", but Emoji 13.1 was released in September 2020 under the auspices of the Unicode Consortium. Unicode 13.0 was released in March 2020.
Unicode 14.0 was released September 14, 2021. There is no "Unicode 13.1", but Emoji 13.1 was released in September 2020 under the auspices of the Unicode Consortium. Unicode 13.0 was released in March 2020.


Good references include:
Line 39: Line 39:
* Monospace (U1D7F6+): 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿
* Seven-segment (U1FBF0+): 🯰🯱🯲🯳🯴🯵🯶🯷🯸🯹
==UTF-8==
The One True Encoding, almost always. See [https://www.ietf.org/rfc/rfc3629.txt RFC 3629] and Annex D of ISO/IEC 10646.
UTF-8 encodes each codepoint using one to four bytes. ASCII characters (all codepoints less than 0x80) are encoded directly as a single byte. The four-byte maximum arises from RFC 3629 §3, which defines UTF-8 only for codepoints through U+10FFFF (sufficient for the 17 Planes defined as of Unicode 14); if the older 10646 maximum of U+7FFFFFFF were considered, UTF-8 would need up to six bytes per codepoint.
The 2048 codepoints U+D800 through U+DFFF cannot be encoded in UTF-8; they are surrogates, reserved for use with UTF-16.
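The octets C0, C1, and F5 through FF never appear in valid UTF-8: C0 and C1 could only begin overlong two-byte encodings of ASCII, and F5 through FF would either begin sequences beyond U+10FFFF or are not lead bytes at all. Octets less than 0x80 always stand for ASCII characters; they never show up as parts of multibyte sequences.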
Octets of the form 10xxxxxx are continuation bytes, and can only be found following a valid lead byte, possibly with other continuation bytes of the same sequence in between.
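
The rules above can be sketched in C. This is a minimal illustration, not a complete validator: the names <code>utf8_encode</code> and <code>utf8_seq_len</code> are hypothetical (they come from no library), and <code>utf8_seq_len</code> looks only at the lead byte, so it does not reject overlong or out-of-range sequences whose lead byte is itself legal (e.g. E0 80 80 or F4 90 80 80).

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: encode codepoint cp as UTF-8 into out, which must
// have room for 4 bytes. Returns the number of bytes written (1-4), or 0
// if cp cannot be encoded: a surrogate (U+D800-U+DFFF) or beyond U+10FFFF.
size_t utf8_encode(uint32_t cp, unsigned char out[4]){
  if(cp < 0x80){ // ASCII: a single byte, unchanged
    out[0] = (unsigned char)cp;
    return 1;
  }
  if(cp < 0x800){ // 110xxxxx 10xxxxxx
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if(cp < 0x10000){ // 1110xxxx 10xxxxxx 10xxxxxx
    if(cp >= 0xD800 && cp <= 0xDFFF){
      return 0; // surrogates are for UTF-16 only
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if(cp <= 0x10FFFF){ // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0; // beyond the RFC 3629 range
}

// Hypothetical helper: length of the sequence introduced by lead byte b
// (1-4), or 0 if b cannot begin a sequence.
size_t utf8_seq_len(unsigned char b){
  if(b < 0x80){
    return 1; // ASCII
  }
  if(b < 0xC2){
    return 0; // 80-BF: continuation bytes; C0, C1: never valid
  }
  if(b < 0xE0){
    return 2;
  }
  if(b < 0xF0){
    return 3;
  }
  if(b < 0xF5){
    return 4;
  }
  return 0; // F5-FF: never valid
}
```

For example, the seven-segment zero U+1FBF0 encodes to the four bytes F0 9F AF B0, while U+D800 and 0x110000 are both rejected.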


== [[libc]] ==