(Z)String UTF8 Aware

General FreeBASIC programming questions.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands

Re: (Z)String UTF8 Aware

Post by marcov

Munair wrote: For example, UTF-8 is 100% ASCII compatible
Factually incorrect. ASCII text can be split at any byte index and both parts will still be valid characters; UTF-8 text cannot.
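A minimal Python sketch of the split argument. It assumes the text is handled as raw bytes, the way a ZString holds it, and it uses Python's strict UTF-8 decoder as the validity test. The helper name `splits_are_valid` and the sample strings are illustrative only.

```python
def splits_are_valid(data: bytes) -> list[bool]:
    """For every cut position, report whether both halves decode as strict UTF-8."""
    results = []
    for cut in range(len(data) + 1):
        try:
            data[:cut].decode("utf-8")
            data[cut:].decode("utf-8")
            results.append(True)
        except UnicodeDecodeError:
            results.append(False)
    return results

ascii_text = "Hello".encode("ascii")
utf8_text = "h\u00e9llo".encode("utf-8")  # U+00E9 (e-acute) becomes two bytes: C3 A9

print(all(splits_are_valid(ascii_text)))  # True: every cut is safe
print(splits_are_valid(utf8_text))        # [True, True, False, True, True, True, True]
```

Only the cut at index 2, between the bytes C3 and A9, produces two invalid halves.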
Munair
Posts: 1286
Joined: Oct 19, 2017 15:00
Location: Netherlands

Post by Munair

marcov wrote:
Munair wrote: For example, UTF-8 is 100% ASCII compatible
Factually incorrect. ASCII text can be split at any byte index and both parts will still be valid characters; UTF-8 text cannot.
I think you know what I mean.

Post by marcov

Munair wrote:
marcov wrote:
Munair wrote: For example, UTF-8 is 100% ASCII compatible
Factually incorrect. ASCII text can be split at any byte index and both parts will still be valid characters; UTF-8 text cannot.
I think you know what I mean.
To be honest, I do not. UTF-16 also represents the ASCII range (0 to 127) with the same values, one per 16-bit character.

Post by Munair

marcov wrote: To be honest, I do not. UTF-16 also represents the ASCII range (0 to 127) with the same values, one per 16-bit character.
Yes, but UTF-16 uses 16-bit code units, one or two per character (a word or a double word), whereas UTF-8 uses 8-bit code units, one to four per character (a single byte up to a double word). Even though &h0041 holds a valid ASCII value, the word would have to be cast to a byte, which, strictly speaking, is not compatible. Worse, because of UTF-16 endianness, the same character may appear as &h4100 when read in the other byte order, which fails altogether. Simply put, one cannot guarantee that UTF-16 encoded text is valid ASCII.

In short: "UTF-8 requires either 8, 16, 24 or 32 bits (one to four octets) to encode a Unicode character, UTF-16 requires either 16 or 32 bits to encode a character, and UTF-32 always requires 32 bits to encode a character."
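The quoted summary, and the byte order point above, can be checked with Python's built-in codecs. This is a minimal sketch; it assumes the big-endian codec names (`utf-16-be`, `utf-32-be`) so that no byte order mark is counted, and the helper `byte_sizes` and the sample characters are illustrative.

```python
CODECS = ("utf-8", "utf-16-be", "utf-32-be")  # -be variants: no byte order mark added

def byte_sizes(ch: str) -> tuple:
    """Bytes one character needs in UTF-8, UTF-16 and UTF-32, in that order."""
    return tuple(len(ch.encode(codec)) for codec in CODECS)

# 'A', e-acute (U+00E9), euro sign (U+20AC), an emoji outside the BMP (U+1F600)
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", byte_sizes(ch))

# Byte order matters for UTF-16: the code unit 0x0041 ('A') is laid out differently.
print("A".encode("utf-16-le").hex())  # 4100
print("A".encode("utf-16-be").hex())  # 0041
```

The loop prints (1, 2, 4) for A, (2, 2, 4) for U+00E9, (3, 2, 4) for U+20AC and (4, 4, 4) for U+1F600, matching the one to four octets for UTF-8, 16 or 32 bits for UTF-16, and a fixed 32 bits for UTF-32.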
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Post by jj2007

Munair wrote: UTF-8 is 100% ASCII compatible
To clarify: the ASCII (http://www.asciitable.com/) charset covers the codes 0 to 127, and UTF-8 encodes each of them as a single byte with exactly the same value. So YES, UTF-8 is 100% ASCII compatible.
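A quick Python check of that statement, as a sketch using only the built-in codecs: every code from 0 to 127 encodes to the identical single byte in UTF-8, and the first code point beyond ASCII already needs two bytes.

```python
# Every ASCII code (0..127) encodes in UTF-8 as one byte with the same value.
ascii_identical = all(chr(code).encode("utf-8") == bytes([code]) for code in range(128))
print(ascii_identical)  # True

# Decoding the full ASCII range gives the same text under either codec.
all_ascii = bytes(range(128))
print(all_ascii.decode("ascii") == all_ascii.decode("utf-8"))  # True

# U+0080, the first code point past ASCII, takes two bytes in UTF-8.
print(chr(128).encode("utf-8").hex())  # c280
```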
caseih
Posts: 2158
Joined: Feb 26, 2007 5:32

Post by caseih

Right. ASCII text is valid UTF-8, and for code points below 128 the UTF-8 bytes are identical to ASCII. The same cannot be said for UTF-16.
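One practical consequence for a (Z)String, sketched in Python: assuming a ZString behaves like a C string, whose length is the number of bytes before the first zero byte, UTF-8 text keeps its ASCII bytes intact, while UTF-16 puts a zero byte in every ASCII code unit. The helper `zstring_length` is hypothetical and only mimics that terminator rule.

```python
def zstring_length(buf: bytes) -> int:
    """Hypothetical: length as a NUL-terminated string sees it (bytes before the first 0)."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end

text = "Hi"
as_utf8 = text.encode("utf-8")       # b'Hi': the same bytes as ASCII
as_utf16 = text.encode("utf-16-le")  # b'H\x00i\x00': a 0 byte in every code unit

print(as_utf8 == text.encode("ascii"))  # True
print(zstring_length(as_utf8))          # 2
print(zstring_length(as_utf16))         # 1: the terminator check stops after 'H'
```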