UTF-8 Variable Length String Library

Postby jj2007 » Dec 23, 2017 18:31

Munair wrote:- Added: UReverse()
You inspired me, thanks ;-)

Postby marcov » Dec 23, 2017 18:38

Munair wrote: The code that ICU develops to cover all rules is some 15MB in size. It is easiest to use composed characters, but unfortunately the use of separate diacritical marks is encouraged.


And AFAIK on OS X, the system APIs expect and return decomposed (denormalized) strings.
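To make the composed versus decomposed distinction concrete, here is a minimal sketch in FreeBASIC. It only compares raw UTF-8 bytes and does no normalization; the variable names are made up for illustration and are not part of the library discussed in this thread.

```freebasic
' "é" as one precomposed code point: U+00E9 -> bytes C3 A9
dim composed as string = chr(&hC3, &hA9)

' "é" as base letter plus combining mark: U+0065 U+0301 -> bytes 65 CC 81
dim decomposed as string = "e" + chr(&hCC, &h81)

' Both render as the same character, but the byte strings differ,
' so a plain string comparison treats them as different text.
print "composed bytes:   "; len(composed)
print "decomposed bytes: "; len(decomposed)

if composed = decomposed then
  print "equal"
else
  print "different"
end if
```

A byte-wise comparison like this reports the two spellings as different, which is why text from a source that returns decomposed strings usually has to be normalized before it is compared or searched.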

Postby Munair » Jan 02, 2018 22:01

With the recent discussion of how GUI libraries targeting multiple platforms usually lead to bloatware, it may be interesting to note that today's Unicode support and conversion between code pages can easily 'blow up' an executable to 230 kB. Here is a piece of code that I've been testing, demonstrating some common tasks with UTF-8:

#include once "encodings.bi"

dim buffer as string
dim s as string

' read the whole UTF-32 file as raw bytes; LINE INPUT would stop at the
' first byte that looks like a line break, which can occur inside UTF-32
open "textfile.txt" for binary access read as #1
  buffer = space(lof(1))
  get #1, , buffer
close #1

' decode the raw bytes to UTF-8
s = Encodings.Decode(buffer)
if Encodings.Invalid then
  print "Binary file."
  end
end if

' convert UTF-8 to UTF-16 and write to file with a big-endian BOM;
' the trailing semicolon keeps PRINT from appending a single-byte newline
buffer = Encodings.EncodeUTF16BE(s)
open "textfile16.txt" for output as #1
  print #1, chr(&hFE, &hFF) + buffer;
close #1

' convert UTF-8 to UTF-32 and write to file with a big-endian BOM
buffer = Encodings.EncodeUTF32BE(s)
open "textfile32.txt" for output as #1
  print #1, chr(&h0, &h0, &hFE, &hFF) + buffer;
close #1
end

#include once "encodings.bas"
Those of us in the West, accustomed as we are to good old plain ASCII, may not always realize it, but Unicode has become the standard and should be supported by any application that exchanges text.

Postby Iczer » Jan 16, 2018 16:43

About converting an ASCII string into a normal UTF-8 string: what can be done with numeric character references such as "&#nnnn;" or "&#xhhhh;", where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form?

Postby marcov » Jan 16, 2018 18:01

Iczer wrote: About converting an ASCII string into a normal UTF-8 string: what can be done with numeric character references such as "&#nnnn;" or "&#xhhhh;", where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form?


That form is not plain ASCII text but a document-specific escape sequence (as in HTML). You would need a parser for the relevant document format to translate it into proper UTF-8.
Last edited by marcov on Jan 16, 2018 19:14, edited 1 time in total.

Postby jj2007 » Jan 16, 2018 18:43

Iczer wrote: About converting an ASCII string into a normal UTF-8 string: what can be done with numeric character references such as "&#nnnn;" or "&#xhhhh;", where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form?

You can parse the string and translate the sequences to their UTF-8 or UTF-16 equivalents. Can you provide an example for testing?
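As a rough illustration of that parsing approach, here is a minimal FreeBASIC sketch. The function names ParseDigits, CodePointToUTF8 and DecodeNumericRefs are hypothetical and not part of the library from this thread. It assumes only numeric references of the form "&#nnnn;" and "&#xhhhh;" need handling (no named entities such as "&amp;") and copies anything it cannot parse through unchanged.

```freebasic
' Parse digits in the given radix (10 or 16). Returns -1 if the text is
' empty, longer than 8 digits, or contains an invalid digit.
function ParseDigits(byref digits as const string, byval radix as integer) as longint
  if len(digits) = 0 or len(digits) > 8 then return -1
  dim total as longint = 0
  for i as integer = 0 to len(digits) - 1
    dim d as integer
    select case digits[i]
      case asc("0") to asc("9") : d = digits[i] - asc("0")
      case asc("a") to asc("f") : d = digits[i] - asc("a") + 10
      case asc("A") to asc("F") : d = digits[i] - asc("A") + 10
      case else : return -1
    end select
    if d >= radix then return -1
    total = total * radix + d
  next
  return total
end function

' Encode one code point (assumed valid by the caller) as UTF-8 bytes.
function CodePointToUTF8(byval cp as ulong) as string
  if cp < &h80 then
    return chr(cp)
  elseif cp < &h800 then
    return chr(&hC0 or (cp shr 6), &h80 or (cp and &h3F))
  elseif cp < &h10000 then
    return chr(&hE0 or (cp shr 12), &h80 or ((cp shr 6) and &h3F), _
               &h80 or (cp and &h3F))
  end if
  return chr(&hF0 or (cp shr 18), &h80 or ((cp shr 12) and &h3F), _
             &h80 or ((cp shr 6) and &h3F), &h80 or (cp and &h3F))
end function

' Replace every valid numeric character reference in src by its UTF-8 bytes.
function DecodeNumericRefs(byref src as const string) as string
  dim result as string
  dim i as integer = 1
  while i <= len(src)
    dim cp as longint = -1
    dim semi as integer = 0
    if mid(src, i, 2) = "&#" then
      semi = instr(i + 2, src, ";")
      if semi > 0 then
        dim body as string = mid(src, i + 2, semi - i - 2)
        if lcase(left(body, 1)) = "x" then
          cp = ParseDigits(mid(body, 2), 16)
        else
          cp = ParseDigits(body, 10)
        end if
      end if
    end if
    ' accept only code points in range and outside the surrogate block
    if cp >= 0 andalso cp <= &h10FFFF andalso (cp < &hD800 orelse cp > &hDFFF) then
      result += CodePointToUTF8(culng(cp))
      i = semi + 1
    else
      result += mid(src, i, 1)   ' not a valid reference: copy the byte as-is
      i += 1
    end if
  wend
  return result
end function

' Show the resulting bytes in hex so the output does not depend on the console
dim utf8 as string = DecodeNumericRefs("caf&#233; &#x20AC;")
for i as integer = 0 to len(utf8) - 1
  print hex(utf8[i], 2); " ";
next
print
```

For the sample input this should print the bytes 63 61 66 C3 A9 20 E2 82 AC, that is "café €" in UTF-8. Going to UTF-16 instead would only require swapping out CodePointToUTF8 for an encoder that emits surrogate pairs above U+FFFF.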
