Munair wrote:
- Added: UReverse()
You inspired me, thanks ;-)
UTF-8 Variable Length String Library
Re: UTF-8 Variable Length String Library
Munair wrote:
The code that ICU develops to cover all rules is some 15MB in size. It is easiest to use composed characters, but unfortunately the use of separate diacritical marks is encouraged.
And AFAIK on OS X, the system APIs expect and return decomposed strings.
Re: UTF-8 Variable Length String Library
With the recent discussion of how GUI libraries targeting multiple platforms usually lead to bloatware, it may be interesting to note that today's Unicode and support for converting between code pages easily 'blow up' an executable to 230kB.
It may not be something that some of us westerners realize, accustomed as we are to good old plain ASCII, but Unicode has become a standard and should be supported by any application that exchanges text. Here is a piece of code I've been testing that demonstrates some common UTF-8 tasks:
Code:
#include once "encodings.bi"
dim buffer as string
dim s as string
' read the whole input file (here UTF-32) as raw bytes; line input would
' stop at the first line break and is not meant for multi-byte encodings
open "textfile.txt" for binary access read as #1
buffer = space(lof(1))
get #1, , buffer
close #1
' decode to UTF-8
s = Encodings.Decode(buffer)
if Encodings.Invalid then
	print "Binary file."
	end
end if
' convert UTF-8 to UTF-16 (big endian) and write it to a file with a BOM;
' the trailing semicolon stops print from appending an unencoded line break
buffer = Encodings.EncodeUTF16BE(s)
open "textfile16.txt" for output as #1
print #1, chr(&hFE, &hFF) + buffer;
close #1
' convert UTF-8 to UTF-32 (big endian) and write it to a file with a BOM
buffer = Encodings.EncodeUTF32BE(s)
open "textfile32.txt" for output as #1
print #1, chr(&h0, &h0, &hFE, &hFF) + buffer;
close #1
end
#include once "encodings.bas"
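The example writes the byte order marks by hand (FE FF for UTF-16BE, 00 00 FE FF for UTF-32BE). Going the other way, a reader can look at the first bytes of a file to see which encoding a BOM announces. A minimal FreeBASIC sketch; DetectBOM is a hypothetical helper, not part of encodings.bi, and a missing BOM tells you nothing about the encoding:

```freebasic
' Hypothetical helper, not part of encodings.bi: returns the name of the
' encoding announced by a byte order mark at the start of buffer,
' or "none" when no known BOM is present.
function DetectBOM(byref buffer as const string) as string
	' UTF-32 must be tested before UTF-16: FF FE 00 00 also starts with FF FE
	if left(buffer, 4) = chr(&h00, &h00, &hFE, &hFF) then return "UTF-32BE"
	if left(buffer, 4) = chr(&hFF, &hFE, &h00, &h00) then return "UTF-32LE"
	if left(buffer, 3) = chr(&hEF, &hBB, &hBF) then return "UTF-8"
	if left(buffer, 2) = chr(&hFE, &hFF) then return "UTF-16BE"
	if left(buffer, 2) = chr(&hFF, &hFE) then return "UTF-16LE"
	return "none"
end function

print DetectBOM(chr(&hFE, &hFF, &h00, &h41))   ' UTF-16BE
```

The order of the tests is the one design choice that matters: a UTF-32LE mark begins with the same two bytes as a UTF-16LE mark, so the longer marks are checked first.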
Re: UTF-8 Variable Length String Library
About converting an ASCII string into a normal UTF-8 string: what can be done with numeric character references such as "&#nnnn;" or "&#xhhhh;",
where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form?
Re: UTF-8 Variable Length String Library
Iczer wrote:
About converting an ASCII string into a normal UTF-8 string: what can be done with numeric character references such as "&#nnnn;" or "&#xhhhh;", where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form?
That form is not ASCII but a document-specific escape sequence (like in HTML). You would need an interpreter for the relevant document format to translate it into proper UTF-8.
Last edited by marcov on Jan 16, 2018 19:14, edited 1 time in total.
Re: UTF-8 Variable Length String Library
Iczer wrote:
About converting an ASCII string into a normal UTF-8 string: what can be done with numeric character references such as "&#nnnn;" or "&#xhhhh;", where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form?
You can parse the string and translate the sequences to their UTF-8 or UTF-16 equivalents. Can you provide an example for testing?
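A minimal FreeBASIC sketch of that parsing step, producing UTF-8. DecodeNCR and CodePointToUTF8 are hypothetical names, not part of the library. It assumes the rest of the input is plain ASCII or already UTF-8, handles only numeric references (named entities such as &amp; are left alone) and copies anything malformed, any surrogate and anything above U+10FFFF through unchanged:

```freebasic
' Hypothetical helpers (not part of the UTF-8 library in this thread).

' Encode one code point, assumed valid (<= &h10FFFF, not a surrogate), as UTF-8.
function CodePointToUTF8(byval cp as ulong) as string
	if cp < &h80 then
		return chr(cp)
	elseif cp < &h800 then
		return chr(&hC0 or (cp shr 6), &h80 or (cp and &h3F))
	elseif cp < &h10000 then
		return chr(&hE0 or (cp shr 12), &h80 or ((cp shr 6) and &h3F), &h80 or (cp and &h3F))
	end if
	return chr(&hF0 or (cp shr 18), &h80 or ((cp shr 12) and &h3F), _
	           &h80 or ((cp shr 6) and &h3F), &h80 or (cp and &h3F))
end function

' Replace "&#nnnn;" (decimal) and "&#xhhhh;" (hex) references with UTF-8.
function DecodeNCR(byref s as const string) as string
	dim as string result, digits
	dim as integer i = 1, semi, k, c, d, radix, ok
	dim as ulong cp
	while i <= len(s)
		semi = 0
		if mid(s, i, 2) = "&#" then semi = instr(i + 2, s, ";")
		if semi > 0 then
			digits = mid(s, i + 2, semi - i - 2)
			radix = 10
			if lcase(left(digits, 1)) = "x" then
				radix = 16
				digits = mid(digits, 2)
			end if
			' at most 7 digits keeps cp well inside a ulong
			ok = 0
			if len(digits) > 0 andalso len(digits) <= 7 then ok = -1
			cp = 0
			for k = 1 to len(digits)
				c = asc(digits, k)
				select case c
				case 48 to 57      ' 0-9
					d = c - 48
				case 65 to 70      ' A-F
					d = c - 55
				case 97 to 102     ' a-f
					d = c - 87
				case else
					d = 99         ' not a digit in any radix
				end select
				if d >= radix then
					ok = 0
					exit for
				end if
				cp = cp * radix + d
			next
			if ok andalso cp <= &h10FFFF andalso (cp < &hD800 orelse cp > &hDFFF) then
				result &= CodePointToUTF8(cp)
				i = semi + 1
				continue while
			end if
		end if
		' not a usable reference: copy this byte and move on
		result &= mid(s, i, 1)
		i += 1
	wend
	return result
end function

print DecodeNCR("&#72;&#x69;!")   ' Hi!
```

For example, "caf&#233;" becomes "caf" followed by the bytes C3 A9 (é in UTF-8). For UTF-16 output only CodePointToUTF8 would need replacing; the scanning stays the same.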
Re: UTF-8 Variable Length String Library
The links in the first post have been updated.
Re: UTF-8 Variable Length String Library
Moved out of Archive since the code is being updated again.
Re: UTF-8 Variable Length String Library
Thanks Imortis!
Re: UTF-8 Variable Length String Library
Munair wrote:
One routine that's missing is the UMid statement. Perhaps someone can help out there, but I doubt there is a real need for it. Meanwhile, you're welcome to test, use and improve the code. MLGPL License included. ;)
UMid is copy() in Pascal? But that is a built-in; for strings it is probably in astrings.inc/ustrings.inc. For UTF-8 with character indexing that is harder; perhaps LazUtils?
Re: UTF-8 Variable Length String Library
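marcov wrote:
UMid is copy() in Pascal? But that is a built-in; for strings it is probably in astrings.inc/ustrings.inc. For UTF-8 with character indexing that is harder; perhaps LazUtils?
(Free)BASIC has the functions Left(), Right() and Mid(), which cover what Pascal's Copy() does, but also a Mid statement that, for convenience, directly overwrites a specific part of the string in place:
Code:
dim s as string = "abcdefghij"
dim text as string = "12345"
' function: returns 3 characters starting at position 2
dim part as string = mid(s, 2, 3)   ' "bcd"
' statement: overwrites 5 characters starting at position 4
mid(s, 4, 5) = text                 ' s is now "abc12345ij"
' the statement never changes the length of s; for a 5-character text
' (and len(s) >= 8) it is equivalent to:
s = left(s, 3) + text + mid(s, 9)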
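For UTF-8 the tricky part of a UMid statement is that character positions no longer match byte positions, so the byte offsets have to be found first. A minimal sketch of one way to do it, assuming valid UTF-8 and treating each code point as one character (a combining mark counts separately). UMidAssign, UTF8BytePos and UTF8Len are hypothetical names, not the library's API. It follows the Mid statement's rules: the character count of the text never changes, and at most length characters are overwritten, limited by the replacement and by the end of the text:

```freebasic
' Hypothetical sketch, not the library's UMid.

' Byte position (1-based) of character number charPos in s,
' or len(s) + 1 if s has fewer characters.
function UTF8BytePos(byref s as const string, byval charPos as integer) as integer
	dim as integer i = 1, count = 0
	while i <= len(s)
		' bytes of the form 10xxxxxx continue a character; all others start one
		if (s[i - 1] and &hC0) <> &h80 then
			count += 1
			if count = charPos then return i
		end if
		i += 1
	wend
	return len(s) + 1
end function

' Number of characters (code points) in s.
function UTF8Len(byref s as const string) as integer
	dim as integer count = 0
	for i as integer = 0 to len(s) - 1
		if (s[i] and &hC0) <> &h80 then count += 1
	next
	return count
end function

' Overwrite up to length characters of text, starting at character start,
' with the leading characters of replacement. The byte count may change.
sub UMidAssign(byref text as string, byval start as integer, byval length as integer, byref replacement as const string)
	dim as integer textLen = UTF8Len(text)
	if start < 1 orelse start > textLen then exit sub
	dim as integer n = length
	if n > UTF8Len(replacement) then n = UTF8Len(replacement)
	if n > textLen - start + 1 then n = textLen - start + 1
	if n <= 0 then exit sub
	' byte range of the characters being replaced
	dim as integer firstByte = UTF8BytePos(text, start)
	dim as integer afterByte = UTF8BytePos(text, start + n)
	' bytes of the first n characters of replacement
	dim as string piece = left(replacement, UTF8BytePos(replacement, n + 1) - 1)
	text = left(text, firstByte - 1) + piece + mid(text, afterByte)
end sub

dim as string sample = "na" + chr(&hC3, &hAF) + "ve"   ' "naïve" in UTF-8
UMidAssign(sample, 3, 1, "i")
print sample   ' naive
```

On pure ASCII strings this gives the same results as the built-in Mid statement; the cost is that each call scans the string from the start to find the byte offsets.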
Re: UTF-8 Variable Length String Library
Munair wrote:
I don't know of a Pascal equivalent for the Mid statement.
Ah yes, there is also the left-hand-side version. No, there is nothing in that form, nor that syntax (a function/built-in on the left-hand side of =).
Re: UTF-8 Variable Length String Library
utf8.bi(30) error 4: Duplicated definition in 'type PChar as zstring ptr'
Commenting out this line gets rid of the error.
Re: UTF-8 Variable Length String Library
When compiling examples.bas or utf8.bas (both including utf8.bi) I'm not getting this error. As far as I can see there's no duplicate definition of PChar. Did you include utf8.bi in another file/project?
I recommend against commenting out definitions in the library as it could break library functionality. Try to find where else PChar is defined in your own file/project instead. You may also get this error if you try to include UTF8.bi more than once (e.g. without the once keyword).
Re: UTF-8 Variable Length String Library
PCHAR is a Windows data type, so if windows.bi is included, PCHAR is already defined.