Automatic conversion WSTRING to (Z)STRING

Juergen Kuehlwein · Post by **Juergen Kuehlwein** » May 05, 2019 13:41

FB performs automatic conversion when assigning a WSTRING to a STRING or ZSTRING, therefore I can code:

DIM s AS STRING
DIM w as WSTRING * 16

w = "Hello"
s = w
w = s
print s
print w

On Windows, the automatic conversion performed here is "wide string to ANSI" and vice versa.

Is this the same on Linux ("wide string <-> ANSI"), or is the automatic conversion on Linux "wide string <-> UTF-8"?

JK

badidea · Post by **badidea** » May 05, 2019 16:08

I don't know anything about WSTRING and UTF.
What I can say is that this code seems to work fine on my Linux laptop:

DIM s AS STRING
DIM w as WSTRING * 16

w = "H€llΩ"
s = w
w = s
print s, len(s)
print w, len(w)

Result:
H€llΩ 8
H€llΩ 5

MrSwiss · Post by **MrSwiss** » May 05, 2019 16:20

badidea wrote:
Result:
H€llΩ 8
H€llΩ 5

As seen above, len(String) returns 8 (UBytes) while len(WString) returns 5 (UShorts?),
which indicates that *nix systems use UTF-8 (as standard), a very different
system when looking at it from a DOS/Windows perspective (ASCII/ANSI) ...
(Btw. I'm not certain that WString on *nix isn't possibly 32 bits (ULong), i.e. UTF-32.)

The only part common to all those systems is 0 to 127, which never changes!
(as described in the lower half of the ASCII table)
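A minimal FreeBASIC sketch of that claim, assuming the current ANSI codepage (Windows) or locale (Linux) is ASCII-compatible, which is the usual case but is not verified here for every codepage: it builds a WSTRING holding every code point from 1 to 127, converts it to a STRING and back through the implicit conversions, and counts mismatches. The variable names are illustrative only.

```freebasic
' Sketch: check that code points 1..127 survive the implicit
' WSTRING -> STRING -> WSTRING conversions unchanged.
' Assumption: the active codepage / locale is ASCII-compatible.
Dim As WString * 128 w, back   ' 127 chars + terminating null
Dim As String s
Dim As Integer i, mismatches = 0

For i = 1 To 127
    w &= WChr(i)               ' append code point i
Next

s = w                          ' implicit WSTRING -> STRING
back = s                       ' implicit STRING -> WSTRING

For i = 1 To 127
    ' Asc() is 1-based: byte value for STRING, code point for WSTRING
    If Asc(s, i) <> i Then mismatches += 1
    If Asc(back, i) <> i Then mismatches += 1
Next

Print "Len(s) ="; Len(s); ", Len(back) ="; Len(back); ", mismatches ="; mismatches
```

If this prints a non-zero mismatch count, the active codepage or locale is not ASCII-compatible in that range.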

Juergen Kuehlwein · Post by **Juergen Kuehlwein** » May 05, 2019 17:34

Thanks a lot!

On *nix, WSTRING is 32 bits (ULong, UTF-32), but the system standard is UTF-8; I already knew this. The one thing I wanted to know is how FB handles automatic conversions on *nix systems. Your replies demonstrate that it must be "wide <-> UTF-8", which is different from Windows. Fine - this makes many things easier...

JK
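To see the conversion result directly instead of inferring it from Len(), one can dump the bytes of the converted STRING. A sketch using badidea's text written with escapes: under a UTF-8 locale on Linux the dump should match the UTF-8 bytes badidea lists further down in the thread (48 E2 82 AC 6C 6C CE A9). On Windows with an ANSI codepage the bytes will differ, and characters with no equivalent in that codepage are not preserved (the exact replacement character is an assumption and is not checked here).

```freebasic
' Sketch: print the bytes produced by the implicit WSTRING -> STRING
' conversion, to see which multibyte encoding the runtime used.
Dim As WString * 16 w = !"\u0048\u20AC\u006C\u006C\u03A9"   ' "H€llΩ"
Dim As String s
Dim As Integer i

s = w   ' implicit conversion (UTF-8 under a UTF-8 locale on Linux)

Print "chars:"; Len(w); "  bytes:"; Len(s)
For i = 0 To Len(s) - 1
    Print Hex(s[i], 2); " ";   ' s[i] is the 0-based byte value
Next
Print
```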

Post by **coderJeff** » May 05, 2019 18:26

Just be mindful where the conversion is occurring.

On Windows, I have observed that an editor might display UTF-8 strings with the characters (glyphs) you expect, but the actual bytes in the file are really just 8-bit (ANSI) chars >= 128. When fbc reads the source code, the translation of literals depends on the BOM saved with the file.

For explicit testing, try encoding the string with !"\uXXXX" escapes to compare.
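A minimal sketch of that comparison: the same text once as a plain literal (whose decoding may depend on how the editor saved the .bas file and whether it has a BOM) and once with !"\uXXXX" escapes (which do not depend on it). If the two differ, the source file encoding is the likely culprit. The names fromLiteral and fromEscapes are illustrative.

```freebasic
' Sketch: compare a non-ASCII literal against the same text written
' with escapes. The literal may only be decoded as intended if fbc
' knows the source encoding (e.g. the file has a UTF-8 BOM).
Dim As WString * 16 fromLiteral = "H€llΩ"
Dim As WString * 16 fromEscapes = !"\u0048\u20AC\u006C\u006C\u03A9"

If fromLiteral = fromEscapes Then
    Print "Literal matches escapes: source encoding was read as expected"
Else
    Print "Literal differs (Len ="; Len(fromLiteral); "): check the file encoding / BOM"
End If
```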

badidea · Post by **badidea** » May 05, 2019 20:14

I used Geany; the source hex codes are:
"H" = 48, "€" = E2 82 AC, "l" = 6C, "l" = 6C, "Ω" = CE A9
UTF-8, I understand.

Same output (lengths 8 & 5) with this code:

DIM s AS STRING
DIM w as WSTRING * 16

'w = "H€llΩ"
w = !"\u0048\u20ac\u006c\u006c\u03a9"
s = w
w = s
print s, len(s)
print w, len(w)

Juergen Kuehlwein · Post by **Juergen Kuehlwein** » May 05, 2019 21:17

Jeff,

yes, I know: the source file encoding IS important when using literals like "фыва.txt"; for it to work correctly you need a correct encoding (a BOM, UTF-8 or UTF-16). I have an experimental compiler and rtl version running on Windows which can deal with real Unicode file names like "фыва.txt". As it seems (and this was the reason for my question), I don't have to adapt anything for the *nix part, because there automatic conversion to UTF-8 takes place for file and directory names. I'm speaking of implicit conversion of procedure arguments and results. This affects statements like OPEN, COMMAND, CURDIR, DIR, etc. (all statements dealing with paths).

For Windows, we need to use the ...W versions of the Windows API functions or the _w... functions of the C runtime for these statements; on Linux it should already be working.

So could someone please confirm that a WSTRING file name like "фыва.txt" works on Linux:

DIM w AS WSTRING * 260 = "фыва.txt"

OPEN w FOR OUTPUT ENCODING "ascii" AS #1 
  PUT #1,,"test"
CLOSE #1

This should create a file named "фыва.txt" on Linux. On Windows the current version fails to create a file at all; my new version can do it.

JK

badidea · Post by **badidea** » May 05, 2019 21:23

Works fine here:
badidea@linux-laptop:~/Desktop/wstring$ ls -l
total 44
-rwxrwxr-x 1 badidea badidea 36584 May 5 23:22 wstring
-rw-rw-r-- 1 badidea badidea 270 May 5 23:21 wstring.bas
-rw-rw-r-- 1 badidea badidea 4 May 5 23:22 фыва.txt
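A slightly more self-checking variant of the test above, as a sketch. The file name is written with escapes, so the result does not depend on the encoding of the .bas file. The function form of OPEN reports failure instead of silently continuing, and the file is read back and deleted afterwards. It assumes a UTF-8 locale on Linux; on Windows with the current (non-Unicode) runtime the OPEN is expected to fail, as described above.

```freebasic
' Sketch: create, re-open and delete a file with a non-ASCII name.
' The WSTRING name is implicitly converted when passed to OPEN/KILL.
Dim As WString * 260 fname = !"\u0444\u044B\u0432\u0430.txt"   ' "фыва.txt"
Dim As Integer f
Dim As String lin

f = FreeFile()
If Open(fname For Output As #f) <> 0 Then
    Print "could not create the file"
    End 1
End If
Print #f, "test"
Close #f

f = FreeFile()
If Open(fname For Input As #f) <> 0 Then
    Print "could not re-open the file"
    End 1
End If
Line Input #f, lin
Close #f

Print "read back: "; lin
Kill fname   ' clean up
```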

Juergen Kuehlwein · Post by **Juergen Kuehlwein** » May 05, 2019 21:26

Great - thanks a lot!

marpon · Post by **marpon** » May 06, 2019 7:39

As I remember from checking some years ago, it is not so simple:
the conversions WSTRING to STRING and vice versa depend heavily on the locale charset.

On Linux the locale charset is normally configured to a UTF-8 charset, e.g. "de_DE.UTF-8";
in this case the conversion to UTF-8 is done correctly,

but
Linux gives the choice of different locales, e.g. "fr_FR.iso-8859-1" (in fact similar to cp1252 on Windows),
and in that case it is a real mess...
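One way to guard against that, sketched under assumptions: look at the locale variables in the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and, more reliably, check whether a non-ASCII sample survives a WSTRING -> STRING -> WSTRING round trip. The sample text and messages are illustrative; a lossless round trip for these two characters does not prove that every character converts.

```freebasic
' Sketch: detect a locale in which the implicit conversions are lossy.
Dim As String lc = Environ("LC_ALL")      ' POSIX precedence:
If lc = "" Then lc = Environ("LC_CTYPE")  ' LC_ALL > LC_CTYPE > LANG
If lc = "" Then lc = Environ("LANG")
If lc = "" Then lc = "(not set)"
Print "locale from environment: "; lc

Dim As WString * 16 w = !"\u20AC\u03A9"   ' "€Ω"
Dim As String s
Dim As WString * 16 back

s = w       ' implicit WSTRING -> STRING (locale dependent on Linux)
back = s    ' implicit STRING -> WSTRING
If back = w Then
    Print "round trip OK ("; Len(s); " bytes)"
Else
    Print "round trip is lossy: this locale cannot represent the sample"
End If
```

In an ISO-8859-1 locale, for example, "€" has no representation, so the round trip is expected to report a loss.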

marcov · Post by **marcov** » May 06, 2019 7:50

MrSwiss wrote:
As seen above len(String) contains 8 (UByte) while len(WString) contains 5 (UShort?),
which indicates, that *..IX* systems use UTF-8 (as standard) which is a very different

Note that it depends on the distro, so it is not 100%. Be careful with embedded or old distros (e.g. the Linux on your NAS, if you have access to it).

MrSwiss wrote:
(...) system, looking at it from a DOS/WIN perspective (ASCII/ANSI) ...

For two releases now (since 1803), Windows 10 has had an option to make the 1-byte codepage UTF-8. It is still beta in the fall release.

MrSwiss wrote:
(Btw. not certain, that WString on *nix isn't possibly 32 bits (ULong) UTF-32)

Strictly speaking, AFAIK the size of wchar_t is platform-dependent, but in practice it is usually 4 bytes. But that is because it isn't widely used as an API string type, so wchar_t is mostly used only for certain conversions and inside font-rendering routines.
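In FreeBASIC the per-character size of WSTRING can be queried with SizeOf(WString). A small sketch; the expected values in the comment (2 bytes on Windows, 4 bytes on Linux) are the common ones, but check them on your own target.

```freebasic
' Sketch: size of one WSTRING character on the current target.
' Expected: 2 bytes (UTF-16) on Windows, 4 bytes (UTF-32) on Linux.
Dim As WString * 16 w = !"\u0048\u20AC\u006C\u006C\u03A9"   ' "H€llΩ"

Print "bytes per WSTRING char:"; SizeOf(WString)
Print "chars in w:"; Len(w); "  payload bytes:"; Len(w) * SizeOf(WString)
```

Len() counts WSTRING characters, not bytes, which is why badidea's test reports 5 for the wide string on both platforms as long as all characters are in the Basic Multilingual Plane.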

AFAIK Qt mostly uses 16-bit strings, and so does Apple's Cocoa.

MrSwiss wrote:
The only immutable part of all those systems is: 0 to 127, which isn't ever changed!
(as described in the lower half, of the ASCII-Table)

Unicode is now 25 years old or so (first drafts in the early nineties). You can be arch-conservative and stick to 25-year-old junk, and still support Unicode :-)
