Automatic conversion WSTRING to (Z)STRING

Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Post by Juergen Kuehlwein »

FB performs automatic conversion when assigning a WSTRING to a STRING or ZSTRING, so I can write:

Code:

DIM s AS STRING
DIM w as WSTRING * 16

w = "Hello"
s = w
w = s
print s
print w

The automatic conversion performed here is "wide string <-> ANSI" on Windows.

Is it the same on Linux ("wide string <-> ANSI"), or is the automatic conversion there "wide string <-> UTF-8"?


JK
badidea
Posts: 2591
Joined: May 24, 2007 22:10
Location: The Netherlands

Post by badidea »

I don't know anything about WSTRING and UTF.
What I can say is that this code seems to work fine on my Linux laptop:

Code:

DIM s AS STRING
DIM w as WSTRING * 16

w = "H€llΩ"
s = w
w = s
print s, len(s)
print w, len(w)

Result:
H€llΩ 8
H€llΩ 5
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Post by MrSwiss »

badidea wrote:
Result:
H€llΩ 8
H€llΩ 5

As seen above, len(String) returns 8 (UByte) while len(WString) returns 5 (UShort?), which indicates that *nix systems use UTF-8 (as standard), which is a very different system when looked at from a DOS/WIN perspective (ASCII/ANSI) ...
(Btw., I'm not certain that WString on *nix isn't actually 32 bits (ULong), i.e. UTF-32.)

The only immutable part of all those systems is the range 0 to 127, which never changes (as described in the lower half of the ASCII table).
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Post by Juergen Kuehlwein »

Thanks a lot!

On *nix, WSTRING is 32 bits (ULong, UTF-32), but the system standard is UTF-8; I already knew this. The one thing I wanted to know is how FB handles automatic conversions on *nix systems. Your replies demonstrate that it must be "wide <-> UTF-8", which is different from Windows. Fine - this makes many things easier...


JK
coderJeff
Site Admin
Posts: 4326
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Post by coderJeff »

Just be mindful where the conversion is occurring.

On Windows, I have observed that an editor might display UTF-8 strings with the characters (glyphs) you expect, while the actual bytes in the file are really just single-byte (ANSI) chars >= 128. When fbc reads the source code, the translation of literals depends on the BOM saved with the file.

For explicit testing, try encoding the string with !"\uXXXX" escapes to compare.
badidea
Posts: 2591
Joined: May 24, 2007 22:10
Location: The Netherlands

Post by badidea »

I used Geany, the source hex-codes are:
"H" = 48
"€" = E2 82 AC
"l" = 6C
"l" = 6C
"Ω" = CE A9

UTF-8, I understand.

Same output (lengths 8 & 5) with this code:

Code:

DIM s AS STRING
DIM w as WSTRING * 16

'w = "H€llΩ"
w = !"\u0048\u20ac\u006c\u006c\u03a9"
s = w
w = s
print s, len(s)
print w, len(w)
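
To see which bytes actually end up in s, and not only how many, a small variation of the test could dump them in hex. This is only a sketch: it reuses the \u escapes from above and assumes SizeOf(WString) is an acceptable way to report the wide character size on the platform. If the conversion really targets UTF-8, the dump should match the source hex-codes listed above (E2 82 AC for "€", CE A9 for "Ω").

```freebasic
' Sketch: inspect the result of the implicit WSTRING -> STRING conversion.
Dim w As WString * 16
Dim s As String

' "H€llΩ" written with escapes, so the source file encoding does not matter
w = !"\u0048\u20ac\u006c\u006c\u03a9"
s = w   ' implicit conversion under test

' Size of one wide character on this platform, in bytes
Print "SizeOf(WString) ="; SizeOf(WString)
Print "Len(w) ="; Len(w)
Print "Len(s) ="; Len(s)

' Dump every byte of the converted STRING as two hex digits
For i As Integer = 0 To Len(s) - 1
    Print Hex(s[i], 2); " ";
Next
Print
```

Running the same program on Windows and on Linux and comparing the two dumps would show directly where the target encodings differ.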
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Post by Juergen Kuehlwein »

Jeff,

Yes, I know the source file encoding IS important when using literals like "фыва.txt"; for this to work you need a correct encoding (BOM, UTF-8 or UTF-16). I have an experimental compiler and rtl version running on Windows, which can deal with real Unicode file names like "фыва.txt". As it seems (and this was the reason for my question), I don't have to adapt anything for the *nix part, because automatic conversion to UTF-8 takes place there for file and directory names. I'm speaking of implicit conversion for procedure arguments and results. This affects statements like "OPEN", "COMMAND", "CURDIR", "DIR", etc. (all statements dealing with paths).

For Windows we need to implement the ...W version of the Windows API or the _w... functions of the C runtime for these functions; for Linux it should already be working.

So could someone please confirm that a WSTRING file name like "фыва.txt" works on Linux:

Code:

DIM w AS WSTRING * 260 = "фыва.txt"

OPEN w FOR OUTPUT ENCODING "ascii" AS #1 
  PUT #1,,"test"
CLOSE #1

This should create a file named "фыва.txt" on Linux. On Windows it currently fails to create a file at all; my new version can do it.
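
A variation that does not depend on the source file encoding could spell the name with \u escapes and check the result afterwards. This is only a sketch under assumptions: FileExists comes from "file.bi", the name is copied into a STRING first so that the check goes through the same implicit conversion, and Print # is used instead of PUT, so the file will also contain a line ending.

```freebasic
' Sketch: create a file with a non-ASCII name and verify that it exists.
#include "file.bi"

' "фыва.txt" spelled with escapes (U+0444 U+044B U+0432 U+0430)
Dim w As WString * 260 = !"\u0444\u044b\u0432\u0430.txt"
Dim s As String = w          ' implicit WSTRING -> STRING conversion
Dim f As Long = FreeFile

' The function form of Open returns 0 on success
If Open(w For Output Encoding "ascii" As #f) = 0 Then
    Print #f, "test"
    Close #f
    Print "Open succeeded"
Else
    Print "Open failed"
End If

' Look the file up again through the converted name
If FileExists(s) Then
    Print "file found"
Else
    Print "file not found"
End If
```

Checking the directory with ls, as a second step, still tells more than this program can, because it shows the name the file system actually stored.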


JK
Last edited by Juergen Kuehlwein on May 05, 2019 21:25, edited 2 times in total.
badidea
Posts: 2591
Joined: May 24, 2007 22:10
Location: The Netherlands

Post by badidea »

Works fine here:
badidea@linux-laptop:~/Desktop/wstring$ ls -l
total 44
-rwxrwxr-x 1 badidea badidea 36584 May 5 23:22 wstring
-rw-rw-r-- 1 badidea badidea 270 May 5 23:21 wstring.bas
-rw-rw-r-- 1 badidea badidea 4 May 5 23:22 фыва.txt
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Post by Juergen Kuehlwein »

Great - thanks a lot!
marpon
Posts: 342
Joined: Dec 28, 2012 13:31
Location: Paris - France

Post by marpon »

As far as I remember from checking this some years ago, it is not so simple: the conversions from WSTRING to STRING and back depend heavily on the locale charset.

On Linux the locale is normally configured with a UTF-8 charset, e.g. "de_DE.UTF-8"; in this case the conversion to UTF-8 is done correctly.

But Linux also lets you choose a different locale, e.g. "fr_FR.iso-8859-1" (in fact similar to cp1252 on Windows), and in that case it is a real mess...
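
Before relying on the UTF-8 behaviour, it may help to look at which locale a program would pick up. The following is a minimal sketch, assuming the usual lookup order for the character set on Linux (LC_ALL first, then LC_CTYPE, then LANG); it only reads environment variables and does not query the runtime itself.

```freebasic
' Sketch: print the environment variables that usually select the locale charset.
' Assumed lookup order: LC_ALL, then LC_CTYPE, then LANG.
Dim names(0 To 2) As String = { "LC_ALL", "LC_CTYPE", "LANG" }

For i As Integer = 0 To 2
    Dim v As String = Environ(names(i))
    If Len(v) = 0 Then v = "(not set)"
    Print names(i); " = "; v
Next
```

If the first variable that is set ends in something like ".UTF-8", the results reported earlier in this thread are the ones to expect; with a value such as "fr_FR.iso-8859-1" the conversion would presumably target that single-byte charset instead.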
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Post by marcov »

MrSwiss wrote:
As seen above len(String) contains 8 (UByte) while len(WString) contains 5 (UShort?), which indicates, that *..IX* systems use UTF-8 (as standard)

Note that it depends on the distro, so it is not 100%. Be careful with embedded or old distros (e.g. the Linux on your NAS, if you have access to it).

MrSwiss wrote:
which is a very different system, looking at it from a DOS/WIN perspective (ASCII/ANSI) ...

Windows 10 has had an option to make the 1-byte codepage UTF-8 for two iterations now (since 1803). It is still beta in the fall release.

MrSwiss wrote:
(Btw. not certain, that WString on **ix isn't evtl. 32 bits (ULong) UTF-32)

Strictly speaking, afaik the wchar_t size is platform dependent, but it is usually 4 bytes in practice. But that is because it isn't widely used as an API string type, so wchar_t is mostly used only for certain conversions and inside font rendering routines.

Afaik Qt mostly uses 16-bit strings, and so does Apple's Cocoa.

MrSwiss wrote:
The only immutable part of all those systems is: 0 to 127, which isn't ever changed!
(as described in the lower half, of the ASCII-Table)

Unicode is now 25 years old or so (first drafts in the early nineties). You can be arch-conservative with 25-year-old junk and still support Unicode :-)