Code:
VAR c_str = @CHR(&h74, &h65, &h73, &h74, &h20 _
, &he4, &hb8, &had, &he5, &h9b, &hbd, &he8, &haa, &h9e)
test_C_function(c_str)
?LEFT(*c_str, 4)
?MID(*c_str, 6)
This code was wrong. It compiled, but the compiler reported this warning: warning 3(2): Passing different pointer types. The displayed text is also incorrect.
TJF wrote (Apr 02, 2022 12:20):
const char* translates to CONST ZSTRING PTR
Example
Code:
test_C_function(@"test 中国語")
VAR c_str = @"test 中国語" '' creates a ZSTRING PTR
test_C_function(c_str)
?LEFT(*c_str, 4)
?MID(*c_str, 6)
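On the C side, a `const char*` parameter (CONST ZSTRING PTR in FreeBASIC) receives the literal's bytes unchanged, and LEFT/MID on a ZSTRING count bytes, not characters. A minimal sketch, assuming a hypothetical body for `test_C_function` (its real implementation is not shown in this thread) and a hypothetical helper `byte_mid`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical body for test_C_function: the thread never shows it.
   A const char* parameter just receives the raw bytes;
   strlen() counts bytes, not characters. */
size_t test_C_function(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Hypothetical helper: byte-based counterpart of MID(*s, start) with a
   1-based start, returning a pointer into s instead of a copy. */
const char *byte_mid(const char *s, size_t start)
{
    assert(s != NULL && start >= 1 && start - 1 <= strlen(s));
    return s + (start - 1);
}
```

With the bytes of "test 中国語", `test_C_function` returns 14 (5 ASCII bytes plus 3 x 3 UTF-8 bytes), `strncmp(s, "test", 4)` corresponds to LEFT(*c_str, 4), and `byte_mid(s, 6)` points at the 9 bytes of 中国語.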
Your reports are a horror. My last code produces three outputs:
Code:
fbc -gen gcc -r test.bas
Code:
C_STR$0 = (uint8*)"test \xE4\xB8\xAD\xE5\x9B\xBD\xE8\xAA\x9E";
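The escaped C literal is exactly the UTF-8 encoding of "test 中国語": 14 bytes but only 8 characters. A small sketch (the name `utf8_codepoints` is ours, not part of fbc) that counts code points by skipping UTF-8 continuation bytes makes the difference visible; it assumes valid UTF-8 input and does not validate it:

```c
#include <assert.h>
#include <stddef.h>

/* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumes the input is valid UTF-8; no validation is done. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p)
        if ((*p & 0xC0) != 0x80)
            ++count;
    return count;
}
```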
Code:
VAR c_str = @CHR(&h74, &h65, &h73, &h74, &h20 _
, &he4, &hb8, &had, &he5, &h9b, &hbd, &he8, &haa, &h9e)
Code:
VAR c_str = @"test 中国語"
My code matches your description in the first post. So either your description or your header is wrong. Why didn't you report earlier?
This indicates that the C library isn't UTF-8 only. It also takes the system code page setting into account, which adds one more parameter and complicates the issue.
Each source should originally be written in English (UTF-8 encoded). It is later localized by native-speaking translators using the libintl tools.
This confirms my assumption. For a UTF-8 source file you get
Code:
C_STR$0 = (uint8*)"test \xE4\xB8\xAD\xE5\x9B\xBD\xE8\xAA\x9E";
This is correct. Internally however fbc doesn't track UTF-8 usage.
This should be documented. It will save a lot of trouble.
coderJeff wrote (Apr 03, 2022 13:51):
This is correct. Internally however fbc doesn't track UTF-8 usage.
If there is a BOM (including a UTF-8 BOM), the string literal is converted to a WSTRING whose encoding depends on the platform, and the original UTF-8 encoding is lost:
Windows -> UTF16LE
Linux -> UTF32LE
Big Endian -> UTF32BE
If there is no BOM, the file is read as ASCII, and a UTF-8 encoded string literal in the source file is stored internally as-is, without conversion, as if its bytes were an ASCII string. So, in this case, the string literal is passed around as a ZSTRING.
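To see why the original bytes are lost once a BOM is present, compare the encodings of 中 (U+4E2D): UTF-8 gives E4 B8 AD, UTF-16LE gives 2D 4E, UTF-32LE gives 2D 4E 00 00, and UTF-32BE gives 00 00 4E 2D. A minimal sketch with hypothetical helper names, limited to BMP code points (no surrogate pairs); it only illustrates the byte layouts and is not fbc's conversion code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes a BMP code point (no surrogates) as UTF-16LE; returns 2. */
size_t encode_utf16le(uint32_t cp, unsigned char out[2])
{
    assert(cp <= 0xFFFF && (cp < 0xD800 || cp > 0xDFFF));
    out[0] = (unsigned char)(cp & 0xFF);   /* low byte first */
    out[1] = (unsigned char)(cp >> 8);
    return 2;
}

/* Writes cp as UTF-32, little- or big-endian; returns 4. */
size_t encode_utf32(uint32_t cp, int big_endian, unsigned char out[4])
{
    for (int i = 0; i < 4; ++i) {
        unsigned char b = (unsigned char)(cp >> (8 * i)); /* i-th least significant byte */
        out[big_endian ? 3 - i : i] = b;
    }
    return 4;
}
```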
This is a bug in fbc!
TJF wrote (Apr 02, 2022 12:20):
const char* translates to CONST ZSTRING PTR
The UTF8 library pointed to by Vortex has both zstring ptr and const zstring ptr defined as:
Code:
type PChar as zstring ptr
type PCChar as const zstring ptr
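In C terms these aliases correspond to `char *` and `const char *`. A short sketch (the typedefs mirror the FreeBASIC ones; `byte_length` is a hypothetical read-only function) showing that a `PChar` can be passed where a `PCChar` is expected, because C converts `char *` to `const char *` implicitly:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C counterparts of the FreeBASIC aliases:
   type PChar  as zstring ptr        -> char *
   type PCChar as const zstring ptr  -> const char * */
typedef char *PChar;
typedef const char *PCChar;

/* Hypothetical read-only function taking a PCChar. */
size_t byte_length(PCChar s)
{
    assert(s != NULL);
    return strlen(s);
}
```

Calling `byte_length` with a `PChar` compiles without a pointer-type warning; only the opposite direction (dropping `const`) needs an explicit cast.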
I agree. This must be a bug. By the way, you were right: Geany is the best editor for FreeBASIC.
fbc will parse source files encoded with any of the BOMs listed.
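The BOM check described earlier in the thread (BOM present: convert to WSTRING; no BOM: read as ASCII) can be sketched as follows. The byte sequences are the standard Unicode BOMs; the enum and function names are hypothetical, and this is not fbc's actual source-file reader. The UTF-32LE mark must be tested before UTF-16LE, because FF FE 00 00 also begins with FF FE.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    BOM_NONE,      /* no BOM: per the thread, the file is read as ASCII */
    BOM_UTF8,
    BOM_UTF16LE,
    BOM_UTF16BE,
    BOM_UTF32LE,
    BOM_UTF32BE
} bom_kind;

/* Hypothetical helper: classifies the first n bytes of a source file. */
bom_kind detect_bom(const unsigned char *b, size_t n)
{
    assert(b != NULL || n == 0);
    /* UTF-32LE (FF FE 00 00) must be checked before UTF-16LE (FF FE). */
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return BOM_UTF32LE;
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return BOM_UTF32BE;
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return BOM_UTF8;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return BOM_UTF16LE;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return BOM_UTF16BE;
    return BOM_NONE;
}
```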