Oops, yes, I wasn't thinking about realloc. But I think the code needs more work; I would not be happy including messy code full of false comments (confusing bytes with characters) in my project.
Was it really necessary to reimplement OPEN ... ENCODING?
GNU/Linux and other Unices more or less all behave the same, with some exceptions like Android, which has an incomplete and non-standard libc, especially when it comes to widechar strings. However, I'm not sure whether they all use "UTF-8" in the locale name. I read "On some UNIX-type systems, non-standard names are used for encodings". I'm also not certain whether they all use 4-byte wstrings, but you can use "sizeof(wstring)" to test this instead of assuming (it seems a pretty safe assumption, though). As I wrote in the other thread, it's wrong to assume that the default encoding on Unix is UTF8, which your code does in the Dw_Str function.
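Rather than guessing from the locale name, one option on POSIX systems is to ask libc for the codeset with nl_langinfo(CODESET). Here is a minimal C sketch of that idea, assuming a POSIX libc. The helper names (codeset_is_utf8, locale_is_utf8, wchar_is_32bit) are made up for illustration, and the loose matching that ignores case, '-' and '_' is only my guess at how to cope with variant spellings:

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `codeset` names UTF-8,
   ignoring case and the separators '-' and '_'. */
int codeset_is_utf8(const char *codeset)
{
    static const char want[] = "utf8";
    size_t i = 0;

    assert(codeset != NULL);
    for (; *codeset != '\0'; codeset++) {
        if (*codeset == '-' || *codeset == '_')
            continue;                   /* skip separators */
        if (want[i] == '\0')
            return 0;                   /* extra characters after "utf8" */
        if (tolower((unsigned char)*codeset) != want[i])
            return 0;
        i++;
    }
    return want[i] == '\0';
}

/* Hypothetical helper: asks libc for the codeset of the current
   LC_CTYPE locale. Call setlocale(LC_CTYPE, "") first, otherwise
   this reports on the startup "C" locale. */
int locale_is_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}

/* The width of a wide character is known at compile time, so it can
   be tested instead of assumed to be 4 bytes. */
int wchar_is_32bit(void)
{
    return sizeof(wchar_t) == 4;
}
```

Something like Dw_Str could call locale_is_utf8() once at startup and fall back to a real conversion when it returns 0.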
setlocale is very unpleasant, because it controls both the external encoding of text (filenames, text printed to the console) and the internal encoding of strings in your program. That is why I'm not using mbstowcs etc. at all: I need internal strings to be UTF8 regardless of what the external encoding is! That's also why I don't like wstring: no control over whether it's 16 bit or 32 bit.
Unfortunately, calling setlocale at runtime in Dwstr_To_Str2 isn't thread-safe, but I don't know how to fix that. mbstowcs_l, which takes the locale as an argument, isn't standard.
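One possible way around this, on systems that implement POSIX.1-2008, is uselocale, which switches the locale for the calling thread only and leaves the global one alone. A minimal C sketch, untested against DWSTR and assuming newlocale/uselocale are available (they are not part of ISO C, and some platforms declare them in a different header); the function name mbs_to_wcs_in_locale is made up:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts the multibyte string `src` to wide
   characters using the LC_CTYPE rules of `locale_name`, switching the
   locale for the current thread only. Writes at most `n` wide chars.
   Returns the number written (excluding the terminator), or
   (size_t)-1 if the locale is unknown or `src` is invalid. */
size_t mbs_to_wcs_in_locale(wchar_t *dst, const char *src, size_t n,
                            const char *locale_name)
{
    locale_t loc, old;
    mbstate_t state;
    size_t written;

    assert(dst != NULL && src != NULL && locale_name != NULL);

    loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return (size_t)-1;

    old = uselocale(loc);               /* affects this thread only */
    memset(&state, 0, sizeof state);    /* private conversion state */
    written = mbsrtowcs(dst, &src, n, &state);
    uselocale(old);                     /* restore the previous locale */
    freelocale(loc);
    return written;
}
```

Creating a locale object on every call is probably slow, so a real version would want to cache it.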
marpon wrote:its clear you can use it as it is, it's not open source.
Sorry, I can't use it (if I had wanted to use wstrings), and neither can anyone else who writes free/open source software, such as FreeBASIC itself. Your license is incompatible with ALL open source/free software, by the OSI and FSF standard definitions of open source and of free software! They all disallow putting restrictions on selling the executables or source code.
> the valgrind warning on 32 builds
marpon wrote:I'm not familiar with valgrind, so I can't identify where it is in the code
Unfortunately valgrind doesn't print the line numbers; but maybe this gdb backtrace will help:
Code:
(gdb) bt full
#0 0x0403582c in _vgr20370ZU_libcZdsoZa_wcslen (str=0x433f218) at ../shared/vg_replace_strmem.c:1660
#1 0x0804ed66 in DWSTR::ADD (THIS=..., PWSZSTR=<error reading variable>, NLEN=<error reading variable>) at /mnt/common/src/dwstr/DWSTR.inc:454
NLENSTRING = 0
#2 0x0804eaf0 in DWSTR::operator= (THIS=..., PWSZSTR=<error reading variable>) at /mnt/common/src/dwstr/DWSTR.inc:334
No locals.
#3 0x0804b62f in main (__FB_ARGC__=<error reading variable>, __FB_ARGV__=<error reading variable>) at linux_test_1_dwstr.bas:105
...
DW2 = {M_PBUFFER = 0x4339288, M_BUFFERLEN = 0, M_CAPACITY = 260, M_GROWSIZE = 260, M_FLAG = 0}
(gdb) frame 1
#1 0x0804ed66 in DWSTR::ADD (THIS=..., PWSZSTR=<error reading variable>, NLEN=<error reading variable>) at /mnt/common/src/dwstr/DWSTR.inc:454
454 nLenString = .LEN(*pwszStr)
(gdb) p PWSZSTR
$1 = (uinteger *) 0x433f218
(gdb) p NLEN
$2 = -1
marpon wrote:it's just because it's a define to a constant, I imagine it could be faster than sizeof(wstring) every time, my quest for speed...
sizeof is always evaluated at compile time (even if you use inheritance and do sizeof(*baseclassptr), it doesn't figure out the size of the actual object at runtime).
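The same can be seen in C, which is only an analogue of the FreeBASIC behaviour: the operand of sizeof is never executed, so a side effect inside it has no effect. (C has one exception, variable-length arrays, which doesn't apply here.) A small sketch with a made-up function name:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value of `calls` after taking sizeof of an expression
   that would increment it. The increment never runs, because the
   compiler only looks at the type of the operand. */
int sizeof_side_effect_count(void)
{
    int calls = 0;
    size_t size = sizeof(calls++);  /* operand is not evaluated */

    assert(size == sizeof(int));
    return calls;                   /* still 0 */
}
```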
marpon wrote:I can imagine you don't use Ucase/Lcase, mid, reverse, replace, ... even len, to play with unit codes, at least for that kind of function wstrings are easier.
If you want to handle surrogate pairs correctly, then UTF16 isn't really easier than UTF8.
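To illustrate, even counting characters in UTF16 needs a special case for surrogate pairs. A minimal C sketch with a made-up function name, which treats an unpaired surrogate as one character (other policies are possible):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts code points in `n` UTF-16 code units. A high surrogate
   (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF) is one
   code point; an unpaired surrogate is counted as one. */
size_t utf16_count_codepoints(const uint16_t *s, size_t n)
{
    size_t i = 0, count = 0;

    assert(s != NULL || n == 0);
    while (i < n) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i += 2;     /* surrogate pair: two units, one code point */
        else
            i += 1;     /* BMP character or unpaired surrogate */
        count++;
    }
    return count;
}
```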
Implementing LEN/MID/LEFT/RIGHT/ASC/CHR for UTF8 is not too hard, but almost always what I care about is how many pixels wide a string is (with variable-width fonts), or drawing just the rightmost/leftmost X pixels of it, rather than the number of characters.
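As a rough idea of how little is needed for LEN and MID, here is a C sketch. It assumes the input is valid UTF8 and counts code points rather than user-visible characters; the names utf8_len and utf8_offset are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Counts code points: every byte that is not a continuation byte
   (bit pattern 10xxxxxx) starts a new one. Assumes valid UTF-8. */
size_t utf8_len(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}

/* Returns the byte offset of the code point with 0-based index
   `index`, or the offset of the terminator if the string is shorter. */
size_t utf8_offset(const char *s, size_t index)
{
    size_t off = 0;

    assert(s != NULL);
    while (s[off] != '\0') {
        if (((unsigned char)s[off] & 0xC0) != 0x80) {
            if (index == 0)
                return off;
            index--;
        }
        off++;
    }
    return off;
}
```

A MID equivalent is then just the bytes from utf8_offset(s, start) up to utf8_offset(s, start + count).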
You can just use normal LCASE/UCASE on UTF8 STRINGs if you only want them to work on ASCII characters; implementing them for Unicode is far too complex and I don't need that.
Normal INSTR also works with UTF8. The result is bytes, not characters, but actually that's usually what you want anyway.
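The C equivalent is a plain strstr. As long as both strings are valid UTF8, a match can't start in the middle of a character, because lead bytes and continuation bytes use different bit patterns. A sketch with a made-up function name; note it returns a 0-based byte offset, unlike INSTR, which is 1-based:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the 0-based byte offset of `needle` in `haystack`,
   or -1 if it does not occur. Both are assumed to be valid UTF-8. */
long utf8_find(const char *haystack, const char *needle)
{
    const char *hit;

    assert(haystack != NULL && needle != NULL);
    hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1;
}
```

The byte offset can be passed straight to byte-based slicing, which is why it is usually the more useful result.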
I don't think I've ever had to reverse a string. But there are many other operations, like excluding certain characters or replacing substrings or comparing strings. Actually, it looks like most of them will just work on UTF8. It's a clever encoding.