Number of bytes used by effective string in wstring

General FreeBASIC programming questions.
eanon
Posts: 24
Joined: Aug 16, 2006 15:32

Postby eanon » Sep 14, 2006 8:20

How can I find out the number of bytes used by the actual string content in a wstring?

Rather than a long explanation, here is some code:

Code: Select all

option explicit

const MAXCHARS = 255
dim strBuff as wstring * MAXCHARS

print "Check all different nature of length around wstring"
print "---------------------------------------------------"
strBuff = "Hello!"
print ".Current content of strBuff : '" & strBuff & "' (without quotes)"
print ".Number of possible characters to the total : " & MAXCHARS & " (including final 0)"
print ".Number of reserved bytes to the total : " & sizeof(strBuff)
print ".Number of characters in-use for current content : " & len(strBuff)
print ".Number of bytes in-use for current content : ???"
print " [len(strBuff) * sizeof(wstring) ? but every char needs different nb of bytes]"
sleep
end


How do I determine the last value? Something like "len(strBuff) * sizeof(wstring)" would assume that every character uses the same number of bytes... So how? Do I have to walk through the buffer byte by byte and stop counting when I reach the terminating zero character?
counting_pine
Site Admin
Posts: 6225
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Postby counting_pine » Sep 14, 2006 12:47

Every character does use the same amount of bytes. In Strings and ZStrings, each character uses one byte; in WStrings, each character uses two bytes.
v1ctor
Site Admin
Posts: 3801
Joined: May 27, 2005 8:08
Location: SP / Bra[s]il
Contact:

Postby v1ctor » Sep 14, 2006 14:26

The wstring character size is fixed on a given platform, but it ranges from 1 byte (DOS) to 4 bytes (Linux), so always use "characters * len( wstring )"; len( wstring ) returns the number of bytes per character on the current platform.
eanon
Posts: 24
Joined: Aug 16, 2006 15:32

Postby eanon » Sep 14, 2006 15:07

So: "len(strBuff) * len(wstring)" rather than "len(strBuff) * sizeof(wstring)" in my example. Understood! Thanks, v1ctor!
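
As a minimal sketch of the calculation discussed above, the missing value from the first post could be computed as follows. The variable names are illustrative only, and the printed byte counts depend on the platform's wstring character size.

```freebasic
' Sketch: bytes in use = characters in use * bytes per character.
const MAXCHARS = 255
dim strBuff as wstring * MAXCHARS

strBuff = "Hello!"

' len() on a wstring variable counts characters up to the terminator.
dim as integer charsInUse = len( strBuff )
' len( wstring ) on the type gives the per-character size on this platform.
dim as integer bytesPerChar = len( wstring )
dim as integer bytesInUse = charsInUse * bytesPerChar

print ".Number of characters in-use for current content : " & charsInUse
print ".Number of bytes per character on this platform : " & bytesPerChar
print ".Number of bytes in-use for current content : " & bytesInUse
' Add one character if the terminating zero has to be counted as well.
print ".Same, including the terminator : " & (charsInUse + 1) * bytesPerChar
sleep
end
```

Whether the terminator should be included depends on what the byte count is used for, for example copying the string versus writing a length-prefixed field.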
counting_pine
Site Admin
Posts: 6225
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Postby counting_pine » Sep 14, 2006 16:24

That's interesting, I didn't know wstring had different sizes on different systems. Sorry for providing semi-false info before.

I think len(some_type) and sizeof(some_type) both give the same values, so you were OK using sizeof before.

I think they only differ with variables that contain string data. Sizeof always returns the size of the type, but for string variables len will return the number of characters in the string.
1000101
Posts: 2556
Joined: Jun 13, 2005 23:14
Location: SK, Canada

Postby 1000101 » Sep 14, 2006 17:24

v1ctor wrote:The wstring character size is fixed on a given platform, but it ranges from 1 byte (DOS) to 4 bytes (Linux), so always use "characters * len( wstring )"; len( wstring ) returns the number of bytes per character on the current platform.


This brings up an interesting question. Is there a method to load stored wstrings from one platform onto another and have it convert properly? That is, if a wstring is saved under Win32, what is the recommended method to load it properly under Dos/Linux?
VirusScanner
Posts: 775
Joined: Jul 01, 2005 18:45

Postby VirusScanner » Sep 15, 2006 1:04

Use the encoding option of the OPEN function, you should be able to write in a standard text format like UTF-8.
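
A minimal sketch of this suggestion, assuming a text-mode file and a hypothetical file name (wstring_demo.txt): the string is written with the UTF-8 encoding option of OPEN and read back into a wstring, so the on-disk format does not depend on the platform's wstring character size.

```freebasic
' Sketch: store a wstring as UTF-8 text and read it back.
const FILE_NAME = "wstring_demo.txt"   ' hypothetical file name

dim outText as wstring * 64
dim inText as wstring * 64
outText = "Hello!"

dim as integer f = freefile

' Write the string; the runtime converts from wstring to UTF-8.
if open( FILE_NAME for output encoding "utf-8" as #f ) <> 0 then
   print "Could not create " & FILE_NAME
   end 1
end if
print #f, outText
close #f

' Read it back; the runtime converts from UTF-8 to the local wstring format.
f = freefile
if open( FILE_NAME for input encoding "utf-8" as #f ) <> 0 then
   print "Could not open " & FILE_NAME
   end 1
end if
line input #f, inText
close #f

print "Read back: " & inText
sleep
end
```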
eanon
Posts: 24
Joined: Aug 16, 2006 15:32

Postby eanon » Sep 15, 2006 6:40

counting_pine wrote:That's interesting, I didn't know wstring had different sizes on different systems. Sorry for providing semi-false info before.
Don't worry, counting_pine; the point of a public forum is that we all learn from each other, in a different way in every thread.

counting_pine wrote:I think len(some_type) and sizeof(some_type) both give the same values, so you were OK using sizeof before.
In fact, I hadn't thought of something like len(wstring) at all, because in my mind len was only for counting characters, not bytes. But v1ctor introduced it this way, so, v1ctor, could you confirm that len(wstring) and sizeof(wstring) are interchangeable and that we can pick either one according to our own style?

counting_pine wrote:Sizeof always returns the size of the type, but for string variables len will return the number of characters in the string.
So [number_of_characters * number_of_bytes_per_character_in_this_type] is what we want. We agree, and of course this works because the wstring type allocates the same number of bytes to every character on a given platform. That said, the previous question remains: is there any difference between len(wstring) and sizeof(wstring)? v1ctor, you are naturally the one who knows; could you shed some light on this point?
v1ctor
Site Admin
Posts: 3801
Joined: May 27, 2005 8:08
Location: SP / Bra[s]il
Contact:

Postby v1ctor » Sep 15, 2006 13:31

The only difference between sizeof() and len() shows up when they are used with strings: len() returns the string length in characters (the number of chars in the string data, up to the nul terminator), while sizeof() returns the total size of the string in bytes (for a var-len string it returns the size of the string descriptor).

So for any type but strings, both give the same result.

Note: len() with fixed-len (STRING *) strings will always return the string total size, to be compatible with QB.
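
To see the difference on a given compiler and platform, a small sketch like the following prints both values for a few kinds of variables. The variable names are illustrative only, and no particular output is assumed, since the sizes depend on the platform and compiler version.

```freebasic
' Sketch: compare len() and sizeof() on string and non-string variables.
dim i as integer
dim z as zstring * 32
dim w as wstring * 32
dim f as string * 8
dim s as string

z = "abc"
w = "abc"
f = "abc"
s = "abc"

' Non-string type: both are expected to agree.
print "integer      : len = " & len( i ) & ", sizeof = " & sizeof( i )
' Fixed-size zstring/wstring buffers: len counts characters in use.
print "zstring * 32 : len = " & len( z ) & ", sizeof = " & sizeof( z )
print "wstring * 32 : len = " & len( w ) & ", sizeof = " & sizeof( w )
' Fixed-len string: len reports the declared size (QB compatibility).
print "string * 8   : len = " & len( f ) & ", sizeof = " & sizeof( f )
' Var-len string: sizeof reports the descriptor, not the text.
print "string       : len = " & len( s ) & ", sizeof = " & sizeof( s )
sleep
end
```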
1000101
Posts: 2556
Joined: Jun 13, 2005 23:14
Location: SK, Canada

Postby 1000101 » Sep 15, 2006 19:26

VirusScanner wrote:Use the encoding option of the OPEN function, you should be able to write in a standard text format like UTF-8.


Unfortunately that is not an option, as the files are written in binary rather than text. Strings are just one of many different pieces of information that are written, and they are written with a length prefix. Right now the format only supports ASCII (zStrings), because wStrings are OS dependent, which is what led to my question about type conversion in the first place. Also, converting wStrings to UTF-8 rather defeats the purpose of wStrings :P
v1ctor
Site Admin
Posts: 3801
Joined: May 27, 2005 8:08
Location: SP / Bra[s]il
Contact:

Postby v1ctor » Sep 15, 2006 20:08

The wstring type (wchar_t internally) isn't an encoding format, so the input or output file must be opened with the utf-8, utf-16 or utf-32 encoding, which only works in text modes.

Now, if you really want to store them in binary as-is and load them yourself, http://fbc.cvs.sourceforge.net/fbc/FreeBASIC/inc/utf_conv.bi?revision=1.1&view=markup can be used to convert the UTF data to whatever format the current wstring uses. It is part of the runtime library and is undocumented for now, but it won't be removed, as INPUT#, WRITE# and PRINT# use it.

For example (untested):

Code: Select all

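' "handle" is not defined in this snippet; it is assumed to be a user type
' with a .num field (file number) and an .encoding field (UTF_ENCOD value).
function load_wstring_bin _
   ( _
     byref file as handle, _
     byval field_pos as integer, _
     byval field_chars as integer _
   ) as wstring ptr

   ' zero-filled, with room for a terminator so that len( *buff ) is safe
   dim as wstring ptr buff = callocate( (field_chars+1) * len( wstring ) )

   if get( #file.num, field_pos, *buff, field_chars ) <> 0 then
      deallocate( buff )   ' don't leak the buffer on a read error
      return 0
   end if

   dim as integer chars = len( *buff )

   ' the caller owns the result and must deallocate() it
   dim as wstring ptr res = allocate( (chars+1) * len( wstring ) )

   ' convert from the file's encoding to the local wstring format
   UTFToWChar( file.encoding, buff, res, chars+1 )

   deallocate( buff )

   function = res

end function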


The file header must record the encoding the file was saved in, which can be determined like this:

Code: Select all

' maps the local wstring character size to the matching UTF_ENCOD value
function get_wstr_encoding( ) as integer
   select case len( wstring )
   case 1
      function = UTF_ENCOD_ASCII
   case 2
      function = UTF_ENCOD_UTF16
   case 4
      function = UTF_ENCOD_UTF32
   end select
end function
