Getting the true unicode code

Morthawt
Posts: 8
Joined: Nov 03, 2012 3:40

Getting the true unicode code

Postby Morthawt » Nov 03, 2012 3:45

Hi. I am new to this, coming from AutoIt scripting and looking for more power and speed. I am trying to get the Unicode code for characters so I can build my base64 converter in FreeBASIC to compare speeds etc.

The issue I am having is that when I ask for the Unicode number of a character with Asc, it gives me totally incorrect numbers. The € symbol comes out as number 128 when I print its value using Asc. I was on the wiki for UTF trying to add up the binary, and it was way off, so I did some searching and found the true number for € is actually 8364.

What function/command etc. do I need to actually get a Unicode number? The help for Asc says it does, but obviously it doesn't. I cannot manually create Unicode base-2 binary if the number Asc gives me is not accurate.

Thanks in advance.
Morthawt
Posts: 8
Joined: Nov 03, 2012 3:40

Getting the real unicode code for characters

Postby Morthawt » Nov 03, 2012 3:50

I just made a post but it's missing and must have errored out, because it is not showing up on my profile.

Basically I am asking how you get the REAL Unicode code for a character, because Asc is giving incorrect numbers. € shows as number 128 when really it is 8364. I have verified this online and by using AutoIt's AscW function. So far FreeBASIC only shows me wrong numbers, even though it says it will give me Unicode numbers.

The purpose for me needing this is so I can manually generate and decode UTF-8 base 2 8-bit binary for my base64 converter.

If you could help me out I would appreciate it, I am new to this coming only from Autoit scripting.

Thanks
counting_pine
Site Admin
Posts: 6166
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: Getting the real unicode code for characters

Postby counting_pine » Nov 03, 2012 4:39

Hi Morthawt, welcome to the forum :)
Morthawt wrote: I just made a post but it's missing and must have errored out, because it is not showing up on my profile.

Due to spam problems we decided a while back to vet all posts from newly registered members, in order to prevent spam getting onto the forum.
Sometimes this can result in a short delay until a moderator appears and sees the new posts.
I've approved your posts and combined them into one thread for you. And I've removed you from the Newly Registered group so any future posts will sail through.
TJF
Posts: 3456
Joined: Dec 06, 2009 22:27
Location: N47°, E15°

Re: Getting the true unicode code

Postby TJF » Nov 03, 2012 10:36

Hello Morthawt, welcome to the forum!

I guess you mean neither UTF-8 nor UTF-32, but UTF-16. So I recommend checking the WSTRING type of FreeBASIC and the related statements.
Morthawt
Posts: 8
Joined: Nov 03, 2012 3:40

Re: Getting the true unicode code

Postby Morthawt » Nov 03, 2012 11:24

Well, what I need to do is take a massive string of data (or eventually, when I learn how, file data), read it character by character, and get each character's Unicode number.

This shows 128

print asc("€", 1)

Also, so does this:

dim a as string
a = "€"
print WStr(a)
print asc(a, 1)

It is supposed to say 8364, so that I can encode that number into the UTF-8 binary scheme. I cannot encode that character as 128 because it is not in fact 128 at all. I even checked with the AutoIt converter program I made, and it returns correct numbers. This baffles me because the help file for Asc says it will give Unicode numbers.

What am I doing wrong? How can I get the accurate 8364 for the € character?
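
Editor's note: for readers following along, the encoding step described above (turning a code point such as 8364 into UTF-8 bytes) can be sketched roughly as follows. This is only a minimal illustration, not code from the thread. `EncodeUtf8` is a hypothetical helper name, not a FreeBASIC built-in, and the sketch assumes the input is a valid code point (it does not reject surrogates or values above &h10FFFF).

```freebasic
' Hypothetical helper (not part of FreeBASIC): encode one Unicode
' code point as a UTF-8 byte sequence stored in an ordinary String.
' Assumption: cp is a valid code point; no validation is done here.
Function EncodeUtf8(ByVal cp As ULong) As String
    If cp < &h80 Then
        ' 1 byte: 0xxxxxxx
        Return Chr(cp)
    ElseIf cp < &h800 Then
        ' 2 bytes: 110xxxxx 10xxxxxx
        Return Chr(&hC0 Or (cp Shr 6), _
                   &h80 Or (cp And &h3F))
    ElseIf cp < &h10000 Then
        ' 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        Return Chr(&hE0 Or (cp Shr 12), _
                   &h80 Or ((cp Shr 6) And &h3F), _
                   &h80 Or (cp And &h3F))
    Else
        ' 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        Return Chr(&hF0 Or (cp Shr 18), _
                   &h80 Or ((cp Shr 12) And &h3F), _
                   &h80 Or ((cp Shr 6) And &h3F), _
                   &h80 Or (cp And &h3F))
    End If
End Function

' Show the bytes for the euro sign (code point 8364 = &h20AC).
Dim utf8 As String = EncodeUtf8(8364)
For i As Integer = 0 To Len(utf8) - 1
    Print Hex(utf8[i], 2); " ";
Next
Print   ' the loop prints: E2 82 AC
```

Because the function takes the code point as a number, it does not depend on how Asc or the source file encoding behaves; those only matter for obtaining the number in the first place.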
dkl
Site Admin
Posts: 3205
Joined: Jul 28, 2005 14:45
Location: Germany

Re: Getting the true unicode code

Postby dkl » Nov 03, 2012 14:12

Try this code, saved as a UTF-8/16/32 file, with a BOM (the BOM is needed for fbc to recognize the encoding):

Code:

dim a as wstring * 32
a = "€"
print asc( a )
Morthawt
Posts: 8
Joined: Nov 03, 2012 3:40

Re: Getting the true unicode code

Postby Morthawt » Nov 03, 2012 14:22

I put that in but it still prints 128. Surely there has to be a way to get the Unicode value for a character? If there isn't, then I cannot continue any further with my attempted converter program in this language. I need to make a binary stream that is in UTF-8, which means I need to get the REAL Unicode number so I can do the conversion to binary correctly.
dkl
Site Admin
Posts: 3205
Joined: Jul 28, 2005 14:45
Location: Germany

Re: Getting the true unicode code

Postby dkl » Nov 03, 2012 14:51

It has to be saved as a Unicode file, then it works for me.

It should also work when the source file is saved using the ANSI codepage encoding, provided the correct codepage is used both when saving the file and when running the program. I couldn't get that to work yet though; there might be a bug with the ANSI codepage -> Unicode conversion.
Morthawt
Posts: 8
Joined: Nov 03, 2012 3:40

Re: Getting the true unicode code

Postby Morthawt » Nov 03, 2012 14:55

OK, when I save the .bas file as UTF-8 it changes the € to â‚¬, and when I manually put in € it comes up with 128 again. I am confused. With my AutoIt one I have an interface where the string typed in can be processed as ANSI input/output or as UTF-8. I am trying to make the same thing in FB, for now just with basic strings etc., because I have no idea how to deal with files or a GUI yet. Does the .bas file have to be saved as UTF-8 for it to work? And if it does, will that mean everything is forced to be interpreted as UTF-8, so that ANSI-encoded strings get screwed up because it thinks some combinations of ANSI characters are "supposed" to be one Unicode character?
Morthawt
Posts: 8
Joined: Nov 03, 2012 3:40

Re: Getting the true unicode code

Postby Morthawt » Nov 03, 2012 15:04

With the .bas file saved as UTF-8, the € in the code gets altered in "a", as you can see. However, when I take the actual Unicode character in b, convert it to a wide string with WStr, and get the character code, it still comes back with 128. So a comes up with the right number when it is supplied with the individual bytes that make up the Unicode character, but the real character, even though it is converted to wide format with WStr, still comes up with the wrong number of 128.

dim a as Wstring * 32
dim b as String
a = "â‚¬"
b = "€"

print Asc(a)
print Asc(Wstr(b))
MOD
Posts: 554
Joined: Jun 11, 2009 20:15

Re: Getting the true unicode code

Postby MOD » Nov 03, 2012 15:08

If you're using FBEdit, that could be the problem, as it can't handle Unicode. "â‚¬" is a Unicode "€" as shown by an editor that can't display Unicode. Try opening the code with Notepad++ or any other editor with Unicode support.
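
Editor's note: the three characters â‚¬ are what the UTF-8 bytes E2 82 AC of € look like when they are displayed in the Windows-1252 ANSI codepage. The rough sketch below decodes those three bytes back to a code point. It builds the bytes with Chr so that it does not depend on how the source file is saved, and it assumes a well-formed 3-byte sequence rather than handling every UTF-8 length.

```freebasic
' The three UTF-8 bytes of the euro sign, built from byte values so the
' example does not depend on the encoding of this source file.
Dim s As String = Chr(&hE2, &h82, &hAC)

' Assumption: s holds exactly one well-formed 3-byte sequence
' (1110xxxx 10xxxxxx 10xxxxxx); no validation is done here.
Dim cp As ULong = ((s[0] And &h0F) Shl 12) Or _
                  ((s[1] And &h3F) Shl 6) Or _
                  (s[2] And &h3F)

Print cp          ' prints 8364
Print Hex(cp)     ' prints 20AC
```

This is why a String that holds the UTF-8 bytes can still yield the right number: the payload bits of the three bytes add up to &h20AC, which is 8364.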
dkl
Site Admin
Posts: 3205
Joined: Jul 28, 2005 14:45
Location: Germany

Re: Getting the true unicode code

Postby dkl » Nov 03, 2012 15:21

I think the Win32 version of FB is lacking an internal call to setlocale(), which is the same issue seen on Linux before. (FB uses the CRT mbstowcs() function to do the conversion, not MultiByteToWideChar(), and the CRT locale must be switched from the default "C" to the system codepage first)

This could be a work-around to fix run-time conversions:

Code:

#include once "crt/locale.bi"
setlocale( 0, "" )

dim s as string
dim a as wstring * 32

s = "€"
print s[0]

a = s
print a[0]

a = "€"
print a[0]


Of course that won't fix any compile-time conversions, until fbc itself is fixed.
Morthawt
Posts: 8
Joined: Nov 03, 2012 3:40

Re: Getting the true unicode code

Postby Morthawt » Nov 03, 2012 15:24

Ok, so are you basically saying I have stumbled into a bug with freebasic? If so how do we go about getting it resolved?

Edit: by the way, a WAsc to go with the WChr that already exists would be perfect; that way you could get either the TRUE Unicode value or the ANSI value. There is a WChr, but for some reason there is no WAsc.
dkl
Site Admin
Posts: 3205
Joined: Jul 28, 2005 14:45
Location: Germany

Re: Getting the true unicode code

Postby dkl » Nov 03, 2012 15:40

I'll try and see whether my fix theory is correct sometime during the week. There are no plans to make a new FB release anytime soon, but maybe I can upload a preview/snapshot build.

By the way, there are multiple asc() functions, overloaded for different kinds of parameters, and the proper one should be chosen depending on the argument types. wchr() differs from chr() in the return value type, not the parameter types. That is something that is currently not handled with overloading (because then the proper chr() function would have to be chosen based on context, which can be ambiguous), and thus it has to use a different name.
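
Editor's note: a small sketch of the overloading described above, under the assumption that wide-character support works on the target platform. The wide string is built with WChr from the numeric code point, so the result does not depend on the source file encoding or on any run-time string conversion. The variable names are only illustrative.

```freebasic
' Asc picks its behaviour from the argument type.
Dim s As String = "A"                  ' byte string
Dim w As WString * 2 = WChr(8364)      ' wide string holding the euro sign

Print Asc(s)   ' String overload: value of the first byte (65 for "A")
Print Asc(w)   ' WString overload: value of the first wide character,
               ' expected to be 8364 where WString is supported
```

Going the other way, Chr returns a String and WChr returns a WString, which is the return-type difference that requires two separate names.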
Morthawt
Posts: 8
Joined: Nov 03, 2012 3:40

Re: Getting the true unicode code

Postby Morthawt » Nov 03, 2012 16:13

OK, well, I will wait and see until then. This is just too frustrating for a new guy to get to grips with. I thought it would just work, until I saw there was no WAsc, and then when the help said Asc gives Unicode I figured that was all I needed. I guess not, if it's going to give the value 128 for a Unicode character with a very high value.

I am just going to put this out of my mind and check the thread until a reply because otherwise I will get so frustrated with failure I will likely give up the whole idea knowing me.
