Ansi, Utf8 & Utf16 encoding problems


Post by marcov »

jj2007 wrote: Of course, the console is another story; it works fine for Russian with the Lucida Console or Consolas fonts on my Italian OS, but to display Chinese in the console, you need some acrobatics - unless you have a Chinese Windows version, I suppose.
No, it is possible. IIRC you either need to install the new Terminal app, or set your app and terminal encoding to UTF-8, select a suitable font, and start your terminal with CMD /w.
Excuse my Free Pascal/Delphi, but the code to do that looks like this:

Code:

Const
  LF_FACESIZE = 32;
 
Type
  CONSOLE_FONT_INFOEX = record
    cbSize     : ULONG;
    nFont      : DWORD;
    dwFontSize : COORD;
    FontFamily : UINT;
    FontWeight : UINT;
    FaceName   : array [0..LF_FACESIZE-1] of WCHAR;
  end;
 
{ Only supported in Vista and onwards!}
 
function SetCurrentConsoleFontEx(hConsoleOutput: HANDLE; bMaximumWindow: BOOL; var lpConsoleCurrentFontEx: CONSOLE_FONT_INFOEX): BOOL; stdcall; external kernel32;
 
var
  New_CONSOLE_FONT_INFOEX: CONSOLE_FONT_INFOEX;
 
initialization
  // These two switch FPC to UTF-8 as its 1-byte string encoding.
  SetMultiByteConversionCodePage(CP_UTF8);    // for general strings
  SetMultiByteRTLFileSystemCodePage(CP_UTF8); // for filenames

  SetConsoleOutputCP(DefaultSystemCodePage); // winapi console. DefaultSystemCodepage is now utf8.
  SetTextCodePage(Output, DefaultSystemCodePage);  // adapt the FPC stdout descriptor.
 
  FillChar(New_CONSOLE_FONT_INFOEX, SizeOf(CONSOLE_FONT_INFOEX), 0); // zero structure
  New_CONSOLE_FONT_INFOEX.cbSize := SizeOf(CONSOLE_FONT_INFOEX);
  New_CONSOLE_FONT_INFOEX.FaceName := 'Consolas';
//  New_CONSOLE_FONT_INFOEX.FaceName := 'Lucida Console';  //use Lucida Console for Win XP, XP is no longer supported.
  New_CONSOLE_FONT_INFOEX.FontWeight := 400;
  New_CONSOLE_FONT_INFOEX.dwFontSize.Y := 16;
 
  SetCurrentConsoleFontEx(StdOutputHandle, False, New_CONSOLE_FONT_INFOEX);
end.
As said earlier in this thread, I have experimented with UTF-8 manifesting in GUI programs, but not yet with console programs, so I don't know. For GUI programs it is easy, as it is just a tick in Lazarus. One reason I have not tried it for console programs is that resource handling is currently complicated by the old nightmare that is windres. Pretty soon (hopefully weeks), a release will come with an FPC-native resource compiler and we can part with that trainwreck.

P.S. I think your masm32 framework somehow shields you. Win32 EXEs are not UTF-8 by default; you have to do extra work for that, so it is probably done by the bloated masm32 framework. (:-)

Post by jj2007 »

Yes, the big problem with the console is finding the right font. Lucida Console and Consolas do handle Cyrillic, but they don't know about Chinese or Arabic. IIRC even on Win10 I can't display Chinese with my Italian OS (my dev machine is Win7-64).
marcov wrote: P.S. I think your masm32 framework somehow shields you. Win32 EXEs are not UTF-8 by default; you have to do extra work for that, so it is probably done by the bloated masm32 framework. (:-)
I use the bloated Masm32 SDK sparingly ;-)

With a few exceptions, my macros do not use the Masm32 library; i.e. they are coded from scratch. For example, a simple...

uMsgBox 0, cfm$("Добро пожаловать\nمرحبا بكم \n歡迎"), "Welcome in three languages:", MB_OK

... translates to:

Code:

push 0                                   ; MB_OK
push offset ra_lbl2                      ; ASCII "Welcome in three languages:"
call MbwRec                              ; translate to Utf-16
push eax                                 ; |Title
push offset ??00F5                       ; UTF-8 "Добро пожаловать مرحبا بكم 歡迎"
call MbwRec                              ; translate to Utf-16
push eax                                 ; |Text
push 0                                   ; |hOwner = NULL
call MessageBoxW                         ; \USER32.MessageBoxW	
So under the hood I do a translation of a byte stream to a Utf-16 stream, and then use the wide version of the required API. Works just fine.

Here is the FreeBasic equivalent:

#include "Windows.bi"
MessageBoxW(0, "Добро пожаловать"+Chr(13)+"مرحبا بكم "+Chr(13)+"歡迎", "Welcome in three languages:", MB_OK)

Under the hood:

Code:

push 0                                   ; |/Type = MB_OK|MB_DEFBUTTON1|MB_APPLMODAL
push offset 00404044                     ; ||Caption = "Welcome in three languages:"
push offset 00404004                     ; ||Text = "Добро пожаловать مرحبا بكم 歡迎"
push 0                                   ; ||hOwner = NULL
call <jmp.&USER32.MessageBoxW>           ; |\USER32.MessageBoxW
As you can see, the compiler does the translation to UTF-16, i.e. the string at 404004 is already in wide format, provided by the compiler. I could do the same, but have decided to leave the strings as UTF-8 and convert them when needed. Performance-wise the cost of the conversion is negligible, and it saves some bytes in the executable.
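
The size trade-off between storing strings as UTF-8 and storing them as UTF-16 can be checked with a small sketch. It is written in Python only because that runs anywhere; it is not what the macros or the FreeBASIC compiler do internally. The helper names `to_wide` and `sizes` are made up for this illustration.

```python
# Sketch: compare the storage cost of UTF-8 and UTF-16 for the strings
# used in the message box example, and show the "convert when needed" step.

def to_wide(utf8_bytes: bytes) -> bytes:
    """Convert UTF-8 bytes to NUL-terminated UTF-16LE, the layout a W API expects."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le") + b"\x00\x00"

def sizes(text: str) -> tuple:
    """Return (UTF-8 size, UTF-16 size) in bytes, without terminators."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

samples = {
    "caption": "Welcome in three languages:",
    "russian": "Добро пожаловать",
    "arabic": "مرحبا بكم ",
    "chinese": "歡迎",
}

for name, text in samples.items():
    u8, u16 = sizes(text)
    print(f"{name:8} utf-8: {u8:3} bytes   utf-16: {u16:3} bytes")

# The conversion itself is one decode plus one encode per string.
wide_caption = to_wide(samples["caption"].encode("utf-8"))
print(len(wide_caption), "bytes passed to the wide API for the caption")
```

The saving depends on the script: ASCII text is half the size in UTF-8, Cyrillic and Arabic are about the same in both encodings, and Chinese is actually larger in UTF-8 (three bytes per character instead of two).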

Post by dodicat »

I have used consolas as default.

Code:

#include "windows.bi"
Shell "color f0"
Sub changefontsize(w As Long, h As Long,ftype As String="consolas")
    Dim As  _CONSOLE_FONT_INFOEX  x
    With x
        .cbsize=Sizeof(_CONSOLE_FONT_INFOEX)
        .nfont=0
        .dwfontsize=Type(w,h)
        .fontfamily=0
        .fontweight=100
        .facename=ftype
    End With
    setcurrentconsolefontex(GetStdHandle(STD_OUTPUT_HANDLE),1, @x )
End Sub

Sub changeconsolesize(cols As Long,lines As Long)
    Shell "MODE CON: COLS=" + Str(cols) + " LINES=" + Str(lines)
End Sub

Sub getfontsize(x as _CONSOLE_FONT_INFOEX)
        x.cbsize=Sizeof(_CONSOLE_FONT_INFOEX)
    getcurrentconsolefontex(GetStdHandle(STD_OUTPUT_HANDLE),false, @x )
End Sub

dim as _CONSOLE_FONT_INFOEX f

getfontsize(f)
print "Initial font size =  ";f.dwFontSize.x;" by";f.dwFontSize.y 

'====================
Print "Press a key"
Sleep


changefontsize(13,28)

Print "new font size this session = 13 by 28"

Dim As String asci
For n As Long=0 To 255
    asci+=Chr(n) 'create a string
Next

Locate 10
Print asci


Print "Now change the console size again,  press a key"
Sleep
Cls
changefontsize(15,30)
changeconsolesize(50,4)

Print "New console size = COLS=50 LINES=4"
Print "Font size 15 by 30"
Print "Press a key to end . . . "
Sleep

 
The Windows 10 console can handle Russian letters no problem.
PowerShell does not by default.
I haven't tried Chinese fonts.
This demo is plain ANSI.
Neither masm nor Pascal, sorry!
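
A note on what a loop over Chr(0) to Chr(255) will show: the bytes above 127 have no fixed meaning, so the glyphs depend on the code page the console is using. The following Python sketch (not FreeBASIC, and the helper name `render` is made up here) decodes the same byte values under three common Windows code pages.

```python
# Sketch: the same byte value maps to different characters depending on
# the code page used to interpret it.

def render(byte_value: int, code_page: str) -> str:
    """Decode a single byte under the given code page."""
    return bytes([byte_value]).decode(code_page, errors="replace")

for code_page in ("cp437", "cp1252", "cp1251"):
    shown = " ".join(render(b, code_page) for b in (0x41, 0xE9))
    print(f"{code_page:7} {shown}")
```

Byte 0x41 is "A" everywhere, while 0xE9 is a Greek theta in the OEM code page 437, "é" in 1252 and a Cyrillic letter in 1251.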

Post by marcov »

Please note that for manifesting I was talking about using UTF-8 with the -A functions directly, not about converting the strings and passing them on to the UTF-16 APIs.

Post by jj2007 »

marcov wrote: Please note that for manifesting I was talking about using UTF-8 with the -A functions directly, not about converting the strings and passing them on to the UTF-16 APIs.
That is Windows 10, and I bet that you'll find MultiByteToWideChar once you dive into the WhatEverA call ;-)

If you can manifest your interest in Utf8, it probably means that M$ loads a global variable to do the job:

MultiByteToWideChar(CP_ASMANIFESTED, 0, @source, -1, @dest, buffersize)

Actually, if I think about it, I could probably implement that feature with a manifest file for Win7, too. However, simply initialising CP_ASMANIFESTED to CP_UTF8 would be far easier.

Under the hood it's probably still RtlUTF8ToUnicodeN in ntdll.
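
The idea of an A call that converts with whatever code page is active and then forwards to the W call can be modelled in a few lines. This is a Python sketch of the concept only: `wide_api` and `ansi_api` are hypothetical stand-ins, not Windows functions, and the thread does not confirm how Windows implements this internally.

```python
# Sketch: an "A" entry point that decodes its byte string with the active
# code page and forwards the result to the "W" entry point.

def wide_api(text_utf16: bytes) -> str:
    """Stand-in for a W function: receives UTF-16LE and returns what it would display."""
    return text_utf16.decode("utf-16-le")

def ansi_api(raw: bytes, active_code_page: str) -> str:
    """Stand-in for the matching A function: convert, then call the W version."""
    wide = raw.decode(active_code_page, errors="replace").encode("utf-16-le")
    return wide_api(wide)

message = "Добро пожаловать"
utf8_bytes = message.encode("utf-8")

# Active code page is UTF-8 (the manifested case): the text survives.
print(ansi_api(utf8_bytes, "utf-8"))

# Active code page is a legacy one: the same bytes turn into mojibake.
print(ansi_api(utf8_bytes, "cp1252"))
```

The only thing that changes between the two calls is the code page handed to the conversion step, which is the role the speculative CP_ASMANIFESTED plays above.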

Post by marcov »

jj2007 wrote:
marcov wrote: Please note that for manifesting I was talking about using UTF-8 with the -A functions directly, not about converting the strings and passing them on to the UTF-16 APIs.
That is Windows 10, and I bet that you'll find MultiByteToWideChar once you dive into the WhatEverA call ;-)
Yes, and a not too old one (1903 or so), but that is basically the only supported Windows version anyway, apart from a very small handful on Win8.1.

The call from A to W was always there, for all A calls anyway; that is not new. It is just that UTF-8 is now also supported for it.
So effectively it moves the bloat of those conversion calls out of the EXE of your 1-byte string app (though probably only for the ANSI calls, maybe not for the OEM ones).