Open a file with a unicode-encoded name

newbieforever
Posts: 84
Joined: Jun 21, 2018 11:14

Re: Open a file with a unicode-encoded name

Postby newbieforever » Jul 02, 2018 12:08

@MrSwiss, jj2007:

I know I am straining your patience!

The file whose content I read contains concatenated files (program files, text files), and one of these files is a text file (e.g. created by Notepad) which was saved as Unicode; this file contains the file name I have to extract.

As described, I read stg from the file in the usual FB way:

Code:

Dim As String stg, fil
'...
Open "Combined.bin" For Binary As #1
stg = Input(lof(1), #1)


Then I do the string manipulations. All of this works perfectly, and I need all these manipulations for my main job. Extracting the file name also works perfectly: fil = mid(stg, pos, lngth) doesn't fail! An example: if the file name saved in the Unicode-encoded text file is 'Kč.ahk', fil is the following byte sequence: '4B 00 0D 01 2E 00 61 00 68 00 6B 00'. I would be able e.g. to write this string into a file or do similar things; all this works perfectly. The problem in this context is only how to use this string 'fil' to create a file (by CreateFileW()) with the name 'Kč.ahk'...
jj2007
Posts: 749
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: Open a file with a unicode-encoded name

Postby jj2007 » Jul 02, 2018 12:12

newbieforever wrote:The problem in this context is only how to use this string 'fil' to create a file (by CreateFileW()) with the name 'Kč.ahk'...
I posted working code above.
Josep Roca
Posts: 376
Joined: Sep 27, 2016 18:20
Location: Valencia, Spain

Re: Open a file with a unicode-encoded name

Postby Josep Roca » Jul 02, 2018 12:57

A working example using my WinFBX framework and Paul Squires' WinFBE editor:

Code:

'#CONSOLE ON
#include "windows.bi"
#include "Afx/CWSTR.inc"
using Afx

DIM cwsFilename AS CWSTR = "Добро.txt"
DIM cwsText AS CWSTR = "Дмитрий Дмитриевич Шостакович"
DIM hFile AS HANDLE = CreateFileW(cwsFilename, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL)
IF hFile <> INVALID_HANDLE_VALUE THEN   ' CreateFileW returns INVALID_HANDLE_VALUE, not 0, on failure
   DIM dwBytesWritten AS DWORD
   DIM bSuccess AS LONG = WriteFile(hFile, cwsText, LEN(cwsText) * 2, @dwBytesWritten, NULL)
   CloseHandle(hFile)
END IF


CWSTR is a dynamic unicode string data type implemented as a class.
In the WinFBE editor, the file must use the UTF-16 (BOM) option.
MrSwiss
Posts: 2728
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Open a file with a unicode-encoded name

Postby MrSwiss » Jul 02, 2018 13:10

Maybe the MultiByteToWideChar function is what you are looking for ...
(docs.microsoft.com)
newbieforever
Posts: 84
Joined: Jun 21, 2018 11:14

Re: Open a file with a unicode-encoded name

Postby newbieforever » Jul 02, 2018 14:29

The following code should demonstrate my problem more clearly. So far I don't see a solution for this problem in the above contributions, but maybe I have misunderstood something...

Code:

' Put this character string into a notepad file:
' <Kč.txt>
' and save the file as unicode under the name unic.txt.
' In a hex editor can be seen that the file name between < and > is encoded
' by the byte sequence '4B 00 0D 01 2E 00 74 00 78 00 74 00'.

Dim As String stg, fil
Dim As Integer p1, p2

''' Extracting the file name from the file unic.txt:
Open "unic.txt" For Binary As #1
stg = Input(lof(1), #1)
Close 1
p1 = instr(stg, "<" & chr(0))
p2 = instr(stg, ">" & chr(0))
fil = mid(stg, p1 + 2, p2 - p1 - 2)
''' Please note that 'stg' (Dim As String stg) is used for other string manipulations too!
''' Extracting 'fil' is only one of these string manipulations.

''' Only for testing: Writing the extracted file name (string 'fil') to a file:
Open "testit.txt" For Binary As #1
Print #1, fil;
Close 1
' In a hex editor can be seen that the extracted file name is exactly the
' byte sequence from the unic.txt: '4B 00 0D 01 2E 00 74 00 78 00 74 00'.

''' =================================================

#Define winapimode 1
#Include "file.bi"
#include "Windows.bi"

Sub savefile(filename As wstring, p As String)
   Dim As any ptr n, pstr
   Dim As Integer byteswritten
   n=CreateFileW(filename, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, 0, 0)
   If n<>-1 then
      WriteFile(n, Peek(any ptr, VarPtr(p)), Len(p), @byteswritten, 0)
      CloseHandle(n)
   End If
End Sub

''' =================================================

''' This works fine:
savefile("Kč.txt", "Text")

''' This would work too:
'Dim As WString * 250 fi = "Kč.txt"
'savefile(fi, "Text")

''' THE QUESTION IS: HOW TO USE savefile() WITH 'fil'???
''' ====================================================
MrSwiss
Posts: 2728
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Open a file with a unicode-encoded name

Postby MrSwiss » Jul 02, 2018 16:43

newbieforever wrote:''' THE QUESTION IS: HOW TO USE savefile() WITH 'fil'???
Simple answer: you cannot!!!

*fil* is a String, but savefile() requires a WString argument (FileName).
Therefore, you'll have to convert: String --> WString (then, use the WString in the call).
Use: MultiByteToWideChar function (WIN-API), for the conversion.
jj2007
Posts: 749
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: Open a file with a unicode-encoded name

Postby jj2007 » Jul 02, 2018 16:59

newbieforever wrote:The following code should demonstrate my problem more clearly. Until now I dont see that there would be a solution for this problem in the above contributions
I gave you only the savefile part, including WriteFile. What you posted works up to a certain point, but apparently FB doesn't like printing Unicode to a file. Even MultiByteToWideChar can't solve that problem - your strings are already wide, no need to convert them.

What is weird:

Code:

#include "Windows.bi"
#define UNICODE
...
fil = mid(stg, p1 + 2, p2 - p1 - 2)
MessageBoxW(0, @fil, "Hello fil", MB_OK)
MessageBoxW(0, @fil[0], "Hello fil[0]", MB_OK)
I am sure it is a BASIC feature explained somewhere in the manual.
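
The difference between the two MessageBoxW calls is consistent with how FreeBASIC treats the @ operator on a variable-length String: @fil yields the address of the string descriptor, while @fil[0] (or StrPtr(fil)) yields the address of the character data. A minimal sketch, using a hypothetical variable s rather than the thread's fil:

```freebasic
' Sketch: address of the String descriptor vs. address of the string data.
Dim As String s = "abc"

Dim As Any Ptr pDescriptor = @s         ' address of the internal descriptor
Dim As Any Ptr pData       = StrPtr(s)  ' address of the character data
Dim As Any Ptr pFirst      = @s[0]      ' address of the first byte of the data

Print "StrPtr(s) = @s[0] : "; (pData = pFirst)        ' expected to be true
Print "@s = StrPtr(s)    : "; (pDescriptor = pData)   ' expected to be false
```

So for raw bytes held in a String, StrPtr(fil) or @fil[0] is the pointer an API such as MessageBoxW would need, provided the data ends with a two-byte zero terminator.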
MrSwiss
Posts: 2728
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Open a file with a unicode-encoded name

Postby MrSwiss » Jul 02, 2018 17:08

jj2007 wrote:Even MultiByteToWideChar can't solve that problem - your strings are already wide, no need to convert them.
I'd reconsider that statement, if I were you: the byte order in the String is now *ass about face*,
which is also addressed in the conversion ...

Afaik: Strings are encoded big-endian ...
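
The byte order can be checked against the example bytes posted earlier in the thread ('4B 00 0D 01' for 'Kč'). The sketch below combines each byte pair both ways; the array b and the variables le and be are illustrative names only. Since 'K' is U+004B and 'č' is U+010D, the little-endian reading is the one that matches, which is the order CreateFileW expects on Windows.

```freebasic
' Sketch: read the first two code units of 'Kč' as little- and big-endian.
Dim As UByte b(0 To 3) = {&h4B, &h00, &h0D, &h01}

For i As Integer = 0 To 2 Step 2
    Dim As UInteger le = b(i) Or (CUInt(b(i + 1)) Shl 8)   ' low byte first
    Dim As UInteger be = (CUInt(b(i)) Shl 8) Or b(i + 1)   ' high byte first
    Print "LE: U+"; Hex(le, 4); "   BE: U+"; Hex(be, 4)
Next
' The little-endian values are 004B and 010D; the big-endian ones are 4B00 and 0D01.
```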
dodicat
Posts: 5086
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Open a file with a unicode-encoded name

Postby dodicat » Jul 02, 2018 17:59

If your actual file names are non-Unicode, then FreeBASIC can use an encoding for the file content.

Code:


Sub savefileUTF(filename As String,p As wstring)
    Dim As Long n=Freefile
    Open filename For Output Encoding "utf16" As #n
    Print #n,p
    Close #n
End Sub

sub loadfileUTF(filename as string,s as wstring)
    Dim As Long f=Freefile
    Open filename For input Encoding "utf16" As #f
    Dim As wString  * 500 ln
    Do Until Eof(f)
        Line Input #f, ln
        s+=ln + Chr(10)
    Loop
    Close #f
End sub

dim as wstring * 15 nm="<Kc.txt>"
savefileUTF("unic.txt",nm)

dim as wstring * 20 stg
loadfileUTF("unic.txt",stg)
print "stg = ";stg

var p1 = instr(stg, "<" & chr(0))
var p2 = instr(stg, ">" & chr(0))
dim as string fil = mid(stg, p1 + 2, p2 - p1 - 2)
print "fil = ";fil
print
'some unicode
dim as wstring * 500 w="Klüft skräms inför på fédéral électoral große ... " + wchr(&h0414, &h043e, &h0431, &h0440, &h043E)+chr(13,10)

savefileUTF(fil,w)
dim as wstring * 10000 ret

loadfileUTF(fil,ret)
print ret
print "done"
sleep
 
jj2007
Posts: 749
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: Open a file with a unicode-encoded name

Postby jj2007 » Jul 02, 2018 21:23

dodicat wrote:If your actual file names are non-Unicode, then FreeBASIC can use an encoding for the file content.
...
Open filename For Output Encoding "utf16" As #n
Which is pretty odd given that utf16 is Windows' version of Unicode ;-)

Have you tested utf8, dodicat? That could actually work (but I didn't test it yet). EDIT: tested, doesn't work; it adds a utf8 BOM but Open fails.

Here is my testbed: It creates Добро.txt and writes Hello World to it, then loads the content from the newly created file. It works also with more exotic languages, e.g. dim as wstring * 24 file= "مرحبا بكم.txt" (not all IDEs allow direct assignments like this, therefore I used the wchr() notation below).
For winapimode 0, it uses Abcde.txt and FB file i/o commands instead.

Code:

#Define winapimode 1   ' 0=use FB functions, 1=use WinAPI functions
#Include "file.bi"
#include "Windows.bi"

Sub savefile(filename As wstring, p As String)
  Print "Opening [";filename;"]"
  #if winapimode
   Dim As any ptr n, pstr
   Dim As Integer byteswritten
   n=CreateFileW(filename, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, 0, 0)
   If n<>-1 then   ' Peek(any ptr, VarPtr(p))
      WriteFile(n, @p[0], Len(p), @byteswritten, 0)
      Print byteswritten; " bytes written"
      CloseHandle(n)
   Else
      Print "Unable to open " + filename
   End If
  #else
   Dim As Integer n
   n=freefile
   If Open(filename For Binary Access Write As #n)=0 then
      Put #n,,p
      Close #n
   Else
      Print "Unable to open " + filename
   End If
  #endif
End Sub

Function loadfile(file as wstring) as String
   ' If FileExists(file)=0 Then Print file;" not found":Sleep:end
  Dim As String text="ERROR"
  #if winapimode
    Dim as any ptr f
    Dim as integer fsize, bytesread
   f=CreateFileW(file, GENERIC_READ, 0, 0, OPEN_EXISTING, 0, 0)
   if f <> INVALID_HANDLE_VALUE then   ' CreateFileW returns INVALID_HANDLE_VALUE, not 0, on failure
      fsize=GetFileSize(f, 0)
      text=String(fsize, Asc("x"))
      ReadFile(f, @text[0], fsize, @bytesread, 0)
      CloseHandle(f)
   endif
  #else
   var f=freefile
   Open file For Binary Access Read As #f
   If Lof(f) > 0 Then
      text = String(Lof(f), 0)
      Get #f, , text
   End If
   Close #f
  #endif
  return text
end Function

#if winapimode   ' the filename:
   dim as wstring * 10 file= wchr(&h0414, &h043e, &h0431, &h0440, &h043E, &h002E, &h0074, &h0078, &h0074)
#else
   dim as wstring * 10 file= "Abcde.txt"
#endif
print "The filename is ["; file; "]"
dim as string binaryfile="Hello World, how are you today?"   ' "1234"+chr(0)+"5"+chr(206)+"6789"

savefile(file,binaryfile)       'saved to disk

dim as string s =loadfile(file) 'loaded from disk
print s

' kill file 'erase the file

sleep
newbieforever
Posts: 84
Joined: Jun 21, 2018 11:14

Re: Open a file with a unicode-encoded name

Postby newbieforever » Jul 03, 2018 6:34

I do not know if I'm still totally confused or already in complete despair...

It seems that all FB experts here see no way to read a file name from a unicode-encoded text file and then create a file of that name.

Is FB really unable to handle such a simple task??? Can this be true???
jj2007
Posts: 749
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: Open a file with a unicode-encoded name

Postby jj2007 » Jul 03, 2018 8:43

newbieforever wrote:Is FB really unable to handle such a simple task??? Can this be true???
I got it running, but unic.txt must delimit the filename with [brackets]:

Code:

#Define winapimode 1   ' 0=use FB functions, 1=use WinAPI functions
#Include "file.bi"
#include "Windows.bi"

Sub savefile(filename As wstring, p As String)
  Print "Opening [";filename;"]"
  #if winapimode
   Dim As any ptr n, pstr
   Dim As Integer byteswritten
   n=CreateFileW(filename, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, 0, 0)
   If n<>-1 then   ' Peek(any ptr, VarPtr(p))
      WriteFile(n, @p[0], Len(p), @byteswritten, 0)
      Print byteswritten; " bytes written"
      CloseHandle(n)
   Else
      Print "Unable to open " + filename
   End If
  #else
   Dim As Integer n
   n=freefile
   If Open(filename For Binary Access Write As #n)=0 then
      Put #n,,p
      Close #n
   Else
      Print "Unable to open " + filename
   End If
  #endif
End Sub

Function loadfile(file as wstring) as String
   ' If FileExists(file)=0 Then Print file;" not found":Sleep:end
  Dim As String text="ERROR"
  #if winapimode
    Dim as any ptr f
    Dim as integer fsize, bytesread
   f=CreateFileW(file, GENERIC_READ, 0, 0, OPEN_EXISTING, 0, 0)
   if f <> INVALID_HANDLE_VALUE then   ' CreateFileW returns INVALID_HANDLE_VALUE, not 0, on failure
      fsize=GetFileSize(f, 0)
      text=String(fsize, Asc("x"))
      ReadFile(f, @text[0], fsize, @bytesread, 0)
      CloseHandle(f)
   endif
  #else
   var f=freefile
   Open file For Binary Access Read As #f
   If Lof(f) > 0 Then
      text = String(Lof(f), 0)
      Get #f, , text
   End If
   Close #f
  #endif
  return text
end Function

#if winapimode   ' the filename:
   ' dim as wstring * 30 file= "مرحبا بكم.txt"
   dim as wstring * 30 file= "Добро.txt"
   Dim as integer posL, posR
   dim as string tmp
   dim as string unic =loadfile("unic.txt") 'loaded from disk
   posL=Instr(unic, "["+chr(0))   'the chr(0) gets ignored!
   posR=Instr(unic, "]"+chr(0))   ' ">" would NOT work
   Print "L=";posL;", Rx=";posR
   ' posR=posL+20 ' ** doesn't work: Instr(unic, ">"+chr(0))
   tmp=Mid(unic, posL+2, posR-posL-3)+chr(0)+chr(0)

   asm pushad   ' copies name from tmp to file
   asm mov esi, [tmp]
   asm lea edi, [file]
   asm mov ecx, 30
   asm rep movsb
   asm popad

'    MessageBoxW(0, @tmp[0], "tmp:", MB_OK)
'    MessageBoxW(0, @file[0], "File:", MB_OK)
'    Print unic
   Print "L=";posL;", R=";posR
   ' file=tmp
#else
   dim as wstring * 10 file= "Abcde.txt"
#endif
print "The filename is ["; file; "]"
dim as string binaryfile="Hello World, how are you?"   ' "1234"+chr(0)+"5"+chr(206)+"6789"

savefile(file,binaryfile)       'saved to disk

dim as string s =loadfile(file) 'loaded from disk
print s

' kill file 'erase the file

sleep

There seems to be a bug or undefined behaviour in Instr(). While FB claims that its strings are not zero-delimited C strings, the behaviour of Instr is different: Instr(unic, ">"+chr(0)) would always return the first occurrence of ">", whether followed by a zero byte or not. And by chance, the Unicode representation of Добро.txt does contain some early matches for 3E hex - otherwise I would never have noticed this behaviour. It works with [square brackets] because there aren't any in Добро.txt ...

Note there must be more elegant solutions to do this, but I am not expert enough in this language.
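
One possible asm-free alternative, offered as a sketch rather than a tested drop-in: copy the bytes pairwise into a WString buffer, combining each pair as a little-endian code unit. The function name Utf16BytesToWStr and its parameters are hypothetical, and the demo bytes are the 'Kč.txt' sequence quoted earlier in the thread, built at run time.

```freebasic
' Sketch: turn a String holding raw UTF-16LE bytes into a zero-terminated WString.
' destChars is the capacity of dest in characters, including the terminator.
Function Utf16BytesToWStr(ByRef raw As String, ByVal dest As WString Ptr, _
                          ByVal destChars As Integer) As Integer
    Dim As Integer n = Len(raw) \ 2               ' number of complete byte pairs
    If n > destChars - 1 Then n = destChars - 1   ' leave room for the terminator
    For i As Integer = 0 To n - 1
        dest[i] = raw[2 * i] Or (CUInt(raw[2 * i + 1]) Shl 8)
    Next
    dest[n] = 0
    Return n                                      ' code units copied
End Function

' Demo: the bytes for 'Kč.txt', filled in at run time.
Dim As UByte demo(0 To 11) = {&h4B, 0, &h0D, 1, &h2E, 0, &h74, 0, &h78, 0, &h74, 0}
Dim As String fil = Space(12)
For i As Integer = 0 To 11
    fil[i] = demo(i)
Next

Dim As WString * 260 wname
Dim As Integer copied = Utf16BytesToWStr(fil, @wname, 260)
Print copied; " code units, second one is U+"; Hex(wname[1], 4)
```

The resulting wname could then be passed to a routine that takes a WString, such as the savefile() shown above. Working per code unit instead of copying raw memory avoids reading past the end of the source string and does not depend on 32-bit inline assembly.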
newbieforever
Posts: 84
Joined: Jun 21, 2018 11:14

Re: Open a file with a unicode-encoded name

Postby newbieforever » Jul 03, 2018 10:39

jj2007, that's incredible, overwhelming!

If I understand your method correctly, you are able to convert the file name string extracted from the file into a wstring (by appending chr(0)+chr(0) to it and applying some asm commands). Anyway, this is ingenious!!!

At the moment I don't understand all details of your code and have to study them, and then to look how this can be adapted for my project.

A lot of work...

PS: The selection of a delimiter character could be a problem (btw, for my example your method seems to work even with <...>) because brackets are allowed in Windows file names...
Btw, instr does not find e.g. "<" & chr(0), but it finds chr(0)...
jj2007
Posts: 749
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: Open a file with a unicode-encoded name

Postby jj2007 » Jul 03, 2018 11:28

newbieforever wrote:jj2007, that's incredible, overwhelming!
No, it's some lines of dirty hacks. As written earlier, there must be more elegant ways to do this, but so far I couldn't find them.

Btw:

Code:

   posL=Instr(unic, "<"+chr(0))   'the chr(0) gets ignored!
   posR=Instr(unic, ">"+chr(0))   ' ">" would NOT work

But this works:

Code:

   posL=Instr(unic, chr(60)+chr(0))   ' OK
   posR=Instr(unic, chr(62)+chr(0))   ' OK

Same for this version:

Code:

   dim as string ml=chr(60) & chr(0)
   dim as string mr=chr(62) & chr(0)
   posL=Instr(unic, ml)   'OK
   posR=Instr(unic, mr)   'OK
Mysteries of BASIC ;-)
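
Building on the variable-based version, the two-byte search pattern can be assembled at run time and the search restricted to code-unit boundaries, so that a byte pair straddling two characters is not mistaken for a delimiter. This is a sketch under assumptions: FindUtf16Unit is a hypothetical helper, and the buffer is assumed to start on a code-unit boundary (for example with the FF FE byte order mark that Notepad writes).

```freebasic
' Sketch: find a UTF-16LE code unit in a byte buffer, on aligned positions only.
Function FindUtf16Unit(ByRef buf As String, ByVal codeUnit As UShort, _
                       ByVal startPos As Integer = 1) As Integer
    Dim As String needle = Space(2)
    needle[0] = codeUnit And &hFF   ' low byte first (little-endian)
    needle[1] = codeUnit Shr 8      ' high byte second
    Dim As Integer p = InStr(startPos, buf, needle)
    ' Code units start at odd 1-based positions; skip misaligned hits.
    Do While p > 0 AndAlso (p And 1) = 0
        p = InStr(p + 1, buf, needle)
    Loop
    Return p                        ' 0 if not found
End Function

' Demo buffer: BOM, then <Kč> in UTF-16LE, filled in at run time.
Dim As UByte demo(0 To 9) = {&hFF, &hFE, &h3C, &h00, &h4B, &h00, &h0D, &h01, &h3E, &h00}
Dim As String unic = Space(10)
For i As Integer = 0 To 9
    unic[i] = demo(i)
Next

Dim As Integer posL = FindUtf16Unit(unic, &h3C)   ' "<"
Dim As Integer posR = FindUtf16Unit(unic, &h3E)   ' ">"
Print "L="; posL; ", R="; posR
```

With these positions, Mid(unic, posL + 2, posR - posL - 2) would give the raw bytes of the name between the delimiters.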
newbieforever
Posts: 84
Joined: Jun 21, 2018 11:14

Re: Open a file with a unicode-encoded name

Postby newbieforever » Jul 03, 2018 13:05

jj2007:

Fantastico!!! How did you come up with the idea that it could work with chr+chr???
