Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

jmg
Posts: 87
Joined: Mar 11, 2009 3:42

Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby jmg » Sep 07, 2018 21:49

I.e., have the FreeBASIC numeric parser skip over underscore characters. They are only there to help programmers group and read the digits.

This 'tolerate underscores' behaviour is becoming more common in the embedded microcontroller space, and FreeBASIC is often used to talk to MCUs.
badidea
Posts: 1414
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby badidea » Sep 07, 2018 22:33

FreeBASIC uses the &h, &o and &b prefixes. Would it be more general to allow underscores anywhere in these notations? E.g.:
&hFF_BB_00_00 or &h_FFBB_0000, or, crazier still, &h__D__E__A__D__B__E__E__F, or a binary number such as &b_0101_0001_1110_0000

Just for fun, a quick and dirty converter from string, without any error checking:

Code:

'Column ruler above each expected input layout (character positions are fixed):
'12345678901234567890
'0xff_dd_cc_aa
'12345678901234567890
'80_000_000
'12345678901234567890
'0b0000_0000_0000_0000

function special2int(valueStr as string) as ulong
   dim as string temp
   'hexadecimal number
   if mid(valueStr, 1, 2) = "0x" then
      temp = "&h" + mid(valueStr, 3, 2) + mid(valueStr, 6, 2)
      temp += mid(valueStr, 9, 2) + mid(valueStr, 12, 2)
      return valint(temp)
   end if
   'octal number (a leading "8" serves as an ad-hoc octal marker here)
   if mid(valueStr, 1, 1) = "8" then
      temp = "&o" + mid(valueStr, 2, 1) + mid(valueStr, 4, 3)
      temp += mid(valueStr, 8, 3)
      return valint(temp)
   end if
   'binary number
   if mid(valueStr, 1, 2) = "0b" then
      temp = "&b" + mid(valueStr, 3, 4) + mid(valueStr, 8, 4)
      temp += mid(valueStr, 13, 4) + mid(valueStr, 18, 4)
      return valint(temp)
   end if
   return 0
end function

print hex(special2int("0xff_dd_cc_aa"))
print oct(special2int("83_124_777"))
print bin(special2int("0b1001_0110_0000_1111"))
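
A slightly more general take on the same idea, as a minimal sketch: instead of picking digits from fixed positions, strip every underscore first and then map the C-style 0x/0b prefixes onto &h/&b. The name Sep2ULong is made up for illustration, and like the converter above it does no error checking.

```freebasic
' Hypothetical helper (not part of FreeBASIC): removes every "_" from a
' numeric string and maps C-style "0x"/"0b" prefixes onto &h/&b.
Function Sep2ULong(ByVal s As String) As ULong
    Dim As String digits
    ' copy everything except the separator character
    For i As Integer = 1 To Len(s)
        Dim As String c = Mid(s, i, 1)
        If c <> "_" Then digits += c
    Next
    ' translate the C-style prefixes into the FreeBASIC ones
    Select Case LCase(Left(digits, 2))
    Case "0x"
        digits = "&h" + Mid(digits, 3)
    Case "0b"
        digits = "&b" + Mid(digits, 3)
    End Select
    ' ValUInt understands &h/&o/&b prefixes and plain decimal digits
    Return ValUInt(digits)
End Function

Print Hex(Sep2ULong("0xff_dd_cc_aa"))          ' FFDDCCAA
Print Sep2ULong("80_000_000")                  ' decimal, no prefix
Print Bin(Sep2ULong("0b1001_0110_0000_1111"))  ' 1001011000001111
```

Grouping no longer matters with this approach, so "0xffdd_ccaa" and "0xff_dd_cc_aa" give the same result.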
jmg
Posts: 87
Joined: Mar 11, 2009 3:42

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby jmg » Sep 08, 2018 0:41

badidea wrote:FreeBASIC uses the &h, &o and &b prefixes. Would it be more general to allow underscores anywhere in these notations? E.g.:
&hFF_BB_00_00 or &h_FFBB_0000, or, crazier still, &h__D__E__A__D__B__E__E__F, or a binary number such as &b_0101_0001_1110_0000

Yes, I (of course) meant the general form; writing 0x where FreeBASIC uses &h was my typo, oops.
counting_pine
Site Admin
Posts: 6170
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby counting_pine » Sep 08, 2018 9:12

Some thoughts that occur to me:
- I like the idea and I've considered it before but never looked into it heavily
- it could be extended to decimal numbers too, e.g. 1_234_567. Languages like Ruby allow this
- it could probably be used unambiguously anywhere in the number except at the start of decimals ('_1' is a valid variable name) and perhaps octals without the 'o'? (Probably best just to ban it at the start everywhere for consistency.)
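
The placement rule from the list above can be sketched as a small checker. This is only an illustration under the strictest reading ("ban it at the start everywhere"); SeparatorsOk is a made-up name, it is not taken from the fbc sources, and it ignores suffixes, floats and octals written without the 'o'.

```freebasic
' Hypothetical check, assuming the rule "no underscore directly at the
' start of the digits". Not actual compiler code.
Function SeparatorsOk(ByRef lit As Const String) As Boolean
    Dim As String body = LCase(lit)
    Dim As String allowed = "0123456789"
    ' peel off a FreeBASIC radix prefix, if there is one
    Select Case Left(body, 2)
    Case "&h"
        allowed = "0123456789abcdef"
        body = Mid(body, 3)
    Case "&o"
        allowed = "01234567"
        body = Mid(body, 3)
    Case "&b"
        allowed = "01"
        body = Mid(body, 3)
    End Select
    If Len(body) = 0 Then Return False
    If Left(body, 1) = "_" Then Return False   ' banned at the start
    ' every remaining character must be a digit of the radix or "_"
    For i As Integer = 1 To Len(body)
        Dim As String c = Mid(body, i, 1)
        If c <> "_" AndAlso InStr(allowed, c) = 0 Then Return False
    Next
    Return True
End Function

Print SeparatorsOk("1_234_567")      ' accepted
Print SeparatorsOk("&hFF_BB_00_00")  ' accepted
Print SeparatorsOk("_1")             ' rejected: '_1' is a valid variable name
Print SeparatorsOk("&h_FFBB_0000")   ' rejected under this strict rule
```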

It would need to be implemented in FB in various different places:
- In the number parser, for hex/oct/bin numbers, as well as integers and perhaps floats.
- We might need to check whether numbers keep their "stringy" form anywhere once parsed (e.g. in the preprocessor). Things might break if the code makes particular assumptions, e.g. that the number '1000000' contains 7 characters. We might also need to make sure that adding underscores doesn't change the "length" that FB uses to decide whether the number can fit in a given type.

For consistency, there should probably also be runtime library support. This is a fairly massive task and potentially involves implementing/checking lots of different cases:
- Formats (Hex/Bin/Oct/Decimal)
- Types (Longint, Long, Single, Double)
- Function/keyword type (Val/Valint/Vallng/Cdbl/Csng/Cint, and Cint<>, etc.), although a lot of these will call the same functions, at least at runtime
- Compile-time constants (e.g. Cint("&h11_22_33_44")) - it seems we don't actually allow this, so maybe this one's OK.
dodicat
Posts: 5880
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby dodicat » Sep 08, 2018 10:24

For fun

Code:


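' Compacts Text in place, dropping every occurrence of the first character of Char
Function Remove(Byval Text As String,Char As String="_") As String
    Dim As Long i   ' write position for the characters that are kept
    For n As Long = 0 To Len(Text)-1
        If Text[n]<> Asc(char) Then Text[i]= Text[n]:i+=1
    Next
    Return Left(Text,i)
End Function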

#macro __(t,b...)
Val(remove(#t,b))
#endmacro

Print __(1_234_567)

Print __(&h__D__E__A__D__B__E__E__F)

Print __(&hFF_BB_00_00 Or &h_FFBB_0000)

Print __(&b_0101_0001_1110_0000)

Print __(26-10-48,"-")  'not a dash


Sleep
 
badidea
Posts: 1414
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby badidea » Sep 08, 2018 11:03

dodicat wrote:Print __(&hFF_BB_00_00 Or &h_FFBB_0000)

That one is not working, but your Remove function is simpler than my special2int function.
Print __(&hFF_BB_00_01) or __(&h_FFBB_0010) also does not work?
With ValLng:

Code:

Function Remove(Byval Text As String, Char As String="_") As string
    Dim As Long i
    For n As Long = 0 To Len(Text)-1
        If Text[n]<> Asc(char) Then Text[i]= Text[n]:i+=1
    Next
    Return Left(Text,i)
End Function

#macro __(t,b...)
   Vallng(Remove(#t,b))
#endmacro

Print hex(__(&hFF_BB_00_01) or __(&h_FFBB_0010))
Sleep
St_W
Posts: 1468
Joined: Feb 11, 2009 14:24
Location: Austria

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby St_W » Sep 08, 2018 13:58

I like the idea: I already know it from other languages, it doesn't cause any compatibility issues with existing code, and it is totally optional without bringing any disadvantages. For most uses, adding support for such numeric literals at compile time would do perfectly. Thus adding support for that in the RTL is optional IMHO and could be added any time later with a new compiler version. Btw, the same is true for numeric literals starting with an underscore.

For example, C# added support for that in C# 7.0 https://docs.microsoft.com/en-us/dotnet ... provements
and later added support for leading underscores (in binary/hex literals) in C# 7.2 https://docs.microsoft.com/en-us/dotnet ... c-literals
That looks like a reasonable way to go for FB too IMHO.

Also note that e.g. adding support for parsing such numeric literals with the built-in VAL() function could cause compatibility issues.
dodicat
Posts: 5880
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby dodicat » Sep 08, 2018 18:28

Indeed, badidea.
The Or must come out of the macro:
Print __(&hFF_BB_00_00) Or __(&h_FFBB_0000)
MrSwiss
Posts: 3180
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby MrSwiss » Sep 09, 2018 0:20

Sorry, don't understand:
dodicat wrote:The or must come out --- Print __(&hFF_BB_00_00) Or __(&h_FFBB_0000)

Code:

&hFFBB0000 Or &hFFBB0000 = &hFFBB0000
The second &h Or(ed) doesn't make sense, anyhow.
jmg
Posts: 87
Joined: Mar 11, 2009 3:42

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby jmg » Sep 09, 2018 0:26

St_W wrote:I like the idea: I already know it from other languages, it doesn't cause any compatibility issues with existing code, and it is totally optional without bringing any disadvantages. For most uses, adding support for such numeric literals at compile time would do perfectly. Thus adding support for that in the RTL is optional IMHO and could be added any time later with a new compiler version. Btw, the same is true for numeric literals starting with an underscore.


Fully agree: this is optional, done at compile time, and would be a safe superset, with some rules around what is allowed.
jj2007
Posts: 1203
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby jj2007 » Sep 09, 2018 1:14

This is a pretty exotic number format. I would suggest being prudent: you never know who relies on Val("1_234") returning 1 instead of 1234.
Perhaps a global switch, or an additional parameter that tells Val() and friends to ignore the underscore? OTOH, it is always possible to use a function that eliminates the underscores, e.g.:

Code:

  Print Str$(Val(Replace$("1_234_567", "_", "")))
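
The "additional parameter" variant could look roughly like the sketch below. It assumes plain FreeBASIC, which as far as I know has no built-in Replace$, so the underscores are stripped by hand. ValSep and its ignoreUnderscores flag are hypothetical names, not part of the runtime library.

```freebasic
' Hypothetical wrapper around Val(); not part of the FreeBASIC RTL.
' The flag defaults to False so that existing callers see no change.
Function ValSep(ByVal s As String, ByVal ignoreUnderscores As Boolean = False) As Double
    If ignoreUnderscores Then
        Dim As String cleaned
        ' copy every character except the separator
        For i As Integer = 1 To Len(s)
            Dim As String c = Mid(s, i, 1)
            If c <> "_" Then cleaned += c
        Next
        s = cleaned
    End If
    Return Val(s)
End Function

Print ValSep("1_234")         ' old behaviour: parsing stops at the underscore
Print ValSep("1_234", True)   ' opt-in: underscores are skipped first
```

Because the flag is opt-in, code that relies on Val("1_234") returning 1 keeps working unchanged.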
jmg
Posts: 87
Joined: Mar 11, 2009 3:42

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby jmg » Sep 09, 2018 7:18

St_W wrote:For example, C# added support for that in C# 7.0 https://docs.microsoft.com/en-us/dotnet ... provements and later added support for leading underscores (in binary/hex literals) in C# 7.2 https://docs.microsoft.com/en-us/dotnet ... c-literals
That looks like a reasonable way to go for FB too IMHO.

Nice links; they provide a 'working reference' for 'industry practice'. I'd agree to follow Microsoft's lead here.
St_W
Posts: 1468
Joined: Feb 11, 2009 14:24
Location: Austria

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Postby St_W » Sep 10, 2018 13:00

jj2007 wrote:[...] you never know who relies on Val("1_234") returning 1 instead of 1234.
Yes, we could run into compatibility issues when modifying the behaviour of the RTL functions, as noted above, so I'd favor supporting this at compile time only and not changing the RTL.
Adding a switch to revert to the old behaviour, as you mentioned, is also possible of course, but it still requires changes to compile old code (if not to the code, then at least to the compilation options).
