Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

jmg
Posts: 89
Joined: Mar 11, 2009 3:42

Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by jmg »

i.e. have the FreeBASIC numeric parser skip over underscore characters. They are there to help programmers group and read the digits.

Tolerating underscores in numbers is becoming more common in the embedded microcontroller space, and FreeBASIC is often used to talk to MCUs.
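To make the request concrete, here is a minimal sketch of what "skip over underscores" means for a plain decimal number such as 80_000_000: digits are accumulated as usual and any underscore is simply ignored. ParseGroupedDecimal is a hypothetical helper for illustration, not an existing FreeBASIC function, and it does no overflow checking.

```freebasic
' Hypothetical helper: convert a decimal digit string to a number,
' ignoring underscores used as digit-group separators.
' No overflow checking is done; this is only a sketch.
Function ParseGroupedDecimal(ByVal s As String) As ULongInt
    Dim As ULongInt value = 0
    For i As Integer = 0 To Len(s) - 1
        Dim As Integer c = s[i]
        If c = Asc("_") Then
            Continue For                          ' separator: skip it
        ElseIf c >= Asc("0") AndAlso c <= Asc("9") Then
            value = value * 10 + (c - Asc("0"))   ' ordinary digit
        Else
            Exit For                              ' stop at the first other character
        End If
    Next
    Return value
End Function

Print ParseGroupedDecimal("80_000_000")
Print ParseGroupedDecimal("1_234_567")
```

The same loop without the underscore branch is an ordinary decimal parser, which is why the feature is cheap at the lexer level.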
badidea
Posts: 2586
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by badidea »

FreeBASIC uses &h, &o and &b. Would it be more general to allow underscores anywhere in these notations? E.g.:
&hFF_BB_00_00 or &h_FFBB_0000 or more crazy &h__D__E__A__D__B__E__E__F or a binary number &b_0101_0001_1110_0000

Just for fun, a quick and dirty converter from string, without any error checking:

Code:

'12345678901234567890
'0xff_dd_cc_aa
'12345678901234567890
'80_000_000
'12345678901234567890
'0b0000_0000_0000_0000

function special2int(valueStr as string) as ulong
	dim as string temp
	'hexadecimal number
	if mid(valueStr, 1, 2) = "0x" then
		temp = "&h" + mid(valueStr, 3, 2) + mid(valueStr, 6, 2)
		temp += mid(valueStr, 9, 2) + mid(valueStr, 12, 2)
		return valint(temp)
	end if
	'octal number
	if mid(valueStr, 1, 1) = "8" then
		temp = "&o" + mid(valueStr, 2, 1) + mid(valueStr, 4, 3)
		temp += mid(valueStr, 8, 3)
		return valint(temp)
	end if
	'binary number
	if mid(valueStr, 1, 2) = "0b" then
		temp = "&b" + mid(valueStr, 3, 4) + mid(valueStr, 8, 4)
		temp += mid(valueStr, 13, 4) + mid(valueStr, 18, 4)
		return valint(temp)
	end if
	return 0
end function

print hex(special2int("0xff_dd_cc_aa"))
print oct(special2int("83_124_777"))
print bin(special2int("0b1001_0110_0000_1111"))
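The converter above reads digits at fixed positions, so it only works for exactly those layouts. A sketch of a more general variant, which accepts underscores anywhere after the prefix (as in badidea's &h__D__E__A__D__B__E__E__F example), could look like this. The name SpecialToULongInt and the set of accepted prefixes (&h/0x, &o, &b/0b, otherwise decimal) are assumptions for illustration; there is no overflow or error checking.

```freebasic
' Hypothetical converter: underscores may appear anywhere after the prefix.
' Accepted prefixes (an assumption): &h or 0x = hex, &o = octal,
' &b or 0b = binary, anything else = decimal. No overflow checking.
Function SpecialToULongInt(ByVal s As String) As ULongInt
    Dim As Integer radix = 10, start = 0
    Select Case LCase(Left(s, 2))
    Case "&h", "0x"
        radix = 16
        start = 2
    Case "&o"
        radix = 8
        start = 2
    Case "&b", "0b"
        radix = 2
        start = 2
    End Select

    Dim As ULongInt value = 0
    For i As Integer = start To Len(s) - 1
        Dim As Integer c = s[i], d
        If c = Asc("_") Then Continue For            ' skip group separators
        Select Case c
        Case Asc("0") To Asc("9")
            d = c - Asc("0")
        Case Asc("a") To Asc("f")
            d = c - Asc("a") + 10
        Case Asc("A") To Asc("F")
            d = c - Asc("A") + 10
        Case Else
            Exit For                                 ' not a digit at all
        End Select
        If d >= radix Then Exit For                  ' digit not valid in this radix
        value = value * radix + d
    Next
    Return value
End Function

Print Hex(SpecialToULongInt("0xff_dd_cc_aa"))
Print Hex(SpecialToULongInt("&h__D__E__A__D__B__E__E__F"))
Print Bin(SpecialToULongInt("&b_0101_0001_1110_0000"))
Print SpecialToULongInt("80_000_000")
```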
jmg
Posts: 89
Joined: Mar 11, 2009 3:42

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by jmg »

badidea wrote:Freebasic uses &h, &o, &b. More general would be to allow underscores anywhere in these notations? E.g.:
&hFF_BB_00_00 or &h_FFBB_0000 or more crazy &h__D__E__A__D__B__E__E__F or a binary number &b_0101_0001_1110_0000
Yes, I (of course) meant the general form, and the 0x instead of &h is my typo, oops.
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by counting_pine »

Some thoughts that occur to me:
- I like the idea and I've considered it before but never looked into it heavily
- it could be extended to decimal numbers too, e.g. 1_234_567. Languages like Ruby allow this
- it could probably be used unambiguously anywhere in the number except at the start of decimals ('_1' is a valid variable name) and perhaps octals without the 'o'? (Probably best just to ban it at the start everywhere for consistency.)
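One way to read the stricter option in the last point ("ban it at the start everywhere") is: after any prefix (&h, &o, &b, or a bare & for octal), the first character must not be an underscore. For decimals this is exactly the case that keeps '_1' an identifier. IsValidGroupedLiteral is a hypothetical name, and the sketch only checks underscore placement, not the digits themselves.

```freebasic
' Hypothetical check for the "no leading underscore anywhere" rule.
' It only looks at where underscores are placed; digit validity is not checked.
Function IsValidGroupedLiteral(ByVal s As String) As Boolean
    Dim As Integer digitsStart = 0
    Select Case LCase(Left(s, 2))
    Case "&h", "&o", "&b"
        digitsStart = 2
    Case Else
        If Left(s, 1) = "&" Then digitsStart = 1   ' bare & = octal without the 'o'
    End Select

    ' there must be at least one character after the prefix
    If digitsStart >= Len(s) Then Return False
    ' the first character after the prefix must not be an underscore
    If s[digitsStart] = Asc("_") Then Return False
    Return True
End Function

Print IsValidGroupedLiteral("1_234_567")
Print IsValidGroupedLiteral("_1")
Print IsValidGroupedLiteral("&h_FFBB_0000")
```

Under this reading badidea's &h_FFBB_0000 would be rejected; allowing an underscore directly after &h/&o/&b is the looser alternative, since there it is unambiguous.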

It would need to be implemented in FB in various different places:
- In the number parser, for hex/oct/bin numbers, as well as integers and perhaps floats.
- We might need to check if numbers keep their "stringy" form anywhere once parsed (e.g. the preprocessor). Things might break if there are particular assumptions in the code, e.g. that the number '1000000' contains 7 characters. Possibly also that adding underscores doesn't affect the "length", which might affect whether FB thinks it can fit in a given type.
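To illustrate the "length" concern: if the compiler decided whether a hex literal fits a 32-bit type from the raw text length, "&hFF_DD_CC_AA" (11 characters after the prefix) would look too long although it has only 8 hex digits. A sketch of a check that counts significant digits while ignoring underscores and leading zeros; HexFitsIn32 is hypothetical and assumes the literal starts with "&h".

```freebasic
' Hypothetical helper: does a "&h..." literal fit in 32 bits?
' Counts significant hex digits, ignoring underscores and leading zeros.
Function HexFitsIn32(ByVal s As String) As Boolean
    Dim As Integer significant = 0
    For i As Integer = 2 To Len(s) - 1                 ' skip the "&h" prefix
        If s[i] = Asc("_") Then Continue For           ' separators are not digits
        If significant = 0 AndAlso s[i] = Asc("0") Then Continue For   ' leading zero
        significant += 1
    Next
    If significant <= 8 Then Return True
    Return False
End Function

Print HexFitsIn32("&hFF_DD_CC_AA")
Print HexFitsIn32("&h1_0000_0000")
```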

For consistency, there should probably also be runtime library support. This is a fairly massive task and potentially involves implementing/checking lots of different cases:
- Formats (Hex/Bin/Oct/Decimal)
- Types (Longint, Long, Single, Double)
- Function/keyword type (Val/Valint/Vallng/Cdbl/Csng/Cint, Cint<>, etc.), although a lot of these will call the same functions, at least at runtime
- Compile-time constants (e.g. Cint("&h11_22_33_44")) - it seems we don't actually allow this, so maybe this one's OK.
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by dodicat »

For fun

Code:


Function Remove(Byval Text As String,Char As String="_") As String
    Dim As Long i
    For n As Long = 0 To Len(Text)-1
        If Text[n]<> Asc(char) Then Text[i]= Text[n]:i+=1
    Next 
    Return Left(Text,i)
End Function

#macro __(t,b...)
Val(remove(#t,b))
#endmacro

Print __(1_234_567)

Print __(&h__D__E__A__D__B__E__E__F)

Print __(&hFF_BB_00_00 Or &h_FFBB_0000)

Print __(&b_0101_0001_1110_0000)

Print __(26-10-48,"-")  'not a dash


Sleep
 
badidea
Posts: 2586
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by badidea »

dodicat wrote:Print __(&hFF_BB_00_00 Or &h_FFBB_0000)
This is not working, but your Remove function is simpler than my special2int function.
Print __(&hFF_BB_00_01) or __(&h_FFBB_0010) also does not work?
With ValLng:

Code:

Function Remove(Byval Text As String, Char As String="_") As string
    Dim As Long i
    For n As Long = 0 To Len(Text)-1
        If Text[n]<> Asc(char) Then Text[i]= Text[n]:i+=1
    Next
    Return Left(Text,i)
End Function

#macro __(t,b...)
	Vallng(Remove(#t,b))
#endmacro

Print hex(__(&hFF_BB_00_01) or __(&h_FFBB_0010))
Sleep
St_W
Posts: 1619
Joined: Feb 11, 2009 14:24
Location: Austria

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by St_W »

I like the idea: I already know it from other languages, it doesn't cause any compatibility issues with existing code, and it is totally optional without bringing any disadvantages. For most uses, adding support for such numeric literals at compile time would be perfectly adequate. Thus adding support for it in the RTL is optional IMHO and could be added any time later with a new compiler version. Btw, the same is true for numeric literals starting with an underscore.

For example, C# added support for that in C# 7.0 https://docs.microsoft.com/en-us/dotnet ... provements
and later added support for leading underscores (in binary/hex literals) in C# 7.2 https://docs.microsoft.com/en-us/dotnet ... c-literals
That looks like a reasonable way to go for FB too IMHO.

Also note that e.g. adding support for parsing such numeric literals with the built-in VAL() function could cause compatibility issues.
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by dodicat »

Indeed, badidea.
The Or must come out of the macro argument:
Print __(&hFF_BB_00_00) Or __(&h_FFBB_0000)
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by MrSwiss »

Sorry, I don't understand:
dodicat wrote:The or must come out --- Print __(&hFF_BB_00_00) Or __(&h_FFBB_0000)

Code:

&hFFBB0000 Or &hFFBB0000 = &hFFBB0000
Or-ing the second &h value doesn't make sense anyhow, since both values are the same.
jmg
Posts: 89
Joined: Mar 11, 2009 3:42

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by jmg »

St_W wrote:I like the idea: I already know it from other languages, it doesn't cause any compatibility issues with existing code, and it is totally optional without bringing any disadvantages. For most uses, adding support for such numeric literals at compile time would be perfectly adequate. Thus adding support for it in the RTL is optional IMHO and could be added any time later with a new compiler version. Btw, the same is true for numeric literals starting with an underscore.
Fully agree: this is optional, done at compile time, and would be a safe superset, with some rules about where underscores are allowed.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by jj2007 »

This is a pretty exotic number format. I would suggest being prudent: you never know who relies on Val("1_234") returning 1 instead of 1234.
Perhaps a global switch, or an additional parameter that tells Val() and friends to ignore the underscore? OTOH, it is always possible to use a function that eliminates the underscores, e.g.:

Code:

  Print Str$(Val(Replace$("1_234_567", "_", "")))
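The snippet above is not FreeBASIC (as far as I know, FreeBASIC has no built-in Replace$). A FreeBASIC sketch of the same idea, combined with the "additional parameter" suggestion so that the default behaviour of Val is untouched: StripChar is adapted from dodicat's Remove function earlier in the thread, while ValGrouped and its allowUnderscores parameter are hypothetical names, not part of the RTL.

```freebasic
' Remove every occurrence of one character (adapted from dodicat's Remove).
Function StripChar(ByVal text As String, ByVal ch As String = "_") As String
    Dim As Integer n = 0
    For i As Integer = 0 To Len(text) - 1
        If text[i] <> ch[0] Then text[n] = text[i] : n += 1   ' compact in place
    Next
    Return Left(text, n)
End Function

' Hypothetical opt-in wrapper: with the default False it is plain Val,
' so existing code relying on Val("1_234") = 1 keeps working.
Function ValGrouped(ByVal s As String, ByVal allowUnderscores As Boolean = False) As Double
    If allowUnderscores Then s = StripChar(s)
    Return Val(s)
End Function

Print ValGrouped("1_234")            ' same as Val: stops at the underscore
Print ValGrouped("1_234", True)      ' underscores removed first
Print ValGrouped("1_234_567", True)
```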
jmg
Posts: 89
Joined: Mar 11, 2009 3:42

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by jmg »

St_W wrote: For example, C# added support for that in C# 7.0 https://docs.microsoft.com/en-us/dotnet ... provements and later added support for leading underscores (in binary/hex literals) in C# 7.2 https://docs.microsoft.com/en-us/dotnet ... c-literals
That looks like a reasonable way to go for FB too IMHO.
Nice links; they provide a working reference for industry practice. I'd agree we should follow Microsoft's lead here.
St_W
Posts: 1619
Joined: Feb 11, 2009 14:24
Location: Austria

Re: Feature request - tolerate 80_000_000 & 0xff_dd_cc_aa

Post by St_W »

jj2007 wrote:[...] you never know who relies on Val("1_234") returning 1 instead of 1234.
Yes, we could run into compatibility issues when modifying the behaviour of the RTL functions as noted above, so I'd favor supporting this at compile time only and not change the RTL.
Adding an option to revert to the old behaviour, as you mentioned, is of course also possible, but it still requires changes to compile old code (if not to the code itself, then at least to the compilation options).