New array features

Lost Zergling
Posts: 534
Joined: Dec 02, 2011 22:51
Location: France

Re: New array features

Post by Lost Zergling »

@Juergen Kuehlwein, Iczer
To illustrate my point, see the code below (RAD). Multi-level work would be done, with little time needed on my side too.
Any suggestions or requests are welcome.

Code:

#Include once "D:\Basic\LZLE_FBsite.bi"

Dim Shared GarbageCollector As List
Dim Shared WorkingIndex As List
WorkingIndex.HashTag(" ") 'Need one non null value before recycling or some other properties. To be fixed next ver.
' Sets every repeated value of a two-dimensional array to 0 and returns the
' number of elements changed. HashTag is used as in the original post: a
' return value of 1 is taken to mean "this key was already seen".
Function a_Unique( A_Array() As Integer) As uInteger
    Dim As Integer i=0 : Dim As Integer k=0 : Dim As uInteger uCount=0
    WorkingIndex.GarbageSnatch(GarbageCollector)
    For i=LBound( A_Array , 1 ) To UBound( A_Array , 1 ) ' First dimension
        For k=LBound( A_Array , 2 ) To UBound( A_Array , 2 ) ' Second dimension
            If WorkingIndex.HashTag(Str(A_Array(i,k)))=1 Then
                Print "array(" & i & "," & k & ")=0"
                A_Array(i,k)=0 : uCount+=1
            End If
        Next k
    Next i
    WorkingIndex.Recycle : GarbageCollector.GarbageSnatch(WorkingIndex)
    Return uCount
End Function

Dim d_num As Double
Dim array() As Integer
d_num=a_Unique( array() ) ' Unsized array: the outer loop is skipped
Print "---"
ReDim array(10 To 11, 20 To 22)
' Indices must stay inside the bounds given to ReDim
array(10,20)=20
array(10,21)=11
array(11,20)=20
array(11,21)=11
array(11,22)=23
Print a_Unique( array() ) & " duplicates set to 0"
Print a_Unique( array() ) & " duplicates set to itself"
sleep
system

Tourist Trap
Posts: 2958
Joined: Jun 02, 2015 16:24

Re: New array features

Post by Tourist Trap »

Juergen Kuehlwein wrote: Fxm's code is a clever workaround for a missing feature. Currently you can use VARPTR to retrieve a pointer to an array's elements, but you cannot retrieve the array's descriptor with VARPTR. I already coded the necessary changes for the compiler to return this pointer with VARPTR: p = @array, or p = VARPTR(array). It implements the same syntax as U/LBOUND: array variable without index. As far as I can tell, fxm's structure definition here for the array descriptor is correct.
Thinking again about this, it became clear that you are right. Something like a DESCPTR, pointing at the descriptor of an array, is missing. It may well be nice to have, in order to increase the user's control over arrays.
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: New array features

Post by fxm »

Remember these 2 posts (26 Jul 2015 in How accessing to the array's descriptor structure?):
fxm wrote:@dkl,

By analogy with the var-len strings for which we can get the descriptor address with the syntax '@s' (and '@s[n]' to get the address of one character data):
- Could we also get the descriptor address of a var-len array with the syntax '@array' (already '@array(n)' to get the address of one element)?
dkl wrote:I'm not sure; in case of Strings there is the String type to represent the descriptor, but for arrays, what type should be used? In other words, the language doesn't currently support having pointers to arrays and then accessing the array elements through that.
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: New array features

Post by Juergen Kuehlwein »

@fxm,

thanks for testing! Obviously the sorting code rests on a misconception of how FB arranges memory for multidimensional arrays, so it doesn't work as advertised. But sorting one-dimensional arrays works as it should. Do you agree?

Thanks again for spotting the bug in array_delete. As already posted above, I'm thinking of a slightly different approach without REDIM, which would allow covering multi-dimensional arrays too. Would you mind me using parts of the code you posted (you will be given credit)? To me your code for inserting and deleting elements seems hard to beat.

BTW you definitely have an excellent knowledge of FB's arrays, as your posts show. I don't want to re-invent the wheel, so if you already have code that might be useful for this topic, would you like to share it, or at least contribute by reviewing and improving my code and finding the bugs I missed? Of course this applies to all other members willing to help too.


@all,

having read your posts, I don't want to make arrays a replacement for a database system, but I do want to add some useful, basic features, so I plan the following:

- varptr(array) -> returns a pointer to the array's descriptor
- definition of the array descriptor and FB's memory layout for arrays -> fxm's descriptor definition + memory layout

- array_info -> return information about array internals held by the array descriptor (is dimmed, total # of elements, total size in bytes and the like)
- array_calc -> conversion of multi-dimensional indices to a linear index and vice versa
- array_sort -> sort an array or parts (starting linear index + count) of it (maybe sort a second array in the exact same order)
- array_insert -> insert a new element at a given index and shift all following or the next n elements one position up
- array_delete -> delete an array element and shift all following or the next n elements one position down
- array_scan -> search an array or parts (starting linear index + count) of it for a given value (find first, find all, find highest, find lowest)
- array_set -> create an array as memory overlay at a given address
- array_reset -> delete overlay array without invalidating the data and freeing the memory
and maybe
- array_unique -> delete duplicates (close gaps but don't redim)
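
As a rough illustration of what the array_calc item above might involve, here is a minimal sketch for the two-dimensional case. LinearIndex2D is a hypothetical name, not an existing FB function or the planned implementation. It assumes FB's row-major layout (the last index varies fastest) and cross-checks the result against the distance between element addresses.

```freebasic
' Hypothetical helper (not part of FB): map a 2D index pair to a zero-based
' linear index, assuming row-major storage (the last index varies fastest).
Function LinearIndex2D(a() As Integer, ByVal i As Integer, ByVal j As Integer) As Integer
    Dim As Integer cols = UBound(a, 2) - LBound(a, 2) + 1
    Return (i - LBound(a, 1)) * cols + (j - LBound(a, 2))
End Function

Dim a(10 To 11, 20 To 22) As Integer
Dim As Integer Ptr first = @a(LBound(a, 1), LBound(a, 2))

' Compare the computed index with the element distance between addresses.
For i As Integer = LBound(a, 1) To UBound(a, 1)
    For j As Integer = LBound(a, 2) To UBound(a, 2)
        Dim As Integer computed = LinearIndex2D(a(), i, j)
        Dim As Integer measured = @a(i, j) - first
        Print "(" & i & "," & j & ") -> " & computed & " / " & measured
    Next
Next
```

The reverse conversion (linear index back to an index pair) would be an integer division and remainder by the same column count.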


JK
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: New array features

Post by Juergen Kuehlwein »

@fxm,

as posted above, I coded a (working) compiler version which allows for a "VARPTR(array)" or "@array" syntax, so it is possible! The type is irrelevant for retrieving a pointer to the descriptor; I use "ANY PTR" just like you. Of course accessing the array elements is a different thing, since then we need the type of those elements, but that's not what we primarily want "VARPTR(array)" for.


JK
UEZ
Posts: 972
Joined: May 05, 2017 19:59
Location: Germany

Re: New array features

Post by UEZ »

Here are my suggestions:

array_shuffle -> Shuffles selected rows of 1D or 2D arrays - can be limited to a specific column in 2D arrays
array_transpose -> transposes a 1D or 2D array (swaps rows and columns)
array_concatenate -> concatenate two arrays - either 1D or 2D with the same number of columns
array_search -> finds an entry within a 1D or 2D array
array_swap -> swaps elements of a 1D array and full or part rows/columns of a 2D array
array_reverse -> takes the given array and reverses the order in which the elements appear
array_pop -> returns the last / first element of an array, deleting that element from the array at the same time
array_push -> add new values without increasing array size by inserting at the end the new value and deleting the first one or vice versa
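
Two of these suggestions are small enough to sketch for one-dimensional Integer arrays. The Sub names and signatures below are hypothetical and only illustrate the described behaviour; they are not an existing FB API.

```freebasic
' Hypothetical sketches for 1D Integer arrays (names are illustrative only).

' array_reverse: reverse the element order in place.
Sub ArrayReverse(a() As Integer)
    Dim As Integer lower = LBound(a), upper = UBound(a)
    While lower < upper
        Swap a(lower), a(upper)
        lower += 1
        upper -= 1
    Wend
End Sub

' array_push: append a value without growing the array, by dropping the
' first element and shifting the rest down one position.
Sub ArrayPush(a() As Integer, ByVal newValue As Integer)
    If UBound(a) < LBound(a) Then Exit Sub   ' nothing to do for an empty array
    For i As Integer = LBound(a) To UBound(a) - 1
        a(i) = a(i + 1)
    Next
    a(UBound(a)) = newValue
End Sub

Dim v(1 To 5) As Integer = {1, 2, 3, 4, 5}
ArrayReverse(v())      ' v is now 5, 4, 3, 2, 1
ArrayPush(v(), 9)      ' v is now 4, 3, 2, 1, 9
For i As Integer = LBound(v) To UBound(v)
    Print v(i);
Next
Print
```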
Tourist Trap
Posts: 2958
Joined: Jun 02, 2015 16:24

Re: New array features

Post by Tourist Trap »

Juergen Kuehlwein wrote: Of course accessing the array elements is a different thing, then we need the type of those elements, but that's not what we primarily want "VARPTR(array)" for.
I cannot say whether it will be relevant or not to have VARPTR(array) return the array descriptor address. But even if I leave this aside, I can see that to be consistent with the way the syntax works, you'll need to add a keyword.

The documentation of VARPTR says:
When the operand is of type String, the address of the internal string descriptor is returned. Use Operator Strptr (String Pointer) to retrieve the address of the string data.
So a normal user may expect this for arrays (or something similar):
When the operand is an array, the address of the internal array descriptor is returned. Use Operator Arrptr (Array Pointer) to retrieve the address of the array data.
fxm wrote:
dkl wrote:I'm not sure; in case of Strings there is the String type to represent the descriptor, but for arrays, what type should be used? In other words, the language doesn't currently support having pointers to arrays and then accessing the array elements through that.
That's where dkl is perfectly right as usual. VARPTR is something that is related to a well defined datatype. For instance, if we have an Integer Ptr, we get its address with VarPtr. This is what this means.
An array, however, is simply not a datatype. We could say that this doesn't matter. I think we can have the feature and also take care of the syntactic issues. That leads me to this proposal:
  • introducing ArrPtr(array) for the data section, or just letting VarPtr(array(index)) be used instead. The fact that an array can be erased may suggest that ArrPtr(array) would be a little disturbing in this case.
  • introducing DescPtr(array) for the array descriptor address (or anything similar), and leaving VarPtr to operate on things that have a well defined DataType.
Just instant thinking... I can be wrong!
Maybe then VarPtr(array) would point to the descriptor, and VarPtr(array(index)) to some data... I don't know.
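
To make the String analogy concrete, here is a small sketch of the addresses that can be obtained with existing operators: VarPtr on a String (descriptor), StrPtr (character data) and VarPtr on an indexed array element. The proposed VarPtr(array), ArrPtr and DescPtr forms are deliberately not used, since they are only proposals in this thread.

```freebasic
Dim s As String = "hello"
Dim array(0 To 3) As Integer

' VarPtr on a String yields the descriptor address; StrPtr yields the data address.
Dim As Any Ptr pDesc = VarPtr(s)
Dim As Any Ptr pData = StrPtr(s)
' For arrays, only an indexed element can be addressed this way.
Dim As Integer Ptr pElem = VarPtr(array(0))

Print "string descriptor: &h" & Hex(Cast(UInteger, pDesc))
Print "string data:       &h" & Hex(Cast(UInteger, pData))
Print "array element 0:   &h" & Hex(Cast(UInteger, pElem))
```

The first two addresses differ, which is exactly the descriptor/data distinction the proposals above try to carry over to arrays.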
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: New array features

Post by fxm »

When an n-dimensional array is defined, the address of the data section is obtained with:
@array(Lbound(array, 1), Lbound(array, 2), ..., Lbound(array, n))

For an erased dynamic array:
@array(Lbound(array, 1), Lbound(array, 2), ..., Lbound(array, n))
is still valid, corresponding to:
@array(0, 0, ..., 0)
but it returns 0.
(all other index values inducing a runtime error).
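
For the defined-array case, a minimal sketch of the expression above on a two-dimensional array. It relies only on standard FB indexing and pointer arithmetic; the erased-array case is left out here.

```freebasic
Dim array(10 To 11, 20 To 22) As Integer

' Address of the data section: the element at the lower bound of every dimension.
Dim As Integer Ptr pData = @array(LBound(array, 1), LBound(array, 2))

array(10, 20) = 42
Print *pData                      ' reads the same element as array(10, 20)

' The 2 x 3 elements are contiguous, so offset 5 is the last element.
pData[5] = 7
Print array(11, 22)               ' reads the value just written through the pointer
```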
Lost Zergling
Posts: 534
Joined: Dec 02, 2011 22:51
Location: France

Re: New array features

Post by Lost Zergling »

All this brings me to the idea of developing array extensions dedicated to a context-oriented database: less fast at pure calculation but heavier-duty on average data sets, and aimed rather at manipulating high volumes.
For this purpose, one low-level feature seems useful to me: array_AllOf to instantiate a ForAll or equivalent, and an iterator array_Step to jump from one element to the next (or a ForAll refv in array). For traversing all (or part) of the elements of an array, sequential access would probably be faster than nested loops using indexed access.
From reading several posts, I know I am not the only one to think that a slightly faster sequential iterator would be theoretically feasible and useful for optimizing the performance of certain types of algorithms. This seems more complicated to me than it appears at first glance, for two reasons: handling data that may be split in memory, and the sheer performance of the iterator given the already crowded FB namespace. I may be wrong.
Just an opinion on the subject: the keywords proposed by UEZ sound good in a pure low-level logic: avoid search (more than 2 dim), unique and sort, which seem more like database-oriented primitives. This doesn't mean they wouldn't be useful, just that they should be dedicated to small elementary operations.
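
The sequential-versus-nested-loop idea can be sketched with what FB already offers, assuming the array's data is one contiguous block (split storage, as mentioned above, is not handled). The function names are hypothetical, and no speed claim is made: any gain would have to be measured.

```freebasic
' Sum a 2D array with nested loops over indices.
Function SumNested(a() As Integer) As Integer
    Dim As Integer total = 0
    For i As Integer = LBound(a, 1) To UBound(a, 1)
        For j As Integer = LBound(a, 2) To UBound(a, 2)
            total += a(i, j)
        Next
    Next
    Return total
End Function

' Sum the same array with one sequential pass over the contiguous data.
Function SumSequential(a() As Integer) As Integer
    Dim As Integer elements = (UBound(a, 1) - LBound(a, 1) + 1) * _
                              (UBound(a, 2) - LBound(a, 2) + 1)
    Dim As Integer Ptr p = @a(LBound(a, 1), LBound(a, 2))
    Dim As Integer total = 0
    For n As Integer = 0 To elements - 1
        total += p[n]
    Next
    Return total
End Function

Dim grid(1 To 3, 1 To 4) As Integer
For i As Integer = 1 To 3
    For j As Integer = 1 To 4
        grid(i, j) = i * 10 + j
    Next
Next
Print SumNested(grid()), SumSequential(grid())   ' both sums are equal
```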
speedfixer
Posts: 606
Joined: Nov 28, 2012 1:27
Location: CA, USA moving to WA, USA

Re: New array features

Post by speedfixer »

back to this:

@Juergen Kuehlwein:
Before making a new pull request I would like to know what would be the preferred/best way of implementing the new array features (sort, insert, delete). I see three ways to go:

1.) as is - add it as an include file (definitions and run time code in array.bi). The features are available only if array.bi is included. This makes everything accessible to the user.

2.) add only the definitions to array.bi, and add the code to the runtime library, which still requires array.bi to be included for making the features available (just like file.bi). This keeps parts of the low level stuff away from the user

3.) add it to the compiler (new keywords etc.) and add the actual code to the runtime library, which makes an include file (array.bi) obsolete. This keeps all the low level stuff away from the user

Obviously #1 is easiest and, compared to other features, not an unusual way. What do you think, how should I proceed?
I disagree with paul doe re: choice #2.

Great efforts have been made by the developers in the past to keep FB as small and fast as possible, especially the compile process. Some have gone to even greater lengths to further reduce compiled size.

see:
viewtopic.php?f=3&t=25853&p=235187&hili ... le#p235187
viewtopic.php?f=17&t=23372&p=205818&hil ... le#p205818
viewtopic.php?f=9&t=26728&p=247809&hili ... le#p247809

I would bet others have depended on these features of FB for their production code. Personally, I compile all my reusable code into discrete static units to keep size down. There are enough extra added that my program compile time is WAY down from an all-included source. Faster development. For a final, I THEN put all the source together - better optimization, smaller exe.

Use option 1.

Then there won't be a lot of surprises or questions when everyone updates to this next new version. We are programmers. It is our CHOICE to add or not add. I expect most would prefer to have that choice and to have unused functionality NOT be included in the base RT lib as much as possible. Would you like the graphics module ALWAYS included for every compile? Or the error code?

I AM excited over the possibilities of these new features and this direction of development. Yeah, FB!

david
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: New array features

Post by MrSwiss »

I tend to agree with speedfixer, use option #1. (preferred)

Also, just providing 'primitives' like the required macros and such should do.
Extending the 'primitives' according to the 'job at hand' is then up to the
application developer, at the sole discretion of the programmer in question ...
I'd strictly stick with the required minimal code, to enable extensions if
and when needed only.
Those wanting more can then do it themselves ... (aka: DIY).
(I'm not talking about OOP, stay strictly procedural, for 'primitives'.)
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: New array features

Post by Juergen Kuehlwein »

@Tourist Trap
Maybe then VarPtr(array) would point to descriptor, and VarPtr(array(index)) to some data... I don't know.
yes, that's what I would prefer! If VARPTR can return a String descriptor (which isn't syntactically consistent with what VARPTR usually returns), why shouldn't it be allowed to return an Array descriptor? I think if a user can grasp the difference between a String descriptor and its data, he will be able to grasp the difference between an Array descriptor and the array's data too. Moreover, the syntax is similar to L/UBOUND, which expects the array's name without parentheses. So "VARPTR(array)" is syntactically different from "VARPTR(array(index))". Obviously we don't have a need for "DESCPTR" for Strings, so why use it for Arrays?


@all,

personally I would prefer option #1 too, because then everyone can inspect the code and take it as a starting point for their own special features. Just combine the provided basic features, or write your own specialized code using all the basic information supplied by such an include file. If it becomes part of the run time library, it must be written in C, which isn't everyone's favorite here. That is, for many users the provided information would be lost, or at least somewhat hidden (in the run time sources).

To be honest, putting the code into the run time library doesn't necessarily impose a size penalty on the resulting executable. If you take a look at the run time code, you will find a lot of small files, each doing one special task. Why is that? It's a matter of granularity. The linker is clever enough to add only the run time code it actually needs. So the smaller the units in the run time sources, the less unneeded code is linked in.


@MrSwiss,

this is exactly what I have in mind.


JK
speedfixer
Posts: 606
Joined: Nov 28, 2012 1:27
Location: CA, USA moving to WA, USA

Re: New array features

Post by speedfixer »

[edit after quick test]
*Sometimes* it may not result in a larger executable, but that isn't a secure or predictable enough statement that I would make.

Certain:

the base runtime lib is still larger.

That means a longer compile time. Might not be much but ...

When I recompile my libraries, that is 24 *groups* of libraries with 5 to 25 modules in each.
I have already scripted this to move to a tmp(mem) dir and back. This gave a 30% shorter compile time, which == about 2 minutes. I'm on a fast 6 core system. And I am a long way from complete with what I have planned at this time.

I'm sure there are others that have had to make similar adjustments to make the code/compile/test/code cycle reasonable, even with the very fast FB compiler.

david
Last edited by speedfixer on May 18, 2019 0:19, edited 1 time in total.
paul doe
Moderator
Posts: 1730
Joined: Jul 25, 2017 17:22
Location: Argentina

Re: New array features

Post by paul doe »

Juergen Kuehlwein wrote:...
@all,

personally I would prefer option #1 too, because then everyone can inspect the code and take it as a starting point for their own special features. Just combine the provided basic features, or write your own specialized code using all the basic information supplied by such an include file. If it becomes part of the run time library, it must be written in C, which isn't everyone's favorite here. That is, for many users the provided information would be lost, or at least somewhat hidden (in the run time sources).
...
Then what's the point of this thread? Stir controversy?
speedfixer
Posts: 606
Joined: Nov 28, 2012 1:27
Location: CA, USA moving to WA, USA

Re: New array features

Post by speedfixer »

Then, paul doe, what are you suggesting?


With your quote, perhaps I don't understand part of what is suggested as a difference.

Why would someone NOT look at code source - rtlib or not; separated source or not - to understand how features work?
C source or not?

While we might DESIRE that all source is FB, I don't think it will be another decade before that *could* be true.
The rtlib source is 450+ files. Those interested look at them frequently.
I don't program in C. I think C and its derivatives are so hard to read that 90% of all the security problems we have seen in the last 15 years are ONLY because C family languages are hard to read.
BUT - the effort has been made to make them fast and efficient. Best tool. So, I read and use C sometimes.


What controversy?
People have a difference of opinion. That is being discussed.

david