Import a jaggered CSV file and save data to a common comma delimeted csv format

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
geofs
Posts: 12
Joined: Apr 14, 2020 13:34

Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by geofs »

Revised version

Input
A,1,"Hello, World",Freebasic,9
B,2,Hello World,"Freebasic",5
C,"Hello World","Freebasic"
D,"7",7,"7"

Output
'"A","1","Hello, World","Freebasic","9"
'"B","2","Hello World","Freebasic","5"
'"C","Hello World","Freebasic","",""
'"D","7","7","7",""

Code: Select all

dim as string temp
dim as integer x, y, row, col, qot

'find how many rows in csv file
row = 0
open "data.csv" for input as #1
  while not eof (1)
    line input #1, temp
    row = row + 1
  wend
  seek #1, 1
  
  'find how many columns in each row of csv file
  dim as integer fld(row)
  col = 0
  for y = 1 to row
    line input #1, temp
    fld(y) = 1
    qot = 0
    for x = 1 to len(temp)
      if mid(temp,x,1) = chr(34) then qot = qot + 1
      if qot = 0 and mid(temp,x,1) = "," then fld(y) = fld(y) + 1
      if qot = 2 then qot = 0
    next x
    if fld(y) > col then col = fld(y)
  next y
  seek #1, 1
     
  'import data into a 2 dimentional array
  dim as string linein(row,col)
  for y = 1 to row
    for x = 1 to fld(y)
      input #1, linein(y,x)
    next x
  next y    
close #1    

'save data to csv
open "save.csv" for output as #1
  for y = 1 to row
    for x = 1 to col
      if x < col then
        write #1, linein(y,x),
        else
          write #1, linein(y,x)
      end if  
    next x    
  next y
close #1
Last edited by geofs on May 25, 2020 0:52, edited 6 times in total.
Tourist Trap
Posts: 2958
Joined: Jun 02, 2015 16:24

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by Tourist Trap »

Hi geofs,

looks fine. Just a thing you can use the 'CODE' tag (just above the text zone when you post a message) in order to make things easier to read here. Like this:

Code: Select all

'This is a variaion of my import, sort & export a CSV file that has an unkonwn amount of rows & columns tip,
'This sample of code will import data from a comma delimeted csv file that is in a jaggered format & export the data
'to a common comma delimeted csv format with selected columns.
'
' CSV to be imported example
'
' A,1,2,3,4,5
' B,1,2,3
' C,1,2,3,4,5,6,7,8
' etc
'
dim as string temp
dim as integer x, y, row, col

'find how many rows in csv file
row = 0
open "data.csv" for input as #1
while not eof (1)
line input #1, temp
row = row + 1
wend
seek #1, 1

'find how many columns in each row of csv file
dim as integer del(row)
col = 0
for y = 1 to row
line input #1, temp
del(y) = 1
for x = 1 to len(temp)
if mid(temp,x,1) = "," then del(y) = del(y) + 1
next x
if del(y) > col then col = del(y)
next y
seek #1, 1

'import data into a 2 dimentional array
dim as string linein(row,col)
for y = 1 to row
for x = 1 to del(y)
input #1, linein(y,x)
next x
next y
close #1

'export a new csv file with selected columns 1, 5, 6 & 9
open "save.csv" for output as #1
for y = 1 to row
write #1, linein(y,1),linein(y,5),linein(y,6),linein(y,9)
next y
close #1
Thanks anyway.
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by Lost Zergling »

From the moment you are in the option to store in memory (in a structured way) the entire file, you could consider using lists in addition to the arrays. The weak point of using only arrays (fixed length) is that you have to read the file twice (a first time to know the maximum number of columns), even when you store it in memory. By using variable length arrays, memory management and complexity can become problematic.
By oversimplifying the problem, let's say that unnecessary disk access must be minimized because it potentially consumes time and resources in an order of magnitude different from the rest of the system's resources, except that with little processing this will go unnoticed. In row two, the problem of memory management (consumption, cache usage, fragmentation, access time). In row three, the "computing" resources, ie the ALU and the stack (which also uses the cache). Of course all this is schematic, what matters are the bottlenecks.
To respond in a not too complex way to some of these problems, a solution would be to use a semi-persistent construction in memory with variable geometry (LZLE, I suggest).
You load, for example, the list as follows:
MyList.HashTag (Str_Line_Number): MyList.Val (Str_Line_In) in a single pass.
Then, to access one or more elements of the "array", you do:
MyList.HashTag (Str_Line_Number): StrToArray (MyList.Val, MyDynArray, separator)
(several functions to load a chain in an array can be found here viewtopic.php?f=2&t=28541#p272259)
Even if it is slower than an array, as soon as the disk cache loses its relevance (batch processing), the time savings will be noticeable.
The same goes for memory if you have to constantly resize it. Another advantage using lists (ie in addition to arrays in this use case) is the possible reuse of lines in chronological order (FIFO buffer using 'tracking') or computed, so as to handle large datasize files.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by jj2007 »

When you are ready to attack a slightly more complicated "jaggered" csv file, go to the United Nations, select "All countries", click "View data", then "Download the entire MDG dataset" (click on the @).
Line 42539, for example, should read
WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?", May 2013. Cambodia Annual Socio-Economic Survey 2008
M$ Excel, for example, finds only
WB Staff estimates. Cambodia Poverty Assessment 2013 Where Have All The Poor Gone?"
Have fun.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by dodicat »

I get row 42254 as the first row containing
"WB Staff estimates. Cambodia Poverty"

And there is a number "1529" as the first column.

Code: Select all

 

Function tally overload(somestring As String,partstring As String) As Integer
    Dim As Integer i,j,ln,lnp,count,num
    ln=Len(somestring)
    lnp=Len(partstring)
    count=0
    i=-1
    Do
        i+=1
        If somestring[i] <> partstring[0] Then continue do
        If somestring[i] = partstring[0] Then
            For j=0 To lnp-1 
                If somestring[j+i]<>partstring[j] then continue do
            Next j
        End If
        count+=1
        i=i+lnp-1
    Loop Until i>=ln-1 
    Return count
End Function

Function column(SomeString As String,num As Long,sep as string=",") As String
  Dim As Long count
  count=1
  Dim As Long position,p,L
  L=len(sep)
  position=Instr(SomeString,sep)
  If position=0 Then column= somestring:exit function
  If num=1 Then
  column= Mid(Somestring,1,position-1)
  else
  position=0
  Do
    count+=1
    position=Instr(position+L,SomeString,sep)
    If position=0 Then Exit Do
    If count=num Then
      p=position
      position=Instr(position+L,SomeString,sep)
      column= Mid(Somestring,p+L,position-p-L):exit function
    End If
  Loop
  end if
End Function

#include "file.bi"
Function loadfile overload(file As String) As String
    Var  f=Freefile
    if  FileExists(file)=0 then print file;"  not found,press a key":sleep:end
    Open file For Binary Access Read As #f
    Dim As String text
    If Lof(f) > 0 Then
        text = String(Lof(f), 0)
        Get #f, , text
    End If
    Close #f
    Return text
end function

Function loadfile(file As String,a() as string) As long
    Var  f=Freefile,row=0
    open file for input as #f
    while not eof (1)
      row+=1
    line input #f,a(row)
  wend
  close #f
  return row
end function


var lim= tally(loadfile("MDG_Export_20200523_145825294.csv"),chr(10))'end of line
redim as string s(1 to lim+1)
print "Number of rows ",lim
print
print

 loadfile("MDG_Export_20200523_145825294.csv",s())
 
 
 ''find a suitable row
 'for n as long=1 to lim
  'if instr(s(n),"WB Staff estimates. Cambodia Poverty") then
    'print n,s(n)
    'end if
  'next
'sleep

for n as long=1 to 110
  dim as string d=column(s(42254),n,",")
  if d="" then exit for
  print d
next
print
print "Done"
sleep

 
My result

Code: Select all

 Number of rows               44599


"1529"
"WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?"
  May 2013. Cambodia Socio-Economic Survey 2010."

Done
 
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by jj2007 »

dodicat wrote:I get row 42254 as the first row containing
"WB Staff estimates. Cambodia Poverty"

And there is a number "1529" as the first column.
...
My result

Code: Select all

 Number of rows               44599


"1529"
"WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?"
  May 2013. Cambodia Socio-Economic Survey 2010."

Done
 
Should be 1576<delimiter>WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?", May 2013. Cambodia Socio-Economic Survey 2010. - check it in an editor.

But don't worry, even M$ Excel fails miserably: 1576<delimiter>WB Staff estimates. Cambodia Poverty Assessment 2013 Where Have All The Poor Gone?"<delimiter>May 2013. Cambodia Socio-Economic Survey 2010."

Recall does it correctly, though (but it took me a while to get it right, this csv file has quite some strange "features", such as line feed characters inside cells)
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by dodicat »

OK, in my download with the deliminator ","""
gives

Number of rows 44599


"1529"
WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?", May 2013. Cambodia Socio-Economic Survey 2010."


and LibreOffice Calc gives:

1529 WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone? May 2013. Cambodia Socio-Economic Survey 2010."

So it is just a fiddle around with the deliminator. (Office work was never my game)

But the question remains as far as I can see, where have all the poor folk gone?

Thomas Gray:
Let not Ambition mock their useful toil,
Their homely joys, and destiny obscure;
Nor Grandeur hear with a disdainful smile
The short and simple annals of the poor.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by jj2007 »

dodicat wrote:and LibreOffice Calc gives:

1529 WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone? May 2013. Cambodia Socio-Economic Survey 2010."
Plain wrong. Excel is not correct, either, but at least it gets the "1576" (and it's 44552 lines, not 44599).

Here's what you see in an editor, the lines before and after included:

Code: Select all

"1575","Source: Poverty assessment. Sierra Leone Integrated Household Survey (SLIHS). Comparable with the 2003 estimates."
"1576","WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?",  May 2013. Cambodia Socio-Economic Survey 2010."
"1577","Source: Poverty assessment. Sierra Leone Integrated Household Survey (SLIHS). Not comparable with prior series."
But the question remains as far as I can see, where have all the poor folk gone?

Thomas Gray:
Let not Ambition mock their useful toil,
Their homely joys, and destiny obscure;
Nor Grandeur hear with a disdainful smile
The short and simple annals of the poor.
;-)
robert
Posts: 169
Joined: Aug 06, 2019 18:45

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by robert »

jj2007 wrote:
dodicat wrote:and LibreOffice Calc gives:

1529 WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone? May 2013. Cambodia Socio-Economic Survey 2010."
Plain wrong. Excel is not correct, either, but at least it gets the "1576" (and it's 44552 lines, not 44599).

Here's what you see in an editor, the lines before and after included:

Code: Select all

"1575","Source: Poverty assessment. Sierra Leone Integrated Household Survey (SLIHS). Comparable with the 2003 estimates."
"1576","WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?",  May 2013. Cambodia Socio-Economic Survey 2010."
"1577","Source: Poverty assessment. Sierra Leone Integrated Household Survey (SLIHS). Not comparable with prior series."
But the question remains as far as I can see, where have all the poor folk gone?

Thomas Gray:
Let not Ambition mock their useful toil,
Their homely joys, and destiny obscure;
Nor Grandeur hear with a disdainful smile
The short and simple annals of the poor.
;-)
NOT Plain wrong.

Download the dataset again and you will see that it is significantly different from your first download. Try and make sense of that ....
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by dodicat »

My lines are
"1575","RHS 2010"
"1576","Interim DHS 2007-2008"
So we have different files.
jj2007
We did similar stuff a while back with these MDG_export files.
viewtopic.php?f=3&t=27045&p=252969&hilit=csv#p252969

These MDG_Export files are Pen pusher very volatile IMHO.
geofs
Posts: 12
Joined: Apr 14, 2020 13:34

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by geofs »

While I was trying to solve the the issue of importing the MDG_Export file, It came to my attention that my code did not allow for a comma that may turn up in one of the fields, ie. "Hello, World", I have since updated my code.

As for the MDG Export file, I have reason to believe that the file is not formatted correctly to be a jaggered nor a normal comma delimited csv file, hence the reason it cannot be imported sucessfully into a spreadsheet, database table or my program.
Last edited by geofs on May 25, 2020 0:44, edited 3 times in total.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by jj2007 »

robert wrote:NOT Plain wrong.

Download the dataset again and you will see that it is significantly different from your first download. Try and make sense of that ...
Yep, they are actively working on their dataset. That happens, even at the United Nations Statistical Office ;-)
"1514","WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?", May 2013. Cambodia Socio-Economic Survey 2011."
So the number of the footnote is now 1514, in line 42237 (says the editor) or 42232 (says my spreadsheet editor, and that's the correct count), or 42233 (says M$ Excel (my versions are 2003 and 2010)).

The latter splits column 2 incorrectly in two separate fields:
1514<delimiter>WB Staff estimates. Cambodia Poverty Assessment 2013 Where Have All The Poor Gone?"<delimiter> May 2013. Cambodia Socio-Economic Survey 2011."

Finally, LibreOffice takes 30 seconds to launch, then 19 seconds to open the csv file. After only 4 seconds, it finds "Where have all" in line 42231 and shows the following:
1514<delimiter>WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?<delimiter> May 2013. Cambodia Socio-Economic Survey 2011."

So congrats to LibreOffice, they are almost compatible to M$ Excel. The 19 seconds to open a csv file incorrectly are a bit long (mine takes 67 milliseconds) but OK, it's free, isn't it?

Btw there is nothing wrong with the data, format-wise:

"1514","WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?", May 2013. Cambodia Socio-Economic Survey 2011."

These are exactly two columns (not three, as Excel and LibreOffice believe). The red quotes ("WB Staff ... Survey 2011") delimit the field. Inside the field there is a text delimited by an even number of quotes ("Where...gone"). Then follows a comma before May 2013, and both Excel and LibreOffice forget that we are still inside a quoted text.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by dodicat »

You can store (jaggedly) into a udt (type info), and have a look at any row of your choice.
Using the MDG_Export_20200523_145825294.csv or it's update if necessary.

Code: Select all


Function tally Overload(somestring As String,partstring As String) As Integer
  Dim As Integer i,j,ln,lnp,count,num
  ln=Len(somestring):If ln=0 Then Return 0
  lnp=Len(partstring):If lnp=0 Then Return 0
  count=0
  i=-1
  Do
    i+=1
    If somestring[i] <> partstring[0] Then Continue Do
    If somestring[i] = partstring[0] Then
      For j=0 To lnp-1 
        If somestring[j+i]<>partstring[j] Then Continue Do
      Next j
    End If
    count+=1
    i=i+lnp-1
  Loop Until i>=ln-1 
  Return count
End Function

Function findAndReplace(original As String ,find As String ,replace As String) As String
  If Len(find) = 0 Then Return original
  Var t=tally(original,find) 'find occurencies of find 
  If t=0 Then Return original
  Dim As Long found,n,staid,m
  Var Lf = Len(find),Lr = Len(replace),Lo = Len(original)
  t = Len(original) - t * Lf + t * Lr             'length of output string
  Dim As String res = String(t,0)                 'output string
  Do
    If original[n] = find[0] Then               'got a possible
      For m = 0 To Lf - 1
        If original[n + m] <> find[m] Then Goto lbl 'no
      Next m
      found = 1                               'Bingo
    End If
    If found Then
      For m = 0 To Lr - 1
        res[staid] = replace[m]             'insert the replacerment
        staid += 1
      Next m
      n += Lf
      found = 0
      Continue Do
    End If
    lbl:
    res[staid] = original[n]
    staid += 1
    n += 1
  Loop Until n >= Lo
  Return res
End Function

Function StringSplit(s_in As String,chars As String,result() As String) As Long
  Dim As Long ctr,ctr2,k,n,LC=Len(chars)
  Dim As boolean _tally(Len(s_in))
  #macro check_instring()
  n=0
  While n<Lc
    If chars[n]=s_in[k] Then 
      _tally(k)=true
      If (ctr2-1) Then ctr+=1
      ctr2=0
      Exit While
    End If
    n+=1
  Wend
  #endmacro
  
  #macro split()
  If _tally(k) Then
    If (ctr2-1) Then ctr+=1:result(ctr)=Mid(s_in,k+2-ctr2,ctr2-1)
    ctr2=0
  End If
  #endmacro
  '==================  LOOP TWICE =======================
  For k  =0 To Len(s_in)-1
    ctr2+=1:check_instring()
  Next k
  If ctr=0 Then
    If Len(s_in) Andalso Instr(chars,Chr(s_in[0])) Then ctr=1':beep
  End If
  If ctr Then Redim result(1 To ctr): ctr=0:ctr2=0 Else  Return 0
  For k  =0 To Len(s_in)-1
    ctr2+=1:split()
  Next k
  '===================== Last one ========================
  If ctr2>0 Then
    Redim Preserve result(1 To ctr+1)
    result(ctr+1)=Mid(s_in,k+1-ctr2,ctr2)
  End If
  Return Ubound(result)
End Function


#include "file.bi"
Function loadfile Overload(file As String) As String
  Var  f=Freefile
  If  Fileexists(file)=0 Then Print file;"  not found,press a key":Sleep:End
  Open file For Binary Access Read As #f
  Dim As String text
  If Lof(f) > 0 Then
    text = String(Lof(f), 0)
    Get #f, , text
  End If
  Close #f
  Return text
End Function

Type info
  As String cols(Any)
  Dim As Long id
  Declare Sub Get(n As Long,() As info)
  Declare Sub show()
End Type

Sub info.get(n As Long,i() As info) 
  This= i(n)
  this.id=n
End Sub 

Sub info.show()
  Print "Row number ";this.id
  Print "Columns    ";Ubound(cols)
  Print
  For n As Long=1 To Ubound(cols)
    cols(n)=findandreplace(cols(n),Chr(0),", ")
    Print cols(n)
  Next n
End Sub

'====================

Dim As Double t=Timer
Dim As String f=loadfile("MDG_Export_20200523_145825294.csv")

f=findAndReplace(f,", ",Chr(0)) 'sieve out internal commas

Redim As String s()
stringsplit(f,Chr(10),s())  'load all lines (rows) to string array
Print "Number of rows ",Ubound(s)
Print

Dim As info i(1 To Ubound(s))'create a jagged info type


For n As Long=1 To Ubound(s) 'split from string array to info array by comma
  stringsplit(s(n),",", i(n).cols())
Next n

Dim As info ans 'get some results

ans.get(42254,i())
ans.show()
print
Print "time taken ";Timer-t
Sleep

  
My results

Code: Select all

 Number of rows               44597

Row number  42254
Columns     2

"1529"
"WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?",  May 2013. Cambodia Socio-Economi
c Survey 2010."

time taken  1.223847500048578
 
geofs
Posts: 12
Joined: Apr 14, 2020 13:34

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by geofs »

Sorry if I sound ignorant, I find your code a little difficult to follow, is your code importing the data into an array with seperate fields?
dodicat wrote:You can store (jaggedly) into a udt (type info), and have a look at any row of your choice.
Using the MDG_Export_20200523_145825294.csv or it's update if necessary.

Code: Select all


Function tally Overload(somestring As String,partstring As String) As Integer
  Dim As Integer i,j,ln,lnp,count,num
  ln=Len(somestring):If ln=0 Then Return 0
  lnp=Len(partstring):If lnp=0 Then Return 0
  count=0
  i=-1
  Do
    i+=1
    If somestring[i] <> partstring[0] Then Continue Do
    If somestring[i] = partstring[0] Then
      For j=0 To lnp-1 
        If somestring[j+i]<>partstring[j] Then Continue Do
      Next j
    End If
    count+=1
    i=i+lnp-1
  Loop Until i>=ln-1 
  Return count
End Function

Function findAndReplace(original As String ,find As String ,replace As String) As String
  If Len(find) = 0 Then Return original
  Var t=tally(original,find) 'find occurencies of find 
  If t=0 Then Return original
  Dim As Long found,n,staid,m
  Var Lf = Len(find),Lr = Len(replace),Lo = Len(original)
  t = Len(original) - t * Lf + t * Lr             'length of output string
  Dim As String res = String(t,0)                 'output string
  Do
    If original[n] = find[0] Then               'got a possible
      For m = 0 To Lf - 1
        If original[n + m] <> find[m] Then Goto lbl 'no
      Next m
      found = 1                               'Bingo
    End If
    If found Then
      For m = 0 To Lr - 1
        res[staid] = replace[m]             'insert the replacerment
        staid += 1
      Next m
      n += Lf
      found = 0
      Continue Do
    End If
    lbl:
    res[staid] = original[n]
    staid += 1
    n += 1
  Loop Until n >= Lo
  Return res
End Function

Function StringSplit(s_in As String,chars As String,result() As String) As Long
  Dim As Long ctr,ctr2,k,n,LC=Len(chars)
  Dim As boolean _tally(Len(s_in))
  #macro check_instring()
  n=0
  While n<Lc
    If chars[n]=s_in[k] Then 
      _tally(k)=true
      If (ctr2-1) Then ctr+=1
      ctr2=0
      Exit While
    End If
    n+=1
  Wend
  #endmacro
  
  #macro split()
  If _tally(k) Then
    If (ctr2-1) Then ctr+=1:result(ctr)=Mid(s_in,k+2-ctr2,ctr2-1)
    ctr2=0
  End If
  #endmacro
  '==================  LOOP TWICE =======================
  For k  =0 To Len(s_in)-1
    ctr2+=1:check_instring()
  Next k
  If ctr=0 Then
    If Len(s_in) Andalso Instr(chars,Chr(s_in[0])) Then ctr=1':beep
  End If
  If ctr Then Redim result(1 To ctr): ctr=0:ctr2=0 Else  Return 0
  For k  =0 To Len(s_in)-1
    ctr2+=1:split()
  Next k
  '===================== Last one ========================
  If ctr2>0 Then
    Redim Preserve result(1 To ctr+1)
    result(ctr+1)=Mid(s_in,k+1-ctr2,ctr2)
  End If
  Return Ubound(result)
End Function


#include "file.bi"
Function loadfile Overload(file As String) As String
  Var  f=Freefile
  If  Fileexists(file)=0 Then Print file;"  not found,press a key":Sleep:End
  Open file For Binary Access Read As #f
  Dim As String text
  If Lof(f) > 0 Then
    text = String(Lof(f), 0)
    Get #f, , text
  End If
  Close #f
  Return text
End Function

Type info
  As String cols(Any)
  Dim As Long id
  Declare Sub Get(n As Long,() As info)
  Declare Sub show()
End Type

Sub info.get(n As Long,i() As info) 
  This= i(n)
  this.id=n
End Sub 

Sub info.show()
  Print "Row number ";this.id
  Print "Columns    ";Ubound(cols)
  Print
  For n As Long=1 To Ubound(cols)
    cols(n)=findandreplace(cols(n),Chr(0),", ")
    Print cols(n)
  Next n
End Sub

'====================

Dim As Double t=Timer
Dim As String f=loadfile("MDG_Export_20200523_145825294.csv")

f=findAndReplace(f,", ",Chr(0)) 'sieve out internal commas

Redim As String s()
stringsplit(f,Chr(10),s())  'load all lines (rows) to string array
Print "Number of rows ",Ubound(s)
Print

Dim As info i(1 To Ubound(s))'create a jagged info type


For n As Long=1 To Ubound(s) 'split from string array to info array by comma
  stringsplit(s(n),",", i(n).cols())
Next n

Dim As info ans 'get some results

ans.get(42254,i())
ans.show()
print
Print "time taken ";Timer-t
Sleep

  
My results

Code: Select all

 Number of rows               44597

Row number  42254
Columns     2

"1529"
"WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?",  May 2013. Cambodia Socio-Economi
c Survey 2010."

time taken  1.223847500048578
 
Last edited by geofs on May 25, 2020 0:46, edited 2 times in total.
geofs
Posts: 12
Joined: Apr 14, 2020 13:34

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by geofs »

Cheers for the tip ...
Tourist Trap wrote:Hi geofs,

looks fine. Just a thing you can use the 'CODE' tag (just above the text zone when you post a message) in order to make things easier to read here. Like this:

Code: Select all

'This is a variaion of my import, sort & export a CSV file that has an unkonwn amount of rows & columns tip,
'This sample of code will import data from a comma delimeted csv file that is in a jaggered format & export the data
'to a common comma delimeted csv format with selected columns.
'
' CSV to be imported example
'
' A,1,2,3,4,5
' B,1,2,3
' C,1,2,3,4,5,6,7,8
' etc
'
dim as string temp
dim as integer x, y, row, col

'find how many rows in csv file
row = 0
open "data.csv" for input as #1
while not eof (1)
line input #1, temp
row = row + 1
wend
seek #1, 1

'find how many columns in each row of csv file
dim as integer del(row)
col = 0
for y = 1 to row
line input #1, temp
del(y) = 1
for x = 1 to len(temp)
if mid(temp,x,1) = "," then del(y) = del(y) + 1
next x
if del(y) > col then col = del(y)
next y
seek #1, 1

'import data into a 2 dimentional array
dim as string linein(row,col)
for y = 1 to row
for x = 1 to del(y)
input #1, linein(y,x)
next x
next y
close #1

'export a new csv file with selected columns 1, 5, 6 & 9
open "save.csv" for output as #1
for y = 1 to row
write #1, linein(y,1),linein(y,5),linein(y,6),linein(y,9)
next y
close #1
Thanks anyway.
Post Reply