Import a jaggered CSV file and save data to a common comma delimeted csv format

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by dodicat »

Hi geofs.
Rather than a 2d array I use a type info with a dynamic array to store the data.
The id field is only there to store a row number (element of the array) so I can print it at the show sub.
I assume that the .csv file is grammatically correct, that every internal comma has a space behind it.
I load the whole array into a string.
I replace every comma + space (internal commas) with a chr(0).
Then I load the whole string into an array with chr(10) as the deliminator so this array stores all the rows.
Then I go through all the rows and fill the .cols array with the columns.
(They are of various dimensions, not all 98)
If I want to show a row with it's columns I then replace the chr(0) with it's original comma+space.

Nothing is saved to file in my code.

My subs are a bit long due to string indexing for speed.
geofs
Posts: 12
Joined: Apr 14, 2020 13:34

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by geofs »

G'day jj2007, though the data technically appears to be in the correct format, it is far from ideal from a Data Analysts point of view (talking from experience), double quotation marks & commas should be avoided when entering data into fields as a rule so as it makes it easy to import it into whatever application you choose. Though I have since discovered that the latest beta of Softmaker Office Planmaker 2021 Application appears to be able to open the file correctly?

1549 WB Staff estimates. Cambodia Poverty Assessment 2013 Where Have All The Poor Gone?, May 2013. Cambodia Socio-Economic Survey 2011.

I have inspected the spreadsheet from head to toe and it seems to be correctly displayed, looks like MS O & Libre O need to do some homework :)

jj2007 wrote:
robert wrote:NOT Plain wrong.

Download the dataset again and you will see that it is significantly different from your first download. Try and make sense of that ...
Yep, they are actively working on their dataset. That happens, even at the United Nations Statistical Office ;-)
"1514","WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?", May 2013. Cambodia Socio-Economic Survey 2011."
So the number of the footnote is now 1514, in line 42237 (says the editor) or 42232 (says my spreadsheet editor, and that's the correct count), or 42233 (says M$ Excel (my versions are 2003 and 2010)).

The latter splits column 2 incorrectly in two separate fields:
1514<delimiter>WB Staff estimates. Cambodia Poverty Assessment 2013 Where Have All The Poor Gone?"<delimiter> May 2013. Cambodia Socio-Economic Survey 2011."

Finally, LibreOffice takes 30 seconds to launch, then 19 seconds to open the csv file. After only 4 seconds, it finds "Where have all" in line 42231 and shows the following:
1514<delimiter>WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?<delimiter> May 2013. Cambodia Socio-Economic Survey 2011."

So congrats to LibreOffice, they are almost compatible to M$ Excel. The 19 seconds to open a csv file incorrectly are a bit long (mine takes 67 milliseconds) but OK, it's free, isn't it?

Btw there is nothing wrong with the data, format-wise:

"1514","WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?", May 2013. Cambodia Socio-Economic Survey 2011."

These are exactly two columns (not three, as Excel and LibreOffice believe). The red quotes ("WB Staff ... Survey 2011") delimit the field. Inside the field there is a text delimited by an even number of quotes ("Where...gone"). Then follows a comma before May 2013, and both Excel and LibreOffice forget that we are still inside a quoted text.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by dodicat »

What downloaded file gives
1549 WB Staff estimates. Cambodia Poverty Assessment 2013 Where Have All The Poor Gone?, May 2013. Cambodia Socio-Economic Survey 2011.
because
MDG_Export_20200523_145825294.csv
disagrees.
checked with Libre and wordpad.
geofs
Posts: 12
Joined: Apr 14, 2020 13:34

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by geofs »

I downloaded the file again then another 20 mins later and got a different result!, seems that we have a moving target!

Tue 26 May 2020 09∶08∶34
1,937,510 bytes Zipped
MDG_Export_20200526_013735313.csv =
1560 WB Staff estimates. Cambodia Poverty Assessment 2013 Where Have All The Poor Gone?, May 2013. Cambodia Socio-Economic Survey 2011.

Tue 26 May 2020 09∶31∶13
1,937,942 bytes Zipped
MDG_Export_20200526_015725456.csv =
1627 WB Staff estimates. Cambodia Poverty Assessment 2013 Where Have All The Poor Gone?, May 2013. Cambodia Socio-Economic Survey 2011.

Confirmed by Text Editor, SoftMaker Office PlanMaker & LibreOffice Calc
dodicat wrote:What downloaded file gives
1549 WB Staff estimates. Cambodia Poverty Assessment 2013 Where Have All The Poor Gone?, May 2013. Cambodia Socio-Economic Survey 2011.
because
MDG_Export_20200523_145825294.csv
disagrees.
checked with Libre and wordpad.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by jj2007 »

geofs wrote:I downloaded the file again then another 20 mins later and got a different result!, seems that we have a moving target!
Yes, it seems somebody is working on the footnotes. So the footnote number may have changed, but the relevant cell text should read

Code: Select all

WB Staff estimates. Cambodia Poverty Assessment 2013 "Where Have All The Poor Gone?",  May 2013. Cambodia Socio-Economic Survey 2011.
Btw this lovely file has also approx. 34 cells containing linefeed characters, e.g.

Code: Select all

December 2012.
Excluding fixed wireless local loop (WLL) subscriptions, ISDN voice-channel equivalents.
Tourist Trap
Posts: 2958
Joined: Jun 02, 2015 16:24

Re: Import a jaggered CSV file and save data to a common comma delimeted csv format

Post by Tourist Trap »

geofs wrote:Cheers for the tip ...
Approved :)
Post Reply