Friday, 20 June 2008

Structure of the initial NLPG dataset

Today I got my first look at the initial NLPG dataset, which I accessed on the NLPG / Intelligent Addressing FTP server. I was expecting to see one big .csv file but no, there are actually 19 separate .zip files on the server, each representing a large geographic area, e.g. Greater London, North East, Wales etc. If you open one of the .zip files, the .csv files are inside, except that each large geographic area is subdivided, so East Midlands has a different .csv for Leicestershire, Nottinghamshire, Derbyshire etc. Oh, and each large geographic region has a unitary and a non-unitary .zip file, but it looks like there is only one unitary .csv file in each unitary .zip file.
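To check a layout like this without opening every archive by hand, something along these lines could build an inventory of the .csv files inside each .zip. It is only a rough sketch: it assumes the archives have already been downloaded from the FTP server into one local folder, and the function names are made up for illustration.

```python
import zipfile
from pathlib import Path


def list_csv_members(zip_dir):
    """Map each .zip file name in zip_dir to the sorted .csv names it contains."""
    inventory = {}
    for zip_path in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as archive:
            # Keep only the .csv members; ignore readme files and the like.
            inventory[zip_path.name] = sorted(
                name for name in archive.namelist()
                if name.lower().endswith(".csv")
            )
    return inventory


def count_csv_files(inventory):
    """Total number of .csv files across all archives in the inventory."""
    return sum(len(names) for names in inventory.values())
```

Calling `list_csv_members` on the download folder and passing the result to `count_csv_files` gives a total that can be compared against the number of files expected.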

Phew! Wikipedia reckons there are 49 administrative counties in England and Wales, so I can expect to see 49 .csv files plus another 9 unitary .csv files (one from each of the 9 unitary .zip files), making 58 in total.

I will need to debatch each of these separately into SQL Server and then run a join script to create (hopefully) a unique record for each distinct address in the country. I can then load these as entities into the data hub.
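The "one record per distinct address" idea can be prototyped outside SQL Server before writing the real join script. The sketch below is an in-memory Python stand-in, not the SQL Server approach itself. It assumes every file has a header row and shares an identifier column; the column name `UPRN` used in the usage note is a placeholder, since the actual column layout is not described here.

```python
import csv


def merge_unique(streams, key_column):
    """Merge rows from several CSV text streams, keeping the first row seen per key.

    streams    -- iterable of open text files (or any file-like objects)
    key_column -- header name of the column that identifies an address
    """
    unique = {}
    for stream in streams:
        for row in csv.DictReader(stream):
            # setdefault keeps the first occurrence and skips later duplicates.
            unique.setdefault(row[key_column], row)
    return list(unique.values())
```

For example, `merge_unique([open("a.csv", newline=""), open("b.csv", newline="")], "UPRN")` would return one dictionary per distinct key across both (hypothetical) files. Keeping the first occurrence is an arbitrary choice here; the real script may need a rule for picking between conflicting records.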
