Friday 27 June 2008

Importing the NLPG (2)

Phew! This has been a bit of a nightmare. The individual .csv files are too large in size for my laptop or the development server that I use (each has 2GB of RAM). What I have ended up doing is the following:
  1. Written a VB script to split each master .csv file into a number of seperate ones, each containing 5000 rows (plus the remainder).
  2. Used a second VB script to 'pre-process' the file ready for import into SQL Server. This has involved working through each file created in 1 line by line and adding the correct number of commas to fit the number of columns in the table schema I am importing into.
  3. When I end up with thousands of pre-processed .csv files I will need to import them into SQL Server. My web trawling on how to do this with a T-SQL script has drawn a blank but it looks like I will be able to create an SSIS package to achieve the import. This is something I have never done before so it's yet more learning to go through.

I have tested 1 and 2 out over the last few days. 1 is pretty slow to run (it gets through a file in about two hours and there are 70 + files). I might have to look at splitting the job between several servers. 2 is very fast - I did 177 files in about 10 minutes on my laptop today.

No comments: