Monday, 14 December 2009

SWIFT Gazetteer

We were commissioned on Friday to look into matching the SWIFT gazetteer (one of the databases used by social and wellbeing services) against the Local Land and Property Gazetteer (LLPG). I was given a file containing just over 110,000 address rows from SWIFT and started some of the investigation work today.

Happily, about 80% of the SWIFT addresses had a UPRN (Unique Property Reference Number) next to them, and I was able to match these fairly easily to the LLPG UPRNs; all of the values that were on the Knowsley LLPG file looked right apart from some postcode discrepancies. Some of the UPRNs in SWIFT are out of borough (mainly Liverpool and Sefton addresses), so I'll need to get a copy of the Merseyside-wide gazetteer and repeat the process.
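A minimal sketch of this kind of UPRN check, in Python, might look like the following. It isn't the routine actually used for this exercise: the field names (id, uprn, postcode), the match_on_uprn function and the idea of holding the LLPG as a simple UPRN-to-postcode lookup are all hypothetical, chosen for illustration. It sorts each SWIFT row into one of four buckets, which makes the postcode discrepancies and the out-of-borough UPRNs easy to count and explain.

```python
# Minimal sketch: classify SWIFT rows against the LLPG by UPRN.
# Field names ('id', 'uprn', 'postcode') and the shape of the LLPG lookup are
# hypothetical; the real SWIFT extract and LLPG file will have their own layouts.


def normalise_postcode(postcode):
    """Upper-case and drop all whitespace so 'l36 9ya' and 'L369YA' compare equal."""
    return "".join((postcode or "").split()).upper()


def match_on_uprn(swift_rows, llpg_postcodes):
    """Return a dict mapping each SWIFT row id to an outcome label.

    swift_rows: iterable of dicts with 'id', 'uprn' and 'postcode' keys.
    llpg_postcodes: dict mapping an LLPG UPRN (as a string) to its postcode.
    """
    outcomes = {}
    for row in swift_rows:
        uprn = str(row.get("uprn") or "").strip()
        if not uprn:
            # No UPRN recorded: these rows need address-based matching instead.
            outcomes[row["id"]] = "no_uprn"
        elif uprn not in llpg_postcodes:
            # UPRN not on the local LLPG, e.g. an out-of-borough property.
            outcomes[row["id"]] = "uprn_not_in_llpg"
        elif normalise_postcode(row.get("postcode")) != normalise_postcode(llpg_postcodes[uprn]):
            # UPRN found, but the two systems disagree on the postcode.
            outcomes[row["id"]] = "postcode_mismatch"
        else:
            outcomes[row["id"]] = "matched"
    return outcomes


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    llpg = {"100": "L36 9YA", "200": "L36 2BB"}
    swift = [
        {"id": 1, "uprn": "100", "postcode": "l36 9ya"},
        {"id": 2, "uprn": "200", "postcode": "L36 1AA"},
        {"id": 3, "uprn": "999", "postcode": "L1 1AA"},
        {"id": 4, "uprn": "", "postcode": "L34 5AB"},
    ]
    print(match_on_uprn(swift, llpg))
    # {1: 'matched', 2: 'postcode_mismatch', 3: 'uprn_not_in_llpg', 4: 'no_uprn'}
```

Rows labelled uprn_not_in_llpg are the ones that would be rerun against the Merseyside-wide gazetteer once a copy is available.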

This leaves roughly 10,000 rows that need to be matched 'by hand', and I'll continue working on the routines this week. Work like this is pretty tedious, but it's important to be thorough and to have a clear step-by-step process so that what you do is transparent. We've done data matching exercises in the past, so I can draw on that experience and write a clear, concise document explaining what I have done, what the results have been and so on (it's almost like writing up a science project at school). Someone else could then easily follow my work and recreate what I have done, or use it as a basis for matching a different data set.

SWIFT doesn't follow the BS7666 format for organising addresses, and data is randomly strewn across multiple columns: the house number can appear in as many as five of them. So I'll need to do some work to standardise all the data into just a few columns and then try to match on those. The important thing is being able to explain why each row didn't match, and this could quickly become labour-intensive. I'll post some more updates this week.
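The standardising step could be sketched roughly as below, again in Python and again only as an illustration under assumptions: the five column names (addr1 to addr5), the regular expressions, the rule that the first remaining text is the street, and the LLPG index keyed on (number, street, postcode) are all hypothetical rather than taken from SWIFT or the LLPG. The point of the design is that every row comes back with a reason, so unmatched rows can be counted and explained rather than silently dropped.

```python
# Minimal sketch: pull a house number, street and postcode out of free-text
# address columns, look the result up in an LLPG index, and record why each
# row did or did not match. Column names and the index shape are hypothetical.
import re
from collections import Counter

ADDRESS_COLUMNS = ["addr1", "addr2", "addr3", "addr4", "addr5"]  # hypothetical

# A number with an optional letter suffix, optionally followed by more text,
# e.g. "12", "12A" or "12 HIGH STREET".
HOUSE_NUMBER = re.compile(r"(\d+[A-Z]?)\b\s*(.*)")
# Rough shape of a UK postcode, e.g. "L36 9YA"; not a full validation.
POSTCODE = re.compile(r"[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}")


def standardise(row):
    """Collapse the address columns into (number, street, postcode)."""
    parts = [" ".join((row.get(col) or "").upper().split()) for col in ADDRESS_COLUMNS]
    number = street = postcode = None
    for part in filter(None, parts):
        if postcode is None and POSTCODE.fullmatch(part):
            postcode = part.replace(" ", "")
            continue
        m = HOUSE_NUMBER.fullmatch(part)
        if m:
            # Keep the first house number seen; repeats in later columns are ignored.
            number = number or m.group(1)
            rest = m.group(2)
        else:
            rest = part
        # Assume the first remaining text is the street (towns usually come later).
        if rest and street is None:
            street = rest
    return number, street, postcode


def match_by_address(row, llpg_index):
    """Return (uprn, reason); llpg_index maps (number, street, postcode) -> UPRN."""
    number, street, postcode = standardise(row)
    if not postcode:
        return None, "no postcode found"
    if not number:
        return None, "no house number found"
    if not street:
        return None, "no street found"
    uprn = llpg_index.get((number, street, postcode))
    if uprn is None:
        return None, "no LLPG address with this number, street and postcode"
    return uprn, "matched"


def summarise(rows, llpg_index):
    """Count the match outcomes, ready to quote in a write-up."""
    return Counter(reason for _, reason in (match_by_address(r, llpg_index) for r in rows))


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    index = {("12", "HIGH STREET", "L369YA"): "100"}
    row = {"addr1": "12", "addr2": "12 high street", "addr3": "Huyton", "addr4": "L36 9YA"}
    print(match_by_address(row, index))  # ('100', 'matched')
```

A real version would need more rules than this, for example expanding abbreviations such as ST and RD, handling flats and named properties, and flagging rows where the columns give conflicting house numbers; anything those rules miss ends up in the "no LLPG address" bucket for checking by hand.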