Thursday 12 June 2008

What data do I want to take from the NLPG?

There is a wealth of property information in the NLPG and to take everything would be overkill. My aim is to build a single-row address record for each distinct UPRN in the database - no duplicates allowed! This data can be brought together from four of the record types in the NLPG:

  • Street Record (11)
  • Street Descriptor (15)
  • Basic Land and Property Unit (21)
  • Land and Property Identifier (24)

by writing a database query that joins the tables together. Referential integrity is pretty easy - the join is either made on UPRN or USRN.
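As a rough illustration of that join, here is a minimal sketch using Python's built-in sqlite3 module. The table names (street, street_descriptor, blpu, lpi), the column names (paon, street_name, town) and the sample rows are all made up for the example; a real NLPG load will have its own schema.

```python
import sqlite3

def build_address_rows(conn):
    """Join the four record types into flat address rows (one row per LPI)."""
    # Hypothetical schema: BLPU and LPI join on UPRN,
    # LPI, street and street descriptor join on USRN.
    query = """
        SELECT b.uprn, l.paon, d.street_name, d.town
        FROM blpu AS b
        JOIN lpi AS l ON l.uprn = b.uprn
        JOIN street AS s ON s.usrn = l.usrn
        JOIN street_descriptor AS d ON d.usrn = s.usrn
        ORDER BY b.uprn
    """
    return conn.execute(query).fetchall()

# Tiny in-memory database with invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE street (usrn INTEGER PRIMARY KEY);
    CREATE TABLE street_descriptor (usrn INTEGER, street_name TEXT, town TEXT);
    CREATE TABLE blpu (uprn INTEGER PRIMARY KEY);
    CREATE TABLE lpi (uprn INTEGER, usrn INTEGER, paon TEXT);
    INSERT INTO street VALUES (100);
    INSERT INTO street_descriptor VALUES (100, 'HIGH STREET', 'ANYTOWN');
    INSERT INTO blpu VALUES (1), (2);
    INSERT INTO lpi VALUES (1, 100, '10'), (2, 100, '12');
""")

rows = build_address_rows(conn)
for row in rows:
    print(row)
```

Note that a plain join like this still returns one row per LPI, so a UPRN with several LPIs will appear more than once until the records are filtered.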

The quest for no duplicates is hampered, however, by the fact that each LPI and BLPU has a status. These range from 1 to 9 and indicate whether the record is approved, awaiting approval or historic. Obviously I don't want to include all the historic data - I just want the most up to date information about an address. Fortunately each record also has a last updated date - so I can take the most recently updated record - and a processing order - so I can take the highest value of this for each UPRN too. It's great working with a dataset that has been well designed!
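The selection rule can be sketched in a few lines of Python. This is only an illustration: the field names (uprn, last_updated, processing_order, address) and the sample records are invented, and using the processing order as a tie-breaker after the last updated date is my assumption about how the two values combine.

```python
from datetime import date

def latest_per_uprn(records):
    """Keep one record per UPRN: the most recently updated,
    with the highest processing order breaking ties (assumed ordering)."""
    best = {}
    for rec in records:
        key = (rec["last_updated"], rec["processing_order"])
        current = best.get(rec["uprn"])
        if current is None or key > (current["last_updated"], current["processing_order"]):
            best[rec["uprn"]] = rec
    return list(best.values())

# Invented sample data: UPRN 1 has an older and a newer record.
sample = [
    {"uprn": 1, "last_updated": date(2007, 1, 5), "processing_order": 1, "address": "10 OLD ROAD"},
    {"uprn": 1, "last_updated": date(2008, 3, 2), "processing_order": 2, "address": "10 NEW ROAD"},
    {"uprn": 2, "last_updated": date(2008, 1, 1), "processing_order": 1, "address": "12 NEW ROAD"},
]

current = latest_per_uprn(sample)
for rec in current:
    print(rec["uprn"], rec["address"])
```

The same idea works as a database query (group by UPRN and keep the row with the latest date); the loop above just makes the rule explicit.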

By taking this approach you seriously reduce the amount of data that you have to store, and searching is therefore quicker when you're trying to use this data at the enterprise level. The test extract file from Intelligent Addressing contained 1506 rows of LPIs; after running a filtering script only 347 distinct current addresses remain - 77% of the rows are discarded because they're not required.
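For anyone who wants to check the arithmetic, the percentage follows directly from the two row counts quoted above (the variable names are just for the example):

```python
total_rows = 1506   # LPI rows in the test extract
kept_rows = 347     # distinct current addresses after filtering

discarded = total_rows - kept_rows
pct_discarded = round(100 * discarded / total_rows)

print(discarded, pct_discarded)
```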
