Friday 28 December 2007

Distributed Databases

When would you choose to implement distributed databases, rather than local databases? Would you go for this design if all your computers were localized in one building? What if they are spread out in your country? What would you recommend when they are spread worldwide? When would you mirror databases and when would you use databases tailored for the local environment?

I would move from local to distributed databases (DD) as soon as some or all of the data held required synchronisation between users; as soon as the data required more security than a local database management system (DBMS) such as Microsoft Access can offer; or as soon as the data becomes 'mission critical'. Of course a DD requires the application(s) submitting data to it to be compliant; either this or there must be capacity to implement applications that can interface with a DD. The decision would also be underpinned by financial considerations - implementing a DD requires an investment in both hardware and expertise.

In the case of all computers localised in one building then the case for a DD is harder to make. Imagine the building is home to a sales and marketing company with a small Customer Relationship Management (CRM) system. They could easily all hold a copy of the CRM on their local machines. Provided that they were happy to only ever have all the data in the same state once a day after a job to synch the data was run then there would be no problem. Any faults in the synching of the data could be easily tracked to the machine responsible. Of course, doing business this way is inefficient and not very secure. A simple distributed database - perhaps comprising a single SQL 2005 server, a seperate server to store data and log files, a tape drive to put the backups on and a simple web app to perform CRM functionality - would make the DD in this case relatively inexpensive.

What if the computers are spread out in the country? A distributed database is a must in this case because the logistics involved in maintaning local copies of data across multiple sites is overwhelming. In this case its perfectly feasible to have the distributed database located in one site in the country and all the computers communicating with this site. When the computers are spread worldwide a different approach is required. One such approach is the OceanStore system. "The core of the system is composed of a multitude of highly connected “pools”, among which data is allowed to “flow” freely. Clients connect to one or more pools, perhaps intermittently" (Kubiatowicz, Bindel, Chen et al, 2000). These pools - located in various locations globally - allow for much greater resilience in the face of disaster because all database objects are replicated across all pools, and the failure of one pool does not affect the availability of data.

"Fast failover with minimal data loss has traditionally involved higher hardware cost and greater software complexity. Database mirroring, however, can fail over quickly with no loss of committed data, does not require proprietary hardware, and is easy to set up and manage" (Talmage, 2005). Despite what Ron Talmage says implementing advanced database techniques such as mirroring requires expertise to be constantly available to keep it maintained; if an organisation were unable to obtain this availability, or would not benefit from mirroring because unavailablity of data was not high up the risk assessment, then I would suggest that tailoring databases for local use is the cheaper and quicker option - especially in the case of smaller businesses that only employ a small number of people. In a large organisation I think that the existence of data on databases is as important as filing tax returns and can't see why they wouldn't want to do everything possible to secure their data and make it available - especially in the post Sarbanes - Oxley world.

Refs:

Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Wells, C. & Zhao, B. (2000)
OceanStore: An Architecture for Global-Scale Persistence Storage [Online]
Proceedings of the Ninth international Conference on Architectural Support for Programming Languages and Operating Systems, 2000
Available from http://oceanstore.cs.berkeley.edu/publications/papers/pdf/asplos00.pdf (Accessed 28th Dec. 2007)

Talmage, R (2005) Database Mirroring in SQL Server 2005 [Online] Microsoft Corporation
Available from http://www.microsoft.com/technet/prodtechnol/sql/2005/dbmirror.mspx (Accessed 28th Dec. 2007)

No comments: