Sunday 30 December 2007

Indexed vs. Hashed Files

Compare the two non-sequential file structure models: the random (hashed) file and the index file. What advantages does the first one have over the second and what advantages does the second have over the first? What would be your criteria for choosing one over the other for different applications?

"An index for a file contains a list of the keys stored in the file along with entries indicating where the record containing each key is stored" (Brookshear, 2007). Without the index, searching for a particular value in the file would require a full scan of the file. Using an index greatly speeds up the search process. In a non-indexed file with 1000 records an average of 500 comparisons would be needed to find a record. But with an evenly spaced index 100 records in size the number of operations is reduced by a factor of 5 (Callari, 1996). There are several disadvantages to using an index - the need to provide storage space for the index, the fact that the index must be updated as soon as the content of the file is updated, and the fact that an index only improves search speed once the file gets past a certain size.

In a hashed file, the data storage space is divided into several buckets and the records in the file are distributed among the buckets according to a hash algorithm (Brookshear, 2007). Searching for a record is faster because applying the same algorithm to the search key identifies which bucket the record is held in, so only that bucket has to be scanned rather than the whole file. Problems occur in a hashed file because the way records are distributed can leave some buckets far fuller than others (clustering), different keys can hash to the same value (a collision), and buckets can overflow if no means of dynamically increasing their size is implemented.
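
A minimal sketch of the same idea in Python (the keys and the deliberately crude hash function are my own invention): one function is used both to place a record in a bucket and to pick the bucket to scan when searching.

    # Minimal sketch of a hashed file (invented example keys).
    NUM_BUCKETS = 8
    buckets = [[] for _ in range(NUM_BUCKETS)]

    def bucket_for(key):
        # A deliberately simple hash: sum of character codes modulo the bucket count.
        return sum(ord(c) for c in key) % NUM_BUCKETS

    def store(key, record):
        buckets[bucket_for(key)].append((key, record))   # colliding keys share a bucket

    def find(key):
        for k, rec in buckets[bucket_for(key)]:          # only one bucket is scanned
            if k == key:
                return rec
        return None

    store("SMITH", "customer 1")
    store("JONES", "customer 2")
    print(find("SMITH"))   # customer 1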

The fact that indexes are one of the fundamental concepts behind database design leads me to believe that indexes are best used when storage space is not a constraint - a large database will always find space for indexes because of the extra efficiency they bring to searching. I also think that indexes are less prone to problems than hashed files, with no chance of collisions or clustering. However, hashing is less wasteful in terms of the extra resources needed to implement it, and when done properly it is quicker because there are not two files to maintain.

Refs:

Brookshear, J. G. (2007) Computer Science: An Overview (9th Ed.) Boston, MA: Pearson Education Inc.

Callari, F. (2006) Indexed Sequential [Online] Montreal, Quebec: McGill University
Available from http://www.cim.mcgill.ca/~franco/OpSys-304-427/lecture-notes/node56.html (Accessed 30th Dec. 2007)

Friday 28 December 2007

Distributed Databases

When would you choose to implement distributed databases, rather than local databases? Would you go for this design if all your computers were localized in one building? What if they are spread out in your country? What would you recommend when they are spread worldwide? When would you mirror databases and when would you use databases tailored for the local environment?

I would move from local to distributed databases (DD) as soon as some or all of the data held required synchronisation between users; as soon as the data required more security than a local database management system (DBMS) such as Microsoft Access can offer; or as soon as the data became 'mission critical'. Of course a DD requires the application(s) submitting data to it to be able to interface with it, or at least the capacity to build applications that can. The decision would also be underpinned by financial considerations - implementing a DD requires an investment in both hardware and expertise.

In the case of all computers being localised in one building, the case for a DD is harder to make. Imagine the building is home to a sales and marketing company with a small Customer Relationship Management (CRM) system. The staff could easily all hold a copy of the CRM data on their local machines. Provided they were happy for the data to be in a consistent state only once a day, after a synchronisation job had run, there would be no problem, and any faults in the synchronisation could easily be tracked to the machine responsible. Of course, doing business this way is inefficient and not very secure. A simple distributed database - perhaps comprising a single SQL Server 2005 instance, a separate server for the data and log files, a tape drive for backups and a simple web application to provide the CRM functionality - would be relatively inexpensive in this case.

What if the computers are spread out across the country? A distributed database is a must in this case because the logistics involved in maintaining local copies of the data across multiple sites are overwhelming. Here it is perfectly feasible to host the distributed database at one site and have all the computers communicate with it. When the computers are spread worldwide a different approach is required. One such approach is the OceanStore system: "The core of the system is composed of a multitude of highly connected “pools”, among which data is allowed to “flow” freely. Clients connect to one or more pools, perhaps intermittently" (Kubiatowicz et al., 2000). These pools, located around the globe, allow much greater resilience in the face of disaster because database objects are replicated across many pools, so the failure of one pool does not affect the availability of the data.

"Fast failover with minimal data loss has traditionally involved higher hardware cost and greater software complexity. Database mirroring, however, can fail over quickly with no loss of committed data, does not require proprietary hardware, and is easy to set up and manage" (Talmage, 2005). Despite what Ron Talmage says implementing advanced database techniques such as mirroring requires expertise to be constantly available to keep it maintained; if an organisation were unable to obtain this availability, or would not benefit from mirroring because unavailablity of data was not high up the risk assessment, then I would suggest that tailoring databases for local use is the cheaper and quicker option - especially in the case of smaller businesses that only employ a small number of people. In a large organisation I think that the existence of data on databases is as important as filing tax returns and can't see why they wouldn't want to do everything possible to secure their data and make it available - especially in the post Sarbanes - Oxley world.

Refs:

Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Wells, C. & Zhao, B. (2000)
OceanStore: An Architecture for Global-Scale Persistent Storage [Online]
Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, 2000
Available from http://oceanstore.cs.berkeley.edu/publications/papers/pdf/asplos00.pdf (Accessed 28th Dec. 2007)

Talmage, R (2005) Database Mirroring in SQL Server 2005 [Online] Microsoft Corporation
Available from http://www.microsoft.com/technet/prodtechnol/sql/2005/dbmirror.mspx (Accessed 28th Dec. 2007)

Sunday 23 December 2007

Storing Data in a Sparse Matrix

One of the problems of storing data in a matrix (a two-dimensional Cartesian structure) is that if not all of the elements are used, there might be quite a waste of space. In order to handle this, we can use a construct called a "sparse matrix", where only the active elements appear. Each such element is accompanied by its two indexes (the row and the column). Discuss in what ways such a structure is similar to and/or different than a list.

A linked list and a sparse matrix are two different ways of organising data in memory. "Many numerical problems in real life applications such as engineering, scientific computing and economics use huge matrices with very few non-zero elements, referred to as sparse matrices" (Smailbegovic, Gaydadjiev & Vassiliadis, n.d.). The most obvious way to store a sparse matrix would be in a multi-dimensional array. However, this is extremely wasteful because the many zero elements that will never be accessed must still have memory allocated to them. "With a little extra effort, the size of the array can be deferred until the array is created at runtime, but after that it remains fixed" (Parlante, 2001). When an array is created, its memory is allocated as one contiguous chunk.

In contrast, a linked list consists of a series of nodes that have their own independent memory allocation. These nodes are 'linked' through pointers - each node in the linked list usually contains two fields, one of which stores the data for the node, the second of which contains a pointer to the next node in the chain.

The linked list holds a number of advantages over the sparse matrix implemented as a multi-dimensional array. The first is that the linked list is much more dynamic: changing the size of an array at runtime is hard work for the programmer, whereas a linked list can easily be grown as required by simply chaining on more nodes. Also, because an array allocates all of its memory in one block, resizing it means allocating a new block and copying every element across; a linked list is not constrained in this way.
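
A minimal sketch of that approach in Python (the class and method names are my own): a sparse matrix held as a chain of nodes, each carrying its row and column indexes, its value and a pointer to the next node, grown simply by linking on another node.

    # Each node stores one active element (row, col, value) plus a link to the next node.
    class Node:
        def __init__(self, row, col, value, next_node=None):
            self.row = row
            self.col = col
            self.value = value
            self.next = next_node

    class SparseMatrix:
        def __init__(self):
            self.head = None

        def set(self, row, col, value):
            # Grow the structure by chaining a new node onto the front of the list.
            self.head = Node(row, col, value, self.head)

        def get(self, row, col):
            # Walk the chain node by node; time grows with the number of stored elements.
            node = self.head
            while node is not None:
                if node.row == row and node.col == col:
                    return node.value
                node = node.next
            return 0   # elements never stored are implicitly zero

    m = SparseMatrix()
    m.set(3, 7, 42)
    m.set(500, 1, 7)
    print(m.get(3, 7), m.get(0, 0))   # 42 0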

An array does outperform a linked list, however, in the time taken to access an element - this is constant in an array of any size, whereas reaching a node further down the chain of a linked list takes progressively longer. But implementing a sparse matrix using a linked list rather than a multi-dimensional array still benefits the programmer in terms of memory use and ease of implementation.

Refs:

Smailbegovic, F., Gaydadjiev, G. N. & Vassiliadis, S. (n.d.) Sparse Matrix Storage Format [Online] Delft, Netherlands: Delft University of Technology
Available from http://ce.et.tudelft.nl/publicationfiles/1106_547_smailbegovic.pdf (Accessed 23rd Dec 2007)

Parlante, N. (2001) Linked List Basics [Online] Stanford, CA: Stanford University
Available from http://cslibrary.stanford.edu/103/LinkedListBasics.pdf (Accessed 23rd Dec 2007)

Friday 21 December 2007

“What we have is a total failure to communicate”

It is said that communication in the analysis step often breaks down. Could you comment on this? Have you experienced this phenomenon? If not, would you accept this statement, and why?

"The requirements analysis and design phase is believed to be the most critical part of the software development lifecycle " (Catherine, 2007). In a presentation on the topic, John Petlick of De Paul University in Chicago describes one of the deficiencies of the waterfall model as "all requirements must be known upfront" (n.d.), which is never the case when communication does break down as I have experienced.

I work as part of a team of four developers. We are expected to deliver business requirements across a number of platforms, but we are constantly involved in development projects where the analysis phase never happens or is over far too quickly. In the former case the project becomes technology led: the design and execution is done by the development team, who fully understand how to make the application in question output a message or a file but fail to understand what the application actually means to the business, its users and its customers. In the latter case the analysis phase often starts and ends in a single meeting, with the output being a very rough specification document that lacks the low-level detail necessary when architecting a system. An exercise of 'fill in the blanks' then begins for the developers, while the business customers believe their part in the process is done or are too busy to devote any more time to the analysis. The attitude quickly becomes 'it's now the responsibility of the IT people to deliver', which is ludicrous. Getting developers and business analysts to communicate with more quality and quantity is one of the fundamentals of the agile/extreme programming model, where a business analyst is part of the development team full time and on hand to answer questions straight away. This is light years away from our current process, where progress is reviewed weekly or fortnightly, and it is only then that the developers get their mistakes pointed out.

I also have experience of the analysis phase being done properly. This has tended to happen on projects where I am the sole developer and where I have a good enough relationship with the customer to go to them face-to-face every day and ask questions. Communications in the analysis phase of local government IT projects often break down because groups of key people have a 'history' that stops them from working together. But until the business is afforded as much responsibility in the software engineering realm as the developers, software projects will continue to fail.

In the UK, central government is trying to roll out ContactPoint, a national index of information about children. It was recently announced that the April 2008 deadline would be put back to October 2008 - the second time a deadline has been extended. At meetings it is always obvious that the developers of the system - one of the largest IT consultancy firms in the country - and its user base (people like me and my colleagues) are consistently on different wavelengths.

Refs:
Catherine, A (2007) Software Development Life Cycle: Basic Steps to Strong Product [Online] ArticlesBase.com
Available from http://www.articlesbase.com/computers-articles/software-development-life-cycle-basic-steps-to-strong-product-194148.html (Accessed 21 Dec 2007)

Petlick, J. (n.d.) Software Development Life Cycle (SDLC) [Online] DePaul University, Chicago, IL
Available from condor.depaul.edu/~jpetlick/extra/394/Session2.ppt (Accessed 21 Dec 2007)

Sunday 16 December 2007

Object Oriented Programming vs. Structured Programming Paradigm

Discuss why Object Oriented Programming is considered a better choice than the structured programming paradigm. Does it reflect our "natural" way of thinking or do you find it artificial?

The most obvious reason that Object Oriented Programming (OOP) is considered a better choice is that it is supposed to represent the way humans think about the world better than the structured programming paradigm does. We don't really view life as a series of instructions waiting to be processed - that is how a computer operates. Rather, we see ourselves as one of millions of objects that interact with each other, and OOP should make writing software easier because it embraces this view. At a more operational level, the supremacy of OOP over procedural programming is nicely summarised by Matt Weisfeld: "Access to data is uncontrolled and unpredictable [in procedural programming] and because you have no control over who has access to the data testing and debugging are much more difficult. Objects address this problem by combining data and behaviour into a nice complete package" (2007).
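
To make Weisfeld's point concrete, here is a small sketch of my own in Python (the account example and all its names are invented, not from the article): the data lives inside the object and can only be changed through the behaviour packaged with it.

    # Invented example: data and behaviour combined in one package.
    # The balance cannot be modified from outside except through the methods,
    # so every change can be validated and tested in one place.
    class Account:
        def __init__(self, opening_balance=0):
            self._balance = opening_balance   # "private" by convention

        def deposit(self, amount):
            if amount <= 0:
                raise ValueError("deposit must be positive")
            self._balance += amount

        def balance(self):
            return self._balance

    acct = Account(100)
    acct.deposit(50)
    print(acct.balance())   # 150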

I do find the OOP paradigm a rather artificial and forced way of thinking. Don't get me wrong: as someone who spends a lot of time developing code I can fully grasp its advantages. For example, in a current project I have a class called ToolBox that contains a method called AuditLog. I can insert a line into any other method that logs whatever information I want to a database log without having to constantly re-write (or cut and paste) the code that performs the database insert. And by exposing this method through a web service, anyone else with access to the same network can use it too. But I don't really think in terms of objects - I just think about what actions I need to perform inside my system and write methods accordingly. Of course, if I take into account the fact that I am using objects defined by other programmers - for example, both .NET and Java have an XmlDocument class - and can pass these in and out of my own methods, then I am probably using OOP more than I give myself credit for, but that is not intentional.
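
The real ToolBox class belongs to a work project, so here is only a rough, hypothetical sketch of the same idea in Python (the class name aside, the SQLite database, table layout and method names are my own inventions, not the actual code): the logging logic is written once and every other piece of code simply calls it.

    # Hypothetical reconstruction of the idea, not the real ToolBox code.
    import sqlite3
    from datetime import datetime

    class ToolBox:
        def __init__(self, db_path="audit.db"):
            self.conn = sqlite3.connect(db_path)
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS audit_log (logged_at TEXT, source TEXT, message TEXT)"
            )

        def audit_log(self, source, message):
            # The database insert is written once here and reused everywhere.
            self.conn.execute(
                "INSERT INTO audit_log VALUES (?, ?, ?)",
                (datetime.now().isoformat(), source, message),
            )
            self.conn.commit()

    toolbox = ToolBox()
    toolbox.audit_log("ImportJob", "started processing the overnight file")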

I just think that if OOP really reflected how we think about the world then it wouldn't be such a struggle to come to terms with. "Novice designs are littered with regressions to global thinking: gratuitous global variables, unnecessary pointers, and inappropriate reliance on the implementation of other objects" (Beck & Cunningham, 1989). However, I wonder if this comes from the fact that today's industry-standard OOP languages (C# and Java) still have their roots in procedural programming. I also think that because I learned to program procedurally first and came to OOP later, I find it harder to make the transition. "Indeed many people who have no training in computer science and no idea how a computer works find the object-oriented model of problem solving quite natural" (Budd, 1987).

Refs:
Weisfeld, M. (2007) Moving from Procedural to Object-Oriented Development [Online] JupiterMedia Corporation, USA
Available from http://www.developer.com/java/other/article.php/10936_3304881_2 (Accessed 16th Dec. 2007)

Beck, K. & Cunningham, W. (1989) A Laboratory for Teaching Object Oriented Thinking [Online] Sigplan Notices Vol 24 No 10
Available from http://c2.com/doc/oopsla89/paper.html (Accessed 16th Dec. 2007)

Budd, T (1987) A Little Smalltalk [Online] Pearson Education Inc.
Available from http://www.iam.unibe.ch/~ducasse/FreeBooks/LittleSmalltalk/ALittleSmalltalk.pdf (Accessed 16th Dec. 2007)

Interpreted vs Compiled Code

Interpreted code executes much more slowly than compiled code, yet several systems use them extensively. The most well known are Visual Basic (it has two modes: interpreter mode and compiler mode) and JavaScript. Discuss the merits and weaknesses of interpreting, as opposed to compiling, and explain when and why they are used.

The major advantage of interpreted languages is the platform independence that they offer. The best example of this is JavaScript. Rather than forcing the user to install every application they want to use, applications written using JavaScript can be delivered through a web browser, which simply downloads the necessary script files along with the page without the user knowing anything about it. This means that applications can easily be made available to any operating system (such as Windows, Linux or Mac OS) that supports a web browser. This platform independence is one of the main reasons for the rapidly increasing popularity of thin-client web-based applications over thick-client desktop applications. Without interpreted JavaScript I would have to install one application to use Google Mail, another to use Expedia, another to use Lastminute.com (of course, these websites probably wouldn't exist at all, but you get the idea).

A second advantage is the speed with which interpreted languages can be developed. When working with JavaScript, for example, a programmer can embed some code inside a tag within some HTML, open the page in any JavaScript-capable browser and see the results straight away. The developer can quickly change some aspect of the code, refresh the browser, and the change is instantly reflected. Contrast this with development in a compiled language. The coder must change the code and then run it through a compiler with the necessary parameters to produce a compiled assembly such as a Dynamic Link Library (DLL). The DLL must then be deployed to the correct location (perhaps another development server; in a Windows environment this can involve a further step, because shared assemblies must be deployed to the Global Assembly Cache), and finally the assembly must be executed by a program.

One of the most commonly cited reasons for using a compiled language rather than an interpreted language is speed - because interpreted code has to be translated each and every time it is run, it carries an extra overhead. But in reality most modern web applications use a combination of interpreted and compiled code. For example, an e-commerce website is likely to use some embedded JavaScript to perform simple functionality on the client side, such as validating the data inside a text box. This creates a small overhead in interpreting the code on every run, but the tiny amount of code involved makes the overhead pretty negligible. The more critical, error-prone parts of the website - such as interacting with a database to store data or perform complicated searches - are likely to be handled by pre-compiled web services or by stored procedures and functions on the database server itself. The IBM Information Center states that "we can see that it would make sense to use a compiled language for the intensive parts of an application (heavy resource usage), whereas interfaces (invoking the application) and less-intensive parts could be written in an interpreted language" (IBM Corporation, 2007).

Refs:

IBM Corporation (2007) Compiled versus interpreted languages [Online] IBM Corporation, NY, USA
Available from http://publib.boulder.ibm.com/infocenter/zoslnctr/v1r7/index.jsp?topic=/com.ibm.zappldev.doc/zappldev_85.html (Accessed 16th Dec 2007)

Thursday 6 December 2007

Program verification in the future

How do you envision the ways program verification and performance tuning will be accomplished in the future? Will it still be a work of art? Will it always be the result of one's experience? Or do you forecast more and more automatic ways (algorithms?) to accomplish these tasks?

One vision of the future of program verification was presented by J. Strother Moore in 2005. Moore stated that "the verification community seeks to build a system that can follow the reasoning of an average human engaged in a creative, computational problem" (Moore, 2005). My reading around this subject indicates that the holy grail defined by Moore is a long way off yet. On his weblog, Wesner Moise (who worked on Excel for six years at Microsoft) states that "it seems that just a few people are driving the field of software verification forward" (Moise, 2006).

The verification of programs is much more complicated than actually writing the programs themselves. Much of the literature on the World Wide Web comes from international conferences on the subject attended by professors from university computer science departments. The field is highly theoretical and requires an excellent understanding of mathematics to fully comprehend. While I can type 'C# hello world tutorial' into Google and see a list of about a million websites that will teach me how to write that simple program, I am sure that a similar exercise for program verification would be fruitless.

It is because of this inherent complexity that I think program verification will continue to be more about personal skill and experience than about automatic means. Algorithms do contribute - for example, Microsoft's Visual Studio development package pre-compiles code as you write it and identifies potential problems before letting you build the code into a library. I maintain, though, that if the methods of program verification were easy they would be taught at undergraduate level alongside database design and software engineering, and so verification will continue to come from testing and good design.

Testing only goes so far, though. A partner company recently released a new version of their indexing database, and they fired about six million different test cases at it as part of the verification. All this means is that case six million and one - which could still occur at a customer site - may cause the software to fail. The best form of verification, in my opinion, is to adopt sound design methodologies - for example a bottom-up methodology, where software is built from smaller components that are themselves easily tested and easy to single out when they cause an error. This robust design comes from the experience of system architects and from the discipline of developers in making sure that the code they write works properly when they write it, rather than from automatic means.
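
As a small illustration of that bottom-up idea (the component and its checks are invented examples in Python, nothing to do with the partner company's product), each low-level piece carries its own tests, so a failure higher up can be traced to the component responsible:

    # Invented example: a small component with its own checks.
    def normalise_postcode(raw):
        """Strip spaces and upper-case a UK-style postcode string."""
        return raw.replace(" ", "").upper()

    # Simple checks that exercise the component in isolation.
    assert normalise_postcode("sw1a 1aa") == "SW1A1AA"
    assert normalise_postcode(" M1  1AE ") == "M11AE"
    print("normalise_postcode: all checks passed")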

References:

Moore, J. S. (2005) A Mechanized Program Verifier [Online] Austin, University of Texas
Available from http://www.cs.utexas.edu/users/moore/publications/zurich-05/talk.html (Accessed 6th Dec 2007)

Moise, W (2006) Program Verification [Online] Seattle, USA
Available from http://wesnerm.blogs.com/net_undocumented/2006/12/program_verific.html (Accessed 6th Dec 2007)

Sunday 2 December 2007

The Operating System is the Network...

Discuss the statement: "Global communication has developed to such a degree that the true operating system is the net itself, where the individual operating systems are just its nodes". Do you agree? What if the operating systems are heterogeneous (not the same brand)? How is such a "super operating system" managed? If you agree with this statement, can you list concepts and examples? If you do not agree, how do you regard operating systems and networks?

John Gage of Sun Microsystems is credited with coining the phrase "the network is the computer", and I agree: the true operating system is the net itself and the individual operating systems (OSs) are just its nodes.

At first glance it would seem that the heterogeneity of operating systems within the network does not matter - if I sign up for a free email account on the web I don't care which OS the provider uses; I am more interested in storage space and availability. However, Kropf, Plaice and Unger identify "a need for a web operating system which would make available to all sites on a network the resources available on that network" (1997). This moves us away from connecting to the network through the same node (i.e. the PC in my home) every time; instead we would just connect and let the network decide (through its 'super' OS) which node we connect to, and then have all the resources of the network - storage, communication - at our disposal. Of course Linux and Windows Server nodes will always exist, so "should a given version not be capable of dealing with a particular request for a service then it can pass it on to another version" (Kropf, Plaice & Unger, 1997). The super OS must be able to recognise that a particular application is incompatible with the node I am working on and make another node available. Heterogeneous operating systems therefore matter very much in this vision of the future.

How is such a super OS managed? Web operating systems already exist; one excellent example is YouOS (http://www.youos.com), whose website defines the web operating system as the "liberation of software from hardware" (WebShaka Inc., 2006). Of course the need for hardware still exists, but it will come from massive server farms rather than individual personal computers. The web OS will automatically take care of the storage and archiving of data, leaving users free to concentrate on their computer-related work or leisure pursuits. One of the keys to management will be APIs. Currently I have to go to one website to check my personal email, another to check my work email, and a third to manage correspondence for my MSc; through the web OS I will be able to configure connections to all of these resources and only ever have to log into the network once to access them all. The super OS is managed through collaboration, open standards and the personal desire to design and create a utility or tool that solves a problem, rather than large software companies dictating what we need. Of course, Google, Microsoft and the rest would like to be the company that controls our access to the network - but that starts to get a little Orwellian.

Refs:

Kropf, P. G., Plaice, J. & Unger, H. (1997) Towards a Web Operating System [Online] Association for the Advancement of Computing in Education, Toronto
Available from http://citeseer.ist.psu.edu/rd/3438463%2C508728%2C1%2C0.25%2CDownload/http%3AqSqqSqwww.iro.umontreal.caqSq~kropfqSqarticlesqSqa97-2.pdf
(Accessed 2nd December 2007)

WebShaka Inc. (2006) What is YouOS? [Online]
Available from http://www.youos.com/html/static/manifesto/what.html (Accessed 2nd December 2007)

Saturday 1 December 2007

Dual Definition of an Operating System

It is said that there is a dual definition of what an operating system must do. The first one is to present a "virtual machine" to the user which isolates him/her from the bare-bones hardware and is user friendly and effective. The other one is that it has to manage in the most efficient way the (always) limited resources of the computer system. It is also stated that these two goals often conflict. Do you agree? If yes, explain why and what you would do to resolve this. If you do not agree, give an example of some mutual cooperation. Does the concept of the process play a meaningful part in either or both of these tasks?

I agree that the dual operating system goals of presenting a user interface and managing hardware resources are in conflict with one another. The concept of a process, which Brookshear (2007) defines as "a dynamic activity whose properties change as time progresses", is meaningful in both of these tasks and in the conflict between them. The architecture of today's widely used operating systems is inherently flawed because of cross-cutting concerns - wider, non-functional concerns such as system security or fault tolerance that dictate how the different modules are accessed. In their paper A Layered Approach to Building Open Aspect-Oriented Systems, Netinant, Elrad and Fayad (2001) explain that "a typical operating system will need sections of code responding to each of these requirements. That code is not usually segregated into neat modules, but instead interacts and tangles with the underlying functionality". The conflict between interface provision and resource management arises because "in real operating systems architectural properties, like locking and component interaction schemes, tend to be hard-coded into almost every component of the system" (Lohmann & Spinczyk, n.d.). From a personal point of view, the integration systems that I build as part of my job are designed using the object-oriented methodology; I don't believe that today's operating systems fully adopt that approach, because they still carry a design legacy from their previous incarnations.

The solution is to change the architecture of the operating system. In 1978 Lauer and Needham identified two rough categories of operating system - message-based and process-based - and argued that the two architectures were interchangeable and that neither had a performance advantage over the other. However, in 1982 Stroustrup performed an experiment based on this argument and found that "a degradation in response time occurred in a process-based system when the job load and the low level scheduling policy were poorly matched". Because a process-based operating system cannot localise requirements to specific concerns, it is difficult to understand, maintain and expand. Contrast this with a message-based system: I use Microsoft BizTalk Server to develop integration workflows in which a number of orchestrations pass messages between each other, and it is extremely easy to document and follow this interchange. I am sure a similar design technique would reduce the conflict that process-based systems face. Another well-publicised approach to changing the architecture is aspect-oriented programming. This can be summarised as keeping services - such as writing data to a file - separate from aspects - the less tangible concerns within an operating system such as security - and applying different policies to the aspects at different stages of execution. Such an architecture eliminates cross-cutting and the conflict between the layers of the operating system.
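
As a very rough sketch of the separation being described - a toy in a general-purpose language, not how an aspect-oriented operating system is actually built - the cross-cutting concern (here a security check) is written once and applied to the services from the outside instead of being tangled into each one:

    # Toy illustration of keeping an aspect (security checking) separate from
    # the services it applies to, rather than hard-coding it inside each one.
    import functools

    def secured(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            if user != "admin":                     # the aspect, written once
                raise PermissionError(f"{user} may not call {func.__name__}")
            return func(user, *args, **kwargs)
        return wrapper

    @secured
    def write_to_file(user, path, data):
        # The service itself contains no security code at all.
        print(f"{user} wrote {len(data)} bytes to {path}")

    write_to_file("admin", "report.txt", "hello")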

References:

Brookshear, J. G. (2007) Computer Science: An Overview (9th Ed.) Boston, MA: Pearson Education Inc.

Netinant, P., Elrad, T. & Fayad, M. E. (2001) A layered approach to building open aspect-oriented systems [Online] ACM Inc.
Available from http://www.isis.vanderbilt.edu/sdp/Papers/Tzilla%20Elrad%20(Aspect-Oriented%20OS).doc (Accessed 1st Dec 2007)

Lohmann, D. & Spinczyk, O. (n.d.) Architecture-Neutral Operating System Components [Online] Friedrich-Alexander University Erlangen-Nuremberg, Germany
Available from http://www4.informatik.uni-erlangen.de/~lohmann/download/SOSP03_Lohmann.pdf (Accessed 1st Dec 2007)

Lauer, H. C. & Needham, R. M. (1978) On the Duality of Operating System Structures [Online]
Proc. Second International Symposium on Operating Systems, IRIA, Oct. 1978; reprinted in Operating Systems Review, 13(2), April 1979, pp. 3-19. Available from http://www.sics.se/~adam/pt/duality78.pdf (Accessed 1st Dec 2007)

Stroustrup, B. (1982) An experiment with the interchangeability of processes and monitors [Online] Bell Laboratories, NJ, USA
Available from http://www3.interscience.wiley.com/cgi-bin/abstract/113445202/ABSTRACT?CRETRY=1&SRETRY=0 (Accessed 1st Dec 2007)

Saturday 24 November 2007

Memories are cheap

With the current development in computer technology the industry will be soon able to produce huge memories (hundreds of terabytes) on non-volatile memory (one that is not erased when the power is switched off). Discuss how this will affect the architecture of the computer systems, any changes in the data processing and storage techniques, and ultimately the user.

The ability of the computer industry to produce huge volumes of inexpensive non-volatile memory is affecting the architecture of computer systems by taking the responsibility for data storage away from the client PC or device. Companies across the world are implementing Storage Area Network (SAN) technology, which allows remote storage devices to be seen by servers as locally attached. Employee portals such as Microsoft Office SharePoint Server 2007 allow employees to work on documents and save them online to the SAN, so the need to provide staff with large amounts of local storage space is reduced. The existence of a SAN improves the disaster recovery process, and legislation such as Sarbanes-Oxley has actually forced companies to implement SAN technology to meet legal requirements. Looking beyond commercial enterprises to the global use of computers, "the ready availability of high capacity, low cost storage systems has fueled the application of both SAN and NAS (Network Attached Storage) architectures and made the Internet a high growth area with almost universal acceptance" (Grochowski & Halem, 2003).

Put simply, the concept of the desktop application is becoming increasingly redundant. People are increasingly using applications delivered through a web browser and allowing the providers of those applications to store all the related data. Social applications such as Facebook, Flickr and Second Life make no demands on the user to save anything locally other than login credentials. Google Mail offers anyone 1GB of free email storage and lets us access our mail from anywhere in the world that we can get online. The video games industry will soon stop making its users buy software from a retailer, relying instead on delivery via the internet; Valve Corporation already uses its Steam service to let users download games, play them online and provide feedback. Popular client software such as Microsoft Office is being challenged by Google Docs, which provides word processor and spreadsheet functionality online. There is no more need to make a Windows version of an application, a Linux version and a Mac version - just make a web version. Maybe even leviathan operating systems such as Windows Vista and Linux will become redundant if all the user needs is a simple web browser to access and manipulate their data. All of these advances require vast amounts of storage capacity, which is why, for example, "Microsoft is building a mammoth data center on a former bean field in the farming town of Quincy, Washington" (Carr, 2007).

References:

Grochowski, E. & Halem, R. D. (2003) "Technological impact of magnetic hard disk drives on storage systems" IBM Systems Journal, 42 (2), pp. 338-346, International Business Machines Corporation [Online]
Available from http://www.research.ibm.com/journal/sj/422/grochowski.pdf (Accessed 24th November 2007)

Carr, N (2007) "Software companies are building their way to a very material future" [Online] London: Guardian News and Media Ltd. 2007
Available from http://www.guardian.co.uk/technology/2007/jun/28/comment.guardianweeklytechnologysection
(Accessed 24th November 2007)

RISC vs CISC

Compare and contrast CISC architecture and RISC architecture. Make sure to include the strengths and weaknesses of each as well as applications to which they would be most suited. You may also compare/contrast them with any architecture that may be considered as future replacements for either or both of these two.

The aim of CISC architecture is to do more with fewer instructions, so the instructions passed to the CPU are much more complicated. Because of this CPU-oriented approach, the software compiler that feeds the CPU its instructions has less work to do. Fewer, more complicated instructions also take up less storage space, so the computer needs less RAM. CISC architecture was therefore ideal at a time when RAM was far more expensive than it is today and when the software working with the architecture was less sophisticated.

RISC architecture takes the opposite approach to CISC: it is designed to receive many more, simpler instructions. RISC allows a number of performance-enhancing techniques to be applied. One is pipelining - "the technique of allowing the steps in the machine cycle to overlap" (Brookshear, 2007: 124). Another is the heavy use of general-purpose registers: because less CPU 'real estate' is required to process the much simpler instruction set, more registers can be made available, which the CPU uses to hold copies of data from main memory and so save the time normally taken to communicate with it.

Both architectures have drawbacks. CISC allows less flexibility in software because its more complicated instructions are more rigid in how memory is allocated and programs are processed. CISC is also much harder to pipeline, because its instructions vary in length. While RISC is now the more dominant approach in the processor market (and is the basis of the future processor technology described below), it is only as good as the software that supports it. Compilers must do much more work to break programs down into the simple instructions that RISC can process, and for this reason RISC struggled to gain a foothold in the 1980s and 1990s. Software companies such as Microsoft didn't back it - "Windows 3.1 and Windows 95 were designed with CISC processors in mind" (Chen, Novick & Shimano, 2000).

In the last five years the regular increase in performance brought about by each new CPU release has slowed, and manufacturers such as Intel and AMD are designing future processors to take advantage of multithreading rather than trying to squeeze more instructions into each clock cycle. Probably the best known of these next-generation CPUs is Intel's Itanium, which is built on the Explicitly Parallel Instruction Computing (EPIC) architecture. This improves on RISC by making the compiler and processor work together to determine how many of the operations in a program can be performed simultaneously, with much more of the responsibility again passed to the compiler. More space is allocated to cache memory and to adding extra cores to the processor. These extra cores allow multiple threads to be executed simultaneously - a huge advantage in the field of virtualisation, for example, because one operating system can run on each core and the physical size of server farms can be greatly reduced. "Itanium's advantages in instruction level parallelism (ILP) and relatively small cores will give it a clear performance lead over its RISC and CISC rivals as semiconductor technology advances" (Feldman, 2006).

It is clear, however, that today's design techniques are limited; "developers of artificial neural networks argue that the basic CPU-main memory design model is inefficient when compared to the human brain because most of the connectivity is destined to be idle most of the time" (Brookshear, 2007: 125).

References:

Brookshear, J. G. (2007) Computer Science: An Overview (9th Ed.) Boston, MA: Pearson Education Inc.

Chen, C., Novick, G. & Shimano, K. (2000) "RISC vs CISC" [Online]
Available from http://cse.stanford.edu/class/sophomore-college/projects-00/risc/about/index.html (Accessed 22nd November 2007)

Feldman, M. (2006) "Itanium's Growing Pains" [Online] Santa Fe, USA: Tabor Publications and Events
Available from http://www.hpcwire.com/hpc/640152.html (Accessed: 22nd November 2007).

Sunday 18 November 2007

Predicting the future

Attempt, in 350-500 words, to predict how a person holding the same position as you are in now, would describe his/her position in ten years' time. Will the position still exist? What will be similar to today, and what will be different? What training will the position require? What elements will be automated? If your job description is complex, simply choose one aspect of it. Whatever you predict, I expect you to substantiate your claim as well as you presently can. You are allowed to “peek ahead” in our textbook, quote outside references, or whatever else you think will convince us.

I fully expect the position that I hold to exist in ten years' time. I foresee that the manual process of software development will become more automated, but that the wider responsibilities of the position - supporting existing database applications, providing expertise to implement new projects and helping to facilitate business change - will still be the same a decade from today. Part of my job involves developing software solutions that handle data in different formats. Whenever I need to access new data or transform data from one format to another I have to write a new query, which can be time consuming. I expect that advances in programming languages such as the Microsoft LINQ project will revolutionise this. "In short, LINQ would meld queries of multiple data stores into a common development environment, transforming the way queries are programmed into code" (Schwartz & Desmond, 2007). Instead of having to use a separate set of editing tools or a separate syntax to deal with each data source, I expect to be able to work with all data through one interface and one common set of commands, and to save myself a lot of time in the process.
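
As an analogy only - this is Python, not LINQ, and the data sources, table and names are invented - the attraction is being able to ask the same kind of question of an in-memory collection and a database table with one querying style instead of two separate syntaxes:

    # Invented example: the same filtering style applied to two different data sources.
    import sqlite3

    # Source 1: an in-memory collection.
    referrals = [{"id": 1, "team": "North"}, {"id": 2, "team": "South"}]

    # Source 2: a small database table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (id INTEGER, team TEXT)")
    conn.executemany("INSERT INTO cases VALUES (?, ?)", [(3, "North"), (4, "South")])
    db_rows = [{"id": i, "team": t} for i, t in conn.execute("SELECT id, team FROM cases")]

    def north_only(rows):
        # One query expression reused for both sources.
        return [row for row in rows if row["team"] == "North"]

    print(north_only(referrals))   # [{'id': 1, 'team': 'North'}]
    print(north_only(db_rows))     # [{'id': 3, 'team': 'North'}]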

My organisation has begun to implement the hardware and software infrastructure needed to communicate with ContactPoint, a national UK database that will store information on children and their interaction with the authorities. As recently as twelve months ago few people in my organisation had heard of ContactPoint; now it is having to cope with issues such as data cleansing and transferring data to the national index via an API. The increasing desire of central government to make data available at a national level is one of the reasons why I don't see my position - which is to build and support that infrastructure - changing too much. I envisage, for example, that the government will want to build an index to store information about people claiming income support, and another raft of changes to make this happen will need to be planned, implemented and supported. Brookshear (2007: 491) states that “without a doubt, advances being made in artificial intelligence have the potential of benefiting humankind”, and other revolutions in the field of computer science such as quantum computing and robotics have the potential to change the world. But I don't think that these will trickle down to local government IT in the next ten years; I feel that my position is pretty safe from drastic change.

References:

Schwartz, J. & Desmond, M. (2007) "Looking to LINQ: Will Microsoft's Language Integrated Query transform programmatic data access?" [Online] 1105 Media Inc., CA, USA
Available from http://reddevnews.com/features/article.aspx?editorialsid=707 (Accessed 16th Nov 2007)

Brookshear, J. G. (2007) Computer Science: An Overview (9th Ed.) Boston, MA: Pearson Education Inc.

Saturday 17 November 2007

Hardware and Software; the chicken and the egg

In the early days, hardware was always developed first, followed by adaptations in the software. Do you believe this is still the case today? Use examples to support your conclusion.

I think that adaptations in software have always been reactive to what is available from the hardware industry. Software has either been unable to reach its potential because of the limitations of hardware, or has been overwhelmed by developments. In the first days of computers in the home and workplace, the processing power available was so limited that it prevented software from achieving its goals. A study in Ireland in 1969 showed "that main factors holding back developments among in-house commercial users were the lack of computing capacity, the unsuitability of existing hardware" (Drew & Foster, 1994). The study showed that software packages such as payroll and economic modelling were in demand, but the computer hardware failed to provide the resources to realise them.

The field of quantum computing is another example of where the hardware had to come before adaptations in the software. In 1994 Peter Shor described his quantum factoring algorithm, but it wasn't until 2001 that IBM demonstrated the algorithm running on a quantum computer; the algorithm couldn't be tested and refined until the necessary hardware was available. The video games industry has historically always worked on a 'hardware first, software second' basis, and the revolutionary Nintendo Wii is an example of this: it was around May 2004 that Nintendo first published news of its new hardware project, and not until the end of 2005 that third-party software developers began to announce what titles they had planned for the new system (NintendoRevolution, 2007). Now the rapid increase in available computing power - from single to dual to quad core processors - challenges the software industry to come to terms with these increases. "Some have suggested that the challenges of parallelism bestowed onto the software industry will have programmers looking into the abyss" (Tulloch, 2007). This is an example of where the roadmap for the development of hardware looks very clear - keep increasing the power - but the roadmap for software to harness that power does not.

I think that the development of hardware follows a very scientific, research-based path, whereas the development of software is much more down to the ideas of people and their ability to identify where a type of software can make a profit or improve people's lives. The application of advances such as bioinformatics or natural language processing may have great potential, but they will always be reliant on the amount of processing power that the hardware industry makes available.

References:

Drew, E. and Foster, F. G. (1994) "Information Technology in Selected Countries" [Online] The United Nations University
Available from http://www.unu.edu/unupress/unupbooks/uu19ie/uu19ie00.htm#Contents (Accessed 17th November 2007)

NintendoRevolution (2007) Nintendo Wii Timeline [Online] NintendoRevolution.ca
Available from http://www.nintendorevolution.ca/07312006/10/nintendo_wii_timeline (Accessed 17th November 2007)

Tulloch, P (2007) "Discussing the many core future" [Online] Tabor Communications and Events, San Diego, USA
Available from http://www.hpcwire.com/hpc/1332461.html (Accessed 17th November 2007)