<mosaic.cnfolio.com>
Technology Exploration Project – M591

Privacy neglected in favour of cost cutting


A look into the loss of CDs containing UK child benefit data


By N. Butler, R. Jonkmann, B. Malata, B. Odulawa, J. Sinden, G. Stapylton, K. Stevens, R. Talukdar, M. Walton


1. Security of Personal Privacy

2. Sending the Data: A Legal Requirement

3. Removing the Sensitive Fields from the CDs
3.1 The Data in a Plain Text CSV File
3.2 Total Data Size Is Dependent On Compression Ratio
3.3 Determining How Much Data Is In Each Record
3.4 A Record Example Can Be Built
3.5 Almost 50% of the Fields Are Unnecessary
3.6 Data Can Be Removed In Less Than a Day
3.7 The Total Cost to Remove Data Is £650

4. Model to Assess the Cost of Sending Letters of Apology to All the Families Affected By the Lost CDs
4.1 Estimate Based On Direct Quotes
4.2 Estimate Based On the Resources Required

5. The Level of Protection Used To Secure the 25 Million Records
5.1 The Possible Attacks to Gain Access to the Records
5.2 There Are Programs to Gain Access to Secured Data
5.3 A Model to Assess the Time and Resources Required To Gain Access to the Records
5.4 AZPR, the Perfect Program to Use as a Backbone for the Model

6. How Can You Protect Your Data?
6.1 Some Sensible Precautions
6.2 Suggested Data Protection Software
6.3 Password Use, Best Practices?

7. Conclusion



1. Security of Personal Privacy


Privacy is one of the most debated subjects in society today. Protecting your personal identity is becoming increasingly important, and not just in the UK. In fact, identity theft is the fastest growing crime in the U.S. and accounts for over 52 billion U.S. dollars a year.[1] Putting a price on identity and privacy is not easy, as some people consider their privacy to be of much higher value than others do. Although opinion plays a large role in such a debate, one organization putting a price on other people’s personal information is bound to be controversial. Unfortunately, at some point in early November, that is exactly what happened, leaving 25 million identities unsecured and exposed.

On Tuesday 20th November 2007 it emerged that two password protected computer discs holding the personal information of all U.K. residents with a child under 16 had been lost in transit from the HMRC (Her Majesty’s Revenue and Customs) offices to the NAO (National Audit Office). These two CDs contained the records of 25 million individuals, including fields for name, address, date of birth, National Insurance number and, in some cases, bank details. Such private information falling into the wrong hands could have very serious repercussions. There is no evidence to suggest that this has happened, but at the time this article went to press the CDs had yet to be found. Later in the enquiry, evidence also emerged that the fields containing private information were not required by the NAO. To cut costs, however, these fields remained in the records and were sent, and subsequently lost, along with both CDs. In an attempt to regain some dignity, letters of apology were later sent out to all families affected.

This article will look at the legislation in place to justify and oversee communications between the HMRC and the NAO, and analyze what went wrong in this situation. Furthermore, it will present models to assess the cost of removing the sensitive fields before sending, the subsequent time and money spent sending letters apologizing for the loss, and the time taken for an individual to crack the passwords that guard the information from prying eyes. It will also look at some methods of protecting data that companies and individuals can use to reduce the possibility of data loss occurring. Finally, this report will aim to show that the cost of apologising for the initial mistake far outweighs the cost of preventing it, and that if the price had been paid upfront to remove the sensitive fields, a great deal of money, embarrassment and personal data could have been saved.



2. Sending the Data: A Legal Requirement


The National Audit Office has long been considered necessary in order to oversee the spending and finances of all central government departments and agencies. Audits are conducted annually, and the findings are reported to Parliament to assess how efficiently and effectively each department has used public money, with the hope of saving the taxpayer millions of pounds per year. [13] The HMRC is no exception to such scrutiny, and in fact there are 4 pieces of legislation used to define the authority of the NAO in such matters. [12]

Firstly, the Exchequer and Audit Departments Act, first produced in 1866 and then refined in 1921 (along with many amendments on the way), mostly refers to how public money should be spent, as well as how such spending should be controlled. This governs what the NAO are looking for when they audit each department and agency. However, the NAO conducts its actual business of obtaining information on the basis of the National Audit Act 1983 and the Government Resources and Accounts Act 2000. The National Audit Act 1983 is described as "An Act to strengthen Parliamentary control and supervision of expenditure of public money by making new provision for the appointment and status of the Comptroller and Auditor General, establishing a Public Accounts Commission and a National Audit Office and making new provision for promoting economy, efficiency and effectiveness in the use of such money by government departments and other authorities and bodies; to amend or repeal certain provisions of the Exchequer and Audit Departments Acts 1866 and 1921; and for connected purposes".

On top of this, the Government Resources and Accounts Act 2000 is described as "An Act to make provision about government resources and accounts; to provide for financial assistance for a body established to participate in public-private partnerships; and for connected purposes". The acts cover clear political and functional objectives. The administration at the time hoped that these pieces of legislation, combined with resource budgeting, would bring improved management and increased value for money for the taxpayer. Efficiency is the key here, and in reality that is also the main objective of the NAO: to keep any government agency that spends public money as efficient and effective as it can be. All of the legislation, and indeed the results of the NAO (providing that they are positive), help to provide a view of an ‘open’ government that is able to fully disclose the ways it spends taxpayers' money. This way, the government can at least claim to the general public that their taxes are being spent to their full potential.

In order to achieve this, however, the NAO need to step in and conduct their audits. The legislation effectively gives them the right to access information about all government departments, along with any further information that could lead to improved efficiency and effectiveness. Any information needed to determine whether public money is being well spent must be made available to the NAO. [19] In many cases, including our case of the HMRC, only a sample of data is necessary, and as long as this sample is reasonably representative of the overall population, it will suffice. This, along with information referring to finances, can then be used by the NAO to assess whether the correct objectives are being achieved.

So, it is clear that the NAO needed access to such information from HMRC, but there is nothing to suggest the information had to be sent. In fact, all that is required is “access” to the information, which could theoretically have occurred on site. For some reason, however, an external audit was deemed appropriate, and therefore a sample of data needed to be sent to the NAO so that they could conduct their audit at the NAO office. In such a case, the NAO can determine what format they need the data to be in. The NAO asked for specific data to be sent (namely a sample which excluded many if not all fields of personal information) [11]. The HMRC responded to the request by sending the data in electronic format (the 2 CDs) but neglected to reduce the data to the required sample. Instead, they were willing to take the unknown and potentially very large financial risk of leaving the extra data in. This risk can be compared with the known, quantifiable cost of removing the data, which was deemed too high.



3. Removing the Sensitive Fields from the CDs


To remove the sensitive fields containing private information was deemed too costly to be implemented by the HMRC. To give an idea of exactly what this would entail, research was conducted to produce a model estimating what and who was necessary to conduct this task and just how much it would cost.

3.1 The Data in a Plain Text CSV File


The fact that the electronic format was a CD is a separate debating point altogether; the main point to consider here is what exactly was on the CDs themselves. From a list of email conversations [11] the information stored on the CDs is described as a ‘data scan’. It is known that the information consists of multiple records, each with multiple fields. For this reason, it is fair to assume that the source is a database of some variety. Therefore the ‘data scan’ can be interpreted to mean a database query of some kind, representing either all or at the very least a sample of the aforementioned database. According to the emails, the data is arranged in 100 separate files, which are then spread across the two CDs. In cases such as this, it is likely that the files are plain text files, specifically comma separated values (CSV) files.

3.2 Total Data Size Is Dependent On Compression Ratio


To reduce the overall size of the data, it is known that a method of compression took place. The compression method used is assumed to be zip format, as this is referenced in the email conversations previously described. Zip compression uses a combination of Huffman coding and LZ77, which is a referencing algorithm. [5]

LZ77 works by looking for duplicated strings of characters in a file. When a duplicate string is found, it is replaced with a reference to an earlier occurrence of the string, the reference being much smaller than the string itself. Very repetitive files, with many duplicate strings, make good use of LZ77 compression and can typically be compressed by about 200 times. It is fair to say that the records on the CDs would have contained a fair amount of repetition. For example, the street address for everybody in the same street will be identical (bar the house number).

Huffman coding works by changing the number of bits used to store each character. Normally, each character requires 8 bits. Huffman coding represents more common characters (such as 'e' or ‘a’) with fewer bits and less common characters with more bits. The net result is a reduction in the number of bits required to represent the same information. In research conducted for this article, using Huffman coding alone resulted in about two times compression. Using LZ77 and Huffman coding together on files structured similarly to those assumed to be on the CDs resulted in a compression rate of between 10 and 50 times. In our tests, adding a password to the zipped file had no bearing on its overall size.
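
The effect of repetition on compression can be tried out with a short sketch. The example below uses Python's standard zlib module, which implements the same DEFLATE combination of LZ77 and Huffman coding. The records are synthetic and made up purely for illustration; they are not the data used in the tests for this article, so the ratio printed will not match the figures above.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


# Synthetic, invented records: a handful of street names repeated many
# times, mimicking the repetition expected in real address fields.
random.seed(0)
streets = ["High Street", "Station Road", "Church Lane", "Mill Road"]
rows = [
    f"{i},{random.randint(1, 200)} {random.choice(streets)},London\n"
    for i in range(20000)
]
sample = "".join(rows).encode("ascii")

print(f"synthetic records: {compression_ratio(sample):.1f}x")
print(f"one repeated character: {compression_ratio(b'a' * 10000):.1f}x")
```

The more repetitive the input, the higher the ratio, which is why the estimate for the CDs has to be given as a range rather than a single figure.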

3.3 Determining How Much Data Is In Each Record


If it is assumed that the removal of the unnecessary data was intended to reduce the overall file size, as described in the email conversations, it can be inferred that the data was spread across two CDs because of its overall size, as opposed to other reasons such as security or data structure. Knowing the capacity of a standard CD (700 MB), an educated estimate can be made that the overall compressed data size must be between 700 MB and 1,400 MB. (The data must have filled one CD, otherwise there would be no need for two, but the capacity used on the second CD is unknown.) Applying the compression figures from the previous section gives an uncompressed data size of between 7 and 70 GB, with a median of 32 GB. The total data scan contained 25 million records, so simple division shows that the record size would be between roughly 292 and 3,000 bytes per person.
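
The division behind this range can be written out as a small sketch. It assumes decimal units (1 MB = 1,000,000 bytes), which is our assumption and not something stated in the source material. Under that assumption the bounds come out at 280 and 2,800 bytes, slightly below the figures quoted above, which may reflect a different unit convention or rounding.

```python
MB = 10**6             # decimal megabyte (assumed unit convention)
RECORDS = 25_000_000   # number of records reported lost


def bytes_per_record(compressed_mb: float, ratio: float) -> float:
    """Uncompressed bytes per record for a given compressed size and ratio."""
    return compressed_mb * MB * ratio / RECORDS


low = bytes_per_record(700, 10)     # one full CD, weakest compression
high = bytes_per_record(1400, 50)   # two full CDs, strongest compression
print(low, high)  # 280.0 2800.0
```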

3.4 A Record Example Can Be Built


Based on information contained in the email conversations and also from the Child Support Agency application form [3], the information contained within the data scan can be inferred. By estimating the average size of each information element, our model shows that there were approximately 800 bytes of data per record. This totals to 20 GB of uncompressed data, which falls within the previously calculated range.

Example of what a record might look like on the CDs, and what information was needed by the NAO


3.5 Almost 50% of the Fields Are Unnecessary


The above diagram of a record shows the fields probably required by the NAO in order to carry out their audit. It is fair to say that most of the fields that are not required could be described as private or at the very least sensitive information. From the email conversations, it is known that the parental information and the bank account details were not required by the NAO. In fact, the NAO requested that they be removed (not for the sake of privacy, however, but to reduce file size). From our model of what each record looked like, it can be concluded that as much as 50% of the information sent in each record may have been unnecessary. If it is assumed that the files are CSV files, the unwanted fields can simply be taken out, leaving a smaller record with fewer fields.



3.6 Data Can Be Removed In Less Than a Day


There are many ways to remove unwanted data from a record. Assuming that the data was stored in a text file of some kind, and that efficiency was a major factor, it would be possible to use a simple Perl script to extract the necessary data from the given dataset. Perl is a scripting language generally chosen for its speed and powerful text processing capabilities. Researchers for this report wrote a script to complete this task in about 15 minutes. If extra time is allowed for testing and verifying the script, the time needed before the actual extraction process can start is approximately 1 to 2 hours at most. A snippet of the script created for this report can be found in appendix A.
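
To give an idea of how little code such a task needs, here is a minimal sketch of the same idea. It is written in Python rather than Perl and is not the script from appendix A. It assumes the files are CSV with a header row, and the column names used are hypothetical, since the real layout of the files is not known.

```python
import csv
import io


def keep_columns(src, dst, wanted):
    """Copy CSV rows from src to dst, keeping only the columns in wanted."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=wanted, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:  # rows are streamed one at a time, never all in memory
        writer.writerow(row)


# Hypothetical column names and an invented record, for illustration only.
source = io.StringIO(
    "child_ref,child_name,parent_name,ni_number,bank_account\n"
    "1001,A. Example,B. Example,QQ123456C,00000000\n"
)
output = io.StringIO()
keep_columns(source, output, ["child_ref", "child_name"])
print(output.getvalue())
```

Because the rows are processed one at a time, the same approach works on files far larger than the memory of the machine, which matters for a data set of several gigabytes.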

To estimate the time needed for the extraction process to complete, the script was tested with example data similar to that assumed to be on the CDs. To carry out the experiment more accurately, two computers were used: an old Intel Centrino laptop with a 1.3 GHz processor and 512 MB RAM, and a fairly new IBook with a 2.16 GHz dual core processor and 2 GB RAM. The diagrams below show how the record size affects the time needed for the whole process, for each system:

Time to extract necessary data, Centrino


Time to extract necessary data, Dual Core


As the curves are nearly linear, the time to process all 25 million records can be derived from the example data. The table below shows the times calculated for each estimated record length according to the computer used. Therefore, the best case would be around 22 minutes and the worst around 2 hours and 10 minutes.



Taking the 800 byte size deduced from the email conversations as a likely figure for the average record size, extraction would take a bit over 60 minutes on the slower machine. It is worth noting that the hardware used in this model is not exactly state of the art, even in the case of the dual core. It might be assumed that a professional service working on behalf of the government would have access to faster, more technologically advanced machinery. This would likely reduce the time taken for the overall extraction.

3.7 The Total Cost to Remove Data Is £650


The above models determine that the extraction process would take 1.5 to 4 hours. Even if the files containing the data had a more complex structure, there remains no doubt that it could be done in less than a day. Assuming that an external consultancy firm would be assigned to do it, the cost of removing the unnecessary data would be £650 (average quote, see Appendix E). From the investigations carried out for this article it is fair to say that the data extraction could be carried out on a normal PC, so the costs of processing time and power can be neglected.



4. Model to Assess the Cost of Sending Letters of Apology to All the Families Affected By the Lost CDs


Once the CDs had been lost, and the mistake had been admitted to the general public, the government took the wise step of sending out apologies to everyone involved. The interesting point to consider is that privacy was ignored due to cost, but saving face was not. Research was conducted in order to determine just how much it costs to send a letter to every family affected. The model can be built in two ways: firstly, the costs can be calculated for each resource necessary in the process; secondly, a direct quote can be obtained from various companies that deal with direct mailing. With both modelling methods, the number of recipients is of utmost importance when attempting to calculate an accurate estimate. According to quarterly statistics from the HMRC [7] taken in August 2007, there are 7.5 million families in receipt of child benefits, with 13.3 million children claimed for. However, according to various newspaper articles and the media, over 25 million records have been lost. This raises the question of how these 25 million records came about, as there only seem to be 13.3 million children for whom benefits are being claimed. This in itself may raise concerns about benefit fraud and also the possibility of duplicate records. For the purposes of this model, however, the figure of 25 million records used throughout this article will be kept, with 7.5 million families affected. Therefore, it can be assumed that 7.5 million letters were required to be sent.

A model such as this can be very complex, and it is extremely difficult to obtain accurate results from it. Some factors are unknown and others are very difficult to measure accurately. The quotes that follow contain figures that are available to a major retailer, but do not necessarily reflect the prices that the government might pay for the same resources. Labour and manpower costs are also very difficult to express; not only is it hard to know how much labour is required, but it is just as tricky, without direct experience, to assess how much the government pays people for jobs such as this. For this reason, they have not been included in the following estimate.

4.1 Estimate Based On Direct Quotes


In order to get a more accurate estimate for this volume of letters, various companies were approached to provide a quote as part of the research for this article. All of the companies contacted deal with bulk mailing. After carefully sifting through the various quotes, the average cost for the government to use a bulk mailshot company came to around £0.27 a letter, which includes the printing, handling, envelopes and postage. A more detailed explanation of these quotes can be found in Appendix B. Using this method as a model, the likely amount of taxpayers' money used to apologise for the missing data would be approximately £2,025,000 (0.27 x 7.5 million). Whichever method was used, it can be estimated that the cost of posting the letters is in the region of 1.5 million to 2.025 million pounds.
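
The arithmetic above can be checked with a few lines of code. The sketch below uses only figures already given in this section; the per-letter rate implied by the lower bound of £1.5 million is derived here and was not quoted by any supplier.

```python
from decimal import Decimal

LETTERS = 7_500_000  # one letter per family affected


def mailing_cost(cost_per_letter: Decimal) -> Decimal:
    """Total cost of sending one letter to every family."""
    return LETTERS * cost_per_letter


quoted = mailing_cost(Decimal("0.27"))
implied_low_rate = Decimal("1500000") / LETTERS

print(f"Total at £0.27 a letter: £{quoted:,.2f}")
print(f"Rate implied by a £1.5 million total: £{implied_low_rate}")
```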


4.2 Estimate Based On the Resources Required


Creating a model for assessing the costs based on the individual resources needed might not produce an entirely accurate conclusion, but it can provide a general overview of the costs that would be incurred in the process. It is important to remember that the government is likely to get highly discounted prices on many of the resources in the following diagram. It is also debatable whether the 17.5% value added tax applies in this context.

From the figures below, it is possible to see that sending out 7.5 million letters of apology will cost around £2 million. Furthermore, this cost does not include any sort of wastage (e.g. wrong addresses, insufficient resources due to unplanned failures). Other factors not assessed include the labour and manpower involved in carrying out each process, e.g. feeding the printer continuously with sufficient paper, applying the stamps to the envelopes, packaging the letters etc. A more detailed explanation of these costs is shown below, along with the costs to a supplier to provide the resources:

Envelope Costs
Cost Price To Major Retailer
1,000 envelopes = £6.52
7.5 million envelopes (7,500 x £6.52) = £48,900
VAT @ 17.5% = £8,557.50
Total Cost = £57,457.50

Paper Cost
Cost Price To Major Retailer
5 reams of 500 sheets = £8.05
7.5 million sheets (7,500 x £8.05) = £60,375
VAT @ 17.5% = £10,565.63
Total Cost = £70,940.63

Postage Cost
Using Royal Mail
10,000 x 2nd Class Stamp Roll = £2,400
7.5 million stamps (750 x £2,400) = £1,800,000

Printing Costs

The cost of printing the volume needed varies according to several factors. These include the capacity of the printer itself, the ink to be used, and the time needed to print the required number of pages. The Xerox Docu-Print 180/180mx printer is an example of a high volume printer that is capable of printing up to 6 million pages per month. A suitable toner for this printer is the Xerox Black toner, which has an average yield of 750,000 pages. [23]

Using this method, the overall cost of envelopes, paper and postage came to £1,928,398.13.
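
The total can be reproduced from the line items with the short sketch below. It takes the unit prices and multipliers exactly as given in this section and applies VAT at 17.5% to envelopes and paper only, as the figures above do. No printing or toner cost is included, because no price for it is given here.

```python
from decimal import Decimal, ROUND_HALF_UP

VAT = Decimal("0.175")


def with_vat(net: Decimal) -> Decimal:
    """Add VAT to a net price and round to the nearest penny."""
    gross = net * (1 + VAT)
    return gross.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


envelopes = with_vat(7_500 * Decimal("6.52"))  # 7,500 lots at £6.52
paper = with_vat(7_500 * Decimal("8.05"))      # 7,500 lots at £8.05
postage = 750 * Decimal("2400")                # 750 rolls of 10,000 stamps

total = envelopes + paper + postage
print(f"Envelopes £{envelopes:,}  Paper £{paper:,}  Postage £{postage:,}")
print(f"Total £{total:,}")
```

Postage accounts for £1.8 million of the total, so any bulk discount on stamps would change the estimate far more than the price of paper or envelopes.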





5. The Level of Protection Used To Secure the 25 Million Records


From the email conversations [11] it can be deduced that there is a password (or indeed, passwords) associated with the files stored on the CDs (pg 9 second part of text). From the same text, and based on previous transactions of a similar nature, it can also be derived that the files were stored as "100 zipped files on 2 CDs" (pg 9 first part of text). There are 3 ways in which the passwords could have been implemented to protect this kind of information. Firstly, there could have been one single global password that controlled access to both CDs and all data within them. Secondly, there may have been a different password for each CD, allowing access to the zip files kept on it. Finally, there may have been an individual password for each and every zipped file stored on the CDs. The last option is assumed to be fairly unlikely, because simply accessing the files would then require a significant amount of time and effort, even for the most vigilant authorized user with the correct passwords. Even if this were the case, if just a single one of those passwords were broken and the data protected by it accessed, the result would still be a significant loss and misuse of data.

There are a number of software utilities available to a potential hacker who wants to crack a password. Common software titles include John the Ripper, Picozip and AZPR, which offer password cracking capability without much prior knowledge or expertise from the user. These are readily available to download in various forms for free from the Internet. For the purposes of this report, research was conducted into how long it would take, and what resources would be needed, to crack a likely password or passwords used to protect the information on the CDs. This depends on a few things: namely the length of the password, the type of password, the number of passwords, and the hardware used to attempt to crack them.

5.1 The Possible Attacks to Gain Access to the Records


Brute force

As suggested by the name, brute force password attacks involve bombarding a password input with every possible combination of characters available. Given adequate time and processing power, cracking a password with a brute force attack is eventually inevitable. Brute force attacks require a fair bit of processing power, as the passwords need to be generated by the cracking program and then tested on the protected file. Although a brute force attack will always succeed given time, it excels against shorter passwords. To a certain extent the complexity of a password does not matter; it is mainly the number of characters involved that increases the time needed. Brute force attacks can be made more efficient by attacking the more commonly used characters first. For example, standard lowercase alphabet characters are the most commonly used characters in a password, so the better designed brute force attacks cycle through every combination of lowercase letters before exploring the less frequently used ASCII characters which are also available for password use. Generally speaking, the longer the password is, the longer it will take to crack, so length is the main protection against a brute force attack.
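
The "common characters first" ordering described above can be illustrated with a short sketch. It only generates candidate strings and does not test them against anything. The function name and the split into "common" and "rare" characters are our own illustrative choices, not taken from any of the programs named in this article.

```python
import itertools
import string


def candidates(max_length, common=string.ascii_lowercase, rare=string.digits):
    """Yield every string up to max_length, common-only strings first."""
    full = common + rare
    common_set = set(common)
    # Pass 1: passwords built only from the common characters.
    for length in range(1, max_length + 1):
        for combo in itertools.product(common, repeat=length):
            yield "".join(combo)
    # Pass 2: everything else, skipping what pass 1 already produced.
    for length in range(1, max_length + 1):
        for combo in itertools.product(full, repeat=length):
            if not common_set.issuperset(combo):
                yield "".join(combo)


# Tiny alphabets keep the output readable: 'a' and 'b' are "common",
# '1' is "rare".
print(list(candidates(2, common="ab", rare="1")))
```

With the tiny alphabets used here there are 12 candidates in total. With the full set of 95 printable characters the count grows by a factor of 95 for every extra character of length, which is why length matters so much.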

Dictionary/Wordlist

Dictionary attacks work by testing a list of pre-defined values (typically stored in a text or csv file) with the protected file. The main difference between a dictionary attack and a brute force attack is the speed. Since the passwords do not need to be generated, it drastically reduces the processing power needed. However the obvious disadvantage is that the attack is limited by the number of pre-defined passwords that are stored in the csv file. Dictionary attacks are extremely effective against weak passwords, independent of length. There are many rough word lists out there which include literally millions of entries. These include pretty much every word from every common language, many names of people and places, and combinations of these words in mixed cases with numbers appended. The success of a dictionary attack is directly linked with the strength of the password being cracked and the quality of the wordlist being used. To protect a system against dictionary attacks, it is common practice to include numbers and characters in the middle of words, which in effect give the password an illusion of being a jumble of letters, numbers and characters.

5.2 There Are Programs to Gain Access to Secured Data


Initially the task of creating a program to crack passwords may seem daunting; however, it is relatively simple. A basic cracking program will include the two main attack methods previously mentioned, a dictionary attack and a brute force attack. Little programming knowledge is required to create a program that reads in values from a CSV file. Again, it does not take much programming knowledge to create a program which cycles through every single combination of ASCII characters to create a brute force attack. The difficulty comes in efficiently interfacing the cracking program with the protected zip files. There are various methods, but the simplest would be to issue a command line instruction to extract the zip file with the generated password. However, zero programming knowledge is actually required to crack zip files. There is a wide array of freeware and commercial programs available on the internet. Most of these programs are advertised as “password recovery tools”; however, they can be used to break passwords without authority. Most of these programs have the same features: brute force, dictionary and smart brute force.

Example programs are listed below:

5.3 A Model to Assess the Time and Resources Required To Gain Access to the Records


For the model to be created, some realistic assumptions must first be made. The first assumption is that a strong password or passwords were used, which would therefore be protected against a dictionary attack. (This means the passwords are not words that can be found within the dictionary, i.e. they are split up with a series of numbers or other characters.) This is common practice for any company, so it is likely that the government will also abide by this rule. The second assumption is that the password is 8 (or more) characters long. This is generally termed a strong length password, meaning that brute force attacks cannot feasibly be carried out by a single standard PC, and the time taken to crack the password using this method is drastically increased. The third assumption is that each zip file uses the same password. The reason for assuming this is that it is highly unlikely that there would be 100 unique strong passwords that are 8 characters long. It would be impossible for any one person to remember these passwords, and they would add a lot of administration time when creating and accessing the zip files.

5.4 AZPR, the Perfect Program to Use as a Backbone for the Model


To help construct a model for calculating the cost and time required to break the password(s) used on the CDs, tests were conducted using the free version of AZPR. (It is important to remember that this software is available to anyone who has an internet connection and, of course, a computer.) The test machine was an Intel Core 2 6400 running at 2.13 GHz per core, with 2 GB DDR 800 RAM. It is worth noting that only a single core was being used during the test. Below is a screenshot of the test in progress, and further screenshots can be found in Appendix C.



The table below shows the actual results from running AZPR on the test machine against various passwords.



As the trial version of AZPR was limited to 5 character passwords, calculations based on the previous results were used in order to work out the values for longer passwords. There are 95 possible characters that may be used when password protecting the zip files. It appears that AZPR is able to try roughly 17,000,000 passwords per second on the test machine. It takes 3 ms to test the initial password. Using this information, the table below was created.



The average time it would take to crack an 8 character password with the system used in the tests was 54,201 hours. To put this into perspective, this is over 6 years. However, AZPR allows for a very easy set up, meaning that it can be run on multiple machines without duplicating password attempts. According to the results, to crack one of the passwords within a week of solid brute force attacking, it would take 160 dual core machines of equal or greater speed than that of the test machine. Although this seems like a lot, it must be remembered that successful cracking would result in access to 25 million records, each containing personal information which, where relevant, includes bank details. From our results, we can also put together a model for the cost of achieving such a goal. Assuming the password is longer than 5 characters, the full version of AZPR (costing £25 and available from http://www.elcomsoft.com/azpr.html), or a similar brute force password system that allows zip passwords to be cracked, would be necessary.
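
The 54,201 hour figure can be reproduced from the numbers given above. The sketch below assumes that, on average, half of all possible passwords have to be tried before the right one is found, and that the work divides perfectly across cores. Both are simplifying assumptions of this sketch; real hardware would be somewhat less efficient.

```python
ALPHABET = 95       # possible characters per position
RATE = 17_000_000   # passwords per second on one core of the test machine


def average_hours(length: int, cores: int = 1) -> float:
    """Average hours to find a password of the given length by brute force."""
    keyspace = ALPHABET ** length
    return keyspace / 2 / RATE / cores / 3600


single = average_hours(8)
farm = average_hours(8, cores=160 * 2)  # 160 dual core machines

print(f"one core:     {single:,.0f} hours ({single / 24 / 365:.1f} years)")
print(f"160 machines: {farm:,.0f} hours ({farm / 24:.1f} days)")
```

Under these assumptions, 160 dual core machines need about 169 hours on average, which is very close to one week (168 hours).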

Using the same type of machine as used within the test scenario:
Cost of cheap dual core (2.2 GHz per core) system = £217, available from (http://www.overclockers.co.uk/productlist.php?&groupid=43&catid=964&sortby=priceAsc)
160 of these systems at £217 = (£217 x 160) = £34,720
Total cost to crack password in less than one week including password cracking software (AZPR) = £34,745

So, one week and £34,745 later, at least some of the data could be accessed without authorization. Any breach of data as sensitive as this must be deemed a highly serious breach of security. If a separate password protected each CD, the time needed to access all of the information using the same hardware would be doubled.



6. How Can You Protect Your Data?


For an Individual

To protect your personal and private information and to keep it that way, following the steps below should help:

The above points are the best way for an individual to protect their data. Strong passwords do not contain personal information, contain more than just lower case letters and are as long as possible. To make a password stronger, avoid using dictionary words on their own, and break up any word with numbers rather than putting numbers only at the beginning and end of the word. As described earlier, this can help fend off a dictionary attack. For example, according to a study carried out by In Technology [8], the 8th most commonly used password is monkey. To make this password stronger, split the word up with numbers, but do not split it into other dictionary words. Try using mo9nkey5. This makes the password two characters longer, breaks it into fragments that would not appear in a dictionary, and adds two numbers, which also increases the character set used within the password.
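The effect of length and character set on a brute-force attack can be estimated with a few lines of Python. This is a rough sketch under stated assumptions: the attacker is assumed to try every combination of the character classes that appear in the password (26 lower case, 26 upper case, 10 digits, 33 other printable characters), and the function names are ours. It says nothing about dictionary attacks, which is why splitting the word still matters.

```python
import string


def charset_size(password):
    """Size of the smallest standard character set covering the password."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c not in string.ascii_letters + string.digits for c in password):
        size += 33  # 32 punctuation characters plus the space
    return size


def brute_force_keyspace(password):
    """Number of candidates of the same length over the same character set."""
    return charset_size(password) ** len(password)


for candidate in ("monkey", "mo9nkey5"):
    print(candidate, brute_force_keyspace(candidate))
```

On this estimate mo9nkey5 has several thousand times more candidates to search than monkey.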

To help protect against remote attacks, there are several free anti-spyware and firewall programs easily accessible to the public. One example of a free anti-spyware program is AVG, which also offers a free software firewall package [2] as well as anti-virus software that can be readily updated. Hardware firewalls are built into most modern routers, which are often provided when you purchase an internet connection. If you don't already have a hardware firewall, they are relatively cheap and easy to find, and are nothing more than a small box that sits between your modem and your computer system. It is important to remember that threats to personal data are constantly changing and evolving. Many software packages designed to protect data can be updated to guard against the latest attacks, whether this be the definition file for your anti-virus or the firmware in your router. To stay protected, it is important to apply updates regularly.

For a Business

For businesses to protect their data, the points mentioned above for the individual all still apply, and there are several more methods they should consider, or should already be using:

6.1 Some Sensible Precautions


The most sensible precaution is to keep your password secret at all times. That is the purpose of a password: a piece of information known only to you that allows you access to information you do not want others to see. People often use passwords that they can easily remember, but unfortunately these passwords are commonly weak and very insecure. On top of this, even at an individual level, backing up your data frequently, and backing up in large quantities, can save serious headaches if a hard drive fails.

6.2 Suggested Data Protection Software


For an Individual

Cypherix is a software developer focusing on encryption software. Their product, Cryptainer, is free disk encryption software that creates containers within a storage medium to hold encrypted, password-protected files. The container can be loaded or unloaded as the user needs, and the drive is hidden while it is unloaded, making the encrypted data even harder to access. The Cryptainer drive also allows the user to keep sensitive programs within the drive, restricting access to programs that may harm your computer or that may contain sensitive information. Cryptainer also lets the user encrypt files and send them as email attachments, which the receiver can decrypt using Decypher-IT, another free utility from Cypherix. Cryptainer is available as a free download from the company's website, http://www.cypherix.co.uk.

For a Business

For businesses, Grid Data Security is a variation of a One Time Password (OTP) system, and its creators are currently developing specific modules for use in a wide range of markets. Most notably, they are developing a version of Grid for governmental use, Grid Gov [21], which would be particularly useful in the situation we have encountered here. One Time Passwords differ from the static password systems that we are all so familiar with. As part of the login process, the server sends the user a "challenge", which may be a randomly generated number or something significantly more complicated. When the user receives this "challenge", they enter it into the corresponding OTP generator, which may be a physical hand-held device or a piece of software, and which generates the OTP to return. The server follows the same process and compares the entered password with the one generated server side; if they match, the user is authenticated.

Grid alters this by providing the user with an interface that allows them to disguise their existing passwords that are not already used in OTP systems. It does this by showing the user a graphical keyboard with numbers in each of the four corners of every key. Upon configuration, the user selects the sensitive region of the key, and when entering their password, they click on the letters of the password in the correct order. The software then enters the number that is in the sensitive region of each key. The user can also type this number in, or use the provided keypad, for even more security. The numbers in the corners of the keys are randomly generated, and the software decodes them each time to produce the password for the application it is required for. Further information about this product is available from http://www.syferlock.com.
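The generic challenge and response exchange described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not Grid's or any vendor's actual algorithm: the use of HMAC-SHA256, the 6-digit code length and all function names are assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def new_challenge():
    """Server side: generate a random, single-use challenge."""
    return secrets.token_hex(16)


def one_time_password(shared_secret, challenge, digits=6):
    """Derive a short numeric OTP from the shared secret and the challenge.

    Both the user's generator and the server run this same function.
    """
    mac = hmac.new(shared_secret, challenge.encode(), hashlib.sha256).digest()
    number = int.from_bytes(mac[:4], "big") % 10 ** digits
    return str(number).zfill(digits)


def verify(shared_secret, challenge, submitted):
    """Server side: recompute the OTP and compare in constant time."""
    expected = one_time_password(shared_secret, challenge)
    return hmac.compare_digest(expected, submitted)


secret = secrets.token_bytes(32)             # provisioned to user and server in advance
challenge = new_challenge()                  # sent by the server at login
otp = one_time_password(secret, challenge)   # computed by the user's generator
print("authenticated:", verify(secret, challenge, otp))
```

Because a fresh challenge is issued for every login, a password captured in transit cannot simply be replayed later.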

6.3 Password Use, Best Practices?



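Table: time needed to try all passwords by brute force, by password length and character set (excerpt from One Mans Blog [15])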


From this table we can see that just increasing the length of a lower case password exponentially increases the time needed to try all possibilities with a brute force method. Adding the extra complexity of characters other than lower case increases the processing time further, making passwords over six characters long impractical to break using a brute force method.



7. Conclusion


Based upon the results collected from the models produced for this report, it is fair to say that an extensive amount of money and time has been wasted, not forgetting, of course, the extensive amount of private information that has been lost. Many would agree that atoning for a mistake is the right course of action, and it is likely that some would argue against the taxpayer paying for an apology for a mistake that should never have happened. Still, considering the procedures and policies that should have been followed when dealing with sensitive data, there is no doubt that it would have shown the government in an even poorer light had they not personally apologized to each and every family. It must have been tempting simply to broadcast the apology over the news or through the media, but recognizing that a serious error has been made, and attempting to apologize personally for it, shows an air of moral correctness.

Still, it has certainly come at a very steep price and the government has no excuses for such a huge mistake. Judging by what has been calculated, the cost of removing the data before it was sent could be as little as £500 in the right hands. That seems like a good investment, and on the evidence of this report it definitely would have been. Considering that, when the accusing finger is pointing in their direction, they are willing to spend in excess of £1.5 million apologizing, something doesn't seem to add up. In some ways, HMRC is justifying the existence of the NAO, which is in place to control government spending. If they had spent the £500 removing the data, the whole problem would have been avoided, even if the CDs had still been lost. Undoubtedly, putting a price on the privacy of individuals is a difficult task, and could possibly be the subject of a new article in and of itself. However, £500 is probably a fairly low price, considering that one record equates to 0.002 of a penny.

Despite all this, there is no excuse for the way the data was sent, and indeed lost. Sending two CDs using a stamp and an envelope is somewhat primitive, especially when dealing with such a large amount of sensitive information relating to so many people. The fact remains that someone could have come and picked up the CDs, or the data could have been sent electronically, where it could have been secured using encryption. It can be assumed with a far greater level of confidence that, had the information been sent using the techniques recommended in this article, the data would have been much safer. It would appear the only encryption that took place was the address on the envelope.

Clearly, the postman didn’t have the key.



References


[1] About.com (2007) General Identity Theft Statistics Retrieved December 2007, from http://www.auto-theft.info/Statistics.htm
[2] AVG Free Advisor (2007) Free Basic Protection From AVG Retrieved December 2007, from http://free.grisoft.com/doc/welcome/us/frt/0
[3] Child Support Agency (2007) Application for Child Maintenance Retrieved December 2007, from http://csa.gov.uk/en/PDF/forms/pen/csf001_print.pdf
[4] Cypherix (2007) Encryption Software from Cypherix Retrieved December 2007, from http://www.cypherix.co.uk/
[5] Deutsch P. (1996) DEFLATE, Compressed Data Format Specification (RFC 1951) Retrieved December 2007, from http://tools.ietf.org/html/rfc1951
[6] Elcom Soft (2007) Advanced Zip Password Recovery Retrieved December 2007, from http://www.elcomsoft.com/azpr.html
[7] HM Revenue And Customs (2007) Child Benefit Quarterly Statistics Retrieved December 2007, from http://www.hmrc.gov.uk/stats/child_benefit/aug-07.pdf
[8] In Technology (2007) Top 10 Most Common Passwords Retrieved December 2007, from http://www.intechnology.org/top-10-most-common-passwords
[9] Last Bit Software (2007) Zip Password Retrieved December 2007, from http://lastbit.com/zippsw/default.asp
[10] Lost Password (2007) Zip Key Retrieved December 2007, from http://www.lostpassword.com/zip.htm
[11] National Audit Office (2007) Child Benefit Data (Email Conversations) Retrieved December 2007, from http://www.nao.org.uk/publications/nao_reports/07-08/child_benefit_data.pdf
[12] National Audit Office (2007) Questions About the National Audit Office Retrieved December 2007, from http://www.nao.org.uk/about/faqs.htm#NAOauthority
[13] National Audit Office (2007) The National Audit Office Retrieved December 2007, from http://www.nao.org.uk/
[14] Office 2 Me (2007) A4 Copier Paper Quote Retrieved December 2007, from http://www.office2me.co.uk/q-connect-a4-copier-paper-80gsm-white-pack2500-p-17062.html
[15] Pozadzides, J. (2007) How I'd Hack Your Weak Passwords Retrieved December 2007, from http://onemansblog.com/2007/03/26/how-id-hack-your-weak-passwords/
[16] Pico Zip (2007) Recovery Tool Retrieved December 2007, from http://www.picozip.com/prt/index.html
[17] Pits Hanger (2007) Gummed Mailing Wallets Quote Retrieved December 2007, from http://www.pitshanger-ltd.co.uk/index.php?page=products&catID=1&typeID=14
[18] Public Audit Forum (2002) Freedom of Information and Public Sector Audit Retrieved December 2007, from http://www.public-audit-forum.gov.uk/PAFFOIFINALVERSION.pdf
[19] Publications and records (2007) Commons Hansard Debate Retrieved December 2007, from http://www.publications.parliament.uk/pa/cm200708/cmhansrd/cm071120/debtext/71120-0005.htm
[20] Royal Mail (2007) Roll Of Stamps Quote Retrieved December 2007, from http://www.royalmail.com/portal/rm/shop?catId=9300091&pageId=shp_prdlist&category=cat45940018&cartPreviousStatus=true&_requestid=1326
[21] Sypher Lock (2007) Grid Data Security and One Time Passwords Retrieved December 2007, from http://www.sypherlock.com
[22] White, Fidelma. Hollingsworth, Kathryn (1999) Audit, Accountability and Government Retrieved December 2007, from http://books.google.com/books?id=bFVkhJu8a5MC&pg=PA37&lpg=PA37&dq=exchequer+and+audit+departments+act+1866&source=web&ots=rrlGaD2Xhc&sig=J30V4VY4V8PCmRGNwV1f_oZ7Oos#PPA35,M1
[23] Xerox (2007) Printer Search Results Retrieved December 2007, from http://www.xerox.com/xrx-search/MainSearchServlet?searchString=5R161)&searchOption=Option1_NULL&Xcntry=USA&Xlang=en_US&gateway=%2fgo%2fxrx&hostName=http%3a%2f%2fwww.xerox.com



Appendix A


Code snippet showing the script used to remove the unwanted fields

use strict;
use warnings;

# source and target file names (assumed here to be passed on the command line)
my ($sourceFile, $targetFile) = @ARGV;

# try to open the source and target files and print out an error if not possible
open (SOURCE_FILE, "<", $sourceFile) or die "\nERROR: unable to open \"$sourceFile\"!";
open (TARGET_FILE, ">", $targetFile) or die "\nERROR: unable to open \"$targetFile\"!";
# read each line of the source file
while (<SOURCE_FILE>)
{
   # strip the line ending so it never ends up inside a field
   chomp;
   # extract the fields of each record and store them in an array
   my @array = split (/;/, $_);
   # generate a new record containing only the fields needed (the first five)
   my $newRecord = join (";", @array[0..4])."\n";
   # write the new record to the target file
   print TARGET_FILE $newRecord;
} # do this until the end of the source file is reached
close SOURCE_FILE;
close TARGET_FILE;


Appendix B


Postage Model

Internal Creation Method

Envelope Costs
Cost Price To Major Retailer
1000 Envelopes = £6.52
7.5 million envelopes (7,500 x £6.52) = £48,900
VAT @ 17.5% = £8,557.50
Total Cost = £57,457.50

Paper Quote
Cost Price To Major Retailer
5 reams of 500 sheets (2,500 sheets) = £8.05
7.5 million sheets (3,000 x £8.05) = £24,150
VAT @ 17.5% = £4,226.25
Total Cost = £28,376.25

Postage Costs
Using Royal Mail
10,000 x 2nd Class Stamp Roll = £2,400
7.5 million stamps (£2,400 x 750) = £1,800,000

Quote from Pitshanger Envelopes (http://www.pitshanger-ltd.co.uk/index.php?page=products&catID=1&typeID=14)
Box of 1,000 envelopes = £10.15
7.5 million envelopes (7,500 x £10.15) = £76,125
VAT @ 17.5% = £13,321.88
Free delivery = £0
Total Cost = £89,446.88

Quote from Office 2 Me (http://www.office2me.co.uk/q-connect-a4-copier-paper-80gsm-white-pack2500-p-17062.html)
Box of 5 reams, each ream containing 500 sheets (2,500 sheets) = £9.79
7.5 million sheets (3,000 x £9.79) = £29,370

Direct Mailing Method
Companies contacted were:

Quality Mailing Solutions
Quoted £0.20 per item for postage and £0.07 per item for printing through to production
Total cost = (£0.20 x 7,500,000) + (£0.07 x 7,500,000) = £2,025,000 (at £0.27 per item)

Rocket mailing
Envelope with return address printed on = £9 per 1,000
Therefore for 7.5 million people = £9 x 7,500 = £67,500

Printing letters and folding them into envelopes = £22.50 per 1,000
Therefore 7.5 million letters = £22.50 x 7,500 = £168,750

Enclosing one item in an envelope = £9 per 1,000
7.5 million letters in an envelope = £9 x 7,500 = £67,500

Royal Mail
Postage using the Royal Mail mailsort service = £1,122,975.00
Giving a total cost = £1,426,725 (at a rate of approximately £0.20 per item)
A further £500 was quoted for contingencies, bringing the total to £1,427,225.

Holmdale Mailing
Description: DL Print & Mail
Address file Supplied
Letter heading: 80gsm white bond
Printing: Black one side from supplied template
Enclosure: Machine enclose into a white window DL envelope
Dispatch: To be advised
For 7.5 million = £67,500 at a unit price of £0.09
Price excludes postage
The best priced postage will be the Royal Mail mailsort service = £1,122,975
Total = £1,190,475 (approximately £0.16 per item)

Mailbox inc.
A rate of about £0.45 per item was given, bringing their total to £3,375,000
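The direct mailing quotes above can be compared side by side with a short calculation. The sketch below is in Python and only reproduces the arithmetic of this appendix; the helper names are ours, and the Holmdale Mailing print subtotal of £67,500 is taken as quoted rather than recomputed from the unit price.

```python
ITEMS = 7_500_000                  # letters to send (from the appendix)
MAILSORT_POSTAGE_GBP = 1_122_975   # Royal Mail mailsort quote (from the appendix)


def per_item(pence, items=ITEMS):
    """Total in pounds for a quote given in pence per item."""
    return pence * items / 100


def per_thousand(pounds, items=ITEMS):
    """Total in pounds for a quote given in pounds per 1,000 items."""
    return pounds * items / 1000


totals = {
    "Quality Mailing Solutions": per_item(20) + per_item(7),
    # envelopes + printing and folding + enclosing + postage + contingencies
    "Rocket mailing": (per_thousand(9) + per_thousand(22.50) + per_thousand(9)
                       + MAILSORT_POSTAGE_GBP + 500),
    # quoted print-and-mail subtotal used as-is, plus postage
    "Holmdale Mailing": 67_500 + MAILSORT_POSTAGE_GBP,
    "Mailbox inc.": per_item(45),
}

for company, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{company}: GBP {total:,.0f} ({total / ITEMS:.2f} per item)")
```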

Appendix C


Diagrams showing the time curve to extract the necessary data from the sample data set

Time to extract necessary data, Centrino

Time to extract necessary data, Dual Core




Password Length    Only Lowercase    All Characters
3 characters       0.02 seconds      0.86 seconds
4 characters       .046 seconds      1.36 minutes
5 characters       11.9 seconds      2.15 hours
6 characters       5.15 minutes      8.51 days
7 characters       2.23 hours        2.21 years
8 characters       2.42 days         2.10 centuries
9 characters       2.07 months       20 millennia

Excerpt from table at One Mans Blog [15]


Appendix D


Further Screenshots showing AZPR in progress




Appendix E


Quotes for one day of database work to remove the data from the CDs:
Data Systems UK: £550
Monaghan Consultants Ltd: £750