<mosaic.cnfolio.com>

Project journal

Updating past entries: these were written down but not put in the digital logbook, as I couldn't access it.

06/03/12
I did some refresher research on the core components needed to complete the prototype based on the initial design. It is a refresher because I have used these components before when co-designing the website with someone else. The core components I looked at were the database software (MySQL) and the two web languages (PHP and HTML).
Basic ideas of the software and relevant commands.
MySQL
MySQL is the database system used on the site. It allows relational database structures to be created on a web server in order to store data for querying. The web server for samkenney.com also has phpMyAdmin, which provides GUI access to the MySQL database software.
Relevant commands
• Connect - allows for connection to a defined MySQL database.
• SELECT - allows for selection of database data based on the parameters given in the statement. Example: SELECT column_name(s) FROM table_name
• WHERE - allows for more detailed selection.
• INSERT - allows for insertion of data into the database. Example: INSERT INTO table_name VALUES (value1, value2, value3...)

These are the two web languages that are needed to complete the prototype.
PHP
PHP is a general-purpose server-side scripting language originally designed for web development to produce dynamic web pages, i.e. content driven from a database by scripts within the page. In the case of samkenney.com it is used to deliver the blog content to the pages and also to handle the comments before they are put in the database. PHP works closely with MySQL, as MySQL is normally used from within the PHP code for the dynamic parts.
Relevant commands
• $_POST variable - allows values that have been passed from a form to be used.
• Session variables - allow values to be stored for a session that has been started.
• If/else statements - allow for multiple paths in the code based on a defined condition.
• Echo - allows for printing out on screen, either for error checking or live production purposes.
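
As a rough illustration of how these four pieces fit together, here is a minimal sketch. It uses plain arrays as stand-ins for $_POST and $_SESSION so it can be run from the command line; the field() helper and the sample values are made up for the example and are not taken from the site's code.

```php
<?php
// Hypothetical helper: read one form field from a $_POST-style array.
function field(array $post, string $name): string
{
    return isset($post[$name]) ? trim($post[$name]) : '';
}

// Stand-ins for $_POST and $_SESSION so the sketch runs without a web server.
$post = ['username' => ' Sam ', 'comment' => '<b>Hello</b>'];
$session = [];

// Keep the username for later pages, as a session variable would.
$session['username'] = field($post, 'username');

// If/else picks one of two paths based on a condition.
if ($session['username'] === '') {
    $message = 'Please enter a username.';
} else {
    // htmlspecialchars() stops user input being treated as HTML when echoed.
    $message = htmlspecialchars($session['username']) . ' wrote: '
        . htmlspecialchars(field($post, 'comment'));
}

echo $message, "\n";
```

In a real page the two stand-in arrays would be replaced by $_POST and $_SESSION (after calling session_start()).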


HTML
HTML is the main language for web pages and provides the basic building blocks of web pages through HTML elements. In the case of samkenney.com, HTML provides the basic layout and the elements for the core functionality.
Relevant commands
• Img- allows for image elements to be displayed.
• Form-allows for user input to be passed to a script to perform functions based on the user's input.
• Table-allows for a layout structure that consists of rows and columns.


07/03/12

Not much done today as I had an exam and was travelling home, but en route I started to write out rough pseudo code for the captcha system. This pseudo code broke down into three parts: the view comment page, the post comment script and the captcha image loader script.

The view comment page

<php>
Session start
Create random ID based on ID numbers in the captcha database and store as captcha ID variable
<?>
<HTML>
<Form>
Link to script
Username input box
Comment input box
Image link to captcha script based on variable captcha ID
Captcha answer input box
</form>
</HTML>

Post comment script

<php>
Session start
Store post inputs as PHP variables
Load/connect to MySQL data and point at captcha table
Query captcha answer with the answer defined in user input based on the variable captcha ID
(If no result found)
{
Echo “Answer wrong, are you human?”
}
(If result found)
{
Load/connect to comment table
Insert the input variables (username/comment)
Echo “your comment has been posted”
}
<?>

Captcha image loader script

<php>
Load/connect the captcha table
Query table for image field based on variable captcha ID
Echo image field
<?>

systemlogic

08/03/12
The database for the captcha system worked out to need these fields: ID, image data, captcha question and captcha answer.
Trying to decide how to store the image data: by a reference in the MySQL database to the image's location on file, or by storing the image as binary in the table itself?

The second option I haven’t tried before so a bit of research is needed.
To store the image in the MySQL database a BLOB field would be used. A BLOB is a binary large object that can hold a variable amount of data; in the case of the captcha system it would store the image data. There are a few benefits to storing the images in the MySQL table: easy backup of the image database and easy management of the images via the MySQL software GUI.
http://dev.mysql.com/doc/refman/5.0/en/blob.html

Also, from the research I found example implementations of loading images from the database, which pushed me towards choosing that path for the code.
http://www.phpriot.com/articles/images-in-mysql

Based on this information I created a table in MySQL with these fields.
Column Name         Type
id                  Int(4)
img_name            varchar(255)
img_data            blob
captcha_question    varchar(256)
captcha_answer      varchar(256)

With this information I went to work on getting the captcha image loader script working and displaying its output on the view comment page.
• The first step was to create a way of getting the captcha image and captcha information into the database. To do this I created a simple form that took user inputs and passed them to a script that would insert them into the database.
form

• The second step was to create the script that would insert them into the database. This script has to take the values from the user inputs and convert/pass them to the database. The problem is that one of the inputs is an image file, so the script needs to extract the name of the file and its data. PHP has functionality that allows information about an uploaded file, such as its name and data, to be looked up; an example of this is $_FILES['imgfile']['name'], which grabs the name of the image.

Once this was worked out and the inputs were stored as variables, a connection to the MySQL database and table was needed to allow a SQL query, so the connect command I had researched was used. A SQL query then had to be written to pass the data. Based on my research on 06/03/12 I chose to use the INSERT command to pass the data to the database, which is shown below.
INSERT INTO `samscaptcha`(`img_name`,`img_data`,`captcha_question`,`captcha_answer`) VALUES ('$fileName','$content','$question','$answer')";
While writing this I had a few problems with using the wrong type of quote marks, which caused the SQL query not to work. After changing the quote marks around I found that the single quote (') was the mark needed around the values.
This shows the table working

tableworking
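
The file-handling part of step two could be sketched roughly as below. This is only an illustration: readUpload() is a made-up helper name, and a temporary file stands in for a real form upload so the code runs on its own. The array keys (name, tmp_name, error) are the ones PHP provides for each entry in $_FILES.

```php
<?php
/**
 * Hypothetical helper: pull the file name and raw bytes out of one
 * $_FILES-style entry. Returns null if the upload did not succeed.
 */
function readUpload(array $file): ?array
{
    if ($file['error'] !== UPLOAD_ERR_OK) {
        return null;
    }
    return [
        'name' => basename($file['name']),              // strip any path from the name
        'data' => file_get_contents($file['tmp_name']), // raw image bytes
    ];
}

// Simulate an upload with a temporary file so the sketch runs without a form.
$tmp = tempnam(sys_get_temp_dir(), 'cap');
file_put_contents($tmp, 'fake image bytes');
$upload = readUpload(['name' => 'captcha1.jpg', 'tmp_name' => $tmp, 'error' => UPLOAD_ERR_OK]);
unlink($tmp);

echo $upload['name'], ' (', strlen($upload['data']), " bytes)\n";
```

The 'data' value is binary, so before it goes into an INSERT statement it would need escaping, or better a prepared statement, otherwise quote characters inside the image bytes can break the query.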

• The next step was to create the Captcha image loader script and this requires a query to the database that looks for the Captcha id and returns the image data of the defined captcha to display. As the captcha id method has not been coded yet I used a number that I knew was in the database as the id. The SQL query looks for the image data where the id matches. This is shown below

mysql_query("SELECT `img_data` FROM `samscaptcha` WHERE `id` = 1");
If it finds a value it will echo out the image data. To test this I ran the script to see if it echoed out the captcha with id 1. This is what I saw, so it works.
image1
Need to finish off the image loading with the random captcha ID generator next time I work on the project.


10/03/12
Based on what I had done last time I worked on the project, I needed a way of randomly generating the captcha ID to be used in the image loader query.
I found this could be done using a PHP function that creates a random number within a selected range of numbers; here's an example: rand(1,2);
Based on the research I did before, I came up with the idea of assigning the random number to a PHP session variable so it can be used wherever there is still a valid PHP session.
With this I had to change the SQL query so it would take a variable that was passed to it with the captcha ID value.

SELECT `img_data` FROM `samscaptcha` WHERE `id` = '$capid'

The passed value comes from the request URL used to display the captcha image in the form. The form uses an img tag that takes the PHP session variable (captcha id) and passes it to the script so that it displays the image needed, as shown below.
echo "<img src='http://samkenney.com/viewimage.php?id=$capnum' alt='Captcha'/>";
With all that working now this is what the form looks like
userform
The last task needed for the captcha prototype is the post comment script, which checks the user's inputted captcha answer against the captcha answer that is in the database.
To do this I need to edit the existing post comment script to include the check for the correct captcha answer. This will require a SQL query and an if/else statement. Based on the user's captcha answer and the session variable for the captcha id, this SQL was created.

SELECT `captcha_answer` FROM `samscaptcha` WHERE `id`='$capID' and captcha_answer='$answer'

Once the SQL was done the next task was to create the two paths the program could take based on the SQL result. In path one the answer is right and the comment is posted as it was before the captcha system was there. In path two the user input is wrong and the comment is not posted.
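
The two paths can be sketched without a database as follows. This is not the site's actual script: captchaIsCorrect() is a made-up name, the array stands in for the samscaptcha table, and the sample question and answer are invented. Ignoring case and surrounding spaces when comparing is an assumption of the sketch.

```php
<?php
/**
 * Hypothetical check: $captchas stands in for the samscaptcha table,
 * keyed by id. Returns true when the answer matches the stored one.
 */
function captchaIsCorrect(array $captchas, int $id, string $answer): bool
{
    if (!isset($captchas[$id])) {
        return false; // no captcha with this id
    }
    // Assumption: ignore case and surrounding spaces when comparing.
    return strcasecmp(trim($answer), trim($captchas[$id]['captcha_answer'])) === 0;
}

// Invented sample row, for illustration only.
$captchas = [
    1 => ['captcha_question' => 'What colour is grass?', 'captcha_answer' => 'green'],
];

if (captchaIsCorrect($captchas, 1, ' Green ')) {
    echo "Your comment has been posted\n";   // path 1: the comment would be inserted
} else {
    echo "Answer wrong, are you human?\n";   // path 2: the comment is not posted
}
```

When the check is done in SQL instead, as in the query above, the user's answer should be escaped or bound as a parameter before it is placed in the query string.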

Path 1
path1
Path2
path2

Once complete I quickly ran through the system to test the basic logic. Then, to find the holes I couldn't find myself, I posted it on Facebook for friends to test the system and give feedback.
facebook1
From that, a friend found a fault in the system: not in the captcha system itself, but in the comment system, which affects the captcha system. The fault is that the user inputs can be sent to the system again and again by refreshing the page once the captcha has been answered. This needs to be fixed for the system to work as intended.

14/03/12
Been busy with non-university stuff so haven't had much time to work on the project, but I stumbled across an answer to the refresh problem. The fix is a method called Post/Redirect/Get, where you redirect the user after they have posted to stop them being able to refresh and resubmit.

Post/Redirect/Get

With this information I decided to redirect from the post comment script based on the path taken. One redirect uses a PHP session variable to take a user back to the page that they commented on if the captcha answer is correct and the other redirects to a captcha failed page.
fail
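
A minimal sketch of the redirect decision is shown below. The helper name, the page names and the 'return_page' session key are placeholders chosen for the example, not the names used on the site, and a plain array stands in for $_SESSION.

```php
<?php
/**
 * Hypothetical helper: choose where to send the browser after the POST
 * has been handled. The fail page name is a placeholder.
 */
function redirectTarget(bool $captchaOk, string $returnPage): string
{
    return $captchaOk ? $returnPage : 'captchafailed.php';
}

// Stand-in for $_SESSION; 'return_page' is an assumed key name.
$session = ['return_page' => 'viewpost.php?post=3'];
$target = redirectTarget(false, $session['return_page']);

// In the real script the redirect itself would be sent like this,
// followed by exit so that nothing else runs:
// header('Location: ' . $target, true, 303);
echo $target, "\n";
```

Because the browser ends up on a page it fetched with GET, pressing refresh reloads that page rather than resubmitting the form.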

The next task is to tidy up the input boxes and standardize the captcha image, as they look untidy and out of place. I created a template for the captcha image so text can be superimposed on it; this makes sure that the size of the image is the same for each captcha. I also set sizes for all input boxes, so they all look the same and are the same length.

newinput


15/03/12
Had a meeting with my project lecturer, demoed my prototype to him and got some feedback on the system; he was happy with the progress of the work so far. I also asked about the structure of my logbook, which he said was on target for a good logbook, but he gave some pointers on its content. These were to turn the description of how the system works into a more visual form and to focus more on the captcha system rather than the supporting stuff.

16/03/12
Making a change to the way that the captcha system generates the ID number that is used in loading the image and also in the captcha checking system. In its current state it has a static range of captcha ids that is hard-coded into the system, which means you have to change the code if you want to add more captchas. That is not practical, so it needs to be changed.
To do this I decided to query the database to find how many captchas are in the system and use that value when generating the captcha id for the page. From looking at some SQL commands I decided that the COUNT command would allow for this, as it lets you count things in the database.
This is the SQL command I came up with: “SELECT COUNT('id') FROM samscaptcha”, which counts the captchas. Combined with the rand() function in PHP, this allows a random id to be generated.

Another little add-on to the code stops the same captcha coming up twice in a row: a do-while loop regenerates the random number if it is the same as the previous one.
To make this code more portable it has been taken out of the view post page, put in its own PHP page and included where needed.
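
The count-then-random idea with the do-while loop can be sketched as below. pickCaptchaId() is a made-up name, a plain array stands in for $_SESSION, and the count is passed in directly instead of being read from the database. The sketch assumes the ids run from 1 up to the count with no gaps, and it adds a guard for the case where only one captcha exists, since the loop could otherwise never find a different number.

```php
<?php
/**
 * Hypothetical helper: pick a captcha id in 1..$count that differs from the
 * previous one. Assumes ids are contiguous (1, 2, ..., $count).
 */
function pickCaptchaId(int $count, ?int $previous = null): int
{
    if ($count < 1) {
        throw new InvalidArgumentException('No captchas available');
    }
    if ($count === 1) {
        // Only one captcha exists, so a repeat cannot be avoided.
        return 1;
    }
    do {
        $id = rand(1, $count);
    } while ($id === $previous); // re-roll if it matches the last captcha shown
    return $id;
}

// Stand-in for $_SESSION so the sketch runs from the command line.
$session = ['id' => 2];
$session['id'] = pickCaptchaId(5, $session['id']);
echo $session['id'], "\n";
```

If captchas are ever deleted the ids will have gaps, and picking a random row in the query itself would be a safer approach than relying on the count.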

After completing this, a thought crossed my mind: what happens if you cannot read or understand a captcha? You would need to refresh the page, which would mean losing the inputs you had typed in, and that seemed like a bad thing. From that came a rough idea for refreshing just the image on the page: if you clicked on the image it would change and load a new one.
Based on this I needed to change the captcha image loader script to generate ID numbers too, as it would be called on click and so would need to choose a new ID. So the code that I had just made was put into the image loader and tweaked until it refreshed the image on click. It did this well up until the point where you post the comment and the post-to-database script runs; then it would show the fail page saying the answer was wrong even though it was right.

This means there is something wrong with the PHP session variables, since the image is loading but the checking function is not working.
To find out what was wrong I created a PHP page that echoes the PHP session variables onto the page so I could see them while the system was working. After playing around with the system for a while I worked out what the problem was: the PHP session variables were not updating on load even though the image changed. This means the previous PHP session variable value is still there and is the one posted to the checker, which is why answers that are correct are getting rejected.

By the time I got to this point it was 3am on the 17th, and I decided it was too late to carry on trying to fix it.

17/03/12
Woke up with an idea of how to fix this problem: use the old image loader script to load the image and set the captcha id when the page is loaded, then use the new image loader script when you click on the image to load a new image and set the captcha id to the new number. Doing this stops the problem with the non-updating PHP session variables.
After this change was made, testing was done again to ensure the logic behind the captcha system still worked and the new refresh function worked on all main browsers. Once this was done, I could confirm that the idea worked and the captcha system is working how I intended it to.

I also posted it on Facebook for friends to spot faults with it, as it allows a 3rd set of eyes to find faults that I could not.

The rest of the day will be spent adding more captchas to the database as the number is too low. Then in the next few days I will start to plan out my survey and where I'm going to post it to get people to do it.

Add-on: this is the script for reloading the image and the new image loader, as I forgot to add them in my last update.

<script language="javascript">
var clicks = 0;
function updateImg()
{
clicks++
var doc = document.getElementById("turing");
doc.src = "http://www.samkenney.com/viewimage2.php" + "?act=" + clicks;
}
</script>
<img id="turing" src="http://samkenney.com/viewimage.php?id=<?php echo $_SESSION['id'] ?>" /> <a href="#" onclick="updateImg();">Can't read the Captcha? click here to refresh the image</a>


<?php
session_start();
// Connect to the database once; the same connection is used for both queries
$dbconn = mysql_connect('localhost', 'ahost839_ahost83', '2329') or die("Error Occurred-".mysql_error());
mysql_select_db('ahost839_samsblog', $dbconn) or die("Unable to select database");
// Find the captcha range (assumes ids run from 1 to the row count with no gaps)
$result = mysql_query("SELECT COUNT('id') FROM samscaptcha");
$count = mysql_result($result, 0);
// Set random number, re-rolling if it matches the captcha shown last time
do {
$capnum = rand(1, $count);
} while ($count > 1 && isset($_SESSION['id']) && $capnum == $_SESSION['id']);
// Set captcha ID in the session so the checking script uses the same one
$_SESSION['id'] = $capnum;
$capid = $_SESSION['id'];
// Send the image data for the chosen captcha
header('Content-type: image/jpeg');
$query = mysql_query("SELECT `img_data` FROM `samscaptcha` WHERE `id` = '$capid'");
while ($row = mysql_fetch_array($query)) {
echo $row['img_data'];
}
// Close the connection after the image has been sent
mysql_close($dbconn);
?>


20/03/12
The prototype captcha system is now complete, so I created a flow diagram of the captcha system's final logic.
final logic


Below I will walk through the diagram

1. Load the comments page
2. The random captcha id generator is linked to the view comments page so it is called automatically. This script queries the database to find how many captchas are in the system so it can generate a number within the range of the number of captchas in the system to be used as the id.
3. The result of the query is passed back to the generator script and it creates a number.
4. The generated number is passed back to the comment page where the captcha is placed. This is done via PHP session variables.
5. The comment page uses this number to call captcha image loader 1 to get the captcha image.
6. The captcha image is sent back and rendered on the page.
7/8. Option 1 - The user calls a refresh method to change the captcha image. This creates its own random number via a generator (storing it in the PHP session variable already set up) and passes the image back to the page. This option is used if the user does not understand or cannot answer the question and wants to change it to try to get a question they can answer.
7/8. Option 2 - The user answers the captcha question and the answer is passed to the captcha checking script. This queries the database with the answer based on the captcha id that is stored in the PHP session variable. If the answer is correct it posts the comment data to the database; if not, it redirects the user to a fail page that asks if they are human.

My task for the next few days is to come up with a survey that will allow me to get feedback on the captcha system I have created and to analyse that information to answer my main objective: finding out the strengths and flaws of a CAPTCHA system that uses logic puzzles.

The first thing that needs to be looked at is what type of surveying method I will use to get feedback on the prototype system. From research I see that there are two main methods:
Questionnaires
• Group administered questionnaire- these are where you take a sample of people and ask a structured sequence of questions in a set location.

• Mail/remote questionnaire- these are where you give a sample of people a structured sequence of questions and ask them to fill it out in their own time and return it to you.

Interviews

• Interviews are like group administered questionnaires but do not have a structured sequence of questions. This allows the person you are asking to give views about the subject that you might not have thought of a question for.

To choose what type of surveying method I need to think about a few things which include;
Target audience Issues, Sampling Issues, Question Issues, Content Issues, Bias Issues and Administrative Issues
Target audience Issues

Can the total number of audience members I intend to survey be quantified?
The audience cannot be quantified, as visitors to the intended site are not kept in any type of record that can be looked up, and the number can be viewed as random due to the nature of the internet.

Is the target audience literate?
The audience must have some level of literacy, because the subject of the survey means they already know how to use a computer, so a level of literacy is assumed.

Are there language issues?
The audience might use different languages, but the survey is intended to be written in English, which is taught as a second language in many countries to a basic degree, so this should limit the language issues.

Will the target audience cooperate?
The audience should be willing to cooperate, as the survey allows them to give feedback on a system they would use if they visited the site in question.

What are the geographic restrictions?
The audience could be spread over a wide geographical area due to the nature of the intended subject area of the survey.

Sampling Issues

What data is available about the target audience?
There is not much information about the target audience other than that they have used the intended subject area of the survey.

Can respondents be found?
The audience can be located in similar places to the intended subject area of the survey

Who is the target audience?
The audience is one that is interested in gadgets, computers, TV/film and information about my time at university. So the target audience would fit the demographic of 16-30 males, based on other studies with similar content.

Can all target audience be sampled?
No, as some of the target audience might not see the survey because they do not visit the site during the survey period. Also, some of the target audience will not respond to the targeted request for feedback due to time restrictions or other personal reasons.

Question Issues

What types of questions can be asked?
The survey will consist of short-answer questions that are targeted at parts of the subject matter. There will also be one medium-sized question to allow the respondent to write other comments about the subject matter.

How complex will the questions be?
Simple questions that each ask only one thing and do not consist of multiple parts.

Will screening questions be needed?
Yes, to determine if they have used the item that is the subject area of the survey.

Content Issues

Are target audience expected to know about the subject?
Yes and no. Some of the target audience know about the subject as I have talked about it with them before, and to complete the survey you will need to know the subject. But prior knowledge is not required, as a link will be provided for them to look at the subject before taking the survey.

Bias Issues

Can false responses be avoided?
Yes and no. As this is a university project the target audience is more likely to give valid responses, but some responses will be false due to human nature when filling out surveys.

Can you avoid social desirability within the target audience?
Yes and no. The subject area of the survey is something I have created, so the target audience might be less honest with their answers if they know me, but for the larger audience the answers should be valid.

Administrative Issues

Costs
The cost of the survey needs to be low, as I am a student with limited income to fund such an undertaking.

Facilities
Although I could most likely get the facilities to conduct a survey in, the geographical spread of the target audience means it would not be feasible to survey in person.

Time
The survey would have a strict time frame to be completed in, about 3-4 weeks, which means that the survey would have to be exposed in such a way as to get the results back within the time frame.

Personnel
As this is my final year project it will just be me surveying which limits the amount of surveying I can do.

The questions that I have just answered show that the type of surveying method I should use is the mail/remote questionnaire, as it provides the best way of contacting my target audience. It also allows me to ask the type of questions that I require, as it offers a structured sequence of questions, and allows me to stay within budget and my time frame due to the quick uptake this type of surveying method can offer.

21/03/12
The task today is to come up with questions that help discover the information needed for the main objective and that work with the type of surveying method chosen.

To do this I will look at the types of questions that could be used, to see if my basic idea matches the type of questions that will be used in the survey. Once that is done I will come up with a few questions for the survey and review them to see if they meet a set of requirements. Then finally I will look at the structure of the survey to make sure that it flows in a correct order, so it is easy to follow and simple to answer.

There are two main areas of questions, which are structured response questions and unstructured response questions.
Structured response questions

These types of question offer a few types of response, which are:
• Dichotomous questions - questions that have two possible responses.
• Questions based on level of measurement - questions that use a measurement such as a ranking or scale to determine the response.
• Filter or contingency questions - questions that are used to determine if a person is qualified or experienced enough to answer another question.

These are the types of question format that are used to get responses:
• Fill-in-the-blank questions - questions that give you a small one-line answer space.
• Multi-option variable questions - questions that display a number of options, of which several can be selected.
• Single-option variable questions - questions that display a number of options, of which only one can be selected.

Unstructured Response
• Short text field questions - questions that give the respondent room to write a text-based response based on what they think.

With that information I made a change of plan for the type of questions I was going to ask in the survey. I would now include questions that use level of measurement to ask about the time taken to complete the captcha answer, and include a filter or contingency question about the refresh feature.

With that research done I will now rough draft the questions that I want to include in the survey.
1. What is your age?
2. Before today did you know what CAPTCHA is?
Yes/No
3. Was CAPTCHA question easy to read?
Yes/No
4. How long did it take you solve the CAPTCHA?
10sec-40sec, 41sec to 60 sec, 1min+
5. Explain how easy or hard it was to solve the CAPTCHA?
6. Did you use help to solve the CAPTCHA?
Yes/NO
7. If yes what help?
Search engine/website, book/newspaper, other
8. Do you think the type of question CAPTCHA was suitable for this type of task?
Yes/NO
9. If yes Why?
10. What would have helped you answer the CAPTCHA question easier?
Simpler questions, Clearer Text, Audio option, other
11. Do you think it will help stop spam comments on the site?
Yes/NO
12. Any other comments?

As you can see I changed my mind again about the types of question to be used in the survey. I added some dichotomous questions to get yes or no responses, added more level of measurement questions as these allow stats to be worked out easily, and also used single-option variable questions with a fill-in-the-blank space to streamline the responses while still allowing the user to add what they want. I also dropped the question I was going to ask about the refresh function and asked about the logic question instead, as it offers better feedback for answering the main objective.

Set of requirements for reviewing the questions.
• Are the questions clear?
• Do they meet the information you need from responses?
• Are they detached from my personal view so there is no bias?
The questions meet all three requirements. The questions are simple to understand; this was found out by asking a housemate to read over the questions to check that they could understand them and what they were asking. The responses to the questions give the information that I need to work out the main objective of the project, and the questions are worded in a way that does not influence the answers.

The structure of the survey
The structure of the survey should allow the respondent to work slowly into the survey with an opening question. It does this with an age question as the first question, and then asks a few simple questions to get the person engaged in the survey.
Another thing is to make sure that the survey questions follow a logical order: if a question asks about something, it should lead on to a related question if there is one. This is done in the structure with the filter or contingency questions, as they refer you to the question below if you meet the criteria.
Another thing to look at is using a limited response set, as it limits the mistakes that can be made by the user; this survey structure never has more than 4 options to choose from.
Based on these guidelines the current structure works and will be used in the public survey.

Final structure of survey
1. What is your age?
2. Before today did you know what CAPTCHA is?
Yes/No
3. Was CAPTCHA question easy to read?
Yes/No
4. How long did it take you solve the CAPTCHA?
10sec-40sec, 41sec to 60 sec, 1min+
5. Explain how easy or hard it was to solve the CAPTCHA?
6. Did you use help to solve the CAPTCHA?
Yes/NO
7. If yes what help?
Search engine/website, book/newspaper, other
8. Do you think the type of question use in the CAPTCHA was suitable for a CAPTCHA system?
Yes/NO
9. If no why?
10. What would have helped you answer the CAPTCHA question easier?
Simpler questions, Clearer Text, Audio option, other
11. Do you think it will help stop spam comments on the site?
Yes/NO
12. Any other comments?

23/03/12

After talking to my project lecturer I have decided to go back a bit in the project and rethink what I'm trying to achieve with the survey. In its current state the survey is too broad in its questions and is written from the view of a person who knows about the subject matter (CAPTCHAs). This makes it hard for someone who does not know about the subject matter to answer the questions, and would make it hard to break the information down and draw a conclusion from the results without guessing some meanings, which is bad research.

So I will create a new survey. I will have to think of a way in which a person with no subject knowledge could answer the survey and still provide me with the data needed to prove or disprove the hypothesis that will be created, and with information to meet the overall objective of the project.

The purpose of the survey is to find out if the type of logic question used in the prototype system is a good question to be used in a CAPTCHA system that is used to prevent comment spam.

With that defined I will create questions that I think will help me to find out the answer to that question
What age are you
This question allows me to see if age is a factor when comparing responses to the system.

On a scale of 0-5 how hard was it to read and understand the question?
0,1,2,3,4,5
This question allows me to quantify the difficulty of reading and understanding what had to be done before even answering the question.

Out of this options choose the level of hardness that solving the question is the same as?
Answering a simple maths question 2+2=4
Answering a mid level maths question 4 squared = 16
Answering a high level maths question 54 squared - 29% of 6552 = 1 015.92
Answering a expert level maths question y = log x (If y = 10) What is X?
This question allows me to quantify the difficulty of answering the question.

Did you use help to solve the question?
None
Website
Book
Newspaper/Magazine
Friend or family
This question allows me to quantify the people that needed help to answer the question and also allows me to find out where they got the help from.

What would have helped you answer the question easier?
Nothing
Simpler questions
Larger or clearer text
Hint to what answer that the question wants (I.E type of answer or a small hint)
Allow for more than more than one answer to the question if there similar answers that are correct
Different topic used for the questions
Accessibility options (Audio questions)
This question allows me to see what the downfalls of the CAPTCHA system are and what aspects make a good CAPTCHA system.

What was the most time consuming task with posting the comment?
Reading the question
Understanding the question
Finding a question from the database you could answer
Working out the answer to the question
This question allows me to see where people had trouble with the CAPTCHA system.

Out of the following what type of task which do you prefer?
What is the sun?
3+7?
Out of these pictures which on is a dog?
Retype this sentence as you can see it "hELlO tHIs A SenTEnCe"
This question allows me to see if the type of question is the best choice of question to use in the CAPTCHA system based on user feedback.

Which one of the options do you think offer the best method to prove your human and prevent spam (Quickest, easiest, most effective) ?
Logical tasks (Answering questions or other logical task such as identifying objects)
A form of registration (Username and password)
SMS validation (using a phone to prove your human)
This question allows me to see if the chosen method of protecting the comment system is the best way of protecting it.

24/03/12
After the total redesign of the survey questions, I thought it would be best to get feedback on them to make sure they were worded correctly, so they would be easy to understand and answer, and also to see if the answers would give data I could use after the survey was complete to answer the main objective of this project.
To get feedback I will choose a small subset of people I know with different levels of technical knowledge to look at the questions, fill in the survey and leave any other comments that could tell me where the survey could be improved. I will carry on doing this today to try to get the survey to a standard where it can be released to the public to collect data.


25/03/12
After getting the feedback from the subset of people, a few things needed to be changed: the wording of the questions, and the addition of one question.
The wording of the questions was the most commented-on point in the feedback, as some of the questions were not clear to the reader: either they did not make sense to them or they were badly worded. This is clearly an area that needs to be looked at today and improved with the help of the feedback, which included ideas on how to word the questions more clearly.
The other big point from the feedback was that some people were getting the CAPTCHA question wrong a few times even though they say they put the right answer. So to quantify that problem a new question will be added: “How many times did you have to answer the question before you answered it right?” This question will give another view into how easy the question was to answer, based on the number of attempts needed to answer the CAPTCHA question: a low number suggests a good puzzle and a high number suggests a poor one.

Based on the feedback on the wording, these are the new questions that will be asked. They fix the problems the old ones had; I know this because the new questions were given out to the subset again to see if they were an improvement, and most of them said the survey was now good enough to release.
On a scale of 0-5 where 0 is the easiest, how hard was it to read and understand the question?

0,1,2,3,4,5

Out of the list of options below choose which you think is equivalent to the level of hardness of your answered question?

Answering a simple maths question 2+2=4
Answering a mid level maths question 4 squared = 16
Answering a high level maths question 54 squared - 29% of 6552 = 1 015.92
Answering an expert level maths question y = log x (If y = 10) What is X?

How many times did you have to answer the question before you answered it right?
0,1,2,3,4,5+

Did you use any help to solve the question, if so what help did you use?

None
Website
Book
Newspaper/Magazine
Friend or family

What was the most time consuming task with posting the comment?

Reading the question
Understanding the question
Finding a question from the database you could answer
Working out the answer to the question

Which of the following would have helped you make the question easier to answer?

Nothing
Simpler questions
Different topic used for the questions
Allow for more than one answer to the question if there are similar answers that are correct
Hint to what answer that the question wants (I.E type of answer or a small hint)
Larger or clearer text
Accessibility options (Audio questions)

Out of the following tasks to verify you are human, which do you prefer?

What is the sun?
3+7?
Out of these pictures which one is a dog?
Retype this sentence as you can see it "hELlO tHIs A SenTEnCe"

Which one of the options do you think offers the best method to prove you are human and prevent spam (quickest, easiest, most effective)?

Logical tasks (Answering questions or other logical task such as identifying objects)
A form of registration (Username and password)
SMS validation (using a phone to prove you are human)

The last thing I will do today is draft up my hypotheses (i.e. what I think should happen) for the project's survey and get a few people to check over them to make sure they are what is needed and make sense.

26/03/12
Today I got my email for my project moderator, so I will draft up an executive summary of my project today and set up a meeting for the Friday.
After a few emails we confirmed a meeting on the Friday at 9:55am. There was a little bit of confusion about showing him the logbook, as it is digital, but we came to an understanding.

Waiting on feedback on hypotheses from a few people, so the executive summary of my project might be all I do today

27/03/12
Before carrying out the public survey, predictions were made for each question of the survey. Each prediction is shown as a percentage figure for each option, with a short reason why the percentage was set at that amount.

On a scale of 0-5 where 0 is the easiest, how hard was it to read and understand the question?
0,1,2,3,4,5
0- 30%
1- 40%
2- 10%
3- 10%
4- 5%
5- 5%

With the guidance included on the page via the instructions, I believe the target audience will know what to do when the question is posed, and I believe the questions are written in a way that is easily understood by someone who knows the subject matter. This should be the case, as the subject matter of the questions matches the subject matter of the site, so the target audience should know it and be able to understand the questions easily. The only problem that could happen is that some of the target audience might have trouble reading the question due to visual problems of their own; this is why the prediction accounts for a few people finding the question hard to read. From personal experience, I understand that the way the question is displayed is easy to read for people who have visual problems.

Out of the list of options below choose which you think is equivalent to the level of hardness of your answered question?
0. Answering a simple maths question 2+2=4
1. Answering a mid-level maths question 4 squared = 16
2. Answering a high level maths question 54 squared - 29% of 6552 = 1 015.92
3. Answering an expert level maths question y = log x (If y = 10) What is X?
0- 50%
1- 40%
2- 5%
3- 5%
As the survey's target audience should know about the subject of the question, since it is the type of subject talked about on the site, the results should indicate this, with the majority showing that a low level of equivalent hardness was needed to answer the question.

How many times did you have to answer the question before you answered it right?
0,1,2,3,4,5+
0- 55%
1- 25%
2- 5%
3- 5%
4- 5%
5- 5%
As the question subject is known to the user, since it is the type of subject talked about on the site, and the predicted level of hardness is low, the number of attempts needed to answer the question should also be low. This is because the level of hardness and the number of attempts needed are closely related.

Did you use any help to solve the question, if so what help did you use?
0- None 70%
1- Website 20%
2- Book 2%
3- Newspaper/Magazine 3%
4- Friend or family 5%

The target's knowledge about the subject of the question should be good, due to the type of content talked about on the site and the simple style of the questions. This should be seen in the results, with a majority selecting the option None. If they don't know the answer, it is more than likely that the target would just search for the answer, as they are already on the internet.

What was the most time consuming task with posting the comment?
1. Reading the question 30%
2. Understanding the question 15%
3. Finding a question from the database you could answer 25%
4. Working out the answer to the question 20%

This question's results might be spread, depending on the person's ability to perform certain tasks and on other limiting factors like reading, knowledge and mental working-out ability. The results should lean towards reading taking the most time, as this should be the hardest task if the other parts of the CAPTCHA implementation are good.

Which of the following would have helped you make the question easier to answer?
1. Nothing 50%
2. Simpler questions 5%
3. Different topic used for the questions 10%
4. Allow for more than one answer to the question if there are similar answers that are correct 5%
5. Hint to what answer that the question wants (I.E type of answer or a small hint) 10%
6. Larger or clearer text 15%
7. Accessibility options (Audio questions) 5%

Due to the simple questions and a known subject matter, the results should show a majority selecting the option Nothing. This follows from the predictions made for the last few questions: this question is closely related to them, as they are all about the usability of the CAPTCHA, and their answers should not conflict with each other. Predicting Nothing as the main answer therefore matches the other predictions.

Out of the following tasks to verify you are human, which do you prefer?
1. What is the sun? 35%
2. 3+7? 15%
3. Out of these pictures which one is a dog? 30%
4. Retype this sentence as you can see it "hELlO tHIs A SenTEnCe" 20%

This question's results might be spread depending on the person's preference for a type of task. The results should lean towards options 1 and 3, as they offer a simpler task than the others. This is because the other options could give people problems, such as having to work out maths problems, being put off by the idea of maths, or sight problems that make retyping something hard because it may be hard to read, as shown in the background research.

Which one of the options do you think offers the best method to prove you are human and prevent spam (quickest, easiest, most effective)?
1. Logical tasks (Answering questions or other logical task such as identifying objects) 60%
2. A form of registration (Username and password) 30%
3. SMS validation (using a phone to prove you are human) 10%

From past experience, option one is less time-consuming to complete, so it should be the most chosen. The reasoning is that it only requires doing one task on the same page, unlike the other options provided in the question.
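
As a quick sanity check on the predictions in this entry, each question's percentages should add up to 100. The following is a minimal Python sketch of that check, assuming the figures are copied by hand from the lists above; the names PREDICTIONS, prediction_totals and incomplete_predictions, and the labels Q1 to Q8, are my own shorthand and not part of the project.

```python
# Predicted percentages per option, copied from the predictions in this entry.
# The labels Q1..Q8 are shorthand for the eight survey questions, in order.
PREDICTIONS = {
    "Q1": [30, 40, 10, 10, 5, 5],
    "Q2": [50, 40, 5, 5],
    "Q3": [55, 25, 5, 5, 5, 5],
    "Q4": [70, 20, 2, 3, 5],
    "Q5": [30, 15, 25, 20],
    "Q6": [50, 5, 10, 5, 10, 15, 5],
    "Q7": [35, 15, 30, 20],
    "Q8": [60, 30, 10],
}


def prediction_totals(predictions):
    """Return the sum of the predicted percentages for each question."""
    return {question: sum(values) for question, values in predictions.items()}


def incomplete_predictions(predictions):
    """Return only the questions whose predictions do not total 100%."""
    totals = prediction_totals(predictions)
    return {question: total for question, total in totals.items() if total != 100}


print(incomplete_predictions(PREDICTIONS))  # prints {'Q5': 90}
```

Run as written, this flags only the most time-consuming-task question, whose four predicted figures add up to 90%, so 10% of that prediction is unassigned.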

28/03/12

Tasks:

Create clear hypotheses

Notes:

Now the survey questions and result predictions are done, a hypothesis needs to be made for the survey. The hypothesis needs to give a brief understanding of what the results of the survey will help to work out, the expected result pattern if the CAPTCHA system is good, and the expected result pattern if the CAPTCHA system is bad.

Task work:

The aim of the project is to research and understand the strengths and flaws of a CAPTCHA system that uses logic puzzles. This is achieved by implementing a CAPTCHA system with a chosen type of logic puzzle, one that requires a respondent to answer a set question correctly by deducing the answer from the question. The questions in the survey are based on finding out what the respondents think of the CAPTCHA system implemented within the comment system on a blog. Feedback from the survey will help answer whether the type of logic puzzle used in the prototype is a good puzzle to use in a CAPTCHA system that prevents comment spam.
CAPTCHAs have been around since the year 2000 (Ahn, Blum and Langford 2004) and many of the respondents should have seen them in use somewhere on the internet. So I believe that a majority of respondents will know about them and be able to complete the survey and provide valid feedback based on their present and past experiences.
The overall survey results should be mixed across the respondents based on their ability to work out this type of puzzle.
If the type of logic puzzle used in the prototype system is a good puzzle, then the results should show that the qualities of a good puzzle and a good way of preventing comment spam, such as being easy to understand, answer and complete, are chosen. If the puzzle is good, the results will closely follow the expected pattern.

The expected pattern
• Question 1 of the survey should have low results, which would indicate that understanding the question is easy for the respondents.
• Question 2 of the survey should have a result in which either the first or second option is chosen, which shows that only simple to mid-level cognitive thinking is needed to answer this type of question.
• Question 3 of the survey should have low results, which would indicate that the expected answer for the question was clear from the question asked.
• Question 4 of the survey should have the result of the option “None”, indicating the question asked was simple and needed no help to solve.
• Question 5 of the survey should have the result of the option “Reading the question”, as this would be the option chosen if the other parts of the CAPTCHA do not take longer to do.
• Question 6 of the survey should have the result of the option “Nothing”, “Larger or clearer text” or “Accessibility options”, as these show either that the question was simple and easy to answer or the limits/pitfalls of the implementation of the CAPTCHA.
• Question 7 of the survey should have the result of the option “What is the sun?”, as it is the same type of question asked in the prototype implementation of the CAPTCHA system.
• Question 8 of the survey should have the result of the option “Logical tasks”, as it shows that the respondents believe that spam is best tackled by the same type of task as in the prototype implementation.
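
Once the survey data is in, the expected pattern can be checked mechanically by taking the most chosen option for each question and seeing if it is one of the expected ones. The sketch below is only an illustration in Python under my own assumptions: answers are stored as plain strings, "low results" for Questions 1 and 3 is taken to mean a most chosen answer of 0 or 1 (the journal does not define "low"), and the names EXPECTED, most_chosen and matches_pattern are hypothetical. The sample data at the end is made up and is not survey data.

```python
from collections import Counter

# Expected most-chosen options per question, taken from the expected pattern.
# Treating "low results" as an answer of "0" or "1" is an assumption.
EXPECTED = {
    1: {"0", "1"},
    2: {"simple", "mid"},
    3: {"0", "1"},
    4: {"None"},
    5: {"Reading the question"},
    6: {"Nothing", "Larger or clearer text", "Accessibility options"},
    7: {"What is the sun?"},
    8: {"Logical tasks"},
}


def most_chosen(answers):
    """Return the most frequently chosen option in a list of answers."""
    return Counter(answers).most_common(1)[0][0]


def matches_pattern(responses_by_question, expected=EXPECTED):
    """Map each question number to True if its most chosen answer is expected."""
    return {
        question: most_chosen(answers) in expected[question]
        for question, answers in responses_by_question.items()
    }


# Made-up answers for two questions, purely to show the call.
sample = {
    4: ["None", "None", "Website"],
    8: ["Logical tasks", "SMS validation", "SMS validation"],
}
print(matches_pattern(sample))  # prints {4: True, 8: False}
```

A result of all True would mean the data follows the expected pattern for a good puzzle; any False marks a question to look at more closely in the analysis.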

Conclusion:

Good basic hypotheses, but more work is needed on backing up why the expected pattern is a sign of a good CAPTCHA system. I will do this when I come to write up the report, as it will be an easy add-on.

30/03/12

Tasks:

Mod meeting
Create the survey form online
Post the survey public

Notes:

Have a mod meeting this morning.
Now the survey questions are ready, they need to be posted online so people can give feedback. This will be done using a website that hosts surveys, http://www.surveymonkey.com/. The survey will have an information section telling the user what to do.

The created survey is at http://svy.mk/captchafinalyearsurvey

Task work:


Information included

Using the comment system provided in the link (copy and paste it into your address bar), post a comment using the comment form and then please fill out the survey. To answer the question on the page, you can use help if needed.
http://www.samkenney.com/viewpost.php?id=100

Mod meeting

We talked about what could be done to make the project better, and had some disagreement about looking at other types of CAPTCHA system. He said it would be good to implement those and survey people about them to get a comparison with my CAPTCHA system, but I said that is outside the project goal and would be a waste of my time to implement, as this is a research project. One good idea did come out of it, which was to find out the good and bad points of other CAPTCHA systems so they can be used in a comparison.


Conclusion:

On target or ahead of where I need to be
Easy to do and will allow me to target people online

Filling in what happened, as I haven't updated the logbook in a while due to not much happening other than the following

01/04/12 to 22/04/12

Tasks:

Getting feedback

Notes:

During this time I focused on getting people from my targeted group (site users and sites that have similar content) to fill in the survey

Task work:


Week one

Target feedback 20/50 people

Conclusion:

Got 19 people

Week two

Target feedback 40/50 people

Conclusion:

Got to 40 people

Week three

Target feedback 50/50 people

Conclusion:

Got to 100 people, so well over my target

Conclusion:

A lot more people filled in the survey due to more targeted surveying in the third week, which gives me a better set of data to draw ideas from in the analysis

23/04/12

Tasks:

Researching other captcha systems

Notes:

I will be looking at three main types of CAPTCHA system used on the internet today: Retype Text-Based CAPTCHAs, Image-Based CAPTCHAs and Math-Based CAPTCHAs

Task work(so far):


Retype Text-Based CAPTCHAs

Retype Text-Based CAPTCHAs use distorted words that a user is asked to type. They take the form of an image containing a text string that is difficult for bots using OCR techniques to recognise. The standard layout is that the CAPTCHA image is displayed and the user types what they see in a text box provided near the CAPTCHA image on the Web page.
Advantages
• Not language dependent- It does not keep to one language, therefore a person with any language can complete the task.
• Simple task- This means that the task to complete is non-taxing to the user and regarded as easy.
• Text strings can be automatically generated- This means that an unlimited number of CAPTCHA images can be produced, strengthening the system.
Disadvantages
• Sometimes very difficult to read- This is due to distortion of the words used in the CAPTCHA.
• Are not compatible with users with disabilities, such as people with visual problems, unless an alternative method is provided.
• Time-consuming to decipher- This is due to the distortion making the text hard to read, so it needs more time to work out.
• Weak to Artificial Intelligence- They have been compromised in attacks, as published in studies.
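
To illustrate the point that text strings can be automatically generated, here is a minimal Python sketch of producing a random challenge string and checking what the user typed. It is an assumption-laden illustration rather than how any particular CAPTCHA product works: the names ALPHABET, make_challenge and check_response are hypothetical, the list of excluded look-alike characters is my own choice, and the step that renders the string as a distorted image is not shown.

```python
import secrets
import string

# Letters and digits that are easily confused once distorted are left out.
# This exclusion list is an assumption made for the illustration.
ALPHABET = [c for c in string.ascii_uppercase + string.digits if c not in "O0I1"]


def make_challenge(length=6):
    """Generate a random string that would then be drawn as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_response(challenge, typed):
    """Compare the typed text with the challenge, ignoring case and outer spaces."""
    return typed.strip().upper() == challenge


challenge = make_challenge()
print(challenge, check_response(challenge, challenge.lower()))
```

Because every challenge is drawn fresh from the alphabet, there is no fixed list of answers for an attacker to collect, which is the strength the advantage above refers to.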

Image-Based CAPTCHAs

There are a few Image-Based CAPTCHA systems, but two main ones are used: Identify the object and How are these related. The How are these related system presents a user with a set of images all associated with the same object or concept. The user is required to enter the object or concept to which all the images belong, e.g. the program might present pictures of a globe, a football, a planet and a baseball, expecting the user to correctly associate all these pictures with the word ball. The Identify the object system presents a user with a set of images containing one target image and asks them to pick out a defined object; this works by letting the human work out which of the images is the one asked for, something that a computer has trouble with due to the complexity of identifying objects.
Advantages
• Simple task- This means that the task to complete is non-taxing to the user and regarded as easy.
• Not as weak to Artificial Intelligence- This is due to the complexity of image recognition needed to identify the target from a random set of images, but there are cases where attacks work, as published in studies.
Disadvantages
• Sometimes difficult to identify objects or target group- This is due to similar objects making it difficult to find the target image, or to an unclear target.
• Are not always compatible with users with disabilities- This is due to people with visual problems having trouble making out some of the images within the question.
• Time-consuming process of the picture base creation- This is due to the picture base needing to be updated over time as the pictures become known to attackers.
• Technical/implementation difficulties- This includes the resources needed to sort images on the web server, the traffic needed to transmit/load images and the space for the implementation on the web page.
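
The Identify the object system described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: PICTURE_BASE is a made-up picture base with hypothetical file names, make_image_challenge is a hypothetical helper, and a real system would need a far larger picture base that is refreshed over time, as noted in the disadvantages.

```python
import random

# Hypothetical picture base: object label -> image file names.
PICTURE_BASE = {
    "dog": ["dog1.jpg", "dog2.jpg"],
    "cat": ["cat1.jpg"],
    "car": ["car1.jpg"],
    "tree": ["tree1.jpg"],
}


def make_image_challenge(rng, target="dog", choices=4):
    """Build a challenge with exactly one image of the target object."""
    other_labels = [label for label in PICTURE_BASE if label != target]
    distractor_labels = rng.sample(other_labels, choices - 1)
    images = [rng.choice(PICTURE_BASE[label]) for label in distractor_labels]
    answer = rng.choice(PICTURE_BASE[target])
    images.append(answer)
    rng.shuffle(images)  # so the target is not always in the same position
    question = f"Out of these pictures which one is a {target}?"
    return question, images, images.index(answer)


question, images, answer_index = make_image_challenge(random.SystemRandom())
print(question, images)
```

The user's click is then compared with answer_index on the server; the picture base, not the code, is what takes the time to build and maintain.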

Math-Based CAPTCHAs

Math-Based CAPTCHAs use math questions to tell if you're human; these are simple mathematical questions that are easy to solve. Math-Based CAPTCHAs can be presented in a few ways, but the two main ones used are numerical and textual. Numerical Math-Based CAPTCHAs display the question using numerals. This leaves them open to OCR software identifying the numbers, so they rely on security through obscurity of the website. Textual Math-Based CAPTCHAs display the question using numbers written as words, which makes it harder for a computer to work out, but they are still open to OCR software with AI cracking them, so they also rely on security through obscurity of the website.
Advantages
• Simple task- This means that the task to complete is non-taxing to the user and regarded as easy.
• Not language dependent- It uses an international language, which is mathematics, therefore a person with any language can complete the task.
• Math problems can be automatically generated, which means an unlimited number of CAPTCHA images can be produced, strengthening the system.
Disadvantages
• Are not as compatible with users with disabilities, as people with cognitive problems might have difficulty using the system.
• Weak to Artificial Intelligence, as they can be easily compromised by being parsed by OCR.
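
The difference between the numerical and textual forms, and the point that math problems can be automatically generated, can be shown with a short Python sketch. It is an illustration only: the names NUMBER_WORDS, make_math_challenge and check_answer are hypothetical, and limiting the sums to single digits and addition is my own simplification.

```python
import random

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def make_math_challenge(rng, textual=False):
    """Return a (question, expected answer) pair for a single-digit sum."""
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    if textual:
        # Textual form: numbers written as words.
        question = f"What is {NUMBER_WORDS[a]} plus {NUMBER_WORDS[b]}?"
    else:
        # Numerical form: numbers written as numerals.
        question = f"What is {a} + {b}?"
    return question, a + b


def check_answer(expected, typed):
    """Accept the typed answer only if it is a whole number equal to the sum."""
    try:
        return int(typed.strip()) == expected
    except ValueError:
        return False


rng = random.SystemRandom()
print(make_math_challenge(rng))
print(make_math_challenge(rng, textual=True))
```

Either form is trivial for a script to parse once the text is read, which is why both are described above as relying on the obscurity of the website rather than on the difficulty of the puzzle.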

24/04/12 and 25/04/12

Tasks:

Extract survey data from online survey
Look for anomalies

Notes:

As the online survey software doesn't allow for auto export, the data needs to be copied to an Excel file so it can be worked with in the cleaning process and analysis.
Research how to use Excel to perform the functions needed for cleaning the data up
Once manually exported, I will look at the questions and results to work out what the anomalies in the data are and clean them up for a cleaner data set

Task work(so far):


Manually transferred the survey results into an Excel file and created graphs with the pre-cleaned data;
see the Results.xlsx attachment.

Based on the questions asked, I have determined that a few inconsistencies could occur in the feedback given by the targets, namely answers that contradict an answer to another question. These inconsistencies could include:
• Question 2 and Question 3- A target answering Question 2 with a hardness level of 1 and then in Question 3 answering 3 or above.
• Question 2 and Question 4- A target answering Question 2 with a hardness level of 1 and then in Question 4 choosing any option other than NONE.
• Question 2 and Question 5- A target answering Question 2 with a hardness level of 1 and then in Question 5 choosing option 4, stating that working out the answer was the hardest part of the CAPTCHA.
• Question 2 and Question 6- A target answering Question 2 with a hardness level of 1 and then in Question 6 answering with option 2 (Simpler questions).
• Question 1 and Question 5- A target answering Question 1 with a hardness level of 1 and then in Question 5 answering with option 2 (Understanding the question).
After looking through the data set I found that some of the results match the inconsistencies that might occur. The table below shows how many times each defined inconsistency appears in the data set.

Inconsistency                 Count
Question 2 and Question 3     3
Question 2 and Question 4     1
Question 2 and Question 5     4
Question 2 and Question 6     3
Question 1 and Question 5     1

A few inconsistencies were found, and after some reflection the data might need to go through a second cleaning to shore up the results further, but at this time I'm not sure if that is needed. A new Excel sheet with the cleaning functions in has been uploaded; it is called CleanResult.xlsx

26/04/12
Tasks:
Clean the data a second time
Notes:
After thinking about whether the first cleaning was enough, I decided that the data should go through another cleaning, to remove results that match the same anomalies as before but with a hardness level of 2 instead of 1
Conclusion:
Work completed with ease. See the clean results for this work
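
The second cleaning reuses the first cleaning's rules with the hardness level changed from 1 to 2, so both passes can be expressed as one filter with the level as a parameter. The Python sketch below is an illustration only, with the same assumed data layout as a dictionary per response (hypothetical keys q1 to q6); the names is_inconsistent and clean are my own, and applying the changed level to the Question 1 rule as well is an assumption.

```python
def is_inconsistent(response, level):
    """True if a response contradicts itself at the given hardness level."""
    q2_rules = response["q2"] == level and (
        response["q3"] >= 3            # needed 3 or more attempts
        or response["q4"] != "None"    # used some form of help
        or response["q5"] == 4         # working out the answer took longest
        or response["q6"] == 2         # wanted simpler questions
    )
    # Assumption: the Question 1 rule moves to the same level as Question 2.
    q1_rule = response["q1"] == level and response["q5"] == 2
    return q2_rules or q1_rule


def clean(responses, levels=(1, 2)):
    """Keep only the responses that are consistent at every given level."""
    return [
        response for response in responses
        if not any(is_inconsistent(response, level) for level in levels)
    ]


# Made-up responses: the second one is only caught at hardness level 2.
rows = [
    {"q1": 0, "q2": 1, "q3": 0, "q4": "None", "q5": 1, "q6": 1},
    {"q1": 0, "q2": 2, "q3": 5, "q4": "None", "q5": 1, "q6": 1},
]
print(len(clean(rows, levels=(1,))), len(clean(rows)))  # prints 2 1
```

Keeping the level as a parameter means the first and second cleaning share one set of rules, so any later change to a rule applies to both passes.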

27/04/12
Tasks:
Create clean results graphs
Notes:
Now the data is clean, graphs of the clean data need to be created to give a visual view of the results
Conclusion:
Work completed with ease, See clear results for this work

28/04/12
Tasks:
Look at data with quantitative view and write about it
Notes:
From the clean data results a quantitative view will be written based on the research, hypothesis and predictions.
Conclusion:
Work complete, took a while to get going with the way to write it. See clear results for this work

30/04/12
Tasks:
Look at data with Qualitative view and write about it
Notes:
From the clean data results a Qualitative view will be written based on the research, hypothesis and predictions.
Conclusion:
Work complete, once again took a while to get going with the way to write it. See clear results for this work

1/05/12
Tasks:
Start to write the final report.
Covering these sections; Abstract, Summary, Statement of Objectives (Aims)
Notes:
This information will be taken from the project goals and the preliminary report as there are pieces that can be used from there for this subject.
Conclusion:
Work complete, simple task of taking work done and fitting it within what is needed

4/05/12
Tasks:
Continue to write the final report.
Covering these sections; Background Theory, Theory directly relevant to the Project and Project plan.
Notes:
This information will come from the research done for the preliminary report and from the research, information and changes made throughout the project.
Conclusion:
Work complete, simple task of taking work done and fitting it within what is needed

7/05/12
Tasks:
Continue to write the final report.
Covering these sections; CAPTCHA Design Considerations and Implementation Options, CAPTCHA Preferred Solution, Detailed discussion of functional CAPTCHA preferred solution and Captcha Development and Test programme
Notes:
This information will be newly written, as it was not put in the logbook before this point. It will cover what options there are for the design and implementation based on the research done, explain why the preferred solution was chosen, and talk about its development and testing.
Conclusion:
Work complete. As this was missing from the logbook, it took a while to complete; it needed going over because some material was missing from my notes.

12/05/12
Tasks:
Continue to write the final report.
Covering these sections; Survey introduction and Design Considerations and Implementation Options, Survey Preferred Solution and Survey Development and Test programme
Notes:
This information will come from the logbook, with new information explaining why the preferred solution was chosen and talking about its development and testing.
Conclusion:
Work complete, took longer than expected due to some of the new information as it was hard to explain why it was chosen at first

15/05/12
Tasks:
Continue to write the final report.
Covering these sections; Hypothesis and predictions and Survey results and Analysis
Notes:
This information will come from the logbook, with new information that supports the hypothesis via the research done, and from the files that contain this work done earlier.
Conclusion:
Work complete, simple task of taking work done and fitting it within what is needed

16/05/12
Tasks:
Continue to write the final report.
Covering this section the conclusions of the project
Notes:
This information will be written from the analysis of the survey data, summing up the findings and relating them back to the research.
Conclusion:
Work complete, semi hard task of summing up the project but took as long as I expected it to take to complete

20/05/12
Tasks:
Continue to write the final report.
Covering this section the Self-Appraisal
Notes:
This information will be written from a look back view of the project as a whole
Conclusion:
Work complete. The report is now complete and just needs proofreading and editing to make sure it is ready for handing in.

21/05/12-28/05/12
Tasks:
Proofread the final report and correct errors.
Print report
Create project poster
Notes:
This will be done slowly, as the report is too large to read through in a limited number of days; once done, it will be printed as the printing guidelines state. The project poster information is just information from the report, so this will be taken from there and reformatted for the template given.

Conclusion:
Work complete. The report is now checked and edited, and was printed 3 times so that each person who needs one has a copy (me, the supervisor and the moderator). The poster is done with the information from the report and will be printed tomorrow sometime. The poster file is called poster. That was the last task that had to be completed before project demo day.



The project is available at www.samkenney.com for viewing, if you need access to the server (I.E login details) to look at the structure of the site and the database please email me at kenney29@gmail.com.

Attachment Timestamp Size
Project backup.zip 2012-05-29 08:29 5.83 MB
CleanResult.xlsx 2012-04-25 12:34 52.67 KB
Results.xlsx 2012-04-24 13:14 30.81 KB
System logic new.jpg 2012-03-20 12:05 100.75 KB
newinput.jpg 2012-03-17 12:32 35.92 KB
prg.png 2012-03-17 12:29 52.07 KB
fail.jpg 2012-03-17 12:29 49.65 KB
facebook1.jpg 2012-03-14 14:08 60.1 KB
path2.jpg 2012-03-14 13:57 12.98 KB
path1.jpg 2012-03-14 13:57 15.69 KB
form1.jpg 2012-03-14 13:57 24.58 KB
imageload1.jpg 2012-03-14 13:53 15.11 KB
table.jpg 2012-03-14 13:52 35.43 KB
upform.jpg 2012-03-14 13:52 22.15 KB
System logic.jpg 2012-03-14 12:15 78.12 KB