
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Web Site: www.ijettcs.org Email: [email protected], [email protected]

Volume 3, Issue 2, March – April 2014 ISSN 2278-6856

A More Robust Text Based CAPTCHA For Security in Web Applications

Mumtaz M. Ali AL-Mukhtar (1) and Rana Riad K. AL-Taie (2)

(1) AL-Nahrain University, Information Engineering College, Baghdad, Iraq
(2) AL-Nahrain University, Information Engineering College, Baghdad, Iraq

Abstract: Nowadays, web application developers use Captcha tools to prevent several kinds of security attacks. To improve the capability of a character image to hinder many breaking attacks, we propose a text-based Captcha algorithm in which each challenge is associated with a set of main features that enhance Captcha security. Different and new features are generated randomly for each challenge. The randomness reduces the chance of predicting the next challenge, so automated Captcha-breaking techniques cannot use features extracted from one text Captcha as attack vectors against the next. The main intent of this work is to provide an effective and efficient method for generating a robust text-based Captcha that thwarts many breaking attacks and at the same time offers a reliable and usable way to distinguish between humans and computers.

Keywords: Text-based Captcha, web application security, robust Captcha, automatic attacks thwarting.

1. INTRODUCTION

The Internet has changed many dimensions of daily life, including the ways people connect to each other, advertising, shopping, and education. Consequently, system security has become the most important issue for any website, since many methods exist for intruding into a system over the Internet [1]. People have developed techniques, systems, and programs that can replace a human being in doing a job; such jobs include entering data into systems, generating data automatically, and handling events that occur on or within a system [2]. Web sites must therefore ensure that their services are supplied to legitimate human users rather than bots, in order to prevent service abuse [3]. To thwart automated attacks, services often ask users to solve a puzzle before granting access.

Human Interactive Proofs (HIPs) focus on automated tests that virtually all humans can pass but current computer programs fail. The acronym CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) was coined in 2000. It is a type of challenge-response test that is intended to be completed successfully only by a human [4]. CAPTCHAs are designed to be simple problems that humans can solve quickly but that are difficult for computers to solve. Using Captchas, services can distinguish legitimate users from computer bots while requiring minimal effort from the human user [5]. In the procedure, a computer or a program creates a test for its user, who is expected to be a human. The test is meant to be solvable only by humans and not by any machine, system or program. The user is required to provide a correct response to the test and is then permitted to access the service. When a correct response is received, it is presumed to have come from a human user. CAPTCHA techniques have been classified into four categories [2]:

- Text based Captcha.
- Audio based Captcha.
- Image based Captcha.
- Video based Captcha.

Each type suits a different group of users.

2. LITERATURE REVIEW

CAPTCHA has a number of applications on the web, such as:

1) Registration of web forms: A number of websites offer free registration for their services. Unfortunately, these may be susceptible to a web robot attack, that is, a script capable of registering thousands of email accounts on the Internet and wasting precious web space.
2) Online polling: The result of an online poll can only be trusted if it is ensured that only humans have voted [6].
3) Web crawler: CAPTCHA provides a reasonable solution when web pages should not be crawled for indexing by search engines [7].

Huang et al. [8] have exploited a text-based CAPTCHA as one of the items used to mitigate DDoS (Distributed Denial of Service) attacks on an IaaS cloud service. The challenge-response procedure can reduce the traffic and decide whether the packets come from a human or a machine (program). Much effort has also been made to provide new Captcha formats or more robust Captchas. For example, Thomas and Kaur [2] proposed a CAPTCHA technique that uses images from custom mouse cursors and outperforms some of the most popular CAPTCHA techniques, such as text-based CAPTCHAs and previous image-based CAPTCHAs. The authors in [9] aimed to generate text-based Captchas that are resilient against segmentation attacks by proposing an empirical algorithm, supported by the Taguchi method, to guarantee the quality of the chosen colour schemes. The authors in [10] tried to strike a balance between the effectiveness and human success rates of text-based CAPTCHAs by


proposing a text-based CAPTCHA mechanism in which each challenge is associated with a tip that gives humans enough information to recognize the letters in a highly distorted text image. Rahman et al. [7] proposed a dynamic image-based, three-tier CAPTCHA system to make cracking difficult for bots. They observed that the probability of a bot breaking the proposed prototype system is considerably low. In [11], a CAPTCHA implementation in the form of 3D animation was proposed, based on a weak point of computer vision. The derived method could prevent attacks based on both image recognition and moving-object recognition in videos.

We aim to provide a secure scheme that is resilient to state-of-the-art attackers by applying secure design principles at all layers. We propose a new mechanism for text-based Captcha systems that balances two goals: robustness against automated programs and ease of solving for people. Each challenge in our mechanism consists of random and varied significant features, which yield a highly distorted text image that is resistant to automated attacks but harmless for users.

3. PROPOSED SCHEME

During the design stage we addressed the main principles that play an important role in providing a more robust CAPTCHA. The proposed scheme applies multiple security features that are highly effective at obfuscating challenges for breaking attacks but remain easy for users to solve. A previously submitted challenge can never be replayed or reused: every time the page is generated (or refreshed), the principal features change as follows:

- The CAPTCHA code is a series of letters (uppercase and lowercase) and numbers.
- Multiple randomizing functions are used to generate a random code (a stream of characters and numbers) for each challenge, so that it is not susceptible to a dictionary attack.
- The length of the code varies (the minimum length is 6 characters).
- Multiple font types are used, to prevent the image-processing attacks that become possible when a consistent font is used.
- The string/code is rotated at various angles.
- Lines are used to prevent segmentation. The number, length, and positions of the lines vary each time, so that the text image is distorted randomly before being presented to the user.
- The text image is blurred using a specific technique to make the CAPTCHA difficult for malicious software.
- Image dimensions are varied consistently with all the features above.
- The CAPTCHA code and line colours are kept in gray scale, but at a different level each time.

The Captcha is generated by creating an image from text in PHP using the GD library. GD is an open source code library, used in this work for the dynamic creation of images that can be customized extensively in different formats. The Captcha code is created before the HTML containing the input fields for displaying the Captcha image and submitting the code/string is generated. We have two files: the first holds the form that contains the Captcha image, and the second is used to generate the Captcha image. When a user accesses a website that is protected with a Captcha to prevent abuse by automated programs, the user requests a secure form from the web server. The web server automatically generates a random Captcha image by calling a PHP file (say, image.php), and then sends a file (say, captcha.php) that holds the form associated with the random Captcha image. The steps of the devised algorithm are depicted in Figure 1 and stated as follows:

1- On the form page, the rand() PHP function is used to generate random numbers (which determine the font size, code length, and font type), and other PHP functions (str_shuffle() and substr()) are used to generate a code (string) consisting of numbers and uppercase and lowercase letters.

2- The random numbers and the string are sent to the server along with the secure sessions that are used for generating a CAPTCHA image.

3- The random values (for font type and font size) and the string (code) are sent to the server along with the secure sessions, to be used for generating the Captcha image.

4- At the server side, the image.php file is called. Since the GD library offers many ways to dynamically create PNG, JPEG or GIF files and output them directly to the browser, several GD functions are used to convert the text elements into an image, creating the Captcha image from the session values received from the user side.

5- The CAPTCHA image is retrieved and displayed on the form page as a challenge for the user.

6- The user reads the code, fills in the correct letters and numbers, and submits the form.

7- The code stored in the session is verified against the solution provided by the user. If the user does not provide a valid code, or if the user refreshes the page, a new session is generated (step 2). If verification is successful, the user is directed to the next logical step (such as the website services).
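Step 7 can be illustrated with a minimal PHP sketch. It is not the authors' code: the function name verifyCaptcha() is hypothetical, the array argument stands in for $_SESSION so that the logic can run outside a web server, and the case-sensitive comparison is an assumption, since the paper does not say how letter case is handled. Only the 'secure' key follows the session name used in the implementation phase.

```php
<?php
// Hypothetical helper (not from the paper): checks an answer against the stored code.
// $session stands in for $_SESSION so the logic can be exercised from the command line.
function verifyCaptcha(array &$session, string $answer): bool
{
    $expected = $session['secure'] ?? '';

    // Assumption: the comparison is case-sensitive. hash_equals() compares in constant time.
    $ok = $expected !== '' && hash_equals($expected, $answer);

    // Discard the stored code after every attempt so that a challenge cannot be replayed;
    // a fresh code has to be generated before the next image is shown.
    unset($session['secure']);

    return $ok;
}
```

In a form handler this might be called as verifyCaptcha($_SESSION, $_POST['captcha']) after session_start(); the field name captcha is likewise hypothetical. Discarding the code on both success and failure is one way to reflect the requirement that a challenge is never reused.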


Figure 1 Captcha framework

4. IMPLEMENTATION PHASE

To provide a Captcha with a different font type on every new page, TrueType font (TTF) files are needed. We deploy twenty complex TTF files taken from the fonts folder of the Windows OS, each with the .ttf extension. In the first step, we rename these files to numeric values; for example, (Curlz MT Reguler.ttf) is changed to (1.ttf). This makes it possible to select a font at random on every refreshed page. The files are placed in the same folder as our PHP script. The steps of implementation are shown in Figure 2.

In the captcha.php file, we first set the random security numbers, using the rand() PHP function to:

- Select a random number between 6 and 12, which represents the length of the code (string), so that it differs each time the page is refreshed.
- Select a random number between 1 and 20, which represents the name of a TrueType font file, to provide a different font type each time the page is refreshed.
- Select a random number between 30 and 40, which represents the font size, to provide a different font size each time the page is refreshed.

The last two randomly generated numbers are stored in sessions:

    $_SESSION['font'] = $font_type . ".ttf";
    $_SESSION['size'] = $font_size;

To avoid dictionary attacks, we generate a random string (code) consisting of letters (uppercase/lowercase) and numbers through the following steps:

- A string consisting of the digits (0-9) and the letters (a-z, uppercase and lowercase) is shuffled randomly using the str_shuffle() PHP function.
- A portion of the shuffled string is returned using the substr() PHP function. This portion is specified by the start and length parameters; the length parameter equals the code length generated previously with rand(6,12). Thus the length of the code/string varies each time.
- The string/code is stored in a PHP session, say $_SESSION['secure'], so that it can be accessed later for verification and used by the image.php file to create the image from that string/code. All of the above session variables must be set before the image can be created in the image.php file.
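The random values described above could be produced along the following lines. This is a sketch under stated assumptions rather than the paper's actual captcha.php: the helper name newChallenge() is hypothetical, the substr() start offset of 0 is assumed because the paper does not state it, and the values are returned as an array (instead of being written to $_SESSION directly) so that the code runs without a web server.

```php
<?php
// Hypothetical helper: builds the per-challenge values that the paper stores in sessions.
function newChallenge(): array
{
    $length   = rand(6, 12);   // code length, range taken from the paper
    $fontType = rand(1, 20);   // index of a renamed font file (1.ttf ... 20.ttf)
    $fontSize = rand(30, 40);  // font size, range taken from the paper

    // Digits plus lowercase and uppercase letters, shuffled and then cut to length.
    $pool = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $code = substr(str_shuffle($pool), 0, $length); // start offset 0 is an assumption

    return [
        'font'   => $fontType . '.ttf',
        'size'   => $fontSize,
        'secure' => $code,
    ];
}

// On the form page the values would be copied into the session, for example:
//   session_start();
//   foreach (newChallenge() as $key => $value) { $_SESSION[$key] = $value; }
$challenge = newChallenge();
```

Two properties of this construction are worth noting. Because the code is a prefix of a shuffled 62-character pool, no character can appear twice in one code. Also, the PHP manual cautions that rand() and str_shuffle() do not generate cryptographically secure values; random_int() is the usual substitute where unpredictability matters.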

The image.php file creates the Captcha image using GD functions for rendering text in image format. It starts by reading the secure session variables set up in the previous file: $_SESSION['font'], $_SESSION['size'] and $_SESSION['secure']. First we specify the height and the width of the image. To make the width of the image accommodate the varying length of the string/code dynamically, we:

- Count the number of characters constituting the secure code/string stored in $_SESSION['secure'].
- Multiply the result by the font size (stored in $_SESSION['size']).
- Increment the final result by an estimated value that leaves good spacing between the text/code and the image edge, even when the maximum font size (40) and the maximum string/code length (12) are applied.
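The width rule above amounts to a one-line calculation. In the sketch below, the function name imageWidth() and the default margin of 40 pixels are assumptions for illustration; the paper only says that the added value is estimated.

```php
<?php
// Hypothetical helper: width = number of characters * font size + margin.
// The margin value is an assumption; the paper does not state the estimate it uses.
function imageWidth(string $code, int $fontSize, int $margin = 40): int
{
    return strlen($code) * $fontSize + $margin;
}

// Worst case named in the paper: 12 characters at font size 40.
$width = imageWidth('aB3dE9xY7kLm', 40); // 12 * 40 + 40 = 520
```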

The foreground and the background of the image are specified by:

- Setting the image background colour to white.
- Setting the text colour (image foreground) to a gray scale level that differs each time the page is refreshed. The $red, $green and $blue variables hold the random gray scale colour values.
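A gray tone is obtained when the three colour channels share one value, so the random level only needs to be drawn once. The sketch below assumes a range of 0 to 150 to keep the text visibly darker than the white background; the paper does not give its range, and randomGray() is a hypothetical name.

```php
<?php
// Hypothetical helper: returns one random level repeated for red, green and blue.
// The default range 0..150 is an assumption chosen to contrast with a white background.
function randomGray(int $min = 0, int $max = 150): array
{
    $level = rand($min, $max);
    return [$level, $level, $level]; // equal channels give a gray tone
}

[$red, $green, $blue] = randomGray();

// With the GD extension loaded and an image $image, the colours would be allocated as:
//   $background = imagecolorallocate($image, 255, 255, 255);
//   $textColour = imagecolorallocate($image, $red, $green, $blue);
```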

Figure 2 Implementation steps flowchart

The angle for rotating the text, the x- and y-coordinates required when writing the code/string to the image, and the image blurring are handled in the following steps:

- Randomly select the angle value used to rotate the string/code, so that the angle differs each time the page is refreshed.
- Determine the bounding box of the text/code (the exact dimensions of the text string). These dimensions change with each change of angle, font size, font type and string/code value.
- Get the width and the height of the current code/string from these dimensions (they vary with every new challenge, depending on the other varied items).
- Get the x-coordinate that centers the text horizontally, using the width of the image and the width of the text.
- Get the y-coordinate that centers the text vertically, using the height of the image and the height of the text.
- Apply the GD function imageconvolution() to blur the image by applying a convolution matrix to it, using a given coefficient (divisor) and offset.
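The centering and blurring steps can be sketched as follows. The helper names centredOrigin() and blur() are hypothetical, and the 3x3 kernel with divisor 16 is an assumed Gaussian-style example, since the paper does not list its matrix, divisor or offset. The bounding box is the 8-element array returned by GD's imagettfbbox(), whose corner coordinates are relative to the text baseline origin, which is why the minimum x and y values are subtracted.

```php
<?php
// Hypothetical helper: given the 8-element bounding box from imagettfbbox(),
// returns the [x, y] origin to pass to imagettftext() so that the text is centered.
function centredOrigin(array $bbox, int $imageWidth, int $imageHeight): array
{
    // Corner order: lower-left, lower-right, upper-right, upper-left (x, y pairs).
    $xs = [$bbox[0], $bbox[2], $bbox[4], $bbox[6]];
    $ys = [$bbox[1], $bbox[3], $bbox[5], $bbox[7]];

    // Using min/max over all corners also covers rotated text.
    $textWidth  = max($xs) - min($xs);
    $textHeight = max($ys) - min($ys);

    $x = intdiv($imageWidth - $textWidth, 2) - min($xs);
    $y = intdiv($imageHeight - $textHeight, 2) - min($ys);

    return [$x, $y];
}

// Assumed kernel: the paper does not give the matrix, divisor or offset it uses.
// Requires the GD extension; it is only defined here, not called.
function blur($image): bool
{
    $kernel = [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]];
    return imageconvolution($image, $kernel, 16.0, 0.0);
}

// Made-up bounding box for unrotated text 100 px wide and 35 px tall in a 200x75 image.
[$x, $y] = centredOrigin([0, 5, 100, 5, 100, -30, 0, -30], 200, 75);
```

In image.php the box would come from a call such as imagettfbbox($size, $angle, $fontFile, $code), and the resulting origin would be passed to imagettftext($image, $size, $angle, $x, $y, $textColour, $fontFile, $code); the variable names here are placeholders.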

To mask the string/code so that it remains readable by humans but becomes unreadable by computers, we use the imageline() GD function, which adds lines to the image to generate some noise, as follows:

- Apply the function inside a for loop; the loop count represents the number of lines added, which is selected randomly within a limited range.
- The line colour is the same as the text colour.
- Create the lines at completely random positions so that they cross the image.
- Change the line length to be consistent with the image width (because the width of the image varies).

Once the Captcha has been created, the user's input must be verified against the Captcha value. Recall that the secret string/code is stored in the session variable $_SESSION['secure']. Once the form is submitted, the user's value is checked against the one stored in $_SESSION['secure']. If they match, the user is directed to the next step; otherwise an error message is generated and another image with a new string/code is produced.

4. RESULTS

Figures 3, 4, 5 and 6 show samples of the proposed robust text-based Captcha.

Figure 3 Font size = 34, font file name = 19.ttf, string length = 9

Figure 4 Font size = 33, font file name = 18.ttf, string length = 12

Figure 5 Font size = 40, font file name = 8.ttf, string length = 7

Figure 6 Font size = 30, font file name = 5.ttf, string length = 9

Every time the secure form page (captcha.php file) is refreshed, a new CAPTCHA image is produced that varies in: width and height, font type, font size, string position, number of lines, random line positions, font and line colour (varying in gray scale level), and line length in conjunction with the image dimensions. The string contains a mixed combination of numbers and uppercase and lowercase letters.

5. CONCLUSION

As a contribution toward improving web security in the field of automated challenge-response tests against attacks issued by automated programs, we proposed a more robust text-based CAPTCHA. Two main goals were considered: the ease with which a human can solve the challenge, and the time a human actually needs to find the solution. Since a weak CAPTCHA implementation can only provide a false sense of security, we addressed the principal features that contribute effectively to a more secure challenge, and combined these features to construct a generic text-based Captcha. To increase the difficulty of segmentation and recognition attacks on Captchas, we varied these significant features at each challenge within ranges potentially acceptable to human users. Our mechanism provides a solution for maximizing the robustness and usability of text-based CAPTCHAs simultaneously.


References

[1] Tamang T., and Bhattarakosol P., “Uncover impact factors of text-based CAPTCHA identification”, 7th International Conference on Computing and Convergence Technology (ICCCT), pp. 556-560, Dec. 2012.

[2] Thomas V.A., and Kaur K., “Cursor CAPTCHA - Implementing CAPTCHA Using Mouse Cursor”, Tenth International Conference on Wireless and Optical Communications Networks (WOCN), pp. 1-5, July 2013.

[3] Haichang G., Honggang L., Dan Y., Xiyang L., and Uwe A., “An Audio CAPTCHA to Distinguish Humans from Computers”, Third International Symposium on Electronic Commerce and Security (ISECS), pp. 265-269, July 2010.

[4] Sushma Yalamanchili, and M. Kameswara Rao, “A Framework for Devanagari Script-based Captcha”, International Journal of Advanced Information Technology (IJAIT), Vol. 1, No. 4, September 2011.

[5] Gossweiler R., Kamvar M., and Baluja S., “What's Up CAPTCHA? A CAPTCHA Based On Image Orientation”, WWW'09 Proceedings of the 18th International Conference on World Wide Web, pp. 841-850, April 20-24, 2009.

[6] Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford, “CAPTCHA: Using Hard AI Problems for Security”, EUROCRYPT, pp. 294-311, 2003.

[7] Rahman UR., Tomar D.S., and Das S., “Dynamic Image Based CAPTCHA”, International Conference on Communication Systems and Network Technologies (CSNT), pp. 90-94, May 2012.

[8] Huang V.S., Huang R., and Ming Chiang, “A DDoS Mitigation System with Multi-Stage Detection and Text-Based Turing Testing in Cloud Computing”, 27th International Conference on Advanced Information Networking and Applications Workshops (WAINA), pp. 655-662, March 2013.

[9] Pan Lei, and Zhou Yan, “Developing an Empirical Algorithm for Protecting Text-based CAPTCHAs against Segmentation Attacks”, 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 636-643, July 2013.

[10] Wei-Bin Lee, Che-Wei Fan, Kevin Ho, and Chyi-Ren Dow, “A CAPTCHA with Tips Related to Alphabets Upper or Lower Case”, Seventh International Conference on Broadband, Wireless Computing, Communication and Applications (BWCCA), pp. 458-461, Nov. 2012.

[11] Jing-Song Cui, Jing-Ting Mei, Wu-Zhou Zhang, Xia Wang, and Da Zhang, “A CAPTCHA Implementation Based on Moving Objects Recognition Problem”, International Conference on E-Business and E-Government (ICEE), pp. 1277-1280, May 2010.

AUTHOR

Mumtaz AL-Mukhtar is an associate professor at the Information Engineering College of AL-Nahrain University, Iraq. He received an M.Sc. degree in Technical Cybernetics from the Czech Technical University, Prague, in 1979. He received an M.Sc. in computer engineering in 1989, and in 2001 he earned his Ph.D. in the same field from the University of Technology, Iraq. He has advised tens of Ph.D. dissertations and Master's theses and has published widely in journals and conferences. His research interests include distributed systems, cloud computing, wireless sensor networks, social networks, pervasive computing, and mobile learning. He also holds an adjunct position in electrical and computer engineering at Michigan State University, USA.

Rana Riad received the B.Sc. degree in Computer and Software Engineering from AL-Mustansiriya University in 2005. This paper is part of her M.Sc. degree, which she is currently pursuing.