BREAKING AN IMAGE-BASED CAPTCHA
Michele Merler
Jacquilene Jacob
Online applications are inherently insecure
Growing number of hackers
The confidentiality of online systems should be guaranteed by Captchas
Image-based Captchas aim to overcome the issues of text-based ones (user-friendliness, robustness to attacks)
BUT… are they really secure?
Objective
Verify the actual security offered by image-based Captchas
Target System
VidoopCaptcha.com verification solution
The challenge is a combination of images from various categories
The user is asked to report the letters corresponding to the requested categories
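To make the challenge format concrete, here is a minimal sketch of how a breaker could turn per-image predictions into an answer string. It assumes, hypothetically, that each sub-image has already been reduced to the letter printed on it plus one classifier score per category, and that the answer is the letters of the requested categories in the order requested. The `answer_challenge` name, the data layout and the category names (`cat`, `car`, `tree`) are illustrative only, not VidoopCaptcha's actual interface.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one entry per sub-image,
# (letter read from the sub-image, {category: classifier score}).
SubImage = Tuple[str, Dict[str, float]]


def answer_challenge(sub_images: List[SubImage], requested: List[str]) -> str:
    """Return the letters of the sub-images that best match each requested category.

    Assumed rule: for every requested category, pick the not-yet-used sub-image
    with the highest score for that category (raises ValueError if the
    challenge asks for more categories than there are sub-images).
    """
    used = set()
    answer = []
    for category in requested:
        best_index = max(
            (i for i in range(len(sub_images)) if i not in used),
            key=lambda i: sub_images[i][1].get(category, float("-inf")),
        )
        used.add(best_index)
        answer.append(sub_images[best_index][0])
    return "".join(answer)


if __name__ == "__main__":
    grid = [
        ("K", {"cat": 0.9, "car": -0.2, "tree": -0.5}),
        ("P", {"cat": -0.1, "car": 0.7, "tree": 0.1}),
        ("W", {"cat": -0.4, "car": 0.0, "tree": 0.8}),
    ]
    print(answer_challenge(grid, ["car", "tree", "cat"]))  # prints "PWK"
```

Excluding already-used sub-images keeps two requested categories from mapping to the same letter when one image scores highly for both.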
Process Flow
Image Category Recognizer: Training Data → Feature Extraction → Train Classifier
Character Recognizer: Training data → Feature extraction → Train using kNN
Test Data → Preprocessing → Feature Extraction → Results
Data Acquisition
Training data
- Images downloaded from Flickr with a Perl script
- ~500 images per category
Test data
- 200 challenges downloaded from VidoopCaptcha with a Perl script
- 26 categories
- Manual ground truth annotation
Test Data Preprocessing
Image Splitting
- LoG-based edge extraction
- Horizontal and vertical dominant lines
Character Region Extraction
- Generalized Hough transform
- Evaluate consistency among sub-images
- Square character regions (side = sqrt(2)*radius) rescaled to 27x27 pixels
Character Recognition
- Conversion to grayscale and binarization
- 1-NN classifier trained on images of 20 popular fonts generated with the GD library
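The geometric part of these preprocessing steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the LoG filtering and the Generalized Hough transform are not reproduced, so the binary edge map and each detected circle (`cx`, `cy`, `radius`) are taken as inputs. The 80% edge-pixel threshold for a "dominant line", the BT.601 grayscale weights, nearest-neighbour rescaling, the mean-intensity binarization threshold and the "dark ink = 1" convention are all hypothetical choices; only the side = sqrt(2)*radius square and the 27x27 size come from the slide.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]


def to_gray(rgb: List[List[RGB]]) -> List[List[float]]:
    """Luminance conversion with ITU-R BT.601 weights (assumed; the slide only says 'grayscale')."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def dominant_lines(edges: List[List[int]], min_fraction: float = 0.8) -> Tuple[List[int], List[int]]:
    """Rows and columns of a binary edge map whose edge-pixel fraction is >= min_fraction.

    Long straight runs of edge pixels are treated as borders between sub-images.
    """
    height, width = len(edges), len(edges[0])
    rows = [y for y in range(height) if sum(edges[y]) >= min_fraction * width]
    cols = [x for x in range(width)
            if sum(edges[y][x] for y in range(height)) >= min_fraction * height]
    return rows, cols


def character_patch(gray: List[List[float]], cx: float, cy: float,
                    radius: float, size: int = 27) -> List[List[int]]:
    """Crop the square inscribed in a circle (side = sqrt(2) * radius),
    rescale it to size x size by nearest-neighbour sampling, and binarize it."""
    side = math.sqrt(2) * radius
    x0, y0 = cx - side / 2, cy - side / 2
    step = side / size
    height, width = len(gray), len(gray[0])
    patch = []
    for j in range(size):
        row = []
        for i in range(size):
            # sample the source pixel under the centre of output cell (i, j), clamped to the image
            x = min(width - 1, max(0, int(x0 + (i + 0.5) * step)))
            y = min(height - 1, max(0, int(y0 + (j + 0.5) * step)))
            row.append(gray[y][x])
        patch.append(row)
    threshold = sum(map(sum, patch)) / (size * size)  # assumption: mean-intensity threshold
    return [[1 if v < threshold else 0 for v in row] for row in patch]  # assumption: dark ink -> 1
```

The inscribed square is the largest axis-aligned square that fits inside a circle of radius r: its diagonal equals the diameter 2r, so its side is 2r/sqrt(2) = sqrt(2)*r.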
Character Recognizer
Training data → Feature extraction → Train using 1-NN
- Character training data: 64 images generated with the GD library for each uppercase character, using 20 common fonts
- Character feature extraction: simple binary vector containing all pixels of the image
- Character classification: 1-NN classifier
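A 1-NN classifier over binary pixel vectors is short enough to show in full. The sketch below assumes Hamming distance as the metric (the slides do not name one); on 0/1 vectors it equals the squared Euclidean distance, so the nearest neighbour is the same either way. The function names and the tiny 3x3 "glyphs" in the usage note are hypothetical stand-ins for the 27x27 font renderings.

```python
from typing import List, Sequence, Tuple


def flatten(patch: List[List[int]]) -> List[int]:
    """Feature vector: all pixels of the binarized patch, row by row."""
    return [v for row in patch for v in row]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nn_classify(train: List[Tuple[List[int], str]], query: List[int]) -> str:
    """1-NN: return the label of the training vector closest to the query (first one wins ties)."""
    _, label = min(train, key=lambda item: hamming(item[0], query))
    return label


if __name__ == "__main__":
    train = [
        (flatten([[0, 1, 0], [0, 1, 0], [0, 1, 0]]), "I"),
        (flatten([[1, 0, 0], [1, 0, 0], [1, 1, 1]]), "L"),
    ]
    noisy_i = flatten([[0, 1, 0], [0, 1, 0], [0, 1, 1]])
    print(nn_classify(train, noisy_i))  # prints "I"
```

Here the noisy query differs from the "I" template in one pixel and from the "L" template in five, so it is labelled "I".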
Feature Extraction
Features extracted from all 26 categories:
- Edge Histograms (6x8 regions)
- Color Moments (RGB, 3x3 regions)
- Color Histograms (32+32 bins in CbCr)
- GIST features (314-dimensional vectors)
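As an illustration of one of these descriptors, the sketch below computes color moments on a 3x3 grid of RGB regions. The slides do not say which moments are used; this assumes the common choice of mean, standard deviation and (signed cube root of the) third central moment per channel, giving 3 regions x 3 regions x 3 channels x 3 moments = 81 values. The image is a plain list of RGB rows and must be at least 3 pixels in each dimension so every region is non-empty.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]


def color_moments(image: List[List[RGB]], grid: int = 3) -> List[float]:
    """Mean, standard deviation and skewness of R, G, B in each cell of a grid x grid partition."""
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        for gx in range(grid):
            # pixel bounds of region (gx, gy); integer division spreads leftover pixels evenly
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(pixels)
            for c in range(3):
                values = [p[c] for p in pixels]
                mean = sum(values) / n
                var = sum((v - mean) ** 2 for v in values) / n
                third = sum((v - mean) ** 3 for v in values) / n
                skew = math.copysign(abs(third) ** (1 / 3), third)  # signed cube root
                features.extend([mean, math.sqrt(var), skew])
    return features
```

For a uniformly colored image every region yields its color as the mean and zeros for the other two moments.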
For each category, an SVM classifier is trained on all positive data, with negative data randomly sampled from the other categories
#positive samples = #negative samples
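The balanced one-vs-rest training sets described above can be built as follows. This is a sketch only: the SVM itself is not shown, and the resulting `X`, `y` could be passed to any SVM implementation (for instance scikit-learn's `SVC`). The function name, the `+1`/`-1` labels and the fixed random seed are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple


def balanced_training_set(features_by_category: Dict[str, List[List[float]]],
                          category: str, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Build (X, y) for one category's binary classifier.

    All samples of `category` are positives (+1); an equal number of negatives (-1)
    is drawn at random, without replacement, from all other categories. If the other
    categories hold fewer samples than there are positives, all of them are used.
    """
    rng = random.Random(seed)
    positives = features_by_category[category]
    pool = [x for other, xs in features_by_category.items() if other != category for x in xs]
    negatives = rng.sample(pool, min(len(positives), len(pool)))
    X = positives + negatives
    y = [1] * len(positives) + [-1] * len(negatives)
    return X, y
```

With about 500 images per category and 25 other categories, the negative pool is far larger than the positive set, so the random sample keeps each classifier's classes balanced instead of letting negatives outnumber positives roughly 25 to 1.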
Results
200 test challenges
Image splitting and character region detection accuracy: 100%
Character recognition accuracy: 96%
Results (200 test challenges)
[Chart: number of recognized images per feature type (Edge Hist, Color Mom, Color Hist, GIST) for single images, image pairs and image triplets; y-axis 0 to 200]
Average processing time per challenge: 12 sec.
Best breaking rate: 3%
We can break 9 image Captchas per hour (216/day)
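The throughput figures follow from the average processing time and the best breaking rate, assuming challenges are processed one at a time, back to back, around the clock:

```latex
\frac{3600\ \text{s/h}}{12\ \text{s/challenge}} = 300\ \text{challenges/h},
\qquad 300 \times 0.03 = 9\ \text{broken/h},
\qquad 9 \times 24 = 216\ \text{broken/day}.
```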
Results (200 test challenges)
[Chart: number of passed challenges per feature type (Edge Hist, Color Mom, Color Hist, GIST); y-axis 0 to 10]
Conclusions
Breaking image-based Captchas is possible
VidoopCaptcha is not 100% secure
Future directions:
- Try other features (SIFT + codebook)
- Obtain cleaner training data (performance suggests the training data is of poor quality)
- Improve speed and efficiency by moving to more efficient programming languages
- Test an online version of the Captcha breaker
Questions?