RE: Topic 3 - CAPTCHA literature review initial draft completed from Scott Hollier on 2016-12-23 (public-rqtf@w3.org from December 2016)

From: Scott Hollier <scott@hollier.info>
Date: Fri, 23 Dec 2016 05:30:57 +0000
To: "public-rqtf@w3.org" <public-rqtf@w3.org>
Message-ID: <BN6PR01MB2755950A27A99DAACE3B93C1DC950@BN6PR01MB2755.prod.exchangelabs.com>
To Jason et al.,

I had a look at David’s wiki, but my information is categorised differently and, as mentioned previously, it’ll be difficult for me to format the information with my screen reader in a way that is presented well enough for everyone to read.  As such, I’m still going to need some help.  While research is one of my stronger points, formatting wikis sadly is not!

I’ve provided all the references as text below, so if there’s someone in the RQTF, or perhaps even W3C more broadly, who is more capable at making things look good in a wiki than I am, everything I have is there.  Alternatively, I’m happy to provide the EndNote library I’ve used to collate the information if that is more helpful.

Also, could you please amend the minutes for the last meeting to reflect that I offered to do this research, and that those on the teleconference agreed I should proceed with it as an action item?

Thank you – references to follow.

Scott.


Topic 3 – CAPTCHA, see ‘notes’ for work plan category


Reference Type:  Journal Article
Author: Alexander, George
Year: 2015
Title: Tech: Siri For Your Living Room
Journal: Popular Mechanics
Volume: 192
Issue: 3
Pages: 20
Short Title: Tech: Siri For Your Living Room
ISSN: 00324558
Keywords: Biometrics
Passwords
Google Inc
Abstract:  The Eulogy: CAPTCHA. You know CAPTCHA, the hard-to-read jumble of letters, numbers, and obfuscating lines that supposedly confirm your humanity every time you go to buy Taylor Swift tickets? Besides being infuriating to real people, especially those using mobile devices, CAPTCHA is no longer fooling the robots. Succeeding it is Google's No CAPTCHA reCAPTCHA, a much simpler set of boxes you click in answer to basic prompts ("Pick your favorite color," "I'm not a robot") that started rolling out late last year.
Notes: mobile devices


Reference Type:  Journal Article
Author: Al-Naymat, Ghazi, Al-Kasassbeh, Mouhammd, Abu-Samhadanh, Nosaiba and Sakr, Sherif
Year: 2016
Title: CLASSIFICATION OF VOIP AND NON-VOIP TRAFFIC USING MACHINE LEARNING APPROACHES
Journal: Journal of Theoretical and Applied Information Technology
Volume: 92
Issue: 2
Pages: 403-414
Short Title: CLASSIFICATION OF VOIP AND NON-VOIP TRAFFIC USING MACHINE LEARNING APPROACHES
ISSN: 18173195
Abstract:  Enhancing network services and security can be achieved by performing network traffic classification identifying applications, which is one of the primary components of network operations and management. The traditional transport-layer and port-based classification approaches have some limitations in achieving accurate identification. In this paper, a real test bed is used to collect first-hand traffic dataset from five different VoIP and Non-VoIP applications that are used by majority of Internet community, namely Skype, YouTube, Yahoo Messenger, GTalk and PayPal. The collected data encompasses new features that have never been used before. In addition, a classification step is performed using off-the-shelf machine learning techniques, specifically Random Forest J48, meta.AdaBoost (J48) and MultiLayer Perceptron to classify the traffic. Our experimental results show that using the new features can dramatically improve the true positive ratio by up to 98% and this is significant outcome towards providing accurate traffic classification.
Notes: machine learning


Reference Type:  Journal Article
Author: Belk, Marios, Fidas, Christos, Germanakos, Panagiotis and Samaras, George
Year: 2015
Title: Do human cognitive differences in information processing affect preference and performance of CAPTCHA?
Journal: International Journal of Human-Computer Studies
Volume: 84
Pages: 1-18
Short Title: Do human cognitive differences in information processing affect preference and performance of CAPTCHA?
ISSN: 1071-5819
DOI: 10.1016/j.ijhcs.2015.07.002
Keywords: Human Interaction Proofs
Captcha
Cognitive Styles
Cognitive Processing Abilities
User Study
Abstract: A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a widely used security defense mechanism that is utilized by service providers to determine whether the entity interacting with their system is a human and not a malicious agent. Common design practices of current CAPTCHA schemes barely take into account cultural, contextual, and individual cognitive characteristics and abilities of users. Motivated by recent research which underpins the necessity for designing more user-friendly CAPTCHA, this paper investigates the effect of users’ cognitive styles and cognitive processing abilities towards preference and task performance of CAPTCHA challenges. In the frame of the reported research, two user studies were conducted. The first study (n=131) explored the effect of users’ cognitive styles (Verbal/Imager) on user preference and task performance of two complementary types of CAPTCHA mechanisms: text-recognition and image-recognition. The second study (n=125) explored the effect of users’ cognitive processing abilities (speed of processing, controlled attention, working memory capacity) on task performance in regards with different levels of complexity of both text-recognition and image-recognition CAPTCHA. Analysis of results revealed interaction effects of users’ cognitive processing characteristics towards preference and performance of CAPTCHA, suggesting that individual differences at such an intrinsic level are important to be considered for designing more usable and user-centric CAPTCHA challenges. • We study the effects of cognitive factors on CAPTCHA preference and performance. • Verbals prefer and solve significantly faster text CAPTCHA. • There is a growing trend of Imagers preferring and solving faster image CAPTCHA. • Users with enhanced cognitive processing abilities solve CAPTCHA faster. • Implications about the design of personalized CAPTCHA are discussed.
Notes: mobile devices


Reference Type:  Generic
Author: Bursztein, E., Martin, M. and Mitchell, J. C.
Year: 2011
Title: Text-based CAPTCHA strengths and weaknesses
Pages: 125-137
Short Title: Text-based CAPTCHA strengths and weaknesses
ISBN/ISSN: 15437221
DOI: 10.1145/2046707.2046724
Keywords: Captcha
Knn Classifier
Machine Learning
Reverse Turing Test
Svm
Vision Algorithm
Notes: machine learning


Reference Type:  Journal Article
Author: Catuogno, Luigi and Galdi, Clemente
Year: 2014
Title: On user authentication by means of video events recognition
Journal: Journal of Ambient Intelligence and Humanized Computing
Volume: 5
Issue: 6
Pages: 909-918
Short Title: On user authentication by means of video events recognition
ISSN: 1868-5137
DOI: 10.1007/s12652-014-0248-5
Keywords: Graphical password
Authentication
Human cryptography
Abstract: Graphical password schemes have been widely analyzed in the last couple of decades. Typically such schemes are not resilient to adversaries who are able to collect a considerable amount of session transcripts, and can process them automatically in order to extract the secret. In this paper we discuss a possible enhancement to graphical passwords aiming at making infeasible to the attacker to automatically process the collected transcripts. In particular, we investigate the possibility of replacing static graphical challenges with on-the-fly edited videos. In our approach, the system challenges the user by showing her a short film containing a number of pre-defined pass-events and the user replies with the proof that she recognized such events. We present a proof-of-concept prototype, FilmPW, and discuss some issues related to event life-cycle management. Our preliminary experiments show that such an authentication mechanism is well accepted by users and achieves low error rates.
Notes: machine learning


Reference Type:  Generic
Author: Cetin, Cagri
Year: 2015
Title: Design, Testing and Implementation of a New Authentication Method Using Multiple Devices
Secondary Author: Ligatti, Jay, Goldgof, Dmitry and Liu, Yao
Publisher: ProQuest Dissertations Publishing
Short Title: Design, Testing and Implementation of a New Authentication Method Using Multiple Devices
Keywords: Computer Science
Applied Sciences
Access Control
Authentication Protocols
Mobile Devices
Security
Verification
Abstract: Authentication protocols are very common mechanisms to confirm the legitimacy of someone's or something's identity in digital and physical systems. This thesis presents a new and robust authentication method based on users' multiple devices. Due to the popularity of mobile devices, users are becoming more likely to have more than one device (e.g., smartwatch, smartphone, laptop, tablet, smart-car, smart-ring, etc.). The authentication system presented here takes advantage of these multiple devices to implement authentication mechanisms. In particular, the system requires the devices to collaborate with each other in order for the authentication to succeed. This new authentication protocol is robust against theft-based attacks on single device; an attacker would need to steal multiple devices in order to compromise the authentication system. The new authentication protocol comprises an authenticator and at least two user devices, where the user devices are associated with each other. To perform an authentication on a user device, the user needs to respond a challenge by using his/her associated device. After describing how this authentication protocol works, this thesis will discuss three different versions of the protocol that have been implemented. In the first implementation, the authentication process is performed by using two smartphones. Also, as a challenge, a QR code is used. In the second implementation, instead of using a QR code, NFC technology is used for challenge transmission. In the last implementation, the usability with different platforms is exposed. Instead of using smartphones, a laptop computer and a smartphone combination is used. Furthermore, the authentication protocol has been verified by using an automated protocol-verification tool to check whether the protocol satisfies authenticity and secrecy properties. Finally, these implementations are tested and analyzed to demonstrate the performance variations over different versions of the protocol.
Notes: machine learning


Reference Type:  Journal Article
Author: Conti, Mauro, Guarisco, Claudio and Spolaor, Riccardo
Year: 2015
Title: CAPTCHaStar! A novel CAPTCHA based on interactive shape discovery
Short Title: CAPTCHaStar! A novel CAPTCHA based on interactive shape discovery
Keywords: Computer Science - Human-Computer Interaction
Abstract: Over the last years, most websites on which users can register (e.g., email providers and social networks) adopted CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) as a countermeasure against automated attacks. The battle of wits between designers and attackers of CAPTCHAs led to current ones being annoying and hard to solve for users, while still being vulnerable to automated attacks. In this paper, we propose CAPTCHaStar, a new image-based CAPTCHA that relies on user interaction. This novel CAPTCHA leverages the innate human ability to recognize shapes in a confused environment. We assess the effectiveness of our proposal for the two key aspects for CAPTCHAs, i.e., usability, and resiliency to automated attacks. In particular, we evaluated the usability, carrying out a thorough user study, and we tested the resiliency of our proposal against several types of automated attacks: traditional ones; designed ad-hoc for our proposal; and based on machine learning. Compared to the state of the art, our proposal is more user friendly (e.g., only some 35% of the users prefer current solutions, such as text-based CAPTCHAs) and more resilient to automated attacks.
Notes: machine learning


Reference Type:  Generic
Author: Dantala, Pradeep
Year: 2011
Title: Authentication for multi-located parties and wireless ad hoc networks
Secondary Author: Kak, Subhash, Cline, David and Thomas, Johnson
Publisher: ProQuest Dissertations Publishing
Short Title: Authentication for multi-located parties and wireless ad hoc networks
Keywords: Computer Science
Applied Sciences
Ad Hoc Networks
Authentication
Authentication Agents
Multi-Located Parties
Three Stage Protocol
Wireless Networks
Abstract: This thesis present a new authentication protocol for multi-located parties which uses an agent based scheme that divides the message into two parts together with a key distribution center to ensure stronger authentication. It also presents a protocol for wireless ad hoc networks to combat spamming and reduce traffic overload. The appropriate number of authentication agents was calculated for a wireless ad hoc network. Simulations were run for networks of 200, 1000, 2000 and 4000 nodes and it was found that 0.075 n (n is the number of nodes in the network) authentication agents work well to distribute the load evenly amongst them.
Notes: machine learning


Reference Type:  Generic
Author: Datta, Ritendra
Year: 2009
Title: Semantics and aesthetics inference for image search: Statistical learning approaches
Secondary Author: Wang, James Z. and Li, Jia
Publisher: ProQuest Dissertations Publishing
Short Title: Semantics and aesthetics inference for image search: Statistical learning approaches
Keywords: Artificial Intelligence
Computer Science
Applied Sciences
Automatic Image Annotation
Captcha
Image Recognition
Image Search
Machine Learning
Abstract: The automatic inference of image semantics is an important but highly challenging research problem whose solutions can greatly benefit content-based image search and automatic image annotation. In this thesis, I present algorithms and statistical models for inferring image semantics and aesthetics from visual content, specifically aimed at improving real-world image search. First, a novel approach to automatic image tagging is presented which furthers the state-of-the-art in both speed and accuracy. The direct use of automatically generated tags in real-world image search is then explored, and its efficacy demonstrated experimentally. An assumption which makes most annotation models misrepresent reality is that the state of the world is static, whereas it is fundamentally dynamic. I explore learning algorithms for adapting automatic tagging to different scenario changes. Specifically, a meta-learning model is proposed which can augment a black-box annotation model to help provide adaptability for personalization, time evolution, and contextual changes. Instead of retraining expensive annotation models, adaptability is achieved through efficient incremental learning of only the meta-learning component. Large scale experiments convincingly support this approach. In image search, when semantics alone yields many matches, one way to rank images further is to look beyond semantics and consider visual quality. I explore the topic of data-driven inference of aesthetic quality of images. Owing to minimal prior art, the topic is first explored in detail. Then, methods for extracting a number of high-level visual features, presumed to have correlation with aesthetics, are presented. Through feature selection and machine learning, an aesthetics inference model is trained and found to perform moderately on real-world data. 
The aesthetics-correlated visual features are then used in the problem of selecting and eliminating images at the high and low extremes of the aesthetics scale respectively, using a novel statistical model. Experimentally, this approach is found to work well in visual quality based filtering. Finally, I explore the use of image search techniques for designing a novel image-based CAPTCHA, a Web security test aimed at distinguishing humans from machines. Assuming image search metrics to be potential attack tools, they are used in the loop to design attack-resistant CAPTCHAs.
Notes: machine learning


Reference Type:  Generic
Author: Djalaliev, Peter
Year: 2013
Title: Mitigating botnet-based DDoS attacks against web servers
Secondary Author: Lee, Adam
Publisher: ProQuest Dissertations Publishing
Short Title: Mitigating botnet-based DDoS attacks against web servers
Keywords: Computer Science
Applied Sciences
Botnets
Distributed Denial-of-Service
Federated Authentication
Hardware Tokens
Kerberos Protocol
Abstract: Distributed denial-of-service (DDoS) attacks have become wide-spread on the Internet. They continuously target retail merchants, financial companies and government institutions, disrupting the availability of their online resources and causing millions of dollars of financial losses. Software vulnerabilities and proliferation of malware have helped create a class of application-level DDoS attacks using networks of compromised hosts (botnets). In a botnet-based DDoS attack, an attacker orders large numbers of bots to send seemingly regular HTTP and HTTPS requests to a web server, so as to deplete the server's CPU, disk, or memory capacity. Researchers have proposed client authentication mechanisms, such as CAPTCHA puzzles, to distinguish bot traffic from legitimate client activity and discard bot-originated packets. However, CAPTCHA authentication is vulnerable to denial-of-service and artificial intelligence attacks. This dissertation proposes that clients instead use hardware tokens to authenticate in a federated authentication environment. The federated authentication solution must resist both man-in-the-middle and denial-of-service attacks. The proposed system architecture uses the Kerberos protocol to satisfy both requirements. This work proposes novel extensions to Kerberos to make it more suitable for generic web authentication. A server could verify client credentials and blacklist repeated offenders. Traffic from blacklisted clients, however, still traverses the server's network stack and consumes server resources. This work proposes Sentinel, a dedicated front-end network device that intercepts server-bound traffic, verifies authentication credentials and filters blacklisted traffic before it reaches the server. Using a front-end device also allows transparently deploying hardware acceleration using network co-processors. Network co-processors can discard blacklisted traffic at the hardware level before it wastes front-end host resources. 
We implement the proposed system architecture by integrating existing software applications and libraries. We validate the system implementation by evaluating its performance under DDoS attacks consisting of floods of HTTP and HTTPS requests.
Notes: machine learning


Reference Type:  Generic
Author: Golle, Philippe
Year: 2008
Title: Machine learning attacks against the asirra CAPTCHA
Pages: 535-542
Short Title: Machine learning attacks against the asirra CAPTCHA
ISBN/ISSN: 15437221
DOI: 10.1145/1455770.1455838
Keywords: Captcha
Classifier
Machine Learning
Reverse Turing Test
Support Vector Machine
Notes: machine learning


Reference Type:  Generic
Author: Hayata, Tomohiro
Year: 2012
Title: Developing a secure and usable user-cognitive authentication scheme
Secondary Author: Beheshti, Mohsen
Publisher: ProQuest Dissertations Publishing
Short Title: Developing a secure and usable user-cognitive authentication scheme
Keywords: Computer Science
Applied Sciences
Abstract: Today's online services are threatened by malicious programs which would illegally gain credentials at the expense of legitimate human users. As a human authentication test, CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) have many applications for preventing them. The existing text-based CAPTCHAs are not safe as computer-vision techniques develop rapidly. The modified solutions are still in the stage of infancy because they are either hard to solve and costly to implement or easily trespassed due to compromised security. CAPTCHAs are supposed to be easy for humans but difficult for computers. The project design for a novel authentication scheme assumes no need for costly modifications or investment while maximizing rational thinking and creativity to ensure user-cognitive challenge response test whose background protocols and upfront user interface provide secure and usable real-time protection. This research is intended to be the basis for Authentication-as-a-Service for today's web services.
Notes: mobile devices


Reference Type:  Journal Article
Author: Hernández‐Castro, Carlos Javier, Barrero, David F. and R‐Moreno, María D.
Year: 2016
Title: Machine learning and empathy: the Civil Rights CAPTCHA
Journal: Concurrency and Computation: Practice and Experience
Volume: 28
Issue: 4
Pages: 1310-1323
Short Title: Machine learning and empathy: the Civil Rights CAPTCHA
ISSN: 1532-0626
DOI: 10.1002/cpe.3632
Keywords: Captcha
Hip
Machine Learning
Wordnet
Artificial Intelligence
Abstract: Human interactive proofs (HIPs) are a basic security measure on the Internet to avoid automatic attacks. There is an ongoing effort to find a HIP that is secure enough yet easy for humans. Recently, a new HIP has been designed aiming at higher security: the Civil Rights Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA). It employs the empathy capacity of humans to further strengthen Securimage, a well‐known text CAPTCHA. In this paper, we analyze it from a security perspective, finding fundamental design flaws. Using several well‐known machine learning (ML) algorithms, we analyze to what extent these flaws affect its security. We discover that thanks to them, we can create a successful side‐channel attack. This attack is able to correctly solve the HIP on 20.7% of occasions, much more than enough to consider it broken. Thus, we show that there is no need to solve the problem of optical character recognition nor empathy analysis for computers to break this particular HIP. ML can be successfully used to break a HIP that uses both with a side‐channel attack. This security analysis can be applied to other HIPs. It will allow to test whether they are too much information by unexpected ways, given non‐evident design flaws. Copyright © 2015 John Wiley & Sons, Ltd.
Notes: machine learning


Reference Type:  Journal Article
Author: Hidalgo, José María Gómez and Alvarez, Gonzalo
Year: 2011
Title: CAPTCHAs: An Artificial Intelligence Application to Web Security
Journal: Advances In Computers
Volume: 83
Pages: 109-181
Short Title: CAPTCHAs: An Artificial Intelligence Application to Web Security
ISSN: 0065-2458
DOI: 10.1016/B978-0-12-385510-7.00003-5
Abstract: Nowadays, it is hard to find a popular Web site with a registration form that is not protected by an automated human proof test which displays a sequence of characters in an image, and requests the user to enter the sequence into an input field. This security mechanism is based on the Turing Test—one of the oldest concepts in Artificial Intelligence—and it is most often called Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA). This kind of test has been conceived to prevent the automated access to an important Web resource, for example, a Web mail service or a Social Network. There are currently hundreds of these tests, which are served millions of times a day, thus involving a huge amount of human work. On the other side, a number of these tests have been broken, that is, automated programs designed by researchers, hackers, and spammers have been able to automatically serve the correct answer. In this chapter, we present the history and the concept of CAPTCHAs, along with their applications and a wide review of their instantiations. We also discuss their evaluation, both from the user and the security perspectives, including usability, attacks, and countermeasures. We expect this chapter provides to the reader a good overview of this interesting field.
Notes: machine learning


Reference Type:  Generic
Author: Khanna, Sumit
Year: 2009
Title: Breaking the Multi Colored Box: A Study of CAPTCHA
Secondary Author: Harris, Billy, Kizza, Joseph and Thompson, Jack
Publisher: ProQuest Dissertations Publishing
Short Title: Breaking the Multi Colored Box: A Study of CAPTCHA
Keywords: Computer Science
Applied Sciences
Captcha
Filtering
Ocr
Recognition
Abstract: Communication is faster than ever. Innovations in low cost network computing have brought an era in which people can effortlessly and instantaneously view and post opinions collaboratively with others across the world. With such an infrastructure of public message boards, chat rooms and instant messaging systems, there is also a large potential for abuse by people wishing to capitalize on such open services by posting unsolicited advertisements. An entire industry has been constructed around the prevention of unsolicited electronic advertisements (SPAM). This thesis examines various techniques for preventing SPAM, focusing on Completely Automated Public Turing Tests to Tell Computers and Humans Apart (CAPTCHA), a challenge/response technique where an image is displayed with text that is heavily distorted. It also examines the feasibility of breaking CAPTCHA programmatically, alternatives to CAPTCHA based on filtering, improvements to CAPTCHA using photo recognition and avoiding the need for CAPTCHA using naïve approaches.
Notes: machine learning


Reference Type:  Journal Article
Author: Kim, Jonghak, Kim, Sangtae, Yang, Joonhyuk, Ryu, Jung-hee and Wohn, KwangYun
Year: 2014
Title: FaceCAPTCHA: a CAPTCHA that identifies the gender of face images unrecognized by existing gender classifiers
Journal: Multimedia Tools and Applications: An International Journal
Volume: 72
Issue: 2
Pages: 1215-1237
Short Title: FaceCAPTCHA: a CAPTCHA that identifies the gender of face images unrecognized by existing gender classifiers
ISSN: 1380-7501
DOI: 10.1007/s11042-013-1422-z
Keywords: CAPTCHA
Crowdsourcing
Gender classification
Human computation
Image tagging
Web application
Abstract: Computers tend to fail to classify human faces by gender, especially upon changes in viewpoint or upon occlusion that make it more difficult to extract the necessary image features. In contrast, humans are good at identifying gender but have difficulties in dealing with a large number of images. Accounting for this gap, we proposed FaceCAPTCHA, a novel image-based CAPTCHA that asks users to identify the gender of face images whose gender cannot be recognized by computers (gender-indiscernible faces). By converting the manual gender classification task into a CAPTCHA test, FaceCAPTCHA was designed to not only continuously identify the gender of gender-indiscernible faces but also differentiate between humans and computers and generate new test images. Our user studies showed that FaceCAPTCHA reliably identifies gender-indiscernible faces. A single eight-image FaceCAPTCHA test was completed in 12.41 s on average with a human success rate of 86.51 %, which can be further increased by filtering error-prone test images. In contrast, the probability of passing a FaceCAPTCHA test by random guessing was 0.006 %. We could therefore conclude that FaceCAPTCHA is robust against malicious attacks and easy enough for practical use.
Notes: machine learning


Reference Type:  Journal Article
Author: Kim, Jong-Woo, Chung, Woo-Keun and Cho, Hwan-Gue
Year: 2010
Title: A new image-based CAPTCHA using the orientation of the polygonally cropped sub-images
Journal: The Visual Computer: International Journal of Computer Graphics
Volume: 26
Issue: 6
Pages: 1135-1143
Short Title: A new image-based CAPTCHA using the orientation of the polygonally cropped sub-images
ISSN: 0178-2789
DOI: 10.1007/s00371-010-0469-3
Keywords: Image orientation
CAPTCHA
Machine learning
Perceptual recognition
Abstract: With an increasing number of automated software bots and automated scripts that exploit public web services, the user is commonly required to solve a Turing test problem, namely a Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), before they are allowed to use web services. As a solution of CAPTCHAs, the Image Orientation CAPTCHA is based on the hardness of image orientation. So, there is a close correlation between image orientation detection and the performance of image orientation CAPTCHA. In this paper, we introduce a reliable and effective CAPTCHA based on the orientation of cropped sub-images. Also, we try to investigate the key spatial features of sub-image orientation detection such as crop size, major color components, and the number of orientations. So, the goal of this paper is discovering the relationship between these spatial features and the detecting sub-image orientation by human manual work and machine learning-based softwares, respectively. Our experimental results enable our CAPTCHA system to filter out any sub-images difficult for human. Therefore, our experiment showed that exploiting the key spatial features of cropped sub-images is very useful to design a new image-based CAPTCHA system.
Notes: machine learning


Reference Type:  Generic
Author: Kluever, Kurt
Year: 2008
Title: Evaluating the usability and security of a video CAPTCHA
Secondary Author: Zanibbi, Richard, Butler, Zack and Canosa, Roxanne
Publisher: ProQuest Dissertations Publishing
Short Title: Evaluating the usability and security of a video CAPTCHA
Keywords: Computer Science
Applied Sciences
Captcha
Completely Automated Public Turing Test to Tell Computers and Humans Apart
Hip
Human Interactive Proof
Video Tagging
Youtube
Abstract: A CAPTCHA is a variation of the Turing test, in which a challenge is used to distinguish humans from computers ("bots") on the internet. They are commonly used to prevent the abuse of online services. CAPTCHAs discriminate using hard artificial intelligence problems: the most common type requires a user to transcribe distorted characters displayed within a noisy image. Unfortunately, many users find them frustrating and break rates as high as 60% have been reported (for Microsoft's Hotmail). We present a new CAPTCHA in which users provide three words ("tags") that describe a video. A challenge is passed if a user's tag belongs to a set of automatically generated ground-truth tags. In an experiment, we were able to increase human pass rates for our video CAPTCHAs from 69.7% to 90.2% (184 participants over 20 videos). Under the same conditions, the pass rate for an attack submitting the three most frequent tags (estimated over 86,368 videos) remained nearly constant (5% over the 20 videos, roughly 12.9% over a separate sample of 5146 videos). Challenge videos were taken from YouTube.com. For each video, 90 tags were added from related videos to the ground-truth set; security was maintained by pruning all tags with a frequency ≥ 0.6%. Tag stemming and approximate matching were also used to increase human pass rates. Only 20.1% of participants preferred text-based CAPTCHAs, while 58.2% preferred our video-based alternative. Finally, we demonstrate how our technique for extending the ground truth tags allows for different usability/security trade-offs, and discuss how it can be applied to other types of CAPTCHAs.
Notes: machine learning


Reference Type:  Generic
Author: Korayem, Mohammed
Year: 2015
Title: Social and egocentric image classification for scientific and privacy applications
Secondary Author: Crandall, David, Bollen, Johan, Kapadia, Apu and Radivojac, Predrag
Publisher: ProQuest Dissertations Publishing
Short Title: Social and egocentric image classification for scientific and privacy applications
Keywords: Computer Science
Applied Sciences
Deep Learning
Ecology
Image Classification
Large Scale
Mining Photos
Vision for Privacy
Abstract: Image classification is a fundamental computer vision problem with decades of related work. It is a complex task and is a crucial part of many applications. The vision community has created many standard data sets for object recognition and image classification. While these benchmarks are created with the goal of being a realistic, representative sample of the visual world, they often contain implicit biases relating to how the images were selected (as well as which were ignored). In this thesis, we present two lines of work that apply and test image classification in much more realistic problems. We present systems that utilize image classification, deep learning and probabilistic models on large-scale, realistic, unconstrained, and automatically collected datasets. These capture a wider breadth of life on Earth than conventional datasets due to their scale and diversity. Besides these new datasets and the image classification systems developed, the novel applications we present are interesting in their own right. The first line of work explores the potential of social media imagery to power large-scale scientific analysis. We focus on two prototype problems motivated by ecology: automatically detecting snowfall and vegetation. Using over 200 million Flickr images, each representing a rich description of the world at a specific time and place. Our results indicate that a combination of text mining techniques and image classification can produce high quality data for scientists from large-scale, noisy social images. The second line of work addresses privacy concerns related to wearable cameras, by automatically detecting private imagery. We present two systems that focus on different aspects of what makes an image private. The first is PlaceAvoider, which recognizes images taken in sensitive places such as bedrooms or bathrooms. The second is ObjectAvoider, which tries to detect key objects that may signal private content.
Notes: machine learning


Reference Type:  Generic
Author: Korayem, M., Mohamed, A. A., Crandall, D. and Yampolskiy, R. V.
Year: 2012
Title: Learning Visual Features for the Avatar Captcha Recognition Challenge
Volume: 2
Pages: 584-587
Short Title: Learning Visual Features for the Avatar Captcha Recognition Challenge
DOI: 10.1109/ICMLA.2012.200
Keywords: Components, Circuits, Devices and Systems
Computing and Processing
Bioengineering
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Abstract: Captchas are frequently used on the modern world wide web to differentiate human users from automated bots by giving tests that are easy for humans to answer but difficult or impossible for algorithms. As artificial intelligence algorithms have improved, new types of Captchas have had to be developed. Recent work has proposed a new system called Avatar Captcha, in which a user is asked to distinguish between facial images of real humans and those of avatars generated by computer graphics. This novel system has been proposed on the assumption that this Captcha is very difficult for computers to break. In this paper we test a variety of modern visual features and learning algorithms on this avatar recognition task. We find that relatively simple techniques can perform very well on this task, and in some cases can even surpass human performance.
Notes: machine learning


Reference Type:  Journal Article
Author: Le, Tuan Anh, Baydin, Atilim Gunes and Wood, Frank
Year: 2016
Title: Inference Compilation and Universal Probabilistic Programming
Short Title: Inference Compilation and Universal Probabilistic Programming
Keywords: Computer Science - Artificial Intelligence
Computer Science - Learning
Statistics - Machine Learning
68T37, 68T05
G.3
I.2.6
Abstract: We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.
Notes: machine learning


Reference Type:  Journal Article
Author: Li, Qiujie
Year: 2015
Title: A computer vision attack on the ARTiFACIAL CAPTCHA
Journal: Multimedia Tools and Applications
Volume: 74
Issue: 13
Pages: 4583-4597
Short Title: A computer vision attack on the ARTiFACIAL CAPTCHA
ISSN: 1380-7501
DOI: 10.1007/s11042-013-1823-z
Keywords: CAPTCHA
ARTiFACIAL
Attack
Computer vision
Abstract: Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a reverse Turing test that is used to differentiate bots from humans. Text CAPTCHAs have been widely used in commercial applications, but most of the text CAPTCHAs have been successfully attacked. An alternative is to develop image CAPTCHAs to replace text CAPTCHAs. ARTiFACIAL (Automated Reverse Turing test using FACIAL features) Rui and Liu (2003) is an image CAPTCHA system based on detecting human face and facial features and claimed to be attack-resistant and user-friendly. This paper proposes a compute vision attack on ARTiFACIAL. By carefully analyzing the limitations of face and facial feature detectors that ARTiFACIAL exploits, tailor-made attacking algorithm is designed instead of using general face and facial feature detectors directly. When tested with the 800 ARTiFACIAL challenges, the overall success rate of the attacking algorithm is 18.0 %, which is significantly higher than the estimate of 0.0006 % given in Rui and Liu (2003) for computer vision attacks. It takes an average time 1.47s for a PC with 3.2GHz Intel P4 and 2GB memory to pass an ARTiFACIAL test, compared with 14s for a human subject given in Rui and Liu (2003). From the successful attack, useful lessons for guiding the design of image CAPTCHAs are derived to advance the current understanding of the design of image CAPTCHAs and lead to more secure design.
Notes: machine learning


Reference Type:  Journal Article
Author: Miller, James and Roshanbin, Narges
Year: 2016
Title: Enhancing CAPTCHA Security Using Interactivity, Dynamism, and Mouse Movement Patterns
Journal: International Journal of Systems and Service-Oriented Engineering (IJSSOE)
Volume: 6
Issue: 1
Pages: 17-36
Short Title: Enhancing CAPTCHA Security Using Interactivity, Dynamism, and Mouse Movement Patterns
ISSN: 1947-3052
DOI: 10.4018/IJSSOE.2016010102
Keywords: Captchas
Interactive Captchas
Matching Task
Mouse Dynamics
Security
Abstract: Many existing CAPTCHAs require users to identify characters in a static image and match them with their counterparts in another image. Requiring intelligent human interaction in the matching task of these CAPTCHAs will pose a second challenge, which is straightforward for human users but difficult to emulate for Bots. In this paper, the authors develop several interactive matching tasks involving dynamic elements and demonstrate their impact on CAPTCHA security and usability in a series of tests and user studies. Their tests indicate that requiring intelligent human interaction can substantially decrease the likelihood of a CAPTCHA being broken in addition to making an attack computationally expensive. The authors' results provide both a security and a usability benchmark for the development of interactive dual-challenge CAPTCHAs. Their proposed findings from users' mouse movement data analysis can be readily incorporated in several types of existing CAPTCHA to enhance their security.
Notes: mobile devices


Reference Type:  Journal Article
Author: Nakaguro, Yoichi, Dailey, Matthew N., Marukatat, Sanparith and Makhanov, Stanislav S.
Year: 2013
Title: Defeating line-noise CAPTCHAs with multiple quadratic snakes
Journal: Computers & Security
Volume: 37
Pages: 91-110
Short Title: Defeating line-noise CAPTCHAs with multiple quadratic snakes
ISSN: 0167-4048
DOI: 10.1016/j.cose.2013.05.003
Keywords: Captcha
Segmentation
Gvf Snake
Opencv
Ocr
Abstract: Optical character recognition (OCR) is one of the fundamental problems in artificial intelligence and image processing, but recent progress in OCR represents a security challenge for Web sites that throttle requests with image based CAPTCHAs (Completely Automated Public Turing Tests to Tell Computers and Humans Apart). A CAPTCHA is challenge-response test placed within web forms to determine whether the user is human. Unfortunately, algorithms capable of solving image based CAPTCHAs can be used to create spam accounts and design malicious denial of service (DoS) attacks, causing financial and social damage. The problem of defeating digital image CAPTCHAs is thus twofold. On the one hand, it is an important problem in artificial intelligence and image processing. On the other hand, publicly available CAPTCHAs that are not tested against state of the art machine recognition algorithms may make the systems vulnerable to attack by software bots. This paper considers a very important subclass of text CAPTCHAs, those characterized by salt and pepper noise combined with line (curve) noise. Thus far, attacks on CAPTCHAs with this type of noise have used relatively simple image processing methods with some success, but state-of-the-art segmentation methods have not been fully exploited. In this paper, we propose and benchmark two strong segmentation methods. The first method is a modification of a multiple quadratic snake proposed for road extraction from satellite images. The second competing method is a boundary tracing routine available in the OpenCV open source library. A first numerical experiment indicates excellent accuracy for both methods. A second experiment on human recognition shows that the CAPTCHAs used in the study are already near the threshold of being too hard for humans. Finally, a third numerical experiment presents a more difficult set of CAPTCHAs with the addition of anti-binarization methods.
The snake-based method is shown to be more resilient to anti-binarization schemes than boundary tracing and state-of-the-art projection-based attacks on CAPTCHAs. Since CAPTCHAs corrupted by small line noise are shown to be difficult for humans and relatively easy for our algorithm, CAPTCHA designers should introduce more challenging distortions into their CAPTCHAs, lest the security of systems based on them be compromised.
Notes: machine learning


Reference Type:  Journal Article
Author: Nayeem, Mir, Akand, Mamunur, Sakib, Nazmus and Kabir, Wasi
Year: 2014
Title: Human Cognition in Automated Truing Test Design
Journal: International Journal of Software Science and Computational Intelligence (IJSSCI)
Volume: 6
Issue: 4
Pages: 1-19
Short Title: Human Cognition in Automated Truing Test Design
ISSN: 1942-9045
DOI: 10.4018/ijssci.2014100101
Keywords: Captcha
Cognitive Psychology
Context
Conversation
Human Interactive Proofs (Hips)
Ocr
Usability
Web Services
Abstract: Nowadays, many services in the internet including Email, search engine, social networking are provided with free of charge due to enormous growth of web users. With the expansion of Web services, denial of service (DoS) attacks by malicious automated programs (e.g., web bots) is becoming a serious problem of web service accounts. A HIP, or Human Interactive Proofs, is a human authentication mechanism that generates and grades tests to determine whether the user is a human or a malicious computer program. Unfortunately, the existing HIPs tried to maximize the difficulty for automated programs to pass tests by increasing distortion or noise. Consequently, it has also become difficult for potential users too. So there is a tradeoff between the usability and robustness in designing HIP tests. In their propose technique the authors tried to balance the readability and security by adding contextual information in the form of natural conversation without reducing the distortion and noise. In the result section, a microscopic large-scale user study was conducted involving 110 users to investigate the actual user views compare to existing state of the art CAPTCHA systems like Google's reCAPTCHA and Microsoft's CAPTCHA in terms of usability and security and found the authors' system capable of deploying largely over internet.
Notes: machine learning


Reference Type:  Journal Article
Author: Nguyen, Vu Duc, Chow, Yang-Wai and Susilo, Willy
Year: 2014
Title: On the security of text-based 3D CAPTCHAs
Journal: Computers & Security
Volume: 45
Pages: 84-99
Short Title: On the security of text-based 3D CAPTCHAs
ISSN: 0167-4048
DOI: 10.1016/j.cose.2014.05.004
Keywords: 3d Captcha
Character Extraction
Optical Character Recognition
Automated Attack
Security
Abstract: CAPTCHAs have become a standard security mechanism that are used to deter automated abuse of online services intended for humans. However, many existing CAPTCHA schemes to date have been successfully broken. As such, a number of CAPTCHA developers have explored alternative methods of designing CAPTCHAs. 3D CAPTCHAs is a design alternative that has been proposed to overcome the limitations of traditional CAPTCHAs. These CAPTCHAs are designed to capitalize on the human visual system's natural ability to perceive 3D objects from an image. The underlying security assumption is that it is difficult for a computer program to identify the 3D content. This paper investigates the robustness of text-based 3D CAPTCHAs. In particular, we examine three existing text-based 3D CAPTCHA schemes that are currently deployed on a number of websites. While the direct use of Optical Character Recognition (OCR) software is unable to correctly solve these text-based 3D CAPTCHA challenges, we highlight certain patterns in the 3D CAPTCHAs can be exploited to identify important information within the CAPTCHA. By extracting this information, this paper demonstrates that automated attacks can be used to solve these 3D CAPTCHAs with a high degree of success. • The robustness of text-based 3D CAPTCHAs against automated attacks is investigated. • Characteristics and vulnerabilities of three existing 3D CAPTCHAs are identified. • Invariant features in the design of 3D CAPTCHAs can be exploited. • We demonstrate that 3D CAPTCHAs are no more secure than their 2D counterparts.
Notes: machine learning


Reference Type:  Journal Article
Author: Nitesh Kumar, Chaudhary and Shraddha, Srivastav
Year: 2014
Title: SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA
Journal: Signal & Image Processing
Volume: 05
Issue: 05
Pages: 27-33
Short Title: SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA
ISSN: 2229-3922
DOI: 10.5121/sipij.2014.5503
Keywords: Captcha
Gaussian Blur
Image Transformations
Optical Character Recognition (Ocr)
Abstract: An efficient, and intuitive algorithm is presented for the identification of speakers from a long dataset (like YouTube long discussion, Cocktail party recorded audio or video). The goal of automatic speaker identification is to identify the number of different speakers and prepare a model for that speaker by extraction, characterization and speaker-specific information contained in the speech signal. It has many diverse application specially in the field of Surveillance, Immigrations at Airport, cyber security, transcription in multi-source of similar sound source, where it is difficult to assign transcription arbitrary. The most commonly speech parameterization used in speaker verification, K-mean, cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique is then explained. Gaussian mixture models (GMM), perhaps the most robust machine learning algorithm has been introduced to examine and judge carefully speaker identification in text independent. The application or employment of Gaussian mixture models for monitoring & Analysing speaker identity is encouraged by the familiarity, awareness, or understanding gained through experience that Gaussian spectrum depict the characteristics of speaker's spectral conformational pattern and remarkable ability of GMM to construct capricious densities after that we illustrate 'Expectation maximization' an iterative algorithm which takes some arbitrary value in initial estimation and carry on the iterative process until the convergence of value is observed. We have tried to obtained 85 ~ 95% of accuracy using speaker modeling of vector quantization and Gaussian Mixture model, so by doing various number of experiments we are able to obtain 79 ~ 82% of identification rate using Vector quantization and 85 ~ 92.6% of identification rate using GMM modeling by Expectation maximization parameter estimation depending on variation of parameter.
Notes: machine learning


Reference Type:  Journal Article
Author: Powell, Brian M., Goswami, Gaurav, Vatsa, Mayank, Singh, Richa and Noore, Afzel
Year: 2014
Title: fgCAPTCHA: Genetically Optimized Face Image CAPTCHA
Journal: IEEE Access
Volume: 2
Pages: 473-484
Short Title: fgCAPTCHA: Genetically Optimized Face Image CAPTCHA
DOI: 10.1109/ACCESS.2014.2321001
Keywords: Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Abstract: The increasing use of smartphones, tablets, and other mobile devices poses a significant challenge in providing effective online security. CAPTCHAs, tests for distinguishing human and computer users, have traditionally been popular; however, they face particular difficulties in a modern mobile environment because most of them rely on keyboard input and have language dependencies. This paper proposes a novel image-based CAPTCHA that combines the touch-based input methods favored by mobile devices with genetically optimized face detection tests to provide a solution that is simple for humans to solve, ready for worldwide use, and provides a high level of security by being resilient to automated computer attacks. In extensive testing involving over 2600 users and 40000 CAPTCHA tests, fgCAPTCHA demonstrates a very high human success rate while ensuring a 0% attack rate using three well-known face detection algorithms.
Notes: mobile devices


Reference Type:  Journal Article
Author: Ragavi, V. and Geetha, G.
Year: 2011
Title: CAPTCHA Celebrating its Quattuordecennial - A Complete Reference
Journal: International Journal of Computer Science Issues (IJCSI)
Volume: 8
Issue: 6
Pages: 340-349
Short Title: CAPTCHA Celebrating its Quattuordecennial - A Complete Reference
ISSN: 1694-0814
Notes: machine learning


Reference Type:  Generic
Author: Ramaiah, Chetan
Year: 2015
Title: Accents in Handwriting: A Hierarchical Bayesian Approach to Handwriting Analysis
Secondary Author: Govindaraju, Venu, Chen, Chang Wen and Jayaraman, Bharat
Publisher: ProQuest Dissertations Publishing
Short Title: Accents in Handwriting: A Hierarchical Bayesian Approach to Handwriting Analysis
Keywords: Computer Science
Applied Sciences
Computer Vision
Handwriting Analysis
Machine Learning
Pgm
Abstract: The individuality of handwriting has been studied extensively in the handwriting analysis domain. An individual's handwriting is believed to be influenced by genetic and cultural factors. Genetic factors include pen grip style, pen pressure, Kinesthesia, motor skills etc., whereas cultural factors include learning through imitation and multilingualism. The traditional approaches in handwriting analysis generally do not attempt to model or quantify these factors. They function on the assumption that each individual's handwriting is unique, without any shared components among individuals. In our dissertation, we first provide evidence to demonstrate the existence of shared influences in handwriting. We postulate that a handwriting sample can be represented as a distribution over a finite set of handwriting styles. We introduce the concept of accent in handwriting, which is defined to be the influence that a person's native script has when learning to write in a different script. We then exploit the concept of accents in handwriting to demonstrably improve on the state of the art results in several handwriting analysis problems. We present three distinct hierarchical Bayesian models to analyze and quantify the influences in handwriting. We demonstrate that a mixture of influences of cultural and genetic factors is the ideal representation for handwriting samples. In our models, each handwritten sample is first represented as a bag of features. The feature representation is modeled as a distribution over a set of finite handwriting styles, and classification in the style space representation is performed to identify the accent. Each writing style is thus represented as a distribution over features. In addition, we propose a generic hierarchical framework for handwriting analysis problems. The first step of the framework is accent identification, after which, an accent specific model is learned for the problem. 
We have validated our approach on two data sets: (i) an in-house data set collected exclusively for the accents in handwriting task and, (ii) the UNIPEN data set, which has the necessary annotations for our purpose. The performance of our approach is demonstrated by comparing the proposed hierarchical approach with the state of the art approaches in various handwriting analysis problems. In particular, we have shown improved performance in both writer identification and handwriting recognition tasks. Finally, we present a novel handwritten CAPTCHA generation technique where the idea of accents in handwriting enhances the robustness of the CAPTCHA generation process.
Notes: machine learning


Reference Type:  Journal Article
Author: Sano, Shotaro, Otsuka, Takuma, Itoyama, Katsutoshi and Okuno, Hiroshi G.
Year: 2015
Title: HMM-based Attacks on Google's ReCAPTCHA with Continuous Visual and Audio Symbols
Journal: Journal of Information Processing
Volume: 23
Issue: 6
Pages: 814-826
Short Title: HMM-based Attacks on Google's ReCAPTCHA with Continuous Visual and Audio Symbols
ISSN: 1882-6652
DOI: 10.2197/ipsjjip.23.814
Keywords: Captcha
Human Interaction Proof
Hidden Markov Model
Continuous Character/Speech Recognition
Abstract: CAPTCHAs distinguish humans from automated programs by presenting questions that are easy for humans but difficult for computers, e.g., recognition of visual characters or audio utterances. The state of the art research suggests that the security of visual and audio CAPTCHAs mainly lies in anti-segmentation techniques, because individual symbol recognition after segmentation can be solved with a high success rate with certain machine learning algorithms. Thus, most recent commercial CAPTCHAs present continuous symbols to prevent automated segmentation. We propose a novel framework that can automatically decode continuous CAPTCHAs and assess its effectiveness with actual CAPTCHA questions from Google's reCAPTCHA. Our framework is constructed on the basis of a sequence recognition method based on hidden Markov models (HMMs), which can be concisely implemented by using an off-the-shelf library HMM toolkit. This method concatenates several HMMs, each of which recognizes a symbol, to build a larger HMM that recognizes a question. Our experimental results reveal vulnerabilities in continuous CAPTCHAs because the solver cracks the visual and audio reCAPTCHA systems with 31.75% and 58.75% accuracy, respectively. We further propose guidelines to prevent possible attacking from HMM-based CAPTCHA solvers on the basis of synthetic experiments with simulated continuous CAPTCHAs.
Notes: machine learning


Reference Type:  Journal Article
Author: Singh, Amarjot, Bacchuwar, Ketan and Bhasin, Akshay
Year: 2012
Title: A Survey of OCR Applications
Journal: International Journal of Machine Learning and Computing
Volume: 2
Issue: 3
Pages: 314-318
Short Title: A Survey of OCR Applications
ISSN: 2010-3700
DOI: 10.7763/IJMLC.2012.V2.137
Keywords: Texts
Websites
Algorithms
Searching
Genetic Algorithms
Electronics
Optical Character Recognition
Character Recognition
Learning (Ci)
Abstract: Optical Character Recognition or OCR is the electronic translation of handwritten, typewritten or printed text into machine translated images. It is widely used to recognize and search text from electronic documents or to publish the text on a website. The paper presents a survey of applications of OCR in different fields and further presents the experimentation for three important applications such as Captcha, Institutional Repository and Optical Music Character Recognition. We make use of an enhanced image segmentation algorithm based on histogram equalization using genetic algorithms for optical character recognition. The paper will act as a good literature survey for researchers starting to work in the field of optical character recognition.
Notes: machine learning


Reference Type:  Journal Article
Author: Singh, Karanpreet, Singh, Paramvir and Kumar, Krishan
Year: 2016
Title: Application layer HTTP-GET flood DDoS attacks: Research landscape and challenges
Journal: Computers & Security
Short Title: Application layer HTTP-GET flood DDoS attacks: Research landscape and challenges
ISSN: 0167-4048
DOI: 10.1016/j.cose.2016.10.005
Keywords: Systematic Survey
Application Layer Ddos Attacks
Denial of Service
Http-Get Flood
Sophisticated Attacks
Abstract: Application layer Distributed Denial of Service (DDoS) attacks have empowered conventional flooding based DDoS with more subtle attacking methods that pose an ever-increasing challenge to the availability of Internet based web services. These attacks hold the potential to cause similar damaging effects as their lower layer counterparts using relatively fewer attacking assets. Being the dominant part of the Internet, HTTP is the prime target of GET flooding attacks, a common practice followed among various application layer DDoS attacks. With the presence of new and improved attack programs, identifying these attacks always seems convoluted. A swift rise in the frequency of these attacks has led to a favorable shift in interest among researchers. Over the recent years, a significant research contribution has been dedicated toward devising new techniques for countering HTTP-GET flood DDoS attacks. In this paper, we conduct a survey of such research contributions following a well-defined systematic process. A total of 63 primary studies published before August 2015 were selected from six different electronic databases following a careful scrutinizing process. We formulated four research questions that capture various aspects of the identified primary studies. These aspects include detection attributes, datasets, software tools, attack strategies, and underlying modeling methods. The field background required to understand the evolution of HTTP-GET flood DDoS attacks is also presented. The aim of this systematic survey is to gain insights into the current research on the detection of these attacks by comprehensively analyzing the selected primary studies to answer a predefined set of research questions. This survey also discusses various challenges that need to be addressed, and acquaints readers with recommendations for possible future research directions.
Notes: machine learning


Reference Type:  Journal Article
Author: Soupionis, Yannis and Gritzalis, Dimitris
Year: 2010
Title: Audio CAPTCHA: Existing solutions assessment and a new implementation for VoIP telephony
Journal: Computers & Security
Volume: 29
Issue: 5
Pages: 603-618
Short Title: Audio CAPTCHA: Existing solutions assessment and a new implementation for VoIP telephony
ISSN: 0167-4048
DOI: 10.1016/j.cose.2009.12.003
Keywords: Spit
Audio Captcha Attributes
Voip
Authentication
Evaluation
Speech Recognition
Turing Test
Abstract: SPam over Internet Telephony (SPIT) is a potential source of future annoyance in Voice over IP (VoIP) systems. A typical way to launch a SPIT attack is the use of an automated procedure (i.e., bot), which generates calls and produces unsolicited audio messages. A known way to protect against SPAM is a Reverse Turing Test, called CAPTCHA (Completely Automated Public Turing Test to Tell Computer and Humans Apart). In this paper, we evaluate existing audio CAPTCHA, as this type of format is more suitable for VoIP systems, to help them fight bots. To do so, we first suggest specific attributes-requirements that an audio CAPTCHA should meet in order to be effective. Then, we evaluate this set of popular audio CAPTCHA, and demonstrate that there is no existing implementation suitable enough for VoIP environments. Next, we develop and implement a new audio CAPTCHA, which is suitable for SIP-based VoIP telephony. Finally, the new CAPTCHA is tested against users and bots and demonstrated to be efficient.
Notes: machine learning


Reference Type:  Generic
Author: Subils, Jean-Baptiste
Year: 2015
Title: Authentication Via Multiple Associated Devices
Secondary Author: Ligatti, Jay, Goldgof, Dmitry and Liu, Yao
Publisher: ProQuest Dissertations Publishing
Short Title: Authentication Via Multiple Associated Devices
Keywords: Computer Science
Applied Sciences
Access Control
Online Authorization
Online Security
Abstract: This thesis presents a practical method of authentication utilizing multiple devices. The factors contributing to the practicality of the method are: the utilization of devices already commonly possessed by users and the amenability to being implemented on a wide variety of devices. The term "device'' refers to anything able to perform cryptographic operations, store data, and communicate with another such device. In the method presented herein, multiple devices need to be associated with a single user to provide this user an identity in the system. A public key infrastructure is used to provide this identity. Each of the devices associated with a user possesses a public and private key which allow cryptographic operations to be performed. These operations include signing and encrypting data and will prove the identity of each device. The addition of these identities helps authenticate a single user. A wide variety of devices qualifies to be used by this authentication method. The minimum requirements are: the storage of data such as a private key, the ability to communicate, and a processor to perform the cryptographic operations. Smart devices possess these requirements and the manufacture of such devices can be realized at a reasonable cost. This method is malleable and implemented in numerous authentication protocols. This thesis illustrates and explains several instances of these protocols. The method's primary novelty is its resistance to theft-based attacks, which results from the utilization of multiple devices to authenticate users. A user associated with multiple devices needs to be in possession of these devices to correctly perform the authentication task. This thesis focuses on the system design of this novel authentication method.
Notes: machine learning


Reference Type:  Generic
Author: Lin, Szu-Yu, Wei, Te-En, Lee, Hahn-Ming, Jeng, A. B. and Liu, Chien-Tsung
Year: 2012
Title: A novel approach for re-authentication protocol using personalized information
Volume: 5
Pages: 1826-1829
Short Title: A novel approach for re-authentication protocol using personalized information
ISBN/ISSN: 2160-133X
DOI: 10.1109/ICMLC.2012.6359653
Keywords: Computing and Processing
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Bioengineering
Signal Processing and Analysis
Abstract: Since authentication is the key to access control security in Internet access for every user, therefore, how to verify a user is who he claimed to be is a very important requirement in Internet security. In some situations, users need to be re-authenticated to make sure that they are still actively engaged in real time interaction. For instance, people will be notified to dial to a specific server phone number to reconfirm his identity again before re-login using their account ID and passwords pairs. This approach has been adopted by many online game servers. In this paper, we proposed a novel approach for re-authentication protocol using personalized information with CAPTCHA.
Notes: machine learning


Reference Type:  Journal Article
Author: Tangmanee, Chatpong
Year: 2016
Title: Effects of Text Rotation, String Length, and Letter Format on Text-based CAPTCHA Robustness
Journal: Journal of Applied Security Research
Volume: 11
Issue: 3
Pages: 349-361
Short Title: Effects of Text Rotation, String Length, and Letter Format on Text-based CAPTCHA Robustness
ISSN: 1936-1610
DOI: 10.1080/19361610.2016.1178553
Keywords: Article
Captcha
Text Rotation
String Length
Letter Format
Robustness
Abstract: CAPTCHA, standing for Completely Automated Public Turing test to tell Computers and Humans Apart, has been adopted as a security check. One way to assess robustness of a text-based CAPTCHA test is to use optical letter reader (OCR) software to read it. If the reading fails, the design is robust. Though many design features have been examined, no research has investigated the effects of text rotation, string length, or letter format on CAPTCHA robustness. Fourteen hundred sixty four text-based CAPTCHA tests were created based on the three variables. The main effects of all three variables and few of the interaction effects on robustness were significant. The findings have both theoretical and practical contributions.
Notes: mobile devices


Reference Type:  Journal Article
Author: Tangmanee, Chatpong and Sujarit-Apirak, Paradorn
Year: 2013
Title: Attitudes towards CAPTCHA: A Survey of Thai Internet Users
Journal: Journal of Global Business Management
Volume: 9
Issue: 2
Pages: 29-41
Short Title: Attitudes towards CAPTCHA: A Survey of Thai Internet Users
ISSN: 18173179
Keywords: Thailand
Studies
User Behavior
Perceptions
Discriminant Analysis
Market Research
Asia & the Pacific
Experiment/Theoretical Treatment
Abstract: CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart". It requires the deciphering of distorted texts, mostly in English that computers still cannot do well. It is also helpful in preventing the abuse of online services. The current text-based CAPTCHA requires users to be able to read English characters. For Thai Internet users who might not be very familiar with English, a Thai language based CAPTCHA may be a more appropriate option. To date, no published work has examined the extent to which Thai Internet users are familiar with CAPTCHA; therefore, this study attempts to survey their awareness of, and attitudes towards, the online test. Based on 340 usable online questionnaire submissions, it was found that Thai Internet users are aware of CAPTCHA, but their understanding of it does not go very deep. Using exploratory factor analysis, their attitudes towards CAPTCHA can be classified in two dimensions: (1) the perceived drawbacks of the CAPTCHA test and (2) the feasibility of Thai language CAPTCHA. In addition to providing our insights into the application of CAPTCHA in the Thai Internet user context, online service providers could take certain measures to improve users' attitudes and understanding regarding CAPTCHA.
Notes: machine learning


Reference Type:  Generic
Author: Thomas, Achint
Year: 2010
Title: Enhancing cyber security through the use of synthetic handwritten CAPTCHAs
Secondary Author: Govindaraju, Venugopal, Rapaport, William and Scott, Peter
Publisher: ProQuest Dissertations Publishing
Short Title: Enhancing cyber security through the use of synthetic handwritten CAPTCHAs
Keywords: Computer Science
Applied Sciences
Baseline Detection
Captcha
Cyber Security
Handwriting Generation
Reading Proficiency
Textlines
Abstract: Online services which allow users to contribute content and interact remotely over the internet in some manner are common today. Many of these services, like spam control for blogs and email account sign-up, require that they be accessed only by humans and not machines (automated scripts or bots). One method of differentiating between humans and bots is by using a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). A number of different genres of CAPTCHAs exist (text-based, visual, auditory, and cognitive). Text-based CAPTCHAs are popular because automatic recognition of degraded, noisy, distorted text with background clutter is still a challenging task for machines, but is a task that humans perform with relative ease. However, recently a significant number of printed-text based CAPTCHAs have been successfully attacked by bots, thus rendering the services they protect vulnerable to attack. Thus there is an urgent need for exploring alternate CAPTCHAs and this serves as the prime motivation for our research. We explore three primary tracks of investigation in this work. Firstly, we define a set of sound design principles, based on an exploit-avoid-resist philosophy, which must be adhered to while building secure CAPTCHAs. Secondly, we improve the effectiveness of text-based CAPTCHAs by substituting printed text with handwritten text and then layering on additional cognitive tasks. To this end, we develop a fully-automated framework for synthetic handwriting generation to design handwritten CAPTCHAs that will exploit the differential in handwriting reading proficiency between humans and machines. Prior work in this area has focused on synthesizing handwritten textlines to conform to a particular user’s style. We present techniques for simulating handwriting without being writer-specific. Unlike previous work, this is a fully-automated approach based on extracting principal curves from handwritten characters. 
These serve as a set of control points to allow character-level distortion. We use novel techniques for character baseline detection and ligature parameterization to construct the textlines. A parameterized sinusoid-based function is used to allow random perturbation of these textlines. Using this framework as a basis, we present handwritten CAPTCHAs that perform better than current text-based CAPTCHAs at distinguishing between humans and machines. We also present a novel handwritten CAPTCHA which exploits the mixed-text segmentation problem to deliver sub-0.01% machine recognition rates for respectable human performance. Thirdly, we present in general terms a new class of CAPTCHA, the interaction-based CAPTCHA, which requires an entity to interact with the challenge to gain access to the solution space. We show how the interaction-based CAPTCHA requires an entity to solve three tasks – interaction, cognition, and recognition – to be able to solve a CAPTCHA challenge. Additionally, we present the 3D shadow CAPTCHA, a specific instance of this new class of CAPTCHAs. The 3D shadow CAPTCHA uses aspects of 3D scene rendering, ray casting, and perspective projection to present unique challenges to machines while remaining intuitive for humans to solve.
Notes: machine learning


Reference Type:  Generic
Author: Wan, Shengye
Year: 2016
Title: Protecting Web Contents against Persistent Crawlers
Secondary Author: Sun, Kun, Li, Qun and Zhou, Gang
Publisher: ProQuest Dissertations Publishing
Short Title: Protecting Web Contents against Persistent Crawlers
Keywords: Computer Science
Applied Sciences
Confidential Documents
Crawler
Crawler Detection
Defense System
Web Security
Abstract: Web crawlers have been developed for several malicious purposes like downloading server data without permission from website administrator. Armored stealthy crawlers are evolving against new anti-crawler mechanisms in the arms race between the crawler developers and crawler defenders. In this paper, we develop a new anti-crawler mechanism called PathMarker to detect and constrain crawlers that crawl content of servers stealthy and persistently. The basic idea is to add a marker to each web page URL and then encrypt the URL and marker. By using the URL path and user information contained in the marker as the novel features of machine learning, we could accurately detect stealthy crawlers at the earliest stage. Besides effectively detecting crawlers, PathMarker can also dramatically suppress the efficiency of crawlers before they are detected by misleading the crawlers visiting same page's URL with different markers. We deploy our approach on a forum website to collect normal users' data. The evaluation results show that PathMarker can quickly capture all 12 open-source and in-house crawlers, plus two external crawlers (i.e., Googlebots and Yahoo Slurp).
Notes: machine learning


Reference Type:  Journal Article
Author: Wang, Eric and Ye, Yun
Year: 2013
Title: A New Text Based CAPTCHA
Journal: Applied Mechanics and Materials
Volume: 373-375
Pages: 644
Short Title: A New Text Based CAPTCHA
ISSN: 16609336
DOI: 10.4028/www.scientific.net/AMM.373-375.644
Keywords: Captcha
Css
Ocr-Based Attack
Replay Attack
Abstract:  Attacks on text-based CAPTCHAs are getting more sophisticated. Most existing schemes defend attacks by increasing deformation or distortion rate of words which sacrifice the friendliness of the system. In this paper, we propose a secure one time text-based CAPTCHA scheme which can effectively defend multiple attacks without decreasing the friendliness of the scheme for valid users. The core of our design is an information obfuscation scheme which determines which character image to display and which to hide in an online manner from a sequence of 62 character images which makes it more difficult to attack. Encryption method is employed for protecting display policies which control showing and hiding of information. Prototype has been implemented and the preliminary results are encouraging.
Notes: machine learning


Reference Type:  Generic
Author: Xie, Liang
Year: 2008
Title: Mitigating rapidly propagating worm threats in emergent networks
Secondary Author: Zhu, Sencun
Publisher: ProQuest Dissertations Publishing
Short Title: Mitigating rapidly propagating worm threats in emergent networks
Keywords: Computer Science
Applied Sciences
Access Control
Emergent Networks
Peer-to-Peer
Wireless Communications
Worm Threats
Abstract: This dissertation presents a series of techniques that help both client devices and network elements defend against a wide variety of worm attacks. These techniques can be deployed to secure emergent networks including peer-to-peer (P2P) file-sharing systems and wireless communication systems. In recent years, worms have emerged as one of the most disastrous security threats to various information systems and network infrastructures. Although Internet worms have been extensively studied, worm issues in such emergent networks as peer-to-peer (P2P) systems and cellular networks have yet received due attention. This dissertation aims at designing automated, realtime, and systematic countermeasures, which leverage the existing internal communication mechanisms and network infrastructure to contain worm propagation. The proposed defenses consist of security solutions for both client and system software. For P2P networks, this dissertation first proposes a partition-based scheme and a CDS-based scheme to contain ultra-fast topological worm spreads. These schemes leverage the underlying P2P overlay for distributing automated security patches to vulnerable machines. They are unique in adopting graph-theory techniques for containing fast spreading worms. This dissertation then proposes a P2P-tailored solution to combat file-sharing worms in P2P environments. Our solution consists of a download-based scheme and a search-based scheme. Both schemes utilize the existing file-sharing mechanisms to internally disseminate security patches to participating peers in a timely and distributed fashion. For cell-phone networks, this dissertation proposes two device-level defenses for securing smartphone software, namely an access-control–based scheme and a GTT-based scheme. 
These schemes are unique in that they either enforce security policies in phone devices to identify and block worm attacks or leverage artificial intelligence (AI) methods to differentiate human or worm initiators of the phone applications. This dissertation also proposes a systematic countermeasure consisting of both terminal-level and network-level defenses for combating cell-phone worms. Unlike the existing solutions that split the collaboration between the terminal device and the network to throttle system-wide worm spreads, the proposed solution adopts an identity-based signature scheme at both the sender and the receiver side, and a detection-based automated patching scheme at the network side. Combining terminal-level and network-level defenses effectively speeds up the process of worm detection and victim disinfection. This dissertation also provides solid mathematical analyses, extensive simulations and experiments to evaluate the effectiveness and show the applicability of the proposed defenses. In addition, it discusses some open issues related to the proposed solutions and suggests some interesting directions in combating the worm threats as the emergent networks evolve.
Notes: machine learning


Reference Type:  Generic
Author: Xu, Yi
Year: 2016
Title: Toward robust video event detection and retrieval under adversarial constraints
Secondary Author: Frahm, Jan-Michael, Monrose, Fabian, Berg, Tamara, Crandall, David and Dunn, Enrique
Publisher: ProQuest Dissertations Publishing
Short Title: Toward robust video event detection and retrieval under adversarial constraints
Keywords: Computer Science
Applied Sciences
Computer Vision
Detection
Privacy
Retrieval
Security
Video
Abstract: The continuous stream of videos that are uploaded and shared on the Internet has been leveraged by computer vision researchers for a myriad of detection and retrieval tasks, including gesture detection, copy detection, face authentication, etc. However, the existing state-of-the-art event detection and retrieval techniques fail to deal with several real-world challenges (e.g., low resolution, low brightness and noise) under adversary constraints. This dissertation focuses on these challenges in realistic scenarios and demonstrates practical methods to address the problem of robustness and efficiency within video event detection and retrieval systems in five application settings (namely, CAPTCHA decoding, face liveness detection, reconstructing typed input on mobile devices, video confirmation attack, and content-based copy detection). Specifically, for CAPTCHA decoding, I propose an automated approach which can decode moving-image object recognition (MIOR) CAPTCHAs faster than humans. I showed that not only are there inherent weaknesses in current MIOR CAPTCHA designs, but that several obvious countermeasures (e.g., extending the length of the codeword) are not viable. More importantly, my work highlights the fact that the choice of underlying hard problem selected by the designers of a leading commercial solution falls into a solvable subclass of computer vision problems. For face liveness detection, I introduce a novel approach to bypass modern face authentication systems. More specifically, by leveraging a handful of pictures of the target user taken from social media, I show how to create realistic, textured, 3D facial models that undermine the security of widely used face authentication solutions. 
My framework makes use of virtual reality (VR) systems, incorporating along the way the ability to perform animations (e.g., raising an eyebrow or smiling) of the facial model, in order to trick liveness detectors into believing that the 3D model is a real human face. I demonstrate that such VR-based spoofing attacks constitute a fundamentally new class of attacks that point to a serious weaknesses in camera-based authentication systems. For reconstructing typed input on mobile devices, I proposed a method that successfully transcribes the text typed on a keyboard by exploiting video of the user typing, even from significant distances and from repeated reflections. This feat allows us to reconstruct typed input from the image of a mobile phone’s screen on a user’s eyeball as reflected through a nearby mirror, extending the privacy threat to include situations where the adversary is located around a corner from the user. To assess the viability of a video confirmation attack, I explored a technique that exploits the emanations of changes in light to reveal the programs being watched. I leverage the key insight that the observable emanations of a display (e.g., a TV or monitor) during presentation of the viewing content induces a distinctive flicker pattern that can be exploited by an adversary. My proposed approach works successfully in a number of practical scenarios, including (but not limited to) observations of light effusions through the windows, on the back wall, or off the victim’s face. My empirical results show that I can successfully confirm hypotheses while capturing short recordings (typically less than 4 minutes long) of the changes in brightness from the victim’s display from a distance of 70 meters. Lastly, for content-based copy detection, I take advantage of a new temporal feature to index a reference library in a manner that is robust to the popular spatial and temporal transformations in pirated videos. 
My technique narrows the detection gap in the important area of temporal transformations applied by would-be pirates. My large-scale evaluation on real-world data shows that I can successfully detect infringing content from movies and sports clips with 90.0% precision at a 71.1% recall rate, and can achieve that accuracy at an average time expense of merely 5.3 seconds, outperforming the state of the art by an order of magnitude.
Notes: mobile devices


Reference Type:  Journal Article
Author: Yan, J. and El Ahmad, A. S.
Year: 2009
Title: CAPTCHA Security: A Case Study
Journal: Security & Privacy, IEEE
Volume: 7
Issue: 4
Short Title: CAPTCHA Security: A Case Study
ISSN: 1540-7993
DOI: 10.1109/MSP.2009.84
Keywords: Computing and Processing
Abstract: A simple but novel attack can break some CAPTCHAs with a success rate higher than 90 percent. In contrast to early work that relied on sophisticated computer vision or machine learning techniques, the authors used simple pattern recognition algorithms to exploit fatal design errors.
Notes: machine learning


Reference Type:  Journal Article
Author: Yang, Tzu-I, Koong, Chorng-Shiuh and Tseng, Chien-Chao
Year: 2015
Title: Game-based image semantic CAPTCHA on handset devices
Journal: Multimedia Tools and Applications
Volume: 74
Issue: 14
Pages: 5141-5156
Short Title: Game-based image semantic CAPTCHA on handset devices
ISSN: 1380-7501
DOI: 10.1007/s11042-013-1666-7
Keywords: CAPTCHA
Human interactive proofs
Game-based
Unambiguous image semantic
GISCHA
Abstract: A completely automated public Turing test to tell computer and human apart (CAPTCHA) is based on the Turing test, which aims to protect Internet services from automatic script attacks and spams. However, most proposed or deployed CAPTCHAs have been breached. It is possible to enhance the security of an existing CAPTCHA by systematically adding noises, but distortions would make characters recognition difficult for humans. On the other hand, most of the traditional CAPTCHAs require complicated operations using keyboards and mice which may become limitations of modern handset devices. In this study, we propose a novel GISCHA using game-based image semantics with the contributions that 1) use simple keys, mouse, gesture, and accelerometer instead of complex alphabet inputs; 2) is language independent; 3) enhances the security level without annoying users; 4) is based on more advanced human cognitive abilities; and 5) make CAPTCHAs more interesting. The experiment results show that a single GISCHA challenge was completed in 9.06 s on average with a virtual keyboard and 10.25 s on average with accelerometers built into handset devices, and the pass rate of first time use is 94.8 %, which means that it is sufficiently easy for practical use.
Notes: machine learning


Reference Type:  Journal Article
Author: Yeh, Her‐Tyan, Chen, Bing‐Chang and Wu, Yi‐Cong
Year: 2013
Title: Mobile user authentication system in cloud environment
Journal: Security and Communication Networks
Volume: 6
Issue: 9
Pages: 1161-1168
Short Title: Mobile user authentication system in cloud environment
ISSN: 1939-0114
DOI: 10.1002/sec.688
Keywords: Voiceprint Identification
One‐Time Password
Captcha
Visual Cryptography
Cloud
Abstract: In order to reach a safe environment that can be automatically used on the Internet and to take precautions against the Internet fishing attack, the system integrates some features including one‐time password, Completely Automated Public Turing Test to tell Computers and Humans Apart, voiceprint identification of creatural features, and visual cryptography, designing a formula wherein users do not need to remember any accounts and passwords when they surf the Internet through mobile devices, and it aims at smart phones and the Cloud. The formula is able to improve the problems of rampant Internet fishing and the management of passwords. In techniques, on one hand, it uses PIN information visual passwords in cell phones to improve the security of the account; on the other hand, it uses voiceprint identification features so that the system center can ensure the user's identification with a view to improve the leak in mobile devices rather than only to check mobile devices. And then, it utilizes the voiceprint, which we use when we log in, to produce a one‐time password that is able to lower the risk of the account and passwords being attacked by Internet fishing. Through the frame of this research, it can protect our cell phones from being lost and embezzled and can prevent the account and passwords from being attacked by Internet fishing. It can also solve the problem of users forgetting accounts and passwords, and reduce the operational burden of cell phones. Besides, it is capable of preventing the Cloud servers from incurring many malicious registrations and logins, keeping them working efficiently. Copyright © 2012 John Wiley & Sons, Ltd.
Notes: mobile devices


Reference Type:  Journal Article
Author: Yoon, Ji Won, Kim, Hyoungshick and Huh, Jun Ho
Year: 2010
Title: Hybrid spam filtering for mobile communication
Journal: Computers & Security
Volume: 29
Issue: 4
Pages: 446-459
Short Title: Hybrid spam filtering for mobile communication
ISSN: 0167-4048
DOI: 10.1016/j.cose.2009.11.003
Keywords: Spam Sms Messages
Hybrid
Content-Based Filtering
Challenge-Response
Threshold Sensitivity Problem
Abstract: Spam messages are an increasing threat to mobile communication. Several mitigation techniques have been proposed, including white and black listing, challenge-response and content-based filtering. However, none are perfect and it makes sense to use a combination rather than just one. We propose an anti-spam framework based on the hybrid of content-based filtering and challenge-response. A message, that has been classified as uncertain through content-based filtering, is checked further by sending a challenge to the message sender. An automated spam generator is unlikely to send back a correct response, in which case, the message is classified as spam. Our simulation results show the trade-off between the accuracy of anti-spam classifiers and the incurring traffic overhead, and demonstrate that our hybrid framework is capable of achieving high accuracy regardless of the content-based filtering algorithm being used.
Notes: mobile devices




Dr Scott Hollier
Digital Access Specialist
E-mail: scott@hollier.info Mobile: +61 (0)430 351 909

Learn more about Scott through his recently published memoir:
Outrunning the Night: a life journey of disability, determination and joy.
Visit http://www.outrunningthenight.com/ for more information and a sample chapter.

From: White, Jason J [mailto:jjwhite@ets.org]
Sent: Friday, 23 December 2016 7:47 AM
To: Scott Hollier <scott@hollier.info>; public-rqtf@w3.org
Subject: RE: Topic 3 - CAPTCHA literature review initial draft completed



From: Scott Hollier [mailto:scott@hollier.info]
To Jason, et. al.

That’s fine, but is there any way I can easily distribute the information to group members prior to the meeting so it can be discussed?
[Jason] Could you add it to the wiki? Perhaps you could add to the page that David started at https://www.w3.org/WAI/APA/task-forces/research-questions/wiki/Relevant_Researchers



Received on Friday, 23 December 2016 05:31:35 UTC