You shall not pass: Mitigating SQL Injection Attacks on ...

13
You shall not pass: Mitigating SQL Injection Aacks on Legacy Web Applications Rasoul Jahanshahi [email protected] Boston University Adam Doupé [email protected] Arizona State University Manuel Egele [email protected] Boston University ABSTRACT SQL injection (SQLi) attacks pose a significant threat to the security of web applications. Existing approaches do not support object- oriented programming that renders these approaches unable to protect the real-world web apps such as Wordpress, Joomla, or Drupal against SQLi attacks. We propose a novel hybrid static-dynamic analysis for PHP web applications that limits each PHP function for accessing the database. Our tool, SQLBlock, reduces the attack surface of the vulnerable PHP functions in a web application to a set of query descriptors that demonstrate the benign functionality of the PHP function. We implement SQLBlock as a plugin for MySQL and PHP. Our approach does not require any modification to the web app. We evaluate SQLBlock on 11 SQLi vulnerabilities in Wordpress, Joomla, Drupal, Magento, and their plugins. We demonstrate that SQLBlock successfully prevents all 11 SQLi exploits with negligible perfor- mance overhead (i.e., a maximum of 3% on a heavily-loaded web server). KEYWORDS Network Security; Database; SQL Injection; Web Application ACM Reference Format: Rasoul Jahanshahi, Adam Doupé, and Manuel Egele. 2020. You shall not pass: Mitigating SQL Injection Attacks on Legacy Web Applications. In Pro- ceedings of the 15th ACM Asia Conference on Computer and Communications Security (ASIA CCS ’20), October 5–9, 2020, Taipei, Taiwan. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3320269.3384760 1 INTRODUCTION The growing number of users for services such as social networks, news, online stores, and financial services makes these services a tempting source of sensitive information for attackers. Symantec’s recent report [7], shows an increase of 56% in web attacks from 2017 to 2018. Moreover, according to Akamai [1], 65.1% of web attacks were SQLi attacks. SQLi is a type of code injection attack, where an attacker aims to execute arbitrary SQL queries on a database. In 2018, the number of SQLi vulnerabilities discovered in the top Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. ASIA CCS ’20, October 5–9, 2020, Taipei, Taiwan © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-6750-9/20/10. . . $15.00 https://doi.org/10.1145/3320269.3384760 four most popular web apps (i.e., Wordpress, Joomla, Drupal, and Magento) increased by 267% compared to the prior year. There has been a great deal of research into identifying SQLi vulnerabilities and defending against SQLi attacks on web apps. Proposed approaches used various techniques such as static anal- ysis [10, 11, 20, 22, 35], dynamic analysis [3, 6, 15, 21, 24, 25, 37], or a mix of static-dynamic analysis [5, 17, 28]. While static analy- sis approaches can be promising, static analysis cannot determine whether input sanitization is performed correctly or not [34]. If the sanitization function does not properly sanitize user-input, SQLi attacks can still happen. Moreover, to the best of our knowledge, prior static analysis approaches for finding SQLi vulnerabilities in PHP web apps do not support Object-oriented programming (OOP) code. Such shortcomings in static analyses leave SQLi vulnerabili- ties undetected in web apps such as Wordpress, Joomla, and Drupal that more than 40% of active websites use [29]. Prior dynamic analyses use taint analysis [15, 18] and compar- ison of query parse trees [3, 6, 25, 26, 36, 37] for detecting SQLi attacks on web apps. Such dynamic analyses follow an incomplete definition of SQLi attacks where a SQLi attack always alters the syntactic structure of an SQL query. Ray et al. [31] show that this incomplete definition in CANDID [3], SQLCheck [36], WASP [15], and SQLPrevent [37] does not prevent specific SQLi attacks and also blocks benign requests. Other dynamic approaches have attempted to create a profile of the executed SQL queries and enforce the profile at runtime [25, 26]. The profiles are a mapping between the parse tree of the benign issued SQL queries and the PHP functions that issued the queries. The profiles created by such approaches are too coarse-grained. Particularly, modern and complex web apps such as Drupal and Joomla define database APIs that perform all database operations. Database APIs create SQL queries using the principle of the encapsulation that allows the local functions to issue an SQL query to the database without passing the SQL query as an argument. In such cases, existing approaches map SQL queries to functions in the database APIs instead of mapping to the func- tion that uses database API for communicating with the database. Hence, the prior approaches create a coarse-grained mapping that can allow an attacker to perform mimicry SQLi attacks. Specifically, Mereidos et al. in SEPTIC [25] propose an approach to block SQLi attacks inside the database. During training mode, SEPTIC records a profile that maps the parse trees of benign issued SQL queries to an identifier. SEPTIC generates the identifier in mysql and mysqli; two database extensions in PHP for commu- nicating with a MySQL database. The identifier is inferred from the PHP call-stack that issued a call to one of the methods in the mysql or mysqli API for executing an SQL query on the database (e.g., mysql_query). The identifier is a sequence of functions in the arXiv:2006.11996v3 [cs.CR] 12 Jul 2020

Transcript of You shall not pass: Mitigating SQL Injection Attacks on ...

Page 1: You shall not pass: Mitigating SQL Injection Attacks on ...

You shall not pass: Mitigating SQL Injection Attacks on LegacyWeb Applications

Rasoul [email protected] University

Adam Doupé[email protected]

Arizona State University

Manuel [email protected] University

ABSTRACTSQL injection (SQLi) attacks pose a significant threat to the securityof web applications. Existing approaches do not support object-oriented programming that renders these approaches unable toprotect the real-world web apps such as Wordpress, Joomla, orDrupal against SQLi attacks.

We propose a novel hybrid static-dynamic analysis for PHPweb applications that limits each PHP function for accessing thedatabase. Our tool, SQLBlock, reduces the attack surface of thevulnerable PHP functions in a web application to a set of querydescriptors that demonstrate the benign functionality of the PHPfunction.

We implement SQLBlock as a plugin for MySQL and PHP. Ourapproach does not require any modification to the web app. Weevaluate SQLBlock on 11 SQLi vulnerabilities in Wordpress, Joomla,Drupal, Magento, and their plugins. We demonstrate that SQLBlocksuccessfully prevents all 11 SQLi exploits with negligible perfor-mance overhead (i.e., a maximum of 3% on a heavily-loaded webserver).

KEYWORDSNetwork Security; Database; SQL Injection; Web Application

ACM Reference Format:Rasoul Jahanshahi, Adam Doupé, and Manuel Egele. 2020. You shall notpass: Mitigating SQL Injection Attacks on Legacy Web Applications. In Pro-ceedings of the 15th ACM Asia Conference on Computer and CommunicationsSecurity (ASIA CCS ’20), October 5–9, 2020, Taipei, Taiwan. ACM, New York,NY, USA, 13 pages. https://doi.org/10.1145/3320269.3384760

1 INTRODUCTIONThe growing number of users for services such as social networks,news, online stores, and financial services makes these services atempting source of sensitive information for attackers. Symantec’srecent report [7], shows an increase of 56% in web attacks from 2017to 2018. Moreover, according to Akamai [1], 65.1% of web attackswere SQLi attacks. SQLi is a type of code injection attack, wherean attacker aims to execute arbitrary SQL queries on a database.In 2018, the number of SQLi vulnerabilities discovered in the top

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than theauthor(s) must be honored. Abstracting with credit is permitted. To copy otherwise, orrepublish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected] CCS ’20, October 5–9, 2020, Taipei, Taiwan© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.ACM ISBN 978-1-4503-6750-9/20/10. . . $15.00https://doi.org/10.1145/3320269.3384760

four most popular web apps (i.e., Wordpress, Joomla, Drupal, andMagento) increased by 267% compared to the prior year.

There has been a great deal of research into identifying SQLivulnerabilities and defending against SQLi attacks on web apps.Proposed approaches used various techniques such as static anal-ysis [10, 11, 20, 22, 35], dynamic analysis [3, 6, 15, 21, 24, 25, 37],or a mix of static-dynamic analysis [5, 17, 28]. While static analy-sis approaches can be promising, static analysis cannot determinewhether input sanitization is performed correctly or not [34]. If thesanitization function does not properly sanitize user-input, SQLiattacks can still happen. Moreover, to the best of our knowledge,prior static analysis approaches for finding SQLi vulnerabilities inPHP web apps do not support Object-oriented programming (OOP)code. Such shortcomings in static analyses leave SQLi vulnerabili-ties undetected in web apps such as Wordpress, Joomla, and Drupalthat more than 40% of active websites use [29].

Prior dynamic analyses use taint analysis [15, 18] and compar-ison of query parse trees [3, 6, 25, 26, 36, 37] for detecting SQLiattacks on web apps. Such dynamic analyses follow an incompletedefinition of SQLi attacks where a SQLi attack always alters thesyntactic structure of an SQL query. Ray et al. [31] show that thisincomplete definition in CANDID [3], SQLCheck [36], WASP [15],and SQLPrevent [37] does not prevent specific SQLi attacks and alsoblocks benign requests. Other dynamic approaches have attemptedto create a profile of the executed SQL queries and enforce theprofile at runtime [25, 26]. The profiles are a mapping between theparse tree of the benign issued SQL queries and the PHP functionsthat issued the queries. The profiles created by such approachesare too coarse-grained. Particularly, modern and complex web appssuch as Drupal and Joomla define database APIs that perform alldatabase operations. Database APIs create SQL queries using theprinciple of the encapsulation that allows the local functions toissue an SQL query to the database without passing the SQL queryas an argument. In such cases, existing approaches map SQL queriesto functions in the database APIs instead of mapping to the func-tion that uses database API for communicating with the database.Hence, the prior approaches create a coarse-grained mapping thatcan allow an attacker to perform mimicry SQLi attacks.

Specifically, Mereidos et al. in SEPTIC [25] propose an approachto block SQLi attacks inside the database. During training mode,SEPTIC records a profile that maps the parse trees of benign issuedSQL queries to an identifier. SEPTIC generates the identifier inmysql and mysqli; two database extensions in PHP for commu-nicating with a MySQL database. The identifier is inferred fromthe PHP call-stack that issued a call to one of the methods in themysql or mysqli API for executing an SQL query on the database(e.g., mysql_query). The identifier is a sequence of functions in the

arX

iv:2

006.

1199

6v3

[cs

.CR

] 1

2 Ju

l 202

0

Page 2: You shall not pass: Mitigating SQL Injection Attacks on ...

PHP call-stack that pass the SQL query as an argument. In enforce-ment mode, SEPTIC checks the parse tree of the SQL query againstthe profile it obtained during training mode. The design of SEPTICleaves two unsolved challenges. (i) The strict comparison of the SQLquery’s parse tree against the profile leads SEPTIC to reject a rangeof dynamic yet benign SQL queries, thus causing false positives. (ii)The coarse-grained mapping of SEPTIC’s profile allows an attackerto perform mimicry SQLi attacks successfully. The approach forcreating the identifier in SEPTIC does not consider the fact thatweb apps do not necessarily pass the SQL query as an argument. Asa result, SEPTIC assigns the SQL queries to a small set of functionsin database API as an identifier.

We evaluated SEPTIC’s protection model against the Drupalged-don vulnerability in Drupal [8]. Database API in drupal uses theencapsulation concept that means Drupal’s functions do not passthe SQL query as an argument to the database API. Hence, SEPTICmaps all issued SQL queries to the same sequence of functions inthe database API instead of the function that interacts with thedatabase through the database API. During training mode, SEPTICcreates its profile by mapping all the received SQL queries to asingle identifier. This mapping in the profile means that any func-tion that communicates with the database in Drupal can issue allthe SQL queries in the SEPTIC’s profile. For instance, an attackercan exploit the Drupalgeddon vulnerability in the presence of theSEPTIC and use the login functionality to issue an SQL query forcreating an admin user.

Considering the challenges and open problems with existing de-fenses against SQLi attacks for PHP web apps, we propose a novelhybrid static-dynamic analysis and its implementation SQLBlockto defend OOP web apps against SQLi attacks. SQLBlock consistsof four steps for defending web apps against SQLi attacks. In thefirst step, SQLBlock collects benign inputs through unit tests orbenign browsing of web apps and creates a mapping between theissued SQL query and the function that issued the query. The staticanalysis is necessary to determine the database API precisely andsubsequently identify the PHP function that uses this API to com-municate with the database correctly. In the next step, SQLBlockcreates a profile based on the issued query from each function inthe web app during the training mode. The profile in SQLBlockis a mapping between the function that issues the SQL query anda query descriptor that describes the benign functionality of theSQL query. In the last step, SQLBlock enforces the profile insidethe database to prevent the execution of any SQL query that doesnot match the profile at the runtime. We evaluate our system on atotal of 11 known SQLi vulnerabilities of the top four most popularreal-world web appsWordpress, Drupal, Joomla, Magento, and theirplugins. SQLBlock defends against all SQLi exploits, while SEPTICcan only defend against four SQLi attacks in our dataset.

In summary, we make the following contributions:

• We recognize that the object-oriented programming para-digm poses challenges for existing systems that lead to falsepositives and reduced protection against SQLi attacks. Wepropose a novel system to statically and precisely identifydatabase API in a PHP web app, and dynamically restrict theSQL queries that MySQL executes based on the PHP functionthat composes the SQL query.

• We present a prototype implementation called SQLBlock asa MySQL plugin. It can be used with minimal modificationsto MySQL for defending against more types of SQLi attacksagainst PHP web apps than prior work. (more details in § 3)

• We evaluate SQLBlock for its security and performance char-acteristics on four popular PHP web apps and seven plugins.SQLBlock protects the database and the web app against11 previously known SQLi vulnerabilities in our evaluationdataset with an acceptable performance overhead (<3%).

We will open source our implementation of SQLBlock, including thetesting and evaluation dataset. Our dataset includes 11 vulnerablePHP web app and plugins, as well as automated Selenium scriptsrecorded from human interactions with each web app.

2 BACKGROUNDIn this section, we provide an overview of object-oriented program-ming in PHP and the PHP extensions that are used to communicatewith MySQL databases. Afterward, we discuss MySQL and its plu-gin architecture. Understanding the OOPmodel in PHP is necessaryfor our static analysis. Besides that, the knowledge of database ex-tensions in PHP for communicating with MySQL and the powerof MySQL plugins shapes the implementation of SQLBlock. Wethen discuss different types of SQLi attacks that impact the profilecreated in step 3 of SQLBlock.

2.1 PHPPHP is an open-source server-side scripting language. Accordingto W3Techs [30], 79.1% of all websites use PHP as their server-sidelanguage. PHP supports binary extensions called plugins that pro-vide PHP with additional features such as cryptographic algorithms,mail transfer, or database communications.

Database API in PHP provides an interface for communicatingwith a database. Database API can be database-specific such asMySQL and SQLite, or a general interface such as PHP Data Ob-jects (PDO) for accessing various databases. The mysqli extensionprovides functionality to access MySQL databases in PHP scripts.Compared to mysql, which is another PHP extension for access-ing MySQL, mysqli provides three additional features: support forprepared statements, multiple statement queries, and transactions.PHP web apps tend to use mysqli due to aforementioned additionalcapabilities.

PDO is an abstraction layer that provides a consistent API foraccessing databases regardless of the database type. This featureallows a PHP script to use the same piece of PHP code to connectto different types of databases and issue queries. Although PDOdelivers a clean and simple API for accessing the database, it onlyprovides generic query-building functionality. For instance, PDOneither supports multiple SQL queries in one string, asynchronousqueries, nor automatic cleanup with persistent connections.

PHP supports the Object-Oriented Programming model, whichintroduces three new concepts for developing PHP web apps: in-heritance, polymorphism, and encapsulation. Inheritance and poly-morphism let developers extend the functionality of classes or im-plement an interface in more than one way. Encapsulation bundlesdata and methods into a single unit. Hence, OOP allows developersto create modular programs and extend the functionality of PHP

Page 3: You shall not pass: Mitigating SQL Injection Attacks on ...

database extension. Additionally, PHP provides dynamic features,such as creating objects from dynamic strings. new is the keywordfor creating objects from a class in PHP. The argument for the newkeyword, can be a class name or a string that represents the nameof the class. An example is shown in Figure 2, line 22, where thevalue of getDriver() defines the class that should be instantiated.

Besides the object-oriented design of database APIs, PHP webapps also implement database procedures. Database procedureshandle instantiating objects from the database API and return anobject from the database API or a sub-type of the database API.Throughout this paper, we call the database API and proceduresas the database access layer. The database access layer in the webapp handles the communication of the web app’s modules with thedatabase. SQLBlock determines the database access layer in PHPweb apps by reasoning about the source code of the PHP web appstatically with respect to the OOP implementation of the web apps.

2.2 MySQLMySQL is an open-source database management system. As of Au-gust 2019, according to Datanyze [12], MySQL is used in 46.03% ofthe deployed websites on the Internet. MySQL supports a pluginAPI that enables developers to extend the functionality of MySQL.MySQL Plugins can implement user authentication, query rewrit-ing components, or new parsers for additional keywords and ca-pabilities. MySQL plugins have access to different data structures,depending on their role. Of particular interest to this paper is thequery rewrite plugin, which can examine and modify a query whenMySQL receives the query before execution.

Query rewrite plugin has access to the parse tree of the SQLquery that MySQL received. Each node in the parse tree based onits type contains information regarding the element it representsfrom the SQL query. For instance, the function node (e.g., IN, <)contains information regarding the number of arguments passed tothe SQL function. SQLBlock uses the information that each nodecontains during its training and enforcement. Postparse pluginsalso have access to the information regarding the type of the SQLquery (e.g., SELECT, INSERT) and the name of the table that theSQL query needs to access. SQLBlock uses the information aboveto create and enforce the query descriptors for each received SQLquery.

2.3 SQL Injection attacksSQL injection (SQLi) is a code injection attack in which an attackeris able to control a SQL query to execute malicious SQL statementsto manipulate the database. SQLi attacks are classified into eightcategories [11, 16]:

(1) Tautologies: The attacker injects a piece of code into theconditional clause (i.e., where clause) in a SQL query suchthat the SQL query always evaluates to true [16]. The goal ofthis attack varies from bypassing authentication to extractingdata depending on how the returned data is used in theapplication.

(2) Illegal/Logically incorrect Queries: By leveraging this vul-nerability, an attacker can modify the SQL query to causesyntax, type conversion, or logical errors [16]. If the webapp’s error page shows the database error, the attacker can

learn information about the back-end database. This vulnera-bility can be a stepping stone for further attacks that revealsthe injectable parameters to the attacker.

(3) UnionQuery: In union query attacks, the attacker tricks theapplication to append data from the tables in the database fora given query [16]. An attacker adds one or more additionalSELECT clause, which start with the keyword UNION, thatleads to merging results from other tables in the databaseto the result of the original SQL query. The goal of such anattack is to extract data from additional tables in the database.

(4) Piggy-backedQuery: Piggy-backed query enables attackersto append at least one additional query to the original query.Therefore the database receivesmultiple queries in one stringfor execution [16]. The attacker does not intend to modifythe original query but to add additional queries. Using thepiggy-backed query, an attacker can insert, extract, or modifydata as well as execute remote commands as well as extractdata from the database. The success of the attack dependson if the database allows the execution of multiple queriesfrom a single string.

(5) Stored procedures: Stored procedures are a group of SQLqueries that encapsulate a repetitive task. Stored proceduresalso allow interaction with the operating system [16], whichcan be invoked by another application, command line, or an-other stored procedure. While a database has a set of defaultstored procedures, the SQL queries in a stored procedurecan be vulnerable similar to SQL queries outside the storedprocedure.

(6) Inference: In this type of attack, the application and thedatabase are prevented from returning feedback and errormessages; therefore, the attacker cannot verify whether theinjection was successful or not [16]. In the inference attacks,the attacker tries to extract data based on answers to true/-false questions about the data already stored in the database.

(7) Alternate Encoding: In order to evade detection, the attack-ers use different encoding methods to send their payload tothe database. Each layer of the application deploys variousapproaches for handling encodings [16]. The difference be-tween handling escape characters can help an attacker toevade the application layer and execute an alternate encodedstring on the database layer.

(8) Second order injections: One common misconception isthat the data already stored in the database is safe to ex-tract [11]. In a second order attack, an attacker sends hiscrafted SQL query to the database to store his attack payloadin the database. The malicious payload stays dormant in thedatabase until the database returns it as a result of anotherquery, and the malicious payload is insecurely used to createanother SQL query.

3 RELATEDWORKIn this section, we review the relevant literature on defending webapps against SQLi attacks. We also compare SQLBlock with fiveexisting approaches and explainwhy prior systems are not sufficientfor PHP web apps that utilize OOP to communicate with databases.

Page 4: You shall not pass: Mitigating SQL Injection Attacks on ...

Our comparison based on the SQLi attack type is presented inTable 1. For each SQLi attack type in Table 1, means the tool candefend against the type of attack, means the tool is ineffective,and means that the tool can partially defend the web app againstSQLi attack. Partially defending means that either the tool canonly defend web apps that do not use OOP for implementing thecommunication with the database, or the definition of SQLi attacksin the tool is incomplete. The last column of Table 1 shows thenumber of SQLi exploits from our dataset in Table 3 that each toolcan prevent.

Static Analysis: Several proposed approaches focus on detectinginjection vulnerabilities statically in the source code of web appli-cations [10, 11, 18, 20, 38]. Dahse et al. [10] proposed RIPS, an inter-and intra-procedural data flow analysis for detecting XSS and SQLivulnerabilities in web apps. Pixy [20] implements a flow-sensitivedata flow analysis to find XSS and SQLi vulnerabilities in web apps.WebSSARI [18] uses taint analysis to track untrusted user-inputs todetect command injection vulnerabilities. Dahse et al. [11] imple-ment a context-sensitive taint analysis to analyze read and writeoperations to the memory locations in webserver for finding thesecond-order injections. Wassermann et al. [38] proposed a staticanalysis for detecting the injection vulnerabilities in web apps. Amajor drawback of prior analyzes are the inability to detect SQLivulnerabilities in web apps such as Wordpress, Joomla, and Drupalthat utilize OOP for communicating with databases.

Dynamic Analysis: Dynamic approaches track user-inputs [3, 6,36, 37], or build a profile of benign SQL queries [24–26, 34] to pre-vent SQLi attacks on web apps. SQLPrevent [37] analyzes generatedqueries for the existence of HTTP request parameters and raisesan alert when an HTTP request parameter modifies the syntaxstructure of a query. SQLGuard [6] proposed a dynamic approachfor comparing the parse tree of issued queries at runtime before andafter the inclusion of user inputs. SQLGuard needs to modify thesource code of the web app. WASP [15] proposes a taint analysisto detect SQLi attacks on web apps. CANDID [3] records a set ofbenign SQL queries that the web app can issue by instrumentingthe web app’s source code and dynamically executing the SQLstatements with benign inputs. CANDID, SQLGaurd, WASP, andSQLPrevent assume that if the input does not change the syntax

Tool Taut.

Illegal/In

correct

Union

Pigg

y-back

Stored

proc.

Infer.

Alt.

encoding

Second

orderinj.

flagg

edas

attack

SQLrand [5] 0SQLCheck [36] 0Merlo et. al. [26] 0SEPTIC [25] 4DIGLOSSIA [34] 5SQLBlock 11

Table 1: Comparison of SQLBlock with other techniqueswith respect to SQLi attack type. SQLBlock provides themost effective protection.

structure of a SQL query, then a SQLi attack has not occurred. Suchan assumption can leave the web app vulnerable to SQLi attacksand also blocks benign generated queries [31]. Unlike CANDID,SQLGaurd, WASP, and SQLPrevent, SQLBlock does not detect SQLiattacks based on the modification to the syntax structure of the SQLquery. SQLBlock generates a set of query descriptors for benignqueries that each PHP function issues to the database. SQLBlockallows functions in the web app to issue queries, as long as thequery matches its query descriptors. Beyond this, SQLBlock doesnot need to modify the source code of the web app for its operation.

SQLCheck [36] tracks user-inputs to SQL queries and flags aSQL query as an attack if user-input modifies the syntactic struc-ture of the SQL query. This incomplete definition of SQLi attacksprevents SQLCheck from defending against tautology, inference,stored-procedure, and alternate encoding attacks. These four at-tacks do not necessarily modify the syntax structure of a SQL query.Considering this weaknesses, SQLCheck cannot protect web appsagainst any of the vulnerabilities in our dataset mentioned in Ta-ble 3.

Diglossia [34] proposed a dual parser as an extension to thePHP interpreter. Diglossia maps the query without user-inputsto a shadow query, and then it checks whether the parse tree ofactual query and the shadow query are isomorphic or not. If bothparse trees are isomorphic and the code in the shadow query isnot tainted with user-inputs, Diglossia passes the query to theback-end database. Diglossia is unable to defend against Second-order injection since Diglossia only checks queries with user-inputs.Moreover, Diglossia cannot detect alternate-encoding and stored-procedure attacks since these attacks do not modify the parse treeof the SQL query [25]. As shown in Table 1, SQLBlock defends webapps against more variants of SQLi attacks than Diglossia.

SEPTIC [25] creates a profile for each issued query during thetraining phase and enforces this profile to protect web apps againstSQLi attacks. During the training, SEPTIC creates a query modelthat includes all the nodes in the parse tree of a SQL query. Theprofile in SEPTIC is a mapping between the query model and anID. The ID is the sequence of functions that pass the query as anargument. During the enforcement, SEPTIC uses this sequence offunctions as an identifier and finds the appropriate query modelin the profile. If the issued query matches the query model in theprofile, SEPTIC allows the database to execute the query. Enforcinga profile based on the exact model of the generated queries thatincludes the name of table columns and number of SQL functionspreventsweb apps from generating dynamic yet benign SQL queries,which causes false positives in SEPTIC. For instance, assume thereis a webpage for searching for published music albums and userscan search based on the name of an album, an artist’s name, orthe released year. If SEPTIC is trained with SQL queries that onlyincludes the album’s name or the released year, it rejects any SQLqueries from a user that searches using the artist’s name. SQLBlocksolves this problem by creating query descriptors for SQL queries.Query descriptors generalize the benign SQL queries, which allowsthe web app to produce a range of dynamic queries.

Furthermore, to create an identifier for each issued query in theprofile, SEPTIC uses the information in the PHP call-stack thatissued the call to methods frommysql ormysqli. SEPTIC checks thesequence of functions in the PHP call-stack for the presence of SQL

Page 5: You shall not pass: Mitigating SQL Injection Attacks on ...

2

PHP Interpreter

Database Extension

DataBase

plugin

PR

SA

Training Mode

EnforcementMode

Mapping of

PHP functions to

query

Function 1 QD 1 ...

Function m ...

.

.

ExecuteSQL query

Do not execute

QD 1 QD k

QD n

HTTP Requests

constants

variables

class instatiations

class definitions

Parser

StringProcessor

Database access layer

Web app

AST

function definitions

1 3

Profile Generator

4

PHP Interpreter

DatabaseExtension

DataBase

plugin

PG

PE

1

DE

1

Benign Requests

DE

Figure 1: SQLBlock extracts the database access layer of the web app, builds a mapping between the function and the SQLqueries it issues, creates a profile for each function in the web app and enforces the profile using a MySQL plugin.

query in function’s arguments. Since OOP web apps do not passthe SQL query as an argument, SEPTIC cannot generate a correctidentifier for SQL queries. Instead, it creates the same identifierfor all the issued queries in the OOP web app. Consequently, anattacker can use a vulnerable function in the web app to issue anyquery from the profile. Considering the coarse-grained mappingthat SEPTIC builds for the web apps that use OOP, SEPTIC candefend against only four variants of SQLi attacks in our dataset. All4 SQLi attacks that SEPTIC can defend against reside in Wordpress.Wordpress does not use the encapsulation concept in its databaseAPI, and its modules provide SQL queries as function arguments;consequently, SEPTIC can correctly create its mapping. SQLBlockovercomes this problem by utilizing a static analysis that identifiesthe database API in the web app, which helps SQLBlock to correctlydetermine the function that interacts with the database.

Merlo et al. [26] proposed a two step approach. First, it interceptsevery function call to mysql_query and records a profile for benignissued SQL queries. The profile is amapping between the issued SQLquery and the PHP function that calls the function mysql_query.During enforcement, [26] looks for the received SQL query in itsmapping profile and if the query does not syntactically match withany recorded query for the PHP function, [26] blocks the query.The proposed approach in [26] maps all the SQL queries to theinternal functions in the database API instead of the appropriatefunction that uses the database API for communicating with thedatabase. Besides that, enforcing a strict comparison of the parsetree limits the functionality of the web app for generating dynamicSQL queries. Table 1 shows that the proposed approach in [26]cannot protect web apps against any of the SQLi attacks in ourdataset.

Hybrid Analysis: Amnesia [17] builds a model of benign queriesin Java web apps statically. At runtime, Amnesia checks the queriespassed to the database against the built model. Amnesia highlydepends on the benign queries that are built during the static anal-ysis, which leads to a high number of false positives when appliedto programs that generate SQL queries dynamically. SQLRand [5]proposed a randomization technique for randomizing queries inweb apps. SQLRand randomizes the SQL queries in the web appand uses an intermediary proxy for de-randomizing before sending

the queries to the database. Since web apps generate SQL queriesdynamically, randomizing the queries using SQLRand is a challeng-ing task. Using an intermediate proxy introduces overwhelmingoverhead to web app performance [16, 36]. Besides that, since thereis one static key that modifies SQL keywords, the knowledge ofnew SQL keywords can compromise the security of SQLRand [6].

4 SYSTEM OVERVIEWIn this section we explain how SQLBlock records benign SQLqueries and limits the access of functions in a web app to the data-base. Figure 1 shows an overview of how SQLBlock defends webapps against SQLi attacks. Specifically, SQLBlock records a profileby observing benign issued queries by a web app. SQLBlock thenenforces the profile from inside the database for every query thatthe web app sends to the database.

In step 1 , SQLBlock performs a static analysis over the webapp to identify the database procedures that are used across theweb app’s scripts. This analysis is done once per web app andSQLBlock uses this information during training and enforcementof the profile.

In step 2 , SQLBlock is in the training mode and records thebenign issued SQL queries by the web app. SQLBlock can use benignbrowsing traces or the web app’s unit tests in its training. SQLBlockcreates a mapping between the benign SQL queries that MySQLreceives and the functions in the web app that used the databaseaccess layer to issue the query to the database.

In Step 3 , SQLBlock leverages the information from the firsttwo steps to assemble a trusted database-access profile. The profileis a set of allowed tables, SQL functions, and type of SQL queriesthat each function in the web app can issue. At the end of the thirdstep, SQLBlock acquires the necessary information to protect theweb app from SQLi attacks.

In step 4 , SQLBlock protects the running web app againstunauthorized database access by filtering access to the databaseaccording to the trusted profile generated in step 3 . The modifieddatabase extension (e.g., PDO in PHP) appends the execution infor-mation (i.e., call-stack) at the end of each SQL query as a commentbefore sending it to MySQL. Prior to the execution of each SQL

Page 6: You shall not pass: Mitigating SQL Injection Attacks on ...

query, SQLBlock extracts the appended execution information fromthe SQL query and identifies the function that communicate withthe database using the database access layer. SQLBlock checks thequery against the profile that corresponds to the function that is-sued the query. Finally, if the SQL query matches the profile, MySQLexecutes the query and returns the results.

4.1 Static Analysis of Web appsThe web app database access layer provides a unified interface tointeract with different databases. In step 1 , SQLBlock identifiesthe database access layer by statically analyzing the web app. Tothis end, SQLBlock creates a class dependency graph (CDG). TheCDG is a directed graph CDG = (V ,E), where the vertices (V ) areclasses and interfaces in the web app. An edge e1,2 ∈ E is drawnbetween v1 ∈ V and v2 ∈ V if v1 extends class v2, implementsinterface v2 .

After creating the CDG, SQLBlock extracts the list of classes andinterfaces in the web app that extend database APIs (e.g., PDO inPHP). To do so, we manually identify database extension classes(e.g., mysqli in PHP). Afterwards, SQLBlock iterates over the ver-tices of the CDG and checks whether a vertex is connected to thedatabase API. If a vertex is connected to the database API, SQLBlockadds it to the database access layer. SQLBlock also adds classes tothe database access layer if their methods initialize an instance of

1 $id = $_GET["id"]

2 function get_public_info{

3 include dirname(__FILE__)."/db/database.php";

4 $users = executeQuery("public_info", $id);

5 ...

6 }

7 get_public_info();

(a) get_public_info.php

1 class DatabaseConnectionmysqli

2 extends mysqli {

3 private $query;

4 function __construct(){

5 parent::__construct("localhost","admin","admin","mysqldb");

6 }

7 public function setQuery( $query ){

8 $this->query = $query;

9 ...

10 }

11 public function execute(){

12 return parent::query($this->query);

13 }

14 public function multi_execute(){

15 $result = parent::multi_query($this->query);

16 ...

17 }

18 }

19 function executeQuery( $tbl, $arg ) {

20 $query = "SELECT * FROM ".$tbl." WHERE id > ".$arg;

21 $classname = "DatabaseConnection".$this->getDriver();

22 return new $classname()->setQuery($query)->multi_execute();

23 }

(b) /db/database.php

Figure 2: Illustrative PHP code snippets demonstrating dy-namic inputs to new keyword

database API in PHP (e.g., mysqli_init). At the end of this itera-tion, SQLBlock possess a list of all classes and interfaces in the webapp that extends the database API.

Besides the object-oriented design of database APIs in web apps,operations on databases (e.g., SELECT operation) also have proce-dures [13]. Database procedures handle creation of objects fromdatabase API and setting correct parameters for modules in theweb app. A database procedure returns an object from a sub-typeof a database API. SQLBlock analyzes the body of the functions andprocedures in the web app for the returned objects. If the returnedobject is from a sub-type of a database API in the web app, thenSQLBlock considers it as a database procedure. At the end of thisstep, SQLBlock extracts information regarding the database API aswell as database procedures. This step is necessary for SQLBlockto find the function that used the database access layer for com-municating with the database during training and enforcing of theprofile.

Figure 2b shows a snippet of PHP code from a class that ex-tends the database API mysqli. There is also a database proce-dure called the executeQuery in Figure 2b that return an objectfrom DatabaseConnectionmysqli that is a subclass of mysqli.Figure 2a shows another snippet of code implementing a functioncalled get_public_info that uses executeQuery to retrieve datafrom the database. SQLBlock identifies DatabaseConnectionmysqlias a subclass of mysqli and executeQuery as a database procedure.

4.2 Collecting information regarding databaseaccess in the web app

In step 2 , we train SQLBlock using benign traces or unit tests tolearn benign SQL queries. Step 2 consists of two components thatwork together to create a mapping between the received SQL queryin MySQL and the function that composed the SQL query. The firstcomponent, appends the execution information at the end of eachSQL query before sending it to the database. The execution infor-mation includes the call-stack in the web app that led to sending aSQL query to the database using database extensions (e.g., PDO ormysqli in PHP).

The second part, a MySQL plugin, intercepts the execution ofthe incoming SQL queries to MySQL. When MySQL receives a SQLquery through benign traces or unit tests, SQLBlock records the SQLquery that MySQL receives including the execution informationappended to the SQL query. Since SQLBlock has access to the parsetree of the SQL query, SQLBlock traverses the parse tree and recordsinformation regarding the type of nodes in the parse tree of theSQL query. SQLBlock also logs the list of tables that the SQL queryaccesses, as well as the type of operation (e.g, SELECT operation) inthe SQL query.

4.3 Creating the profileSQLBlock in step 3 , leverages the access logs collected from benignSQL queries in step 2 and generates a profile that defines the accessto the database for each function in the web app that interactedwith the database. Particularly, the profile contains a set of querydescriptors for each function in the web app. A query descriptorcomprises four components. Each component specifies a differentaspect of the database access, that we explain below.

Page 7: You shall not pass: Mitigating SQL Injection Attacks on ...

• Operation: denotes the type of operation in the SQL query.The operation can be SELECT, INSERT, UPDATE, DELETE,etc. [9]. The profile records the type of operation in eachSQL query. Enforcing the operation type removes the possi-bility of a SQLi attack performing a different operation. Forinstance, when the profile only specifies a SELECT operation,the SQL query cannot perform an INSERT SQL query.

• Table: determines the tables that the SQL query can operateon. Restricting the tables used in a SQL query prevents anattacker from executing a SQL query on a different table.

• Logical Operator: indicates the logical operators [9] usedin the SQL query. Logical operators limit the ability of an at-tacker to use a tautology attack in a SQL query for extractingdata from a table.

• SQL function: determines the list of functions that the queryuses. The component also records the type of arguments thatare passed to each function. The list of functions restricts theattacker to use only the functions that are recorded duringthe training. This limits the attacker’s capability to use al-ternate encoding and stored-procedures attacks against thedatabase.

At the end of step 3 , SQLBlock acquired a set of query descriptorsfor each function in the web app that issued a SQL query based onthe training data that was obtained in step 2 .

4.4 Protecting the Web appIn the last step, SQLBlock is in enforcement mode and uses theprofile created in step 3 to restrict access to the database for eachfunction in the web app. When the database receives a SQL query,SQLBlock extracts information regarding the type of operation,table accesses, and parse tree of the received SQL query. Subse-quently, SQLBlock extracts the function that issued the SQL queryfrom the execution information appended to the incoming SQLquery. Afterwards, SQLBlock looks up in the profile and retrievesquery descriptors associated with the function that composed andissued the SQL query. For each query descriptor associated withfunction, SQLBlock compares each component of the query descrip-tor with the obtained information from the received SQL query.First, SQLBlock checks whether the type of operation in the re-ceived SQL query and in the query descriptor is the same or not.Second, SQLBlock examines the list of tables in the received SQLquery. The list of table in the received SQL query must be a subsetof the list of tables in the query descriptor. For the logical operators,SQLBlock checks whether the logical operators in the SQL querythat MySQL received is subset of the logical operators in the querydescriptor. Finally, SQLBlock inspects the functions used in thereceived SQL query as well as the type of arguments. The functionsand the type of arguments must be in the recorded query descriptor.SQLBlock takes a conservative approach and allows the databaseto execute the SQL query only if all four components of a querydescriptor associated with the function authorize the SQL query.

5 IMPLEMENTATIONIn this section we elaborate on the implementation challenges thatneeded to be addressed build SQLBlock. First, we explain how SQL-Block statically analyzes PHP web apps to identify the database ac-cess layer. Afterwards, we describe how SQLBlock uses the MySQLplugin API to record the SQL queries that the database receives.We explain how SQLBlock creates a precise profile for each PHPfunction based on the SQL queries issued to the database. Finally,we describe SQLBlock’s approach for using a MySQL plugin API torestrict database accesses.

5.1 Static Analysis of web appsIn step 1 , SQLBlock analyzes the web app to determine the data-base API and database interfaces across the PHP scripts in a webapp. SQLBlock performs a flow-insensitive analysis, which focuseson finding database API, interfaces, and procedures.

SQLBlock identifies all PHP files in the web app, using libmagic.We use php-parser [33] to parse each PHP script into an abstract syn-tax tree (AST). SQLBlock identifies classes, interfaces, and abstractdefinitions by scanning AST nodes that represent their correspond-ing definitions. SQLBlock examines interface and class definitionsacross the PHP web app to reason about the dependencies betweenclasses and interfaces, During analysis, SQLBlock creates a classdependency graph (CDG) and draws an edge between interfacesand classes when: 1) An interface extends another interface. 2) Aclass implements an interface. 3) A class extends another class.

After creating the CDG, the static analyzer ( SA ) iterates overthe nodes of the CDG to identify classes and interfaces that facilitatecommunication between the PHP web app and the database. Toaccomplish this, SQLBlock starts with the PDO and mysqli classes;two of the most popular database extensions in PHP. SQLBlockcreates a list of classes and interfaces that share an edge with PDO ormysqli classes in the CDG. For example, after creating the CDG forthe code in Figure 2b, SA identifies DatabaseConnectionmysqlias a subclass of mysqli.

SA must identify database procedures as well. SA decideswhether a procedure is a database procedure or not by analyzingthe type of object it returns. If a procedure returns an object from asubclass of the database API, SA marks that function as a databaseprocedure. For determining the object type that a function returns,SA analyzes the AST node of the return statement. There are twocases that SA is interested to follow:

• Instantiating an object using the new keyword: If thefunction is instantiating an object using the new keywordin the return statement, SA analyzes the argument thatis passed to the new keyword. If the argument is the nameof a subclass of a database API, SA marks the functionas a database procedure. If the argument is a variable, SAperforms a lightweight static analysis as a limited form ofconstant folding over strings that compose the value. SAmarks the function as a database procedure, if the resolvedvalue is a subclass of database API.

• Variable: If the function returns a variable, SA iteratesbackward on the AST to the last assignment of the variableand checks whether the assignment is a class instantiation

Page 8: You shall not pass: Mitigating SQL Injection Attacks on ...

1 SELECT * FROM public_info where id > 0 # mysqli::multi_query@DatabaseConnectionmysqli::multi_execute@executeQuery@get_public_info

2 FIELD@FUNC:>@2@FIELD@LITERAL # recorded info regarding the nodes in the SQL query

3 public_info@0 # recorded info regarding the table and operation type of the SQL query

Figure 3: recorded information regarding the execution of function get_public_info

or not. If it is a class instantiation, SA tries to resolve thetype of instantiated object as described above.

As discussed in § 2.1, PHP web apps often use variables as anargument for creating objects from classes using the new keyword.During analysis, SA keeps track of arguments passed to new inPHP scripts using a string representation.

5.1.1 String Representation. SA encounters strings when han-dling variable assignments and constant definitions. Strings canbe a mixture of literal components, function return values, andvariables.

When SA iterates over an assignment node in the AST, itrecords a set of information from the assignment node in a hashtable. SA keeps track of the name of the variable and the com-ponents on the right side of the assignment. SA also records thename of the function or the name of the class and method thatthe assignment statement occurs in. For example, in Figure 2b atLine 21, the function executeQuery has an assignment statement.The right side of the assignment concatenates a constant stringand a return value from a function. SA records the name of thevariable on the left side of the assignment as well as the value ofthe constant string and the return value from the function. SAalso records the type of operation on the right side (as discussednext, it is a concatenation operation). SA implements commonstring operations to resolve the value of the assignment.

5.1.2 String Operations. SQLBlockmanages frequent string-relatedoperations.

Variables: The argument passed to new can contain variablesdefined in the script. SA keeps track of variable definition inthe scope of script, class, or functions. When there is a variableassignment, SA creates an object for the variable and its value.

Concatenation: In PHP, strings can be constructed by joiningmultiple components with the . and .= operators. SA handlesstring concatenation by creating an object for concatenation andadds components that exist in the concatenation statement.

5.1.3 Identifying Database Procedures. To identify database pro-cedures, SA iterates over the assignments and resolves the valueof variables in the strings by looking for variables in the sameclass and function. If there is a variable without a value, SA rep-resents the value as a regular expression .* wildcard. SA looksfor a match between the generated regular expression and thelist of database API subclasses. For example, in Figure 2b, line 21,SA cannot determine the return value of $this->getDriver. Instead,SA represents the value as a .* wildcard. SA searches the listof database API subclasses for a class that matches the regularexpression DatabaseConnection*, and finds such a class named

DatabaseConnectionmysqli. SA marks executeQuery as a data-base procedure.

At the end of this step, SA has a list of database access layerclasses, interfaces, and procedures.

5.2 Profile Data CollectionThis step trains SQLBlock to create a mapping between issued SQLqueries and the web app’s function that relied on the databaseaccess layer to issue the SQL query. The collected information inthis step is necessary for generating query descriptors in step 3 .As described in § 4, the information collected for each SQL querycontains the operation, the access tables, the logical operators, theSQL functions that the query used, and the type of arguments ineach SQL function.

5.2.1 Attaching a PHP call stack: When MySQL receives a SQLquery, SQLBlock must infer which PHP function actually issuedthe SQL query. To achieve this, we modified the source code of theMySQL driver for the PDO and mysqli extensions. This modificationappends the PHP call-stack at the end of the query as a commentbefore sending it to the database.

To access the PHP call-stack, we use the Zend framework’s built-in function called zend_fetch_debug_backtrace. Zend keeps theinformation regarding the call-stack for the executing PHP script.This information includes the functions, class, their respective ar-guments, the file, and the line number that issued the call. Themodified database extension ( DE ) extracts the PHP call stack andappends it as a comment to the end of the SQL query.

5.2.2 Extracting information from the parse tree: Recorder plugin( PR ) acts as a post-parse MySQL plugin. PR has access to variousinformation regarding the parsed SQL query in MySQL: the type ofoperation (e.g. SELECT operation, etc.), the name of the table, andthe parse tree of the SQL query. MySQL provides a parse tree visitorfunction that PR uses to access the parse tree of SQL queries.

However, MySQL only allows plugins to access literal valuesof the query, such as user inputs in the parse tree. Because SQL-Block needs more information regarding the parsed SQL query,we modified the source code of MySQL-server so that the plugincan access non-Literal values as well. When MySQL invokes PR ,PR records the SQL query that MySQL receives. Afterwards, PRiterates over the parse tree of the SQL query and records the typeof each node. If the node represents a SQL function in the SQLquery, PR also records the number of arguments used in the SQLfunction. The node that represents the SQL function in the SQLquery also holds the number of arguments used in the SQL function.Afterwards, PR records the type of arguments passed to the SQLfunction as they appear in the parse tree of the SQL query. Lastly,PR logs the table and the type of operation for the SQL query thatMySQL received. In MySQL the information regarding the type of

Page 9: You shall not pass: Mitigating SQL Injection Attacks on ...

operation for a SQL query is shown as a number. Hence, PR logsthe type of operation for a SQL query as an encoded number inthe profile. Figure 3 shows the recorded information in the profile,when function get_public_info executes.

At the end of step 2 , SQLBlock has detailed information on thereceived SQL queries for training.

5.3 Creating the ProfileIn step 3 , profile generator ( PG ) creates a profile for each PHPfunction in the web app that accesses the database. PG relies onthe training data from step 2 as input.

PG reads the recorded information from step 2 . As shown inFigure 3, the first line is the SQL query including the PHP call-stack.Using the list created in step 1 , PG must infer which PHP usedthe database access layer to send the SQL query to the database.This is a difficult problem, because the last function on the call stackmight be a helper function that issues all queries for the application(and, in fact, this is how modern real-world PHP applications suchas Wordpress and Joomla are written). PG iterates over the stackof functions in the PHP call-stack and checks whether the functionor the method was recognized as a database procedure or databaseAPI method in step 1 . PG iterates over the stack starting fromthe last call in PHP call-stack until a function is not a databaseprocedure or database API method. PG identifies this function asthe function that created the database query.

As an example, the Line 1 in Figure 3 shows the SQL query thatMySQL receives including the PHP call-stack. PG detects mysqlias a database extension in PHP and DatabaseConnectionmysqli

as a class that extends mysqli. Then, PG visits the next functionexecuteQuery, whichwas identified as a database procedure in step1 . The next function in the PHP call-stack is get_public_info.get_public_info is not in the list of database procedures fromstep 1 , therefore PG identifies it as the PHP function that useddatabase access layer to send the SQL query to the database. PGwill then update get_public_info’s query descriptor.

Afterwards, PG iterates over the nodes of the SQL query’s parsetree and extracts all the logical operators. If all the logical operatorsare the same, PG updates the cond with the respective value. Ifboth logical operators (i.e, both OR and AND) are in the nodesof SQL query’s parse tree, PG sets cond to "Both". If there is nological operators in the SQL query, PG sets cond to "None". Basedon Figure 3, PG specifies that get_public_info does not use anylogical operators in its SQL query.

PG iterates over the list of nodes from the parsed tree of theSQL query and extracts the name of the used functions in the SQLquery as well as their respective arguments. Since the number ofarguments passed to the SQL function can be variable, PG does notrecord each argument’s type. Instead, PG summarizes the types ofarguments that a SQL function relies on. There are multiple typesof functions in MySQL such as numeric, string, comparison, anddate function. All of the aforementioned types of SQL functionsexcept the comparison type either receive less or equal to twoarguments or modifies the content of the first argument passedto the function. Comparison functions in MySQL (e.g., <, IN, etc.)

compare a single argument to a variable sized argument array.Moreover, the single argument appears as the first argument in theSQL comparison functions. Owing to this, PG records the typeof the first argument passed to a SQL function separately. If theargument is a table column, PG records it as a FIELD argument,otherwise PG records it as a LITERAL argument. Afterwards, PGiterates over the rest of the arguments passed to the SQL function.If the type of all the other arguments are the same type (i.e., FIELDor LITERAL), then PG records the value of the respective type inthe profile. Otherwise PG sets the type as var. For instance, basedon Figure 3, PG specifies that function get_public_info usedfunction ">", that the first argument is a table column and thesecond argument is a LITERAL.

Lastly, PG reads the information about the name of the tableand the type of SQL query. For instance, based on line 3 in Figure 3,PG deduces that function get_public_info accesses the tablepublic_info using a 0-type SQL query (i.e., SELECT SQL query).

At the end of step 3 , PG has a set of query descriptors foreach PHP function in the web app that issued a SQL query duringtraining in step 2

5.4 Protecting the web appIn step 4 , the enforcer plugin ( PE ) is on enforcement mode. PE

uses the profile that was generated in step 3 and protects thedatabase from queries that deviate from the profile. Similar to PG ,PE is implemented as a postplugin, which gives it access to theparse tree of the received SQL query. PE also uses the same PHPdatabase extensions as described in § 5.2.1. PE reads the profilefor each PHP function and uses it to analyze the received queries.

After receiving a query, MySQL parses the SQL query and callsPE . PE locates the call-stack and extracts the PHP function thatissued the query with the same approach described in § 5.3. Af-terwards, PE finds the query descriptors in the profile associatedwith the PHP function. PE checks the query against all four com-ponents of each query descriptor found for the PHP function. Foroperation type, PE checks whether the received SQL query hasthe same operation type as it is recorded in the profile. PE alsoexamines that the list of tables accessed for the received SQL queryis a subset of table access listed in the query descriptor. The logi-cal operators used in the received SQL query must be a subset ofthe logical operators in the query descriptor. Finally, the receivedSQL query can only use a subset of functions listed in the querydescriptor. PE also checks whether the arguments passed to eachfunction has the same type as it is recorded in the query descriptor.Only if the SQL query matches with all four components of at leastone query descriptor in the profile, PE allows MySQL to executethe SQL query and return the results. Otherwise PE returns Falseto MySQL-server, aborting execution of the query and returningan error to the web app, thus preventing a potentially maliciousattacker-controlled SQL query from executing.

Page 10: You shall not pass: Mitigating SQL Injection Attacks on ...

6 EVALUATIONWe assessed the ability of SQLBlock to prevent SQLi attacks ona set of popular PHP web apps. We also examined SQLBlock’sfalse positive rate during the benign browsing of the web app.Additionally, we evaluated the performance overhead of step 3 forthe benign browsing. For our evaluation, we answer the followingresearch questions:RQ1 How precise is SQLBlock’s static analysis?RQ2 Is SQLBlock effective against real world SQLi vulnerabilities in

popular web apps?RQ3 How practical is SQLBlock regarding performance overhead and

false positives?

6.1 Evaluation StrategyIn our evaluation, we performed our static analysis once for eachweb app in Section 6.2. We evaluated the database access layer re-solved by our static analysis in RQ1. Then we leveraged our databaseaccess layer to answer RQ2 and RQ3. We trained and built the profilefor SQLBlock using the official unit tests of each web app once andused the generated profile for the experiments to answer RQ2 andRQ3. The official unit tests examine the correctness of functions inthe web app by executing test-inputs and verifying their results.The advantage of unit tests over web crawlers is that there is noneed for manual intervention of administrators, specifically forproviding semantically correct inputs for each form in web apps. Aweb app’s unit tests are specifically tailored to its implementationand therefore are likely to achieve higher code coverage. Figure 4

Figure 4: The line coverage for unit tests and Burp suite onDrupal 7.0

shows that Drupal’s unit tests achieve higher line coverage com-pared to Burp suite and also covers almost all the lines that BurpSuite [23] covered. However, alternative approaches such as webcrawlers can also be used for training SQLBlock.

6.2 Evaluation DatasetWe evaluated SQLBlock on the four most popular PHP web apps,Wordpress, Joomla, Drupal, and Magento. According to W3Techs,

these web apps hold 70.5% of the market share among all existingcontent management systems (CMS) and power 38.4% of all thelive websites on the Internet combined [30]. Administrators installplugins and additional components to customize the web app andextend its functionality. To reflect this behavior in our evaluation,we also evaluate SQLBlock on plugins. We installed four vulnera-ble Wordpress plugins called Easy-Modal, Polls, Form-maker, andAutosuggest. We also installed three vulnerable plugins in Joomlanamed jsJobs, JE photo gallery, and QuickContact. To assess thedefensive capability of SQLBlock, we selected recent versions ofthe web apps and plugins that contain known SQLi vulnerabilities.We also considered the type of SQLi vulnerability in our dataset toinclude all types of SQLi exploits for a comprehensive evaluation.We collected a total of 11 SQLi vulnerabilities in different web appsand plugins.

6.3 Resolving The Database Access Layer (RQ1)In step 1 , SQLBlock scans the PHPweb app to identify the databaseaccess layer that is used to communicate with the database. Step1 is a crucial step to identify the correct function in the PHP call-stack that relies on the database access layer for interacting withthe database.

Table 2 presents the resolved database access layer statistics.The resolved subclasses column specifies the number of classes thatextends the database API in PHP. The resolved database procedurescolumn presents the number of functions that returns an objectfrom a subclass of the database API. Since there is no ground truthfor the database access layer in the web apps, we manually analyzethe output of SA for true positives. Subclasses of database APIs inthe PHP web apps also implement interfaces to facilitate actionssuch as iterating over elements in the object and counting elements.For instance, Drupal implements Iterator and Countable so thatthe PHP script can iterate over or count the number of records thatthe database returns to the PHP script. Since Drupal implementsCountable and Iterator in the subclasses of database API, SAadds these two interfaces to the database access layer. As shown inTable 2, the only false positives we observed during our evaluationare caused by the Iterator and Countable interfaces. All the webapps in our dataset except for Wordpress, use encapsulation in theirdatabase API subclasses and database procedures that show thenecessity of identifying the database access layer for creating aprofile. Without identifying the database access layer, SQLBlockwould operate similar to SEPTIC and map the received queries to asingle identifier.

Web app Resolved subclasses (FP) Resolved database procedureWordpress 4.7 1 -Drupal 7.0 44 (2) 38Joomla 3.7 30 (0) -Joomla 3.8 30 (0) -Mangeto 2.3.0 15 (0) -

Table 2: Resolved database access layer

Page 11: You shall not pass: Mitigating SQL Injection Attacks on ...

6.4 Defensive Capabilities (RQ2)We assessed the defense capabilities of SQLBlock against 11 SQLivulnerabilities listed in Table 3. We built and deployed five Dockercontainers that run a vulnerable version of a web app and a plu-gin. We exploit the vulnerabilities using exploits from MetasploitFramework [27], exploit-db [32], and sqlmap [4]. We consider anattack successful if an attacker can inject malicious SQL code intothe generated query in the web app and the database executes themalicious SQL query.

For this evaluation we used the results of our static analysisin RQ1. We trained SQLBlock using the official unit tests of webapps in their respective repositories. After creating the profile, weconfigured SQLBlock in the enforcement mode and assess whetherthe exploits in exploit-db and Metasploit Framework are successfulor not. Adversaries are not limited to use exploits in our evaluationand can craft their SQL queries to circumvent SQLBlock. To evaluatethe potential of such attacks, we also used sqlmap [4] to generatevarious exploits for the vulnerabilities listed in Table 3.

In Table 3, we present the list of SQLi vulnerabilities that SQL-Block defends the web apps against. The second column in Table3, represents the ID assigned to each vulnerability. We marked theSQLi vulnerabilities that reside in the core of web apps by C . Thethird column shows the type of attacks we performed to exploit therespective vulnerability. SQLBlock protects the web apps againstall 11 SQLi exploits in our dataset, while SEPTIC can only defendagainst four SQLi exploits that only reside in Wordpress plugins.

To evaluate the potential of circumventing SQLBlock, we alsolisted the available query descriptors for the SQL queries that thevulnerable PHP function in each web app or plugin can issue. For in-stance, any potential exploit against the first vulnerability in Table 3is restricted to an UPDATE query exclusively on table wp_em_modalswithout further logical operators. Furthermore, the exploit can onlyuse SQL functions "=" and "IN".

6.5 Performance (RQ3)Performance/responsiveness is a crucial factor for web apps. There-fore, we evaluate SQLBlock’s performance overhead. In SQLBlock,the first three steps can be performed offline. Steps 1 and 3 areautomatic and do not rely on help from the administrator. In step2 , the administrator must perform unit tests or create benigntraffic in the web app to train SQLBlock. Step 4 is deployed asa MySQL plugin and a set of modified PHP database extensionsto sandbox databases against malicious SQL queries. The MySQLserver loads SQLBlock’s protection plugin upon launch. SQLBlockloads the profile and waits for incoming SQL queries. We performour experiments on a 4-core Intel Core i7-6700 with 4Gb of memory2133Mhz DDR4 that runs Linux 4.9.0, with Nginx 1.13.0, PHP 7.1.20,and MySQL 5.7.

For the performance evaluation, we created a Docker [19] con-tainer that runs with a default configuration of PHP, Nginx, andMySQL containing the Drupal 7.0 web app. We measure the per-formance overhead of SQLBlock using ApacheBench [14], a toolfor benchmarking HTTP web servers. We simulated a real-worldscenario by increasing the level of concurrency in ApacheBench.The level of concurrency shows the number of open requests at atime. We measured the network response time of index.html in

Drupal 7.0 that issues 26 queries to MySQL. For more precise re-sults, we measured the response time for 10,000 requests at multiplelevels of concurrency. Table 4 presents our results for the afore-mentioned scenario. The first column in Table 4 shows the level ofconcurrency for each test. The next two columns in Table 4 presentthe network response time for Drupal with/without SQLBlock. Asshown in Table 4 SQLBlock incurs less than (2.5%) overhead to thenetwork response time of the server. Based on the strong protec-tions afforded by SQLBlock, we consider this overhead acceptable.Furthermore, SQLBlock is a prototype with no emphasis on per-formance optimization. Such optimizations likely could reduce theoverhead even further.

We also measured the execution time of queries in MySQL. Wemodified the source code of MySQL to calculate the time it takesfor MySQL to execute a SQL query. For this experiment, we usedApacheBench to send 10,000 requests to index.html in Drupal 7.0,which issued a total of 260,000 queries to MySQL. We measured theaverage execution time of issued queries for two different scenarios.The first scenario is MySQL without SQLBlock’s plugin, and in thesecond scenario, we enabled SQLBlock’s plugin in MySQL. Thelast two columns in Table 4 present the average execution time ofall the received queries to MySQL. The performance overhead ofSQLBlock in MySQL is less than 0.31 ms for each query.

Server Response Time(ms) MySQL Execution Time(ms)Concurrency Unprotected Protected Unprotected Protected

1 27.792 28.338 (1.96%) 0.150 0.234 11.644 11.813 (1.45%) 0.669 0.908 8.907 9.127 (2.46%) 0.732 1.0216 8.885 9.084 (2.23%) 0.740 1.0532 8.971 9.182 (2.35%) 0.747 1.02

Table 4: Response times for requests to Drupal index.php

6.5.1 False Positive Evaluation. We count an operation as a falsepositive if SQLBlock blocks a benign query to the database. For thefalse positive evaluation, we evaluated SQLBlock with Wordpress4.7 and Drupal 7.0. For each web app we used the profile thatwe built in RQ2. Then, we configured SQLBlock in enforcementmode and replayed browsing traces collected by Selenium [2]. Ourbrowsing traces explored the web app as a user and administratorwith the goal of covering the web app as much as possible.

Based on Table 5, only 10.11% of the issued queries during benignbrowsing and the unit test had the same query structure. This legit-imate difference in the query structure of issued queries rendersprior approaches that build their profile based on query structureunable to distinguish benign SQL queries from malicious ones. Forinstance, SEPTIC has above 89% false positive on the same test forDrupal 7.0. SQLBlock allows a query to execute in MySQL as long asthe query matches at least one of the query descriptors associatedwith the PHP function in the profile. In the false positive test forDrupal, SQLBlock did not block any query from the benign Sele-nium browsing. This shows that although the PHP functions duringtraining and testing used different queries, the query descriptorswere the same.

Table 5 shows that 82.57% of queries in our benign browsingweresimilar to queries recorded for SQLBlock’s profile. Although the

Page 12: You shall not pass: Mitigating SQL Injection Attacks on ...

Application Vulnerability SQLi Type Available query descriptorsWordpress 4.7 CVE-2017-12946 Taut., Infer., Alt. Encoding (update, wp_em_modals, none, [(=,field,literal),(IN,field,literal)])Wordpress 4.7 polls-widget 1.2.4 Taut., Infer., Alt. Encoding (update, wp_polls, none, [(=,field,lietral)])Wordpress 4.7 CVE-2019-10866 Infer. (select, wp_formmaker_submits, and, [(=,field,literal)])Wordpress 4.7 WPVDB-9188 Taut., Infer. (select, wp_posts, and, [(=,field,literal)])Drupal 7 CVE-2014-3704 C Taut., Union, Piggy-back, Stored Proc., Infer., Alt. Encoding (select, users, and,[(=,field,literal)])Joomla 3.7 CVE-2017-8917 C Union, Infer.,Alt. Encoding (select, [users, languages, fields], both, [(=,field,literal),(=, field, field),(IN, field, literal)])Joomla 3.8.3 com_jsjobs 1.2.5 Infer. (select, js_jobs_fieldsordering, none, [(=, field, literal)])Joomla 3.8.3 com_jephotogallery 1.1 Union, Infer. (select, jephotogallery, none, [(=, field, literal)])Joomla 3.8.3 CVE-2018-5983 Infer. (select, jquickcontanct_captach, none, [(=, field, literal)])Joomla 3.8.3 CVE-2018-17385 C Second order inj. (select, template_styles, and, [(=, field, literal)])magento 2.3.0 CVE-2019-7139 C Infer., Alt. Encoding (select, catalog_product_frontend_action, and, [(>=, field, literal),(<=, field, literal)])

Table 3: Exploits blocked by SQLBlock

rate of similar issued queries during training and testing of Word-press is higher that Drupal, SQLBlock blocked 7 unique queriesduring the benign browsing, which corresponds to 5% of all issuedqueries. There are two main reasons for the false positives in Word-press. The first reason is MySQL modifying the query based onthe arguments passed to SQL function in the query. For instance,if the length of the array passed to the IN statement in a query isone, MySQL modifies the IN statement to an equal (=) statement.This modification in the query and subsequently in the parse treeof the query leads to false positives for SQLBlock since SQLBlockencounters a different function in enforcement than what is in theprofile. The second reason is missing PHP functions in the profile.During the enforcement, SQLBlock blocks the SQL query if SQL-Block does not find any query descriptor for a PHP function thatissued the SQL query. Six out of seven false positives in Wordpresswas due to lack of query descriptors for the PHP function duringbenign browsing, which implies that covering all the functions thatcan issue a query during the training is an important factor forSQLBlock (Discussed further in § 7).

web app Unit tests Selenium (Unit tests ∩ Selenium) False PositiveDrupal 299961 336 34 (10.11%) 0

Wordpress 3099 132 109 (82.57%) 7Table 5: Number of unique SQL queries during unit testingand Selenium browsing

6.6 Artifact AvailabilitySQLBlock implementation is open-source and available at https://www.github.com/BUseclab/SQLBlock. Additionally, we providethe five Docker containers that include a total of 11 vulnerablePHP web apps and plugins that we used in our evaluation. Ourvulnerability dataset and the automated scripts were a significantpart of our evaluation, and we think that it can be useful for futureworks in this area.

7 DISCUSSION AND LIMITATIONSIn this section, we discuss the limitations of the SQLBlock andpossible future works in this area.

eval Function: PHP web apps use dynamic features imple-mented in PHP extensively, such as the eval function, which evalu-ates a string argument as a PHP code. Currently, SQLBlock does nothandle function and class definitions inside eval. A web app can useeval for defining the database API or procedures dynamically and

use it across the web app. This leads to generating a non-completelist of PHP database API and interfaces for a PHP web app in thestep 1 . In such cases, SQLBlock maps the query descriptors to asmall set of PHP functions that can allow the attacker to execute amalicious query. In future work, the static analyzer in SQLBlockcan be improved to handle the static PHP code passed to eval, todetermine a more precise database access layer.

Incomplete coverage during training: PHP web apps gener-ate dynamic queries based on user inputs. This approach makes itimpossible to issue all possible queries to the database during thetraining phase. Dynamic analyses suffer from incomplete trainingphases, and SQLBlock is not an exception. Our Wordpress false pos-itive test shows that the incomplete coverage of the issued queriesleads to SQLBlock blocking benign queries.

8 CONCLUSIONWe present SQLBlock, a hybrid dynamic-static technique, to restrictthe PHP web app’s access to the database. During the training step,SQLBlock infers issued SQL queries and their respective PHP call-stacks. Using a lightweight static analysis, SQLBlock extracts alist of database API and procedures in the PHP web app. In thethird step, SQLBlock creates a set of query descriptors for eachPHP function in the PHP web app that issued a SQL query to thedatabase. In the final step, SQLBlock acts as a MySQL plugin torestrict the interaction of the PHPweb app andMySQL based on thegenerated query descriptors. SQLBlock can prevent SQLi attacksagainst 11 vulnerabilities in the top four most popular PHP webapps and seven plugins without any false positives for Drupal 7.0and a low number of seven false positives for Wordpress benignbrowsing.

ACKNOWLEDGEMENTSWe thank our shepherd Giancarlo Pellegrino and the anonymousreviewers for their insightful comments and feedback. This workwas supported by the Office of Naval Research (ONR) under grantN00014-17-1-2541.

REFERENCES[1] Akamai. 2019. https://www.akamai.com/us/en/multimedia/documents/

state-of-the-internet.[2] Beans Erik Balde Samit, Barantsev Alexei. 2018. Selenium - Web Browser Au-

tomation. https://docs.seleniumhq.org/.[3] Sruthi Bandhakavi, Prithvi Bisht, P Madhusudan, and VN Venkatakrishnan. 2007.

CANDID: preventing sql injection attacks using dynamic candidate evaluations.In Proceedings of the 14th ACM conference on Computer and communicationssecurity. 12–24.

Page 13: You shall not pass: Mitigating SQL Injection Attacks on ...

[4] G. Bernardo and S. Miroslav. 2019. sqlmap: automatic SQL injection tool. https://sqlmap.org.

[5] Stephen W Boyd and Angelos D Keromytis. 2004. SQLrand: Preventing SQL in-jection attacks. In International Conference on Applied Cryptography and NetworkSecurity. 292–302.

[6] Gregory Buehrer, Bruce W Weide, and Paolo AG Sivilotti. 2005. Using parse treevalidation to prevent SQL injection attacks. In Proceedings of the 5th internationalworkshop on Software engineering and middleware. 106–113.

[7] G. Cleary, M. Corpin, O. Cox, H. Lau, B. Nahorney, D. O’Brien, B. O’Gorman, J.Power, S. Wallace, P. Wood, and Wueest C. 2019. Internet Security Threat Report.Technical Report 24. Symantec Corporation.

[8] The MITRE Corp. 2014. CVE-2014-3704. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-3704.

[9] Oracle Corporation. 2019. https://dev.mysql.com/doc/refman/8.0/en/.[10] Johannes Dahse and Thorsten Holz. 2014. Simulation of Built-in PHP Features

for Precise Static Code Analysis.. In Proceedings of the Network and DistributedSystem Security Symposium.

[11] Johannes Dahse and Thorsten Holz. 2014. Static Detection of Second-orderVulnerabilities in Web Applications. In Proceedings of the 23rd USENIX Conferenceon Security Symposium. 989–1003.

[12] Dataanyze. 2019. MySQL Market Share and Competitor Report. https://www.datanyze.com/market-share/databases/mysql-market-share.

[13] Drupal. 2016. https://www.drupal.org/docs/7/api/database-api.[14] Apache Software Foundation. 2018. ab - Apache HTTP server benchmarking

tool. https://httpd.apache.org/docs/2.4/programs/ab.html.[15] W. Halfond, A. Orso, and P. Manolios. 2008. WASP: Protecting Web Applications

Using Positive Tainting and Syntax-Aware Evaluation. IEEE Transactions onSoftware Engineering 34 (2008), 65–81.

[16] William G Halfond, Jeremy Viegas, Alessandro Orso, et al. 2006. A classifica-tion of SQL-injection attacks and countermeasures. In Proceedings of the IEEEInternational Symposium on Secure Software Engineering. 13–15.

[17] William G. J. Halfond and Alessandro Orso. 2005. AMNESIA: Analysis andMonitoring for NEutralizing SQL-injection Attacks. In Proceedings of the 20thIEEE/ACM International Conference on Automated Software Engineering. 174–183.

[18] Yao-Wen Huang, Fang Yu, Christian Hang, Chung-Hung Tsai, Der-Tsai Lee, andSy-Yen Kuo. 2004. SecuringWebApplication Code by Static Analysis and RuntimeProtection. In Proceedings of the 13th International Conference on World Wide Web.40–52.

[19] Docker Inc. 2018. Docker: Enterprise Container Platform.[20] Nenad Jovanovic, Christopher Kruegel, and Engin Kirda. 2006. Pixy: A Static

Analysis Tool for Detecting Web Application Vulnerabilities (Short Paper). InIEEE Symposium on Security and Privacy. 258–263.

[21] Anyi Liu, Yi Yuan, Duminda Wijesekera, and Angelos Stavrou. 2009. SQLProb: aproxy-based architecture towards preventing SQL injection attacks. In Proceedings

of the 2009 ACM symposium on Applied Computing. 2054–2061.[22] V Benjamin Livshits and Monica S Lam. 2005. Finding Security Vulnerabilities in

Java Applications with Static Analysis.. In USENIX Security Symposium. 18–18.[23] PortSwigger Ltd. 2019. https://portswigger.net/burp.[24] Ibéria Medeiros, Miguel Beatriz, Nuno Neves, and Miguel Correia. 2016. Hacking

the DBMS to prevent injection attacks. In Proceedings of the Sixth ACM Conferenceon Data and Application Security and Privacy. 295–306.

[25] Ibéria Medeiros, Miguel Beatriz, Nuno Neves, and Miguel Correia. 2019. SEP-TIC: Detecting Injection Attacks and Vulnerabilities Inside the DBMS. IEEETransactions on Reliability (2019).

[26] Ettore Merlo, Dominic Letarte, and Giuliano Antoniol. 2007. Automated protec-tion of php applications against SQL-injection attacks. In 11th European Confer-ence on Software Maintenance and Reengineering (CSMR’07). IEEE, 191–202.

[27] Metasploit. 2019. metasploit. https://www.metasploit.com.[28] Abbas Naderi-Afooshteh, Anh Nguyen-Tuong, Mandana Bagheri-Marzijarani,

Jason D Hiser, and Jack W Davidson. 2015. Joza: Hybrid taint inference fordefeating web application sql injection attacks. In 2015 45th Annual IEEE/IFIPInternational Conference on Dependable Systems and Networks. 172–183.

[29] Q-Success. 2019. Usage Statistics and Market Share of Content ManagementSystems for Websites, August 2019. https://w3techs.com/technologies/overview/content_management/all.

[30] Q-Success. 2019. Usage Statistics and Market Share of PHP for Websites, August2019. https://w3techs.com/technologies/details/pl-php/all/all.

[31] Donald Ray and Jay Ligatti. 2012. Defining code-injection attacks. In Acm SigplanNotices. 179–190.

[32] Offensive Security. 2019. Exploit Database. https://exploit-db.com.[33] Vadym Slizov. 2019. php-parser. https://github.com/z7zmey/php-parser.[34] Sooel Son, Kathryn S. McKinley, and Vitaly Shmatikov. 2013. Diglossia: detecting

code injection attacks with precision and efficiency. In Proceedings of the 20thACM SIGSAC conference on Computer. 1181–1192.

[35] Sooel Son and Vitaly Shmatikov. 2011. SAFERPHP: Finding Semantic Vulnerabil-ities in PHP Applications. In Proceedings of the ACM SIGPLAN 6th Workshop onProgramming Languages and Analysis for ecurity. 1–13.

[36] Zhendong Su and Gary Wassermann. 2006. The Essence of Command InjectionAttacks in Web Applications. In Conference Record of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 372–382.

[37] San-Tsai Sun and Konstantin Beznosov. 2008. Sqlprevent: Effective dynamicdetection and prevention of sql injection attacks without access to the applicationsource code.

[38] Gary Wassermann and Zhendong Su. 2007. Sound and Precise Analysis of WebApplications for Injection Vulnerabilities. In Proceedings of the 28th ACM SIGPLANConference on Programming Language Design and Implementation. 32–41.