This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the author's institution and sharing with colleagues.
Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited.
In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit:
http://www.elsevier.com/copyright
Author's personal copy
A Hot Query Bank approach to improve detection performance against SQL injection attacks
Yu-Chi Chung a,*, Ming-Chuan Wu b, Yih-Chang Chen b, Wen-Kui Chang b
a Department of Computer Science and Information Engineering, Chang Jung Christian University, 396 Chang Jung Rd., Sec. 1, Kway Jen, Tainan 71101, Taiwan
b Department of Information Management, Chang Jung Christian University, Tainan, Taiwan
Article info
Article history:
Received 15 September 2010
Received in revised form
20 October 2011
Accepted 22 November 2011
Keywords:
Web applications
Security
SQL injection attacks
Hot query
Bloom filter
SQLIA detectors
Abstract
SQL injection attacks (SQLIAs) exploit web sites by altering backend SQL statements through
manipulating application input. With the growing popularity of web applications, such
attacks have become a serious security threat to users and systems as well. Existing
dynamic SQLIA detectors provide high detection accuracy yet may have ignored another
focus: efficiency. Our research has found that inside most systems exist many hot queries
that current SQLIA detectors have repeatedly verified. Such repetition causes unnecessary
waste of system resources.
The research has completed Hot Query Bank (HQB), a pilot design that can cooperate
with existing SQLIA detectors in web applications and enhance overall system
performance. HQB simply records hot queries and skips the detector's verification process
on their subsequent appearances. Algorithms for the design have been proposed. A series of
simulated experiments has been conducted to observe the performance gained from the
design with three respective detectors: SQLGuard, SQLrand, and PHPCheck.
The results show that using HQB can indeed improve system
performance, reducing execution time by 45% regardless of the detector being tested. With
such improvement and robustness, the result promises to provide an add-on feature for
SQLIA detectors in protecting web applications more efficiently. Future work includes
further validation of the design in a real web application environment, development of
a standard interface to collaborate with web applications and detectors, etc.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
SQL injection is a type of security vulnerability in the database
layer of a web application (Halfond et al., 2006; Halfond and
Orso, 2005a; Mitropoulos and Spinellis, 2009). SQL injection
attacks (SQLIAs) exploit web sites by altering backend SQL
statements through manipulating application input. With the
growing popularity of Web applications, such attacks have
become a serious security threat to users and systems as well.
These attacks occur especially when the SQL statements
are combined with hard-coded strings for user inputs to create
dynamic queries. If a user input is not properly validated,
attackers may be able to change the developer's intended SQL
command by inserting new SQL keywords or operators
through specially crafted input strings. SQLIAs leverage a wide
range of mechanisms and input channels to inject malicious
commands into a vulnerable application (Alon et al., 1996).
The following example illustrates how an attacker can
leverage the vulnerability of an application with a simple
SQLIA.
* Corresponding author. E-mail addresses: [email protected] (Y.-C. Chung), [email protected] (M.-C. Wu), [email protected] (Y.-C. Chen), [email protected] (W.-K. Chang).
Available online at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/cose
Computers & Security 31 (2012) 233–248
0167-4048/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.cose.2011.11.007
In the example, the following SQL statement may be typi-
cally used to initiate a web application with a user’s login and
pin as inputs.
The parameters “login” and “pin” are used to dynamically
build an SQL query to check if they match those in the data-
base. If a user submits login and pin as “John” and “101,” the
application dynamically builds the query:
If the inputted login and pin match the corresponding
entry in the database, the user’s account information will be
returned and then displayed. Otherwise, the authentication
fails and a null set will be returned by the database. However,
an application with this statement is vulnerable to SQLIAs. For
example, if an attacker enters his/her login and pin as
"admin' --" and any value (say "9"), the resulting query would
be:
In SQL, "--" is the comment operator and everything after
it will be ignored. Therefore, based upon this query, the
database will simply search for an entry where the login value
is “admin” and return the matched data record. As such, the
administrator’s account information will be released to the
attacker.
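The scenario above can be sketched in Python as follows. This is an illustrative sketch only: the table name `users`, the column names `login` and `pin`, and the helper `build_query` are assumptions made for the illustration, not the statement used in the original application.

```python
def build_query(login: str, pin: str) -> str:
    """Naively concatenate user input into an SQL string (vulnerable on purpose).

    The table and column names are hypothetical.
    """
    return "SELECT * FROM users WHERE login='" + login + "' AND pin=" + pin


# A legitimate login attempt.
benign = build_query("John", "101")
# SELECT * FROM users WHERE login='John' AND pin=101

# The attack input closes the string literal and starts an SQL comment.
attack = build_query("admin' --", "9")
# SELECT * FROM users WHERE login='admin' --' AND pin=9
# Everything after "--" is a comment, so the pin condition is never evaluated.
```

In practice, parameterized queries avoid this class of problem at the source, but the detectors discussed in this paper target applications that already build queries by concatenation.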
It is important to note that the above example merely
represents a simple SQLIA scenario. In reality, there are
various more sophisticated SQLIAs available to attackers.
During the past years, many studies (Halfond and Orso,
2005a; Fu and Qian, 2008; Xu et al., 2006; Haldar et al., 2005;
Mitropoulos and Spinellis, 2009) have proposed various
SQLIA detectors to prevent the occurrence of SQLIAs. They
either detect the vulnerability sources that may result in
SQLIA, or block out the malicious SQLIAs by users during
runtime. In the past, the effectiveness of such SQLIA detectors
was measured mainly by accuracy, that is, the probability that
a given detector judges a query correctly. Generally,
misjudgments fall into two types: false positives, i.e.,
the detector mistakenly treats a legitimate query as a SQLIA
and blocks it from entering the system; and false negatives, i.e.,
the detector mistakenly treats a SQLIA query as legitimate and
allows it to be executed. Most recent research
on SQLIAs reports accuracy of nearly 100%.
However, we found that these results may have ignored another
focus: efficiency in terms of computation cost. Almost all the
aforementioned studies have adopted a “pessimistic” prin-
ciple in designing their detectors, i.e., equally considering
each incoming query as a potential SQLIA, and verifying it for
its legitimacy. The verification process includes generation of
the parse tree of the SQL statement source, creation of the
parse trees for incoming queries, comparison of the two parse
trees, etc. Such a design generally incurs a high computation
cost. Above all, it appears to generate
unnecessary waste and thus hinder overall system
performance. Our argument is based on the following
observation.
Queries of most applications form in a pattern that
generally follows Zipf’s law (explained in Section 5.1) (Breslau
et al., 1999; Google, 2010; Zipf, 1929). That is, users are some-
times more interested in certain data items in the database,
and thus tend to query them more frequently. For example,
Crocs became very popular in mid-2007,
and many on-line shopping sites continued to receive queries
about them during that time. A typical web application therefore
processes the majority of its queries repeatedly. When a certain query
continues to appear and its frequency of appearance exceeds
a given threshold value, it is termed a hot query.
(Please refer to Section 3 for a more detailed definition.) It
is unnecessary, and a waste of system resources, for
a SQLIA detector to keep verifying such a hot query. For
example, suppose a web application receives 100,000 queries within
a day, among which a hot query q appears
500 times. On its first appearance, the system sends q to
a SQLIA detector to verify its safety. If q is a normal (i.e., non-SQLIA)
query, the 499 subsequent verifications by the detector
are obviously an unnecessary waste of system resources.
Based on the above observation and argument, we have
designed Hot Query Bank (HQB), a mechanism to accelerate the
SQLIA detection process. We shall point out that the mecha-
nism is not a new SQLIA detector. Rather, it can cooperate
with a generic SQLIA detector and accelerate the verification
process based on a dynamic analysis concept (please refer to
Section 2.1.2 for further information about this analysis).
In essence, HQB is a white list mechanism which records
verified hot queries, and, during the runtime, intercepts all the
incoming queries by inquiring the recorded query lists. Only
those not found in the white list will be suspected as potential
SQLIAs and sent to the detector for further verification. Those
verified hot queries (i.e., found in the white list) will be, of
course, no longer sent to the detector for additional verifica-
tion, but rather directly to the database for execution.
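The white list idea can be illustrated with a deliberately simplified Python sketch. It is not the HQB design itself: it uses a plain Python set instead of Bloom filters, it records every verified query rather than only hot ones, and the names `make_guard` and `toy_detector` are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple


def make_guard(
    detector: Callable[[str], bool], whitelist: Set[str]
) -> Tuple[Callable[[str], bool], Dict[str, int]]:
    """Wrap a detector so that already verified queries bypass it."""
    calls = {"detector": 0}  # counts how often the expensive detector runs

    def is_safe(query: str) -> bool:
        if query in whitelist:      # verified earlier: skip the detector
            return True
        calls["detector"] += 1
        if detector(query):         # detector judges the query legitimate
            whitelist.add(query)
            return True
        return False                # suspected SQLIA: do not execute

    return is_safe, calls


def toy_detector(query: str) -> bool:
    """Stand-in for a real SQLIA detector: rejects any query with a comment."""
    return "--" not in query


is_safe, calls = make_guard(toy_detector, set())
results = [is_safe("SELECT name FROM items WHERE id=7") for _ in range(500)]
# The detector ran once; the other 499 lookups were answered by the white list.
```

The sketch makes the two challenges listed next concrete: the set lookup must be cheaper than the detector, and the set grows without bound unless its memory use is controlled.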
However, there are two major challenges in implementing
such a mechanism:
1. HQB should be able to verify quickly whether a user's input
query exists in the bank. Searching HQB must
be faster than detection by existing SQLIA
detectors. Otherwise, the new mechanism will slow
down the whole system.
2. HQB should consume as little system
memory as possible. Otherwise, it will squeeze the
memory space required by other modules of a web application,
and thereby degrade overall system performance.
To the best of our knowledge, this is the first study to apply
the hot query approach to accelerating the SQLIA detection
process. We have completed a pilot design of HQB, theoretically
analyzed its efficiency, and implemented
the design with a designated SQLIA detector in a system.
In addition, we have measured the improvement in system
performance in simulations of various scenarios. Notably, the
design can accelerate SQLIA detection by up to 45%.
The remainder of this paper is organized in five sections.
Section 2 briefly reviews past research on SQLIA detectors
and other studies related to hot query identification. In
Section 3, we define hot queries and introduce the related
parameters required in the research. Section 4 describes the
detailed design of HQB along with its theoretical foundation
and related algorithms. In Section 5, we present the experiment,
a series of simulated tests of HQB with three respective
SQLIA detectors, and the efficiency gains our design achieves.
The final section summarizes the research results and outlines
possible future work.
2. Related works
This section first reviews past results on SQLIA detectors, then
discusses studies related to hot queries, and finally explains
why these studies are insufficient to be applied in SQLIA
detection.
2.1. The review of SQLIA detectors
Halfond et al. classified and evaluated the techniques that
counteract SQLIAs (Halfond et al., 2006). Based on their classification
and other results, we have compiled current SQLIA
prevention techniques into a taxonomy. As Fig. 1 shows, all the
countermeasures fall into three categories: static, dynamic, and
hybrid, according to the stage they focus on: static approaches work during
development, dynamic approaches at runtime, and hybrid approaches attempt
to combine both. The following reviews the three
approaches in turn.
2.1.1. Static approaches
Static approaches detect or counteract the possibility of
a SQLIA at compile time. These approaches scan the
application and leverage information flow analysis or
heuristics to detect code that could be vulnerable to SQLIAs
(Halfond et al., 2008). The static method requires a large
number of source code changes, which burdens
programmers. Furthermore, it is not feasible to spend much
time modifying the source code of the many existing web
applications. As such, many researchers focus on how to
dynamically analyze users' input SQL queries and block malicious
attacks at runtime.
2.1.2. Dynamic approaches
There are many proposed techniques in the category of
dynamic approaches. Notably, the taint-based technique enforces
security policies by marking untrusted data and tracing
its flow through programs. For instance, Xu et al. focused on
applications whose source code or interpreter is written in C (Xu et al.,
2006), while Haldar et al. targeted those in Java (Haldar et al.,
2005). Pietraszek and Vanden Berghe modified a PHP interpreter
to track tainted information at the character level
(Pietraszek and Vanden Berghe, 2005). The technique applies
a context-sensitive analysis to reject SQL queries if an untrusted
input has been used to create certain types of SQL
tokens. These approaches generally require significant
changes to a language's compiler or its runtime system, and
such modifications to the runtime
environment reduce the portability of a system.
Fig. 1 – Taxonomy of countermeasures for SQL injection attacks.
Other dynamic approaches involve query modification.
SQLrand, by Boyd and Keromytis,
reconstructs queries at runtime using a cryptographic key
inaccessible to attackers. It provides a framework that allows
developers to create SQL queries using randomized keywords
instead of the normal ones (Boyd and Keromytis, 2004). A
proxy between the web application and the database intercepts
SQL queries and de-randomizes the keywords. Injected
SQL keywords would not have been constructed from the
randomized keywords, and thus would result in a syntactically
incorrect query. The security of SQLrand relies on
attackers being unable to crack the key. The
approach also requires developers to rewrite the application
code.
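The keyword randomization idea can be sketched in Python as below. This is a toy illustration of the principle, not SQLrand's actual implementation: the keyword list, the tokenizer, the numeric key `4821`, and the function names `randomize` and `derandomize` are all assumptions, and the key is assumed to be a non-empty string of word characters. A real proxy would use a proper SQL parser; this sketch would, for instance, wrongly reject a keyword that appears inside a string literal.

```python
import re

# Hypothetical, deliberately small keyword list.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "UNION"}
# Split into words, whitespace runs, and single other characters,
# so that joining the tokens reproduces the input exactly.
TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def randomize(template: str, key: str) -> str:
    """Developer side: append the secret key to every SQL keyword."""
    return "".join(
        tok + key if tok.upper() in KEYWORDS else tok
        for tok in TOKEN.findall(template)
    )


def derandomize(query: str, key: str) -> str:
    """Proxy side: restore keyed keywords and reject plain (injected) ones."""
    out = []
    for tok in TOKEN.findall(query):
        if tok.endswith(key) and tok[: -len(key)].upper() in KEYWORDS:
            out.append(tok[: -len(key)])
        elif tok.upper() in KEYWORDS:
            raise ValueError(f"unrandomized keyword {tok!r}: possible injection")
        else:
            out.append(tok)
    return "".join(out)


template = randomize("SELECT * FROM users WHERE login='{}' AND pin={}", "4821")
legit = derandomize(template.format("John", "101"), "4821")
# legit is the ordinary SQL text again; an input such as "admin' OR 1=1 --"
# would raise ValueError because its OR does not carry the key.
```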
User inputs may be tagged with delimiters with which an
augmented SQL grammar can detect SQLIAs. The
parse tree approach, proposed by both Buehrer et al. and Su and
Wassermann, falls into this category. SQLGuard, by Buehrer
et al., checks at runtime whether incoming queries
conform to an expected query model (Buehrer et al., 2005). The
model is deduced at runtime by examining the query structure
before and after a client's request. That is, it secures
vulnerable SQL statements by comparing the parse tree of
a statement at runtime with that of the original one, and
only allows a statement to execute when the two match.
It also requires developers to rewrite code using
a special intermediate library.
2.1.3. Hybrid approaches
Some countermeasures combine static analysis during
development with dynamic monitoring at runtime. For
example, AMNESIA associates a query model with the location
of each query in the application and then monitors the
application to detect whether any query diverges from the expected
model (Halfond and Orso, 2005a,b, 2006; Halfond et al., 2008).
In the development phase, AMNESIA employs static analysis
to build a model of the SQL queries that an application legally
generates at each access point to the database. At runtime, it
checks all SQL queries against the built model before sending
them to the database. Unmatched queries are identified as
SQLIAs and treated as exceptions, which developers
handle by building recovery logic. Note that AMNESIA
resembles SQLGuard in its runtime behavior. Both secure
vulnerable SQL statements by comparing the parse tree of an
input query at runtime with the parse tree of the original
statement. The only difference is that the former creates
only the input parse tree at runtime (the parse tree of the original
statement is built during development), while the latter creates
both at runtime.
The above review has shown that both dynamic and
hybrid approaches are capable of analyzing and verifying
SQL queries at runtime. Yet, as explained previously, it is pointless
and wasteful of system resources to continuously verify
queries that have repeatedly appeared and been proven
legitimate. This is the rationale behind our HQB approach.
The next section reviews past results on hot
queries.
2.2. Past results on finding hot queries
In a typical web application, users' queries flow into the system
continuously like a stream. Our aim is to identify those queries
that appear relatively frequently in such a "query stream."
They are termed "hot queries" because of their relatively high
frequency of appearance: a query is hot when its frequency
exceeds a designated threshold. Several studies
(Alon et al., 1996; Charikar et al., 2002; Cormode and
Muthukrishnan, 2005; Jin et al., 2003; Gibbons and Matias,
1998) have proposed ways of identifying frequent items in
such a streaming environment. These methods intercept
each query and filter the hot ones from the stream. Using
a table to record the exact frequency of every query entering the system
would consume a huge amount of memory and thus affect overall
system efficiency. To overcome this problem, most existing
algorithms adopt an approximate solution, i.e., they
filter out a hot query set while allowing a very small part of it to be
misjudged. Since most algorithms achieve a very low
misjudgment rate (e.g., smaller than 0.1%), which is generally
acceptable in a streaming environment, they are
suitable for this purpose. In addition, they greatly reduce
the memory needed for recording information, and
thus enhance performance.
Among these studies, hCount (Jin et al., 2003) employs
a hash table as its underlying data structure. With its high
search efficiency, the hash table meets our
need to determine quickly whether a query is hot. For this reason, HQB's
algorithm for filtering hot queries is based upon hCount. As the
Bloom filter (Bloom, 1970) is hCount's data structure, we
discuss both in more detail in the following sections.
2.2.1. Bloom filters
A Bloom filter is a hash table, initially designed to support the
membership query (i.e., "Is query q a member of set S?"). It
has the side effect of the Bloom filter error¹ (i.e., non-members
misclassified as members), yet its excellent
space utilization makes it widely used
in streaming environments (Chang et al., 2004; Rhea and
Kubiatowicz, 2004; Hodes et al., 2002; Reynolds and Vahdat,
2003).
A Bloom filter (BF) is a bit string with m bits, each of which is
initially set to zero. Below, we let A be the bit string of a Bloom
filter, and A[i] (where 1 ≤ i ≤ m) be the i-th bit of the Bloom filter.
The Bloom filter uses k independent hash functions h_1, h_2, …, h_k
with range {1, 2, …, m}. When a query q_i arrives, we set A[h_j(q_i)]
to 1 for 1 ≤ j ≤ k. To answer a membership query for any query q_i,
users check whether all bits A[h_j(q_i)] are set to 1. If any of
these bits is 0, then q_i has certainly not appeared. If every bit is
equal to 1, the filter reports that q_i has appeared. Due to hash
collisions, however, this report may be wrong: the Bloom filter may
suggest that a query q_k has appeared when it has not, which is
a Bloom filter error.¹ The probability of Bloom
filter errors (f) can be derived as follows (Bloom, 1970; Guo et al.,
2010):
$$ f = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad (1) $$
where m is the size of A, k is the number of hash functions,
and n is the number of queries within the query stream.
¹ In the literature on Bloom filters, the phenomenon that non-members are misclassified as members is called a false positive. To avoid confusion with the term "false positive" in the literature on SQLIAs, we rename the Bloom filter's "false positive" as "Bloom filter error".
A study (Bloom, 1970) has pointed out that with k = (m/n) ln 2,
f attains its minimum value of (0.6185)^(m/n). Despite
these errors, the Bloom filter offers
high speed and very low memory
consumption, which makes it very suitable for environments
strictly limited in time and space (e.g., web applications,
streaming systems, etc.). For example, given 10,000 queries
of 128 bits each in a query stream (i.e., n = 10,000), storing them directly would
normally take about 157 kB of space, whereas a Bloom filter needs
only about 18 kB to store these
queries with an error probability of only 0.1%.
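The following Python sketch illustrates the Bloom filter and the sizing arithmetic above. It is a minimal illustration under stated assumptions: the k hash functions are simulated by salting SHA-256 with the index j, one byte is spent per bit for readability, bit positions are 0-based rather than 1-based, and the names `BloomFilter`, `error_probability`, and `optimal_parameters` are hypothetical.

```python
import hashlib
import math
from typing import Iterator, Tuple


class BloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def _positions(self, query: str) -> Iterator[int]:
        # Assumption: k hash functions derived from SHA-256 with salt j.
        for j in range(self.k):
            digest = hashlib.sha256(f"{j}:{query}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, query: str) -> None:
        for pos in self._positions(query):
            self.bits[pos] = 1

    def __contains__(self, query: str) -> bool:
        # All k bits set: "probably seen"; any bit clear: "certainly not seen".
        return all(self.bits[pos] for pos in self._positions(query))


def error_probability(m: int, k: int, n: int) -> float:
    """Approximation from Eq. (1): (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_parameters(n: int, f: float) -> Tuple[int, int]:
    """Smallest m (bits) for a target error f, with k = (m/n) ln 2."""
    m = math.ceil(-n * math.log(f) / math.log(2) ** 2)
    k = round(m / n * math.log(2))
    return m, k


m, k = optimal_parameters(10_000, 0.001)   # the example from the text
bf = BloomFilter(m, k)
bf.add("select sex from user where name = 'peter' and pw = '123'")
```

For n = 10,000 and f = 0.1%, the computed m corresponds to roughly 18 kB of bits, which agrees with the figure quoted above.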
2.2.2. hCount
The Bloom filter was originally designed to support membership
queries: it records whether queries exist, but not their frequency of
appearance. In other words, it cannot answer the question of
whether a certain q is a hot query or not. To answer that, we need
the frequency of appearance of each query. Jin et al. extended the Bloom
filter and designed hCount (Jin et al., 2003) to estimate the number
of appearances of each query, which provides a solution.
To record the frequency of appearance of a query, hCount
adopts the idea of the counting Bloom filter (Fan et al., 2000): it
uses multiple bits, rather than one bit, for each element A[i],
so that A[i] serves as a counter. Research (Broder and Mitzenmacher, 2002)
has pointed out that four bits for the size of A[i] should suffice
for most applications.
hCount makes another modification to the Bloom filter by
cutting A into k slices, each of length m' = m/k.
That is, A is no longer a one-dimensional array but a k × m'
two-dimensional array. The advantage of this approach is that
each hash function corresponds to its own slice, which makes the
hash keys more uniformly distributed.
Fig. 2 demonstrates how hCount works with a simple
example. Let k = 4, m = 20, and the query stream S = (q1, q2, q3,
q2, q1, q1). hCount cuts A into k slices, each of length
m' = 5. The initial value of each element of A is zero, as
shown in Fig. 2(a).
When q1 appears for the first time, hCount inputs q1 to each
hash function, calculates the corresponding hash value, and
increases the corresponding element of A by one. For example,
let h1(q1) = 2, so the value of A[1][h1(q1)] = A[1][2] will be
incremented by one. In the same example, let us assume h2(q1) = 3,
h3(q1) = 1, and h4(q1) = 4. After q1 appears for the first time,
A is as depicted in Fig. 2(b). Fig. 2(c) and (d) show
A after q2 and q3 appear, respectively. Fig. 2(e) shows
the content of A after all the queries of S have appeared.
hCount estimates the frequency of each query by using the
minimum value of the associated counters. For example, in
Fig. 2(e), the frequency of q1 is 3, because min(A[1][2], A[2][3],
A[3][1], A[4][4]) = min(3, 6, 4, 3) = 3. Similarly, we get the
frequencies of q2 and q3 as 2 and 3 respectively.
Hash collisions may cause the estimate to
stray, more or less, from the actual value. For example, the
actual frequency of q3 is 1, while hCount estimates it as
3, because the hash values of q3 collide with those of q1 and q2
(i.e., h1(q3) = h1(q2), h2(q3) = h2(q1) = h2(q2), h3(q3) = h3(q1), and
h4(q3) = h4(q2)). In fact, the study of hCount (Jin et al., 2003)
also concluded that the estimated frequencies are always greater
than or equal to the actual values.
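The worked example can be replayed with a short Python sketch. It is an illustration only: the class name `HCount` is hypothetical, the hash functions are replaced by a lookup table holding the hash values assumed in the example, and slices are indexed from 0 in the code (so A[1][2] in the text is `A[0][2]` here).

```python
from typing import Callable, Dict, List, Tuple

# Hash values assumed in the example: (h1, h2, h3, h4) for each query.
FIG2_HASHES: Dict[str, Tuple[int, int, int, int]] = {
    "q1": (2, 3, 1, 4),
    "q2": (3, 3, 4, 3),
    "q3": (3, 3, 1, 3),
}


class HCount:
    def __init__(self, k: int, slice_len: int,
                 hash_fn: Callable[[int, str], int]) -> None:
        self.k = k
        self.hash_fn = hash_fn  # hash_fn(j, query) -> column in slice j
        self.A: List[List[int]] = [[0] * slice_len for _ in range(k)]

    def add(self, query: str) -> None:
        # Increment one counter per slice.
        for j in range(self.k):
            self.A[j][self.hash_fn(j, query)] += 1

    def estimate(self, query: str) -> int:
        # The minimum associated counter never underestimates the true count.
        return min(self.A[j][self.hash_fn(j, query)] for j in range(self.k))


hc = HCount(k=4, slice_len=5, hash_fn=lambda j, q: FIG2_HASHES[q][j])
for q in ["q1", "q2", "q3", "q2", "q1", "q1"]:   # the stream S
    hc.add(q)

estimates = {q: hc.estimate(q) for q in FIG2_HASHES}
# {'q1': 3, 'q2': 2, 'q3': 3}: q3 is overestimated (its true count is 1).
```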
Although hCount is capable of finding hot queries within
a query stream, it still cannot be directly adopted in our
approach due to the following reasons:
First, the Bloom filter is particularly suitable for static data
sets (Guo et al., 2010), and hCount inherits this characteristic.
A "static data set" is a data set
whose cardinality is known beforehand.
In the web application environment, however, queries
continue to flow into the server. As the number of recorded queries n
grows while the filter size m stays fixed, the Bloom filter error
probability in Eq. (1) rises toward 1, and eventually almost every
query would be considered a hot query. This means that hCount might
misjudge a malicious query as a hot query, send it to the
database for execution, and jeopardize system security.
Fig. 2 – An example of hCount: (a) initial hCount; (b) after q1, with h1(q1) = 2, h2(q1) = 3, h3(q1) = 1, and h4(q1) = 4; (c) after q2, with h1(q2) = 3, h2(q2) = 3, h3(q2) = 4, and h4(q2) = 3; (d) after q3, with h1(q3) = 3, h2(q3) = 3, h3(q3) = 1, and h4(q3) = 3; (e) final result.
Second, hCount cannot address the issue of "recent" hot
queries. For most web applications, the popularity of queries
changes as time goes by. For example, Crocs was a hot query
item in July 2007 but no longer in January 2008. Continuous
removal of such out-of-date hot queries is essential to reduce
memory consumption. However, as hCount does not record
the time at which a query appears, it lacks the required
removal capability.
Lastly, Bloom filter errors would affect system security. As
discussed earlier, hCount may overestimate the frequency
of queries, which may lead it to misjudge some cold
queries as hot ones. Though this does not cause much
trouble for most applications, it affects system security
in the setting of SQLIA detection. To explain this
simply, let q be a cold query that has been misjudged as a hot
one. The system would then keep presuming q to be a frequently
appearing and non-malicious query and send it to the database
for execution. If q is in fact a SQLIA, then
system security will be seriously affected.
Building upon hCount, our HQB design can filter the
recent hot queries from the query stream with efficient
memory utilization. In addition, it can handle query streams
whose cardinality is unknown in advance, and ensure that system
security is not affected by Bloom filter errors.
3. Preliminary
This section describes the related terms used in this article,
and formally defines the so-called "hot query." Let S be a query
stream and W be the size of the sliding window. Each query q in
the query stream is assigned a timestamp q.t to indicate its
arrival time. We say that q is a valid query if q.t ∈ (t_now − W, t_now],
where t_now is the current time. We set f(q) as the
occurrence frequency of query q in the sliding window. Let N
denote the sum of the occurrence frequencies of all queries in the sliding
window, that is,
$$ N = \sum_{q:\; q.t \in (t_{now} - W,\; t_{now}]} f(q). $$
Definition 3.1. Hot query. Let s be a support parameter
provided by an administrator, with s ∈ (0, 1). If f(q) ≥ sN, then q is
a hot query.
Example. Say we have four different queries a, b, c and d,
with arrival times as shown in Fig. 3, where t_now = 10 and
W = 6. Note that query d is not a valid query because
d.t = 2 ∉ (t_now − W, t_now] = (4, 10]. In the sliding window, the
occurrence frequencies of queries a, b, and c are 3, 1, and 1
respectively (i.e., f(a) = 3, f(b) = 1, and f(c) = 1). Thus, N = f(a) + f(b)
+ f(c) = 5. Assume that the support parameter s is 0.3.
Therefore, in this example, only query a qualifies as a hot
query because f(a) = 3 ≥ sN = 0.3 × 5 = 1.5.
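Definition 3.1 can be checked with a few lines of Python. This is a sketch under assumptions: the function name `hot_queries` is hypothetical, and the individual arrival times in `stream` are invented so as to be consistent with the example (d arrives at time 2; a, b, and c occur 3, 1, and 1 times inside the window); they are not read from Fig. 3.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def hot_queries(stream: Iterable[Tuple[str, float]],
                t_now: float, W: float, s: float) -> Set[str]:
    """Return the queries q with f(q) >= s * N inside (t_now - W, t_now]."""
    valid = [q for q, t in stream if t_now - W < t <= t_now]
    freq = Counter(valid)        # f(q) for each distinct valid query
    N = sum(freq.values())       # total occurrences in the window
    return {q for q, f in freq.items() if f >= s * N}


# Assumed arrival times (query, timestamp); only the counts match the example.
stream = [("d", 2), ("a", 5), ("c", 6), ("a", 7), ("b", 9), ("a", 10)]
hot = hot_queries(stream, t_now=10, W=6, s=0.3)
# Only "a" is hot: f(a) = 3 >= 0.3 * 5 = 1.5, while f(b) = f(c) = 1.
```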
4. The design of Hot Query Bank (HQB)
As mentioned earlier, HQB is designed to enhance the
efficiency of existing SQLIA detectors. The approach is, in
essence, a whitelist store that records legitimate hot
queries. If a user's query cannot be found
in the bank, it may be unsafe and requires extra care,
such as forwarding an exception note to the administrators
and/or simply keeping it in the system log for later verification.
In this section, we describe the system architecture and
component design, and analyze the expected performance.
4.1. System architecture
Fig. 4 illustrates the system architecture of HQB. HQB can be
viewed as middleware between web applications and the database.
It intercepts and analyzes queries sent from web applications
to the database. If a query is suspected to be a SQLIA, an
exception note is passed to the web application. Otherwise,
the query is considered legitimate and is forwarded to the
database for execution.
Fig. 5 shows a detailed flow chart of how HQB collaborates
with a SQLIA detector to counter SQLIAs. HQB first
checks an incoming query to see if it is a legitimate hot query.
A legitimate hot query is forwarded directly to the database
for execution. Otherwise, the query is sent to the SQLIA
detector for further inspection: if the detector finds it
illegal, it throws an exception; otherwise the query is issued
to the database.
4.2. Implementation of HQB
Fig. 3 – An example of a sliding window (W = 6).
Fig. 4 – System architecture (components: web application, HQB, SQLIA detector, and database).
4.2.1. The data structure
HQB employs a method resembling hCount to obtain hot
queries, with three additional capabilities. First, a mechanism
is proposed to guard against Bloom filter errors, as
they might cause HQB to mistreat certain cold queries as hot
ones, and thus jeopardize system security. Second, HQB can
handle dynamic data sets. Third, HQB provides a mechanism
to obtain recent hot queries. The underlying data
structure of HQB is described as follows:
An HQB consists of a set of Bloom filters. Initially, only one
Bloom filter exists in HQB. As HQB continuously records
incoming queries into the Bloom filter, the filter eventually reaches
a threshold of its capacity, at which point HQB adds a new Bloom filter to
handle additional queries. (Please refer to Section 4.3 for a
detailed explanation of the threshold.) This prevents the
Bloom filter error probability from growing too high as the number
of queries continues to grow. Fig. 6 illustrates an HQB with two
Bloom filters (i.e., BF[1] and BF[2]).
The algorithm records the last access time (LAT) for each
Bloom filter. LAT represents the last time that the Bloom filter
was updated. In other words, all queries stored in a Bloom
filter appeared no later than its LAT, which can therefore be used
to determine whether the Bloom filter has expired. More
precisely, a Bloom filter expires if its LAT is
smaller than t_now − W. HQB drops expired Bloom filters
in order to save memory.
In addition to Bloom filters, HQB maintains a data structure called the Query Repository (QR) to store the content of queries. As shown in Fig. 6, a query (select sex from user where name = 'peter' and pw = '123') is stored in the QR. As mentioned earlier, Bloom filter errors may jeopardize system security. We use the QR to address this problem. Please refer to Section 4.2.3 for a detailed explanation.
To simplify the explanation below, we first define some symbols. Let BF[i], BF[i].n, and BF[i].LAT be the i-th Bloom filter of HQB, the number of queries stored in BF[i], and the last access time of BF[i], respectively. Let HQB.size be the number of Bloom filters that HQB owns. HQB's Bloom filters are sorted in increasing order of their LATs. Therefore, BF[1] and BF[HQB.size] are the Bloom filters farthest from and nearest to the current time (i.e., tnow), respectively. Let th be the threshold on the number of queries each Bloom filter may store. Our algorithm appends a new Bloom filter to HQB if the number of queries stored in the current Bloom filter reaches the threshold (i.e., BF[i].n ≥ th). In the following, we first describe the algorithms of HQB; Section 4.3 then explains how to choose the values of the parameters k and th.
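The structures defined above can be sketched in Python as follows. This is a minimal illustration rather than the authors' Java implementation: the class names `CountingFilter` and `HotQueryBank`, the use of salted SHA-256 digests as the k hash functions, and the representation of the QR as a dictionary from query text to last occurrence time are all assumptions made for the example.

```python
import hashlib


class CountingFilter:
    """One BF[i]: m counters probed by k hash functions, plus n and LAT."""

    def __init__(self, m, k):
        self.m = m                  # Bloom filter size (number of counters)
        self.k = k                  # number of hash functions
        self.counters = [0] * m
        self.n = 0                  # BF[i].n: number of queries stored
        self.lat = None             # BF[i].LAT: last access time (None until first update)

    def positions(self, query):
        """Return the k counter indexes for a query (the hash scheme is an assumption)."""
        indexes = []
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{query}".encode("utf-8")).digest()
            indexes.append(int.from_bytes(digest[:8], "big") % self.m)
        return indexes


class HotQueryBank:
    """HQB: a list of filters sorted by increasing LAT, plus the Query Repository."""

    def __init__(self, m, k, th):
        self.m, self.k, self.th = m, k, th
        self.filters = [CountingFilter(m, k)]   # initially only one Bloom filter
        self.qr = {}                            # QR: query text -> last occurrence time

    @property
    def size(self):
        """HQB.size: the number of Bloom filters currently owned."""
        return len(self.filters)
```

For example, `HotQueryBank(m=10000, k=6, th=1155)` creates a bank whose single initial filter has 10,000 counters; BF[1] in the text corresponds to `filters[0]` in this sketch.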
Fig. 5. Main flow chart of HQB: the kernel of HQB (see Section 4.2.2) decides whether an unidentified query is legitimate; if yes, the query is executed, and if no, an exception is thrown.
Fig. 6. The data structures of HQB: Bloom filters BF[1] and BF[2], and the Query Repository (QR) storing the query select sex from user where name = 'peter' and pw = '123'.
Fig. 7. Flow chart of Algorithm 1.
4.2.2. Main algorithm
When a query q enters the system, HQB uses Algorithm 1 (its flow chart is shown in Fig. 7) to judge whether q is a legitimate hot query. The three main tasks of Algorithm 1 are to (1) update the frequency of q, (2) determine whether q is a hot query, and (3) verify whether q is a legitimate query.
The algorithm first checks whether q exists in the QR. If it does, q must be a hot and legitimate query (as explained later), so its occurrence time is updated and true is returned.
If it does not, q is either (1) a cold but legitimate query or (2) a SQLIA. In either case, the algorithm sends q to the detector for verification. If q is legitimate, the variable queryLegitimate is set to true. Depending on the value of queryLegitimate, two cases arise:
• queryLegitimate is false: q is a SQLIA, and HQB throws an exception to notify the user (Line 14).
• queryLegitimate is true: q is a cold but legitimate query. The algorithm then records the frequency of q by calling an insertion function. (Please refer to the explanation of Algorithm 2 for more details about this function.) Subsequently, HQB calls the isHotQuery function (i.e., Algorithm 3) to check whether q is a hot query. If it is, q has just turned from a cold query into a hot one through its repeated occurrence. Therefore, HQB stores q in the QR and returns true (Lines 11–12).
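The control flow described above can be sketched in Python as follows. This is a reconstruction from the prose and the flow chart in Fig. 7, not the authors' listing: the function name `check_query`, its parameters, and the choice to pass the detector, the insertion routine (Algorithm 2), and the hot-query test (Algorithm 3) as callables are assumptions. The sketch returns false for a SQLIA and leaves raising the exception to the caller.

```python
def check_query(q, qr, detector, insert, is_hot_query, now):
    """Return True if q is judged legitimate, False if it is a SQLIA."""
    # Case 1: q is in the white list, so it is hot and already verified.
    if q in qr:
        qr[q] = now                 # update its occurrence time
        return True
    # Case 2: q is cold or malicious; the detector must decide.
    query_legitimate = detector(q)
    if not query_legitimate:
        return False                # the caller throws the exception for the SQLIA
    insert(q)                       # record q's frequency (Algorithm 2)
    if is_hot_query(q):             # hot-query test (Algorithm 3)
        qr[q] = now                 # q has just become hot: store it in the QR
    return True
```

Because a query reaches the QR only after the detector has accepted it, later occurrences of the same hot query return at the first branch without calling the detector again.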
Algorithm 1. Determine whether q is a legitimate query or not.
4.2.3. The correctness of Algorithm 1
This section discusses the correctness of Algorithm 1. We want to show that if q is a SQLIA, the algorithm never returns true. According to the algorithm, it returns true in only two cases: (1) when the QR has stored q (i.e., Line 4), or (2) when q is a legitimate query (Line 12). Let q be a SQLIA that has appeared f(q) times. The correctness of Algorithm 1 follows by proving that it never executes Line 4 or Line 12 (i.e., never returns true) whenever q appears. We complete the proof by mathematical induction on f(q).
Basis step. When q first appears (i.e., f(q) = 1), the QR has not stored it, so q is forwarded to the SQLIA detector for verification (Line 7), where it is detected as a SQLIA. Algorithm 1 returns false (Line 15).
Inductive step. Assume that when q appears for the k-th time (i.e., f(q) = k), Algorithm 1 returns false (Line 15), which implies that the QR has not stored q. Now let f(q) = k + 1. Because q was detected as a SQLIA at its k-th appearance and has not been recorded in the QR, Algorithm 1 again sends it to the SQLIA detector, returns false (Line 15), and does not store q in the QR. This completes the induction and establishes the correctness of Algorithm 1.
4.2.4. Query insertion
Algorithm 2 illustrates the detailed process of query insertion. HQB first checks whether the number of queries stored in the last BF (i.e., BF[HQB.size].n) has reached th. If it has, a new BF is appended. HQB then inserts q into the last BF and increases n of the last BF by one. HQB also sets BF.LAT to the current time (i.e., tnow). Because inserting q into HQB requires computing k hash functions, the time complexity of Algorithm 2 is O(k).
Algorithm 2. Insert (q).
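A minimal Python sketch of this insertion step follows. It is an illustration under stated assumptions, not the authors' listing: each Bloom filter is modelled as a small counting filter (class `Filter`), the k hash functions are simulated with salted SHA-256 digests, and the names `insert` and `positions` are hypothetical.

```python
import hashlib


class Filter:
    def __init__(self, m):
        self.counters = [0] * m     # m counters
        self.n = 0                  # number of queries stored
        self.lat = None             # last access time


def positions(q, m, k):
    """k counter indexes for q (salted SHA-256 stands in for the hash functions)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{q}".encode()).digest()[:8], "big") % m
            for i in range(k)]


def insert(filters, q, now, m, k, th):
    """Record one occurrence of q in the last filter, appending a new one if it is full."""
    if filters[-1].n >= th:             # the last BF has reached its threshold
        filters.append(Filter(m))
    last = filters[-1]
    for p in set(positions(q, m, k)):   # k hash computations, hence O(k)
        last.counters[p] += 1           # each distinct counter is incremented once
    last.n += 1
    last.lat = now                      # BF.LAT is set to tnow
```

Incrementing each distinct counter once (via `set`) is a design choice of this sketch; it keeps the minimum counter of a query equal to its true count when no other query collides with it.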
4.2.5. Determine a recently hot query
Algorithm 3 illustrates how HQB estimates the frequency of q and decides whether q is a recently hot query. The variable total_freq records the number of times that q has appeared in the sliding window, and N records the total number of occurrences of all queries in the sliding window. HQB visits each BF, uses the minimum value of the counters associated with q to estimate q's frequency in that BF, and adds the estimate to total_freq. Line 14 uses the estimated frequency to judge whether q is a hot query and returns the result to Algorithm 1. The workload of Algorithm 3 falls mainly between Lines 3 and 13, where HQB walks through all BFs (HQB.size in total) and computes up to k hash functions for each BF. Therefore, its time complexity is O(HQB.size × k).
Algorithm 3. isHotQuery (q).
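The estimation loop can be sketched in Python as follows. This is an illustration, not the authors' listing: the `Filter` class and the salted SHA-256 hashing are stand-ins, and the hotness test `total_freq >= s * N`, with s the support parameter, is an assumption modelled on the usual frequent-item criterion, since the exact condition of Line 14 is not spelled out in the text.

```python
import hashlib


def positions(q, m, k):
    """k counter indexes for q (salted SHA-256 stands in for the hash functions)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{q}".encode()).digest()[:8], "big") % m
            for i in range(k)]


class Filter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m
        self.n = 0                  # number of queries recorded in this filter

    def add(self, q):
        for p in set(positions(q, self.m, self.k)):
            self.counters[p] += 1
        self.n += 1

    def estimate(self, q):
        """Minimum of q's counters: never below q's true count in this filter."""
        return min(self.counters[p] for p in positions(q, self.m, self.k))


def is_hot_query(filters, q, s):
    total_freq = 0      # estimated occurrences of q in the sliding window
    n_total = 0         # N: occurrences of all queries in the sliding window
    for bf in filters:  # HQB.size filters, k hash computations each
        total_freq += bf.estimate(q)
        n_total += bf.n
    # Assumed criterion: q is hot if it accounts for at least a fraction s of N.
    return n_total > 0 and total_freq >= s * n_total
```

Because counter collisions can only inflate the minimum, the estimate may overstate a query's frequency but never understate it, which is why the QR check of Section 4.2.3 is needed to keep a misjudged query from bypassing the detector.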
4.2.6. Maintain HQB
To save memory, we use Algorithm 4 to dispose of expired BFs and queries. First, Algorithm 4 searches HQB for the BFs that are no longer in the sliding window (i.e., BF.LAT < tnow − W). These expired BFs are dropped in order to save memory. Notice that all BFs are sorted in increasing order of their LATs. That is, if a certain BF[i] is found valid (i.e., not expired), then all the subsequent BF[j]s (i < j ≤ HQB.size) are not expired either, and the disposal procedure can terminate. Algorithm 4 also searches the QR for expired queries and drops them. The time complexity of Algorithm 4 has two parts. First, in the worst case, Algorithm 4 needs to examine all BFs, which takes O(HQB.size). Second, it needs to check all queries in the QR to dispose of the expired ones. Let qcount be the number of queries stored in the QR; the time complexity of the second part is then O(qcount). Adding the two, the total time complexity of Algorithm 4 is O(HQB.size + qcount).
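A minimal sketch of this maintenance pass follows, assuming that each filter exposes its LAT as an attribute `lat`, that the QR maps each query to its last occurrence time, and that a stored query expires under the same window rule as a filter (the text does not state the query expiry rule explicitly). The function name `maintain` is hypothetical.

```python
from types import SimpleNamespace


def maintain(filters, qr, now, window):
    """Drop expired filters and QR entries in place; return how many of each were dropped."""
    cutoff = now - window                   # tnow - W
    expired = 0
    for bf in filters:                      # filters are sorted by increasing LAT
        if bf.lat >= cutoff:
            break                           # every later filter is newer: stop early
        expired += 1
    del filters[:expired]                   # O(HQB.size) in the worst case
    stale = [q for q, last_seen in qr.items() if last_seen < cutoff]
    for q in stale:                         # O(qcount)
        del qr[q]
    return expired, len(stale)


# Usage with stand-in filters that carry only a LAT.
filters = [SimpleNamespace(lat=t) for t in (10, 40, 90)]
qr = {"q1": 15, "q2": 95}
dropped = maintain(filters, qr, now=100, window=50)
```

With a cutoff of 50, the two oldest filters and the query last seen at time 15 are removed, while the filter with LAT 90 and the query last seen at time 95 remain.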
An interesting question remains: how often should Algorithm 4 be executed? We may set up an update interval Tupdate for executing Algorithm 4. With a shorter Tupdate, Algorithm 4 is executed more frequently, which incurs a higher maintenance cost but saves more memory, and vice versa. We illustrate how the interval Tupdate affects HQB's efficiency (in terms of computation cost and memory utilization) in the experiments of Section 5.2.
Algorithm 4. Maintain HQB.
4.2.7. Time complexity of main algorithm
Let us go back to Algorithm 1 and discuss its time complexity. The workload of Algorithm 1 falls mainly on four parts: the insert() function, the isHotQuery() function, the cost of searching the QR, and the cost of sending queries to a SQLIA detector for verification. The time complexity of the two functions has been discussed previously, so here we focus on the third and fourth parts. Because the QR is implemented using a hash table, its search time is O(1). Thus, the main load falls on sending queries to a SQLIA detector for verification, which, however, does not always occur. In fact, HQB sends a query to the detector only when the query is not a legitimate hot query. Assuming a SQLIA detector verifies a query with time complexity Tdetector, the expected cost of sending queries to the detector is O(R × Tdetector), where R is the probability that q is not a legitimate hot query. To sum up, the total time complexity of Algorithm 1 is O(k + HQB.size × k + R × Tdetector).
4.2.8. Memory consumption of QR
QR serves as a white list in HQB and stores only hot queries that have passed verification. That is, the size of QR equals the number of hot and legitimate queries. Please note that a cold legitimate query is not stored in the QR. This design offers several advantages: (1) a low memory requirement, because the number of hot legitimate queries is limited; (2) a high search speed, owing to the small size of QR; and (3) frequent hits, because hot legitimate queries occur often and each can be verified as legitimate simply by checking whether QR has stored it. Such queries no longer need to be sent to the SQLIA detector for verification, and thus the overall performance is enhanced.
4.3. The parameters of HQB
In this section, we discuss how to decide the values of k and th, given a predefined Bloom filter error probability f and Bloom filter size m. We first discuss the case where HQB has only one Bloom filter. Given that f and m are fixed, k and th can be derived as follows (Chang et al., 2004; Broder and Mitzenmacher, 2002):
$$k = \lfloor -\log_2 f \rfloor \tag{2}$$
$$th = \left\lfloor \frac{m}{k} \ln 2 \right\rfloor \tag{3}$$
Now consider the case where HQB has exactly two Bloom filters. Let fb be the error probability of each Bloom filter. Our purpose is to calculate the fb with which the two Bloom filters together attain a global Bloom filter error probability f, i.e., satisfy equation (4):
$$f = 1 - \Pr[(\text{no Bloom filter error in } BF[1]) \cap (\text{no Bloom filter error in } BF[2])] = 1 - (1 - f_b)(1 - f_b) = 1 - (1 - f_b)^2 \tag{4}$$
Then, we can get:
$$k_b = \lfloor -\log_2 f_b \rfloor = \left\lfloor -\log_2\left(1 - \sqrt{1 - f}\right) \right\rfloor \tag{5}$$
$$th_b = \left\lfloor \frac{m}{k_b} \ln 2 \right\rfloor \tag{6}$$
where kb represents the number of hash functions, and thb is the number of queries that a Bloom filter can store. In the same way, if the HQB has v Bloom filters, equation (5) may be modified as:
$$k_b = \lfloor -\log_2 f_b \rfloor = \left\lfloor -\log_2\left(1 - \sqrt[v]{1 - f}\right) \right\rfloor \tag{7}$$
Equation (7) can be used to obtain kb, and by further using
equation (6), one can calculate the upper-limit number of
queries that each Bloom filter may store.
Table 1 illustrates the relationship among v, fb, kb, thb, and N for f = 0.05 and m = 10,000. As v increases, the per-filter error probability (i.e., fb) gradually decreases, which allows all Bloom filters together to meet the global Bloom filter error probability f = 0.05. Likewise, as fb drops, the number of queries each Bloom filter can store (i.e., thb) decreases, and the required number of hash functions (i.e., kb) increases. However, the total capacity (i.e., N = v × thb) grows overall.
Therefore, if we know the maximum number of queries (i.e., N) that may appear in a sliding window, we can decide the value of v by using the table. For instance, we can continuously monitor the query stream and record the number of queries in each sliding window. After a period of time, we obtain the maximum number of queries. Assuming this maximum is 5000 (i.e., N = 5000), we obtain v = 5, kb = 6, and thb = 1155 from Table 1, since v = 5 is the smallest value whose capacity (5775) is at least 5000.
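The sizing procedure can be reproduced with a short Python sketch based on equations (6) and (7). The function names are ours, and the rounding is taken to be a floor because that choice reproduces the values listed in Table 1. For f = 0.05 and m = 10,000, the sketch yields the rows of Table 1 and selects the smallest v whose total capacity N covers the expected number of queries.

```python
import math


def per_filter_error(f, v):
    """f_b such that v filters together give a global error probability f."""
    return 1 - (1 - f) ** (1 / v)


def table_row(f, m, v):
    """Return (f_b, k_b, th_b, N) for a bank of v Bloom filters of size m."""
    fb = per_filter_error(f, v)
    kb = math.floor(-math.log2(fb))             # equation (7)
    thb = math.floor(m / kb * math.log(2))      # equation (6)
    return fb, kb, thb, v * thb


def choose_v(f, m, n_max, v_limit=64):
    """Smallest v whose total capacity N = v * th_b is at least n_max."""
    for v in range(1, v_limit + 1):
        if table_row(f, m, v)[3] >= n_max:
            return v
    raise ValueError("no v up to v_limit provides the requested capacity")


v = choose_v(f=0.05, m=10_000, n_max=5000)
fb, kb, thb, n = table_row(0.05, 10_000, v)
print(v, kb, thb, n)    # 5 6 1155 5775, matching Table 1
```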
5. Performance evaluation
We conducted a series of simulation experiments to evaluate
the performance of HQB under different environmental
settings. Three SQLIA detectors, SQLGuard (Buehrer et al.,
2005), SQLrand (Boyd and Keromytis, 2004), and PHPCheck
(Nguyen-Tuong et al., 2005), were selected to cooperate with HQB. Each represents a different verification method: SQLGuard uses a parse tree approach; SQLrand, a query modification method; and PHPCheck, a taint-based technique. System performance was compared between each detector with and without HQB (i.e., HQB + SQLGuard, HQB + SQLrand, and HQB + PHPCheck versus SQLGuard, SQLrand, and PHPCheck alone) to validate the improvement offered by our design. Here, “performance” refers to the total execution time, i.e., the time the system spends verifying 100,000 queries (i.e., Ntotal = 100,000). In addition, the experiments examined the memory consumption of HQB under various environmental settings.
All the algorithms were implemented in Java, and the experiments were performed on a Windows Vista system with an Intel Core 2 CPU (2.4 GHz) and 4 GB of memory. Please note that although PHPCheck focuses on web applications implemented in PHP (Nguyen-Tuong et al., 2005), its published results claim that its checking rules are applicable to any other programming language. Therefore, we implemented the stated rules in Java so that PHPCheck could be used in the experimental context. Also note that PHPCheck has two important components: the taint-tracking module and the lexical analysis module. Taint tracking identifies which data come from untrusted sources, while lexical analysis verifies whether the tainted data are safe. Our work aimed to evaluate the verification process of a SQLIA detector (i.e., the lexical analysis part of PHPCheck) in terms of computation cost. The taint-tracking part is not our research focus, so we did not implement it. In our simulation, all user input is initially marked as tainted since it comes from untrusted sources. Therefore, the implementation only needs to perform lexical analysis on the tainted data. Take the following SQL statement as an example:
Because the string “OR” is a SQL keyword and tainted (underlined), this statement will be regarded as a SQLIA.
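To make the lexical check concrete, here is a deliberately simplified Python sketch. It is not PHPCheck's rule set, and the inputs used with it are hypothetical examples of ours rather than the statement referred to above: every user-supplied value is treated as tainted, tokenized, and rejected if any token is a SQL keyword or a comment/terminator symbol from a small illustrative list.

```python
import re

# Illustrative list only; a real detector would follow the full SQL grammar.
SQL_KEYWORDS = {"OR", "AND", "UNION", "SELECT", "INSERT", "UPDATE", "DELETE", "DROP", "--", ";"}


def tokenize(text):
    """Split text into words, numbers, the comment marker '--', and single symbols."""
    return re.findall(r"[A-Za-z_]+|\d+|--|\S", text)


def is_sqlia(tainted_values):
    """Flag the query if any token of any tainted value is a SQL keyword."""
    return any(token.upper() in SQL_KEYWORDS
               for value in tainted_values
               for token in tokenize(value))
```

Under these assumptions, a tainted password value such as `' OR '1'='1` is flagged because it contains the keyword OR, whereas a plain value such as `peter` or `Orlando` is not, since matching is done on whole tokens.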
5.1. Environmental settings
Table 2 summarizes the parameters used in the experiments, with default parameter settings shown in bold. We generated 10,000 legitimate queries and collected 20 malicious queries from the SQLIA literature (Halfond et al., 2006; Halfond and Orso, 2005a; Fu and Qian, 2008; Xu et al., 2006; Pietraszek and Vanden Berghe, 2005; Boyd and Keromytis, 2004; Nguyen-Tuong et al., 2005; Buehrer et al., 2005) as the test samples. In each round, one query was selected from these 10,020 samples and fed into the simulator for verification. The experiment terminated after 100,000 rounds of verification, i.e., after 100,000 queries in total had been examined.
Two key factors affect how the simulator selects queries: the ratio of malicious queries (r) and the skew coefficient of a Zipf distribution (q).
r refers to the ratio of the number of malicious queries to the total number of queries. For example, with r = 10% and Ntotal = 100,000, there will be approximately 10,000 malicious queries in the experiment. To select a query, the simulator flips a biased coin with probability r for heads. If the coin turns up heads, the simulator selects a query uniformly at random from the set of malicious queries. Otherwise, the simulator picks a legitimate query according to the Zipf distribution. Let Pr(qi) be the probability that the i-th of the 10,000 legitimate queries is selected; it is given by
$$\Pr(q_i) = \frac{(1/i)^{q}}{\sum_{j=1}^{10000} (1/j)^{q}}, \tag{8}$$
where the exponent q is the skew coefficient.
The distribution of Pr(qi) becomes more skewed as the value of q increases, which means that a larger share of the query stream consists of hot queries, and vice versa.
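The selection procedure can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' simulator: the function names, the use of `random.Random` with a fixed seed, and sampling through cumulative probabilities are assumptions.

```python
import bisect
import itertools
import random


def zipf_probabilities(n, skew):
    """Equation (8): Pr(q_i) is proportional to (1/i)^skew for i = 1..n."""
    weights = [(1 / i) ** skew for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def make_selector(legitimate, malicious, r, skew, seed=0):
    """Return a function that draws one query per call."""
    rng = random.Random(seed)
    cumulative = list(itertools.accumulate(zipf_probabilities(len(legitimate), skew)))

    def select():
        if rng.random() < r:                    # heads: pick a malicious query uniformly
            return rng.choice(malicious)
        i = bisect.bisect_left(cumulative, rng.random())
        return legitimate[min(i, len(legitimate) - 1)]   # guard against rounding at the tail

    return select
```

With a skew of 1.0, the first legitimate query is exactly twice as likely as the second, and larger skew values concentrate even more of the stream on the first few queries.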
Table 2. Parameter settings.
| Parameter | Meaning | Value |
|---|---|---|
| Ntotal | Total number of queries | 100,000 |
| r | The ratio of malicious queries | 0%, 0.01%, 0.1%, 1%, 5%, 10%, 20% |
| q | The skew coefficient of Zipf distribution | 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 |
| s | Support parameter (see Section 3) | 0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4 |
| W | The size of the sliding window (see Section 3) | 500, 1000, 1500, 2000, 2500, 3000 |
| f | Bloom filter error probability (see Section 2) | 0.05 |
| m | The size of a Bloom filter (see Section 2) | 10,000 |
| kb | The number of hash functions (see Section 4.3) | 6 |
| Tupdate | Update interval | 1000, 2000, 3000, 4000, 5000 |
Table 1. The relationship among v, fb, kb, thb, and N.
| v | fb | kb | thb | N |
|---|---|---|---|---|
| 1 | 0.05 | 4 | 1732 | 1732 |
| 2 | 0.025320566 | 5 | 1386 | 2772 |
| 3 | 0.016952428 | 5 | 1386 | 4158 |
| 4 | 0.012741455 | 6 | 1155 | 4620 |
| 5 | 0.010206218 | 6 | 1155 | 5775 |
| 6 | 0.008512445 | 6 | 1155 | 6930 |
| 7 | 0.007300832 | 7 | 990 | 6930 |
| 8 | 0.006391151 | 7 | 990 | 7920 |
| 9 | 0.005683045 | 7 | 990 | 8910 |
| 10 | 0.005116197 | 7 | 990 | 9900 |
In this experiment, the sliding window size (W) was set to 3000. We also assumed that queries arrive one at a time, i.e., a query is submitted only after its predecessor has completed. That is, the maximum number of queries in a sliding window is 3000 (i.e., N = 3000), which also means that HQB should be able to store at most 3000 queries.
Let the Bloom filter error probability (f) be 0.05 and the Bloom filter size (m) be 10,000. Based upon the explanation in Section 4.3 and Equation (5), the number of hash functions (kb) should be 6.
5.2. The effect of update time (Tupdate)
We first study the effect of the update time Tupdate. As discussed in Section 4.2.6, the execution frequency of Algorithm 4 may affect the performance of HQB and its memory consumption. Our goal was to find the most appropriate Tupdate for HQB through simulation.
We varied Tupdate from 1000 to 5000 and measured the execution time and the memory consumption of HQB. Fig. 8(a) shows the total execution time of HQB. Interestingly, Tupdate does not have a significant impact on the performance of HQB. As discussed in Section 4.2.6, the computation cost of Algorithm 4 depends on the size of HQB (i.e., HQB.size) and the number of queries stored in the QR (i.e., qcount). Since HQB keeps only recent hot queries in memory, HQB.size and qcount can be very small even when Tupdate = 5000. Therefore, Algorithm 4 completes its work very quickly.
We also found that the three combinations of HQB and detectors perform differently in Fig. 8(a). Among the three, HQB + SQLrand performs the best, followed by HQB + PHPCheck and HQB + SQLGuard. The reason lies in their different verification processes, which incur different computation costs. For each input SQL statement, SQLGuard needs to generate two parse trees and compare them, which tends to cause considerable computation overhead. PHPCheck must tokenize the query string, apply the checking rules iteratively to the tokens, and check for rule violations. SQLrand, by contrast, only needs to analyze the SQL statement and make sure that all keywords and operators contain a predefined cryptographic key.
Fig. 8. The effect of update time (Tupdate): panels (a) and (b).
Fig. 9. The effect of the ratio of malicious queries (r): panels (a) and (b).
Fig. 8(b) shows how much memory HQB consumes under various Tupdate values. The figure illustrates that HQB's memory consumption does not depend on which detector it cooperates with. As discussed in Section 4.2.6, the more frequently Algorithm 4 is executed, the more quickly expired information is eliminated from memory, which results in lower memory usage. Since Tupdate has an insignificant impact on HQB's performance and a smaller Tupdate leads to a lower memory requirement, we set Tupdate to 1000 in the later experiments.
5.3. The effect of the ratio of malicious queries (r)
This section studies the effect of the ratio of malicious queries to the total number of queries. Fig. 9(a) shows the execution time of each SQLIA detector with and without HQB under various r values. Two interesting facts can be observed. First, SQLIA detectors with HQB always outperform those without HQB, which means that exploiting hot queries indeed improves overall performance. Second, as r increases (i.e., more malicious queries exist in the system), the performance of both our design (i.e., HQB + SQLIA detector) and the detector alone decreases. HQB sends every malicious query to the detector for further verification. Likewise, the detector alone has to spend more time parsing a malicious query, which results in poorer performance.
Fig. 9(b) shows the memory consumption of HQB under various r values. As r becomes larger, HQB records fewer hot and legitimate queries in its repository and thus requires less memory.
5.4. The effect of the skew coefficient of Zipf distribution (q)
This section looks into how the number of hot queries affects system performance. In the experiment, q determines the occurrence frequency of hot queries; i.e., hot queries tend to appear more frequently with a greater q. Fig. 10(a) illustrates that our design spends less execution time as more hot queries occur. Conversely, as q gets smaller, the performance of our design degrades and eventually becomes even worse than that of a SQLIA detector alone. Fig. 10(a) also shows that the number of hot queries does not affect the detectors' performance, because their designs do not take query popularity into consideration.
Fig. 10. The effect of q: panels (a) and (b).
Fig. 11. The effect of support parameter s: panels (a) and (b).
Fig. 10(b) illustrates the memory requirements of HQB under different q values. A higher q value represents the scenario where only a small portion of queries are categorized as hot queries while many others are not. Hence, little information needs to be stored in the repository, and little memory is consumed.
5.5. The effect of support parameter (s)
Fig. 11(a) illustrates how the support parameter (s) impacts the execution time. A smaller s value means that a query is more likely to be classified as hot, resulting in a higher number of hot queries in the system. Note that HQB does not send hot queries to the SQLIA detector, thus saving the time for their verification. This explains why HQB performs better with a smaller s value.
Fig. 11(b) shows the impact of s on memory consumption. As expected, our design has a lower memory requirement as s increases. Please note that when s > 0.05, the memory consumption of HQB slightly increases. As mentioned above (Section 4.2), HQB stores cold but legitimate queries in Bloom filters to record their occurrence frequency. As s grows, queries submitted by users are more likely to be regarded as cold. The increased number of cold queries requires HQB to use more Bloom filters and thus consume more memory. However, a Bloom filter is a data structure with very limited memory consumption, which is why HQB consumes only a little more memory as the number of cold queries grows.
5.6. The effect of the size of sliding window (W )
Fig. 12(a) illustrates how the size of the sliding window impacts the system's execution time. We found that as the size of the sliding window increases, the execution time of our design slightly increases. Nevertheless, our design still significantly outperforms each SQLIA detector under all W values. With a larger sliding window, HQB requires more Bloom filters for storing information about queries, which consumes more memory (as shown in Fig. 12(b)) and incurs more computation cost. The SQLIA detectors themselves do not take the sliding window into consideration, so its size does not affect their performance.
6. Conclusions and future work
This paper reports a newly designed HQB, which can cooperate with existing SQLIA detectors in web applications and enhance overall system performance. Our research has found that most systems contain many hot queries that current SQLIA detectors verify repeatedly. Such repetition wastes system resources. HQB simply records hot queries and skips the detector's verification process on their subsequent occurrences, thus improving system performance. We have proposed algorithms for recording hot queries and other related mechanisms, and conducted a series of experiments to measure the performance of HQB with each of three detectors. The experimental results demonstrate that HQB can improve system performance by up to 45% in terms of execution time, regardless of the detector being tested. With such improvement and robustness, HQB promises to be an efficient add-on for SQLIA detectors in protecting web applications.
The research also suggests several directions for future work. First, we have so far demonstrated HQB's performance only through simulation; further testing of the design and measurement of its performance in a real web application environment are necessary before it can be applied. Second, the research should continue to define and design a standard interface that lets HQB cooperate easily with any other SQLIA detector and web application. Third, as multi-core processors become mainstream, exploiting them in HQB should further enhance system performance. Fourth, an in-depth sensitivity analysis of the evaluated performance under different environment settings may lead to a better understanding of the factors that affect HQB's performance.
Fig. 12. The effect of sliding window size W: panels (a) and (b).
Acknowledgement
This work was supported by the National Science Council of Taiwan (ROC) under Grant NSC100-2221-E-309-011.
References
Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. In: STOC ’96: proceedings of the twenty-eighth annual ACM symposium on theory of computing. New York, NY, USA: ACM; 1996. p. 20–9.
Bloom BH. Space/time trade-offs in hash coding with allowable errors. Communications of ACM 1970;13. ISSN: 0001-0782:422–6.
Boyd S, Keromytis A. SQLrand: preventing SQL injection attacks in stored procedures. In: Proceedings of the 2nd applied cryptography and network security (ACNS) conference; 2004. p. 292–304.
Breslau L, Cao P, Fan L, Phillips G, Shenker S. Web caching and Zipf-like distributions: evidence and implications. In: Eighteenth annual joint conference of the IEEE computer and communications societies (INFOCOM’99); 1999. p. 126–34.
Broder A, Mitzenmacher M. Network applications of bloom filter: a survey. Internet Mathematics 2002;1(4):485–509.
Buehrer G, Weide BW, Sivilotti PAG. Using parse tree validation to prevent SQL injection attacks. In: SEM ’05: proceedings of the 5th international workshop on software engineering and middleware. New York, NY, USA: ACM; 2005. p. 106–13.
Chang F, Feng W, Li K. Approximate caches for packet classification. In: Twenty-third annual joint conference of the IEEE computer and communications societies (INFOCOM’04); 2004. p. 2196–207.
Charikar M, Chen K, Farach-Colton M. Finding frequent items in data streams. In: ICALP ’02: proceedings of the 29th international colloquium on automata, languages and programming. London, UK: Springer-Verlag; 2002. p. 693–703.
Cormode G, Muthukrishnan S. What’s hot and what’s not: tracking most frequent items dynamically. ACM Transactions on Database System 2005;30(1). ISSN: 0362-5915:249–78.
Fan L, Cao P, Almeida J, Broder AZ. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking 2000;8(3). ISSN: 1063-6692:281–93.
Fu X, Qian K. SAFELI-SQL injection scanner using symbolic execution. In: Proceedings of the 2008 workshop on testing, analysis, and verification of web services and applications (TAV-WEB); 2008. p. 34–9.
Gibbons PB, Matias Y. New sampling-based summary statistics for improving approximate query answers. In: SIGMOD ’98: proceedings of the 1998 ACM SIGMOD international conference on management of data. New York, NY, USA: ACM; 1998. p. 331–42.
Google. http://www.google.com/press/zeitgeist.html; 2010.
Guo D, Wu J, Chen H, Yuan Y, Luo X. The dynamic bloom filters. IEEE Transactions on Knowledge and Data Engineering 2010;22(1). ISSN: 1041-4347:120–33.
Haldar V, Chandra D, Franz M. Dynamic taint propagation for java. In: Proceedings of the 21st annual computer security applications conference (ACSAC’05). IEEE Computer Society; 2005. p. 303–11.
Halfond WG, Orso A. AMNESIA: analysis and monitoring for neutralizing SQL-injection attacks. In: Proceedings of the 20th IEEE/ACM international conference on automated software engineering. ACM Press; 2005a. p. 174–83.
Halfond WGJ, Orso A. Combining static analysis and runtime monitoring to counter SQL-injection attacks. In: WODA ’05: proceedings of the third international workshop on dynamic analysis. New York, NY, USA: ACM; 2005b. p. 1–7.
Halfond WGJ, Orso A. Preventing SQL injection attacks using AMNESIA. In: ICSE ’06: proceedings of the 28th international conference on software engineering. New York, NY, USA: ACM; 2006. p. 795–8.
Halfond W, Viegas J, Orso A. A classification of SQL-injection attacks and countermeasures. In: Proceedings of the IEEE international symposium on secure software engineering (ISSSE); 2006.
Halfond W, Orso A, Manolios P. WASP: protecting web applications using positive tainting and syntax-aware evaluation. IEEE Transactions on Software Engineering 2008;34(1). ISSN: 0098-5589:65–81.
Hodes TD, Czerwinski SE, Zhao BY, Joseph AD, Katz RH. An architecture for secure wide-area service discovery. Wireless Networks 2002;8(2/3). ISSN: 1022-0038:213–30.
Jin C, Qian W, Sha C, Yu JX, Zhou A. Dynamically maintaining frequent items over a data stream. In: CIKM ’03: proceedings of the twelfth international conference on information and knowledge management. New York, NY, USA: ACM; 2003. p. 287–94.
Mitropoulos D, Spinellis D. SDriver: location-specific signatures prevent SQL injection. Computers & Security 2009;28(3–4):121–9.
Nguyen-Tuong A, Guarnier S, Greene D, Shirley J, Evans D. Automatically hardening web applications using precise tainting. In: 20th IFIP international information security conference; 2005. p. 372–82.
Pietraszek T, Vanden Berghe C. Defending against injection attacks through context-sensitive string evaluation. In: Proceedings of the 8th international symposium on recent advances in intrusion detection (RAID); 2005. p. 124–45.
Reynolds P, Vahdat A. Efficient peer-to-peer keyword searching. In: Middleware ’03: proceedings of the ACM/IFIP/USENIX 2003 international conference on middleware. New York, NY, USA: Springer-Verlag New York, Inc.; 2003. p. 21–40.
Rhea SC, Kubiatowicz J. Probabilistic location and routing. In: Twenty-first annual joint conference of the IEEE computer and communications societies (INFOCOM’02); 2004. p. 1248–57.
Xu W, Bhatkar S, Sekar R. Taint-enhanced policy enforcement: a practical approach to defeat a wide range of attacks. In: Proceedings of the 15th USENIX security symposium. USENIX Association; 2006. p. 121–36.
Zipf GK. Relative frequency as a determinant of phonetic change. Harvard Studies in Classical Philology 1929;40:1–95.
Yu-Chi Chung received his Ph.D. degree from the Department of Computer Science and Information Engineering at the National Cheng-Kung University, Taiwan, in 2007. Currently, he is an assistant professor in the Department of Computer Science and Information Engineering, Chang Jung Christian University, Taiwan. His research interests include mobile/wireless data management, sensor networks, skyline query processing, spatio-temporal databases, and web information retrieval.
Ming-Chuan Wu received his Ph.D. degree from the Lally School of Technology and Management at the Rensselaer Polytechnic Institute, US, in 2003. Currently, he is an assistant professor in the Department of Information Management, Chang Jung Christian University, Taiwan. His research interests include database management, project management, and information system planning and development.
Dr Yih-Chang Chen is an Assistant Professor in the Department of Information Management at Chang Jung Christian University, Tainan, Taiwan. Dr Chen received his BSc degree in Computer and Information Sciences from Tunghai University, Taiwan in 1992; MSc (Econ) degree in Information Systems Security from The London School of Economics and Political Science (LSE), University of London in 1996; and PhD degree in Computer Science from the University of Warwick, United Kingdom in 2002. His research interests include business process reengineering, empirical modelling, lean thinking and lean management, software engineering and requirements engineering, and the use-case approach to system development. Currently he is the deputy director of the RFID Research Centre at Chang Jung Christian University, Taiwan.
Wen-Kui Chang is a professor in the Department of Information Management and also the Dean of the College of Information and Engineering at Chang Jung Christian University, Taiwan. His research interests include software quality management, software reliability engineering, software certification, and performance enhancement under CMMI. He received his M.S. degree in Management Science & Engineering from Stanford University, U.S.A. and his Ph.D. in Management Science from Tamkang University, Taiwan. He has served as the President of the Chinese Society for Quality (CSQ) for three years since 2006. He is currently the SEI-authorized Instructor for the Introduction to CMMI course.