
Intelligent Context-Based Pattern Matching Approaches to Enhance Decision Making

Gaik-Yee Chan(✉), Kim-Loong Ong, Tong-Sheng Wong, and Lork-Yee Yvonne Chow

Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Malaysia

[email protected]

Abstract. In this Internet and Cloud Computing era, a huge volume of data, whether structured or unstructured, is stored or retrieved every second by various applications for use in different ways to support decision making. These business applications require effective and accurate means to store and retrieve information on a contextual basis to support decision making. Merely using pattern matching methods without considering the context may not retrieve the most suitable and accurate information for decision making. This paper therefore introduces three web applications that apply intelligent pattern matching approaches to retrieve accurate information and enhance decision making on a contextual basis. In the first study, stemming and Boyer-Moore methods are combined with company policies to automatically search for and recommend the right candidate to attend the most appropriate training course. The second study, through several iterations of pattern matching using a lookup table, locates the best three real estate properties that match a potential buyer’s preferences. In the third study, a color matching scheme is used to find users’ preferred images or photos stored in Cloud storage. Testing and performance evaluation of these methods using the web applications show results that could effectively enhance decision making.

Keywords: Context-based · Decision making · Pattern matching · Stemming

1 Introduction

In the current era of Internet and Cloud Computing technology, a huge volume of data, whether structured or unstructured, is stored or retrieved every second by various applications for use in different ways to support decision making. To get a feel for how tremendous the volume, how fast the speed and how varied the content is, consider Google, which on average processes over 40,000 search queries every second; this translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide [1]. As another instance, as of March 2015, YouTube had created 10,000 videos, generating over one billion views and more than 70 million hours of watch time [2]. Social media is no exception: Facebook, for example, could generate up to 2.7 billion “like” actions and 300 million photos per day [3]. In view of this wave of big data, effective decision making has to be data-driven and coupled with intelligent analytic tools [4].

© Springer International Publishing AG, part of Springer Nature 2018. O. Gervasi et al. (Eds.): ICCSA 2018, LNCS 10960, pp. 485–497, 2018. https://doi.org/10.1007/978-3-319-95162-1_33

In view of this wave, many organizations mentioned in [3], be they in health care, telecommunications, finance, government agencies, academia or other businesses, are recognizing that data analytics could help them locate the right data and interpret it according to their business requirements, and hence become more customer-centric. These businesses, or their business applications, therefore require effective and accurate means to store and retrieve information on a contextual basis through data analytics to enhance decision making. Moreover, merely locating or searching data without considering the context may not retrieve the most suitable and accurate information for decision making. Consequently, the research in [5] developed a framework that incorporates data analytic tools and techniques into the decision making process. This framework, integrated with intelligent techniques, aims to enhance the quality of the decision making process when dealing with big data.

In this paper, we introduce three web applications that apply intelligent pattern matching approaches to retrieve accurate information and enhance decision making on a contextual basis. In the first study, stemming and Boyer-Moore methods are combined with company policies to automatically search for and recommend the right candidate to attend the most appropriate training course. The second study locates, through several iterations of brute-force pattern matching, the best three real estate properties that match a potential buyer’s preferences. In the third study, a color matching scheme is used to find images or photos stored in Cloud storage.

This paper is organized as follows: Sect. 2 provides the background study, Sect. 3 discusses the methodologies used, Sect. 4 presents the three case studies with performance evaluation and analysis of results, and Sect. 5 concludes with an indication of future work.

2 Background Study

Generally, a keyword search engine should be sufficient for information retrieval. However, such general-purpose search engines most often return a list of the most likely matches, and the users still have to go through each item in the list to find the most suitable one. Much time is therefore spent unproductively when searching in this manner. Some web applications provide information retrieval but do not incorporate intelligent search algorithms or data analytic techniques, and hence cannot provide accurate and timely support for decision making.

For example, some existing employee training management systems [6–8] provide functionality for managing employees’ training, but allow only the department manager to select and register training courses for the employees. The selection and registration are not based on the employees’ training needs or preferences. As a result, employees may not receive the training that best suits their needs.

Some web-based real estate or property search engines, such as [9–11], provide a search function that allows users to obtain information on properties for sale or rent. However, these applications merely list the information on the properties and do not do much analytics. For example, they lack an intelligent search, match and recommend function that offers users property choices according to their preferences and thereby assists them in decision making.

Other web applications, such as photo album organizers [12–14], provide a keyword search function for users to search and list their photos stored in Cloud storage. However, many of them do not support image matching, whereby photos with similar color schemes could be displayed together, which would give users another convenient way to store and retrieve color photos based on their choices.

Consequently, our studies focus on contextual data analytics and intelligent pattern matching methods that automatically search for and recommend the most suitable choices according to users’ preferences.

3 Pattern Matching Methods

This section describes some information retrieval methods, namely the stemming algorithm, the Boyer-Moore algorithm and color matching schemes. Note that information retrieval and pattern matching methods are not limited to these few. They are mentioned here because our research work makes use of these methods, or variations of them, in the development of our intelligent context-based pattern matching applications, which automatically search, select and recommend the best choices depending on users’ preferences. Additionally, the aim of this study is not to improve these pattern matching methods but rather to investigate the feasibility and effectiveness of such methods, as used in our applications, in providing timely, real-time data analytics for better and more accurate decision making.

3.1 Stemming Algorithm

Research is ongoing to find the best method for effective information retrieval. Much of this research focuses on the stemming algorithm [15, 16], which recognizes the different variants of a word and stems them to a root word. For example, the words “programming”, “programed” and “programs” could all be stemmed to the root word “program”. To achieve stemming, a word and its various variants need to be conflated [17]. There are many automatic approaches to stemming, such as Affix Removal, Table Lookup, Successor Variety and n-gram [18]. Although many variations of the stemming algorithm have been developed for many different languages, Porter’s stemming algorithm, originally developed for stemming English-language texts, has become the standard stemming model for processing other languages [19].

The following paragraphs describe only the stemming approaches used in our case studies, namely Affix Removal and Table Lookup.

The Affix Removal method involves five steps, each with a defined set of morphological rules that remove the affixes of words sequentially [15]. The first step deals with plurals and past or present participles, and transforms a final ‘y’ to an ‘i’. For example, “specializations” is transformed to “specialization”, “disagreed” to “disagree” and “lucky” to “lucki”. The second step handles double suffixes; for example, “generalization” is converted to “generalize”. In the third step, suffixes not handled in the second step are removed; for example, “generalize” is changed to “general”. The fourth step removes remaining suffixes; for example, “general” becomes “gener”. The last step deals with words ending with ‘e’ or a double consonant; for example, “contribute” becomes “contribut” and “oscill” becomes “oscil”.
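As an illustration only, the first affix-removal step can be sketched in Python as follows. This is a deliberately reduced rule set assumed for the example (the function name `strip_step1` and the exact conditions are hypothetical), not the full Porter algorithm and not the implementation used in our application; it covers just enough rules to reproduce the first-step examples given in this section.

```python
VOWELS = set("aeiou")


def has_vowel(stem: str) -> bool:
    """Return True if the candidate stem contains at least one vowel."""
    return any(ch in VOWELS for ch in stem)


def strip_step1(word: str) -> str:
    """Toy version of step 1: plurals, -ed/-ing participles, and final y -> i."""
    word = word.lower()

    # Plurals: "sses" -> "ss", "ies" -> "i", single trailing "s" dropped.
    if word.endswith("sses") or word.endswith("ies"):
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]

    # Participles: "eed" -> "ee"; "ed" and "ing" removed if a vowel remains.
    stripped = False
    if word.endswith("eed"):
        if has_vowel(word[:-3]):
            word = word[:-1]
    elif word.endswith("ed") and has_vowel(word[:-2]):
        word, stripped = word[:-2], True
    elif word.endswith("ing") and has_vowel(word[:-3]):
        word, stripped = word[:-3], True

    # After removing "ed"/"ing", collapse a doubled final consonant
    # (except l, s, z), e.g. "programm" -> "program".
    if (stripped and len(word) >= 2 and word[-1] == word[-2]
            and word[-1] not in "aeioulsz"):
        word = word[:-1]

    # Final "y" becomes "i" when the rest of the word contains a vowel.
    if word.endswith("y") and has_vowel(word[:-1]):
        word = word[:-1] + "i"
    return word


if __name__ == "__main__":
    for w in ["specializations", "disagreed", "lucky", "programming"]:
        print(w, "->", strip_step1(w))
```

The later steps (double suffixes, remaining suffixes, final ‘e’) would follow the same pattern of ordered suffix rules and are omitted here for brevity.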

The Table Lookup method [18] uses a table to store the stemmed words and their morphological variants. During a query or index search, the lookup table is scanned for the corresponding root word. Storage overhead may be a concern for the lookup table, and usually a B-tree or hash table is used for more efficient searching.
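A minimal sketch of the table lookup idea is given below, assuming a small hand-built table; the names `STEM_TABLE`, `build_lookup` and `lookup_stem` are hypothetical and the entries are illustrative. A Python dictionary is a hash table, so each lookup takes constant time on average.

```python
# Root words mapped to their known morphological variants (illustrative data).
STEM_TABLE = {
    "program": ["programming", "programed", "programs"],
    "general": ["generalization", "generalize", "generalizations"],
}


def build_lookup(stem_table: dict[str, list[str]]) -> dict[str, str]:
    """Invert the table so that every variant points to its root word."""
    lookup = {}
    for root, variants in stem_table.items():
        lookup[root] = root
        for variant in variants:
            lookup[variant] = root
    return lookup


def lookup_stem(word: str, lookup: dict[str, str]) -> str:
    """Return the stored root word, or the word itself if it is not in the table."""
    key = word.lower()
    return lookup.get(key, key)


if __name__ == "__main__":
    table = build_lookup(STEM_TABLE)
    for w in ["Programming", "generalize", "database"]:
        print(w, "->", lookup_stem(w, table))
```

The trade-off mentioned above is visible here: every variant must be stored explicitly, so the table grows with the vocabulary, whereas rule-based affix removal needs no storage but can produce non-word stems.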

3.2 Boyer-Moore Algorithm

There are many string searching algorithms; one of them, the Boyer-Moore string searching algorithm, was pioneered in 1977 [20]. Since then, much research on improving this algorithm has been conducted, such as in [21–25]. In this paper, the aim is not to improve the Boyer-Moore algorithm but to apply it in our application for fast contextual information searching. Therefore, only the basic concept of the algorithm is discussed in this section.

Generally, the Boyer-Moore algorithm aligns the pattern with the leftmost part of the text, compares the characters of the pattern from right to left, and moves the pattern window to the right. Two observations about the pattern are important in this algorithm, namely the good-suffix (or matching) shift and the bad-character (or occurrence) shift. When there is a mismatch or a complete match of the whole pattern, the algorithm uses the good-suffix and bad-character shifts to move the window to the right. Note that the Boyer-Moore algorithm applies the larger of the good-suffix shift and the bad-character shift, which avoids negative bad-character shifts. Further details regarding the Boyer-Moore algorithm can be found in [20].
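The shifting idea can be sketched as follows. This is a simplified variant that uses only the bad-character rule and omits the good-suffix rule, so it is not the full algorithm of [20]; the function names are hypothetical. Because no good-suffix shift is available to fall back on, a minimum shift of one position is enforced to guard against zero or negative bad-character shifts.

```python
def bad_character_table(pattern: str) -> dict[str, int]:
    """Map each character to the index of its rightmost occurrence in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def boyer_moore_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Simplified Boyer-Moore: bad-character shifts only (no good-suffix rule).
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    last = bad_character_table(pattern)
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare from the right end of the pattern toward the left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s  # every character matched
        # Align the mismatched text character with its rightmost occurrence
        # in the pattern; never shift by less than one position.
        shift = j - last.get(text[s + j], -1)
        s += max(1, shift)
    return -1


if __name__ == "__main__":
    course = "intermediate python programming course"
    print(boyer_moore_search(course, "program"))
    print(boyer_moore_search(course, "java"))
```

In the full algorithm, the `max(1, shift)` guard is replaced by taking the maximum of the bad-character shift and the good-suffix shift.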

4 Pattern Matching - Case Study

This section presents three case studies, each of which developed a web application that uses pattern matching approaches to intelligently perform data analytics based on users’ preferences and to automatically search, select and recommend suitable choices for them. Based on these choices, the users can still decide subjectively which solution is best for them. In the first study, stemming and Boyer-Moore methods are combined with company policies to automatically search for and recommend the right candidate to attend the most appropriate training course. The second study, through several iterations of pattern matching using a lookup table, locates the best three real estate properties that match a potential buyer’s preferences. In the third study, a color matching scheme is used to find users’ preferred images or photos stored in Cloud storage.

4.1 Pattern Matching Using Stemming and Boyer-Moore Algorithms

A web-based employee training profile management system is developed with the aim of providing companies with a systematic and efficient way to automatically select and recommend the most appropriate training courses for employees according to their interests or preferences, qualification or expertise levels, and work requirements. The matching algorithms involved are the stemming algorithm and the Boyer-Moore algorithm.

There are five main stages in searching, matching and recommending the appropriate training course to suitable employees. Refer to Fig. 1 for the step-by-step process of the auto search, match and recommend stages.

This approach gives the company the flexibility to tailor its policies for employee training. For example, the company may set the cut-off point at 2, meaning that only an employee with a score of 2 or greater is qualified to attend the recommended training.

Refer to Fig. 2 for a detailed explanation of the stemming and Boyer-Moore algorithms as applied in this application.

Fig. 1. The 5 stages of searching, matching and recommending training course


This web application is tested with different scenarios of employee interest, qualification level, department and so on. The algorithms stem and match satisfactorily; hence the scores are computed correctly and the corresponding recommendations are accurate. Refer to Fig. 3 for the recommendations generated.

Fig. 2. Applying stemming and Boyer-Moore Algorithms


Fig. 3. Recommendations generated to match most suitable training course


4.2 Pattern Matching Using Lookup Table

A web application, the real estate property recommender, aims to provide an easy, convenient and accurate way for users to obtain property information based on their own preferences. It is implemented with a search, match and recommend function. This function goes through three iterations to search for the right properties, in the order of the user’s first, second and third preferences. In each iteration, the preference is matched against a pre-defined lookup table and the matched records are saved in a temporary table. After the third match, the three best-matched properties are found and recommended. This web application saves potential buyers time and spares them the difficulty of looking through newspapers and property magazines, or even contacting a property agent, to gather information about the property they intend to purchase. Additionally, information gathered through this web application is timely, complete and accurate, since the property database is constantly updated with the most current information, hence assisting potential property buyers in making the most appropriate purchase decision.
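The three-iteration matching described above can be sketched as a sequence of filters, where each iteration narrows the temporary table using the next preference. The field names, the sample records and the fallback rule (keep the previous table if an iteration matches nothing) are assumptions made for this illustration, not details taken from the application.

```python
from typing import Any

# Hypothetical lookup table of properties (illustrative records only).
PROPERTIES: list[dict[str, Any]] = [
    {"id": 1, "location": "Cyberjaya", "type": "condominium", "price_band": "mid"},
    {"id": 2, "location": "Cyberjaya", "type": "condominium", "price_band": "mid"},
    {"id": 3, "location": "Cyberjaya", "type": "condominium", "price_band": "high"},
    {"id": 4, "location": "Cyberjaya", "type": "terrace", "price_band": "mid"},
    {"id": 5, "location": "Putrajaya", "type": "condominium", "price_band": "mid"},
    {"id": 6, "location": "Cyberjaya", "type": "condominium", "price_band": "mid"},
]


def recommend_properties(
    properties: list[dict[str, Any]],
    preferences: list[tuple[str, Any]],
    top_n: int = 3,
) -> list[dict[str, Any]]:
    """Filter by each preference in priority order and return up to top_n records."""
    temp_table = list(properties)
    for field, wanted in preferences:
        matched = [p for p in temp_table if p.get(field) == wanted]
        # Assumption: if nothing matches, keep the previous temporary table
        # so that a lower-priority preference cannot empty the result.
        if matched:
            temp_table = matched
    return temp_table[:top_n]


if __name__ == "__main__":
    prefs = [("location", "Cyberjaya"), ("type", "condominium"), ("price_band", "mid")]
    for prop in recommend_properties(PROPERTIES, prefs):
        print(prop)
```

Filtering in priority order means the first preference always dominates: a property that fails it is never recommended, however well it matches the second and third preferences.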

Refer to Fig. 4 for the step-by-step process of the auto search, match and recommend function for the top-3 best-matched properties.

This web application is tested with different scenarios of users’ preferences, and the results show that the auto search, match and recommend function works as expected, producing the three best-matching properties according to the user’s first, second and third preferences. Refer to Fig. 5 for the results of four different scenarios.

Fig. 4. Auto search, match and recommend function for top-3 best matched properties


Fig. 5. Results of three different scenarios for best matched properties


4.3 Image Matching Using Color Scheme

The personal photo album organizer is a web application developed with the aim of enhancing the user’s experience in storing and retrieving photos and videos in Cloud storage. It allows users to upload and download photos or videos easily through a web page over the Internet, and to search for their photos and videos by matching keywords, date and time, and images. This web application saves users time and spares them the difficulty of looking through photos or videos one by one to locate a specific item. In particular, it has an image matching function whereby images are matched using a color scheme algorithm. This provides users with another convenient way to store and retrieve photos, besides keyword, date and time searching and matching.
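The color scheme algorithm itself is presented only in Fig. 6. Purely to illustrate the general idea of matching images by color, the sketch below substitutes a simple stand-in technique: each image is summarized by its mean RGB color, and images whose mean color lies within a distance threshold of the query are returned, nearest first. The function names, the threshold value and the pixel data are all assumptions and do not describe the application’s actual method.

```python
import math

Pixel = tuple[int, int, int]  # (red, green, blue), each in 0..255


def mean_color(pixels: list[Pixel]) -> tuple[float, float, float]:
    """Average the red, green and blue channels over all pixels."""
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) / n,
        sum(p[1] for p in pixels) / n,
        sum(p[2] for p in pixels) / n,
    )


def match_by_color(
    query_pixels: list[Pixel],
    album: dict[str, list[Pixel]],
    threshold: float = 60.0,  # assumed cut-off in RGB distance units
) -> list[str]:
    """Return names of album images whose mean color is close to the query's."""
    query = mean_color(query_pixels)
    scored = [
        (math.dist(query, mean_color(pixels)), name)
        for name, pixels in album.items()
    ]
    return [name for distance, name in sorted(scored) if distance <= threshold]


if __name__ == "__main__":
    album = {
        "sunset.jpg": [(250, 120, 30), (240, 100, 20)],
        "forest.jpg": [(20, 140, 40), (30, 160, 50)],
        "beach.jpg": [(245, 130, 50), (235, 110, 40)],
    }
    print(match_by_color([(255, 110, 25)], album))
```

A mean color is a very coarse summary; a color histogram would distinguish images with the same average but different color distributions, at the cost of more storage per image.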

Refer to Fig. 6 for the step-by-step process of the color scheme image matching function.

This web application is tested with different color images, and the results show that the search and match color scheme algorithm works as expected, displaying the correct images. Refer to Fig. 7 for the results of four different scenarios.

Fig. 6. The color scheme image matching function


Fig. 7. Results of four different scenarios for correct matched color images (Color figure online)


5 Conclusion and Future Work

As the three web applications show, pattern matching techniques combined with intelligent contextual data analytics are able to retrieve timely and accurate information based on users’ inputs, thus assisting decision making. Future work shall include evaluating these applications in the Cloud environment using real-time transactions to measure efficiency, such as time and load performance.

References

1. Google Search Statistics. www.internetlivestats.com/google-search-statistics/. Accessed 14 Dec 2017

2. YouTube Press Statistics. https://www.youtube.com/yt/bout/press/. Accessed 14 Dec 2017

3. Mukherjee, S., Shaw, R.: Big data-concepts, applications, challenges and future scope. Int. J. Adv. Res. Comput. Commun. Eng. 2, 66–74 (2016)

4. Batarseh, F.A., Latif, E.A.: Assessing the quality of service using big data analytics with application to healthcare. Big Data Res. 4, 13–24 (2016)

5. Elgendy, N., Elragal, A.: Big data analytics in support of the decision making process. Proc. Comput. Sci. 100, 1071–1084 (2016)

6. Etq. http://www.etq.com/employee-training-software/. Accessed 1 July 2016

7. Halogen software. http://www.halogensoftware.com/ae/products/learning-management. Accessed 1 July 2016

8. Intelex. http://www.intelex.com/products/applications/training-management. Accessed 1 July 2016

9. iProperty.com. https://www.iproperty.com.my/. Accessed 1 Aug 2016

10. Propertyguru.com. https://www.propertyguru.com.my/. Accessed 1 Aug 2016

11. Propwall.my. https://propwall.my/. Accessed 1 Aug 2016

12. Dropbox (2015). https://www.dropbox.com/home. Accessed 22 Nov 2016

13. TinyPic (2011). http://tinypic.com/. Accessed 22 Nov 2016

14. Google Photos (2015). https://photos.google.com/. Accessed 3 Sept 2016

15. Karaa, W.B.A.: A new stemmer to improve information retrieval. Int. J. Netw. Secur. Appl. 5(4), 143–154 (2013)

16. Paik, J.H., Mitra, M., Parui, S.K., Järvelin, K.: GRAS: an effective and efficient stemming algorithm for information retrieval. ACM Trans. Inf. Syst. 29(4), 1–24 (2011). Article 19

17. Shama, D.: Stemming algorithms: a comparative study and their analysis. Int. J. Appl. Inf. Syst. 4(3), 7–12 (2012)

18. Kumar, R., Mansotra, V.: Applications of stemming algorithms in information retrieval-a review. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 6(2), 418–423 (2016)

19. Wilett, P.: The porter stemming algorithm: then and now. Program 40(3), 219–223 (2006)

20. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM Mag. 20(10), 702–772 (1977)

21. Lecrog, T.: A variation on the Boyer-Moore algorithm. Theor. Comput. Sci. 92, 119–144 (1992)

22. Watson, B.W., Watson, R.E.: A Boyer-Moore-style algorithm for regular expression pattern matching. Sci. Comput. Program. 48(2–3), 99–117 (2003)

23. Danvy, O., Rohde, H.K.: On obtaining the Boyer-Moore string-matching algorithm by partial evaluation. Inf. Process. Lett. 99, 158–162 (2006)

24. Xiong, Z.: A composite Boyer-Moore Algorithm for the string matching problem. In: The 11th International Conference on Parallel and Distributed Computing, Applications and Technologies, pp. 492–496 (2010)

25. Yuan, L.: An improved algorithm for Boyer-Moore string matching in Chinese information processing. In: The 11th International Conference on Computer Science and Service System, pp. 182–184 (2011)
