[IEEE 2012 International Conference on Technology Enhanced Education (ICTEE) - Amritapuri, India (2012.01.3-2012.01.5)] 2012 IEEE International Conference on Technology Enhanced Education

Kloud – A Virtual Elastic Knowledge Cloud A Centralized Directory Based Approach for Education Content Aggregation

Shiju Sathyadevan
Amrita Cybersecurity Centre, Amrita Vishwa Vidyapeetham

Clappana P.O, Kollam, Kerala 690 525 [email protected]

Abstract— Rapid global growth has caused information to exist as discrete, isolated islands dispersed across the globe. In the case of educational content, a single hub for tapping this immense wealth of information is still non-existent. This paper introduces “Kloud” (Knowledge based cloud), an innovative solution for addressing and integrating such information. Kloud is designed to amalgamate disjointed information chains pertaining to diverse domains, owned and managed by a multitude of organizations, under a single platform, building a powerful information base that a user can access anywhere, anytime. Such an information base will act as a single-source information feeder to seekers. The system, architected for elastic inclusivity, includes several key modules: a) a Central Content Directory Management Cloud, b) a Central Content Scavenger & Aggregator Platform and c) a Content Aggregator & Broadcast Agent, which together overcome any geographic limitation on where knowledge resides. In contrast to commonly used methodologies, this architecture uses a directory-based approach: a master content directory list points to content retrieved from its registered providers. This directory is generated by a Content Aggregator and Broadcast Agent (CAB Agent) located at the provider's end and broadcast to the central directory server, which then mirrors it across key access points.

Keywords— Content Aggregation, Education Content Repository, Education Content Cloud, Centralized Directory based Content Aggregation, Building Education Knowledge Base.

I. INTRODUCTION

The world around us thrives with information that, if tapped appropriately and then amalgamated, organized and distributed, will result in a reservoir of constructive knowledge. Schools, universities, research organizations and industry R&D wings around the world hold treasure chests of information accumulated over the years through their various research initiatives. Much of this information remains isolated, and even when shared it might not reach the right users at the right time to serve as a reference point for further research on the topic in question.

“Appropriate Collection of Information is Knowledge. Knowledge is indeed wealth and is worth being treated as a priceless asset if leveraged appropriately”[4].

A one-man army is prone to inevitable defeat against a mighty militia, even before the war starts, irrespective of his valor, strength, courage and determination. But what if a number of such valiant warriors joined hands to devise an attack pattern and execute it in an organized manner? Such wars are bound to end favorably. Yet another critical factor, hidden in the mist, is the need for someone to devise, coordinate and forerun the effort.

The same applies to the world's widespread, disjoint sets of information, dispersed across the globe and addressing a wide range of domains. Apt information, once amalgamated, becomes valuable knowledge that can be leveraged to its full potential, whether to extend a study further or to serve as the base point for a newly derived hypothesis. Why not, then, make an effort to group, organize and distribute it effectively and efficiently?

This paper proposes building “Kloud”, a Knowledge Cloud: a system that gathers these broadly dispersed information assets, centralizes them, groups them by subject domain, weighs them by relevance, and eventually distributes all available germane content to its consumers in an organized fashion, addressing their wants in a timely manner.

Fig. 1 Elastic Content Directory Management Cloud Architecture

No doubt questions will be raised comparing the suggested solution to existing and dominantly prevailing technologies that perform similar tasks. How is this approach different from a search engine like Google, capable of returning countless subject-relevant documents that best match the searched keywords in no time? Why should one go through the pain of registering with a central content directory when one could access that information using readily available search tools, with no such formalities to pursue? Moreover, these familiar tools fit the behavioral nature of content seekers, either because they have used them long enough to be comfortable with them or because they are reluctant to venture toward alternatives that might give them an elevated search experience.


This is an absolutely genuine and compellingly valid question that must be tackled and answered appositely in order to justify the need for this new technology amid other prevailing, triumphant technologies that do their job to near perfection. This paper is not focused on outweighing an overly performing, globally acclaimed search engine, nor is it an attempt to develop another outperforming tool. Search engines sort documents not only by their degree of relevance to the search keyword in question but also weigh them against several other factors that determine whether a document is listed in the top few pages where it is easily noticed. Moreover, prominent search engines like Google do not display abstracts of search results, forcing one to open every document link to verify the relevance of its content.

The scope of this document is confined to content aggregation and sharing within the education domain, but we foresee that such an effort will pay dividends in the long run by assisting the research community. As mentioned throughout this document, the recommended solution is designed with scalability in mind so that it can be stretched in all directions to cover other spectrums of information sources as well.

II. TECHNOLOGICAL ELUCIDATION

As part of the effort to integrate the various isolated information islands spread across the globe, this paper proposes building an information super highway connecting all identified, dispersed information assets, thereby enabling easy amalgamation of them into an enormous knowledge base that can in turn be distributed to those in need in a timely fashion. The proposed solution also keeps the cost associated with its development and implementation skin deep, so that it can be implemented on a mass scale covering the entire globe, or started off low key and then scaled, leveraging its elastic characteristics to cover all needed global domain spectrums. The architecture as a whole does not resemble an actual cloud computing setup, but it embodies several of the properties that made cloud computing stand out and be accepted.

Kloud encompasses three well-integrated front-end components and their subcomponents, which work in conjunction with each other to deliver a function destined to redefine how education content is managed and shared.

Central Content Directory Management Cloud (CCDM Cloud)

Central Content Scavenger and Aggregator Platform

o Intelligent Content Manager
o Content Directory Manager
o Content Receiver
o Content Request Handler
o Content Refresher
o Directory Replicator
o Info Miner
o Log Collector
o Security Manager

Content Aggregator and Broadcast Agent (CAB Agent)

The Kloud architecture is kept as simple as possible so that it can be implemented effortlessly, with minimum expense overheads and technical hiccups. There is a lot of information that organizations hold close to their hearts; they want to remain the owners of these priceless assets, yet are still willing to share them either publicly or privately. This model keeps such dictums in mind and does not require content owners to part with their content in the process of sharing it with the world. If the mode of sharing is private, the system can open a secure tunnel connecting the cooperating parties to facilitate this requirement. Above all, the aforementioned methodology differs from the traditional method of centralizing the actual bulky content at a central location, thereby eradicating the need for expensive high-end storage devices to support the content aggregation and distribution model. The proposed system uses its own metadata store to hold all the internal data required to manage the intended tasks.


Fig. 2 High level Design depicting component level interaction

Before we progress further with the technical jargon associated with this architecture, it is important to clarify the usage of the terms “Cloud” and “Elastic” and their relevance in this subject area. The solution does not require building an expensive cloud infrastructure to aggregate and host the ocean of documents available out in the open. Instead, the model simulates a virtual cloud platform in which the underlying hardware and software components are shared and distributed across the various components that make up the system, managed and supported by diverse parties. The term “Elastic” highlights the scaling nature of this architecture. The following sections expand on each of the key components from an architectural and operational perspective. Figure 2 details each of the components and illustrates how they communicate among themselves.

A. CONTENT AGGREGATOR & BROADCAST AGENT (CAB AGENT).

In order to explain the architecture and how its pieces tie together, this document starts from the content provider's end. When an organization registers its interest and its intent to share its content repository, it is registered with the master CCDM server. As part of the requirements, the content provider is expected to host its content on a dedicated server within its firewall, ensuring that the content is secured from external attacks. Following the registration process, the master CCDM implants a Content Aggregator & Broadcast Agent component (CAB Agent) onto the content provider's server. Once this agent is installed, the provider copies its content into a default directory. The CAB agent scans through the documents and, using its intelligent classification algorithm, groups them under different categories. This is accomplished by tagging each file with its associated category. Once the files are categorized and tagged, the agent prepares a content directory, which is sent to the master CCDM server. It also attempts to build an abstract for each document, which is displayed when the directory listing is presented to a content consumer at the time of a request.
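The CAB agent's scan, classify, tag and broadcast cycle could be sketched as follows. This is an illustrative outline only: the keyword-based classifier, the category names and the directory-entry fields are assumptions of this sketch, since the paper does not specify the actual intelligent classification algorithm or wire format.

```python
import hashlib
import json

# Hypothetical keyword-to-category map; the paper's real
# classification algorithm is not specified.
CATEGORIES = {
    "networking": ["tcp", "routing", "protocol"],
    "databases": ["sql", "index", "transaction"],
}

def classify(text):
    """Tag a document with the category whose keywords occur most often."""
    scores = {cat: sum(text.lower().count(kw) for kw in kws)
              for cat, kws in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"

def build_directory(documents):
    """Build the content directory the CAB agent would broadcast.

    `documents` maps file names to their text; each entry carries a
    category tag, a short abstract, and a content hash so the master
    CCDM can detect changes later.
    """
    entries = []
    for name, text in sorted(documents.items()):
        entries.append({
            "file": name,
            "category": classify(text),
            "abstract": text[:120],
            "hash": hashlib.sha256(text.encode()).hexdigest(),
        })
    return entries

docs = {"paper1.txt": "A study of TCP routing protocol behaviour."}
entries = build_directory(docs)
payload = json.dumps(entries)  # what would be sent to the master CCDM
```

A real agent would walk the provider's default directory on disk and broadcast the JSON payload over the network; both steps are omitted here for brevity.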

B. CENTRAL CONTENT DIRECTORY MANAGEMENT CLOUD (CCDM CLOUD).

This module comprises a server farm categorized into a Master and Slaves according to the type of service they offer in support of the objective of this initiative. The architecture stresses the relevance of identifying nodal locations around India at which to plant the servers that make up the central content directory management cloud. A network of slave servers is hooked to a Master CCDM server, and the Master CCDM server holds the well-aggregated and sorted content directory.

The content directory holds the relevant details pertaining to the actual content stored at the participating providers' end. The content directory in the Master CCDM is aggregated and sorted in such a way that its contents can be mapped against a search request and answered in the most efficient manner. As the density of requests increases, the performance and response rate of the Master CCDM server is expected to keep up with demand. It will, of course, eventually meet its breaking point and fail to respond as expected. To counter the definite occurrence of such circumstances, the design must mitigate them by sharing the load and distributing requests so that other mirror CCDM servers can share the burden. With this in mind, slave servers are incorporated into the design as mirror servers that work in sync with the master CCDM server, acknowledging the service requests raised by content consumers within the vicinity of their service areas. On top of this, the various sub components are designed to work effectively when distributed across dedicated servers, thereby enabling parallel processing to speed up the response rate.

The Master CCDM works with the CAB Agents installed at the content providers' distribution servers to synchronize the master directory listing. This feature addresses the highly dynamic nature of the content stationed at the providers' end. The Directory Replicator module in the master CCDM ensures that all slave CCDMs are in sync at any point in time. The Master CCDM encompasses an engine that controls and coordinates the entire operation. The following section discusses each of these sub components in detail.

Since such critical systems cannot afford downtime, the architecture allows multiple master CCDMs that remain synchronized among themselves, so that if an unexpected catastrophic circumstance arises, such as the active master CCDM server failing to remain operational, a backup server kicks in to take over the request management, content gathering and distribution portfolios.


C. CENTRAL CONTENT SCAVENGER & AGGREGATOR PLATFORM (CCSA PLATFORM).

The following sub modules work hand in hand to ensure that all the components that make up this system operate in sync, achieving state-of-the-art amalgamation and distribution of apt content against the requests raised by the consumer community. A brief description of what each module is intended to accomplish is given below.

1) INTELLIGENT CONTENT MANAGER (ICM): The ICM is the heart of the system, designed to sense and control every throb that pulsates through this architecture. Requests of all natures are first intercepted by the ICM, which then channels them to the respective modules for execution. The advantage of this arrangement is that each control module can be made to run on a separate server, taking much of the processing load away from the central master CCDM. It also enables parallel processing through the efficient distribution of jobs across servers, making it possible to handle large volumes of requests with minimal latency.
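The intercept-and-channel behaviour described above can be sketched as a simple dispatch table. The request types and handler names here are illustrative assumptions; in the architecture, each handler would correspond to a sub module possibly running on its own server.

```python
def handle_search(payload):
    # Stand-in for the Content Request Handler module.
    return f"search results for '{payload}'"

def handle_refresh(payload):
    # Stand-in for the Content Refresher module.
    return f"directory refreshed from {payload}"

# Maps a request type to the sub module responsible for it.
ROUTES = {
    "search": handle_search,
    "refresh": handle_refresh,
}

def icm_dispatch(request):
    """Intercept a request and channel it to the matching module."""
    handler = ROUTES.get(request["type"])
    if handler is None:
        raise ValueError(f"unknown request type: {request['type']}")
    return handler(request["payload"])

result = icm_dispatch({"type": "search", "payload": "tcp routing"})
```

In a distributed deployment the dispatch step would forward the request over the network instead of calling a local function, which is how the load is taken away from the central master CCDM.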

2) CONTENT DIRECTORY MANAGER: Once a directory listing is broadcast to the master CCDM by the CAB agent residing at a content provider's facility, it is regrouped so that it can be classified and categorized once again against the larger mass of document directory listings accumulated from other content providers. This is an important process that ensures documents of a similar nature are tied together, not only based on the subject area to which they belong but also based on prominent keywords and other parameters that are fine-tuned as the system becomes operational.
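The regrouping step could look like the following sketch, which merges per-provider listings into one master directory keyed by category. The entry fields and provider names are assumptions, and the keyword-level refinement mentioned above is omitted.

```python
from collections import defaultdict

def merge_directories(provider_listings):
    """Regroup per-provider directory listings into one master
    directory, keyed by category so documents of a similar nature
    sit together regardless of which provider holds them."""
    master = defaultdict(list)
    for provider, entries in provider_listings.items():
        for entry in entries:
            # Record the owning provider so content can be fetched
            # from its source later; the bulky files never move.
            master[entry["category"]].append({**entry, "provider": provider})
    return dict(master)

listings = {
    "univ-a": [{"file": "p1.pdf", "category": "networking"}],
    "univ-b": [{"file": "p2.pdf", "category": "networking"},
               {"file": "p3.pdf", "category": "databases"}],
}
master = merge_directories(listings)
```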

3) CONTENT REQUEST HANDLER: All requests from content seekers hit this module, which processes them so that the best possible results matching the subject of interest are derived from the content directory. A series of mix-and-match algorithms powers this module, enabling it to search through the masses, single out the documents that are appropriate, then weigh and sort them according to their degree of relevance.
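The single-out, weigh and sort pipeline can be sketched with a toy relevance measure. The paper's actual mix-and-match algorithms are unspecified, so the term-frequency score below is purely an assumption for illustration.

```python
def score(entry, query_terms):
    """Toy relevance weight: count query-term hits in the abstract."""
    text = entry["abstract"].lower()
    return sum(text.count(t) for t in query_terms)

def handle_request(directory, query):
    """Single out matching entries, then weigh and sort them
    by their degree of relevance to the query."""
    terms = [t.lower() for t in query.split()]
    hits = [(score(e, terms), e) for e in directory]
    hits = [(s, e) for s, e in hits if s > 0]      # drop non-matches
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [e["file"] for _, e in hits]

directory = [
    {"file": "a.pdf", "abstract": "tcp congestion control in tcp networks"},
    {"file": "b.pdf", "abstract": "sql indexing strategies"},
    {"file": "c.pdf", "abstract": "tcp basics"},
]
ranked = handle_request(directory, "TCP")
```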

4) CONTENT REFRESHER: As documents are refreshed at the content provider's end, the CAB agent makes incremental changes to the existing directory structure and broadcasts those changes to the Master CCDM server. The Content Refresher module picks up these requests and applies the augmented updates to the master directory structure.
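Applying such incremental broadcasts might look like the sketch below. The operation names (add/update/delete) and the increment format are assumptions; the paper only states that incremental changes are broadcast and applied.

```python
def apply_increments(master, increments):
    """Apply augmented updates broadcast by a CAB agent to the
    master directory. Each increment names an action and a file."""
    for op in increments:
        if op["action"] in ("add", "update"):
            master[op["file"]] = op["entry"]
        elif op["action"] == "delete":
            # Tolerate deletes for files the master never saw.
            master.pop(op["file"], None)
    return master

master = {"old.pdf": {"category": "databases"}}
increments = [
    {"action": "add", "file": "new.pdf", "entry": {"category": "networking"}},
    {"action": "delete", "file": "old.pdf"},
]
master = apply_increments(master, increments)
```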

5) DIRECTORY REPLICATOR: The Master CCDM replicates the directory structure it builds onto the registered slave CCDMs. Whenever updates are made to the central repository, the Directory Replicator module makes incremental changes to all the slaves simultaneously. The central CCDM generates a hash value based on the directory content and a time factor. While replicating the directory structure onto the slave CCDMs, the Directory Replicator module compares the hash values. If they do not match, the Replicator uses the time factor embedded in the hash value stored at the slave CCDM to work out the last replication commit point from the master CCDM. It then reads all increments from that point onward from the master CCDM and applies those changes to the slave CCDMs.
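The compare-then-replay logic above can be sketched as follows. The exact encoding of the time factor alongside the content hash is not given in the paper, so the "timestamp:digest" stamp and the increment log format are assumptions of this sketch.

```python
import hashlib

def version_stamp(directory, timestamp):
    """A content hash paired with the time factor the replicator
    compares. The concrete encoding is an assumption."""
    digest = hashlib.sha256(repr(sorted(directory.items())).encode()).hexdigest()
    return f"{timestamp}:{digest}"

def replicate(master, master_stamp, slave, slave_stamp, increment_log):
    """Bring a slave in sync with the master: if the stamps differ,
    replay every increment logged after the slave's last commit time."""
    if master_stamp == slave_stamp:
        return slave, slave_stamp          # already in sync
    last_commit = int(slave_stamp.split(":")[0])
    for ts, filename, entry in increment_log:
        if ts > last_commit:
            slave[filename] = entry
    return slave, master_stamp

master = {"a.pdf": "networking", "b.pdf": "databases"}
increment_log = [(5, "a.pdf", "networking"), (9, "b.pdf", "databases")]
slave = {"a.pdf": "networking"}            # stale: missed the ts=9 update
slave_stamp = version_stamp(slave, 5)
master_stamp = version_stamp(master, 9)
slave, new_stamp = replicate(master, master_stamp, slave, slave_stamp, increment_log)
```

Replaying only the increments since the last commit point keeps replication traffic proportional to what changed rather than to the full directory size, which is the design's stated goal.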

6) INFO MINER: The Info Miner is a built-in data mining engine that collects various statistics based on historic data. The idiom “the older you get, the wiser you become” fits this paradigm best, because the data mining engine built into the central CCDM ensures that it learns from the data assimilated over time, thereby enhancing its decision-making capabilities. As a result, the engine fine-tunes its document aggregation and sorting logic so that the system responds with the best possible hits against the queries raised.

7) SECURITY MANAGER: Access control modules are integrated within the CAB agent at the provider's end. They ensure that only authorized users can make changes to the content stored on those servers. Suitable security measures are also built into the CAB agent to guarantee that the directory structure built by the agent is not tampered with at any point in time. Likewise, similar security manager modules at both the master and slave CCDM servers prevent any malicious activities from being carried out. A multilevel security authorization methodology can be employed to secure mission-critical operations, in which the security manager module makes it obligatory to obtain authorization from multiple authorities.

8) LOG COLLECTOR: Each module generates valuable information in the form of logs that are written to the underlying metadata store. Logs are a vital source of information to turn to when forensic analysis must be carried out to verify the authenticity of a past action that was later identified as malicious. Logs are generated read-only by default and are barred from delete operations. A multi-level authorization policy is required to enforce any changes to the generated logs.
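The read-only, append-only property described above can be sketched with a minimal log container. This is an illustration of the policy, not the paper's implementation; the multi-level authorization path for sanctioned changes is out of scope here.

```python
class AppendOnlyLog:
    """Records can be appended and read but never modified or
    deleted through this interface, mirroring the read-only
    default the Log Collector enforces."""

    def __init__(self):
        self._records = []

    def append(self, module, message):
        self._records.append((module, message))

    def read(self):
        # Return a copy so callers cannot mutate the stored records.
        return list(self._records)

log = AppendOnlyLog()
log.append("CAB", "directory broadcast sent")
log.append("ICM", "search request routed")
records = log.read()
```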

III. FEATURES

Content consumers can register their areas of interest with the system, so that whenever a new document pertaining to those subject areas is added to the directory, the user is alerted. Broadcast groups comprising users with similar interests can be defined, to which updates are relayed appropriately. Secure private tunnels can be created to interconnect cooperating institutions with similar research interests so they can share confidential documents securely. Universities can use e-learning tools like “Aview” to broadcast their lectures live, or archive them to be replayed for authorized users from the video content library. Virtual lab tools can also be integrated with the system so that authorized users can perform curriculum-based or research-based experiments and simulations without having to invest heavily in lab equipment. Cloud features such as scalability and cost reduction make this design even more attractive: it does not need costly cloud data centers to be designed and maintained, and content owners are at ease because their assets remain under their ownership and control.

IV. FUTURE EXTENSIONS

This platform will stand true to its name, proving its elastic nature by incubating several value-added features to which it can be extended.

Kloud can be used in schools across regions to unify their syllabi and share lecture notes, assignments and examination question papers.

It can open up to individuals who want to share their ideas or research documents with a private or public group within the “Kloud” frontier.

Kloud can serve as a research forum, using its Info Miner to dive deep into its knowledge base and return the series of solutions that most closely match the questions raised by knowledge seekers.

The platform can be extended to support the centralized management and distribution of government policies and other official documents. The official documents would still remain with the respective bodies, but would be aggregated into a directory structure so that they can be retrieved and presented to other authorized personnel and departments.

V. CONCLUSION

Kloud, through its centralized directory-based approach, reduces the overhead of building expensive storage for holding the actual content securely. Building scalable models as proposed here should help stretch the system's functionality to fit the various dynamic demands that the education domain may encounter in the future, and the model can be extended to fit other domains as well. A full-scale implementation of “Kloud” could propel it to become the next-generation “YouTube” of education-related content aggregation and distribution.


REFERENCES

[1] P. Venkat Rangan, B. Hickson, and A. Bharadwaj, “University Aggregation Project: Advanced Automatic Aggregation of Heterogeneous Web-Based Content in University Environments,” in Proc. IADIS, 2003, paper 11.3.4, p. 1039.

[2] (2009) UNESCO website [Online]. Available: http://stats.uis.unesco.org/unesco/Tableviewer/document.aspx?ReportId=121&IF_Language=eng&BR_Country=3560

[3] (2009) QMKJW website. [Online]. Available: http://www.qmkjw.org/education-in-india-colleges-universities-courses-in-india.htm.

[4] G. Bellinger, D. Castro, and A. Mills. Systems-Thinking website. [Online]. Available: http://www.systems-thinking.org/dikw/dikw.htm