This Report is submitted to the University of Strathclyde in partial fulfilment of the
Regulations for the Degree of MSc in Information Technology Systems
Decision system for stock market investors
Michael Witty
200590826
Supervised by: Dr Ian Ruthven.
Department of: Computer & Information Sciences.
September 2007
Except where otherwise expressly indicated the work reported in this document is my
own. It has been performed during, and has not been submitted for assessment in
connection with any other award whatsoever.
Signed Date
Abstract
The aim of this research is to analyse how investors collect and use typical market
indicators, and to investigate the ways in which current technology can be used to enable
more informed, time-critical decisions.
It has been the Holy Grail of mathematicians, economists, investment banks and
programmers alike to create systems and techniques that accurately predict stock
market movements in order to ensure financial gain and eliminate risk. The findings of
such research, however, are inconclusive as to whether the stock market can be
predicted accurately enough to make significant profits.
The efficient market hypothesis concludes that the market already reflects the value of an
investment, since all relevant information is already in the public domain. This
conclusion is increasingly being challenged as more complex computer systems are
directed towards the field and the vast repositories of data available on the Internet
grow.
This research focuses not on predicting the market but on providing better tools to help
investors make decisions based on current market conditions. Recent standards and
technologies such as XML and Web 2.0 have provided solutions to some of the common
problems of data retrieval, representation, organisation and manipulation. This research
looks at the use of such technologies.
Acknowledgements
I would like to thank Dr Ian Ruthven for his help and guidance during this project.
Table of Contents
1. Introduction ................................................................................................... 5
1.1. Problem Statement ................................................................................ 5
1.2. Overview .............................................................................................. 5
1.3. Scope .................................................................................................. 5
2. Literature Review .......................................................................................... 6
2.1. The Stock Market .................................................................................. 6
2.1.1 Fundamentals ................................................................................... 6
2.1.2 Predictability ..................................................................................... 7
2.1.3 The Efficient Market Hypothesis ........................................................... 7
2.1.4 The Rise in On-line Trading ................................................................. 8
2.2. Information Visualisation ........................................................................ 8
2.2.1 Existing Graphical Representations ...................................................... 9
2.2.2 Issues ............................................................................................ 12
2.2.3 Cognitive Maps ................................................................................ 13
2.3. Functional View of Market Trading ......................................................... 13
2.3.1 Investor Goals................................................................................. 14
2.3.2 MSN Research Wizard ...................................................................... 14
2.3.3 Functional Requirements .................................................................. 15
2.4. Data Retrieval ..................................................................................... 17
2.4.1 Web Content Mining ......................................................................... 17
2.4.2 Extraction Techniques ...................................................................... 17
2.4.3 Examples ....................................................................................... 18
2.4.4 Dapper.Net ..................................................................................... 21
2.5. Data Storage ...................................................................................... 22
2.5.1 XML ............................................................................................... 22
2.5.2 XPath ............................................................................................. 23
2.5.3 Storing XML .................................................................................... 24
2.5.4 XML Databases ................................................................................ 24
2.6. Transforming XML ............................................................................... 25
2.6.1 XSL................................................................................................ 25
2.6.2 SVG ............................................................................................... 27
2.6.3 XForms .......................................................................................... 28
3. Specification ................................................................................................ 30
3.1. Problem Statement .............................................................................. 30
3.2. Stakeholder Analysis ............................................................................ 30
3.3. User Goals .......................................................................................... 31
3.4. Use Cases........................................................................................... 32
4. System Design ............................................................................................. 32
4.1. Methodology ....................................................................................... 32
4.2. Required Technologies ......................................................................... 32
4.3. Decisions ............................................................................................ 33
4.4. Modules ............................................................................................. 33
4.4.1 Dapper Module ................................................................................ 34
4.4.2 Database Connection Module ............................................................ 35
4.5. Proposed Architecture .......................................................................... 36
5. Implementation ............................................................................................ 36
5.1. Issues and Design Changes .................................................................. 36
5.2. Module implementation ........................................................................ 37
5.2.1 Utilities Package .............................................................................. 37
5.2.2 Dapp Manager Package .................................................................... 38
5.2.3 DBQuery Package ............................................................................ 40
5.2.4 Style sheet & Icon Design ................................................................. 41
5.3. Final System Architecture ..................................................................... 44
5.4. Interfaces ........................................................................................... 44
6. Testing ........................................................................................................ 46
6.1. Component Testing .............................................................................. 46
6.2. Usability testing .................................................................................. 47
6.3. Speed & Accuracy Test ......................................................................... 48
6.4. Results ............................................................................................... 51
7. Conclusion ................................................................................................... 52
8. Bibliography................................................................................................. 53
1. Introduction
1.1. Problem Statement
For investors, making financial gains requires quick, informed decision-making based
on many different sources of time-sensitive data. News articles, company profiles,
financial indicators and general economic conditions are constantly changing, and all can
affect the performance and financial well-being of a company and therefore its stock
price.
The process of gathering and analysing this information is time consuming. For investors,
the cost of their time, together with the charges incurred through buying and selling
shares, immediately reduces potential gains.
A further consideration is the time-dependent nature of trading: investors must not only
collect and analyse the relevant data, they must also make a decision based on it while
it is still relevant. Although many sites and portals exist specifically to provide a single
source for investors, these sites themselves become vast and are still predominantly
text based.
The task for investors is therefore not only to discover the relevant information but also
to derive some meaning from it.
1.2. Overview
This paper aims to research and develop a graphical display method that presents a holistic
view of a market index, specifically the FTSE 100. The intention is that investors can
narrow their research efforts by filtering which companies are worth investigating further,
and ultimately be helped in deciding whether to buy, sell or hold a particular stock. A brief
overview is given of the stock market, together with a functional analysis of the trading
process. An analysis of existing systems and technologies is then used to develop a
graphical tool with which traders can gain a quick insight into the market.
1.3. Scope
Some assumptions are made about the users of the final system. Most financial sites
offer tutorials and helpers to guide first-time investors through the intricacies of the stock
market; it is assumed here that the users and testers of this system already have the
relevant knowledge and experience, and that writing such educational material into the
final site is beyond the scope of this project. The final system is also intended as a proof
of concept and as such will demonstrate some possibilities for retrieving data. Not all
possible scenarios will be implemented; rather, the system should be versatile enough to
retrieve data from many diverse locations, assuming the user provides appropriate
configuration.
2. Literature Review
First, an understanding of the stock market and the available technologies is required in
order to assess high-level functional requirements and identify technologies that can
answer them.
2.1. The Stock Market
Stock is the term for the outstanding capital of a company or corporation. This stock is
divided into shares, which are traded on an exchange in a similar way to an auction;
the difference is that in a stock market sellers and buyers do not trade on a
highest-or-lowest-offer-wins basis, but are instead matched according to the price at
which they are willing to trade.
2.1.1 Fundamentals
Many exchanges exist globally; the major ones include the New York Stock Exchange
(NYSE) in America, the Tokyo Stock Exchange (home of the Nikkei index) in Japan and,
in the UK, the London Stock Exchange (LSE).
Each exchange consists of lists, or indexes, of companies grouped by market capitalisation
(the estimated total value of a company). If a company is listed on a particular index,
investors can gauge how large it is in terms of its financial value. This project is
concerned with the FTSE 100, which lists the top UK companies traded on the London
Stock Exchange.
The price at which shares are bought and sold is governed by many factors. The price of
a stock can be thought of as a reflection of what the market is willing to pay; expressed
another way, the market price reflects the perceived value of a company, and this value
changes over time with the company’s financial performance and well-being. As Elinger
observes, the market is searching for the right price [1].
[1] Elinger, A., The Art of Investment.
2.1.2 Predictability
‘The predictability of financial markets has engaged the attention of market professionals,
academic economists and statisticians for many years’ [2].
Being able to predict how a market or an individual share will behave in the future
would be of great advantage to any investor, guaranteeing a profit on any investment
made. As such, this is exactly what many investors try to do.
Several methods and techniques exist, from fundamental analysis to technical charting.
The effectiveness of these techniques is constantly debated, as is whether it is possible
to predict market movements with any degree of accuracy at all. Some studies and
theories challenge the reasoning behind such pursuits; one notable example is the
Efficient Market Hypothesis.
2.1.3 The Efficient Market Hypothesis
Malkiel [3] set out the Efficient Market Hypothesis in 1973. The findings of his research
suggest that the market cannot be predicted using formal techniques such as
fundamental and technical analysis. These methods rely on quantitative data about
companies and on trade information such as prices and volume, all of which are freely
accessible in the public domain.
It is proposed that, since this information is already freely available to all investors, the
market already reflects its implications. The debate between promoters of the EMH and
the more traditional technical analysts continues; no solid conclusions have yet been
reached either way, and with so much attention from various research sources the
debate is likely to continue. Recent advances in techniques and computing power, along
with the larger data sets available via the Internet, have fuelled it further [4].
An alternative approach is therefore required: to provide investors with all the
information and data they need, in a way that allows a quick overview and analysis of
market activity to support investing decisions. Mills [5] proposes that investors need to
gather and analyse this information as soon as it becomes available so that timely
decisions can be made.
[2] Mills, T., Predicting the Unpredictable.
[3] Malkiel, B., A Random Walk Down Wall Street.
[4] Mills, T., Predicting the Unpredictable.
2.1.4 The Rise in On-line Trading
A number of factors have made on-line trading more popular in the past few years:
increased availability of data, growth in Internet usage, new technology, faster
connections and favourable market conditions have all made investing in stock more
attractive [6].
This vast increase in on-line trading has given rise to many web sites offering trading
tools for investors and market data portals.
In addition, recent standards and technologies such as XML and Web 2.0 have enabled
richer web-based applications, including the use of graphics.
2.2. Information Visualisation
Information visualisation is concerned with representing data in a graphical format that
successfully imparts information to the viewer, an idea famously captured by the proverb
‘a picture is worth a thousand words’.
Tufte [7] takes this concept further by introducing the idea of data density. Textual
representations are limited by the viewer’s ability to read and understand the text itself;
basic text can be thought of as one-dimensional in its ability to communicate
information, that dimension being the value the characters represent. Graphics, on the
other hand, can represent more than one dimension through the use of colour, size,
shape and context, meaning that more data can be represented over a given area.
Harris [8] describes how the use of colour alone can help authors to:
Differentiate elements
Encode areas of equal value
Alert the viewer when a predetermined condition occurs
Identify particular values
Indicate similar items
Signify changes in direction, trends and conditions
Improve retention of information
Use gradations to indicate transitions from one set of conditions to another
[5] Mills, T., Predicting the Unpredictable.
[6] Warneryd, K., Stock Market Psychology.
[7] Tufte, E., Envisioning Information.
[8] Harris, R., Information Graphics.
Many of these attributes lend themselves well to the stock market scenario, particularly
the identification of trends and changes in direction for numerical indicators.
2.2.1 Existing Graphical Representations
The idea of representing data graphically is not new, even in the stock market scenario;
various charts and display methods already exist.
Simple time series: probably the chart most synonymous with stock markets is the time
series graph, which simply plots one variable against a set time period. From this an
investor can see how the price has performed historically.
Figure 1: Example of traditional charting on Self Trade.
Candlesticks: first devised by a Japanese rice trader, the candlestick diagram shows the
price change over a certain period in relation to the highest and lowest prices.
Candlesticks are still used today on many sites such as Digital Look and Self Trade, and
are a good example of how graphics can convey data in a smaller area. The example
below shows that, using only a box and two lines, the diagram can successfully
communicate four pieces of information to a user at once. When combined with a time
series chart, even more information can be imparted.
Figure 2: Candlestick example [9]
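The four values a single candlestick encodes, and the way its box and wicks are derived from them, can be sketched as a tiny data structure. This is an illustrative sketch with invented sample figures, not the rendering code of any particular charting site:

```java
// A single candlestick encodes four values: open, high, low and close.
// The box (body) spans open..close; the wicks extend to the high and low.
class Candlestick {
    final double open, high, low, close;

    Candlestick(double open, double high, double low, double close) {
        this.open = open;
        this.high = high;
        this.low = low;
        this.close = close;
    }

    // Conventionally a filled (or red) body means the price closed below its open.
    boolean bearish() { return close < open; }

    double bodyTop()    { return Math.max(open, close); }
    double bodyBottom() { return Math.min(open, close); }

    public static void main(String[] args) {
        Candlestick day = new Candlestick(105.0, 110.0, 98.0, 101.5);
        System.out.println(day.bearish());    // the stock closed below its open
        System.out.println(day.bodyTop());
        System.out.println(day.bodyBottom());
    }
}
```

The point of the encoding is that all four numbers, plus the direction of the move (via colour or fill), are readable from one small glyph.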
Heat-maps: the concept of the heat map is to display a particular indicator’s rate of
change (most commonly the price change over a period) and to communicate this change
graphically through the colour of the graphic.
Digital Look [10] provides one example of a heat-map currently available:
[9] http://www.babypips.com/school/what_is_a_candlestick.html
[10] http://www.digitallook.com/cgi-bin/dlmedia/investing/visual_tools/heat_maps?
Figure 3: Digital Look Heat-Map
MSN [11] also provides a similar heat-map display, again showing the price change for a
certain period.
[11] http://msn.moneyam.com/heatmaps/
Figure 4: MSN Heat-Map
It can be seen that most of these graphical tools attempt to map only one variable, and
in almost all cases it is the change in price over a certain time period.
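The mapping a heat-map performs is simple: a percentage change is bucketed into a colour. A minimal sketch follows; the thresholds and colour names are illustrative assumptions, not the actual scale used by Digital Look or MSN:

```java
class HeatMap {
    // Map a period's percentage price change to a heat-map colour.
    // The thresholds here are illustrative only.
    static String colour(double pctChange) {
        if (pctChange <= -2.0) return "dark-red";
        if (pctChange <   0.0) return "red";
        if (pctChange ==  0.0) return "grey";
        if (pctChange <   2.0) return "green";
        return "dark-green";
    }

    public static void main(String[] args) {
        System.out.println(colour(-3.1)); // dark-red
        System.out.println(colour(0.4));  // green
    }
}
```

Because the whole index can be rendered as a grid of such coloured cells, a viewer can spot unusually strong movers without reading a single number.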
2.2.2 Issues
Spence [12] observes that ‘the mere re-arrangement of how the data is displayed can lead
to a surprising degree of additional insight’.
It is clear that graphics can help; conversely, however, Tufte [13] observes that the
incorrect use of graphics can have a negative effect. Common errors include irrelevant
decoration, information overload and the poor use of colour, so these factors must be
considered when designing such interfaces. As a guide, the following requirements need
to be addressed:
Selection of data – relevance to the task
Representation – how to represent abstract things
Presentation – spatial limitations
Scale and dimensionality – how many dimensions and variables can be displayed
Re-arrangement, interaction and exploration
Internalisation – the mind’s representation of an internal image
Externalisation – the display the user actually sees, i.e. the computer display
Mental models – human memory models
Invention, experience and skill
[12] Spence, R., Information Visualization.
[13] Tufte, E., Visual Explanations.
2.2.3 Cognitive Maps
The next consideration in information visualisation is how the user interacts with the
graphic. A cognitive map is the navigational guide to an interface that a user constructs
in memory; a simple real-world analogy is the London Underground map.
Most passengers on the Underground have one goal in mind: how to get from point A to
point B, and the connections required between the two. Accordingly, the Underground
map uses colour to represent the different connecting routes and does not attempt to
display other real-world data, such as accurate scale, because the user is not interested
in that information.
Another analogy is to think of cognitive maps as the bridge between the real world, the
computer display and the user’s memory [14].
The process of creating these maps can be illustrated by the following sequence:
Browse > CONTENT > model > INTERNAL MODEL > interpret > INTERPRETATION >
formulate browsing strategy > BROWSING STRATEGY.
To aid this process, context maps can be used to help users create such models. These
maps aim to give the viewer a basis on which to build their own cognitive map.
2.3. Functional View of Market Trading
To gain an understanding of how investors make decisions and the ways in which this
data is analysed a functional analysis of trading activities is undertaken.
[14] Mental Models, Navigation.
2.3.1 Investor Goals
All investors share a common goal: to achieve a return on their initial investment. At the
most basic level the aim is always to buy when a stock is undervalued, before the market
moves to reflect this, and conversely to sell when an investment is overvalued. Put
simply: buy low and sell high.
The methods used to achieve this vary from person to person, and individual goals and
strategies differ across personalities and age groups. Investors can, however, be grouped
into two general categories, active and passive traders, also known as short- and
long-term traders.
Active traders aim to profit from short-term natural fluctuations in price, or volatility.
The frequency of these trades varies; the most extreme example is the day trader, who
makes very large trades over short periods to take advantage of daily fluctuations in
price.
Passive traders, in comparison, aim to take advantage of the market’s long-term
tendency to rise. They therefore trade very infrequently, buying shares periodically to
add to their portfolio rather than selling. Most traders fall into this second category [15].
2.3.2 MSN Research Wizard
The MSN Research Wizard [16] gives a good indication of what is involved in deciding
whether to buy or sell shares. The page is a kind of expert system that uses MSN data to
guide an investor through the process of assessing an individual company, looking
mainly at fundamental data to gauge how good an investment is.
The wizard is split into five main sections. The first step looks at the company’s
fundamentals: a set of indicators used to assess a company’s financial well-being.
Fundamentals can be used to determine how profitable a company has been to date, as
well as giving an idea of the general state of its finances. The kinds of question it aims to
answer include:
How much does the company sell and earn? (sales and income)
[15] Warneryd, K., Stock Market Psychology.
[16] http://uk.moneycentral.msn.com/investor/research/wizards/srw.asp?Symbol=GB%3Abp%2E
How fast is the company growing? (sales growth and income growth compared to the
industry)
How profitable is the company? (profit compared to the industry over 1 and 5 years)
How healthy are the company’s finances? (debt/equity ratio compared to the industry)
Some investors use a company’s past price performance as an indication of future
performance. Many will argue that past prices have no bearing on future prices; equally,
some will argue that a company that has performed well to date should continue to
perform well. This page therefore gives an overview of the stock’s performance,
measured as the price change over the past 1, 3 and 12 months.
Following the fundamentals, the next section looks at the likely future price of the
investment. Using the company’s price-to-earnings ratio along with analyst expectations,
an estimate is given of how the company is likely to perform over the coming two years.
A company’s share price can also be affected by a number of social factors, such as news
stories relating not only to the company itself but to general economic conditions. An
extreme example is the Northern Rock bank crisis [17], which saw the share price lose
30% of its value overnight. This dramatic drop was triggered when it emerged that the
company had sought a loan from the Bank of England as a result of difficult financial
conditions; although the fundamental business was sound, the ensuing panic, as
customers withdrew their savings, sent the market price into freefall.
Recognising the importance of financial news, MSN has added a catalysts section to the
wizard, which details any company-specific news stories that could impair or improve
confidence in the company.
Finally, another predominant task in the decision process is considered: comparison.
Looking at a single company profile can only impart information in a single context; to
derive meaning from the data a comparison is required. Here MSN allows comparative
analysis with up to two other company profiles.
2.3.3 Functional Requirements
[17] http://news.bbc.co.uk/1/hi/business/7007076.stm
From our initial investigation it is clear that, when it comes to making wise investments,
knowledge is key. As Lasser [18] observes of Warren Buffett, one of America’s most
successful investors:
‘He will seek out every last bit of information he can get, whether it’s a company’s return
on equity or the fact that the CEO is a miser who takes after Ebenezer Scrooge himself.’
Using the MSN wizard as a guide, the functional tasks can be broken down as follows:
Determine profitability of a company
Determine return on investment
Determine the risk of the investment
Determine the value of the company
This also gives an insight into exactly how the data is analysed. Most numerical
indicators are analysed in the following ways:
Value in relation to highs and lows
Value in comparison with a base value, such as the market or sector
Difference between two values: spreads, rate of change
Trends and direction
Identification of changes in trend: turning points
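The first three of these analyses are straightforward arithmetic. As a hedged sketch (the method names and sample figures are my own, chosen for illustration):

```java
class Indicators {
    // Position of a value within its high-low range:
    // 0.0 means at the low, 1.0 means at the high.
    static double rangePosition(double value, double low, double high) {
        return (value - low) / (high - low);
    }

    // Difference from a base value, such as a market or sector average,
    // expressed as a percentage.
    static double relativeToBase(double value, double base) {
        return (value - base) / base * 100.0;
    }

    // Rate of change between two readings, expressed as a percentage.
    static double rateOfChange(double previous, double current) {
        return (current - previous) / previous * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(rangePosition(550.0, 500.0, 600.0)); // mid-range
        System.out.println(relativeToBase(12.0, 10.0));         // 20% above base
        System.out.println(rateOfChange(200.0, 210.0));         // 5% rise
    }
}
```

Values like these are exactly what a graphical display can encode as position, size or colour instead of raw numbers.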
The main functional requirements fall into two categories.
The first is the retrieval and storage of data from the World Wide Web for analysis.
The second is the analysis of that data in order to make decisions. This will involve some
or all of the tasks described above, which investors already perform on the various
sources available. A graphical interface is proposed that will allow users to explore and
display the retrieved data in different ways to gain a better understanding of its
meaning.
Each top-level requirement is investigated in turn to generate lower-level requirements:
[18] Lasser, J.K., Pick Stocks Like Warren Buffett.
2.4. Data Retrieval
There is a wealth of information available to the investor via the modern Internet, and
many companies have emerged that aim to provide content to investors for analysis,
through sites such as MSN Money [19], Digital Look [20] and Self Trade [21]. As we have
seen, however, relevant information can come from a wide range of sources, and
accessing all of these resources manually involves searching and browsing for content.
Even with a comprehensive bookmark list, this activity is time consuming and laborious.
There is therefore a requirement to extract and consolidate this information
programmatically.
2.4.1 Web Content Mining
Web content mining is concerned with discovering information from the many sources
available on the web [22]. Using data mining techniques, content can be analysed and
extracted for use in other applications.
One problem with using a data repository as vast as the Internet is the dynamic nature
of its content. To retrieve data under any circumstances an application needs to know
where to look and needs a reference for what it is looking for. On the World Wide Web
we are dealing with pages of content generated in a range of ways; ASP, JSP and HTML
pages may change in structure at any time and may not follow the strict rules associated
with markup languages.
A further complication is that HTML generally does not carry any type information:
content will almost always be represented as a generic string. This poses problems when
extracting information for use by a strongly typed language such as Java.
Fortunately, despite these issues, there are techniques and programs that solve these
problems.
2.4.2 Extraction Techniques
A basic technique for retrieving web-based content is screen scraping, which involves
extracting data from its final output format, usually the visual display of the program
being scraped. In the context of the web this means taking content directly from the
browser, which can be achieved by a number of methods such as regular expressions or
dedicated APIs. The technique has limitations, however. Because the data is extracted
from a format designed with human readability in mind, additional processing is required
to remove styling elements, and the data will not necessarily be structured in a way
suitable for use by other programs, so contextual information must be added later.
[19] http://money.uk.msn.com/
[20] http://www.digitallook.com
[21] http://www.selftrade.co.uk/
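As a concrete sketch of the regular-expression approach, the snippet below pulls a price out of a fragment of rendered HTML. The markup and the `price` class name are invented for illustration; a real page's structure would differ, which is exactly the fragility screen scraping suffers from:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScrapeExample {
    // Extract a numeric price from an HTML fragment using a regular expression.
    // The match is still just a string and must be parsed into a typed value.
    static double extractPrice(String html) {
        Pattern p = Pattern.compile("<td class=\"price\">([0-9]+\\.[0-9]+)</td>");
        Matcher m = p.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("no price found");
        }
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) {
        String html = "<tr><td>BP.L</td><td class=\"price\">543.25</td></tr>";
        System.out.println(extractPrice(html)); // 543.25
    }
}
```

Note that the pattern is tied to one page layout: if the site changes its markup, the expression silently stops matching.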
Tree builders are aimed specifically at web page extraction and take advantage of the
structure of the markup language. A tree builder attempts to create an in-memory tree
representation of a web page by matching start and end tags in the target document,
then builds a representation of the structure to provide a navigable context. The
designer of the particular extraction program dictates how the tree is built and how
extensively it caters for specific tag libraries. Once a tree representation has been
created, data can be extracted based on its location in the document. This method is
useful for retrieving data from many pages that have identical layouts but different
content, such as stock prices, but it can only work with supported formats.
The W3C introduced the Document Object Model [23], or DOM, to address these issues;
in their own words:
‘The Document Object Model is a platform- and language-neutral interface that will allow
programs and scripts to dynamically access and update the content.’
The introduction of this standard gave API and program writers a common interface to
work from, so parsers can take the tree builder concept to the next level by building a
DOM representation of the page in order to extract its content.
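With a DOM tree in memory, data can be addressed by its location rather than by a text pattern. The following sketch uses the standard Java XML APIs; the document and the `BP.L` symbol are invented for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomExample {
    // Build a DOM tree from a well-formed document, then navigate it by
    // location using an XPath expression.
    static String priceFor(String xml, String symbol) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate("/quotes/quote[@symbol='" + symbol + "']/price", doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<quotes>"
                   + "<quote symbol=\"BP.L\"><price>543.25</price></quote>"
                   + "<quote symbol=\"BT.L\"><price>301.10</price></quote>"
                   + "</quotes>";
        System.out.println(priceFor(xml, "BP.L")); // 543.25
    }
}
```

Because the query addresses a location in the tree rather than a run of characters, the same expression works for every page sharing the layout, regardless of the values it carries.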
2.4.3 Examples
Implementation of a fully-fledged extraction program is time consuming and not the
main focus of this project; there are many freely available programs for this task, two
notable online examples being Yahoo pipes24 and Dapper25:
Yahoo pipes is a web 2.0 application available exclusively on-line, it relies on structured
data in the form of XML, RSS feeds and JSON as a target content type. The site consists
22 Web Content Mining with Java, Loton, T.
23 http://www.w3.org/DOM
24 http://pipes.yahoo.com/pipes/
25 http://www.dapper.net/
of a graphical interface in which users add modules and connect them to create a
customized output from existing web pages.
The modules themselves perform various tasks, affording users control over the data
retrieved from selected URLs. The output is then displayed as a standard HTML page,
which can be viewed by anyone who logs into the site.
Example: creating a simple RSS feed aggregator.
The "Fetch Feed" module is used to retrieve news stories from the BBC's business feed;
this is simply connected to the output module.
Figure 5 Simple feed to retrieve RSS from the BBC
Multiple feeds can be combined in the Fetch Feed module, and a filter module is added
to allow users to search the feeds for specific terms. The search module is added to
provide user input on the main page.
Figure 6 Simple Aggregator to combine two feeds
A search term box is added in the above example to filter only news items of interest
from the three selected news feeds.
Figure 7 Output page for the aggregator with search term box.
Other modules can be used to create more complex pipes; XML data can be extracted
directly and manipulated, filtered or combined with other web sources to create useful
pages. However, the application is limited to use with live data and the output is
restricted to the standard output; in addition, few sources of useful data are freely
available in XML format.
2.4.4 Dapper.Net
Dapper (a contraction of Data Mapper) is another online application that allows users to
extract content from anywhere on the net and output it into various formats including
XML, JSON, RSS feeds etc. Dapper also provides a Java API allowing developers to
connect their programs with Dapper to retrieve the extracted content.
Dapps are small retrieval applications created using the main site. Each Dapp is created
to parse a specific web page. Initially this is achieved via a virtual browser within the
site. The user interface allows web content to be selected for retrieval. In the example
below the last trade price element is selected. Each selected element can have some
basic manipulation applied to remove preceding or trailing strings; in this case the p is
removed.
Figure 8 Dapper UI showing selected content
Any number of elements can be added. Once the content has been selected, the user
can add field names and group the output. These are reflected in the resulting XML
output.
Figure 9 Preview showing output
Dapper is flexible enough to allow modifications to the retrieved content at a later date.
The addition of the Java API, allowing external programs to interface with Dapper,
makes it an ideal solution to the retrieval problem.
2.5. Data Storage
The second top-level requirement of the proposed design is the storage of the data
retrieved by Dapper. The output format from Dapper is selected when the Dapp is
created, and the user has several options including RSS feeds, JSON and standard
HTML. Since we are using the data in another application it makes sense to retrieve the
data as XML.
2.5.1 XML
XML is a standard for data exchange and has become popular both in desktop
applications for configuration files and on the web for storing and exchanging data.
XML can be thought of as data about data: not only does it contain the actual data but
also contextual and structural information.
XML has many advantages: firstly, its high portability between applications and across
platforms; the fact that it has been a W3 standard since 1998 means a lot of applications
and application interfaces are available. For the example Dapp that we created in the
previous section the XML output would look as follows (the actual output has been
simplified to show only the elements of interest).
<PriceData>
  <name>MSNPriceData</name>
  <url>http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?symbol=GB:BP.L</url>
  <version>1.233</version>
  <accessed>2007-07-29 15:59:25</accessed>
  <last>605.00</last>
</PriceData>
Although there is actually only one piece of data, the last price, the Dapp gives lots of
other information within the XML document, such as the source of the data, when it
was accessed and the name of the Dapp that accessed it.
The structure of XML is strict in that every start tag must have a corresponding end tag
and each document must have a single root element. In this example it can be seen that
the <PriceData> tag is the root and all the other tags are nested within it. This
characteristic allows logical grouping of elements in hierarchies.
2.5.2 XPath
XPath is a query language that enables the inspection of XML files. The language is a W3
standard and works on a hierarchical basis, similar to a file system. An XPath expression
navigates through the document structure to a particular node or set of nodes,
depending on how far down the tree the path goes. This adds an interesting capability to
XML documents in that they can be treated as a very simple database, provided an
XPath interface is available.
In the above example we consider the following XPath expression:
//PriceData/last
The double slash at the start selects matching elements anywhere in the document; the
expression first navigates to the <PriceData> element and then to the <last> element,
which is a child of <PriceData>. The result would then be 605.00, the content of our
<last> element.
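As a sketch, the same lookup can be reproduced with Java's built-in XPath engine; the element names follow the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathDemo {

    // Evaluate an XPath expression against an XML string and
    // return the text content of the first matching node.
    public static String evaluate(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<PriceData><name>MSNPriceData</name>"
                   + "<last>605.00</last></PriceData>";
        System.out.println(evaluate(xml, "//PriceData/last")); // prints 605.00
    }
}
```

This is exactly the "simple database" behaviour described above: the document is queried rather than walked manually.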
2.5.3 Storing XML
Using XML on its own cannot provide a solution which will fully replace a relational
database; although in theory the data could be extracted and continually added to one
large XML file, the problems of organization, persistence, availability, security, and
efficient search and update still exist.
There is a need, therefore, to use an RDBMS to store XML data, and a number of
possible solutions are available. One solution involves storing each XML file directly as a
file within the database; however, this solution disregards the logical structure of the
XML files when performing queries on the resulting table.
Another solution would be to create further reference tables to store some of the more
important structural information about a document, which can then be queried. This
approach will not cope well with changes to document structure, since the underlying
tables will need to be updated to reflect such changes.
Therefore, to gain the full advantage of XML, the document would need to be
decomposed before insertion into the database and then recomposed when it is
extracted. XML schemas could also be used to ensure the structure is maintained.
Although the database can now provide the same level of logical information as the
original document, there are performance ramifications.
2.5.4 XML Databases
XML databases aim to give the best of both worlds. A native XML database allows the
storage of individual documents in collections, which can be queried and updated using
XPath and XUpdate, another standard for performing updates on XML. Collections are
more versatile than a traditional RDBMS in that they can store a set of generic XML
documents regardless of whether they share the same structure. Collections can also
be stored within collections to provide further levels of grouping and allow queries on
multiple sources.
Apache Xindice is a Java implementation of a native XML database according to the
XMLdb.org specifications. Xindice runs as a web application in a suitable container such
as Tomcat; the way in which the database is accessed and added to is up to the designer
of the application. Since Xindice is Java based there is a substantial API to support most
of its functions, although it is also possible to control it via a command line interface.
Because it is packaged as a web application, collections can be viewed via a web browser:
Figure 10 Xindice debug tool showing a collection of XML files
Xindice neatly answers our second requirement to store the retrieved data, since this is
already in XML format courtesy of Dapper. It also means we don't need to worry about
tailoring for changes in the incoming data's structure, and a handy interface is provided
to check up on the collections.
2.6. Transforming XML
The final requirement is to represent the retrieved data in a graphical format, again W3
and XML standards provide the answer. Two standards exist which can address the
problem: XSL and SVG.
2.6.1 XSL
XSL stands for Extensible Stylesheet Language. XSL is to XML what CSS is to HTML. W3
continues its mission to separate data from presentation with XSL Transformations, or
XSLT for short. XSL allows designers to dynamically transform XML data into other
formats such as HTML and SVG.
Using our example output file from before, we add an extra line to reference the style
sheet:
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<PriceData>
  <name>MSNPriceData</name>
  <url>http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?symbol=GB:BP.L</url>
  <version>1.233</version>
  <accessed>2007-07-29 15:59:25</accessed>
  <last>605.00</last>
</PriceData>
In this case we want to simply display this data in an HTML file along with some other
information; the resulting style sheet would look as follows:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head><title>Latest Price</title></head>
      <body>
        <p>Latest Stock Price : <xsl:value-of select="//PriceData/last"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
The style sheet uses our XPath expression to reference the content to display; the
resulting HTML file looks as follows:
Figure 11 Result of XSL transform in Firefox
The advantage of this is the separation of data from presentation; we could use the
same style sheet over and over again to display the price of different stocks.
XSL requires a parser to transform XML data. Most browsers support this as standard, so
XML can be styled on the client side to provide the desired result; in the case of Firefox,
Expat is used. It is also possible to style the data on the server side, using a third-party
parser such as Apache Xalan, before passing the resulting transformed document to the
client, in which case the client would simply receive the HTML representation.
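A server-side transform can be sketched with the standard JAXP transformation API, which Xalan implements; the inline style sheet below is a simplified stand-in for the "Latest Stock Price" example, using text output to keep the result easy to read.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ServerSideStyler {

    // Apply an XSL style sheet to an XML document on the server,
    // so the client only ever receives the transformed result.
    public static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<PriceData><last>605.00</last></PriceData>";
        String xsl = "<xsl:stylesheet version='1.0' "
                   + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                   + "<xsl:output method='text'/>"
                   + "<xsl:template match='/'>Latest Stock Price : "
                   + "<xsl:value-of select='//PriceData/last'/>"
                   + "</xsl:template></xsl:stylesheet>";
        System.out.println(transform(xml, xsl)); // Latest Stock Price : 605.00
    }
}
```

The same helper works unchanged whether the style sheet produces text, HTML or SVG.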
2.6.2 SVG
SVG stands for Scalable Vector Graphics26, another W3 standard, which extends XML to
a graphical format. SVG aims to address some of the current issues with web-based
images such as file size and varying screen resolutions. Vector graphics are images
generated from a series of vectors drawn between defined co-ordinates. The relevant
data required to draw the image is stored as XML mark-up using SVG tags. One
advantage of this format is the ability to scale the image without loss of quality or
pixelation. One drawback to this technology is the need for a plug-in to be installed
within the client browser; although Firefox and Opera support SVG as standard, IE still
requires the Adobe plug-in.
A further advantage is that SVG is part of the W3 recommendations, so it can be
coupled with XSL to generate graphics from XML, making it ideal for representing
numerical data graphically.
Again looking at our previous example, we use the same XML/XSL combination to draw
a simple box that represents the price of a security. The XSL to achieve this would be as
follows:
26 http://www.w3.org/TR/SVG11/
The resulting SVG output, albeit not very interesting, is shown below; the dimensions of
the box in this case are determined by the stock price divided by 10:
Figure 12 Simple box representation of a stock price
The XSL is slightly different in this case because we need to use our data as a value
within the SVG mark-up; to do this, an XSL variable can be used to temporarily store the
data so it can be used in the transform.
XML and XSL together form a neat set of standards which answer our data presentation
problem; once a suitable XSL template is created it can be reused wherever required.
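The variable technique can be sketched end-to-end as follows. The inline style sheet is illustrative rather than the project's actual template: it stores the price divided by 10 in an xsl:variable and uses it as the rect width via an attribute value template.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SvgBoxDemo {

    // Transform price data into an SVG box whose width is price / 10,
    // using an xsl:variable to hold the computed value.
    public static String toSvg(String xml) {
        String xsl = "<xsl:stylesheet version='1.0' "
                   + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                   + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
                   + "<xsl:template match='/'>"
                   + "<xsl:variable name='w' select='//PriceData/last div 10'/>"
                   + "<svg xmlns='http://www.w3.org/2000/svg'>"
                   + "<rect width='{$w}' height='20'/></svg>"
                   + "</xsl:template></xsl:stylesheet>";
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A last price of 605.00 gives a 60.5-unit-wide box.
        System.out.println(toSvg("<PriceData><last>605.00</last></PriceData>"));
    }
}
```

Swapping the rect width for opacity, colour or any other SVG attribute follows the same pattern.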
2.6.3 XForms
Xforms (short for XML forms) is one of the latest standards from W3, pitched in their
own words as the latest generation of Web Forms to replace the outdated HTML form27.
Xforms aim to make the task of creating web forms easier, with many of the standard
tasks involved in such an exercise incorporated into the specification; retrieving and
saving data from local files, validation of user inputs and dynamic content are just a few
examples. One of the main advantages of Xforms, however, is the ability to access and
update XML content and provide logical bindings between data, even in separate XML
files. Xforms also aim to provide a better user experience, with some AJAX-like
functionality built in.
Xforms are written in XML using Xform tags; they access content in other XML files
using the concept of bindings, along with XPath to navigate the documents. A further
benefit is the ability to make asynchronous submissions from the form without any
laborious Javascript.
27 http://www.w3.org/TR/xforms/
The following example illustrates a simple Xform. The XML file from the previous
example is used to write an Xform that gives a user access to the data:
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>My First Xform</title>
    <xf:model>
      <xf:instance src="data.xml"/>
      <xf:submission id="save" method="put" action="data.xml"/>
    </xf:model>
  </head>
  <body>
    <h1>My XML Data</h1>
    <xf:input ref="/PriceData/last">
      <xf:label>Data:</xf:label>
    </xf:input>
    <xf:submit submission="save">
      <xf:label>Save</xf:label>
    </xf:submit>
  </body>
</html>
The above form will appear as follows in an Xform-enabled browser:
Figure 13 Our price data from before appears but can now be edited.
With Xforms the designer defines a model representation of the data; this can be
programmed directly into the form or referenced from an external file, as in the above
example. Here the submission element tells the forms processor to save the file to the
local file system as data.xml.
A further advantage of Xforms is that it can access any other XML-standard mark-up,
such as XSL. Coupling these two standards together enables the form not only to access
XML data but also to manipulate XSL files and therefore change the resulting SVG
output.
3. Specification
Given that our initial requirements have been identified and the relevant technologies
researched, the problem can be assessed in more detail; to generate further
requirements, the problem statement can be updated and the stakeholder analysis
revisited.
3.1. Problem Statement
The initial problem statement was the retrieval and representation of data from the
Internet. We now have to consider how this will be solved using the technologies
identified. Specifically the inclusion of Dapper adds additional functions and requirements
from the point of view of administering and running the retrieval.
3.2. Stakeholder Analysis
The initial stakeholder analysis identifies the primary, secondary and tertiary
stakeholders:
Primary Stakeholders: Administrators
User Profile: The administrators could also be private investors. At present the
assumption is made that some external management is required for the site
whether this is by the investor using the site or a third party.
Role: Ensure that errors caused by external factors, such as server downtime or changes
to site structure, are dealt with; input will be required to respond to such problems
and update Dapps as necessary.
Goals: Browse web sources for relevant information. Identify information which
is of interest. Manage and update Dapps. Build up and maintain collections of
data sources. Schedule tasks for Dapps to perform. Manage collected data.
Secondary Stakeholders: Investors and End Users
User Profile: Investors and front-end users who will access the data retrieved via
the graphical interface.
Role: The data that is retrieved, and the format that it is eventually stored in, will
affect the people who use that data. Investors want information as soon as it is
available, and spending time searching for this information is costly, both in terms
of investors' time and in their ability to make informed decisions.
Goals: Gain an overview of all relevant information relating to current or future
potential investments. Select the output format for the data. Filter and Search
data. Extend administrator goals.
Tertiary Stakeholders: Content Owners
User Profile: Web masters and web content owners
Role: Maintaining web pages and content
Goals: Attract users to their sites and in some cases generate revenue through
advertising or subscription
3.3. User Goals
High-level goals identified from the problem statement and stakeholder analysis are
used to define the top-level use cases.
Manage and update Dapp's.
Build up and maintain collections of data sources.
Manage collected data.
Select different views of the data.
Filter and Search data.
3.4. Use Cases
Figure 14 Use Case Diagram
4. System Design
4.1. Methodology
The design methodology used is a top-down, modular approach to development. Starting
with high-level use cases, the interfaces and main functionality are determined; from
here, functional requirements are elicited as separate modules based on their intended
tasks. The previous sections have outlined the various specifications available to answer
our three top-level requirements: data retrieval, storage and transformation. These
standards follow a strict Model View Controller paradigm; as such, it makes sense to
extend this to the whole application.
4.2. Required Technologies
Before development, some base technologies are required to support the system.
Xindice runs as a web application on a suitable container; in this case Apache Tomcat is
chosen. Once Xindice has been downloaded and unpacked it is deployed to Tomcat and
tested using the appropriate URL, in this case: http://localhost:8282/xindice/?/db.
The top-level collection in Xindice is called db; the question mark indicates the debug
page, which is automatically loaded when Xindice is accessed using the base URL. This
is the only user interface provided as standard for Xindice. XML files can be viewed via
this tool but not added or manipulated.
Xforms and SVG cannot be viewed in all browsers by default. Extensions are required
for most to support these standards, and the level of functionality supported differs
between implementations. As such, Firefox is chosen, since it provides good support for
SVG and the Mozilla Xforms extension implements most of the Xforms 1.0 functionality
despite still being in a development stage.
To aid development, the Eclipse IDE is used, since it supports most of the standards
used, with the exception of Xforms and SVG. Firefox provides an error console which is
useful for debugging XML content; as such, it can provide useful feedback on SVG, XML,
XSL and Xform errors.
4.3. Decisions
Although much of the necessary functionality can be implemented on the client side
using browser extensions, a back end is still required to interface with Dapper and
Xindice.
Java servlets were chosen to address this requirement, partly due to the Java API
support for both Xindice and Dapper, but also because Java has plenty of XML and DOM
APIs to allow XML data handling. Xforms can post data as XML files direct to the server;
to handle these files, the server-side application must therefore be able to access and
manipulate XML.
4.4. Modules
To simplify the design process the application is split into smaller modules each
addressing a specific function. Keeping with the grouping used so far the main functions
are data retrieval, storage and presentation.
4.4.1 Dapper Module
Much of the data retrieval requirement has been addressed by Dapper, however the
Dapper.net provides a means to create the Dapps but not control their execution. The
Dapps themselves only have the ability to execute for one URL at a time, our
requirement is to extract data from different sources but also for multiple pages in the
same resource, this involves specifying parameters directly within the URL.
Our first requirement is therefore a means to execute a Dapp and specify a URL for it to
work on. From our use cases we also have the requirement to maintain collections of
resources for the Dapp to retrieve from. Finally, there are two requirements: to select
the Dapp to be used and to specify the storage location for the data.
The Dapper API, unfortunately, is very basic and looks incomplete. As such, the interface
options for Java are limited, so much of the above functionality needed to be
implemented from scratch.
Following our Model View Controller ideal, the functionality is divided. First,
implementing the data aspect of our module, an XML file is created to store the model
view of our Dapp. Unfortunately the Dapper API does not provide an obvious means to
elicit certain parameters from the site. The model will therefore be a means to represent
each Dapp. From the initial requirements the following information needs to be stored:
Dapp Name
Storage Location
Collection of resource locations (URLs)
Using XML as a storage medium in this way makes sense because XML support is
already required for the other aspects of the design, so extra effort is saved on
implementing another means to store the configuration data.
The view aspect will be taken care of by Xforms, again to take advantage of the XML
standards and functionality on the client side. With Xforms, users can manipulate the
XML files to address our requirements to add and update lists of resources.
The above will provide a nice interface for some XML but won't actually do anything, so
the controller aspect is required. Since the storage medium for the retrieved data is
Xindice, which needs to run on Tomcat, a servlet container, it is logical to use Java
servlets for our back-end functionality.
4.4.2 Database Connection Module
To access Xindice, methods are required first of all to connect to the collection and then
to perform queries on the data; again, a servlet module will be used for the controller
aspect. The data is stored in collections within the database. The top-level collection,
db, contains database-specific files such as meta-information and should not be used to
store content; as such, collections need to be created. Our first set of requirements is
therefore to provide the ability for users to create collections within Xindice.
Once data is retrieved by Dapper it will need to be inserted into a specific collection;
although the data retrieval task is handled by the Dapper module, the retrieved data will
be passed to the database connection for insertion. Xindice allows the programmer to
specify a unique id for the document being inserted into the database; however, since
we will be querying the XML content directly using XPath, having a suitable system for
identifying documents by their id is not required. In addition, Xindice has a mechanism
in place to automatically assign unique ids to files as they are added, which saves some
development work.
Finally, a requirement exists to query the collections. The Xindice API provides a query
engine which accepts an XPath string as input; the issue will therefore be to provide a
suitable interface to the user that can be translated to an XPath query whilst being user
friendly.
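One way to bridge that gap can be sketched with a hypothetical helper (not the project's actual code, and the element and field names are illustrative only) that assembles the XPath string from friendly form fields:

```java
public class QueryBuilder {

    // Translate user-friendly inputs into an XPath predicate query.
    // The record and field names passed in are illustrative only.
    public static String toXPath(String record, String field, String value) {
        return "//" + record + "[" + field + "='" + value + "']";
    }

    public static void main(String[] args) {
        // e.g. "find the PriceData records whose name is MSNPriceData"
        System.out.println(toXPath("PriceData", "name", "MSNPriceData"));
        // prints //PriceData[name='MSNPriceData']
    }
}
```

The user never sees XPath syntax; they fill in a field name and value, and the servlet passes the generated string to the query engine.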
To provide the XSL functions, a server-side function is proposed to work with the other
servlets. A third-party parser such as Apache Xalan is required to achieve this. Although
the browser can take care of processing XSL, some extension functions may be required
to provide more robust support for numeric processing; many extension libraries exist
which can be used for this purpose.
From these initial design considerations a conceptual architecture was drawn up showing
the relationship between the various components.
4.5. Proposed Architecture
Figure 15: Proposed System Architecture
5. Implementation
The implementation approach was again top down, first of all creating the user interfaces
to address user requirements then developing the Java code to accommodate the
intended functions. The final implementation differed slightly from the original design
concept as problems and improvements were discovered in through the implementation
phase of the project.
5.1. Issues and Design Changes
In the initial concept a couple of changes were made. Firstly, it was initially envisioned
that the final system would behave much like a real-world web application, with login
details and user-specific preferences. It was decided, however, that this kind of
functionality did not add any major benefit to the project, nor did it help achieve the
initial goals.
The second change was to the retrieval aspect. It can be seen from the conceptual
architecture that the intention was to allow XSL transformations to be made on the data
before it is inserted into the database; unnecessary data could be removed and
additional information added to improve document retrieval. It was later decided to drop
this function since the benefits would be minimal.
5.2. Module implementation
As with the overall architecture some changes were made during the development
process to accommodate new information as it became available. The implementation of
each module is discussed in detail.
5.2.1 Utilities Package
The utilities package was added to provide some basic functions to each of the other
modules rather than repeating code. Two core functions that both the database query
and Dapp manager classes would require were the ability to access Xindice and to
manipulate XML documents using DOM4J.
The Database Connector class provides basic database functions such as connection,
collection discovery, and insertion and retrieval of documents. Queries are also executed
via the Database Connector by passing an XPath string expression to the
executeQuery() method. Although no DOM standard implementation is favoured by any
of the APIs, DOM4J was chosen because of the range of available functions. Within all
the modules, XML files are manipulated or passed as DOM4J implementations of the
Document interface.
There were few issues with the database connector because much of the functionality is
available via the Xindice API and little additional functionality had to be coded.
The XML helper class was implemented to carry out the XML document processing which
became a common requirement between classes. The class handles saving, reading and
converting XML between formats.
During the development process it became clear that the generic typing of the retrieved
data by Dapper was going to cause problems with XSL. Some of the SVG transforms
required numerical data without any formatting information included; for example, 1000
is represented as 1,000. To ensure the data retrieved is suitable for use with XSL,
regular expressions and additional data validation had to be added, and as such the
retrieval process became more complicated. The solution was to add a user-defined
content type field to the admin page so that users could specify what kind of data they
were expecting to retrieve. The selection is used to apply regular expressions to the
input strings to ensure the data will work with XSL.
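A minimal sketch of this typing step is shown below; the regular expressions are illustrative stand-ins for the project's user-selected, per-content-type list.

```java
public class ContentTyper {

    // Strip thousands separators, currency symbols and trailing
    // units so the value survives numeric processing in XSL.
    public static String toNumeric(String raw) {
        return raw.replaceAll("[^0-9.]", "");
    }

    // Validate that the cleaned string really is a number.
    public static boolean isNumeric(String value) {
        return value.matches("\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        System.out.println(toNumeric("1,000"));              // prints 1000
        System.out.println(isNumeric(toNumeric("605.00p"))); // prints true
    }
}
```

Values that fail the validation check can then be rejected before insertion rather than breaking a transform later.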
Another feature of Xindice is that it can be used to update the XML files contained within
a collection using XUpdate; it would have been more elegant to store the relevant
configuration files in a separate user collection within Xindice. The issue with doing this
is that frequent database queries would need to be made because of the dependency
between Xforms and the XML files. As such, it was decided to keep the files static on the
server and manipulate the documents using the XMLHelper class; methods were
therefore added to add and remove node sets from the document.
5.2.2 Dapp Manager Package
Despite the functionality provided by Dapper, the execution and management of the
Dapps became more involved than expected. A DappImplementation class was required
to store and manage the data sent via Xforms, and an additional URLlist class was
implemented to manage the list of variables being used for retrieval. Finally, a servlet is
used to access the objects.
The XML data submitted by the Xform is used to instantiate the DappImplementation
and URLlist objects. A base URL and a list of variables are specified by the user and
stored in the Dapp configuration XML file. Once submitted, the URLlist class is
responsible for generating URLs and keeping track of the current progress. The URL is
created by replacing a predefined marker in the base URL with a variable, as follows:
http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:{var}
becomes
http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:TSCO
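The marker substitution can be sketched as follows; the class and method names are simplified relative to the actual URLlist implementation.

```java
import java.util.Arrays;
import java.util.List;

public class UrlGenerator {

    private final String baseUrl;        // contains the {var} marker
    private final List<String> variables;
    private int position = 0;            // current progress through the list

    public UrlGenerator(String baseUrl, List<String> variables) {
        this.baseUrl = baseUrl;
        this.variables = variables;
    }

    public boolean hasNext() {
        return position < variables.size();
    }

    // Replace the predefined marker with the next variable.
    public String next() {
        return baseUrl.replace("{var}", variables.get(position++));
    }

    public static void main(String[] args) {
        UrlGenerator urls = new UrlGenerator(
            "http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:{var}",
            Arrays.asList("TSCO", "BP.L"));
        while (urls.hasNext()) {
            System.out.println(urls.next());
        }
    }
}
```

Keeping the position counter inside the class is what allows it to double as a progress tracker for long variable lists.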
The URL can now be passed to the Dapp for retrieval. Once the Dapp has executed, the
resulting XML file is validated to see if the retrieval was successful and if the data is valid
against our regular expression list. A copy of the output data elements is queried during
validation and added to the configuration file; this was not part of the original
functionality but was later added to support the use of regular expressions for typing
data. The added benefit of this is that it can be used as a list of query parameters for
the Xforms interface.
It was the intention to use the URLlist class as a progress reporter for the administration
interface. At present a particularly long list of variables will take a while to execute, and
it would be desirable to provide feedback to the user on its progress; this could be
achieved by a servlet and Javascript that periodically query the list. However, this
feature was omitted due to time constraints.
5.2.3 DBQuery Package
The Database Query package contains the classes required for not only the database
queries but also the XML transforms. The two were packaged together because they are
both used from the same Xform interface.
The function of sorting and querying the data could be achieved in a number of ways; it
was decided that the styling function should be kept separate from the database query
function. One solution would have been to send the result of the database query direct
to the client with a reference to the relevant style sheet and allow the user's browser to
perform the transform. This solution, however, means that every time a user makes a
change to the way in which a document is styled, they need to submit another query to
the database in addition to parsing and styling the output again. The fact that the
client's browser only sees the result of the transform also prevents inspection of the
output by the interface, which can provide some of the context information. The decision
was therefore made to keep the data retrieval and styling tasks separate.
When a query is submitted, the DBQuery servlet builds an XPath expression from the
user input and queries a specified collection. The resulting XML output is not sent to the
client but stored on the server. Once complete, a submit is triggered automatically to
the Data Styler servlet, which then transforms the output file into an SVG document and
again stores the result on the server. The interface reads the SVG directly from this file;
the advantage of this set-up is that changes to the style sheet can be performed on the
database output without submitting another query to the database. The added
advantage is that the database output is now directly accessible to the Xform, which
provides additional functionality such as listing available query parameters.
5.2.4 Style sheet & Icon Design
The style sheet design proved to be the most difficult part of the implementation. The
task was to provide a set of predefined graphical representations that users could select
and manipulate via the main interface. As we have seen, XSL provides a mechanism to
use XML data to transform graphics; this can be any SVG parameter: dimension, colour,
opacity, shape etc.
The concept of an icon is used to represent an individual data entity; in this example we
are looking at individual shares as extracted from Dapper. The icon provides a graphical
representation of one or more pieces of data. The number of parameters differs between
icons, the simpler ones displaying only one piece of information as a change in one of
the graphical aspects of the design.
The problem with this concept is providing a context for comparison. Because the data
can be over an infinite range, we need some base to compare each item to. To get round
this problem, each variable is presented as a percentage of the group's maximum. For
example, if we want to display the last price, the style sheet first needs to know what
the maximum price is in the data set.
This is achieved through the use of the EXSLT math extensions, a set of functions which can be
used in addition to XSL. In this case the math:max function is used to determine the
maximum value of an element in a node set. Once this value has been calculated, the
individual elements can be compared against it to work out where they fall on the
scale. Since each value can now be expressed as a percentage, the style sheet can
calculate a corresponding percentage of a graphical value. In figure 16 the opacity of
each box represents each element's last price in relation to the maximum price of the
data set being viewed.
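The percentage-of-maximum calculation described above can be sketched in Python (the real calculation lives in the XSL style sheet via math:max; the function name here is purely illustrative):

```python
def opacity_for(prices):
    """Express each last price as a fraction of the data set's
    maximum, mirroring what math:max enables in the style sheet."""
    maximum = max(prices)                 # the group maximum, as math:max returns
    return [p / maximum for p in prices]  # each element's place on the 0..1 scale

# The highest-priced share becomes fully opaque; the rest fade proportionally.
opacities = opacity_for([50.0, 100.0, 25.0])
```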
Figure 16: Simple box icon designs, similar to the heat-map concept.
This example is fairly simple and essentially the same as a heat-map; to provide more
interesting graphics, more complex icons needed to be designed using the same
principle.
To allow representation using different icons without heavy server-side processing, the
functionality of XForms is taken advantage of again. As we have seen, XForms can access
any XML-based content and as such can access and modify XSL. The basic parameters of
each icon are stored in the style sheet as global parameters. By implementing a simple
input control with a reference to these parameters, the shapes can be manipulated.
In figure 16 three global parameters are available: zoom, resolution and text size. The
zoom function simply scales the graphics up by increasing the relevant dimensions. Here SVG
proves its usefulness, as the graphics remain crisp no matter how far a user zooms in.
The text size parameter is self-explanatory, although the same function can be achieved
via most browsers.
Finally, the resolution parameter is added as an exaggeration function. In figure 17 a
slightly more complex graphic is illustrated. This time the icon displays the difference
between two parameters as a sloping line. On initial testing of this model it became clear
that for some data sets the differences in slope were negligible, making
distinction between icons difficult. To address this issue an exaggeration parameter was
added, allowing the user to multiply the slope by a chosen factor to make small differences
more visible.
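A minimal sketch of the exaggeration idea, assuming the drawn slope is simply the difference between the two variables multiplied by the user's factor (names are illustrative, not taken from the actual style sheet):

```python
def slope(open_price, last_price, exaggeration=1.0):
    """Return the drawn slope: the raw difference between the two
    variables, multiplied by the user's exaggeration factor."""
    return (last_price - open_price) * exaggeration

raw = slope(100.0, 100.4)            # barely visible: about 0.4
amplified = slope(100.0, 100.4, 10)  # same trend, ten times steeper
```

The sign of the result still encodes direction; only the visual magnitude is distorted.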
Figure 17: Rate-of-change icon showing the difference between two variables.
The icons themselves are based on separate templates within the style sheet. The
selection of the icon to be used is achieved via the XForm interface and a binding to the
template reference, allowing the user to select any template from the list.
To improve usability, the style sheet interface needed to be dynamic, in the sense that the
number of user-specified variables differs between templates
and, as a consequence, those variables have different meanings in the context of the current
template. A separate style sheet configuration file is provided to the XForm and bound to
the style sheet; the result is that the XForm knows which controls to display and when.
Looking at figures 16 and 17, it can be seen that the number of inputs available and the
labelling of those inputs differ between icons.
The positioning of the icons on the screen was another problem, which took considerable
time to resolve. The dynamic nature and scalability of the icons meant hard-coding the
positions on the page was not an option; each position needed to be calculated
based on a starting point, the size of the icons and the screen width.
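One way to perform that calculation can be sketched as follows; this is an illustrative reconstruction in Python, not the actual style sheet logic, and the default values are made up:

```python
def icon_position(index, start_x=10, start_y=10, icon_size=40, zoom=1.0, screen_width=400):
    """Compute the top-left corner of the icon at the given index.
    Icons flow left to right and wrap onto a new row when the next
    icon would overrun the screen width."""
    size = icon_size * zoom                               # zoom scales every dimension
    per_row = max(1, int((screen_width - start_x) // size))
    row, col = divmod(index, per_row)
    return (start_x + col * size, start_y + row * size)

# Lay out a dozen icons: 9 fit per row at the default size, then wrap.
positions = [icon_position(i) for i in range(12)]
```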
5.3. Final System Architecture
Figure 18: Final system architecture showing changes.
5.4. Interfaces
The final interfaces were partly governed by the functional requirements but also
constrained by the capabilities of XForms. As mentioned earlier, the Administration XForm
had a few additions to support the specification of basic type information for the
retrieved content. XForms can be styled using CSS in much the same way as HTML, although the
process is not quite as straightforward and again relies on browser support. As such,
only basic styling was used, mainly for positioning elements on the user interface.
Another of XForms' advantages is the ability to dynamically display content and controls
without making requests to a server. This function is used on the admin interface to
provide page-style navigation through the various Dapps the user has created; the left
and right arrow icons navigate between the Dapps, updating the relevant fields. This
ability is also demonstrated by the add and remove controls, which allow the user to add
new Dapps or variables and likewise remove them. The changes made by the user still
need to be saved. If the dapp.xml file were stored on the local file system this would be
easy through the use of XForms' built-in put submission; since we are running from a
server we need to submit the XML and use a servlet to save the changes. Although this
is not a perfect solution, XForms makes the submission asynchronously so the user is not
affected too much.
Figure 19: Admin interface showing Dapp configuration data.
There are some outstanding issues with the XPath navigation. The original intention was
that the XForm should be able to identify or generate the required XPath to a
specific element by inspecting the database output and the Dapp configuration file. However, the
XForm cannot handle groupings of data in this way, and some additional path information
needs to be added by the user. For example, we ideally want the user to be able to enter
any parameter that is available as either a search parameter or a styling parameter. This
information is taken from the Dapp and output XML documents stored on the server.
These elements only store the lowest-level element names and as such cannot be passed
as a useful XPath parameter, since we require the full path; in this case we need to first
access the parent element of Fundamentals. This is perhaps an oversight in the
design, but it can be rectified by adding more contextual
information to the dapp.xml file.
Figure 20: A slightly different presentation approach, where the width of the ring signifies a value.
It can be seen in the above illustration that two sets of submission controls are provided
to the user: one for the database query and one for the Data Styler. The SVG result is
loaded automatically from the server into a separate iframe; when changes are made to
the style sheet, the XForm waits until the submission is complete and then refreshes the frame
to update the graphic.
6. Testing
In order to test how effectively the design fulfils the requirements, the testing is
divided into three categories. Component testing and usability tests determine how well the
functional requirements are met; to test the initial hypothesis that a graphical interface
will be of advantage to an investor, speed and accuracy tests are carried out.
6.1. Component Testing
At the software level, unit tests were carried out on each component to ensure it
achieves the desired functionality. Each functional requirement is tested in turn to ensure
the final design satisfies the original specification.
6.2. Usability testing
After sufficient testing of the base components was completed, the user interface had to
be tested to determine how effective the design is in terms of usability and also to
determine whether the solution provides proof of concept.
To test the usability of the system an observational approach was taken, based on
Nielsen's five quality attributes [28]:
Learnability: How easy is it for users to accomplish basic tasks the first time they
encounter the design?
Test subjects were given no background on the program and were asked to try to
interact with it. They were also asked to describe what they were thinking and any
assumptions they had about the interface. The observer did not respond to any direct
questions at this point, in order to gauge how effective the interface was at
communicating functionality.
Efficiency: Once users have learned the design, how quickly can they perform
tasks?
After the initial tests, users were given the opportunity to ask questions to gain a better
understanding of the interface; they were then asked to repeat specific tasks in order to
assess how easy it was to perform particular functions.
Memorability: When users return to the design after a period of not using it, how
easily can they re-establish proficiency?
Test subjects were at this point asked to return to the program after a period of time in
order to assess how easy it was to remember the affordances of the interface.
Errors: How many errors do users make, how severe are these errors, and how
easily can they recover from them?
An observational approach was again taken to note any mistakes the user made and
their impact on the system.
Satisfaction: How pleasant is it to use the design?
Finally test subjects were asked on a scale of 1 to 10 how pleasant they felt the interface
was to use.
6.3. Speed & Accuracy Test
To test how well the system answers the initial problem, an assessment is made of how
well users can gain insight into the data represented by the system. Two factors were
investigated: speed and accuracy.
Experimental Set-up
A set of 100 shares representing the FTSE 100 was selected, the data set being a
snapshot of the market on a specific date. For each date, test subjects were asked
to identify a value in the set, first on the graphical interface and then on a plain text
representation of the same data.
The ordering of the symbols was changed between tests to ensure subjects did not
memorize the positioning of a particular stock. Subjects were timed to see how long it
took to identify a particular value and then assessed on how accurate they were.
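The re-ordering step between trials can be sketched as follows. This Python illustration is an assumption about how such shuffling could be done, not a description of the actual test procedure:

```python
import random

def reorder(symbols, seed):
    """Return the symbols in a fresh, reproducible order for the next
    trial, so subjects cannot rely on a stock's previous position."""
    rng = random.Random(seed)   # seeded so each trial's ordering can be reproduced
    shuffled = list(symbols)
    rng.shuffle(shuffled)
    return shuffled

trial_1 = reorder(["VOD", "BP", "HSBA", "GSK"], seed=1)
trial_2 = reorder(["VOD", "BP", "HSBA", "GSK"], seed=2)
```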
For the first two tests the relative size graphical icon was used. This representation
changes the size of the icon relative to a specified value.
Figure 21: Relative-size box graphic.
[28] http://www.useit.com/
In the first test users were asked to identify which icon they thought represented the
highest and lowest value in a collection.
For the second set of data, test subjects were asked to identify trends based on the daily
price movement. For this test the two-variable box representation was used.
Figure 22: Two-variable box graphic showing the relative difference.
Task: correctly identify the steepest upward and downward trends in a collection.
Figure 22 shows the basic two-variable icon; the slope of the line indicates the
difference between the specified variables.
To test the effectiveness of this design, users were asked to look at the graphic and
identify which stock they thought was falling the fastest and which they thought
was rising the fastest.
Decision times were recorded in all cases for comparison with the timings gained using a
text-only representation.
Figure 23: Two variables relative to a third.
Task: correctly identify the value whose indicator is nearest its highest and its lowest extreme.
Figure 23 shows one of the more complex icon designs. Similar in concept to the
candlestick diagram, it aims to show the direction and rate of change of the daily
price in relation to its year-to-date high.
As with the previous icon, the slope of the line indicates the rate of change and the colour
reinforces its direction. The position of the line in relation to its container box signifies
how close the current price is to the highest value it has reached over the past year.
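The placement of the line inside its container can be sketched as the current price expressed as a fraction of the year-to-date high (illustrative names; the actual calculation lives in the XSL style sheet):

```python
def line_position(current_price, year_high, box_height=100):
    """Vertical placement of the slope line inside its container box:
    the closer the current price is to the year-to-date high, the
    nearer the line sits to the top of the box."""
    fraction = current_price / year_high   # 1.0 means the price is at its year high
    return box_height * (1.0 - fraction)   # 0 = top of box, box_height = bottom

at_high = line_position(250.0, 250.0)   # sits at the very top of the box
half_way = line_position(125.0, 250.0)  # sits half-way down the box
```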
To test its effectiveness at communicating this information, users were again timed and
asked to identify the stock they thought was closest to its year-to-date high and the one
furthest away.
Figure 24 shows the text-only interface, which was implemented as a style sheet
template in order to keep the surrounding interface the same and change as few test
variables as possible. The above tasks were all repeated on this interface, again changing
the sorting order of the data to stop test subjects memorizing data locations.
Figure 24: Text-only representation of a variable.
6.4. Results
Usability Test.
Learnability: after observing a set of 5 subjects it became clear that more
contextual information was required for the controls. One test subject commented
that it was not immediately obvious what function some of the controls performed.
Another issue was the openness of some of the controls; for example, the zoom
control can be set to any value the user wants, and it is not immediately obvious
how large that will make the icons.
Efficiency: On an initial attempt with no instruction, some users had difficulty
working out what the controls did; however, after a quick demonstration most
could manipulate the data confidently.
Memorability: After a day the users were asked to return to the interface and try
out some basic tasks to see how easy they were to repeat. Most users achieved this
task successfully; the main difficulty seemed to lie in the initial usage of the
interface.
Errors: The most common errors the users made were either to compare
parameters not suited to any logical comparison or to select
scales that caused excessive distortion of the graphics. The first issue is
hard to rectify: since the user can define any data source, an assumption is made
that they will pick resources suitable for comparison. The second issue can be
rectified by adding stricter limits to the interface.
Satisfaction: The overall satisfaction rating was 6 out of 10 from our 5 test
subjects. There is evidently room for improvement in the interface; however, some
of the test subjects had no prior knowledge of stock market trading, and as such
the overall purpose and context of the application was new to them.
Speed-Accuracy Test.
The results of the speed and accuracy test were more promising with 80% of the test
cases the users speed of decision-making was faster than using a test only interface.
The accuracy figures were however less conclusive, with both textural and graphical
accuracy rates of 60%. We would expect the accuracy rates to be around or lower for
the graphical representations because they are not as definite as numerical figures.
The test group could have been larger, and more testing in this area is needed to
draw definite conclusions on the effectiveness of the interface; however, the initial
results tend to support the hypothesis that a graphical system is better in terms of gaining
quick insights into large sets of data.
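The 80% figure corresponds to the fraction of trials in which the graphical interface was faster; the calculation can be sketched as follows, with made-up timing values purely for illustration:

```python
def faster_fraction(graphic_times, text_times):
    """Fraction of trials in which the graphical interface produced a
    faster decision than the text-only interface."""
    faster = sum(1 for g, t in zip(graphic_times, text_times) if g < t)
    return faster / len(graphic_times)

# Hypothetical per-trial timings in seconds -- not the real test data.
graphic = [3.1, 4.0, 2.5, 6.2, 3.8]
text = [5.0, 3.5, 4.1, 7.0, 6.0]
share_faster = faster_fraction(graphic, text)  # 4 of 5 trials faster
```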
7. Conclusion
The application provides a basic answer to the initial requirements, albeit a simplified one,
but could easily be extended to give a wider range of functions. In its current state it
demonstrates that, by using the available web standards, a flexible system can be
developed which allows data to be retrieved, transformed and represented on the web.
Our test results indicate that the system can effectively impart large amounts of
information quickly to the viewer; however, further work is required to improve the user
interface, mainly in the area of contextual information.
To expand the system, a user can easily add any content they like, provided Dapper.net
can extract it successfully. There are limitations to the data that can be viewed and the
graphical icons that are displayed. Going forward, it would be beneficial to provide
another interface which allows users to create icons based on the retrieved data,
giving personalized graphical representations.
8. Bibliography
Cleveland, William S.: Visualizing Data. Murray Hill, N.J.: AT&T Bell Laboratories; Summit, N.J.: Hobart Press, 1993.
Ellinger, A. G.: The Art of Investment, 3rd rev. ed. Bowes and Bowes, 1971.
Harris, Robert L.: Information Graphics: A Comprehensive Illustrated Reference.
New York: Oxford University Pr