This Report is submitted to the University of Strathclyde in partial fulfilment of the
Regulations for the Degree of MSc in Information Technology Systems
Decision system for stock market investors
Michael Witty
200590826
Supervised by: Dr Ian Ruthven.
Department of: Computer & Information Sciences.
September 2007
Except where otherwise expressly indicated the work reported in this document is my
own. It has been performed during, and has not been submitted for assessment in
connection with any other award whatsoever.
Signed Date
Abstract
The aim of this research is to analyse how investors collect and use typical market
indicators, and to investigate the ways in which current technology can be used to enable
more informed, time-critical decisions.
It has been the Holy Grail of mathematicians, economists, investment banks and
programmers alike to create systems and techniques that accurately predict stock
market movements in order to ensure financial gain and eliminate risk. The findings of
such research, however, are inconclusive as to whether the stock market can be
predicted accurately enough to make significant profits.
The efficient market hypothesis concludes that the market already reflects the value of an
investment, since all relevant information is already in the public domain. This
conclusion is increasingly being challenged as more complex computer systems are
directed towards the field and the vast repositories of data available on the Internet
grow.
This research focuses not on predicting the market but on providing better tools to help
investors make decisions based on current market conditions. Recent standards and
technologies such as XML and Web 2.0 have provided solutions to some of the common
problems of data retrieval, representation, organisation and manipulation. This research
looks at the use of such technologies.
Acknowledgements
I would like to thank Dr Ian Ruthven for his help and guidance during this project.
Table of Contents
1. Introduction ................................................................................................... 5
1.1. Problem Statement ................................................................................ 5
1.2. Overview .............................................................................................. 5
1.3. Scope .................................................................................................. 5
2. Literature Review .......................................................................................... 6
2.1. The Stock Market .................................................................................. 6
2.1.1 Fundamentals ................................................................................... 6
2.1.2 Predictability ..................................................................................... 7
2.1.3 The Efficient Market Hypothesis ........................................................... 7
2.1.4 The Rise in On-line Trading ................................................................. 8
2.2. Information Visualisation ........................................................................ 8
2.2.1 Existing Graphical Representations ...................................................... 9
2.2.2 Issues ............................................................................................ 12
2.2.3 Cognitive Maps ................................................................................ 13
2.3. Functional View of Market Trading ......................................................... 13
2.3.1 Investor Goals................................................................................. 14
2.3.2 MSN Research Wizard ...................................................................... 14
2.3.3 Functional Requirements .................................................................. 15
2.4. Data Retrieval ..................................................................................... 17
2.4.1 Web Content Mining ......................................................................... 17
2.4.2 Extraction Techniques ...................................................................... 17
2.4.3 Examples ....................................................................................... 18
2.4.4 Dapper.Net ..................................................................................... 21
2.5. Data Storage ...................................................................................... 22
2.5.1 XML ............................................................................................... 22
2.5.2 XPath ............................................................................................. 23
2.5.3 Storing XML .................................................................................... 24
2.5.4 XML Databases ................................................................................ 24
2.6. Transforming XML ............................................................................... 25
2.6.1 XSL................................................................................................ 25
2.6.2 SVG ............................................................................................... 27
2.6.3 XForms .......................................................................................... 28
3. Specification ................................................................................................ 30
3.1. Problem Statement .............................................................................. 30
3.2. Stakeholder Analysis ............................................................................ 30
3.3. User Goals .......................................................................................... 31
3.4. Use Cases........................................................................................... 32
4. System Design ............................................................................................. 32
4.1. Methodology ....................................................................................... 32
4.2. Required Technologies ......................................................................... 32
4.3. Decisions ............................................................................................ 33
4.4. Modules ............................................................................................. 33
4.4.1 Dapper Module ................................................................................ 34
4.4.2 Database Connection Module ............................................................ 35
4.5. Proposed Architecture .......................................................................... 36
5. Implementation ............................................................................................ 36
5.1. Issues and Design Changes .................................................................. 36
5.2. Module implementation ........................................................................ 37
5.2.1 Utilities Package .............................................................................. 37
5.2.2 Dapp Manager Package .................................................................... 38
5.2.3 DBQuery Package ............................................................................ 40
5.2.4 Style sheet & Icon Design ................................................................. 41
5.3. Final System Architecture ..................................................................... 44
5.4. Interfaces ........................................................................................... 44
6. Testing ........................................................................................................ 46
6.1. Component Testing .............................................................................. 46
6.2. Usability testing .................................................................................. 47
6.3. Speed & Accuracy Test ......................................................................... 48
6.4. Results ............................................................................................... 51
7. Conclusion ................................................................................................... 52
8. Bibliography................................................................................................. 53
1. Introduction
1.1. Problem Statement
For investors, making financial gains requires quick, informed decision-making based
on many different sources of time-sensitive data. News articles, company profiles,
financial indicators and general economic conditions are constantly changing, and all can
affect the performance and financial well-being of a company and therefore its stock
price.
The process of gathering and analysing this information is time consuming. For investors,
the cost of their time, together with the charges incurred through buying and selling
shares, immediately reduces potential gains.
A further consideration is the time-dependent nature of trading: investors must not only
collect and analyse the relevant data, they must also make a decision based on it while
it is still relevant. Although many sites and portals exist specifically to provide a single
source for investors, these sites themselves become vast and are still predominantly
text based.
The task for investors is therefore not only to discover the relevant information but also
to derive some meaning from it.
1.2. Overview
This paper aims to research and develop a graphical display method that presents a holistic
view of a market index, specifically the FTSE 100. The intention is that investors can
narrow their research efforts by filtering which companies are worth investigating further,
and ultimately be helped in deciding whether to buy, sell or hold a particular stock. A brief
overview is given of the stock market, together with a functional analysis of the trading
process. An analysis of existing systems and technologies is then used to develop a
graphical tool with which traders can gain a quick insight into the market.
1.3. Scope
Some assumptions are made about the users of the final system. Most financial sites
offer tutorials and helpers to guide first-time investors through the intricacies of the stock
market; it is assumed here that the users and testers of this system already have the
relevant knowledge and experience, and that writing such educational material into the
final site is beyond the scope of this project. The final system is also intended as a proof
of concept and as such will demonstrate some possibilities for retrieving data. Not all
possible scenarios will be implemented; rather, the system should be versatile enough to
retrieve data from many diverse locations, assuming the user provides appropriate
configuration.
2. Literature Review
First, an understanding of the stock market and the available technologies is required in
order to assess high-level functional requirements and identify technologies that can
answer them.
2.1. The Stock Market
Stock is the term for the outstanding capital of a company or corporation. This stock is
divided into shares, which are traded on an exchange in a similar way to an auction;
the difference is that in a stock market sellers and buyers do not trade on a
highest-or-lowest-offer-wins basis, but are instead matched according to the price at
which they are willing to trade.
2.1.1 Fundamentals
Many exchanges exist globally; the major ones include the New York Stock Exchange
(NYSE) in America, the Tokyo Stock Exchange (home of the Nikkei index) in Japan and,
in the UK, the London Stock Exchange (LSE).
Each exchange consists of lists, or indexes, of companies grouped by market capitalisation
(the estimated total value of a company). If a company is listed on a particular index,
investors can gauge how large it is in terms of its financial value. This project is
concerned with the FTSE 100, which lists the top UK companies traded on the London
Stock Exchange.
The price at which shares are bought and sold is governed by many factors. The price of
a stock can be thought of as a reflection of what the market is willing to pay; expressed
another way, the market price reflects the perceived value of a company, and this value
changes over time with the company’s financial performance and well-being. As Elinger
observes, the market is searching for the right price [1].
[1] Elinger, A., The Art of Investment.
2.1.2 Predictability
‘The predictability of financial markets has engaged the attention of market professionals,
academic economists and statisticians for many years’ [2].
Being able to predict how a market or an individual share will behave in the future
would be of great advantage to any investor, guaranteeing a profit on any investment
made. As such, this is exactly what many investors try to do.
Several methods and techniques exist, from fundamental analysis to technical charting.
The effectiveness of these techniques is constantly debated, as is whether it is possible
to predict market movements with any degree of accuracy at all. Some studies and
theories challenge the reasoning behind such pursuits; one notable example is the
Efficient Market Hypothesis.
2.1.3 The Efficient Market Hypothesis
Malkiel [3] set out the Efficient Market Hypothesis in 1973. The findings of his research
suggest that the market cannot be predicted using formal techniques such as
fundamental and technical analysis. These methods rely on quantitative data about
companies and on trade information such as prices and volume, all of which are freely
accessible in the public domain.
It is proposed that, since this information is already freely available to all investors, the
market already reflects its implications. The debate between promoters of the EMH and
the more traditional technical analysts continues; no solid conclusions have yet been
reached either way, and with so much attention from various research sources the
debate is likely to continue. Recent advances in techniques and computing power, along
with the larger data sets available via the Internet, have fuelled it further [4].
An alternative approach is therefore required: to provide investors with all the
information and data they need, in a way that allows a quick overview and analysis of
market activity to support investing decisions. Mills [5] proposes that investors need to
gather and analyse this information as soon as it becomes available so that timely
decisions can be made.
[2] Mills, T., Predicting the Unpredictable.
[3] Malkiel, B., A Random Walk Down Wall Street.
[4] Mills, T., Predicting the Unpredictable.
2.1.4 The Rise in On-line Trading
A number of factors have made on-line trading more popular in the past few years:
increased availability of data, growth in Internet usage, new technology, faster
connections and favourable market conditions have all made investing in stock more
attractive [6].
This vast increase in on-line trading has given rise to many web sites offering trading
tools for investors and market data portals.
In addition, recent standards and technologies such as XML and Web 2.0 have enabled
richer web-based applications, including the use of graphics.
2.2. Information Visualisation
Information visualisation is concerned with representing data in a graphical format that
successfully imparts information to the viewer, an idea famously captured by the proverb
‘a picture is worth a thousand words’.
Tufte [7] takes this concept further by introducing the idea of data density. Textual
representations are limited by the viewer’s ability to read and understand the text itself;
basic text can be thought of as one-dimensional in its ability to communicate
information, that dimension being the value the characters represent. Graphics, on the
other hand, can represent more than one dimension through the use of colour, size,
shape and context, meaning that more data can be represented over a given area.
Harris [8] describes how the use of colour alone can help authors to:
Differentiate elements
Encode areas of equal value
Alert the viewer when a predetermined condition occurs
Identify particular values
Indicate similar items
Signify changes in direction, trends and conditions
Improve retention of information
Use gradations to indicate transitions from one set of conditions to another
[5] Mills, T., Predicting the Unpredictable.
[6] Warneryd, K., Stock Market Psychology.
[7] Tufte, E., Envisioning Information.
[8] Harris, R., Information Graphics.
Many of these attributes lend themselves well to the stock market scenario, particularly
the identification of trends and changes in direction for numerical indicators.
2.2.1 Existing Graphical Representations
The idea of representing data graphically is not new, even in the stock market scenario;
various charts and display methods already exist.
Simple time series: probably the chart most synonymous with stock markets is the time
series graph, which simply plots one variable against a set time period. From this an
investor can see how the price has performed historically.
Figure 1: Example of traditional charting on Self Trade.
Candlesticks: first devised by a Japanese rice trader, the candlestick diagram shows the
price change over a certain period in relation to the highest and lowest prices.
Candlesticks are still used today on many sites such as Digital Look and Self Trade, and
are a good example of how graphics can convey data in a smaller area. The example
below shows that, using only a box and two lines, the diagram can successfully
communicate four pieces of information to a user at once. When combined with a time
series chart, even more information can be imparted.
Figure 2: Candlestick example [9]
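The four values a single candlestick encodes, and the way its box and wicks are derived from them, can be sketched as a tiny data structure. This is an illustrative sketch with invented sample figures, not the rendering code of any particular charting site:

```java
// A single candlestick encodes four values: open, high, low and close.
// The box (body) spans open..close; the wicks extend to the high and low.
class Candlestick {
    final double open, high, low, close;

    Candlestick(double open, double high, double low, double close) {
        this.open = open;
        this.high = high;
        this.low = low;
        this.close = close;
    }

    // Conventionally a filled (or red) body means the price closed below its open.
    boolean bearish() { return close < open; }

    double bodyTop()    { return Math.max(open, close); }
    double bodyBottom() { return Math.min(open, close); }

    public static void main(String[] args) {
        Candlestick day = new Candlestick(105.0, 110.0, 98.0, 101.5);
        System.out.println(day.bearish());    // the stock closed below its open
        System.out.println(day.bodyTop());
        System.out.println(day.bodyBottom());
    }
}
```

The point of the encoding is that all four numbers, plus the direction of the move (via colour or fill), are readable from one small glyph.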
Heat-maps: the concept of the heat map is to display a particular indicator’s rate of
change (most commonly the price change over a period) and to communicate this change
graphically through the colour of the graphic.
Digital Look [10] provides one example of a heat-map currently available:
[9] http://www.babypips.com/school/what_is_a_candlestick.html
[10] http://www.digitallook.com/cgi-bin/dlmedia/investing/visual_tools/heat_maps?
Figure 3: Digital Look Heat-Map
MSN [11] also provides a similar heat-map display, again showing the price change for a
certain period.
[11] http://msn.moneyam.com/heatmaps/
Figure 4: MSN Heat-Map
It can be seen that most of these graphical tools attempt to map only one variable, and
in almost all cases it is the change in price over a certain time period.
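The mapping a heat-map performs is simple: a percentage change is bucketed into a colour. A minimal sketch follows; the thresholds and colour names are illustrative assumptions, not the actual scale used by Digital Look or MSN:

```java
class HeatMap {
    // Map a period's percentage price change to a heat-map colour.
    // The thresholds here are illustrative only.
    static String colour(double pctChange) {
        if (pctChange <= -2.0) return "dark-red";
        if (pctChange <   0.0) return "red";
        if (pctChange ==  0.0) return "grey";
        if (pctChange <   2.0) return "green";
        return "dark-green";
    }

    public static void main(String[] args) {
        System.out.println(colour(-3.1)); // dark-red
        System.out.println(colour(0.4));  // green
    }
}
```

Because the whole index can be rendered as a grid of such coloured cells, a viewer can spot unusually strong movers without reading a single number.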
2.2.2 Issues
Spence [12] observes that ‘the mere re-arrangement of how the data is displayed can lead
to a surprising degree of additional insight’.
It is clear that graphics can help; conversely, however, Tufte [13] observes that the
incorrect use of graphics can have a negative effect. Common errors include irrelevant
decoration, information overload and the poor use of colour, so these factors must be
considered when designing such interfaces. As a guide, the following requirements need
to be addressed:
Selection of data – relevance to the task
Representation – how to represent abstract things
Presentation – spatial limitations
Scale and dimensionality – how many dimensions and variables can be displayed
Re-arrangement, interaction and exploration
Internalisation – the mind’s representation of an internal image
Externalisation – the display the user actually sees, i.e. the computer display
Mental models – human memory models
Invention, experience and skill
[12] Spence, R., Information Visualization.
[13] Tufte, E., Visual Explanations.
2.2.3 Cognitive Maps
The next consideration in information visualisation is how the user interacts with the
graphic. A cognitive map is the navigational guide to an interface that a user constructs
in memory; a simple real-world analogy is the London Underground map.
Most passengers on the Underground have one goal in mind: how to get from point A to
point B, and the connections required between the two. Accordingly, the Underground
map uses colour to represent the different connecting routes and does not attempt to
display other real-world data, such as accurate scale, because the user is not interested
in that information.
Another analogy is to think of cognitive maps as the bridge between the real world, the
computer display and the user’s memory [14].
The process of creating these maps can be illustrated by the following sequence:
Browse > CONTENT > model > INTERNAL MODEL > interpret > INTERPRETATION >
formulate browsing strategy > BROWSING STRATEGY.
To aid this process, context maps can be used to help users create such models. These
maps aim to give the viewer a basis on which to build their own cognitive map.
2.3. Functional View of Market Trading
To gain an understanding of how investors make decisions and the ways in which this
data is analysed a functional analysis of trading activities is undertaken.
[14] Mental Models, Navigation.
2.3.1 Investor Goals
All investors share a common goal: to achieve a return on their initial investment. At the
most basic level the aim is always to buy when a stock is undervalued, before the market
moves to reflect this, and conversely to sell when an investment is overvalued. Put
simply: buy low and sell high.
The methods used to achieve this vary from person to person, and individual goals and
strategies differ across personalities and age groups. Investors can, however, be grouped
into two general categories, active and passive traders, also known as short- and
long-term traders.
Active traders aim to profit from short-term natural fluctuations in price, or volatility.
The frequency of these trades varies; the most extreme example is the day trader, who
makes very large trades over short periods to take advantage of daily fluctuations in
price.
Passive traders, in comparison, aim to take advantage of the market’s long-term
tendency to rise. They therefore trade very infrequently, buying shares periodically to
add to their portfolio rather than selling. Most traders fall into this second category [15].
2.3.2 MSN Research Wizard
The MSN Research Wizard [16] gives a good indication of what is involved in deciding
whether to buy or sell shares. The page is a kind of expert system that uses MSN data to
guide an investor through the process of assessing an individual company, looking
mainly at fundamental data to gauge how good an investment is.
The wizard is split into five main sections. The first step looks at the company’s
fundamentals: a set of indicators used to assess a company’s financial well-being.
Fundamentals can be used to determine how profitable a company has been to date, as
well as giving an idea of the general state of its finances. The kinds of question it aims to
answer include:
How much does the company sell and earn? (sales and income)
[15] Warneryd, K., Stock Market Psychology.
[16] http://uk.moneycentral.msn.com/investor/research/wizards/srw.asp?Symbol=GB%3Abp%2E
How fast is the company growing? (sales growth and income growth compared to the
industry)
How profitable is the company? (profit compared to the industry over 1 and 5 years)
How healthy are the company’s finances? (debt/equity ratio compared to the industry)
Some investors use a company’s past price performance as an indication of future
performance. Many will argue that past prices have no bearing on future prices; equally,
some will argue that a company that has performed well to date should continue to
perform well. This page therefore gives an overview of the stock’s performance,
measured as the price change over the past 1, 3 and 12 months.
Following the fundamentals, the next section looks at the likely future price of the
investment. Using the company’s price-to-earnings ratio along with analyst expectations,
an estimate is given of how the company is likely to perform over the coming two years.
A company’s share price can also be affected by a number of social factors, such as news
stories relating not only to the company itself but to general economic conditions. An
extreme example is the Northern Rock bank crisis [17], which saw the share price lose
30% of its value overnight. This dramatic drop was triggered when it emerged that the
company had sought a loan from the Bank of England as a result of difficult financial
conditions; although the fundamental business was sound, the ensuing panic, as
customers withdrew their savings, sent the market price into freefall.
Recognising the importance of financial news, MSN has added a catalysts section to the
wizard, which details any company-specific news stories that could impair or improve
confidence in the company.
Finally, another predominant task in the decision process is considered: comparison.
Looking at a single company profile can only impart information in a single context; to
derive meaning from the data a comparison is required. Here MSN allows comparative
analysis with up to two other company profiles.
2.3.3 Functional Requirements
[17] http://news.bbc.co.uk/1/hi/business/7007076.stm
From our initial investigation it is clear that, when it comes to making wise investments,
knowledge is key. As Lasser [18] observes of Warren Buffett, one of America’s most
successful investors:
‘He will seek out every last bit of information he can get, whether it’s a company’s return
on equity or the fact that the CEO is a miser who takes after Ebenezer Scrooge himself.’
Using the MSN wizard as a guide, the functional tasks can be broken down as follows:
Determine profitability of a company
Determine return on investment
Determine the risk of the investment
Determine the value of the company
This also gives an insight into exactly how the data is analysed. Most numerical
indicators are analysed in the following ways:
Value in relation to highs and lows
Value in comparison with a base value, such as the market or sector
Difference between two values: spreads, rate of change
Trends and direction
Identification of changes in trend: turning points
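The first three of these analyses are straightforward arithmetic. As a hedged sketch (the method names and sample figures are my own, chosen for illustration):

```java
class Indicators {
    // Position of a value within its high-low range:
    // 0.0 means at the low, 1.0 means at the high.
    static double rangePosition(double value, double low, double high) {
        return (value - low) / (high - low);
    }

    // Difference from a base value, such as a market or sector average,
    // expressed as a percentage.
    static double relativeToBase(double value, double base) {
        return (value - base) / base * 100.0;
    }

    // Rate of change between two readings, expressed as a percentage.
    static double rateOfChange(double previous, double current) {
        return (current - previous) / previous * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(rangePosition(550.0, 500.0, 600.0)); // mid-range
        System.out.println(relativeToBase(12.0, 10.0));         // 20% above base
        System.out.println(rateOfChange(200.0, 210.0));         // 5% rise
    }
}
```

Values like these are exactly what a graphical display can encode as position, size or colour instead of raw numbers.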
The main functional requirements fall into two categories.
The first is the retrieval and storage of data from the World Wide Web for analysis.
The second is the analysis of that data in order to make decisions. This will involve some
or all of the tasks described above, which investors already perform on the various
sources available. A graphical interface is proposed that will allow users to explore and
display the retrieved data in different ways to gain a better understanding of its
meaning.
Each top-level requirement is investigated in turn to generate lower-level requirements:
[18] Lasser, J.K., Pick Stocks Like Warren Buffett.
2.4. Data Retrieval
There is a wealth of information available to the investor via the modern Internet, and
many companies have emerged that aim to provide content to investors for analysis,
through sites such as MSN Money [19], Digital Look [20] and Self Trade [21]. As we have
seen, however, relevant information can come from a wide range of sources, and
accessing all of these resources manually involves searching and browsing for content.
Even with a comprehensive bookmark list, this activity is time consuming and laborious.
There is therefore a requirement to extract and consolidate this information
programmatically.
2.4.1 Web Content Mining
Web content mining is concerned with discovering information from the many sources
available on the web [22]. Using data mining techniques, content can be analysed and
extracted for use in other applications.
One problem with using a data repository as vast as the Internet is the dynamic nature
of its content. To retrieve data under any circumstances an application needs to know
where to look and needs a reference for what it is looking for. On the World Wide Web
we are dealing with pages of content generated in a range of ways; ASP, JSP and HTML
pages may change in structure at any time and may not follow the strict rules associated
with markup languages.
A further complication is that HTML generally does not carry any type information:
content will almost always be represented as a generic string. This poses problems when
extracting information for use by a strongly typed language such as Java.
Fortunately, despite these issues, there are techniques and programs that solve these
problems.
2.4.2 Extraction Techniques
A basic technique for retrieving web-based content is screen scraping, which involves
extracting data from its final output format, usually the visual display of the program
being scraped. In the context of the web this means taking content directly from the
browser, which can be achieved by a number of methods such as regular expressions or
dedicated APIs. The technique has limitations, however. Because the data is extracted
from a format designed with human readability in mind, additional processing is required
to remove styling elements, and the data will not necessarily be structured in a way
suitable for use by other programs, so contextual information must be added later.
[19] http://money.uk.msn.com/
[20] http://www.digitallook.com
[21] http://www.selftrade.co.uk/
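As a concrete sketch of the regular-expression approach, the snippet below pulls a price out of a fragment of rendered HTML. The markup and the `price` class name are invented for illustration; a real page's structure would differ, which is exactly the fragility screen scraping suffers from:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScrapeExample {
    // Extract a numeric price from an HTML fragment using a regular expression.
    // The match is still just a string and must be parsed into a typed value.
    static double extractPrice(String html) {
        Pattern p = Pattern.compile("<td class=\"price\">([0-9]+\\.[0-9]+)</td>");
        Matcher m = p.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("no price found");
        }
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) {
        String html = "<tr><td>BP.L</td><td class=\"price\">543.25</td></tr>";
        System.out.println(extractPrice(html)); // 543.25
    }
}
```

Note that the pattern is tied to one page layout: if the site changes its markup, the expression silently stops matching.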
Tree builders are aimed specifically at web page extraction and take advantage of the
structure of the markup language. A tree builder attempts to create an in-memory tree
representation of a web page by matching start and end tags in the target document,
then builds a representation of the structure to provide a navigable context. The
designer of the particular extraction program dictates how the tree is built and how
extensively it caters for specific tag libraries. Once a tree representation has been
created, data can be extracted based on its location in the document. This method is
useful for retrieving data from many pages that have identical layouts but different
content, such as stock prices, but it can only work with supported formats.
The W3C introduced the Document Object Model [23], or DOM, to address these issues;
in their own words:
‘The Document Object Model is a platform- and language-neutral interface that will allow
programs and scripts to dynamically access and update the content.’
The introduction of this standard gave API and program writers a common interface to
work from, so parsers can take the tree builder concept to the next level by building a
DOM representation of the page in order to extract its content.
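With a DOM tree in memory, data can be addressed by its location rather than by a text pattern. The following sketch uses the standard Java XML APIs; the document and the `BP.L` symbol are invented for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomExample {
    // Build a DOM tree from a well-formed document, then navigate it by
    // location using an XPath expression.
    static String priceFor(String xml, String symbol) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate("/quotes/quote[@symbol='" + symbol + "']/price", doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<quotes>"
                   + "<quote symbol=\"BP.L\"><price>543.25</price></quote>"
                   + "<quote symbol=\"BT.L\"><price>301.10</price></quote>"
                   + "</quotes>";
        System.out.println(priceFor(xml, "BP.L")); // 543.25
    }
}
```

Because the query addresses a location in the tree rather than a run of characters, the same expression works for every page sharing the layout, regardless of the values it carries.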
2.4.3 Examples
Implementation of a fully-fledged extraction program is time consuming and not the
main focus of this project; there are many freely available programs for this task, two
notable online examples being Yahoo pipes24 and Dapper25:
Yahoo pipes is a web 2.0 application available exclusively on-line, it relies on structured
data in the form of XML, RSS feeds and JSON as a target content type. The site consists
22 Web Content Mining with Java, Loton, T.
23 http://www.w3.org/DOM
24 http://pipes.yahoo.com/pipes/
25 http://www.dapper.net/
of a graphical interface in which users add modules and connect them to create a
customized output from existing web pages.
The modules themselves perform various tasks, affording users control over the data
retrieved from selected URLs. The output is then displayed as a standard HTML page,
which can be viewed by anyone who logs into the site.
Example: creating a simple RSS feed aggregator.
The "Fetch Feed" module is used to retrieve news stories from the BBC's business feed;
this is simply connected to the output module.
Figure 5 Simple feed to retrieve RSS from the BBC
Multiple feeds can be combined in the Fetch Feed module, and a filter module is added
to allow users to search the feeds for specific terms. The search module is added to
provide user input on the main page.
Figure 6 Simple Aggregator to combine two feeds
A search term box is added in the above example to filter only news items of interest
from the three selected news feeds.
Figure 7 Output page for the aggregator with search term box.
Other modules can be used to create more complex pipes; XML data can be extracted
directly and manipulated, filtered or combined with other web sources to create useful
pages. However, the application is limited to use with live data and the output is
restricted to the standard output; in addition, few sources of useful data are freely
available in XML format.
2.4.4 Dapper.Net
Dapper (a contraction of Data Mapper) is another online application that allows users to
extract content from anywhere on the net and output it into various formats including
XML, JSON, RSS feeds etc. Dapper also provides a Java API allowing developers to
connect their programs with Dapper to retrieve the extracted content.
Dapps are small retrieval applications created using the main site. Each Dapp is created
to parse a specific web page. Initially this is achieved via a virtual browser within the
site. The user interface allows web content to be selected for retrieval. In the example
below the last trade price element is selected. Each selected element can have some
basic manipulation applied to remove preceding or trailing strings; in this case the p is
removed.
Figure 8 Dapper UI showing selected content
Any number of elements can be added. Once the content has been selected, the user
can add field names and group the output. These are reflected in the resulting XML
output.
Figure 9 Preview showing output
Dapper is flexible enough to allow modifications to the retrieved content at a later date.
The addition of the Java API, allowing external programs to interface with Dapper,
makes it an ideal solution to the retrieval problem.
2.5. Data Storage
The second top-level requirement of the proposed design is the storage of the data
retrieved by Dapper. The output format from Dapper is selected when the Dapp is
created, and the user has several options including RSS feeds, JSON and standard
HTML. Since we are using the data in another application it makes sense to retrieve the
data as XML.
2.5.1 XML
XML is a standard for data exchange and has become popular both in desktop
applications for configuration files and on the web for storing and exchanging data.
XML can be thought of as data about data: not only does it contain the actual data but
also contextual and structural information.
XML has many advantages: firstly, its high portability between applications and across
platforms; the fact that it has been a W3 standard since 1998 means a lot of applications
and application interfaces are available. For the example Dapp that we created in the
previous section the XML output would look as follows (the actual output has been
simplified to show only the elements of interest).
<PriceData>
  <name>MSNPriceData</name>
  <url>http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?symbol=GB:BP.L</url>
  <version>1.233</version>
  <accessed>2007-07-29 15:59:25</accessed>
  <last>605.00</last>
</PriceData>
Although there is actually only one piece of data, the last price, the Dapp gives lots of
other information within the XML document, such as the source of the data, when it
was accessed and the name of the Dapp that accessed it.
The structure of XML is strict in that every start tag must have a corresponding end tag
and each document must have a single root element. In this example it can be seen that
the <PriceData> tag is the root and all the other tags are nested within it. This
characteristic allows logical grouping of elements in hierarchies.
2.5.2 XPath
XPath is a query language that enables the inspection of XML files. The language is a W3
standard and works on a hierarchical basis, similar to a file system. An XPath expression
navigates through the document structure to a particular node or set of nodes,
depending on how far down the tree the path goes. This adds an interesting capability to
XML documents in that they can be treated as a very simple database, provided an
XPath interface is available.
In the above example we consider the following XPath expression:
//PriceData/last
The double slash at the start selects matching elements anywhere in the document; the
expression first navigates to the <PriceData> element and then to the <last> element,
which is a child of <PriceData>. The result would then be 605.00, the content of our
<last> element.
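As a sketch, the same lookup can be reproduced with Java's built-in XPath engine; the element names follow the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathDemo {

    // Evaluate an XPath expression against an XML string and
    // return the text content of the first matching node.
    public static String evaluate(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<PriceData><name>MSNPriceData</name>"
                   + "<last>605.00</last></PriceData>";
        System.out.println(evaluate(xml, "//PriceData/last")); // prints 605.00
    }
}
```

This is exactly the "simple database" behaviour described above: the document is queried rather than walked manually.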
2.5.3 Storing XML
Using XML on its own cannot provide a solution which will fully replace a relational
database; although in theory the data could be extracted and continually added to one
large XML file, the problems of organization, persistence, availability, security, and
efficient search and update still exist.
There is a need, therefore, to use an RDBMS to store XML data, and a number of
possible solutions are available. One solution involves storing each XML file directly as a
file within the database; however, this solution disregards the logical structure of the
XML files when performing queries on the resulting table.
Another solution would be to create further reference tables to store some of the more
important structural information about a document, which can then be queried. This
approach will not cope well with changes to document structure, since the underlying
tables will need to be updated to reflect such changes.
Therefore, to gain the full advantage of XML, the document would need to be
decomposed before insertion into the database and then recomposed when it is
extracted. XML schemas could also be used to ensure the structure is maintained.
Although the database can now provide the same level of logical information as the
original document, there are performance ramifications.
2.5.4 XML Databases
XML databases aim to give the best of both worlds. A native XML database allows the
storage of individual documents in collections, which can be queried and updated using
XPath and XUpdate, another standard for performing updates on XML. Collections are
more versatile than a traditional RDBMS in that they can store a set of generic XML
documents regardless of whether they share the same structure. Collections can also
be stored within collections to provide further levels of grouping and allow queries on
multiple sources.
Apache Xindice is a Java implementation of a native XML database according to the
XMLdb.org specifications. Xindice runs as a web application in a suitable container such
as Tomcat; the way in which the database is accessed and added to is up to the designer
of the application. Since Xindice is Java based there is a substantial API to support most
of its functions, although it is also possible to control it via a command line interface.
Because it is packaged as a web application, collections can be viewed via a web browser:
Figure 10 Xindice debug tool showing a collection of XML files
Xindice neatly answers our second requirement to store the retrieved data, since this is
already in XML format courtesy of Dapper. It also means we don't need to worry about
tailoring for changes in the incoming data's structure, and a handy interface is provided
to check up on the collections.
2.6. Transforming XML
The final requirement is to represent the retrieved data in a graphical format, again W3
and XML standards provide the answer. Two standards exist which can address the
problem: XSL and SVG.
2.6.1 XSL
XSL stands for Extensible Stylesheet Language. XSL is to XML what CSS is to HTML. W3
continues its mission to separate data from presentation with XSL Transformations, or
XSLT for short. XSL allows designers to dynamically transform XML data into other
formats such as HTML and SVG.
Using our example output file from before, we add an extra line to reference the style
sheet:
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<PriceData>
  <name>MSNPriceData</name>
  <url>http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?symbol=GB:BP.L</url>
  <version>1.233</version>
  <accessed>2007-07-29 15:59:25</accessed>
  <last>605.00</last>
</PriceData>
In this case we want to simply display this data in an HTML file along with some other
information; the resulting style sheet would look as follows:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head><title>Latest Price</title></head>
      <body>
        <p>Latest Stock Price : <xsl:value-of select="//PriceData/last"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
The style sheet uses our XPath expression to reference the content to display; the
resulting HTML file looks as follows:
Figure 11 Result of XSL transform in Firefox
The advantage of this is the separation of data from presentation; we could use the
same style sheet over and over again to display the price of different stocks.
XSL requires a parser to transform XML data. Most browsers support this as standard, so
XML can be styled on the client side to provide the desired result; in the case of Firefox,
Expat is used. It is also possible to style the data on the server side, using a third-party
parser such as Apache Xalan, before passing the resulting transformed document to the
client, in which case the client would simply receive the HTML representation.
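A server-side transform can be sketched with the standard JAXP transformation API, which Xalan implements; the inline style sheet below is a simplified stand-in for the "Latest Stock Price" example, using text output to keep the result easy to read.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ServerSideStyler {

    // Apply an XSL style sheet to an XML document on the server,
    // so the client only ever receives the transformed result.
    public static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<PriceData><last>605.00</last></PriceData>";
        String xsl = "<xsl:stylesheet version='1.0' "
                   + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                   + "<xsl:output method='text'/>"
                   + "<xsl:template match='/'>Latest Stock Price : "
                   + "<xsl:value-of select='//PriceData/last'/>"
                   + "</xsl:template></xsl:stylesheet>";
        System.out.println(transform(xml, xsl)); // Latest Stock Price : 605.00
    }
}
```

The same helper works unchanged whether the style sheet produces text, HTML or SVG.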
2.6.2 SVG
SVG stands for Scalable Vector Graphics26, another W3 standard, which extends XML to
a graphical format. SVG aims to address some of the current issues with web-based
images such as file size and varying screen resolutions. Vector graphics are images
generated from a series of vectors drawn between defined co-ordinates. The relevant
data required to draw the image is stored as XML mark-up using SVG tags. One
advantage of this format is the ability to scale the image without loss of quality or
pixelation. One drawback to this technology is the need for a plug-in to be installed
within the client browser; although Firefox and Opera support SVG as standard, IE still
requires the Adobe plug-in.
A further advantage is that SVG is part of the W3 recommendations, so it can be
coupled with XSL to generate graphics from XML, making it ideal for representing
numerical data graphically.
Again looking at our previous example, we use the same XML/XSL combination to draw
a simple box that represents the price of a security. The XSL to achieve this would be as
follows:
26 http://www.w3.org/TR/SVG11/
The resulting SVG output, albeit not very interesting, is shown below; the dimensions of
the box in this case are determined by the stock price divided by 10:
Figure 12 Simple box representation of a stock price
The XSL is slightly different in this case because we need to use our data as a value
within the SVG mark-up; to do this, an XSL variable can be used to temporarily store the
data so it can be used in the transform.
XML and XSL together form a neat set of standards which answer our data presentation
problem; once a suitable XSL template is created it can be reused wherever required.
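The variable technique can be sketched end-to-end as follows. The inline style sheet is illustrative rather than the project's actual template: it stores the price divided by 10 in an xsl:variable and uses it as the rect width via an attribute value template.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SvgBoxDemo {

    // Transform price data into an SVG box whose width is price / 10,
    // using an xsl:variable to hold the computed value.
    public static String toSvg(String xml) {
        String xsl = "<xsl:stylesheet version='1.0' "
                   + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                   + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
                   + "<xsl:template match='/'>"
                   + "<xsl:variable name='w' select='//PriceData/last div 10'/>"
                   + "<svg xmlns='http://www.w3.org/2000/svg'>"
                   + "<rect width='{$w}' height='20'/></svg>"
                   + "</xsl:template></xsl:stylesheet>";
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A last price of 605.00 gives a 60.5-unit-wide box.
        System.out.println(toSvg("<PriceData><last>605.00</last></PriceData>"));
    }
}
```

Swapping the rect width for opacity, colour or any other SVG attribute follows the same pattern.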
2.6.3 XForms
Xforms (short for XML forms) is one of the latest standards from W3, pitched in their
own words as the latest generation of Web Forms to replace the outdated HTML form27.
Xforms aim to make the task of creating web forms easier, with many of the standard
tasks involved in such an exercise incorporated into the specification; retrieving and
saving data from local files, validation of user inputs and dynamic content are just a few
examples. One of the main advantages of Xforms, however, is the ability to access and
update XML content and provide logical bindings between data, even in separate XML
files. Xforms also aim to provide a better user experience, with some AJAX-like
functionality built in.
Xforms are written in XML using Xform tags; they access content in other XML files
using the concept of bindings, along with XPath to navigate the documents. A further
benefit is the ability to make asynchronous submissions from the form without any
laborious Javascript.
27 http://www.w3.org/TR/xforms/
The following example illustrates a simple Xform. The XML file from the previous
example is used to write an Xform that gives a user access to the data:
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>My First Xform</title>
    <xf:model>
      <xf:instance src="data.xml"/>
      <xf:submission id="save" method="put" action="data.xml"/>
    </xf:model>
  </head>
  <body>
    <h1>My XML Data</h1>
    <xf:input ref="/PriceData/last">
      <xf:label>Data:</xf:label>
    </xf:input>
    <xf:submit submission="save">
      <xf:label>Save</xf:label>
    </xf:submit>
  </body>
</html>
The above form will appear as follows in an Xform-enabled browser:
Figure 13 Our price data from before appears but can now be edited.
With Xforms the designer defines a model representation of the data; this can be
programmed directly into the form or referenced from an external file, as in the above
example. Here the submission element tells the forms processor to save the file to the
local file system as data.xml.
A further advantage of Xforms is that it can access any other XML-standard mark-up,
such as XSL. Coupling these two standards together enables the form not only to access
XML data but also to manipulate XSL files and therefore change the resulting SVG
output.
3. Specification
Given that our initial requirements have been identified and the relevant technologies
researched, the problem can be assessed in more detail; to generate further
requirements, the problem statement can be updated and the stakeholder analysis
revisited.
3.1. Problem Statement
The initial problem statement was the retrieval and representation of data from the
Internet. We now have to consider how this will be solved using the technologies
identified. Specifically the inclusion of Dapper adds additional functions and requirements
from the point of view of administering and running the retrieval.
3.2. Stakeholder Analysis
The initial stakeholder analysis identifies the primary, secondary and tertiary
stakeholders:
Primary Stakeholders: Administrators
User Profile: The administrators could also be private investors. At present the
assumption is made that some external management is required for the site
whether this is by the investor using the site or a third party.
Role: Ensure that errors caused by external factors, such as server downtime or changes
to site structure, are dealt with; input will be required to respond to such problems
and update Dapps as necessary.
Goals: Browse web sources for relevant information. Identify information which
is of interest. Manage and update Dapps. Build up and maintain collections of
data sources. Schedule tasks for Dapps to perform. Manage collected data.
Secondary Stakeholders: Investors and End Users
User Profile: Investors and front-end users who will access the data retrieved via
the graphical interface.
Role: The data that is retrieved, and the format that it is eventually stored in, will
affect the people who use that data. Investors want information as soon as it is
available, and spending time searching for this information is costly, both in terms
of investors' time and in their ability to make informed decisions.
Goals: Gain an overview of all relevant information relating to current or future
potential investments. Select the output format for the data. Filter and Search
data. Extend administrator goals.
Tertiary Stakeholders: Content Owners
User Profile: Web masters and web content owners
Role: Maintaining web pages and content
Goals: Attract users to their sites and in some cases generate revenue through
advertising or subscription
3.3. User Goals
High-level goals identified from the problem statement and stakeholder analysis are
used to define the top-level use cases.
Manage and update Dapp's.
Build up and maintain collections of data sources.
Manage collected data.
Select different views of the data.
Filter and Search data.
3.4. Use Cases
Figure 14 Use Case Diagram
4. System Design
4.1. Methodology
The design methodology used is a top-down, modular approach to development. Starting
with high-level use cases, the interfaces and main functionality are determined; from
here, functional requirements are elicited as separate modules based on their intended
tasks. The previous sections have outlined the various specifications available to answer
our three top-level requirements: data retrieval, storage and transformation. These
standards follow a strict Model View Controller paradigm; as such, it makes sense to
extend this to the whole application.
4.2. Required Technologies
Before development, some base technologies are required to support the system.
Xindice runs as a web application on a suitable container; in this case Apache Tomcat is
chosen. Once Xindice has been downloaded and unpacked it is deployed to Tomcat and
tested using the appropriate URL, in this case: http://localhost:8282/xindice/?/db.
The top-level collection in Xindice is called db; the question mark indicates the debug
page, which is automatically loaded when Xindice is accessed using the base URL. This
is the only user interface provided as standard for Xindice. XML files can be viewed via
this tool but not added or manipulated.
Xforms and SVG cannot be viewed in all browsers by default. Extensions are required
for most to support these standards, and the level of functionality supported differs
between implementations. As such, Firefox is chosen, since it provides good support for
SVG and the Mozilla Xforms extension implements most of the Xforms 1.0 functionality
despite still being in a development stage.
To aid development, the Eclipse IDE is used, since it supports most of the standards
used, with the exception of Xforms and SVG. Firefox provides an error console which is
useful for debugging XML content; as such, it can provide useful feedback on SVG, XML,
XSL and Xform errors.
4.3. Decisions
Although much of the necessary functionality can be implemented on the client side
using browser extensions, a back end is still required to interface with Dapper and
Xindice.
Java servlets were chosen to address this requirement, partly due to the Java API
support for both Xindice and Dapper, but also because Java has plenty of XML and DOM
APIs to allow XML data handling. Xforms can post data as XML files direct to the server;
to handle these files, the server-side application must therefore be able to access and
manipulate XML.
4.4. Modules
To simplify the design process the application is split into smaller modules each
addressing a specific function. Keeping with the grouping used so far the main functions
are data retrieval, storage and presentation.
4.4.1 Dapper Module
Much of the data retrieval requirement has been addressed by Dapper, however the
Dapper.net provides a means to create the Dapps but not control their execution. The
Dapps themselves only have the ability to execute for one URL at a time, our
requirement is to extract data from different sources but also for multiple pages in the
same resource, this involves specifying parameters directly within the URL.
Our first requirement is therefore a means to execute a Dapp and specify a URL for it to
work on. From our use cases we also have the requirement to maintain collections of
resources for the Dapp to retrieve from. Finally, there are two requirements: to select
the Dapp to be used and to specify the storage location for the data.
The Dapper API, unfortunately, is very basic and looks incomplete. As such, the interface
options for Java are limited, so much of the above functionality needed to be
implemented from scratch.
Following our Model View Controller ideal, the functionality is divided. First,
implementing the data aspect of our module, an XML file is created to store the model
view of our Dapp. Unfortunately the Dapper API does not provide an obvious means to
elicit certain parameters from the site. The model will therefore be a means to represent
each Dapp. From the initial requirements the following information needs to be stored:
Dapp Name
Storage Location
Collection of resource locations (URLs)
Using XML as a storage medium in this way makes sense because XML support is
already required for the other aspects of the design, so extra effort is saved on
implementing another means to store the configuration data.
The view aspect will be taken care of by Xforms, again to take advantage of the XML
standards and functionality on the client side. With Xforms, users can manipulate the
XML files to address our requirements to add and update lists of resources.
The above will provide a nice interface for some XML but won't actually do anything, so
the controller aspect is required. Since the storage medium for the retrieved data is
Xindice, which needs to run on Tomcat, a servlet container, it is logical to use Java
servlets for our back-end functionality.
4.4.2 Database Connection Module
To access Xindice, methods are required first of all to connect to the collection and then
to perform queries on the data; again, a servlet module will be used for the controller
aspect. The data is stored in collections within the database. The top-level collection,
db, contains database-specific files such as meta-information and should not be used to
store content; as such, collections need to be created. Our first set of requirements is
therefore to provide the ability for users to create collections within Xindice.
Once data is retrieved by Dapper it will need to be inserted into a specific collection;
although the data retrieval task is handled by the Dapper module, the retrieved data will
be passed to the database connection for insertion. Xindice allows the programmer to
specify a unique id for the document being inserted into the database; however, since
we will be querying the XML content directly using XPath, having a suitable system for
identifying documents by their id is not required. In addition, Xindice has a mechanism
in place to automatically assign unique ids to files as they are added, which saves some
development work.
Finally, a requirement exists to query the collections. The Xindice API provides a query
engine which accepts an XPath string as input; the issue will therefore be to provide a
suitable interface to the user that can be translated to an XPath query whilst being user
friendly.
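One way to bridge that gap can be sketched with a hypothetical helper (not the project's actual code, and the element and field names are illustrative only) that assembles the XPath string from friendly form fields:

```java
public class QueryBuilder {

    // Translate user-friendly inputs into an XPath predicate query.
    // The record and field names passed in are illustrative only.
    public static String toXPath(String record, String field, String value) {
        return "//" + record + "[" + field + "='" + value + "']";
    }

    public static void main(String[] args) {
        // e.g. "find the PriceData records whose name is MSNPriceData"
        System.out.println(toXPath("PriceData", "name", "MSNPriceData"));
        // prints //PriceData[name='MSNPriceData']
    }
}
```

The user never sees XPath syntax; they fill in a field name and value, and the servlet passes the generated string to the query engine.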
To provide the XSL functions, a server-side function is proposed to work with the other
servlets. A third-party parser such as Apache Xalan is required to achieve this. Although
the browser can take care of processing XSL, some extension functions may be required
to provide more robust support for numeric processing; many extension libraries exist
which can be used for this purpose.
From these initial design considerations a conceptual architecture was drawn up showing
the relationship between the various components.
4.5. Proposed Architecture
Figure 15: Proposed System Architecture
5. Implementation
The implementation approach was again top down, first of all creating the user interfaces
to address user requirements then developing the Java code to accommodate the
intended functions. The final implementation differed slightly from the original design
concept as problems and improvements were discovered in through the implementation
phase of the project.
5.1. Issues and Design Changes
In the initial concept a couple of changes were made. Firstly, it was initially envisioned
that the final system would behave much like a real-world web application, with login
details and user-specific preferences. It was decided, however, that this kind of
functionality did not add any major benefit to the project, nor did it help achieve the
initial goals.
The second change was to the retrieval aspect. It can be seen from the conceptual
architecture that the intention was to allow XSL transformations to be made on the data
before it is inserted into the database; unnecessary data could be removed and
additional information added to improve document retrieval. It was later decided to drop
this function since the benefits would be minimal.
5.2. Module implementation
As with the overall architecture some changes were made during the development
process to accommodate new information as it became available. The implementation of
each module is discussed in detail.
5.2.1 Utilities Package
The utilities package was added to provide some basic functions to each of the other
modules rather than repeating code. Two core functions that both the database query
and Dapp manager classes would require were the ability to access Xindice and to
manipulate XML documents using DOM4J.
The Database Connector class provides basic database functions such as connection,
collection discovery, and insertion and retrieval of documents. Queries are also executed
via the Database Connector by passing an XPath string expression to the
executeQuery() method. Although no DOM standard implementation is favoured by any
of the APIs, DOM4J was chosen because of the range of available functions. Within all
the modules, XML files are manipulated or passed as DOM4J implementations of the
Document interface.
There were few issues with the database connector because much of the functionality is
available via the Xindice API and little additional functionality had to be coded.
The XML helper class was implemented to carry out the XML document processing which
became a common requirement between classes. The class handles saving, reading and
converting XML between formats.
During the development process it became clear that the generic typing of the retrieved
data by Dapper was going to cause problems with XSL. Some of the SVG transforms
required numerical data without any formatting information included; for example, 1000
is represented as 1,000. To ensure the data retrieved is suitable for use with XSL,
regular expressions and additional data validation had to be added, and as such the
retrieval process became more complicated. The solution was to add a user-defined
content type field to the admin page so that users could specify what kind of data they
were expecting to retrieve. The selection is used to apply regular expressions to the
input strings to ensure the data will work with XSL.
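A minimal sketch of this typing step is shown below; the regular expressions are illustrative stand-ins for the project's user-selected, per-content-type list.

```java
public class ContentTyper {

    // Strip thousands separators, currency symbols and trailing
    // units so the value survives numeric processing in XSL.
    public static String toNumeric(String raw) {
        return raw.replaceAll("[^0-9.]", "");
    }

    // Validate that the cleaned string really is a number.
    public static boolean isNumeric(String value) {
        return value.matches("\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        System.out.println(toNumeric("1,000"));              // prints 1000
        System.out.println(isNumeric(toNumeric("605.00p"))); // prints true
    }
}
```

Values that fail the validation check can then be rejected before insertion rather than breaking a transform later.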
Another feature of Xindice is that it can be used to update the XML files contained within
a collection using XUpdate; it would have been more elegant to store the relevant
configuration files in a separate user collection within Xindice. The issue with doing this
is that frequent database queries would need to be made because of the dependency
between Xforms and the XML files. As such, it was decided to keep the files static on the
server and manipulate the documents using the XMLHelper class; methods were
therefore added to add and remove node sets from the document.
5.2.2 Dapp Manager Package
Despite the functionality provided by Dapper, the execution and management of the
Dapps became more involved than expected. A DappImplementation class was required
to store and manage the data sent via Xforms, and an additional URLlist class was
implemented to manage the list of variables being used for retrieval. Finally, a servlet is
used to access the objects.
The XML data submitted by the Xform is used to instantiate the DappImplementation
and URLlist objects. A base URL and a list of variables are specified by the user and
stored in the Dapp configuration XML file. Once submitted, the URLlist class is
responsible for generating URLs and keeping track of the current progress. The URL is
created by replacing a predefined marker in the base URL with a variable, as follows:
http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:{var}
becomes
http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:TSCO
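The marker substitution can be sketched as follows; the class and method names are simplified relative to the actual URLlist implementation.

```java
import java.util.Arrays;
import java.util.List;

public class UrlGenerator {

    private final String baseUrl;        // contains the {var} marker
    private final List<String> variables;
    private int position = 0;            // current progress through the list

    public UrlGenerator(String baseUrl, List<String> variables) {
        this.baseUrl = baseUrl;
        this.variables = variables;
    }

    public boolean hasNext() {
        return position < variables.size();
    }

    // Replace the predefined marker with the next variable.
    public String next() {
        return baseUrl.replace("{var}", variables.get(position++));
    }

    public static void main(String[] args) {
        UrlGenerator urls = new UrlGenerator(
            "http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:{var}",
            Arrays.asList("TSCO", "BP.L"));
        while (urls.hasNext()) {
            System.out.println(urls.next());
        }
    }
}
```

Keeping the position counter inside the class is what allows it to double as a progress tracker for long variable lists.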
The URL can now be passed to the Dapp for retrieval. Once the Dapp has executed, the
resulting XML file is validated to see if the retrieval was successful and if the data is valid
against our regular expression list. A copy of the output data elements is queried during
validation and added to the configuration file; this was not part of the original
functionality but was later added to support the use of regular expressions for typing
data. The added benefit of this is that it can be used as a list of query parameters for
the Xforms interface.
It was the intention to use the URLlist class as a progress reporter for the administration
interface. At present a particularly long list of variables will take a while to execute, and
it would be desirable to provide feedback to the user on its progress; this could be
achieved by a servlet and Javascript that periodically query the list. However, this
feature was omitted due to time constraints.
5.2.3 DBQuery Package
The Database Query package contains the classes required for not only the database
queries but also the XML transforms. The two were packaged together because they are
both used from the same Xform interface.
The function of sorting and querying the data could be achieved in a number of ways; it
was decided that the styling function should be kept separate from the database query
function. One solution would have been to send the result of the database query direct
to the client with a reference to the relevant style sheet and allow the user's browser to
perform the transform. This solution, however, means that every time a user makes a
change to the way in which a document is styled, they need to submit another query to
the database in addition to parsing and styling the output again. The fact that the
client's browser only sees the result of the transform also prevents inspection of the
output by the interface, which can provide some of the context information. The decision
was therefore made to keep the data retrieval and styling tasks separate.
When a query is submitted, the DBQuery servlet builds an XPath expression from the
user input and queries a specified collection. The resulting XML output is not sent to the
client but stored on the server. Once complete, a submit is triggered automatically to
the Data Styler servlet, which then transforms the output file into an SVG document and
again stores the result on the server. The interface reads the SVG directly from this file;
the advantage of this set-up is that changes to the style sheet can be performed on the
database output without submitting another query to the database. The added
advantage is that the database output is now directly accessible to the Xform, which
provides additional functionality such as listing available query parameters.
5.2.4 Style sheet & Icon Design
The style sheet design proved to be the most difficult part of the implementation. The
task was to provide a set of predefined graphical representations that users could select
and manipulate via the main interface. As we have seen, XSL provides a mechanism to
use XML data to transform graphics; this can be any SVG parameter: dimension, colour,
opacity, shape etc.
The concept of an icon is used to represent an individual data entity; in this example we
are looking at individual shares as extracted from Dapper. The icon provides a graphical
representation of one or more pieces of data. The number of parameters differs between
icons, the simpler ones displaying only one piece of information as a change in one of
the graphical aspects of the design.
The problem with this concept is providing a context for comparison. Because the data
can be over an infinite range, we need some base to compare each item to. To get round
this problem, each variable is presented as a percentage of the group's maximum. For
example, if we want to display the last price, the style sheet first needs to know what
the maximum price is in the data set.
This is achieved through the use of the EXSLT math extensions, a set of functions which can be
used in addition to XSL. In this case the math:max function is used to determine the
maximum value of an element in a node set. Once this value has been calculated, the
individual elements can be compared against it to work out where they fall on the
scale. Since each value can now be expressed as a percentage, the style sheet can
calculate a corresponding percentage of a graphical value. In figure 16 the opacity of
each box represents each element's last price in relation to the maximum price of the
data set being viewed.
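The percentage-of-maximum calculation described above can be sketched in Python (the real calculation lives in the XSL style sheet via math:max; the function name here is purely illustrative):

```python
def opacity_for(prices):
    """Express each last price as a fraction of the data set's
    maximum, mirroring what math:max enables in the style sheet."""
    maximum = max(prices)                 # the group maximum, as math:max returns
    return [p / maximum for p in prices]  # each element's place on the 0..1 scale

# The highest-priced share becomes fully opaque; the rest fade proportionally.
opacities = opacity_for([50.0, 100.0, 25.0])
```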
Figure 16: Simple box icon designs, similar to the heat-map concept.
This example is fairly simple and essentially the same as a heat-map; to provide more
interesting graphics, more complex icons needed to be designed using the same
principle.
To allow representation using different icons without heavy server-side processing, the
functionality of XForms is taken advantage of again. As we have seen, XForms can access
any XML-based content and as such can access and modify XSL. The basic parameters of
each icon are stored in the style sheet as global parameters. By implementing a simple
input control with a reference to these parameters, the shapes can be manipulated.
In figure 16 three global parameters are available: zoom, resolution and text size. The
zoom function simply scales the graphics up by increasing the relevant dimensions. Here SVG
proves its usefulness, as the graphics remain crisp no matter how far a user zooms in.
The text size parameter is self-explanatory, although the same function can be achieved
via most browsers.
Finally, the resolution parameter is added as an exaggeration function. In figure 17 a
slightly more complex graphic is illustrated. This time the icon displays the difference
between two parameters as a sloping line. On initial testing of this model it became clear
that for some data sets the differences in slope were negligible, making
distinction between icons difficult. To address this issue an exaggeration parameter was
added, allowing the user to multiply the slope by a chosen factor to make small differences
more visible.
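A minimal sketch of the exaggeration idea, assuming the drawn slope is simply the difference between the two variables multiplied by the user's factor (names are illustrative, not taken from the actual style sheet):

```python
def slope(open_price, last_price, exaggeration=1.0):
    """Return the drawn slope: the raw difference between the two
    variables, multiplied by the user's exaggeration factor."""
    return (last_price - open_price) * exaggeration

raw = slope(100.0, 100.4)            # barely visible: about 0.4
amplified = slope(100.0, 100.4, 10)  # same trend, ten times steeper
```

The sign of the result still encodes direction; only the visual magnitude is distorted.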
Figure 17: Rate-of-change icon showing the difference between two variables.
The icons themselves are based on separate templates within the style sheet. The
selection of the icon to be used is achieved via the XForm interface and a binding to the
template reference, allowing the user to select any template from the list.
To improve usability, the style sheet interface needed to be dynamic, in the sense that the
number of user-specified variables differs between templates
and, as a consequence, those variables have different meanings in the context of the current
template. A separate style sheet configuration file is provided to the XForm and bound to
the style sheet; the result is that the XForm knows which controls to display and when.
Looking at figures 16 and 17, it can be seen that the number of inputs available and the
labelling of those inputs differ between icons.
The positioning of the icons on the screen was another problem, which took considerable
time to resolve. The dynamic nature and scalability of the icons meant hard-coding the
positions on the page was not an option; each position needed to be calculated
based on a starting point, the size of the icons and the screen width.
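One way to perform that calculation can be sketched as follows; this is an illustrative reconstruction in Python, not the actual style sheet logic, and the default values are made up:

```python
def icon_position(index, start_x=10, start_y=10, icon_size=40, zoom=1.0, screen_width=400):
    """Compute the top-left corner of the icon at the given index.
    Icons flow left to right and wrap onto a new row when the next
    icon would overrun the screen width."""
    size = icon_size * zoom                               # zoom scales every dimension
    per_row = max(1, int((screen_width - start_x) // size))
    row, col = divmod(index, per_row)
    return (start_x + col * size, start_y + row * size)

# Lay out a dozen icons: 9 fit per row at the default size, then wrap.
positions = [icon_position(i) for i in range(12)]
```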
5.3. Final System Architecture
Figure 18: Final system architecture showing changes.
5.4. Interfaces
The final interfaces were partly governed by the functional requirements but also
constrained by the capabilities of XForms. As mentioned earlier, the Administration XForm
had a few additions to support the specification of basic type information for the
retrieved content. XForms can be styled using CSS in much the same way as HTML, although the
process is not quite as straightforward and again relies on browser support. As such,
only basic styling was used, mainly for positioning elements on the user interface.
Another of XForms' advantages is the ability to dynamically display content and controls
without making requests to a server. This function is used on the admin interface to
provide page-style navigation through the various Dapps the user has created; the left
and right arrow icons navigate between the Dapps, updating the relevant fields. This
ability is also demonstrated by the add and remove controls, which allow the user to add
new Dapps or variables and likewise remove them. The changes made by the user still
need to be saved. If the dapp.xml file were stored on the local file system this would be
easy through the use of XForms' built-in put submission; since we are running from a
server we need to submit the XML and use a servlet to save the changes. Although this
is not a perfect solution, XForms makes the submission asynchronously so the user is not
affected too much.
Figure 19: Admin interface showing Dapp configuration data.
There are some outstanding issues with the XPath navigation. The original intention was
that the XForm should be able to identify or generate the required XPath to a
specific element by inspecting the database output and the Dapp configuration file. However, the
XForm cannot handle groupings of data in this way, and some additional path information
needs to be added by the user. For example, we ideally want the user to be able to enter
any parameter that is available as either a search parameter or a styling parameter. This
information is taken from the Dapp and output XML documents stored on the server.
These elements only store the lowest-level element names and as such cannot be passed
as a useful XPath parameter, since we require the full path; in this case we need to first
access the parent element of Fundamentals. This is perhaps an oversight in the
design, but it can be rectified by adding more contextual
information to the dapp.xml file.
Figure 20: A slightly different presentation approach, where the width of the ring signifies a value.
It can be seen in the above illustration that two sets of submission controls are provided
to the user: one for the database query and one for the Data Styler. The SVG result is
loaded automatically from the server into a separate iframe; when changes are made to
the style sheet, the XForm waits until the submission is complete and then refreshes the frame
to update the graphic.
6. Testing
In order to test how effectively the design fulfils the requirements, the testing is
divided into three categories. Component testing and usability tests determine how well the
functional requirements are met; to test the initial hypothesis that a graphical interface
will be of advantage to an investor, speed and accuracy tests are carried out.
6.1. Component Testing
At the software level, unit tests were carried out on each component to ensure it
achieves the desired functionality. Each functional requirement is tested in turn to ensure
the final design satisfies the original specification.
6.2. Usability testing
After sufficient testing of the base components was completed, the user interface had to
be tested to determine how effective the design is in terms of usability and also to
determine whether the solution provides proof of concept.
To test the usability of the system an observational approach was taken, based on
Nielsen's five quality attributes [28]:
Learnability: How easy is it for users to accomplish basic tasks the first time they
encounter the design?
Test subjects were given no background on the program and were asked to try to
interact with it. They were also asked to describe what they were thinking and any
assumptions they had about the interface. The observer did not respond to any direct
questions at this point, in order to gauge how effective the interface was at
communicating functionality.
Efficiency: Once users have learned the design, how quickly can they perform
tasks?
After the initial tests, users were given the opportunity to ask questions to gain a better
understanding of the interface; they were then asked to repeat specific tasks in order to
assess how easy it was to perform particular functions.
Memorability: When users return to the design after a period of not using it, how
easily can they re-establish proficiency?
Test subjects were at this point asked to return to the program after a period of time in
order to assess how easy it was to remember the affordances of the interface.
Errors: How many errors do users make, how severe are these errors, and how
easily can they recover from them?
An observational approach was again taken to note any mistakes the user made and
their impact on the system.
Satisfaction: How pleasant is it to use the design?
Finally test subjects were asked on a scale of 1 to 10 how pleasant they felt the interface
was to use.
6.3. Speed & Accuracy Test
To test how well the system answers the initial problem, an assessment is made of how
well users can gain insight into the data represented by the system. Two factors were
investigated: speed and accuracy.
Experimental Set-up
A set of 100 shares representing the FTSE 100 was selected, the data set being a
snapshot of the market on a specific date. For each date, test subjects were asked
to identify a value in the set, first on the graphical interface and then on a plain text
representation of the same data.
The ordering of the symbols was changed between tests to ensure subjects did not
memorize the positioning of a particular stock. Subjects were timed to see how long it
took to identify a particular value and then assessed on how accurate they were.
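The re-ordering step between trials can be sketched as follows. This Python illustration is an assumption about how such shuffling could be done, not a description of the actual test procedure:

```python
import random

def reorder(symbols, seed):
    """Return the symbols in a fresh, reproducible order for the next
    trial, so subjects cannot rely on a stock's previous position."""
    rng = random.Random(seed)   # seeded so each trial's ordering can be reproduced
    shuffled = list(symbols)
    rng.shuffle(shuffled)
    return shuffled

trial_1 = reorder(["VOD", "BP", "HSBA", "GSK"], seed=1)
trial_2 = reorder(["VOD", "BP", "HSBA", "GSK"], seed=2)
```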
For the first two tests the relative size graphical icon was used. This representation
changes the size of the icon relative to a specified value.
Figure 21: Relative-size box graphic.
[28] http://www.useit.com/
In the first test users were asked to identify which icon they thought represented the
highest and lowest value in a collection.
For the second set of data, test subjects were asked to identify trends based on the daily
price movement. For this test the two-variable box representation was used.
Figure 22: Two-variable box graphic showing the relative difference.
Task: correctly identify the steepest upward and downward trends in a collection.
Figure 22 shows the basic two-variable icon; the slope of the line indicates the
difference between the specified variables.
To test the effectiveness of this design, users were asked to look at the graphic and
identify which stock they thought was falling the fastest and which they thought
was rising the fastest.
Decision times were recorded in all cases for comparison with the timings gained using a
text-only representation.
Figure 23: Two variables relative to a third.
Task: correctly identify the value whose indicator is nearest its highest and its lowest extreme.
Figure 23 shows one of the more complex icon designs. Similar in concept to the
candlestick diagram, it aims to show the direction and rate of change of the daily
price in relation to its year-to-date high.
As with the previous icon, the slope of the line indicates the rate of change and the colour
reinforces its direction. The position of the line in relation to its container box signifies
how close the current price is to the highest value it has reached over the past year.
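The placement of the line inside its container can be sketched as the current price expressed as a fraction of the year-to-date high (illustrative names; the actual calculation lives in the XSL style sheet):

```python
def line_position(current_price, year_high, box_height=100):
    """Vertical placement of the slope line inside its container box:
    the closer the current price is to the year-to-date high, the
    nearer the line sits to the top of the box."""
    fraction = current_price / year_high   # 1.0 means the price is at its year high
    return box_height * (1.0 - fraction)   # 0 = top of box, box_height = bottom

at_high = line_position(250.0, 250.0)   # sits at the very top of the box
half_way = line_position(125.0, 250.0)  # sits half-way down the box
```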
To test its effectiveness at communicating this information, users were again timed and
asked to identify the stock they thought was closest to its year-to-date high and the one
furthest away.
Figure 24 shows the text-only interface, which was implemented as a style sheet
template in order to keep the surrounding interface the same and change as few test
variables as possible. The above tasks were all repeated on this interface, again changing
the sorting order of the data to stop test subjects memorizing data locations.
Figure 24: Text-only representation of a variable.
6.4. Results
Usability Test.
Learnability: after observing a set of 5 subjects it became clear that more
contextual information was required for the controls. One test subject commented
that it was not immediately obvious what function some of the controls performed.
Another issue was the openness of some of the controls; for example, the zoom
control can be set to any value the user wants, and it is not immediately obvious
how large that will make the icons.
Efficiency: On an initial attempt with no instruction, some users had difficulty
working out what the controls did; however, after a quick demonstration most
could manipulate the data confidently.
Memorability: After a day the users were asked to return to the interface and try
out some basic tasks to see how easy they were to repeat. Most users achieved this
task successfully; the main difficulty seemed to lie in the initial usage of the
interface.
Errors: The most common errors the users made were either to compare
parameters not suited to any logical comparison or to select
scales that caused excessive distortion of the graphics. The first issue is
hard to rectify: since the user can define any data source, an assumption is made
that they will pick resources suitable for comparison. The second issue can be
rectified by adding stricter limits to the interface.
Satisfaction: The overall satisfaction rating was 6 out of 10 from our 5 test
subjects. There is evidently room for improvement in the interface; however, some
of the test subjects had no prior knowledge of stock market trading, and as such
the overall purpose and context of the application was new to them.
Speed-Accuracy Test.
The results of the speed and accuracy test were more promising with 80% of the test
cases the users speed of decision-making was faster than using a test only interface.
The accuracy figures were however less conclusive, with both textural and graphical
accuracy rates of 60%. We would expect the accuracy rates to be around or lower for
the graphical representations because they are not as definite as numerical figures.
The test group could have been larger, and more testing in this area is needed to
draw definite conclusions on the effectiveness of the interface; however, the initial
results tend to support the hypothesis that a graphical system is better in terms of gaining
quick insights into large sets of data.
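The 80% figure corresponds to the fraction of trials in which the graphical interface was faster; the calculation can be sketched as follows, with made-up timing values purely for illustration:

```python
def faster_fraction(graphic_times, text_times):
    """Fraction of trials in which the graphical interface produced a
    faster decision than the text-only interface."""
    faster = sum(1 for g, t in zip(graphic_times, text_times) if g < t)
    return faster / len(graphic_times)

# Hypothetical per-trial timings in seconds -- not the real test data.
graphic = [3.1, 4.0, 2.5, 6.2, 3.8]
text = [5.0, 3.5, 4.1, 7.0, 6.0]
share_faster = faster_fraction(graphic, text)  # 4 of 5 trials faster
```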
7. Conclusion
The application provides a basic answer to the initial requirements, albeit a simplified one,
but could easily be extended to give a wider range of functions. In its current state it
demonstrates that, by using the available web standards, a flexible system can be
developed which allows data to be retrieved, transformed and represented on the web.
Our test results indicate that the system can effectively impart large amounts of
information quickly to the viewer; however, further work is required to improve the user
interface, mainly in the area of contextual information.
To expand the system, a user can easily add any content they like, provided Dapper.net
can extract it successfully. There are limitations to the data that can be viewed and the
graphical icons that are displayed. Going forward, it would be beneficial to provide
another interface which allows users to create icons based on the retrieved data,
giving personalized graphical representations.
8. Bibliography
Cleveland, William S.: Visualizing Data. Murray Hill, N.J.: AT&T Bell Laboratories; Summit, N.J.: Hobart Press, 1993.
Ellinger, A. G.: The Art of Investment, 3rd rev. ed. Bowes and Bowes, 1971.
Harris, Robert L.: Information Graphics: A Comprehensive Illustrated Reference.
New York: Oxford University Pr