PARSE SOFTWARE DEVICES © Copyright 2007 PARSE Software Devices, all rights reserved. Getting,...

PARSE SOFTWARE DEVICES © Copyright 2007 PARSE Software Devices, all rights reserved. Getting, Managing, and Analyzing Stock Market Information with FreeBSD Using a combination of open source and custom tools for fun and profit Presented by: Robert Krten [email protected]


Getting, Managing, and Analyzing Stock Market Information with FreeBSD

Using a combination of open source and custom tools for fun and profit

Presented by: Robert Krten


Introduction

This presentation shows:
– how to download a large variety of equity and option data from various sources on the internet,
– how to manage the data (parsing, archiving, etc., currently >2 GB), and finally,
– how to present the data to applications with a focus on efficiency and access speed.


Introduction

Public domain / open source tools like curl and lynx are highlighted, as well as my own custom tools.

The entire database schema is presented, and then the use of mmap() is shown for complete efficiency.


Big Picture – Data Gathering / Storage

[Diagram: data comes from four sources (telenium.ca via http, m-x.ca via lynx, americanbulls.com via curl, eoddata.com via curl) and passes through the processing tools (stockproc, stockstore, lynx | sed, htmlparse) into three flat ASCII databases (stock, options, recommendations). regen_stockdb then builds the master binary database from them.]


Big Picture – Data Consumers

[Diagram: the master binary database feeds three clients, each through the stockdb library:]
– opt: stock quotes, option chain selection, recommendations printouts
– autotrader: automated trading analysis
– sfa: graphing, trending, filtering, etc. (xgraph output compatible)


Data Storage – Top Level Directory Tree

/data
    dynamic
        financial
            americanbulls  amex  mse  tse  vse  …
    generated
        financial
            binary database (350 MB as of Jan ’07, 569 MB as of Apr ’07)

We’ll examine the americanbulls, mse, and all other directories over the next three slides.


Data Storage – American Bulls

[Directory tree: americanbulls, with per-exchange directories (tse, nyse, …) and one directory per day:]
20070111: a.html.gz, a.txt, …, z.html.gz, z.txt
20070112: a.html.gz, a.txt, …, z.html.gz, z.txt, plus a-w.html.gz, a-w.txt, …, z-w.html.gz, z-w.txt

In the americanbulls directory, I store data based on the day that it was acquired. I store both the raw html file as captured from the website, as well as the processed text file. Notice on 20070112, which was a Friday, I stored the weekly data (as identified by a -w suffix).


Data Storage – Options

[Directory tree under mse:]
ARCHIVE/2007/01 (also 2006, …): backup (source) files
    A.20070101.html, A.20070102.html, A.20070103.html, A.20070104.html, …, ZXG.20070109.html, ZXG.20070110.html, ZXG.20070111.html, ZXG.20070112.html
a, abx, abz, ace, …, zxg: individual call/put data files for the given underlying equity
    a/aa.06, a/aa.07, a/ae.07, a/af.07, a/ag.06, a/ag.07, …, xf.06, xg.06, xl.06, xn.06, xp.06

The names are based on the standard strike names at the exchange. So, “ag.06” is a January call (“a” indicates January and call), “g” indicates the price (via a table lookup), and “06” indicates the year.


Data Storage – Stocks

[Directory tree: tse, with subdirectories a, b, …, y, z, containing files such as BA.UA, BA.UN, BAA, BAC.UA, BAC.UN, …]

Individual stocks are stored in an uppercase filename corresponding to the stock symbol. One line of text per day, containing the stock trading data.

Fields: Date Symbol Open High Low Last Change Volume #trades Date Full Name

20070413 ABX 33.120 33.480 32.820 33.360 0.000 2689766 1 20070413 Barrick Gold Cp


Data Sources

I get data from the following:
– www.telenium.ca (free)
  • end of day data for TSE and VSE (not any more)
– www.eoddata.com (subscription)
  • end of day data for most exchanges
  • not used any more, unreliable splits
– www.americanbulls.com (subscription)
  • recommendations for AMEX, NASDAQ, NYSE, TSE, and VSE
– www.m-x.ca (free)
  • options for TSE-traded equities


Data Timing

Each data source is ready at a different time:
– telenium.ca – 18:00 CST
– eoddata.com – 21:00 EST
– m-x.ca – no specified time, but after 18:00 EST works
– americanbulls.com – no specified time, but after 21:30 EST works

So, the first tool I needed is “waituntil”, which just waits until the specified time. E.g., “waituntil 21:00”


telenium.ca

These guys have a fairly straightforward data format:

Which we can parse easily with a custom C program (this is for historical reasons)


eoddata.com

Dead simple. It stores it in CSV format!

Fields: Symbol,Date,Open,High,Low,Close,Volume

NA,20070112,63.93,64.46,63.77,63.98,353106
NA.NT.J,20070112,9.75,9.75,9.75,9.75,500
NA.NT.K,20070112,11.6,11.6,11.6,11.6,1800
NA.PR.K,20070112,27.69,27.69,27.45,27.45,5900
NA.PR.L,20070112,25.82,25.9,25.8,25.9,9800
NAC,20070112,1.54,1.55,1.54,1.55,22000
NAE.UN,20070112,11.72,12.13,11.7,11.91,515736
NAL.UN,20070112,25.85,26,25.8,25.85,72903

Unfortunately, their splits information is not reliable, so I don’t use them any more.


eoddata.com

FTP can be easily accessed with curl (this is all one line):

curl -s -S -u $eoduser:$eodpass -O "ftp://ftp.eoddata.com/${I}_$year2$month$day.txt"

The variables are:
$eoduser – the username
$eodpass – the password
$I – the uppercase exchange name (e.g., “TSE”)
$year2 – a two digit year (“07” for 2007)
$month – a two digit month (“05” for May)
$day – a two digit day (“22” for the 22nd)

This is wrapped in a for-loop in the shell…


eoddata.com

exchanges=(amex cbot nasd nymex nyse tse vse)

ftpcmd="curl -s -S -u $eoduser:$eodpass"

for i in "${exchanges[@]}"; do
    # uppercase the exchange name for the remote file name
    I=`echo $i | tr "[:lower:]" "[:upper:]"`
    # append one "-O url" pair per exchange
    ftpcmd="$ftpcmd -O ftp://ftp.eoddata.com/${I}_$year2$month$day.txt"
done

# run the single accumulated curl command
$ftpcmd

This simply iterates across all the exchanges, and constructs one large command string which is executed at the end. I concatenate all the FTP requests into one string so that a single curl invocation fetches every file, rather than starting curl once per exchange. (We’ll look at the curl command line options shortly.)


americanbulls.com

There’s a fair bit of data here:


americanbulls.com

I want just the analysis data:

– symbol name
– number of stars (strength of recommendation)
– recommendation (buy, sell, buy-if, etc.)
– front page recommendations (top stocks)

(note added 20070613): I now get the stock data from here as well, as eoddata.com has proved unreliable in their TSE/VSE split info.


americanbulls.com

Much harder to work with:
– since it’s subscription based, need to log in
– need to suck up 6 exchanges with 26 (a-z) data sets (156 total)
– need to get the “weekly” recommendations, but only on Friday (doubles the amount of data from normal weekday to 312)
– data format needs massaging
– it’s not quite a “stock” nor is it an “option”, it’s a third type…


americanbulls.com

Logging in – with curl (all one line):

curl -s -S -L -c $root/americanbulls/cookies.txt -d "zUserID=USERNAME&zPassword=PASSWORD&CookieAtayimmi=1&submit1" http://www.americanbulls.com/Check.asp

This logs us in for the session. Let’s look at the -s, -S, -L, -c, and -d options (we used some of them in the eoddata.com example too)…


americanbulls.com

Option Meaning

-s silent operation (I don’t want all the verbosity cluttering up my cron logfile)

-S show errors (overrides -s, but only if there is something wrong)

-L follow HTTP redirects (including to different servers)

-c file in which to store received cookies (the cookie jar)

-d send data as if user had sent it via POST


americanbulls.com

Fetch one set of data from one exchange:

curl -s -S -m 120 -L -b $root/americanbulls/cookies.txt "http://www.americanbulls.com/members/StockList.asp?MarketTicker=$I&Tick=$J" >$base.html

$I is the exchange name, e.g. “TSE”

$J is the starting letter, e.g., “N”

$root is the financial information directory base

$base is the name of the html file


americanbulls.com

Finally, we convert the HTML to clean text:

lynx -dump -width=200 $base.html | sed -n "/Ticker.*Description/,/^$/s/ *\[[0-9][0-9]*]//p" >$base.txt

The result of this is (e.g., “n.txt” file):

NA NATIONAL BANK OF CANADA [Star4.gif] 63.8300 63.9300 64.4600 63.7700 63.9800 0.23% 353,106 BUY-IF

NAC NORTH ATLANTIC RESOURCES LTD [Star2.gif] 1.5600 1.5400 1.5500 1.5400 1.5500 -0.64% 22,000 BUY-IF

NAEu NAL OIL AND GAS TRUST TR .. [Star3.gif] 11.7500 11.7200 12.1300 11.7000 11.9100 1.36% 515,736 BUY-IF

NALu NEWALTA CORP INCOME FUND T.. [Star4.gif] 25.8700 25.8500 26.0000 25.8000 25.8500 -0.08% 72,903 BUY-IF

NApK NATIONAL BANK OF CANADA N-.. 27.4900 27.6900 27.6900 27.4500 27.4500 -0.15% 5,900 WAIT

NApL NATIONAL BANK OF CANADA N-.. 25.9000 25.8200 25.9000 25.8000 25.9000 0.00% 9,800 WAIT

NAS NORTHSTAR AEROSPACE INC [Star2.gif] 4.8500 4.8500 4.8500 4.8500 4.8500 0.00% - SELL-IF

NB NORTHBRIDGE FINANCIAL CP [Star2.gif] 30.5300 30.4500 30.8800 30.4500 30.5800 0.16% 49,036 HOLD

NBD NORBORD INC [Star3.gif] 9.2900 9.3500 9.9100 9.2500 9.7000 4.41% 1,020,875 HOLD

NCFu NORCAST INCOME FUND UTS [Star3.gif] 7.0000 7.0100 7.0300 7.0100 7.0300 0.43% 4,175 WAIT

NCS NUCRYST PHARMACEUTICALS CP 6.3000 6.3000 6.3000 6.2400 6.2500 -0.79% 1,320 SELL CONF

This gives me all the information I need: the ticker, the number of stars, and the recommendation.


americanbulls.com

Also, I get the “front page” recommendations:

This information is parsed using the htmlparse utility (discussed later) and stored as a text file alongside the other American Bulls data as fp.txt and fp-w.txt (for weekly data).


m-x.ca

Finally, the options data from the Montreal Exchange:


m-x.ca

Unfortunately, the HTML that contains the information is frightfully ugly – the instrument name is buried!

<tr class='paire' title="header=[07&nbsp;JA&nbsp;27.500 ABXAA (27.50)] body=[Open Interest: 208] delay=[5] fade=[off]">

<td><a href="nego_cotes_in_en.php?symbol=ABX&insSymbol=ABXAA27.50&fixeDate=2007-01-12&#cote" class="lienblack">+&nbsp;07&nbsp;JA&nbsp;27.500</a></td>

<td>6.600</td>

<td>6.750</td>

<td>6.550</td>

<td>0</td>

<td>73.35</td>


m-x.ca

So, the solution was a custom HTML parser (called “htmlparse”), which constructs an internal tree representation of the HTML document.

Then I just walk along the document, picking out items of interest.

1,600 lines of C, not outrageous, but not trivial either


So what?

Well, at this point, we’ve:
– fetched all the data
– stored all the data in text files
– archived all the temporary raw (html) data files

So now we’re ready to convert it to binary for the master database.

If we didn’t convert it to binary, it would take well over 10 minutes to load the entire database each time! (AMD64 3300+)


The binary database

After all the data is loaded, I run “regen_stockdb” as the final executable. It’s responsible for parsing all the text files and loading them into the binary database.

It takes about 10 minutes.

[Diagram: the flat ASCII stock, options, and recommendations databases feed regen_stockdb, which produces the master binary database.]


Why Flat ASCII?

I get asked this a lot. Here’s a bunch of answers:

– you can’t grep a database
– you can’t vi/sed a database to repair it
– regenerating my databases is trivial from ASCII
– ASCII is much more “portable” than a given database’s representation of the data

and…
– I don’t like databases

(we can discuss that one over beer after the conference)


The Binary Database

In spite of all those “fine” reasons, I need my data in a binary database for speed of access.

The next set of slides covers the schema for the database, and then I’ll talk about the implementation (making it fast).


Database Schema – Top Level

The topmost entity is the stock database:

typedef struct
{
    exchange_t *exchanges;
    int nexchanges;
    int map_fd;
    void *map_ptr;
    int map_size;
} stockdb_t;

Ignoring the map information, this basically is a container that tells us how many exchanges worth of information are contained in the entire database (the nexchanges member) and gives us a pointer to the data (the exchanges member).


Database Schema – Exchange Level

The next topmost entity is an exchange:

typedef struct
{
    char name [EXCHANGE_NAME_SIZE + 1];
    int ninstruments;
    int ninstrumentsalloc;
    instrument_t *instruments;
} exchange_t;

In each exchange, we store the name (e.g., “tse”, “nasdaq”) and the number of instruments (the ninstruments member) as well as a pointer to their data (the instruments member). The ninstrumentsalloc member is there to allow more efficient allocation/reallocation of the instruments array.


Database Schema – Instrument Level

typedef struct
{
    char name [SYMBOL_NAME_SIZE + 1];
    int noptions [2];
    int noptionsalloc [2];
    strike_t *options [2];
    int ntrades;
    int ntradesalloc;
    one_trading_datum_t *trades;
} instrument_t;

An instrument can have an equity flavour (the trades and ntrades members) and/or an options flavour (the noptions and options members). They are stored together because the root symbol is the same.


Database Schema – Trading Datum Level

typedef struct
{
    time_t date;
    float open;
    float high;
    float low;
    float close;
    float change;
    int volume;
    int ntrades;
    int abflags;
    int abstars;
} one_trading_datum_t;

This is the record for one day’s worth of trade data. To simplify storage and lookup, the American Bulls recommendations are stored as two members, the abflags and abstars, directly in-line with the trading data (this stores both the daily and weekly information in different bit positions).


American Bulls Data

[Diagram: abflags and abstars each hold a weekly field and a daily field.]

abflags values:
AB_UNKNOWN    0     (no data available)
AB_BUY_IF     1     (watch this stock in preparation to buying)
AB_BUY_CONF   2     (buy this stock)
AB_SELL_IF    3     (watch this stock in preparation to selling)
AB_SELL_CONF  4     (sell this stock)
AB_HOLD       5     (hold this stock)
AB_WAIT       6     (do nothing)
AB_FRONTPAGE  0x80  (bitflag, appears on the front page)

abstars: valid values are 0 through 5, inclusive.

We use 16 bits of the integer; the top 8 bits of that are the weekly data, and the bottom 8 bits are the daily data. Weekly data is only available on Fridays.


Database Schema – Strike Level

typedef struct
{
    char name [STRIKE_NAME_SIZE + 1];
    time_t expiration;
    float strike;

    int ndatums;
    int ndatumsalloc;
    one_strike_datum_t *datums;
} strike_t;

Finally, this is the root datum for a strike. It consists of the name of the strike, its expiration, the strike price, and then an array of strike data (the datums array)…


Database Schema – Strike Datum Level

typedef struct
{
    time_t date;
    int bid_size;
    float bid;
    float ask;
    int ask_size;
    float last;
    float change;
    int volume;
    float close;
    int open_interest;
    float volatility;
} one_strike_datum_t;

And this is the data as it pertains to one strike on one day.


Database Schema – Big Picture

[Diagram: stockdb_t holds nexchanges entries in exchanges (e.g., “tse”). Each exchange holds ninstruments entries in instruments (e.g., “bns”). Each instrument holds ntrades entries in trades (date, open, high, low, last, change, volume, abflags, abstars), which carry the equities and recommendations data, plus the options data in options [TYPE_CALL] and options [TYPE_PUT], counted by noptions [0] and noptions [1]. Each strike (e.g., “jj”; name, expiration, strike) holds ndatums entries in datums (date, bid_size, bid, ask, ask_size, last, change, volume, close, open_interest, volatility).]


Database – In-Memory View

[Diagram: the in-memory layout. The stockdb_t (nexchanges, exchanges [0], exchanges [1], …) points to exchange records (name, ninstruments, instruments [0], instruments [1], …), which point to instrument records (name, noptions [0], noptions [1], ntrades, options [0][0] through options [1][1], trades [0], trades [1], …). The blocks sit at scattered heap addresses such as 0x8e7bf000, 0x8e7cd8e0, 0x8e7fa930, and 0x8f00d8e0.]


Database – In-Memory View

This is what the process address space looks like after we’ve read the entire text-based database into memory. The fragmentation has resulted from the fact that we are constantly realloc()ing and malloc()ing memory to grow our dynamic arrays.

This swiss-cheese effect is really not what you want to write to disk for other applications to use.

So, we need to convert the data…


Database – Coalesced (Disk) View

[Diagram: one contiguous region containing the header followed by the stockdb_t, exchange_t, instrument_t, one_trading_datum_t, and one_strike_datum_t records.]

By counting up all of the individual pieces of data, and allocating one contiguous memory region (and then subdividing it internally) we can coalesce all the data into one linear chunk.

This is great, because it means that we can then simply write() this to disk, and, even better, we can mmap() it back into the address space of the process that needs to use the data.

Yes, we are writing data with pointers to disk. The header contains administrative info (date compiled).


32 bits vs 64 bits

32-bit VA is dead!
– I already have 2 applications that are at the limit of 32-bit virtual address space, and are expected to exceed that shortly.


Loading the Database

So, how do we get the data?

We stored the data at a special location:

0x123400000000

64 bit processors are great! There’s “infinite” address space available, so you can just pick an address and go with it.

(Ok, I have thought about this a little more than just that.)


Loading the Database

So, to load the database:
– calculate size (use stat() to get the file size)
– mmap() the file at the magic address of 0x123400000000
– initialize a pointer to point to the mmap()’d file
– and you’re off to the races!


Loading the Database

void
stockdb_open_database (stockdb_t *db)
{
    binary_database_header_t *header;

    calculate_database_size (db);
    map_database (db, PROT_READ, MAP_NOCORE | MAP_NOSYNC | MAP_PRIVATE);
    header = (binary_database_header_t *) db -> map_ptr;

    // actual data starts after the header
    db -> exchanges = (exchange_t *)
        ((uint8_t *) header + BINARY_DATABASE_HEADER_SIZE);
    db -> nexchanges = header -> nexchanges;
}

This is the client API call; you pass it a stockdb_t and it fills it in, giving you access to the database.


Loading the Database

static void
calculate_database_size (stockdb_t *db)
{
    struct stat s;

    if (stat (DATABASE_PATH, &s) == -1) {
        // squawk an error out and
        exit (EXIT_FAILURE);
    }
    db -> map_size = s.st_size;
}

This is really just an error checking cover function for the stat() function.


Loading the Database

static void
map_database (stockdb_t *db, int prot, int flags)
{
    if ((db -> map_fd = open (DATABASE_PATH, O_RDWR)) == -1) {
        exit (EXIT_FAILURE); // prints an error too…
    }
    if ((db -> map_ptr = mmap (MAP_ADDRESS, db -> map_size, prot,
                               flags, db -> map_fd, 0)) == MAP_FAILED) {
        exit (EXIT_FAILURE); // prints an error too…
    }
    if (db -> map_ptr != MAP_ADDRESS) {
        // @@@ for now, just exit. In the future, do a fixup
        exit (EXIT_FAILURE); // prints an error too…
    }
}


Client Example

Here’s how simple it is to use:

#include <stdio.h>
#include <parse/stockdb.h>

stockdb_t db; // THE database

int
main (int argc, char **argv)
{
    int i;

    stockdb_open_database (&db);
    printf ("There are %d exchanges: ", db.nexchanges);
    for (i = 0; i < db.nexchanges; i++) {
        printf ("%s ", db.exchanges [i].name);
    }
    printf ("\n");
    return 0;
}


Clients

There are three main clients for the database:
– opt
  • retrieves quotes, selects options (xgraph or text)
  • shows americanbulls data
– autotrader
  • analyzes historical data
  • picks stocks based on that data
  • performs simulated trades (with very conservative requirements; only uses “open/close” information)
  • functions daily or weekly
– sfa
  • massages stock data and outputs to xgraph


opt

Here are some examples of using opt:

– Get last value of Bank of Nova Scotia on TSE:

[amd64@ttyp0] opt -xtse -q -sbns
bns 2007 01 12 50.75 51.14 50.67 50.93 0.38 1083267 3955

– Get last 3:

[amd64@ttyp0] opt -xtse -Q3 -sbns
bns 2007 01 10 51.65 51.65 50.67 50.72 -0.81 1501561 4211
bns 2007 01 11 50.72 51.09 50.45 50.55 -0.17 2059363 4557
bns 2007 01 12 50.75 51.14 50.67 50.93 0.38 1083267 3955

– If you add “-a”, it also prints out the americanbulls recommendations on the side (data omitted to fit):

[amd64@ttyp0] opt -xtse -Q3 -sbns -a
bns 2007 01 10 …omitted… WAIT (3 star)
bns 2007 01 11 …omitted… WAIT (3 star)
bns 2007 01 12 …omitted… BUY-IF (3 star), WK WAIT (3 star)
Daily occupancy 0 days out of 3 (0.00%), weekly 0 out of 1 (0.00%)

– But wait, that’s not all
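The quote lines above are easy to consume from other programs. The sketch below parses one line with sscanf(). The column meanings are an interpretation of the sample output, not something the slide states: symbol, date, then what look like open, high, low, close, change from the previous close, and volume. The final column is left unnamed because the slide does not identify it. quote_t and parse_quote() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record; column meanings are inferred from the sample output. */
typedef struct {
    char   symbol [16];
    int    year, month, day;
    double open, high, low, close;
    double change;       /* appears to be close minus previous close */
    long   volume;
    long   last_column;  /* meaning not stated on the slide */
} quote_t;

/* Parse one line of quote output; returns 1 on success, 0 on failure. */
int
parse_quote (const char *line, quote_t *q)
{
    int n;

    assert (line != NULL && q != NULL);
    n = sscanf (line, "%15s %d %d %d %lf %lf %lf %lf %lf %ld %ld",
                q->symbol, &q->year, &q->month, &q->day,
                &q->open, &q->high, &q->low, &q->close,
                &q->change, &q->volume, &q->last_column);
    return (n == 11);
}
```

The "change" reading is consistent with the three-day sample: 50.93 on 2007 01 12 minus 50.55 on 2007 01 11 is 0.38.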


opt

Find all call options priced under $2.00 with a bid/ask spread of less than $0.20, more than 180 days to expiration, and leverage above 2000%:

[amd64@ttyp0] opt -o -D+180 -P-2.00 -S-0.20 -L+2000

tse:bmo:gd [EXP 2007 07 21 (188 days) STRIKE $76.00 CALL] close $69.00:
YYMMDD CLOSE    DELTA  BID   ASK   AVERG DELTA CORR
------ -------- ------ ----- ----- ----- ----- -----
070112    69.00   0.50  0.45  0.55  0.50  0.05   10%
NSAMP (up 444 + down 669) 1113 SUM (up 66.8994 + down 174.154) 241.053
corr (up 15.07% down 26.03%) = 21.66% leverage 2077.63%

tse:cm:gd [EXP 2007 07 21 (188 days) STRIKE $120.00 CALL] close $98.41:
YYMMDD CLOSE    DELTA  BID   ASK   AVERG DELTA CORR
------ -------- ------ ----- ----- ----- ----- -----
070112    98.41   0.01  0.15  0.25  0.20  0.00    0%
NSAMP (up 288 + down 271) 559 SUM (up 12.4133 + down 17.3214) 29.7347
corr (up 4.31% down 6.39%) = 5.32% leverage 2391.61%

... lots more were found ...
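The command line above appears to use a sign prefix to pick the comparison: -D+180 for more than 180 days, -P-2.00 for a price under $2.00. Assuming that convention (the slide does not spell it out), a filter argument could be parsed and applied as in this sketch. threshold_t and both functions are hypothetical names, not opt's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical threshold: '+' means "greater than", '-' means "less than". */
typedef struct {
    int    greater;  /* 1 for "+N", 0 for "-N" */
    double limit;
} threshold_t;

/* Parse an argument such as "+180" or "-2.00"; returns 1 on success. */
int
threshold_parse (const char *arg, threshold_t *t)
{
    char *end;

    assert (arg != NULL && t != NULL);
    if (arg [0] != '+' && arg [0] != '-') {
        return (0);
    }
    t->greater = (arg [0] == '+');
    t->limit = strtod (arg + 1, &end);
    /* Reject an empty number or trailing junk. */
    return (end != arg + 1 && *end == '\0');
}

/* Test a value against the threshold; returns 1 if it passes. */
int
threshold_match (const threshold_t *t, double value)
{
    assert (t != NULL);
    return (t->greater ? value > t->limit : value < t->limit);
}
```

Applied to the first result above, 188 days passes +180, an option price of 0.50 passes -2.00, and the 0.10 spread (0.55 ask minus 0.45 bid) passes -0.20.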


autotrader

Some sample output:

BACKTEST (weekly)

Backtesting for a period of 365 days, using a depth of 120 days

Analyzing up to 10 top rated stocks

Holding a maximum of 50 stocks

Stock dollar volume during last 10 days must average $2000000 -> and up

Stock price during last 10 days must average $10 -> and up

Investing $30000 at a time

Stop loss set at 0 of initial purchase price

Maximum gain set at 9.99 of initial purchase price

Ratchet set at 0

Pop factor set to 1.1

Only stocks with a return between 1.01 and 9.99 over the depth period are considered

The 10-th stock in the list must have a return of at least 0, otherwise we scrub that day

Recommended stocks must have between 4 and 5 stars

We skip 0 stocks in the recommended list before processing

(items in red are configurable parameters)
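The configurable parameters listed above map naturally onto a settings structure. The sketch below holds a few of them and applies the three screening rules the output describes (average dollar volume, average price, and return over the depth period). Field and function names are hypothetical, and treating the bounds as inclusive is an assumption; the slide does not show how autotrader stores or compares these values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical settings; the defaults are copied from the sample output. */
typedef struct {
    int    backtest_days;      /* 365 */
    int    depth_days;         /* 120 */
    int    max_holdings;       /* 50 */
    double min_dollar_volume;  /* 2000000, averaged over the last 10 days */
    double min_price;          /* 10, averaged over the last 10 days */
    double min_return;         /* 1.01 over the depth period */
    double max_return;         /* 9.99 over the depth period */
    double invest_amount;      /* 30000 per purchase */
} autotrader_cfg_t;

static const autotrader_cfg_t default_cfg = {
    365, 120, 50, 2000000.0, 10.0, 1.01, 9.99, 30000.0
};

/* Return 1 if a stock passes the volume, price, and return screens. */
int
stock_qualifies (const autotrader_cfg_t *cfg, double avg_dollar_volume,
                 double avg_price, double depth_return)
{
    assert (cfg != NULL);
    return (avg_dollar_volume >= cfg->min_dollar_volume
            && avg_price >= cfg->min_price
            && depth_return >= cfg->min_return
            && depth_return <= cfg->max_return);
}
```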


autotrader

Our story begins with $100000 in cash...

DATE          NSHARES     @ COST  AMOUNT,CASHLFT     NETWRTH  GAIN/LOSS
20060901 BUY   820 ncx @  36.23   29708,  70262, #1,   99971
20060915 BUY  2280 sw  @  13.15   29982,  40212, #2,  100206
20060922 BUY   300 rim @  96.96   29088,  11095, #3,  101462
20060922 SELL  820 ncx @  35.47   29036,  40151, #2,  101433  -623.20 (-2.10%)
20061006 BUY  2040 hbm @  14.64   29865,  10224, #3,  106275
20061020 SELL  300 rim @ 123.94   37164,  47377, #2,  115629  8094.00 (27.83%)
20061027 SELL 2280 sw  @  13.63   30939,  78385, #1,  117411  1094.40 ( 3.65%)
20061103 BUY   220 rim @ 131.48   28925,  49431, #2,  117688
20061117 SELL 2040 hbm @  17.37   35312,  84804, #1,  118480  5569.20 (18.65%)
20061201 BUY  1740 fnx @  17.17   29875,  54876, #2,  119002
20061201 SELL  220 rim @ 155.68   34236,  89097, #1,  118973  5324.00 (18.41%)
20061222 BUY  1990 sxr @  15.04   29929,  59108, #2,  119470
20061229 BUY  2260 lim @  13.25   29945,  29095, #3,  122670
20070105 SELL 1740 fnx @  16.93   29353,  58501, #2,  115250  -417.60 (-1.40%)
20070112 BUY  2500 cux @  11.99   29975,  28451, #3,  116240

Still holding the following:
 2500 cux (@$ 11.99) -> 29975 GAIN     0.00 ( 0.00%)
 1990 sxr (@$ 14.30) -> 28457 GAIN -1472.60 (-4.92%)
 2260 lim (@$ 12.99) -> 29357 GAIN  -587.60 (-1.96%)

cash left 28451.40, #active 3, net worth 116240.80
(change 16.24% over 135 days or 50.21% APR)
Total commissions $ 739.80 over 15 trades (avg $49.32/trade)
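The GAIN/LOSS column can be reproduced from the buy and sell prices. The sketch below computes the dollar gain as shares times the price difference, and the percentage relative to the purchase price, which matches the rows above (for example, 820 ncx bought at 36.23 and sold at 35.47 gives -623.20, or -2.10%). Commissions are left out because the slide does not say how they are charged; the function names are hypothetical.

```c
#include <assert.h>

/* Dollar gain (or loss, if negative) on a closed position. */
double
trade_gain (long shares, double buy_price, double sell_price)
{
    assert (shares > 0);
    return ((double) shares * (sell_price - buy_price));
}

/* Gain as a percentage of the purchase price. */
double
trade_gain_pct (double buy_price, double sell_price)
{
    assert (buy_price > 0.0);
    return (100.0 * (sell_price - buy_price) / buy_price);
}
```

The APR on the summary line appears consistent with compounding the period return: 1.1624 raised to the power 365/135 is about 1.5021, that is, 50.21%.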


Conclusion

• This is an ongoing, multi-year project.
• I use open source / public domain tools where I can (and when I know about them).
• I write my own tools when I need to or want to.
• Suggestions welcome!


Contact Info

Check out my other projects and website:
– www.parse.com/~rk
– www.parse.com/~museum

Email:
– [email protected]

Slides available at: www.parse.com/slides for 2 months after presentation.