A Guided Tour of Machine Learning for Traders by Tucker Balch at QuantCon 2016

A GUIDED TOUR OF MACHINE LEARNING FOR TRADERS

TUCKER BALCH, PH.D. PROFESSOR, GEORGIA TECH CO-FOUNDER AND CTO, LUCENA RESEARCH

WHO THIS IS FOR People who are… •  familiar with quantitative techniques •  interested in what’s under the “hood” of ML techniques. •  No Machine Learning knowledge assumed.

ABOUT THE SPEAKER •  Professor of Interactive Computing at the Georgia Institute of Technology. •  Teaches courses in Artificial Intelligence and Finance. •  Teaches MOOCs on Machine Learning for Trading. •  Author of over 120 research publications related to Robotics and Machine Learning. •  Co-founder of Lucena Research.

ABOUT MY COURSE

ABOUT LUCENA RESEARCH •  Fin-tech company that employs experts in Computational Finance, Quantitative Analysis, and Software Development. •  We deliver investment decision support technology to hedge funds and wealth managers: price forecasting, hedging, ML-based stock screening, model portfolios. •  Python-based infrastructure. •  http://lucenaresearch.com

TALK OVERVIEW •  Machine Learning: Big Picture •  Decision Trees: Classification •  Decision Trees: Regression •  Decision Trees Example: Sentiment-based strategy •  kNN: Classification •  kNN: Regression •  Reinforcement Learning

THE BIG PICTURE “Machine Learning” goes by many names: •  Machine Learning •  Big Data •  Predictive Analytics Focus: Supervised Learning •  Start with examples: Factor values & outcomes •  Build model from examples •  Use model to predict outcomes

HOW TO BUILD A PREDICTIVE MODEL Factors (X1, X2, … XN) Predict outcome: Y Classification: One of several outcomes Regression: Numerical outcome Lots of methods solve same problem •  kNN •  Decision Trees •  Support Vector Machines (SVM) •  Artificial Neural Networks (ANN) •  Deep Learning

WHO SHOULD I VOTE FOR?

PREDICT VOTING BEHAVIOR Factors: •  Do you believe the country is “broken”? •  If so, what caused the country to become broken? •  Where do you stand on a woman’s right to choose? •  What are your religious views? Outcomes: •  Trump •  Clinton •  Cruz •  Sanders •  Kasich

PREDICT VOTING BEHAVIOR Model: Decision Tree

PREDICT STOCK BEHAVIOR Model: Decision Tree

TREES ALSO WORK FOR REGRESSION Model: Decision Tree

LOTS OF TREES = FOREST

HOW TO BUILD A TREE •  Gather data <X1, X2, X3, Y> •  Find most predictive factor Xi of Y •  Find threshold Ti that splits data most effectively •  Decision node: Xi < Ti? (Left tree: Xi < Ti; Right tree: Xi >= Ti) •  Recurse until only one data item is left: Leaf
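The tree-building steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the slide does not say how "most predictive" or "most effectively" is measured, so the sketch assumes the best split is the one that minimizes the summed squared error of Y on both sides. The names `sse`, `best_split`, `build_tree`, `query`, and the `leaf_size` parameter are hypothetical, and the data is made up.

```python
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(X, Y):
    """Return (factor index i, threshold t) with the lowest error, or None."""
    best, best_err = None, None
    for i in range(len(X[0])):
        values = sorted(set(row[i] for row in X))
        # Candidate thresholds: midpoints between adjacent distinct values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(X, Y) if row[i] < t]
            right = [y for row, y in zip(X, Y) if row[i] >= t]
            err = sse(left) + sse(right)
            if best_err is None or err < best_err:
                best, best_err = (i, t), err
    return best

def build_tree(X, Y, leaf_size=1):
    """Recurse until leaf_size or fewer items remain (assumed stopping rule)."""
    split = best_split(X, Y) if len(Y) > leaf_size else None
    if split is None:
        return {"leaf": mean(Y)}
    i, t = split
    left = [(r, y) for r, y in zip(X, Y) if r[i] < t]
    right = [(r, y) for r, y in zip(X, Y) if r[i] >= t]
    return {
        "factor": i,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], leaf_size),
        "right": build_tree([r for r, _ in right], [y for _, y in right], leaf_size),
    }

def query(tree, x):
    """Follow decision nodes (Xi < Ti goes left) until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["factor"]] < tree["threshold"] else tree["right"]
    return tree["leaf"]

# Made-up data: factor 0 separates the outcomes, factor 1 is uninformative.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
Y = [-1.0, -1.0, 1.0, 1.0]
tree = build_tree(X, Y)
prediction = query(tree, [0.85, 3.0])
```

With this data the root node splits on factor 0, since that split leaves no error on either side. Raising `leaf_size` stops the recursion earlier and averages the outcomes that remain in each leaf.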

DECISION TREES RECAP •  A decision tree is a flow chart of yes/no questions •  When you reach a leaf, that is your prediction •  Can be used for classification or regression •  Training: find most predictive factor; split data based on that factor; recurse •  Query: follow path through decision nodes until leaf •  Forest: an ensemble learner with multiple trees (Training: build trees with sampled data; Query: query each tree, then vote or average to find result; less susceptible to overfitting)
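The forest idea in the recap (build each tree on sampled data, then average the answers) might look like the sketch below. To keep it short, the base learner is a simplified random tree that picks a random factor and splits at its median; that choice is an assumption made for brevity, not something stated on the slides. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean, median

def build_random_tree(rows, rng, leaf_size=1):
    """rows: list of (x, y). Internal nodes are tuples, leaves are numbers."""
    ys = [y for _, y in rows]
    if len(rows) <= leaf_size or len(set(ys)) == 1:
        return mean(ys)
    i = rng.randrange(len(rows[0][0]))       # random factor (simplification)
    t = median([x[i] for x, _ in rows])      # median threshold (simplification)
    left = [r for r in rows if r[0][i] < t]
    right = [r for r in rows if r[0][i] >= t]
    if not left or not right:                # split failed: make a leaf
        return mean(ys)
    return (i, t,
            build_random_tree(left, rng, leaf_size),
            build_random_tree(right, rng, leaf_size))

def query_tree(tree, x):
    while isinstance(tree, tuple):
        i, t, left, right = tree
        tree = left if x[i] < t else right
    return tree

def build_forest(rows, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(build_random_tree(sample, rng))
    return forest

def query_forest(forest, x):
    """Regression: average the trees. For classification, vote instead."""
    return mean([query_tree(tree, x) for tree in forest])

# Made-up data: outcome is -1 for small values of factor 0, +1 for large ones.
rows = [((float(i), float(i % 3)), -1.0 if i < 5 else 1.0) for i in range(10)]
forest = build_forest(rows, n_trees=25, seed=0)
prediction = query_forest(forest, (7.0, 1.0))
```

Because every tree sees a different sample, individual trees disagree on noisy points, and averaging smooths those disagreements out.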

USING DECISION TREES FOR STOCK SCANS •  CHECKMATE: Trading strategy developed by Lucena Research, Inc. in partnership with PsychSignal.com •  Classification-based strategy •  Separate scans for long and short positions •  Factors: PsychSignal: sentiment data (StockTwits, Twitter analysis); Lucena: 400+ technical & fundamental factors per stock •  Outcomes: Up/Down/Neutral

BACKTEST OF LONG SCAN

Backtest simulation performance from QuantDesk® – Past performance is no guarantee of future results. In-sample training period: 2011.

BACKTEST OF SHORT SCAN

BACKTEST OF LONG & SHORT COMBINED

FORWARD TESTING SINCE NOV 2015

Forward testing performance – Past performance is no guarantee of future results. In-sample training period: 2011.

K NEAREST NEIGHBOR •  Solves the same problem as decision trees •  Train: Save data •  Query: Find k nearest neighbors, vote or take mean
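A minimal kNN sketch of the train and query steps above: training only stores the examples, and a query ranks them by distance, then votes (classification) or takes the mean (regression). Euclidean distance, the function names, and the toy data are assumptions for illustration.

```python
from collections import Counter
from math import dist
from statistics import mean

def k_nearest(train, x, k):
    """train: list of (features, outcome). Return outcomes of the k closest rows."""
    ranked = sorted(train, key=lambda row: dist(row[0], x))
    return [y for _, y in ranked[:k]]

def knn_classify(train, x, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    return Counter(k_nearest(train, x, k)).most_common(1)[0][0]

def knn_regress(train, x, k=3):
    """Regression: mean outcome of the k nearest neighbors."""
    return mean(k_nearest(train, x, k))

# "Training" is just saving the data (made-up examples).
train_cls = [
    ((0.0, 0.0), "down"), ((0.1, 0.2), "down"), ((0.2, 0.1), "down"),
    ((1.0, 1.0), "up"), ((0.9, 1.1), "up"), ((1.1, 0.9), "up"),
]
train_reg = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 2.0), ((3.0,), 3.0)]

label = knn_classify(train_cls, (0.8, 0.9), k=3)   # three nearest are all "up"
value = knn_regress(train_reg, (1.1,), k=3)        # mean of 1.0, 2.0 and 0.0
```

The sort over all stored rows on every call is what makes queries slow compared with training.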

K NEAREST NEIGHBOR

TRADE OFFS

kNN •  Classification or regression •  Training is fast •  Query is slow •  Requires data normalization •  Susceptible to overfitting (mitigate with larger K or an ensemble) •  Must discover features •  You must map to strategy

Decision Trees •  Classification or regression •  Training is slow •  Query is fast •  No data normalization •  Susceptible to overfitting (mitigate with larger leaf size or an ensemble, i.e. a forest) •  Auto feature discovery •  You must map to strategy
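One way to see why kNN requires data normalization while trees do not: distances are dominated by whichever factor has the largest numeric range. The sketch below uses two hypothetical factors, a price level in the hundreds and a sentiment score between 0 and 1, and shows the nearest neighbor changing once both columns are z-scored. The factor meanings, values, and function names are made up for illustration.

```python
from math import dist
from statistics import mean, pstdev

def zscore_columns(rows):
    """Return z-scored rows plus a function that scales new points the same way.
    Assumes no column is constant (standard deviation must be non-zero)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]

    def scale(x):
        return tuple((v - m) / s for v, m, s in zip(x, mus, sigmas))

    return [scale(r) for r in rows], scale

def nearest_index(rows, x):
    """Index of the row closest to x in Euclidean distance."""
    return min(range(len(rows)), key=lambda i: dist(rows[i], x))

# Hypothetical factors: (price level, sentiment score in [0, 1]).
rows = [(100.0, 0.1), (110.0, 0.9), (200.0, 0.2), (210.0, 0.8)]
query = (104.0, 0.85)

# Raw distances are driven almost entirely by price.
raw_nearest = nearest_index(rows, query)

# After z-scoring, the sentiment difference counts as much as the price difference.
scaled_rows, scale = zscore_columns(rows)
scaled_nearest = nearest_index(scaled_rows, scale(query))
```

On the raw data the nearest neighbor is row 0 (closest in price, opposite sentiment); after scaling it is row 1 (similar price and similar sentiment). A tree compares one factor against one threshold at a time, so rescaling a factor does not change which side of a split a row falls on.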

REINFORCEMENT LEARNING Solves a different problem: •  Find a policy π that tells us which action a to take in every situation s. •  a = π(s) •  π*(s) is the optimal policy

Nomenclature •  s: state •  r: reward for last action •  a: action •  T: transition matrix (which state is next) •  π: the policy

REINFORCEMENT LEARNING For trading problem: •  s: factors/features describing a stock’s “situation” •  r: return •  a: buy, sell, do nothing Algorithms: •  Model-based: policy iteration, value iteration •  Model-free: Q-learning, Dyna-Q
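As an illustration of the model-free family, here is a sketch of tabular Q-learning using the s, a, r notation above. It applies the standard update Q(s, a) ← Q(s, a) + α·(r + γ·max Q(s', a') − Q(s, a)) and reads the policy off as π(s) = argmax over a of Q(s, a). The state labels, the learning rate `alpha`, the discount `gamma`, and the class name are assumptions; exploration (for example epsilon-greedy action selection) is left out for brevity.

```python
from collections import defaultdict

ACTIONS = ("buy", "sell", "do_nothing")

class QLearner:
    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha = alpha    # learning rate (assumed value)
        self.gamma = gamma    # discount on future reward (assumed value)
        # Q-table: state -> {action: estimated long-run reward}, starting at 0.
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def update(self, s, a, r, s_next):
        """Move Q(s, a) toward r + gamma * (best value available in s_next)."""
        best_next = max(self.q[s_next].values())
        target = r + self.gamma * best_next
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def policy(self, s):
        """pi(s): the action with the highest Q-value in state s."""
        return max(ACTIONS, key=lambda a: self.q[s][a])

# Hypothetical discretized state label and a made-up return of 1.0.
learner = QLearner()
learner.update("trending_up", "buy", 1.0, "trending_up")
q_after_one = learner.q["trending_up"]["buy"]     # 0.5 * (1.0 + 0.9 * 0.0)
learner.update("trending_up", "buy", 1.0, "trending_up")
q_after_two = learner.q["trending_up"]["buy"]     # 0.5 + 0.5 * (1.45 - 0.5)
action = learner.policy("trending_up")
```

In a trading setting the state would be a discretized set of factor values and the reward a realized return, so the learned table covers both entry and exit decisions.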

REINFORCEMENT LEARNING Advantages: •  Maps well to finance problems •  Provides entire strategy including entry and exit conditions •  Policy accounts for whether to enter based on probability of success

REVIEW •  Decision Trees: classification, regression •  kNN: classification, regression •  Reinforcement learning: finds a policy

THANK YOU To learn about my company: •  www.lucenaresearch.com To learn about my course: •  Google “Balch Udacity”

OVERFITTING Description: An overfit model is one that models in-sample data very well. It predicts the data so well that it is likely modeling noise. As the degrees of freedom of the model increase, overfitting occurs when in-sample prediction error decreases and out-of-sample prediction error increases.
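A small deterministic sketch of this effect, using kNN regression. It assumes that a smaller k behaves like a model with more degrees of freedom, which is an interpretation rather than something stated on the slide. The data is synthetic: the true signal is y = x, and the training outcomes carry alternating "noise" of +1 and -1.

```python
from statistics import mean

def knn_predict(train, x, k):
    """Mean outcome of the k training points closest to x (one factor only)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean([y for _, y in nearest])

def mse(train, points, k):
    """Mean squared prediction error over the given (x, y) points."""
    return mean([(knn_predict(train, x, k) - y) ** 2 for x, y in points])

# In-sample data: signal y = x plus alternating noise of +1 / -1.
train = [(float(i), i + (1.0 if i % 2 == 0 else -1.0)) for i in range(20)]
# Out-of-sample points sit between training points and carry the signal only.
test = [(i + 0.25, i + 0.25) for i in range(19)]

in_k1, out_k1 = mse(train, train, 1), mse(train, test, 1)
in_k2, out_k2 = mse(train, train, 2), mse(train, test, 2)
```

With k = 1 every training point is its own nearest neighbor, so the in-sample error is zero: the model has memorized the noise. Averaging two neighbors cancels the alternating noise, so the in-sample error rises while the out-of-sample error falls.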
