ECCV2010 tutorial: statistical and structural recognition of human actions part II

Page 1

Structural Methods

Page 3

• Pose seems sufficient for certain action categories
• Remove effects of clothing, lighting variation from representation

G. Johansson, Moving Light Displays, 1973

Page 8

• Two broad classes of approaches
  – Match templates (exemplar-based)
  – Fit a human body model

Page 9

Database of Exemplars

Mori & Malik PAMI 2005
Shakhnarovich, Viola & Darrell ICCV 2003
Bourdev & Malik ICCV 2009

Page 10

Pictorial Structures model

Felzenszwalb & Huttenlocher CVPR 2000
Ramanan NIPS 2006
Ferrari, Marin & Zisserman CVPR 2008

Page 11

• Detect corners in images/video
• Assess likelihood under action-specific pose model
• Discriminate between walking directions, bicycle riding

Song, Goncalves & Perona NIPS 2001, PAMI 2003
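The "assess likelihood under action-specific pose model" step can be illustrated with a minimal sketch. It assumes each action is modelled by independent Gaussians over a small feature vector, which is a simplification chosen for brevity and not the model from the cited papers. The labels, features, and parameters in `MODELS` are hypothetical.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log-density of x under independent Gaussians (diagonal covariance)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
        for xi, m, v in zip(x, mean, var)
    )

def classify_pose(x, models):
    """Return the action label whose model gives x the highest log-likelihood."""
    return max(models, key=lambda label: gaussian_log_likelihood(x, *models[label]))

# Hypothetical models: label -> (mean, variance) over two made-up pose features.
MODELS = {
    "walk_left": ([-1.0, 0.0], [0.25, 0.25]),
    "walk_right": ([1.0, 0.0], [0.25, 0.25]),
    "bicycle": ([0.0, 2.0], [0.25, 0.25]),
}

print(classify_pose([0.9, 0.1], MODELS))
```

The same pattern extends to any per-action density: fit one model per action, then label a test observation by comparing likelihoods.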

Page 12

• Key frame matching to test sequence to find similar poses
  – Shape matching on edge maps using order structure

Sullivan & Carlsson ECCV 2002

[Figure: test sequence and key frame]

Page 13

• Automatic person detection-tracking
• Compare quantized pose to labeled training poses
  – Smooth over time

Ramanan & Forsyth NIPS 2003

• M is quantized 3D pose
• T is root orientation
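The "smooth over time" step can be sketched with a sliding-window majority vote over per-frame labels. This is a deliberately simple stand-in and not necessarily the smoothing used in the cited work. The function name and the `radius` parameter are assumptions made for illustration.

```python
from collections import Counter

def smooth_labels(labels, radius=1):
    """Replace each label by the most common label within +/- radius frames."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius): i + radius + 1]
        counts = Counter(window)
        best = max(counts.values())
        # Keep the original label on ties so a frame is never flipped arbitrarily.
        if counts[labels[i]] == best:
            smoothed.append(labels[i])
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed

print(smooth_labels(["walk", "walk", "run", "walk", "walk"]))
```

A larger `radius` removes longer spurious runs at the cost of blurring genuine short actions.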

Page 14

Ferrari, Marin & Zisserman CVPR 2009

• Video shot retrieval from pose
  – Either query-by-example or classification
  – Focus on upper body pose
• Pictorial structures model

Action Label

Page 15

• SVM on descriptors of absolute & relative part locations, segmentations
  – Include short tracks for robustness

query

Code available online

Page 16

• Focus on discriminative elements of pose for action classification
• Use exemplar-based “poselet” representation

Yang, Wang & Mori CVPR 2010

Golfing?

Walking?

Page 17

Successful classification examples

Unsuccessful classification examples

Page 18

Yao & Fei-Fei CVPR 2010

[Figure: graphical model over activity A, object O, human pose H, and body parts P1, P2, ..., PN, with image evidence fO, f1, f2, ..., fN]

• A: activity (croquet shot, volleyball smash, tennis forehand)
• O: object (croquet mallet, volleyball, tennis racket)
• H: human pose (intra-class variations)
  – More than one H for each A
  – Unobserved during training
• P: body parts
  – lP: location; θP: orientation; sP: scale
• f: shape context [Belongie et al., 2002]

Page 19

Cricket defensive shot

Cricket bowling

Croquet shot

Page 20

• Build action models from web search results

Ikizler-Cinbis, Cinbis, Sclaroff ICCV 2009

Page 21

• Find repeated poses in a dataset

SLAG

Wang, Jiang, Drew, Li, Mori CVPR 2006

Page 22

• Person location given
• Classify into one of 9 categories

Riding horse, Reading book, Taking photo
Riding bike, Play instrument, Running
Phoning, Use computer, Walking

Page 23

• Pose as representation for action recognition
  – Captures much information about action
  – Invariance to clothing / lighting effects
  – Model and exemplar based representations
• New direction: Action recognition from still images
  – Image retrieval and analysis
  – An important cue for video-based action recognition
  – Pose seems essential

Page 27

• Describe low-level components
  – Actions of individual people
  – Movement of pixels
• Identify key objects or locations in scene
  – Buildings, roads, etc.
• Model interactions between people, objects, and locations

Page 28

• Detect and track moving objects
• Manually identify key regions in scene
  – E.g. road, checkpoint
• Scenarios describe relative arrangements of objects in scene
  – E.g. proximity of car to checkpoint
  – Notions of scene context

Medioni, Cohen, Bremond, Hongeng, Nevatia PAMI 2001

Page 29

• Detect and track players, ball
• Low-level action detectors for individual players
• Hand-constructed Bayes net for each activity
  – Spatial and temporal relations between low-level actions

Intille & Bobick CVPR 1999

Page 30

Zhong, Shi & Visontai CVPR 2004

• Global, frame-level feature
  – Bag-of-words representation
• Detect unusual events by clustering
  – Isolated, varied clusters are unusual
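A rough sketch of "detect unusual events by clustering": clips are represented as bag-of-words histograms, grouped with a greedy leader clustering, and clips that land in very small clusters are flagged. The leader clustering is a simple stand-in chosen for brevity, not the clustering method of the cited paper. The `threshold` and `min_size` values are arbitrary assumptions.

```python
def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def normalize(hist):
    total = float(sum(hist))
    return [h / total for h in hist] if total else list(hist)

def leader_clusters(histograms, threshold):
    """Greedy clustering: join the first cluster whose leader is within threshold."""
    clusters = []  # list of (leader_histogram, [member indices])
    for idx, hist in enumerate(histograms):
        h = normalize(hist)
        for leader, members in clusters:
            if l1_distance(h, leader) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append((h, [idx]))
    return [members for _, members in clusters]

def unusual_clips(histograms, threshold=0.5, min_size=2):
    """Indices of clips that fall in clusters smaller than min_size."""
    return sorted(
        idx
        for members in leader_clusters(histograms, threshold)
        if len(members) < min_size
        for idx in members
    )

# Made-up word counts for four clips; the last one looks unlike the others.
CLIPS = [[8, 1, 1], [7, 2, 1], [9, 1, 0], [0, 1, 9]]
print(unusual_clips(CLIPS))
```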

Page 31

• Real-world highway dataset
  – Cars pulling off road, backing up, U-turns
• Cheating detection in simulated card game

Page 32

• Describe moving pixels by location and motion direction
  – No object detection
• Use as visual words in Latent Dirichlet Allocation (LDA) type model
  – Infer low-level actions from words

Wang, Ma, Grimson PAMI 2009
Blei, Ng, Jordan JMLR 2003
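A minimal sketch of turning moving pixels into visual words from location and motion direction, so that each clip becomes a bag of words for a topic model. The cell size, number of direction bins, and frame width below are assumptions chosen for illustration, not values taken from the cited paper.

```python
import math
from collections import Counter

def motion_word(x, y, dx, dy, cell=10, n_dirs=4, width=320):
    """Map a moving pixel to an integer word id from its grid cell and direction."""
    cells_per_row = math.ceil(width / cell)
    cell_id = (y // cell) * cells_per_row + (x // cell)
    angle = math.atan2(dy, dx) % (2.0 * math.pi)  # in [0, 2*pi)
    bin_width = 2.0 * math.pi / n_dirs
    # Centre the bins on the axes, so bin 0 covers -45..45 degrees, and so on.
    direction = int((angle + bin_width / 2.0) // bin_width) % n_dirs
    return cell_id * n_dirs + direction

def clip_document(moving_pixels):
    """Word counts for one clip, given (x, y, dx, dy) tuples with integer x, y."""
    return Counter(motion_word(*p) for p in moving_pixels)

print(clip_document([(0, 0, 1, 0), (0, 0, 1, 0), (15, 25, -1, 0)]))
```

The resulting per-clip counts are the kind of input an LDA-type model expects; fitting the topic model itself is outside this sketch.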

Page 33

• Higher-level activity analysis
  – Distribution of low-level actions over entire scene
• Applications
  – Temporal segmentation by activity
  – Abnormality detection

Page 34

• Hierarchical Dirichlet Process model
  – Learn number of activities automatically

Kuettel, Breitenstein, van Gool & Ferrari CVPR 2010

[Figure: diagram of nodes A, B, C with edge weights 0.8, 0.5, 0.3]

Page 35

traffic light controlled scene

• continuous video
• annotated with states and history
• 3x speed

Page 36

Loy, Xiang & Gong CVPR, ICCV 2009

Page 37

• Consider time-delayed correlations between regions
  – Applications to irregularity detection
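A small sketch of a time-delayed correlation between two regions: given one activity time series per region, it searches for the lag at which the second region best follows the first. Plain Pearson correlation is used here as an illustrative stand-in and is not the dependency measure of the cited papers. The series and `max_lag` are made up.

```python
import math

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_delay(region_a, region_b, max_lag):
    """Return (lag, correlation) maximising corr(region_a[t], region_b[t + lag])."""
    best = (0, pearson(region_a, region_b))
    for lag in range(1, max_lag + 1):
        score = pearson(region_a[:-lag], region_b[lag:])
        if score > best[1]:
            best = (lag, score)
    return best

# Region B repeats region A's activity two frames later.
A = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
B = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
print(best_delay(A, B, max_lag=3))
```

For irregularity detection, one option is to flag periods where the correlation at the learned lag drops well below its usual value.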

Page 38

Choi, Shahid, & Savarese VS 2009
Lan, Wang, Yang, & Mori SGA 2010, NIPS 2010

[Figure: graphical model with image nodes x0, x1, x2, ..., xn, action class nodes h1, h2, ..., and activity class node y]

Page 39

• Captioned baseball videos in training
• Build AND-OR graph representation of activities
  – AND specifies elements of an activity that must occur
  – OR allows variation in how an element appears
• Describe low-level tracks using STIPs
• Match tracks to actions in AND-OR graph

Gupta, Srinivasan, Shi, Davis CVPR 2009
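The AND/OR semantics above can be sketched as a tiny recursive check: AND nodes need every child satisfied, OR nodes need at least one. This toy version ignores temporal order and the matching of tracks to actions, which the full approach has to handle. The graph `PLAY` and its action names are hypothetical.

```python
def satisfied(node, observed):
    """True if the set of observed low-level actions satisfies the AND-OR node."""
    if isinstance(node, str):  # leaf: a single low-level action
        return node in observed
    kind, children = node
    results = (satisfied(child, observed) for child in children)
    if kind == "AND":
        return all(results)
    if kind == "OR":
        return any(results)
    raise ValueError(f"unknown node type: {kind!r}")

# Hypothetical baseball-style activity; action names are made up for illustration.
PLAY = ("AND", [
    "pitch",
    ("OR", ["hit", "miss"]),
    ("OR", [("AND", ["run", "catch"]), "no_run"]),
])

print(satisfied(PLAY, {"pitch", "hit", "run", "catch"}))
```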

Page 40

• Scene modeling to look at the big picture
• Feature representations
  – Holistic: describe entire scene, irrespective of individuals
  – Local: describe actions of individuals
• Structure of activities
  – Model free: clustering-type approaches
  – Strong models: grammars, probabilistic models

Page 43

Objects: cars, glasses, people, etc…

Scene categories: indoors, outdoors, street scene, etc…

Actions: drinking, running, door exit, car enter

Geometry: Street, wall, field, stair, etc…

constraints

Page 44

[Figure: diagram of nodes A, B, C with edge weights 0.8, 0.5, 0.3]

Reading book, Play instrument, Riding horse, Riding bike

Page 45

• Standardization of datasets for field
  – Allow comparison of algorithms
    • E.g. KTH for low-level features, atomic actions
  – Fair tuning of model parameters
• New algorithms compare to baselines
  – Bag-of-words on densely sampled STIPs
  – Pose estimation (Ferrari et al. code)
  – HOG SVM (Dalal & Triggs code, Ramanan code)
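For the bag-of-words baseline, the core step can be sketched as assigning each local descriptor to its nearest codeword and building a normalised histogram per video. The codebook and descriptors below are toy values. In practice the codebook would be learned (for example with k-means) and the histograms fed to a classifier, neither of which is shown here.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    def sq_dist(c):
        return sum((d - ci) ** 2 for d, ci in zip(descriptor, c))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))

def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of codeword assignments for one video."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Toy 2-D descriptors and a three-word codebook, for illustration only.
CODEBOOK = [[0, 0], [1, 1], [5, 5]]
DESCRIPTORS = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]]
print(bow_histogram(DESCRIPTORS, CODEBOOK))
```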

Page 46

• Standardization of datasets for field
  – Don’t feel constrained by the existing problem definitions
  – Do make your new dataset available
    • Should clearly specify separate training and test sets
• New algorithms compare to baselines
  – Do use reasonable variant of standard baselines for your new problem

Page 48

• Even atomic low-level actions are very difficult to detect reliably
  – Far more work needed on representations for the action of a single person
  – Features
  – Temporal representation, smoothing
  – Tracking
  – …

Page 49

1. Cameras and bandwidth are cheap
2. Lots of training data is potentially available

[Figure: (1) + (2) = Training data]

Potential for huge progress … if we can get the data

Page 50

Annotation | Aligned with video | Describes visual content | Source
Subtitles | Yes | No | DVD, Internet
Scripts for TV series, movies and sport games | No | Yes | Internet, e.g. www.dailyscript.com
Plot summaries and synopses | No | Yes, sparsely | Internet (e.g. IMDB)
Instruction videos | No | Yes | Internet, e.g. www.videojug.com
Descriptive Video Service | Yes | Yes | DVD, rare
Word tags | No | Yes, sparsely | Internet (e.g. YouTube)
Manual labelling, Human Computation | ?? | ?? | Mechanical Turk, ESP Game, Grad undergrad students

Page 51

Open questions:

• How to benefit from the structure of the human body in complex situations, e.g. heavy occlusions, uniformly colored clothing?
• Will action classification generalize over different video domains: Movies, TV, YouTube, Surveillance video?
• What is the useful action vocabulary? Are we trying to solve the right problem? How can we visualize/display the results?

Interesting novel directions:

• Use actions for recognizing functional and physical object properties, e.g. “sitable”, “eatable”, “heavy”, “solid” objects…
• Action prediction, i.e. what can happen in the given situation: e.g. is it dangerous to cross this road?
• Explore more sources of strong and weak supervision: Manual surveillance, Descriptive Video Service (DVS), YouTube tags; Transcripts of sports games; Instruction videos.

Page 52

• P. Viola, M. Jones, and D. Snow. Detecting pedestrians using patterns of motion and appearance. In Proc. 9th Int. Conf. Computer Vision, pages 734–741, 2003.
• N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., 2005.
• Bo Wu and Ram Nevatia. Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In Proc. 10th Int. Conf. Computer Vision, 2005.
• Pedro Felzenszwalb, David McAllester, and Deva Ramanan. A discriminatively trained, multiscale, deformable part model. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008.
• Chris Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-time tracking. In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., 1999.
• Kentaro Toyama, John Krumm, Barry Brumitt, and Brian Meyers. Wallflower: Principles and practice of background maintenance. In Proc. 7th Int. Conf. Computer Vision, 1999.
• J.L. Barron, D.J. Fleet, and S.S. Beauchemin. Performance of optical flow techniques. Int. Journal of Computer Vision, 12(1):43–77, 1994.
• T. Brox, C. Bregler, and J. Malik. Large displacement optical flow. In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., 2009.
• M. Isard and A. Blake. Condensation – conditional density propagation for visual tracking. Int. Journal of Computer Vision, 29(1):5–28, 1998.
• Yuan Li, Chang Huang, and Ram Nevatia. Learning to associate: Hybridboosted multi-target tracker for crowded scene. In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., 2009.

Page 53

• W. T. Freeman, K. Tanaka, J. Ohta, and K. Kyuma. Computer vision for computer games. In IEEE 2nd Intl. Conf. on Automatic Face and Gesture Recognition, 1996.
• J. Sullivan and S. Carlsson. Recognizing and tracking human action. In ECCV 2002.
• A.A. Efros, A.C. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In ICCV 2003.
• A. Bobick and J. Davis. The recognition of human movement using temporal templates. IEEE Trans. PAMI, 23(3):257–267, 2001.
• L. Zelnik-Manor and M. Irani. Event-based video analysis. In CVPR 2001.
• E. Shechtman and M. Irani. Space-time behavior based correlation. In CVPR 2005.
• O. Boiman and M. Irani. Detecting irregularities in images and in video. In Proc. ICCV, 2005.
• M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In Proc. ICCV, 2005.
• Y. Ke, R. Sukthankar, and M. Hebert. Efficient Visual Event Detection using Volumetric Features. In Proc. ICCV 2005.
• Y. Ke, R. Sukthankar, and M. Hebert. Event detection in crowded videos. In Proc. ICCV, 2007.
• I. Laptev and P. Pérez. Retrieving actions in movies. In Proc. ICCV 2007.
• D. Weinland and E. Boyer. Action recognition using exemplar-based embedding. In Proc. CVPR, 2008.
• Z. Lin, Z. Jiang, and L. S. Davis. Recognizing actions by shape-motion prototype trees. In Proc. ICCV, 2009.

Page 54

• I. Laptev and T. Lindeberg. Space-time interest points. In Proc. ICCV 2003.
• C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In Proc. ICPR, 2004.
• P. Dollar, V. Rabaud, G. Cottrell, and Serge Belongie. Behavior recognition via sparse spatio-temporal features. In VS-PETS, 2005.
• H. Jhuang, T. Serre, L. Wolf and T. Poggio. A Biologically Inspired System for Action Recognition. In Proc. ICCV 2007.
• P. Scovanner, S. Ali, and M. Shah. A 3-Dimensional SIFT descriptor and its application to action recognition. ACM MM 2007.
• J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. In IJCV 2008.
• I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Proc. CVPR 2008.
• A. Klaeser, M. Marszałek and C. Schmid. A spatio-temporal descriptor based on 3D-gradients. In Proc. BMVC 2008.
• G. Willems, T. Tuytelaars and L. Van Gool. An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector. In Proc. ECCV 2008.
• H. Wang, M. M. Ullah, A. Kläser, I. Laptev and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In Proc. BMVC 2009.
• L. Yeffet and L. Wolf. Local Trinary Patterns for Human Action Recognition. In Proc. ICCV 2009.
• A. Gilbert, J. Illingworth, R. Bowden. Fast realistic multi-action recognition using mined dense spatio-temporal features. In Proc. ICCV 2009.
• P. Matikainen, M. Hebert, R. Sukthankar. Trajectons: Action recognition through the motion analysis of tracked features. ICCV workshop on Video-oriented Object and Event Classification, 2009.
• M. M. Ullah, S. N. Parizi, I. Laptev. Improving bag-of-features action recognition with non-local cues. In Proc. BMVC 2010.

Page 55

• Y. Song, L. Goncalves, and P. Perona. Unsupervised learning of human motion. IEEE Trans. PAMI, 25(7):814–827, 2003.
• D. Ramanan and D. A. Forsyth. Automatic annotation of everyday movements. In Advances in Neural Information Processing Systems 16, 2003.
• V. Ferrari, M. Marin, and A. Zisserman. Pose search: retrieving people using their pose. In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., 2009.
• Yang Wang, Hao Jiang, Mark S. Drew, Ze-Nian Li, and Greg Mori. Unsupervised discovery of action classes. In CVPR, 2006.
• Nazli Ikizler-Cinbis, R. Gokberk Cinbis, and Stan Sclaroff. Learning actions from the web. In IEEE International Conference on Computer Vision, 2009.
• Weilong Yang, Yang Wang, and Greg Mori. Recognizing human actions from still images with latent poses. In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., 2010.
• Bangpeng Yao and Li Fei-Fei. Modeling mutual context of object and human pose in human-object interaction activities. In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., 2010.

Page 56

• R. Polana and R.C. Nelson. Detection and recognition of periodic, nonrigid motion. In IJCV 1997.
• S.M. Seitz and C.R. Dyer. View invariant analysis of cyclic motion. In IJCV 1997.
• A. Thangali and S. Sclaroff. Periodic motion detection and estimation via space-time sampling. In IEEE Workshop on Motion and Video Computing, 2005.
• I. Laptev, S.J. Belongie, P. Pérez and J. Wills. Periodic motion detection and segmentation via approximate sequence alignment. In Proc. ICCV 2005.
• P. Wang, G.D. Abowd and J.M. Rehg. Quasi-periodic event analysis for social game retrieval. In Proc. ICCV 2009.
• D. Weinland, E. Boyer, and R. Ronfard. Action recognition from arbitrary views using 3D exemplars. In Proc. ICCV 2007.
• A. Farhadi and M. Tabrizi. Learning to recognize activities from the wrong view point. In Proc. ECCV 2008.
• I. Junejo, E. Dexter, I. Laptev and Patrick Pérez. Cross-view action recognition from temporal self-similarities. In Proc. ECCV 2008.
• A. Farhadi, M. Kamali, I. Endres, D. Forsyth. A latent model of discriminative aspect. In Proc. ICCV 2009.

Page 57

• X. Wang, X. Ma, and E. Grimson. Unsupervised activity perception in crowded and complicated scenes using hierarchical bayesian models. IEEE Trans. PAMI, 31(3):539–555, 2009.
• Abhinav Gupta, Praveen Srinivasan, Jianbo Shi, and Larry S. Davis. Understanding videos, constructing plots - learning a visually grounded storyline model from annotated videos. In CVPR, 2009.
• T. Xiang and S. Gong. Beyond tracking: Modelling activity and understanding behaviour. Int. Journal of Computer Vision, 67(1):21–51, 2006.
• G. Medioni, I. Cohen, F. Brémond, S. Hongeng, and R. Nevatia. Event detection and analysis from video streams. IEEE Trans. PAMI, 23(8):873–889, 2001.
• Y. A. Ivanov and A. F. Bobick. Recognition of visual activities and interactions by stochastic parsing. IEEE Trans. PAMI, 22(8):852–872, 2000.
• D. Moore and I. Essa. Recognizing multitasked activities using stochastic context-free grammar using video. In AAAI, 2002.
• Chen Change Loy, Tao Xiang, and Shaogang Gong. Modelling activity global temporal dependencies using time delayed probabilistic graphical model. In ICCV, 2009.
• Xiaogang Wang, Keng Teck Ma, Gee Wah Ng, and W. Eric L. Grimson. Trajectory analysis and semantic region modeling using a nonparametric bayesian model. In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., 2008.
• W. Choi, K. Shahid, and S. Savarese. "What are they doing?: Collective activity classification using spatio-temporal relationship among people". In 9th International Workshop on Visual Surveillance, 2009.
• Ramin Mehran, Alexis Oyama, and Mubarak Shah. Abnormal crowd behavior detection using social force model. In CVPR, 2009.

Page 58

Workshop materials available: https://sites.google.com/site/humanactionstutorialeccv10/