
B-SOiD: An Open Source Unsupervised Algorithm for Discovery of Spontaneous Behaviors

Alexander I. Hsu¹ and Eric A. Yttri¹

¹Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA USA

The motivation, control, and selection of actions comprising naturalistic behaviors remains a tantalizing but difficult field of study. Detailed and unbiased quantification is critical. Interpreting the positions of animals and their limbs can be useful in studying behavior, and significant recent advances have made this step straightforward (1, 2). However, body position alone does not provide a grasp of the dynamic range of naturalistic behaviors. Behavioral Segmentation of Open-field In DeepLabCut, or B-SOiD ("B-side"), is an unsupervised learning algorithm that serves to discover and classify behaviors that are not pre-defined by users. Our algorithm segregates statistically different, sub-second rodent behaviors with a single bottom-up perspective video camera. Upon DeepLabCut estimating the positions of 6 body parts (snout, the 4 paws, and the base of the tail), our software performs novel expectation maximization fitting of Gaussian mixture models on t-Distributed Stochastic Neighbor Embedding (t-SNE). The original features taken from the dimensionally-reduced classes are then used to build a multi-class support vector machine classifier that can decode millions of actions within seconds. We demonstrate that the highly reproducible, independently-classified behaviors can be used to extract kinematic parameters of individual actions as well as broader action sequences. This open-source platform enables the efficient study of the neural mechanisms of spontaneous behavior as well as the performance of disease-related behaviors that have been difficult to quantify, such as grooming and stride length in Obsessive-Compulsive Disorder (OCD) and stroke research.

Behavioral Analysis | Open-field | DeepLabCut | Unsupervised Learning

Correspondence: [email protected]

Introduction

The brain has evolved to support the generation of action sequences in animals, particularly for surviving constantly changing environments. Survival depends on an animal's ability to correctly anticipate external events, and such decisions can be visualized through the sequences of actions that have been adapted to those events for an optimal outcome (3). Comprehensive tracking of behavior permits behavioral scientists not only to quantify behavioral output (4), but also to identify unforeseen motions, such as "tap dancing" in songbirds (5). However, processing such data from recorded behavior is an extremely time- and labor-intensive process, even with the limited and expensive commercial options available (6).

Recent advances in computer vision and machine learning have accelerated automatic tracking of geometric estimates of body parts (1, 2). Although establishing the location of a body part can be informative given the right experimental configuration, the behavioral interpretability is quite low. For instance, the estimated position of each paw in an open field does not capture what the animal is doing. Moreover, the angle criterion ascribed to whether a turning movement has occurred, or the minimum duration of an ambulatory bout, are all subjective and could even depend on animal size and/or video capture technique. This uncertainty in behavioral outcome only makes dissecting the neural mechanisms underlying the behavior more complicated. The challenges presented above motivated our investigation into how partnering dimensionality reduction with pattern recognition could create a viable tool for automatic classification of an animal's behavioral repertoire.

In a recent behavioral study, Markowitz and colleagues were able to automatically identify subgroups of an animal's behaviors using depth sensors. Their algorithm, MoSeq, utilized the principal components of "spinograms" to identify action groups (7). The authors uncovered the striatal neural correlates of action, consistent with the notion that populations of medium spiny neurons (MSNs) encode certain behaviors and their organization (8–12). In another seminal study, Klaus et al. employed a unique dimensionality reduction method, t-Distributed Stochastic Neighbor Embedding (t-SNE) (13), to visualize clusters of behaviors based on time-varying signals such as speed and acceleration (12). There, they discovered that the t-SNE clusters are easily interpreted as mouse behaviors and were correlated with distinct striatal neural ensembles. The powerful rationale behind selecting t-SNE to project high-dimensional features onto a low-dimensional space is to preserve the contrast in distributions of the original features, the so-called "local structure". Together, these studies suggest that data-driven algorithms can provide reasonable clusters of animal behavior that in turn provide insight into how thoughts generate actions.

Building on this progression, we were motivated to investigate the low-dimensional clusters of a composite of high-dimensional features, such as body length, distance and angle between body parts, occlusion of a body part from view, and speed. In conjunction with an open-source platform, DeepLabCut, we provide an open-source unsupervised learning algorithm that integrates three key layers of understanding (dimensionality reduction, pattern recognition, and machine learning) to enable autonomous

Hsu et al. | bioRχiv | September 15, 2019 | 1–13

bioRxiv preprint doi: https://doi.org/10.1101/770271; this version posted September 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.

multi-dimensional behavioral classification. We found that Behavioral Segmentation of Open-field in DeepLabCut (B-SOiD) automatically uncovers reasonable behaviors with excellent accuracy (tested against both held-out data and blind human observers). It is easily applied across behavioral studies. Moreover, in testing the power of the tool, we have discovered that a broader and dynamic behavioral structure, an organization of "what to do next", exists in mice exploring a novel environment. Finally, we have released our repository B-SOiD on the open-source platform GitHub, and have included three versions of our work: manual thresholding of pre-defined behaviors, unsupervised discovery of action classes, and action class modeling, to suit a variety of behavioral research needs.

Results

A schematic flowchart describes the proposed unsupervised learning algorithm B-SOiD (Fig.1). Upon DeepLabCut estimating the body parts outlining the animal (snout, 4 paws, and the proximal tail), features such as body length, speed and angle can be computed. To visualize high-dimensional feature clusters, we opted to follow a recent study (12) and use t-SNE, a particular dimensionality reduction algorithm that preserves local structure. These feature clusters appear to carry topology from the high-dimensional space; therefore, we hypothesized that the clusters can be grouped using the expectation maximization (EM) algorithm by fitting parameters of Gaussian mixture models (GMM). High-dimensional features from each GMM class are then used to train the learners of a multi-class support vector machine (SVM) classifier. To validate our performance, we tested held-out data against the original cluster assignments over many iterations. We provide this as an open-source toolbox for the neuroscience community that uses DeepLabCut to study animal behavior: https://github.com/YttriLab/B-SOiD

A. Selection of high dimensional features. An animal behavior can be parsed into a sequence of changes in physical features. For feature selection, we performed hierarchical clustering analyses on 20 features and identified 3 major classes. The classes can be generalized as pose-estimate speed, angle, and length. We tested multiple different combinations and strategically selected a combination of 7 features that will most likely help isolate the different rodent behaviors. Feature 1 examines the body length of the mouse, dissociates stationary from ambulatory states, and identifies actions that consist of elevated body parts (the mouse will appear shorter during rearing, grooming, or scrunching from a flat bottom-up 2D perspective) (Eqn.1). Feature 2 subtracts the front-paws-to-base-of-tail distance from body length, i.e., whether the animal's front paws are further away from the base of the tail than the snout is (Eqn.2). Feature 3 subtracts the back-paws-to-base-of-tail distance from body length, i.e., how extended or contracted the snout is from the back paws using body length as a reference (Eqn.3). Feature 4 finds the distance between the two front paws, as the proximity of the two paws can be an important marker for various behaviors (Eqn.4). Feature 5 looks at the speed of the snout, or how far the snout has moved per unit time (Eqn.5). Feature 6 looks at the speed of the base of the tail, or how far the base of the tail has moved per unit time (Eqn.6). Finally, feature 7 is concerned with body angle: whether the animal is orienting to the right or left, and how big an angular change there is (Eqn.7). Upon extracting these 7 features, we visualized the clusters in a dimensionally-reduced space.
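As a concrete illustration, the seven features above can be computed from the six tracked points roughly as follows. This is a minimal numpy sketch: the function and argument names are ours, and the exact smoothing and scaling of Eqns. 1-7 are not reproduced here.

```python
import numpy as np

def bsoid_features(snout, fpaw_l, fpaw_r, bpaw_l, bpaw_r, tailbase, fps=10):
    """Each argument is an (n_frames, 2) array of x,y pixel coordinates.
    Returns an (n_frames, 7) feature matrix in the order of Eqns. 1-7."""
    def dist(a, b):
        return np.linalg.norm(a - b, axis=1)

    body_len = dist(snout, tailbase)                              # Feature 1
    fpaws_mid = (fpaw_l + fpaw_r) / 2.0
    bpaws_mid = (bpaw_l + bpaw_r) / 2.0
    f2 = body_len - dist(fpaws_mid, tailbase)                     # Feature 2
    f3 = body_len - dist(bpaws_mid, tailbase)                     # Feature 3
    f4 = dist(fpaw_l, fpaw_r)                                     # Feature 4
    snout_speed = np.r_[0, dist(snout[1:], snout[:-1])] * fps     # Feature 5
    tail_speed = np.r_[0, dist(tailbase[1:], tailbase[:-1])] * fps  # Feature 6
    # Feature 7: signed frame-to-frame change of the body-axis angle
    # (positive vs negative distinguishes right vs left orienting).
    axis = snout - tailbase
    ang = np.degrees(np.arctan2(axis[:, 1], axis[:, 0]))
    body_angle = np.r_[0, np.diff(ang)]
    return np.column_stack([body_len, f2, f3, f4,
                            snout_speed, tail_speed, body_angle])
```

The output of this function, stacked across frames, is the 7-dimensional input to the embedding step described next.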

B. t-SNE clustering and visualization of high-dimensional features. To visualize the clusters in the 7-dimensional feature space, we performed a particular dimensionality reduction clustering algorithm, t-SNE, projecting onto a 3-dimensional space. In this 3-dimensional t-SNE space, we saw distinct nodes of data clusters that appear to be interconnected, albeit with much fewer data points (Fig. 2). The spectral nature of the low-dimensional features aligns with prior work (1, 12). We further examined this 3-D space and uncovered that each individual cluster appears to represent an interpretable action, as opposed to similar snapshots of two very different actions. This is consistent with the notion that a behavior is usually inferred from a composite of high-dimensional features. In other words, a rear and a groom will both contain elevation (away from the camera), but only a groom would have consistent changes in inter-fore-paw distances. Together, these results suggest that we conserved the contrasts in the 7-dimensional space that are important for behavioral segmentation, even in a lower-dimensional space.
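The embedding step can be sketched with scikit-learn's t-SNE implementation. The features below are synthetic stand-ins, and the perplexity and initialization are assumptions rather than the paper's settings.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two fake feature clusters standing in for real 7-D B-SOiD features.
features = np.vstack([rng.normal(loc=c, scale=0.5, size=(150, 7))
                      for c in (np.zeros(7), np.full(7, 4.0))])

# Project the 7-D feature matrix onto 3 dimensions while preserving
# local structure; Barnes-Hut t-SNE supports up to 3 output dimensions.
embedded = TSNE(n_components=3, perplexity=30, init="random",
                random_state=0).fit_transform(features)
```

In the real pipeline, `features` would be the per-frame matrix produced from the pose estimates, and the resulting 3-D coordinates are what the GMM step below operates on.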

C. Expectation maximization fitting of Gaussian mixture models. Unsupervised grouping is required to differentiate one group from another autonomously. Since t-SNE utilizes Gaussian kernels for its joint distributions, we chose to classify the clusters using the expectation maximization (EM) algorithm for fitting Gaussian mixture models (GMM) (14). The "E-step" in EM requires a set of parameters to be initialized, which are subsequently updated in the "M-step". Since we do not have a priori knowledge of mouse action classes in a naturalistic setting, we initialized the model means, covariances, and priors with random values. The danger of randomly initializing GMM parameters is getting stuck at a sub-optimal local maximum of the log-likelihood. To attempt to escape poor initializations, we performed this method iteratively many times and kept the initialized parameter set that gave rise to the highest log-likelihood. Additionally, we allowed the EM algorithm to identify up to 30 classes, of which 15 unique classes (colored) were pulled out (Fig.2). Interestingly, a couple of classes (8 and 15) can only be found in a few animals. These results argue that our algorithm adapts to inter-animal variability in naturally occurring behaviors if sufficient data are collected.
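scikit-learn's `GaussianMixture` offers this best-of-N random-restart EM directly via its `n_init` parameter. The sketch below, on synthetic stand-in coordinates, mirrors the procedure of over-provisioning components and keeping the highest-likelihood fit; the component count and restart count are illustrative, not the paper's exact values.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Three well-separated synthetic clusters standing in for t-SNE coordinates.
embedded = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 3))
                      for c in ([0, 0, 0], [5, 5, 0], [0, 5, 5])])

# Allow more components than expected classes; EM is run from several random
# initializations (n_init) and the highest log-likelihood solution is kept.
gmm = GaussianMixture(n_components=10, covariance_type="full",
                      init_params="random", n_init=10, random_state=0)
labels = gmm.fit_predict(embedded)
n_classes = len(np.unique(labels))  # classes actually populated
```

Components that end up assigned no frames simply drop out, which is how 15 populated classes can emerge from a 30-component model.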

D. GMM classes represent distinct actions. The 15 classes identified using GMM were based on data in the low-dimensional space. This does not guarantee a proper segregation of actual behavior. The classes could very well either subdivide a sequence of turns, group a whole spectrum


Fig. 1. Flow-chart of our proposed algorithm, B-SOiD. (A) Obtain physical features that pertain to the three different classes (speed, angle and length) that define movement. (B) Cluster high-dimensional input features in a low-dimensional space utilizing t-distributed stochastic neighbor embedding (t-SNE) and fit Gaussian mixture models (GMM) to segment behaviors. (C) Train a support vector machine (SVM) classifier based on the distribution of all 7 features.


of behavior into one, or even both. To answer this, we randomly isolated short videos based on classes (see https://github.com/YttriLab/B-SOiD for details). We found that these 15 classes were very differentiable from one another. Since we could potentially carry bias in examining these action classes, we further investigated the distributions of the low-dimensional (physical) features. Our results indicated that the 15 distinct Gaussian mixtures in the high-dimensional space were also different in the compiled physical feature space (p < 0.01, KS-test). Given that no two classes were alike, we could confidently implement a simple Support Vector Machine (SVM) classifier that learns to distinguish more than two classes. All seven features were used to train the learners (Fig. 2).
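The per-feature class comparison can be illustrated with SciPy's two-sample Kolmogorov-Smirnov test. The distributions below are synthetic stand-ins; in the pipeline, each sample would be one physical feature's values restricted to one GMM class.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
class_a = rng.normal(0.0, 1.0, 5000)   # e.g. snout speed within class A
class_b = rng.normal(0.8, 1.0, 5000)   # the same feature within class B

# ks_2samp compares the two empirical CDFs; a small p-value indicates the
# two classes draw the feature from detectably different distributions.
stat, p = ks_2samp(class_a, class_b)
distinct = p < 0.01
```

Running this for every feature and every pair of classes gives the "no two classes alike" check described above.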

E. SVM classifier design for multi-class actions. So far, all analyses were done post hoc (after collecting hours of data and running dimensionality reduction on close to 1 million samples). Although this automation can already drastically improve neurobehavioral correlation, we went a step further and incorporated machine learning for a more real-time solution. Since we output more than 2 classes that need to be differentiated, a simple GLM would not be sufficient for encoding. Recent computational advances have enabled SVMs to significantly improve decoding accuracy using error-correcting output codes (ECOC) (15). Based on our utilization of normal distributions with t-SNE and GMM, we hypothesized that using Gaussian kernel functions for classifier training would supply the most robust and accurate decoding results. To test our hypothesis, we trained our classifier with three types of kernel tricks: linear, Gaussian, and polynomial. To benchmark model performance, we performed cross-validation on various partition sizes of held-out data. With 100 iterations for each partition size, we found that an SVM with a Gaussian kernel function predicts the classes most accurately and with the least variation given sufficient training data (~75,000 frames, or 70% of 3 hours of video) (Fig.3). Since our expectation matched our observation, it suggests that we understand the technique that we chose to parcellate behavior.
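A hedged sketch of the kernel comparison, using scikit-learn's ECOC wrapper around an SVM on synthetic, separable data. The code size, fold count, and blob data are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OutputCodeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs

# Synthetic 7-D "feature" data with 4 stand-in action classes.
X, y = make_blobs(n_samples=600, centers=4, n_features=7,
                  cluster_std=1.0, random_state=0)

# Error-correcting output codes turn the multi-class problem into several
# binary SVMs; compare the three kernel tricks by cross-validated accuracy.
scores = {}
for kernel in ("linear", "rbf", "poly"):   # "rbf" is the Gaussian kernel
    clf = OutputCodeClassifier(SVC(kernel=kernel), code_size=2,
                               random_state=0)
    scores[kernel] = cross_val_score(clf, X, y, cv=5).mean()
```

Sweeping the training-set size (the partition sizes above) and repeating with different splits would reproduce the robustness curves of Fig. 3.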

F. SVM classifier generalizes to a new dataset. For a model to be applicable, an essential property it must carry is generalizability across variable datasets. If our behavioral model truly understands behavior in a high-dimensional space, a mouse of very different physique shall not impact performance. After we trained our model, we used it to predict the actions of a mouse that is noticeably different. Within a few seconds, the classifier categorized all 18,000 actions (30-minute video, 10 frames/second (fps)). The randomly sampled predicted classes showed very similar behavior to the classes in training (Fig.4). Although there is no good measure to benchmark such similarity, our results do suggest that the behavioral model we built did not over-fit our training mice. In the following sections, we provide more evidence to increase the probability that our data-driven algorithm is valid for behavioral quantification.

G. Classifier performance comparable to human observers. Certainly, we want to exemplify the coherence between our data-driven technique and human-observed behaviors. Up to this point, our results have mostly demonstrated statistical consistency in B-SOiD; however, part of the motivation was to build a toolbox that discovers user-defined behaviors for DeepLabCut. A transition probability matrix is a common method for analyzing similarity between states based on the predictability of the next states. For example, if actions A and B, but not C, have a high probability of going into action D at the next time-step, we would consider actions A and B to be more similar to each other than to C. Based on the transition probability matrix, our 15 classes can be categorized into three major groups: stationary/rear, groom, and non-stationary (Fig. 6). We ran one 10-minute video at the same temporal resolution (10 fps) against 4 blind human observers and found that the inter-grader coherence (80.97 ± 3.96%) was similar to the machine-grader coherence (72.04 ± 4.53%). Our findings not only provided evidence that human-defined behaviors are typically statistical inferences of multi-dimensional patterns, but also raised an awareness of biases in subjective behavioral quantification, which takes 1000x longer. To further support the validity of our model, we looked into the 15 individual classes.

H. Unsupervised learning reasonably parses behaviors and aligns with expectation. To address the robustness of the 15 classes modeled, we collected data from 10 different animals with 19 total half-hour sessions (in addition to a training set of 4 animals comprising a total of 6 sessions). After combining training and prediction data, we can analyze a few characteristics of individual actions to see if those match our expectations. First, we examined bout durations for each individual class. We found that across animals, the variability in how time is allotted per action class was conserved when exploring a novel open field (Fig.5). In addition, when grouped based on the commonly agreed upon states (rest/pause, sniff/nose-poke, rear, groom, gather/scrunch, locomotion/orientation), the relative distributions of durations are in alignment with what we have observed. For instance: rearing against the wall generally lasts longer than unsupported rears that are out in the open; a groom that has a smaller range of motion may last over 4 seconds, while a faster and coarser "scratch" may not; orienting movements are much briefer than locomotion given the size of our open field.
Second, we analyzed the transition probabilities of all classes. We can test the validity of our model based on the predictability of the action that follows. If the subdivided grooms are true classes, the three classes that we ascribed to "groom" should be interconnected, given the stereotypical sequence of grooming in rodents (?). Indeed, the mean across-animal transition matrix exhibits an oscillating structure around the intersection of current and next states of grooms (Fig.6). We also observed that pause and sniff states have a higher probability of transitioning into non-stationary states (locomotion and orientation) than the inverse, consistent with the notion of novelty-seeking behavior, particularly
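A frame-by-frame transition probability matrix of the kind analyzed here can be computed from a label sequence as follows. This is a minimal sketch; class labels are assumed to be integers 0..n_classes-1.

```python
import numpy as np

def transition_matrix(labels, n_classes):
    """Row-normalized frame-by-frame transition probabilities:
    entry [i, j] is P(next frame is class j | current frame is class i)."""
    counts = np.zeros((n_classes, n_classes))
    for cur, nxt in zip(labels[:-1], labels[1:]):
        counts[cur, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions stay all-zero rather than dividing by 0.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)
```

Each row of this matrix is one "current action" row of Fig. 6; averaging the per-session matrices gives the across-animal matrix described in the text.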


Fig. 2. Gaussian mixture models (GMM) classified clusters into unique actions. (A) Expectation maximization for fitting of GMMs on 3-dimensional t-SNE space identifies 15 classes. (B) All 7 low-dimensional (physical) features per class; each represents the corresponding colors from (A).


Fig. 3. A. Support vector machine classifier robustness with three different kernel tricks. Given sufficient training data, the Gaussian kernel function (magenta) appears to work best and most robustly, although the polynomial kernel works better for small training data (blue).

when first exposed to a new environment. We postulated that this is also the case for individual animals. We split hour-long sessions into early versus late exploration blocks for individual animals (N=9, 30 minutes each block). We discovered that such asymmetry toward non-stationary versus stationary states was present across individual animals in early exploration (left panels) when compared to that in late exploration (right panels; Fig.6). Our findings suggest that, aside from defining individual actions, B-SOiD serves as a useful tool for uncovering the larger structure of action sequences.

I. Exploration versus exploitation. The exploration-exploitation dilemma is a widely established phenomenon in which an agent, in this case a mouse, learns about the environment through a series of transitions between obtaining information by exploring and exploiting previously obtained knowledge. With the B-SOiD algorithm, we have isolated potential behavioral strategies going from early to late exploration and suspect a transition from exploration to exploitation. We hypothesized that the temporal inter-class interval (ICI) distributions would converge for stationary versus non-stationary behaviors. The empirical cumulative distribution function (eCDF) for the time between occurrences of the same action demonstrates that the difference between the non-stationary and stationary ICI distributions diminished during late exploration (Fig.7), consistent with the idea that the exploration-exploitation trade-off depends on experience in

the environment (16). More surprisingly, we observed that our extracted stereotypical grooming sequence appears to be affected by experience as well. These results suggest that our algorithm can dissect out a finer behavioral predictor for the exploration-exploitation trade-off.
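The ICI/eCDF analysis can be sketched as below. The helper names are ours; onsets are detected as frames where the target label begins a new bout.

```python
import numpy as np

def inter_class_intervals(labels, target, fps=10):
    """Time (s) between successive onsets of `target` in a frame-label
    sequence; an onset is a frame labeled `target` whose predecessor is not."""
    labels = np.asarray(labels)
    onsets = np.flatnonzero((labels == target) &
                            np.r_[True, labels[:-1] != target])
    return np.diff(onsets) / fps

def ecdf(samples):
    """Empirical CDF evaluated at the sorted sample points."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y
```

Pooling the intervals over all stationary (or non-stationary) classes and plotting their eCDFs for early versus late blocks reproduces the comparison of Fig. 7.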

J. Behavioral organization in naturalistic exploration. We found it interesting that mice revisit grooming behaviors more often in the later stages of exploration. We further investigated the baseline-subtracted history-dependent transition matrix (HDTM). This measure reveals how likely an action is to occur above and beyond the baseline probability of it occurring at all (Fig.8). In the mean across-animal HDTM, a rhythm for grooming in particular is observable. For individual animals, we see that the pattern is more pronounced in the late exploration phase (mouse 2 experienced the open field for an additional 15 minutes prior to the 'early' block shown). This may be due to many causes, including, but not limited to, comfort with new surroundings, decreased benefit in exploration, or even mere energy expenditure. Although more in-depth analyses are needed, we are encouraged by B-SOiD's ability to capture a full range of behavioral responses.
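A simple first-order version of the baseline subtraction can be sketched as follows. This is an illustrative reduction of the HDTM idea, with each row of the transition matrix having the class's overall occupancy probability subtracted; the paper's history-dependent construction may condition on longer histories.

```python
import numpy as np

def excess_transition_matrix(labels, n_classes):
    """Transition probabilities minus each class's baseline occupancy:
    positive entries mean the next action follows the current one more often
    than its overall frequency alone would predict."""
    labels = np.asarray(labels)
    counts = np.zeros((n_classes, n_classes))
    for cur, nxt in zip(labels[:-1], labels[1:]):
        counts[cur, nxt] += 1
    rows = counts.sum(axis=1, keepdims=True)
    trans = np.divide(counts, rows,
                      out=np.zeros_like(counts), where=rows > 0)
    baseline = np.bincount(labels, minlength=n_classes) / len(labels)
    return trans - baseline  # baseline is subtracted from every row
```

Bands of positive entries linking the groom classes in such a matrix would correspond to the grooming rhythm described above.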


Fig. 4. Classifier-predicted actions of a novel animal (brown mouse, right) matched those of the trained animal (black mouse, left). Plotted locomotion was the beginning 500 ms (going from shaded gray to solid colors) of an ambulation bout that continued for more than 500 ms; whereas grooms and rears were snapshots 300 ms into a continuous behavior that lasted at least 500 ms.


Fig. 5. Duration mean and standard error for all behavioral bouts from 10 mice, consisting of 19 total 30-minute sessions (n=2 for mice 1-9 and n=1 for mouse 10). A-B. Two types of unique pause and sniff were identified, with similar bout durations. C. The two classified rears, either against the wall or without support, show that rearing lasts longer when it is against the wall. D. The sub-divided grooming sequence shows that the animals make large-range (coarse) motions for shorter periods of time compared to fine grooming or orienting to either side to groom. E. Gathering, or scrunching, has a majority of action bouts shorter than 2 s. F. Orienting to the right or to the left has a very similar bout duration distribution, all appearing to be shorter than locomotion.

Discussion

Naturalistic behavior provides a rich account of an animal's motor plans and decisions, largely unfettered by experimental constraints. Until recently, capturing the complexity of these behaviors in the open field has proven prohibitively taxing. Still, new methods are difficult to implement and/or offer an incomplete account of the movements of the actual effectors (e.g. limbs, head). Our unsupervised algorithm, B-SOiD, offers the opportunity to capture limb and action dynamics through the use of the popular DeepLabCut software. This tool also serves as the vital bridge between knowing the location of body parts (provided by DeepLabCut) and the actions that those body parts perform in concert with one another. It also demonstrates the utility of artificial intelligence, specifically the integration of multi-dimensional embedding, iterative expectation maximization, and a multi-learner design coding matrix, in classifying behavior. Paired with the insight that the presence of missing information is itself information, these algorithms even allow the extraction

of three-dimensional movements from two-dimensional data.Finally, our approach has enabled the initial study of actionsets and how transitions between actions change with context- in this case, something as simple as the passage of time.The described unsupervised learning algorithm allows usersto automate detection of various classes of actions so as tounderstand the dynamics of Open-field, naturalistic behaviorand the neural mechanisms governing their selection and per-formance.Also important is the ability to decompose behaviors intotheir constituent movements. By using limb position, we canextract not only whether an animal was walking or grooming,but also determine the contributing stride length, speed ofarm extension, etc. While we have previously benefited fromaccess to such performance parameters (17), this may proveto be a potent advantage in the study of disease models. Thestudy of obsessive compulsive disorder in particular has longsought improved identification and quantification of groom-ing behavior. B-SOiD enables the detection of grooming, it’s


Behavioral organization in naturalistic exploration

Fig. 6. Frame-by-frame transition probabilities. Each row describes a current action, corresponding columns represent the range of upcoming behaviors, and colors indicate the computed probability. A. The average frame-by-frame transition matrix from 10 different animals (19 total 30-minute sessions) predicts reasonable transitions between stationary segments, between segregated grooming sequences, and between ambulatory actions. B.-C. Individual mice's transition matrices appear to be slightly variable; however, the gross structural change in behavior appears to be similar going from early exploration (0:00:00.16 to 0:30:00.00, or 0-30 min, B.) to late exploration (0:30:00.16 to 1:00:00.00, or 30-60 min, C.).
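A transition matrix like the one in Fig. 6 is straightforward to compute from a sequence of per-frame group labels. Below is a minimal numpy sketch; the function name and the short label sequence are illustrative, not part of B-SOiD's API.

```python
import numpy as np

def transition_matrix(labels, n_classes):
    """Row-normalized frame-to-frame transition probabilities:
    entry (i, j) is P(next action = j | current action = i)."""
    counts = np.zeros((n_classes, n_classes))
    for cur, nxt in zip(labels[:-1], labels[1:]):
        counts[cur, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions stay all-zero instead of NaN.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Illustrative per-frame label sequence over 3 behavior classes.
labels = [0, 0, 1, 1, 1, 0, 2, 2, 0]
P = transition_matrix(labels, 3)
```

Averaging such matrices across sessions, or computing them separately for the first and last 30 minutes, reproduces the kind of comparison shown in panels B and C.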


Fig. 7. Empirical cumulative distribution function of recurrent behavioral intervals from 10 mice, first 30 minutes versus last 30 minutes of open-field exploration. A.-C. In the first 30 minutes of open-field navigation, all animals appear to revert more quickly to actions that assist in exploration of the environment, i.e., locomotion, orientation, and perhaps rearing against the wall. D.-F. In the last 30 minutes of the experiment, the recurrent behavioral intervals appear to shift for the majority of the actions, showing a shorter interval between grooms and pauses and a longer interval between locomotion bouts.

relation to other behavior, and how vigorously each grooming bout is performed (18). Additionally, neurobehavioral deficits such as diminished locomotor speed may be the result of shorter stride length or slower stride, with important differences between the two causes. Therefore, understanding both the structure and substructure of actions increases the potential of research.

Though commonly used, and perhaps as a result of its common usage, the term "action" has a broad range of applications, ranging from sub-second muscle activations (e.g., "snapping your fingers") to a prolonged series of motor commands (e.g., "going home", or reorienting, walking, and engaging a different behavioral port). Our approach, while initially focused on what we believed to be behavioral building blocks (grooming, rearing, walking, etc.), was susceptible to priors and anthropomorphizing. Based upon the described dimensionality reduction applied to length, distance, and change in position and angle, we discovered that we may indeed be over-simplifying animal behaviors, which may generate unnecessary challenges in neurobehavioral correlation. In the software package, we preserve the user-defined categories of "actions", along with the unsupervised clustering and a hybrid classifier format. Therefore, B-SOiD provides a unique advantage: it allows the experimenter to automatically or manually build classifiers of all types of behavior. Given input data from control animals or any disease model in tetrapods, which span rodents to non-human primates, the algorithm derives classes of behavior based solely upon what is present in the data.

Methods

Animal and Open-field set-up. 10 C57/BL6 adult mice (5 males and 5 females) were placed in a clear 15x12 inch rectangular Open-field arena for one hour while a 1280x720p video camera captured video at 60 Hz. Video was acquired from the bottom up, 19 inches away from the diagonal midpoint of the container. The videos were then divided into first and last 30 minutes for analysis purposes.

Low-pass likelihood filter. DeepLabCut estimates the likeliest position of all body parts for all frames, even when a body part is completely occluded. Since we are recording from the bottom up, it is often the case that the mouse's snout or forelimbs


Fig. 8. Surfaced behavioral structure of exploration. A. The average recurrence matrix, aligned to the end of the most recent behavior X from all sessions, revealed that the various identified grooms recur more often than baseline, whereas other behaviors do not. B.-C. Individual mice's recurrence matrices appear to be slightly variable; however, the majority of animals show an increase in the probability of recurring grooms from early exploration (0:00:00.16 to 0:30:00.00, or 0-30 min, B.) to late exploration (0:30:00.16 to 1:00:00.00, or 30-60 min, C.). Note that mouse 2 was pre-exposed to the open field for 15 minutes prior to recording.


will not be seen during rearing, grooming, and the like. To take advantage of this situation, instead of replacing the estimated position with unworkable variables, we apply a low-pass filter: any estimated position whose likelihood falls below our cutoff (p < 0.2) is replaced with the most recent position whose likelihood exceeded the cutoff. This particular workaround allows the machine to treat it as a unique signal (no signal).
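A minimal numpy sketch of this filter, assuming per-frame (x, y) estimates and likelihoods of the kind DeepLabCut outputs; the function name and array layout are illustrative.

```python
import numpy as np

def lowpass_likelihood_filter(xy, likelihood, cutoff=0.2):
    """Replace every position estimate whose likelihood falls below the
    cutoff with the most recent position that exceeded the cutoff."""
    xy = np.array(xy, dtype=float)
    good = np.asarray(likelihood) >= cutoff
    last = None
    for t in range(len(xy)):
        if good[t]:
            last = xy[t].copy()
        elif last is not None:
            xy[t] = last            # hold the last confident position
    return xy

# Frames 1 and 2 are low confidence (e.g. snout occluded during a rear),
# so they inherit the frame-0 position, preserving the "no signal" signal.
filtered = lowpass_likelihood_filter(
    [[0, 0], [1, 1], [2, 2], [3, 3]], [0.9, 0.1, 0.1, 0.9])
```

Because the held position is repeated verbatim, occlusion shows up downstream as zero displacement, which the clustering can exploit as a distinct signature.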

Feature extraction. Out of the 20 possible features we studied, we distilled the set down to 7 based on hierarchical clustering analyses (HCA) and on which features best capture the absence of signals. The body length, or distance from snout to base of tail, $d_{ST}$, is formulated as follows:

$$\|d_{ST}\| = \sqrt{\sum_{D=1}^{2} (S_D - T_D)^2} \tag{1}$$

where $S_D$ and $T_D$ represent the likeliest positions of the snout and base of tail, respectively, and $D$ denotes the x or y dimension.

The front paws to base of tail distance relative to body length, $d_{SF}$, is computed with the equation

$$d_{SF} = \|d_{ST}\| - \|d_{FT}\| = \sqrt{\sum_{D=1}^{2} (S_D - T_D)^2} - \sqrt{\sum_{D=1}^{2} (F_D - T_D)^2} \tag{2}$$

where $d_{FT}$ is the distance between the front paws and the base of tail, and $F_D$ is the mean x and y position of the two front paws.

The back paws to base of tail distance relative to body length, $d_{SB}$, is calculated using the formula

$$d_{SB} = \|d_{ST}\| - \|d_{BT}\| = \sqrt{\sum_{D=1}^{2} (S_D - T_D)^2} - \sqrt{\sum_{D=1}^{2} (B_D - T_D)^2} \tag{3}$$

where $d_{BT}$ is the distance between the back paws and the base of tail, and $B_D$ is the mean x and y position of the two back paws.

The distance between the two front paws, $d_{FP}$, is derived from

$$\|d_{FP}\| = \sqrt{\sum_{D=1}^{2} (FR_D - FL_D)^2} \tag{4}$$

where $FR_D$ and $FL_D$ are the likeliest positions of the right and left front paws, respectively.

The snout speed, $v_S$, or displacement over a period of 16 ms, uses the following equation:

$$\|v_S\| = \sqrt{\sum_{D=1}^{2} \left(S_D^{t+1} - S_D^{t}\right)^2} \tag{5}$$

where $S_D^{t+1}$ and $S_D^{t}$ refer to the current and past likeliest snout positions, respectively.

The base of tail speed, $v_T$, or displacement over a period of 16 ms, is similar to the formula above:

$$\|v_T\| = \sqrt{\sum_{D=1}^{2} \left(T_D^{t+1} - T_D^{t}\right)^2} \tag{6}$$

where $T_D^{t+1}$ and $T_D^{t}$ refer to the current and past likeliest base of tail positions, respectively.

The snout to base of tail change in angle, $\Delta\theta_{A}^{A'}$, is the four-quadrant inverse tangent of the cross and dot products of the body-length vectors at consecutive time steps:

$$\Delta\theta_{A}^{A'} = \frac{180}{\pi}\,\operatorname{atan2}\!\left(A'_x A_y - A_x A'_y,\; A'_x A_x + A'_y A_y\right) \tag{7}$$

where $A$ and $A'$ represent the body-length vector at the past ($t$) and current ($t+1$) time steps, respectively. Note that the cross product and dot product are necessary for the four-quadrant inverse tangent, and that the sign is flipped to determine left versus right in terms of the animal's perspective.

In addition, the features are also smoothed over, or averaged across, a sliding window of size equivalent to 60 ms (30 ms prior to and after the frame of interest).

Data clustering. With a sampling frequency of 60 Hz (1 frame every ~16 ms), we are capturing fragments of movements. Any clustering algorithm will have a difficult time teasing apart the innate spectral nature of action groups. To resolve this issue, every 6 frames we take either the sum over all fragments for the time-varying features (features 5-7) or the average across the static measurements (features 1-4). Because our sliding-window smoothing prior to this step operates at about double the resolution of the bins, we are not concerned with washing out inter-bin behavioral signals.
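This binning step can be sketched as follows, assuming the per-frame features are stacked column-wise into a static array (features 1-4) and a time-varying array (features 5-7); the names are illustrative.

```python
import numpy as np

def bin_features(static, dynamic, bin_size=6):
    """Pool per-frame features into bin_size-frame bins: average the
    static measures (features 1-4), sum the time-varying ones (5-7)."""
    n = (static.shape[0] // bin_size) * bin_size  # drop trailing partial bin
    s = static[:n].reshape(-1, bin_size, static.shape[1]).mean(axis=1)
    d = dynamic[:n].reshape(-1, bin_size, dynamic.shape[1]).sum(axis=1)
    return np.hstack([s, d])

# 12 frames at 60 Hz -> two 100 ms bins of 7 pooled features each.
binned = bin_features(np.ones((12, 4)), np.ones((12, 3)))
```

Each row of the result is one 100 ms observation fed to the dimensionality reduction below.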

t-distributed Stochastic Neighbor Embedding (t-SNE) was then performed on our high-dimensional input data to minimize the divergence between the distribution of the input objects and the embedded distribution in the low-dimensional space. This algorithm was preferred over other dimensionality reduction methods due to its preservation of local structure, allowing behavioral data points to be presented in a continuous fashion. The locations of the embedded points $y_i$ are determined by minimizing the Kullback-Leibler divergence between the joint distributions $P$ and $Q$, which is formulated using the following equations:

$$C = KL(P\,\|\,Q) = \sum_{i \neq j} p_{ij} \log\frac{p_{ij}}{q_{ij}} \tag{8}$$

$$\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij})\, q_{ij}\, Z\, (y_i - y_j) \tag{9}$$


In simpler terms, similar objects (think of an object as a mouse action; high values of $p_{ij}$) retain their similarity in the low-dimensional visualization (high values of $q_{ij}$), scaled with a normalization constant $Z$ defined as $\sum_{k \neq l}(1 + \|y_k - y_l\|^2)^{-1}$. To accelerate the dimensionality reduction process, we opted to perform the Barnes-Hut approximation (13).
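Eqns. 8-9 can be checked directly with a small numpy sketch that evaluates the KL cost and its gradient for a given embedding. This is a didactic, exact O(n²) version; in practice the Barnes-Hut approximation cited above is used instead.

```python
import numpy as np

def tsne_cost_grad(P, Y):
    """KL cost and its gradient w.r.t. the embedded points Y,
    given joint input similarities P (zero diagonal, sums to 1)."""
    diff = Y[:, None, :] - Y[None, :, :]        # y_i - y_j
    num = 1.0 / (1.0 + (diff ** 2).sum(-1))     # Student-t kernel
    np.fill_diagonal(num, 0.0)
    Z = num.sum()                               # normalization constant
    Q = num / Z                                 # q_ij
    mask = P > 0
    cost = np.sum(P[mask] * np.log(P[mask] / Q[mask]))               # Eqn. 8
    grad = (4.0 * ((P - Q) * num)[:, :, None] * diff).sum(axis=1)    # Eqn. 9
    return cost, grad

# Three equidistant embedded points with uniform P: Q matches P exactly,
# so both the KL cost and its gradient vanish.
Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
P = np.full((3, 3), 1 / 6)
np.fill_diagonal(P, 0.0)
cost, grad = tsne_cost_grad(P, Y)
```

Note that $q_{ij} Z$ is exactly the unnormalized kernel value, which is why the gradient line multiplies by `num` rather than `Q * Z`.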

Grouping. Expectation Maximization based on Gaussian Mixture Models (14) was performed to guarantee convergence to a local optimum. We opted to randomly initialize the Gaussian parameters $\mu_k$, $\Sigma_k$, and $\pi_k$ a number of times in order to escape bad local optima.

First, we evaluate the responsibilities using the initialized parameter values (E-step):

$$\gamma(z_{nk}) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \tag{10}$$

Second, we re-estimate the parameters using the current responsibilities $\gamma(z_{nk})$ (M-step):

$$\mu_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \tag{11}$$

$$\Sigma_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\,(x_n - \mu_k^{new})(x_n - \mu_k^{new})^T \tag{12}$$

$$\pi_k^{new} = \frac{N_k}{N} \tag{13}$$

where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$. Finally, we evaluate the log likelihood

$$\ln p(X \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln\left\{\sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\right\} \tag{14}$$

to check whether the parameters or the log likelihood have converged. If the convergence criterion is not satisfied, we let $\mu_k, \Sigma_k, \pi_k \leftarrow \mu_k^{new}, \Sigma_k^{new}, \pi_k^{new}$ and repeat the E- and M-steps.
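One full E-step/M-step iteration of the equations above can be written compactly in numpy. This is a didactic sketch with naive density evaluation and illustrative variable names, not the production implementation.

```python
import numpy as np

def em_step(X, mu, cov, pi):
    """One EM iteration for a Gaussian mixture: E-step responsibilities,
    M-step parameter updates, and the log likelihood."""
    N, D = X.shape
    K = len(pi)
    dens = np.empty((N, K))
    for k in range(K):                        # Gaussian densities N(x_n|mu_k,Sigma_k)
        d = X - mu[k]
        inv = np.linalg.inv(cov[k])
        norm = 1.0 / np.sqrt(((2 * np.pi) ** D) * np.linalg.det(cov[k]))
        dens[:, k] = norm * np.exp(-0.5 * np.einsum('nd,de,ne->n', d, inv, d))
    weighted = pi * dens
    gamma = weighted / weighted.sum(axis=1, keepdims=True)   # E-step
    Nk = gamma.sum(axis=0)
    mu_new = (gamma.T @ X) / Nk[:, None]                     # M-step: means
    cov_new = np.empty_like(cov)
    for k in range(K):                                       # M-step: covariances
        d = X - mu_new[k]
        cov_new[k] = (gamma[:, k, None] * d).T @ d / Nk[k]
    pi_new = Nk / N                                          # M-step: mixing weights
    loglik = np.log(weighted.sum(axis=1)).sum()              # convergence check
    return gamma, mu_new, cov_new, pi_new, loglik

# Two well-separated toy clusters with a rough initialization.
X = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.0, 10.5]])
mu = np.array([[0.0, 0.2], [9.0, 9.0]])
cov = np.array([np.eye(2), np.eye(2)])
pi = np.array([0.5, 0.5])
gamma, mu_new, cov_new, pi_new, ll = em_step(X, mu, cov, pi)
```

Iterating this function until the log likelihood stops improving, from several random initializations, mirrors the restart strategy described above.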

Classifier design. Since we are dealing with more than two classes, we used error-correcting output codes (ECOC) (15, 19) to reduce the multi-class discrimination problem to a set of binary classification problems. To build this SVM classifier, consider the following exemplar coding design:

Classes    Learner 1    Learner 2    Learner 3
Class 1        1            1            0
Class 2       -1            0            1
Class 3        0           -1           -1

where learner 1 learns to differentiate class 1 (1) from class 2 (-1), learner 2 learns that class 1 (1) is different from class 3 (-1), and learner 3 learns to classify class 2 (1) from class 3 (-1). After constructing the coding design matrix $M$ with elements $m_{kl}$, and with $s_l$ as the predicted classification score for the positive class of learner $l$, the ECOC algorithm assigns a new observation to the class $\hat{k}$ that minimizes the aggregate loss over the $L$ binary learners:

$$\hat{k} = \underset{k}{\arg\min}\; \frac{\sum_{l=1}^{L} |m_{kl}|\, g(m_{kl}, s_l)}{\sum_{l=1}^{L} |m_{kl}|} \tag{15}$$

where $g$ is the loss function of the decoding scheme.
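Eqn. 15 amounts to a loss-weighted decoding over the coding matrix. Below is a small numpy sketch using the exemplar design above; the hinge-style loss $g$ is one common choice for SVM learners and is an assumption here, not necessarily the loss B-SOiD uses.

```python
import numpy as np

def ecoc_decode(M, scores,
                loss=lambda m, s: np.maximum(0.0, 1.0 - m * s) / 2):
    """Loss-weighted ECOC decoding: return the class index k that
    minimizes sum_l |m_kl| g(m_kl, s_l) / sum_l |m_kl|."""
    M = np.asarray(M, dtype=float)
    scores = np.asarray(scores, dtype=float)
    losses = loss(M, scores[None, :])               # g(m_kl, s_l) per class/learner
    agg = (np.abs(M) * losses).sum(axis=1) / np.abs(M).sum(axis=1)
    return int(np.argmin(agg))

# Exemplar ternary coding design from the table above.
M = [[1, 1, 0], [-1, 0, 1], [0, -1, -1]]
pred_a = ecoc_decode(M, [2.0, 2.0, 0.0])    # learners 1-2 vote for class 1
pred_b = ecoc_decode(M, [-2.0, 0.0, 2.0])   # learners 1 and 3 vote for class 2
```

Weighting by $|m_{kl}|$ means learners that abstain for a class (code 0) do not contribute to that class's aggregate loss.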

ACKNOWLEDGEMENTS

We would like to acknowledge Susanne Ahmari's lab at the University of Pittsburgh for validating B-SOiD in OCD mouse models (see https://github.com/YttriLab/B-SOiD for videos). We thank Gretta Lindemann, Justin Lowe, Grace Lindemann, and Jessica Meyers for painstakingly annotating the source video manually.

References

1. Alexander Mathis, Pranav Mamidanna, Kevin M. Cury, Taiga Abe, Venkatesh N. Murthy, Mackenzie Weygandt Mathis, and Matthias Bethge. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature Neuroscience, 2018. doi: 10.1038/s41593-018-0209-y.
2. Tanmay Nath, Alexander Mathis, An Chi Chen, Amir Patel, Matthias Bethge, and Mackenzie Weygandt Mathis. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nature Protocols, 2019. doi: 10.1038/s41596-019-0176-0.
3. C. R. Gallistel. Representations in animal cognition: An introduction. Cognition, 1990. doi: 10.1016/0010-0277(90)90016-D.
4. Mayank Kabra, Alice A. Robie, Marta Rivera-Alba, Steven Branson, and Kristin Branson. JAABA: Interactive machine learning for automatic annotation of animal behavior. Nature Methods, 10(1):64-67, 2013. doi: 10.1038/nmeth.2281.
5. Nao Ota, Manfred Gahr, and Masayo Soma. Tap dancing birds: The multimodal mutual courtship display of males and females in a socially monogamous songbird. Scientific Reports, 5, 2015. doi: 10.1038/srep16614.
6. A. J. Spink, R. A. Tegelenbosch, M. O. Buma, and L. P. Noldus. The EthoVision video tracking system - a tool for behavioral phenotyping of transgenic mice. Physiology & Behavior, 73(5):731-744, 2001. doi: 10.1016/s0031-9384(01)00530-3.
7. Jeffrey E. Markowitz, Winthrop F. Gillis, Celia C. Beron, Scott W. Linderman, Bernardo L. Sabatini, and Sandeep Robert Datta. The Striatum Organizes 3D Behavior via Moment-to-Moment Action Selection. Cell, 174:44-49, 2018. doi: 10.1016/j.cell.2018.04.019.
8. Barry J. Everitt and Trevor W. Robbins. Neural systems of reinforcement for drug addiction: From actions to habits to compulsion. Nature Neuroscience, 2005.
9. Xin Jin and Rui M. Costa. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature, 466(7305):457-462, 2010. doi: 10.1038/nature09263.
10. Long Ding and Joshua I. Gold. The basal ganglia's contributions to perceptual decision making. Neuron, 2013.
11. Xin Jin, Fatuel Tecuapetla, and Rui M. Costa. Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences. Nature Neuroscience, 17(3):423-430, 2014. doi: 10.1038/nn.3632.
12. Andreas Klaus, Gabriela J. Martins, Vitor B. Paixao, Pengcheng Zhou, Liam Paninski, and Rui M. Costa. The Spatiotemporal Organization of the Striatum Encodes Action Space. Neuron, 2017. doi: 10.1016/j.neuron.2017.08.015.
13. Laurens van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research, 15, 2014.
14. Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
15. Sergio Escalera, Oriol Pujol, and Petia Radeva. On the decoding process in ternary error-correcting output codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1):120-134, 2010. doi: 10.1109/TPAMI.2008.266.
16. Oded Berger-Tal, Jonathan Nathan, Ehud Meron, and David Saltz. The exploration-exploitation dilemma: A multidisciplinary framework. PLoS ONE, 9(4), 2014. doi: 10.1371/journal.pone.0095693.
17. Eric A. Yttri and Joshua T. Dudman. Opponent and bidirectional control of movement velocity in the basal ganglia. Nature, 533(7603):402-406, 2016. doi: 10.1038/nature17639.
18. Ann M. Graybiel and Esen Saka. A genetic basis for obsessive grooming. Neuron, 33(1):1-2, 2002. doi: 10.1016/s0896-6273(01)00575-x.
19. Thomas G. Dietterich and Ghulum Bakiri. Solving Multiclass Learning Problems via Error-Correcting Output Codes. Journal of Artificial Intelligence Research, 2:263-286, 1995.
