Abstract
Pose estimation algorithms are shedding new light on animal behaviour and intelligence. Most existing models are trained only on labelled frames (supervised learning). Although effective in many cases, the fully supervised approach can still produce noisy outputs that hinder downstream analyses. I will discuss our efforts to address this limitation with a semi-supervised approach that leverages the spatiotemporal statistics of unlabelled videos in two ways: (1) unsupervised training objectives that penalise the network whenever its predictions violate smooth, low-dimensional dynamics; and (2) a new network architecture that predicts the pose for a given frame using temporal context from the surrounding unlabelled frames. The resulting pose estimation networks achieve better performance with fewer labels and produce smoother, more reliable pose trajectories. I will also describe a Bayesian post-processing approach, based on deep ensembling and Kalman smoothing, that further improves tracking accuracy and robustness. Finally, I will discuss our associated deep learning package and an accompanying cloud application that allows users to annotate data, train networks, and run inference on new videos at scale, directly from the browser.
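As a rough illustration of what an unsupervised objective that penalises non-smooth predictions might look like, the sketch below computes a simple temporal-difference penalty on predicted keypoints from unlabelled frames. It is one possible form of such a penalty and not necessarily the one used in the work described here; the function name `temporal_smoothness_penalty` and the tolerance parameter `epsilon` are hypothetical.

```python
import numpy as np


def temporal_smoothness_penalty(keypoints, epsilon=0.0):
    """Hypothetical smoothness penalty for predicted pose trajectories.

    keypoints: array of shape (T, K, 2) holding predicted (x, y)
        coordinates for K keypoints over T consecutive frames.
    epsilon: assumed tolerance in pixels; frame-to-frame jumps at or
        below this size are not penalised.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    # Distance each keypoint moves between consecutive frames: shape (T-1, K)
    jumps = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1)
    # Penalise only the part of each jump that exceeds the tolerance
    return float(np.mean(np.maximum(jumps - epsilon, 0.0)))


# One keypoint tracked over four frames, with a sudden 10-pixel jump
trajectory = np.array([[[0.0, 0.0]], [[1.0, 0.0]], [[11.0, 0.0]], [[12.0, 0.0]]])
penalty = temporal_smoothness_penalty(trajectory, epsilon=2.0)
```

In a semi-supervised setup, a term like this would be added to the supervised loss on labelled frames, so that no labels are needed to evaluate it on the unlabelled video.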
Biography
Matt Whiteway is an Associate Research Scientist at the Zuckerman Institute (Columbia University) and a Data Scientist at the International Brain Lab. His research focuses on developing open-source tools for analysing large-scale neural and behavioural datasets. This research builds on his work as a postdoctoral researcher in the Paninski Lab at Columbia University.