I am currently a senior computational biologist at Adaptive Biotechnologies on the TCR Discovery Team.
I defended my PhD in 2022. At Oxford, I was a member of Pier Palamara’s group. We developed a method, ARG-Needle, to infer the genome-wide genealogy of a set of genetic samples from genotyping array or sequencing data. We built the genome-wide genealogy of 337K samples in the UK Biobank and scanned the genealogy for associations with 7 complex traits. This yielded more rare and ultra-rare associations than genotype imputation did, and these associations were overall enriched for loss-of-function variation. See our paper in Nature Genetics or my thesis.
During my PhD, I was supported by the Clarendon Scholarship and was a member of St. John’s College. Before Oxford, I worked for two years as a research engineer at DeepMind in London.
Interests. Adaptive Immune Receptor Repertoires, Autoimmune Disease, Statistical Genetics, Population Genetics, Machine Learning
Projects
-
Biobank-Scale Ancestral Recombination Graph Inference
[code]
[blog]
In population genetics, the ancestral recombination graph (ARG) captures the history of coalescence, recombination, and mutation events that gives rise to observed genetic data. We developed a method, ARG-Needle, that leverages coalescent modeling for ascertained genotyping array data to infer accurate, biobank-scale ARGs from SNP arrays. We also developed a framework for performing mixed-model association of unobserved variation implied by an inferred ARG. Using these methods, we inferred the ARG of 337,464 individuals in the UK Biobank and performed genealogy-based association of 7 complex traits, recapitulating associations found by reference-based imputation and detecting complementary ones. As these methods only require SNP array data, we anticipate they will be particularly relevant for populations that are currently undersequenced.
-
Mathematics of Linear Mixed Models
My first PhD project focused on improving linear mixed-model association in genetics. Standard inference under the mixed model does not scale to modern genetic datasets, so my PhD supervisors and I were looking to build on past methods like BOLT-LMM and LDpred to further improve the scalability of mixed-model association. Although I have put the project on pause, my work in this area led me to write a set of expository notes on the mathematics of linear mixed models. (In progress, last modified February 2020.)
-
Coconuts and Islanders: A Statistics-First Guide to the Boltzmann Distribution
An arXiv writeup presenting the Boltzmann distribution in what I hope is an accessible and intuitive way. I learned this approach from my father and the notes are dedicated to his memory.
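As a rough illustration of the statistics-first flavor, here is a minimal sketch under an assumed toy setup: a fixed number of coconuts is split among islanders, and every possible split is taken to be equally likely. The function name `coconut_marginal` and the equal-likelihood assumption are chosen here for illustration and are not quoted from the writeup. Under that assumption, a stars-and-bars count gives the exact probability that one particular islander holds k coconuts, and for many islanders the ratio of successive probabilities is nearly constant, which is the geometric (discrete Boltzmann) shape.

```python
from math import comb


def coconut_marginal(n_islanders, n_coconuts):
    """Exact P(a given islander holds k coconuts) for k = 0..n_coconuts.

    Assumes (for illustration) that every way of splitting the coconuts
    among the islanders is equally likely. Requires n_islanders >= 2.
    """
    # Stars and bars: number of ways to split n_coconuts among n_islanders.
    total = comb(n_coconuts + n_islanders - 1, n_islanders - 1)
    # If one islander holds k, the rest split the remaining n_coconuts - k.
    return [
        comb(n_coconuts - k + n_islanders - 2, n_islanders - 2) / total
        for k in range(n_coconuts + 1)
    ]


if __name__ == "__main__":
    probs = coconut_marginal(n_islanders=1000, n_coconuts=2000)
    # Successive ratios stay close to mean / (1 + mean), here 2/3.
    for k in range(5):
        print(k, probs[k], probs[k + 1] / probs[k])
```

With an average of two coconuts per islander, the printed ratios stay close to 2/3, so the probabilities decay approximately exponentially in k.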
-
Random Graphs and Giant Components
[code]
An R Markdown blog post introducing the Erdős-Rényi random graph and its giant component. I tried to build intuition through figures and animations, and have also linked to further reading on random graphs. Done in my free time during my PhD.
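For readers who want to see the giant component appear numerically, here is a minimal Python sketch (the post itself uses R Markdown; this code is not taken from it). It samples an Erdős-Rényi graph G(n, p) with p = mean_degree / n and tracks connected components with a union-find structure. The function name `largest_component_fraction` and its parameters are hypothetical, chosen for this example.

```python
import random
from collections import Counter


def largest_component_fraction(n, mean_degree, seed=0):
    """Sample G(n, p) with p = mean_degree / n and return the share of
    nodes that lie in the largest connected component."""
    rng = random.Random(seed)
    p = mean_degree / n
    parent = list(range(n))  # union-find forest: each node starts alone

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Include each of the n * (n - 1) / 2 possible edges independently.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j

    sizes = Counter(find(v) for v in range(n))
    return max(sizes.values()) / n


if __name__ == "__main__":
    for c in (0.5, 1.0, 1.5, 2.0):
        print(c, largest_component_fraction(1000, c))
```

Below a mean degree of 1 the largest component holds only a small share of the nodes, while above 1 a single component takes up a substantial fraction of the graph. The pairwise loop is quadratic in n, which is fine for a demonstration at this size but not for large graphs.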
Publications
2026
-
Population-scale Ancestral Recombination Graphs with tskit 1.0,
arXiv.
[code]
Ben Jeffery*, Yan Wong*, Kevin Thornton*, Georgia Tsambos*, Gertjan Bisschop†, Yun Deng†, E. Castedo Ellerman†, Thomas B. Forest†, Halley Fritze†, Daniel Goldstein†, Gregor Gorjanc†, Graham Gower†, Simon Gravel†, Jeremy Guez†, Benjamin C. Haller†, Andrew D. Kern†, Lloyd Kirk†, Ivan Krukov†, Hanbin Lee†, Brieuc Lehmann†, Hossameldin Loay†, Matthew M. Osmond†, Duncan S. Palmer†, Nathaniel S. Pope†, Aaron P. Ragsdale†, Duncan Robertson†, Murillo F. Rodrigues†, Hugo van Kemenade†, Clemens L. Wei߆, Anthony Wilder Wohns†, Shing H. Zhan†, Brian C. Zhang†, Marianne Aspbury, Nikolas A. Baya, Saurabh Belsare, Arjun Biddanda, Francisco Campuzano Jiménez, Ariella Gladstein, Bing Guo, Savita Karthikeyan, Warren W. Kretzschmar, Inés Rebollo, Kumar Saunack, Ruhollah Shemirani, Alexis Simon, Chris Smith, Jeet Sukumaran, Jonathan Terhorst, Per Unneberg, Ao Zhang, Peter Ralph‡, and Jerome Kelleher‡ (*, †, ‡ = equal contribution)
2025
-
HLA-A*03:01 as predictive genetic biomarker for glatiramer acetate treatment response in multiple sclerosis: a retrospective cohort analysis,
eBioMedicine.
[blog]
Brian C. Zhang*, Tilman Schneider-Hohendorf*, Rebecca Elyanow*, Beatrice Pignolet, Simon Falk, Christian Wünsch, Marie Deffner, Erik Yusko, Damon May, Daniel Mattox, Eva Dawin, Lisa Ann Gerdes, Florence Bucciarelli, Lisa Revie, Gisela Antony, Sven Jarius, Christiane Seidel, Makbule Senel, Stefan Bittner, Felix Luessi, Joachim Havla, Matthias Knop, Manuel A. Friese, Susanne Rothacher, Anke Salmen, Fumie Hayashi, Roland Henry, Stacy Caillier, Adam Santaniello, University of California San Francisco MS-EPIC Team, German Competence Network Multiple Sclerosis (KKNMS), Maria Seipelt, Christoph Heesen, Sandra Nischwitz, Antonios Bayas, Hayrettin Tumani, Florian Then Bergh, Gerd Meyer zu Hörste, Tania Kümpfel, Catharina C. Gross, Brigitte Wildemann, Martin Kerschensteiner, Ralf Gold, Sven G. Meuth, Frauke Zipp, Bruce A.C. Cree, Jorge Oksenberg, Michael R. Wilson, Stephen L. Hauser, Scott S. Zamvil, Luisa Klotz, Roland Liblau, Harlan Robins, Joseph J. Sabatino Jr., Heinz Wiendl, and Nicholas Schwab (* = equal contribution)
2024
2023
2019
2018
-
Vector-based navigation using grid-like representations in artificial agents,
Nature.
[blog]
Andrea Banino*, Caswell Barry*, Benigno Uria, Charles Blundell, Timothy Lillicrap, Piotr Mirowski, Alexander Pritzel, Martin Chadwick, Thomas Degris, Joseph Modayil, Greg Wayne, Hubert Soyer, Fabio Viola, Brian Zhang, Ross Goroshin, Neil Rabinowitz, Razvan Pascanu, Charlie Beattie, Stig Petersen, Amir Sadik, Stephen Gaffney, Helen King, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Dharshan Kumaran (* = equal contribution)
2017
-
The Kinetics Human Action Video Dataset,
arXiv.
Will Kay, João Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman
D.Phil. Thesis