Geoff Bacon

Computational linguist



I’m a computational linguist at Google. I received my PhD from UC Berkeley in 2020, advised by Terry Regier. My dissertation evaluated what neural networks learn about phonology, morphosyntax and semantics across more than 80 languages. From 2017 to 2020, I was a Senior Data Science Fellow at D-Lab, where I consulted and taught on natural language processing and data science with Python. I was a researcher on the Measuring Hate Speech team, where we developed NLP methods to identify hate speech and abusive language online. I built the Cerberus app to reduce the annotation burden on linguists. I was also a researcher at the International Computer Science Institute, working on integrating structured lexical resources with learned semantic representations.

Before my PhD, I was a project manager at Kleffmann Group and taught English in Grosseto and Paris. I studied linguistics, classics and Arabic at the University of Melbourne. As a language learner, I’ve studied French, Italian, Arabic, Latin, Ancient Greek, Sanskrit and Middle Egyptian. As a linguist, I’ve worked on Somali, Nafsan and Tswefap, and spent two months in Nigeria conducting fieldwork on CiShigini.


  • Representation learning
  • NLP data pipelines
  • Multilingual NLP
  • Hate speech detection


  • PhD in Linguistics, 2020

    University of California, Berkeley

  • BA (Hons), 2013

    University of Melbourne

  • Diploma in Arabic, 2013

    University of Melbourne


My research evaluates what neural networks learn about language, through computational experiments, massively cross-linguistic datasets and fine-grained error analysis. In my dissertation, I built and analyzed large datasets in more than 80 languages to characterize the phonological, morphosyntactic and semantic knowledge of neural networks. I seek to understand where and how existing models fail, in order to help develop more human-like models of language.


Through D-Lab, I offer free consulting on natural language processing and Python to the UC Berkeley community. Broadly, if you’re working on a project involving natural language data (text or speech), I’m happy to help. Common topics include:

  • Building and preprocessing NLP datasets
  • Data quality metrics
  • Part-of-speech tagging
  • Syntactic parsing
  • Named entity recognition
  • Text classification
  • Information extraction
  • Consuming web APIs
  • Web scraping

I have a special focus on applying these techniques to languages other than English or Standard Average European.

For Python projects, I can help with:

  • Reproducible workflows (git, GitHub, Docker)
  • Data engineering
  • Data analysis (pandas, numpy)
  • Data visualization (matplotlib, seaborn)
  • Scientific Python ecosystem (scipy, jupyter)
  • Machine learning (scikit-learn, statsmodels)
  • Deep learning frameworks (TensorFlow, PyTorch)
  • Squashing bugs