Geoff Bacon

Computational linguist

Google

Biography

I’m a computational linguist at Google. I received my PhD from UC Berkeley in 2020, advised by Terry Regier. My dissertation evaluated what neural networks learn about phonology, morphosyntax and semantics across more than 80 languages. From 2017 to 2020, I was a Senior Data Science Fellow at D-Lab, where I consulted and taught on natural language processing and data science with Python. I was a researcher on the Measuring Hate Speech team, where we developed NLP methods to identify hate speech and abusive language online. I built the Cerberus app to reduce the annotation burden on linguists. I was also a researcher at the International Computer Science Institute, working on integrating structured lexical resources with learned semantic representations.

Before my PhD, I was a project manager at Kleffmann Group and taught English in Grosseto and Paris. I studied linguistics, classics and Arabic at the University of Melbourne. As a language learner, I’ve studied French, Italian, Arabic, Latin, Ancient Greek, Sanskrit and Middle Egyptian. As a linguist, I’ve worked on Somali, Nafsan and Tswefap, and spent two months in Nigeria conducting fieldwork on CiShigini.

Interests

Representation learning
NLP data pipelines
Multilingual NLP
Hate speech detection

Education

PhD in Linguistics, 2020
University of California, Berkeley

BA (Hons), 2013
University of Melbourne

Diploma in Arabic, 2013
University of Melbourne

Research

My research evaluates what neural networks learn about language, through computational experiments, massively cross-linguistic datasets and fine-grained error analysis. In my dissertation, I build and analyze large datasets in more than 80 languages to characterize the phonological, morphosyntactic and semantic knowledge of neural networks. I seek to understand where and how existing models fail, in order to help develop more human-like models of language.

Consulting

Through D-Lab, I offer free consulting on natural language processing and Python to the UC Berkeley community. Broadly, if you’re working on a project involving natural language data (text or speech), I’m happy to help. Common topics include:

Building and preprocessing NLP datasets
Data quality metrics
Part-of-speech tagging
Syntactic parsing
Named entity recognition
Text classification
Information extraction
Consuming web APIs
Web scraping

I have a special focus on applying these techniques to languages other than English or Standard Average European.

For Python projects, I can help with:

Reproducible workflows (git, GitHub, Docker)
Data engineering
Data analysis (pandas, numpy)
Data visualization (matplotlib, seaborn)
Scientific Python ecosystem (scipy, jupyter)
Machine learning (scikit-learn, statsmodels)
Deep learning frameworks (TensorFlow, PyTorch)
Squashing bugs

Teaching

During my PhD, I created and taught a range of workshops, including: