Geoff Bacon

Computational linguist

University of California, Berkeley

Language & Cognition Lab

D-Lab

Biography

I’m a computational linguist completing my PhD at UC Berkeley, advised by Terry Regier. My dissertation evaluates what neural networks learn about phonology, morphosyntax and semantics across more than 80 languages. I am a Senior Data Science Fellow at D-Lab, where I consult and teach on natural language processing and data science with Python. I am also a researcher on the Measuring Hate Speech team, where we develop NLP methods to identify hate speech and abusive language online. I built the Cerberus app to reduce the annotation burden on linguists. This summer, I am a researcher at the International Computer Science Institute, working on integrating structured lexical resources with learned semantic representations.

Before my PhD, I was a project manager at Kleffmann Group and taught English in Grosseto and Paris. I studied linguistics, classics and Arabic at the University of Melbourne. As a language learner, I’ve studied French, Italian, Arabic, Latin, Ancient Greek, Sanskrit and Middle Egyptian. As a linguist, I’ve worked on Somali, Nafsan and Tswefap, and spent two months in Nigeria conducting fieldwork on CiShigini.

Interests

  • Representation learning
  • Neural language models
  • Multilingual NLP
  • Hate speech detection

Education

  • PhD in Linguistics, 2020

    University of California, Berkeley

  • BA (Hons), 2013

    University of Melbourne

  • Diploma in Arabic, 2013

    University of Melbourne

Research

My research evaluates what neural networks learn about language, through computational experiments, massively cross-linguistic datasets and fine-grained error analysis. In my dissertation, I build and analyze large datasets in more than 80 languages to characterize the phonological, morphosyntactic and semantic knowledge of neural networks. I seek to understand where and how existing models fail, in order to help develop more human-like models of language.

Consulting

Through D-Lab, I offer free consulting on natural language processing and Python to the UC Berkeley community. Broadly, if you’re working on a project involving natural language data (text or speech), I’m happy to help. Common topics include:

  • Building and preprocessing NLP datasets
  • Data quality metrics
  • Part-of-speech tagging
  • Syntactic parsing
  • Named entity recognition
  • Text classification
  • Information extraction
  • Consuming web APIs
  • Web scraping

I have a special focus on applying these techniques to languages other than English or Standard Average European.

For Python projects, I can help with:

  • Reproducible workflows (git, GitHub, Docker)
  • Data engineering
  • Data analysis (pandas, numpy)
  • Data visualization (matplotlib, seaborn)
  • Scientific Python ecosystem (scipy, jupyter)
  • Machine learning (scikit-learn, statsmodels)
  • Deep learning frameworks (TensorFlow, PyTorch)
  • Squashing bugs