Brian Roark
Contact Info:
Google, Inc., 555 SW Morrison St., Ste. 500, Portland, OR 97204
email: roarkbr AT SYMBOL g m a i l DOT c o m
I am a computational linguist working on various topics in natural language processing. My research interests include:
transliteration and text normalization; language identification; language modeling for automatic speech recognition, text entry and other applications; weighted transducers and grammars; supervised and unsupervised learning of language models; pronunciation modeling; text entry, accessibility and augmentative & alternative communication (AAC); syntactic parsing of text and speech; statistical models of human language processing; spoken language processing for diagnosis of neurodevelopmental and neurodegenerative disorders.
Some recent-ish activities, links and/or resources:
- Richard Sproat, Su-Youn Yoon and I have a new book coming out in late 2025: Tools of the Scribe: How Writing Systems, Technology, and Human Factors Interact To Affect the Act of Writing.
- I gave a talk on "Empirical methods in context-aware transliteration" at the Eugene Charniak Memorial Symposium at the CS Department of Brown University in November 2024.
- I co-organized (w/Kyle Gorman, Emily Prud’hommeaux and Richard Sproat) the Second Workshop on Computation and Written Language (CAWL 2024), held in conjunction with LREC-COLING in Torino, Italy, May 21, 2024.
The workshop was sponsored by the newly-formed ACL Special Interest Group on Writing Systems and Written Language (SIGWrit), which I helped establish in 2023.
The previous year, we organized the First ACL Workshop on Computation and Written Language (CAWL), which was held at ACL 2023 in Toronto, July 14, 2023.
- I was co-Editor-in-Chief of the Transactions of the Association for Computational Linguistics (TACL) from 2018 to 2022. The journal's 2021 impact factor was the topic of an MIT Press blog post.
- Here's the site of the Dakshina dataset, an open-source collection of romanized and native script Wikipedia in 12 South Asian languages that I helped put together.
- Here's a 2021 Google Research blog post about some work my team was involved in, transliterating geo entity names into Brahmic scripts. And here's an earlier (2017) post about some related work I contributed to, providing transliteration keyboards in 20+ South Asian languages.