CSE 506/606 - Topics in Natural Language Processing, Spring 2012

Instructors: Brian Roark and Emily T. Prud'hommeaux

Class time: Tu/Th 11:00-12:30pm

Class location: WCC 403

Office hours: Weds 10-12, Central Building 115, or by appointment

Required textbooks:

None; readings will come from papers available online

See official OHSU Grade Policy and Disability Statement below




Goals

This course focuses on two "tracks" of advanced topics in Natural Language Processing (NLP), with in-class sessions roughly alternating between tracks. The first track examines current best practices in building systems that cluster, label, or transform raw text to improve information access. Such systems are often chained together within larger applications that retrieve documents, extract information, summarize, answer questions, and translate into other languages. We will have a particular focus on machine translation, but will also cover information retrieval, information extraction, question answering, and automatic summarization, all with an emphasis on current best practices for real, deployed systems. The second track focuses on an increasingly common form of structured processing in NLP: dependency parsing. Dependency parsing is a very popular method for annotating hidden structure on raw text, owing to its applicability to many kinds of languages (e.g., those with free word order) and its relatively efficient inference algorithms. We will cover key algorithms and new developments for accurate automatic dependency parsing. The course provides a hands-on, project-oriented introduction to these topics. Students may choose whether to focus more (in homeworks and the term project) on the applications track or the dependency parsing track.
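As a concrete illustration of the kind of structure the dependency parsing track deals with, a parse of a sentence can be represented simply as a vector of head indices, one per word. The sketch below (an illustrative toy sentence and helper, not course code) also checks projectivity, i.e., whether any two dependency arcs cross — a distinction that matters for the algorithms covered in the track:

```python
# A dependency parse as a head-index vector: heads[i-1] is the index of
# word i's head, with 0 denoting the artificial root (words are 1-based).
words = ["Economic", "news", "had", "little", "effect", "on", "markets"]
heads = [2, 3, 0, 5, 3, 5, 6]  # e.g. "news" (word 2) is headed by "had" (word 3)

def is_projective(heads):
    """A parse is projective if no two dependency arcs cross."""
    # Each dependent d (1-based) has an arc to its head heads[d-1].
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (i, j) in arcs:
        for (k, l) in arcs:
            if i < k < j < l:  # arcs (i,j) and (k,l) cross
                return False
    return True

print(is_projective(heads))  # no crossing arcs in this parse -> True
```

Languages with freer word order produce non-projective parses (crossing arcs), which is exactly why algorithms such as minimum-spanning-tree parsing (Apr. 24 session) drop the projectivity restriction.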

Prerequisites

There is no official programming language for this course, but some amount of scripting or programming will be required to complete assignments; facility with some programming language (or willingness to acquire such facility) is therefore assumed. Students should have taken an introductory course in computational linguistics or natural language processing, or obtain permission from the instructor.

Homework and term projects

Specifics to be determined.

Grading

See the university Grade Policy below. Your grade will be based on in-class discussion (10%), in-class presentations (15%), three homework projects (15% each), and a term project and presentation (30%).

What we'll cover and an approximate schedule

Date / Topic / Tentative reading

Apr. 3   Overview of class structure; introduction to the text processing "pipeline", including IR, IE, QA, summarization and MT; introduction to structured processing and dependency parsing; homework and term project options

Apr. 5   Introduction to Information Retrieval (IR) and Information Extraction (IE)
         Reading: Manning et al., Introduction to Information Retrieval, ch. 1 and 6

Apr. 10  Dependency Parsing I: context-free grammars; chart parsing; bilexical grammars; Eisner's cubic-complexity parsing algorithm

Apr. 12  Introduction to Question Answering (QA) and Automatic Summarization
         Reading: TREC-8 QA Track Report; Ferrucci et al. (2010) Building Watson

Apr. 17  Dependency Parsing II: derivation models; shift-reduce parsing; Nivre's greedy deterministic approach; arc-eager parsing
         Homework 1 due
         Reading: Nivre (2004) Incrementality in Deterministic Dependency Parsing

Apr. 19  Introduction to Machine Translation (MT)
         Reading: Manning and Schütze, ch. 13; Jurafsky and Martin, ch. 25; Lopez (2008) Statistical Machine Translation

Apr. 24  Dependency Parsing III: non-projective dependencies; minimum spanning tree dependency parsing
         Reading: Nivre and Nilsson (2005) Pseudo-Projective Dependency Parsing; McDonald and Nivre (2007) Characterizing the Errors of Data-Driven Dependency Parsing Models

Apr. 26  Topics in Machine Translation I: using syntax in MT
         Reading: Chiang (2005) A Hierarchical Phrase-Based Model for Statistical Machine Translation; Zollmann and Venugopal (2006) Syntax Augmented Machine Translation via Chart Parsing; Li et al. (2010) Joshua 2.0

May 1    Dependency Parsing IV: "perceptron-like" online learning; direct loss minimization; passive-aggressive methods; higher-order MST models and algorithms; dual decomposition (intro/intuitions)
         Homework 2 due
         Reading: McDonald and Pereira (2006) Online Learning of Approximate Dependency Parsing Algorithms

May 3    Topics in IR and summarization: graph-based methods
         Reading: Page et al. (1999) PageRank; Erkan and Radev (2004) LexRank

May 8    Student presentations of term project proposals and discussion

May 10   Topics in Machine Translation II: evaluation
         Reading: Papineni et al. (2002) BLEU; Denkowski and Lavie (2011) Meteor 1.3; Snover et al. (2009) TER-Plus

May 15   NLP for low-resource and morphologically rich languages

May 17   Student presentations of homework 3 and discussion

May 22   Guest lecture: John Hale, Dependency parsing for psycholinguistic modeling
         Homework 3 due
         Reading: Boston, Hale, Vasishth and Kliegl (2011) Parallel processing and sentence comprehension difficulty

May 24   Guest lecture: Steven Bedrick, Extrinsic evaluation: does your system/tool/algorithm really work?
         Reading: see extrinsic evaluation references below

May 29   Dependency Parsing V: dual decomposition for dependency parsing; dependencies and higher-complexity grammar formalisms
         Reading: Koo et al. (2010) Dual Decomposition for Parsing with Non-Projective Head Automata

May 31   Dependency Parsing VI: approximate inference for dependency parsing; vine parsing; arc filtering; using dependency parsing/parses
         Reading: Bergsma and Cherry (2010) Fast and Accurate Arc Filtering for Dependency Parsing

June 5, 7    No class

June 12, 14  Term project presentations
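To give a flavor of the MT evaluation session (May 10): the core quantity in BLEU-style scoring is modified (clipped) n-gram precision against a reference translation. The sketch below is a minimal, illustrative reduction — toy sentences, a single reference, and no brevity penalty or geometric mean over n-gram orders, all of which full BLEU includes:

```python
from collections import Counter

def modified_ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, the core of BLEU (Papineni et al. 2002):
    each candidate n-gram's count is clipped to its count in the reference."""
    cand_ngrams = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    cand_counts = Counter(cand_ngrams)
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return clipped / max(len(cand_ngrams), 1)

candidate = "the the the cat".split()
reference = "the cat sat".split()
# Clipping stops the degenerate candidate from earning credit for
# every repeated "the": only one "the" appears in the reference.
print(modified_ngram_precision(candidate, reference, 1))  # -> 0.5
```

The clipping step is what distinguishes BLEU's precision from naive n-gram matching, and it is the standard textbook example of why the modification is needed.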


References:

Bergsma, S. and Cherry, C. (2010) Fast and Accurate Arc Filtering for Dependency Parsing, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 53-61.

Boston, M.F., Hale, J.T., Vasishth, S. and Kliegl, R. (2011) Parallel processing and sentence comprehension difficulty. Language and Cognitive Processes 26(3):301-349.

Erkan, G. and Radev, D. (2004) LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. Journal of Artificial Intelligence Research 22:457-479.

Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A.A., Lally, A., Murdock, J.W., Nyberg, E., Prager, J., Schlaefer, N., and Welty, C. (2010) Building Watson: An overview of the DeepQA project. AI Magazine 31(3):59-79.

Jurafsky, D. and Martin, J. (2009) Speech and Language Processing (2nd Edition). Upper Saddle River, NJ: Pearson Prentice Hall.

Koo, T., A.M. Rush, M. Collins, T. Jaakkola, D. Sontag (2010) Dual Decomposition for Parsing with Non-Projective Head Automata. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1288-1298.

Lopez, A. (2008) Statistical Machine Translation. ACM Computing Surveys 40(3):1-49.

Manning, C.D., Raghavan, P. and Schütze, H. (2008) Introduction to Information Retrieval. Cambridge: Cambridge University Press.

Manning, C.D. and Schütze, H. (1999) Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.

McDonald, R. and F. Pereira (2006) Online Learning of Approximate Dependency Parsing Algorithms. Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp. 81-88.

McDonald, R. and J. Nivre (2007) Characterizing the Errors of Data-Driven Dependency Parsing Models. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing (EMNLP) and Computational Natural Language Learning (CoNLL), pp. 122-131.

Nivre, J. (2004) Incrementality in Deterministic Dependency Parsing. In Incremental Parsing: Bringing Engineering and Cognition Together. Workshop at ACL-2004, July 25, 2004, Barcelona, Spain, 50-57.

Nivre, J. and J. Nilsson (2005) Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the ACL, pp. 99-106.

Page, L., Brin, S., Motwani, R. and Winograd, T. (1999) The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66, Stanford InfoLab.

Extrinsic evaluation references:

Belz, A. (2009) That's nice... what can you do with it? Computational Linguistics 35(1):111-118.

Law et al. (2005) A Comparison of Graphical and Textual Presentations of Time Series Data to Support Medical Decision Making in the Neonatal Intensive Care Unit. Journal of Clinical Monitoring and Computing 19:183-194.

Jones (1994) Towards better NLP system evaluation. Proceedings of the Workshop on Human Language Technology, pp. 102-107.

Miyao et al. (2008) Task-oriented Evaluation of Syntactic Parsers and Their Representations. Proceedings of ACL-08: HLT, pp. 46-54.

Quirk and Corston-Oliver (2006) The impact of parse quality on syntactically-informed statistical machine translation. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 62-69.

Schneider et al. (2010) Comparing Intrinsic and Extrinsic Evaluation of MT Output in a Dialogue System. Proceedings of the Seventh International Workshop on Spoken Language Translation (IWSLT), pp. 329-336.




OHSU Grade Policy

OHSU SoM Graduate Studies Grade Submission Policy
Approved by SoM Graduate Council April 8, 2008

Graduate Studies in the OHSU School of Medicine is committed to providing grades to students in a timely manner. Course instructors will provide students with information in writing at the beginning of each course that describes the grading policies and procedures including but not limited to evaluation criteria, expected time needed to grade individual student examinations and type of feedback they will provide.

Class grades are due to the Registrar by the Friday following the week of finals. However, on those occasions when a grade has not been submitted by the deadline, the following procedure shall be followed:

1) The Program Coordinator will immediately contact the Instructor requesting the missing grade, with a copy to the Program Director and Registrar.

2) If the grade is still overdue by the end of next week, the Program Coordinator will email the Department Chair directly, with a copy to the Instructor and Program Director requesting resolution of the missing grade.

3) If, after an additional week the grade is still outstanding, the Coordinator may petition the Office of Graduate Studies for final resolution.


OHSU Disability Statement

Our program is committed to all students achieving their potential. If you have a disability or think you may have a disability (physical, learning, hearing, vision, psychological) that may require a reasonable accommodation, please contact Student Access at (503) 494-0082 or e-mail orchards@ohsu.edu to discuss your needs. You can also find more information at www.ohsu.edu/student-access. Because accommodations can take time to implement, it is important to have this discussion as soon as possible. All information regarding a student's disability is kept in accordance with relevant state and federal laws.