Natural language processing

Objectives and outcomes

Students are introduced to advanced techniques for natural language processing and for extracting useful
knowledge from unstructured text. On completing the course, students are able to select an appropriate
natural language processing technique, apply it to a specific domain, and implement applications that
interpret human language and generate natural language text.

Lectures

Fundamentals of linguistics. Areas of natural language processing: speech recognition, natural language
understanding and natural language generation. Text segmentation. Word and sentence
recognition. Ambiguity of language. Language structure and morphology. Phrase structure.
Words. Collocations. Statistical language processing. Statistical estimators. Combining estimators.
Word sense disambiguation. Dictionary-based disambiguation. Unsupervised
disambiguation. Lexical acquisition. Subcategorization of words. Selectional preferences. Semantic
similarity. Markov models of grammar. Part-of-speech tagging. Probabilistic context-free grammars.
String probability. Probabilistic parsing. Statistical alignment and machine translation. Vector space model.
Latent semantic indexing. Discourse segmentation. Text categorization. Decision trees.
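
As an illustration of the "Statistical estimators" topic, the following minimal sketch compares a maximum
likelihood estimate with an add-one (Laplace) estimate for a bigram language model. The toy corpus, the
function names and the choice of add-one smoothing are assumptions made for this example only; they are not
taken from the course material.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count contexts and bigrams over tokenized sentences, adding boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    # Possible "next" tokens: all observed words plus the end-of-sentence marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return contexts, bigrams, vocab


def mle(prev, word, contexts, bigrams):
    """Maximum likelihood estimate of P(word | prev); zero for unseen bigrams."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


def laplace(prev, word, contexts, bigrams, vocab):
    """Add-one estimate of P(word | prev); unseen bigrams get a small non-zero mass."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


# Hypothetical toy corpus, already tokenized.
corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "eats"],
]
contexts, bigrams, vocab = train_bigram_model(corpus)

p_mle_seen = mle("the", "cat", contexts, bigrams)
p_mle_unseen = mle("dog", "eats", contexts, bigrams)
p_lap_unseen = laplace("dog", "eats", contexts, bigrams, vocab)
print(p_mle_seen, p_mle_unseen, p_lap_unseen)
```

The unseen bigram ("dog", "eats") receives probability zero under maximum likelihood but a small positive
probability under the add-one estimate, which is the motivation for smoothing and for combining estimators.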

Practical classes

An overview of languages and tools for natural language processing. Practical exercises in parsing,
tokenization, stemming and semantic reasoning, carried out in the chosen tool (for example, Python NLTK)
on a selected text corpus. Implementation of language element tagging, entity extraction and text
classification. An experimental exercise in generating natural language text.
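
The kind of exercise listed above can be sketched as follows: a regular-expression tokenizer feeding a small
multinomial Naive Bayes text classifier with add-one smoothing. This sketch uses only the Python standard
library so that it runs without extra installation; in the practical classes the chosen tool (for example,
Python NLTK) would normally supply the tokenizer and classifier. The training sentences, labels and function
names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Collect per-class document counts, per-class word counts and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log posterior under Naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothed log likelihood of the token given the class.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
training_data = [
    ("the team won the match", "sport"),
    ("a great goal in the final match", "sport"),
    ("the parliament passed the new law", "politics"),
    ("the minister announced a new election", "politics"),
]
model = train(training_data)

label_a = classify("Who won the final?", model)
label_b = classify("New election law", model)
print(label_a, label_b)
```

With a training set this small the result is only illustrative; a realistic exercise would use a labeled
corpus and a held-out test set to measure classification accuracy.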