LOT Winter School 2019

RM1 - A deeper understanding of distributional semantics

Antske Fokkens

Title of the course: A Deeper Understanding of Distributional Semantics

Teacher: dr. Antske Fokkens

Contact
Address:
Dept. of Language, Literature & Communication
De Boelelaan 1105, 1081 HV Amsterdam

Email address: antske.fokkens@vu.nl

Teacher's website: http://antskefokkens.info

Course info
Level:
RM1 (First year Research Master Linguistics)

Course description:

Distributional semantic representations are based on the idea that the meaning of words is determined by their usage. Harris (1954) and Firth (1957) used this idea to formulate what became the distributional hypothesis, which states that words with similar meanings occur in similar contexts. If this is the case, it should be possible to learn the meaning of words by looking at the contexts they occur in.

Computational linguists have pursued this idea for several decades, exploring various methods for creating distributional semantic models from large corpora.

Distributional semantic models represent the meaning of words as vectors, often called word embeddings, based on their occurrence in large corpora. Such a vector encodes a word's context words as a numeric representation. Thanks to increasingly large corpora and ever greater computational power, the quality of these models has improved significantly in recent years. Using these representations in computational linguistics has led to improvements on a wide variety of tasks. They are also used to study word meaning itself, e.g. to identify whether the meaning of a word has changed over time.
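
To make the idea of "encoding context words as a numeric representation" concrete, the sketch below builds simple count-based context vectors from a three-sentence toy corpus. It is only an illustration and not part of the course materials: the corpus, window size and vocabulary handling are arbitrary choices, and real models are trained on millions of sentences and usually compress such counts into dense vectors.

```python
# Toy illustration of count-based context vectors (not course material).
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
window = 2  # context window size; an arbitrary choice for this sketch

cooc = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooc[word][tokens[j]] += 1

vocab = sorted({w for s in corpus for w in s.split()})
# Each entry is a word's "vector": counts of its context words.
vectors = {w: [cooc[w][c] for c in vocab] for w in vocab}
print(vocab)
print("cat", vectors["cat"])
print("dog", vectors["dog"])  # similar contexts lead to similar vectors
```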

This course first provides an introduction to distributional semantic models and their role in computational linguistics. We then examine these representations from a linguistic point of view, focusing on the following questions: What information do these models capture? How can we verify them or determine their quality? How are linguistic properties represented? What does this tell us about language use?

Day-to-day program

Monday: Introduction: what are distributional models and how are they created?
In this first lecture, we will cover the (technical) background of distributional semantic models. What are they? What do they represent (or what are they supposed to represent)? How are they created from corpora?

Tuesday: Evaluation & practical application
In this lecture, we will discuss how distributional semantic models are used and how they are evaluated. We cover standard intrinsic evaluation methods and practical applications (i.e. how the models are used in various NLP tasks).
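
As a rough illustration of the kind of intrinsic evaluation discussed here (and in the SimLex-999 paper listed under the readings), the sketch below correlates a model's cosine similarities with human similarity ratings. The vectors and ratings are made-up toy values, not real model output or actual SimLex-999 data.

```python
# Toy sketch of intrinsic evaluation: correlate model similarity with
# human similarity judgements. All numbers below are invented.
import numpy as np
from scipy.stats import spearmanr

embeddings = {          # toy vectors standing in for a trained model
    "cat":   np.array([0.9, 0.1, 0.3]),
    "dog":   np.array([0.8, 0.2, 0.4]),
    "car":   np.array([0.1, 0.9, 0.2]),
    "truck": np.array([0.2, 0.8, 0.1]),
}
human_ratings = [       # (word1, word2, human similarity score), invented here
    ("cat", "dog", 7.5),
    ("car", "truck", 8.0),
    ("cat", "car", 1.0),
]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in human_ratings]
gold_scores = [score for _, _, score in human_ratings]
rho, _ = spearmanr(model_scores, gold_scores)
print(f"Spearman correlation with human judgements: {rho:.2f}")
```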

Wednesday: Diving deeper: semantic phenomena & other linguistic properties
This lecture addresses which linguistic phenomena distributional semantic models capture and to what extent. We look at what information we expect to be captured. We then discuss methods for testing whether this is indeed the case, as well as insights from recent research using such methods.
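
One family of methods covered here uses diagnostic classifiers: a simple classifier is trained on word vectors to see whether a semantic property can be recovered from them (cf. the Sommerauer & Fokkens reading). The sketch below shows only the bare mechanics, with random stand-in vectors and a made-up "dangerous" annotation; a real study would use trained embeddings, held-out test words and control tasks.

```python
# Toy diagnostic-classifier sketch: can a property be read off word vectors?
# Vectors are random stand-ins and labels are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
words  = ["tiger", "firearm", "zebra", "knife", "spoon", "shark", "rabbit", "pillow"]
labels = [1, 1, 0, 0, 0, 1, 0, 0]          # 1 = "dangerous" (made-up annotation)
X = rng.normal(size=(len(words), 50))      # random stand-ins for real embeddings

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
# With real embeddings one would evaluate on held-out words and compare
# against control tasks to check what drives the classifier's performance.
```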

Thursday: Semantic models for corpus studies: methodological issues
Distributional semantic models have been used in digital humanities to study sense shift (linguistic change) and concept drift (as part of historical research). Researchers have aimed to determine whether the meaning of a word has changed by comparing models built from older texts with models built from more recent texts. We introduce the methods used in such studies and explore their limitations. In particular, do the observed changes reflect actual shifts in meaning, or are they artifacts of the method? Are the corpora balanced and large enough?
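
One widely used way of comparing models from different periods (used, among others, in the Hamilton et al. readings) is to align the two embedding spaces with an orthogonal Procrustes rotation and then measure how far each word has moved. The sketch below shows the mechanics only: the "old" and "new" matrices are random stand-ins, so the distances it prints are meaningless.

```python
# Sketch of Procrustes alignment for diachronic comparison.
# Both embedding matrices are random placeholders, for illustration only.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
vocab = ["broadcast", "gay", "cell", "the"]
old = rng.normal(size=(len(vocab), 50))   # embeddings trained on an older corpus
new = rng.normal(size=(len(vocab), 50))   # embeddings trained on a newer corpus

R, _ = orthogonal_procrustes(old, new)    # rotation mapping the old space onto the new one
old_aligned = old @ R

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for i, word in enumerate(vocab):
    drift = 1 - cosine(old_aligned[i], new[i])
    print(f"{word:10s} cosine distance across periods: {drift:.2f}")
```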

Friday: Critical Discussion and Open Questions
The final lecture will consist of critical discussions, where we link insights from various experiments back to linguistic theories. What do we know about distributional semantic models? What is still unclear? How can we find answers to current open questions?

Readings:

Course readings (obligatory):

Lecture 1:

Obligatory readings will be provided as part of the first assignment (covering the distributional hypothesis and Gricean principles).

Lecture 2:

Hill, Felix, Roi Reichart, and Anna Korhonen. "SimLex-999: Evaluating semantic models with (genuine) similarity estimation." Computational Linguistics 41, no. 4 (2015): 665-695.

Lecture 3:

Baroni, Marco, Brian Murphy, Eduard Barbu, and Massimo Poesio. "Strudel: A corpus-based semantic model based on properties and types." Cognitive Science 34, no. 2 (2010): 222-254.

Lecture 4:

Hamilton, William L., Jure Leskovec, and Dan Jurafsky. "Cultural shift or linguistic drift? Comparing two computational measures of semantic change." In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016). 2016.

Lecture 5:

Hellrich, Johannes, and Udo Hahn. "Bad company—neighborhoods in neural embedding spaces considered harmful." In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2785-2796. 2016.

Further readings (optional):

General:

Jurafsky, Dan, and James H. Martin. Speech and Language Processing, 3rd edition. Chapter 6. https://web.stanford.edu/~jurafsky/slp3/ Note 1: make sure to use the third edition (chapter numbers differ between editions). Note 2: this will also be explained in class.

Lenci, Alessandro. "Distributional semantics in linguistic and cognitive research." Italian Journal of Linguistics 20, no. 1 (2008): 1-31.

More technical papers:

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).

Goldberg, Yoav, and Omer Levy. "word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method." arXiv preprint arXiv:1402.3722 (2014).

Levy, Omer, Yoav Goldberg, and Ido Dagan. "Improving distributional similarity with lessons learned from word embeddings." Transactions of the Association for Computational Linguistics 3 (2015): 211-225.

Lecture 3:

Derby, Steven, Paul Miller, Brian Murphy, and Barry Devereux. "Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge." arXiv preprint arXiv:1809.02534 (2018).

Sommerauer, Pia, and Antske Fokkens. "Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell." arXiv preprint arXiv:1809.01375 (2018).

Lecture 4:

Hamilton, William L., Jure Leskovec, and Dan Jurafsky. "Diachronic word embeddings reveal statistical laws of semantic change." arXiv preprint arXiv:1605.09096 (2016).

(Supporting reading for the obligatory Hamilton et al. paper.)

Martinez-Ortiz, Carlos, Tom Kenter, Melvin Wevers, Pim Huijnen, Jaap Verheul, and Joris van Eijnatten. "ShiCo: A Visualization Tool for Shifting Concepts Through Time." In Proceedings of the 3rd DH Benelux Conference (DH Benelux 2016), p. 1. 2016.
