Wednesday, November 06, 2013

Word Sense Disambiguation

Some words have more than one meaning, yet the brain seems to have an innate ability to work out which meaning is intended in a given sentence.  Take the following two sentences:

"I tied my boat to the bank"
"I put my money in the bank"

In the first sentence you probably imagine somebody tying their boat to the side of a river, yet in the second sentence you imagine somebody investing their money with a financial institution.  That string of four characters, 'b a n k', has completely changed meaning.

Word sense disambiguation (WSD) is a well-researched task in computational linguistics with an important application to lexical simplification.  The majority of previous research splits roughly into three categories:
  • Supervised: Using labelled data, a system builds a classifier which can recognise the different senses of a word, from a variety of features in the words surrounding it.
  • Unsupervised: With unlabelled data, a system learns the different senses of a word.  Classification of new data makes use of the previously learned senses.
  • Knowledge Based: A large knowledge resource such as WordNet provides information about the words which can be used during disambiguation.
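
To make the supervised category concrete, here is a minimal sketch of a bag-of-words classifier with add-one smoothing for the word 'bank'.  The training sentences, the sense labels "river" and "finance", and the function names train and classify are all made up for illustration; a real system would use far more data and richer features.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled examples for the ambiguous word "bank".
TRAINING = [
    ("i tied my boat to the bank", "river"),
    ("we fished from the grassy bank of the river", "river"),
    ("the boat drifted towards the muddy bank", "river"),
    ("i put my money in the bank", "finance"),
    ("the bank approved my loan", "finance"),
    ("she opened a savings account at the bank", "finance"),
]


def train(examples, target="bank"):
    """Count the context words seen with each sense of the target word."""
    word_counts = defaultdict(Counter)
    sense_counts = Counter()
    for sentence, sense in examples:
        sense_counts[sense] += 1
        for word in sentence.split():
            if word != target:
                word_counts[sense][word] += 1
    return word_counts, sense_counts


def classify(sentence, model, target="bank"):
    """Return the sense whose context words best match the sentence."""
    word_counts, sense_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_examples = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        # Log prior for the sense, then one smoothed log likelihood per word.
        score = math.log(sense_counts[sense] / total_examples)
        total_words = sum(word_counts[sense].values())
        for word in sentence.lower().split():
            if word == target:
                continue
            count = word_counts[sense][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


model = train(TRAINING)
print(classify("he moored the boat at the bank of the river", model))
print(classify("i deposited money at the bank", model))
```

With this toy data the first sentence should come out as "river" and the second as "finance", because words such as 'boat' and 'money' were only ever seen alongside one sense.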

WSD is vital to the task of lexical simplification.  Consider simplifying one of the sentences from the previous example.  If you look up the word 'bank' in a thesaurus, you will find a list of synonyms that looks something like the following:

Bank:
Financial Institution; Treasury; Safe;
Edge; Beach; Riverside;

If a system does not employ WSD, it has no way of telling which of the synonyms are correct for the context.  We do not wish to say "I tied my boat to the treasury", or "I put my money in the riverside".  These examples are at best farcical and at worst nonsensical.  WSD is essential for selecting the correct set of synonyms.
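
As a rough illustration of how a disambiguation step can filter thesaurus entries, the sketch below picks the synonym set whose cue words overlap most with the sentence.  The cue words, the THESAURUS structure and the function name synonyms_in_context are assumptions made up for this example; they stand in for the glosses or relations that a resource such as WordNet would provide.

```python
# Toy thesaurus entry for "bank".  The synonym lists mirror the example
# above; the cue words for each sense are invented for illustration.
THESAURUS = {
    "bank": [
        {
            "cues": {"money", "loan", "account", "savings", "deposit"},
            "synonyms": ["financial institution", "treasury", "safe"],
        },
        {
            "cues": {"boat", "river", "water", "shore", "tied"},
            "synonyms": ["edge", "beach", "riverside"],
        },
    ]
}


def synonyms_in_context(sentence, target):
    """Return the synonyms of the sense that best matches the context.

    Returns an empty list when no cue word appears in the sentence, so
    the caller can leave the word alone rather than guess a sense.
    """
    context = set(sentence.lower().split()) - {target}
    senses = THESAURUS[target]
    # Pick the sense sharing the most cue words with the sentence.
    best = max(senses, key=lambda sense: len(sense["cues"] & context))
    if not best["cues"] & context:
        return []
    return best["synonyms"]


print(synonyms_in_context("I tied my boat to the bank", "bank"))
# ['edge', 'beach', 'riverside']
print(synonyms_in_context("I put my money in the bank", "bank"))
# ['financial institution', 'treasury', 'safe']
```

Returning nothing when the context gives no evidence is a deliberate choice in this sketch: for simplification, leaving the original word in place is safer than substituting a synonym from the wrong sense.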

I will not attempt a full explanation of WSD as applied to lexical simplification.  Suffice it to say that I have so far identified four papers which address the matter.  These can be found in the lexical simplification list.

  • Can Spanish be simpler? LexSiS: Lexical simplification for Spanish. Bott et al. 2012
  • WordNet-based lexical simplification of a document. Thomas and Anderson 2012
  • Putting it simply: a context-aware approach to lexical simplification. Biran et al. 2011
  • Lexical simplification. De Belder et al. 2010