Camelia

Natural Language Processing for Semantics of Multicultural Proverbs

In this mini-project, we apply Natural Language Processing to proverbs and aphorisms from a variety of cultures, focusing primarily on their semantic analysis.

In PART 1, we develop an elaborate spaCy pipeline with multiple custom components, which include:

  • expanding contractions,
  • detection of negated verbs,
  • word sense disambiguation (WSD) based on WordNet 3.1 using SupWSD, BlazeGraph and SPARQL over BabelNet,
  • semantic role labelling (SRL) using AllenNLP,
  • coreference resolution of pronouns (COREF) using AllenNLP.
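
As an illustration of the simplest component in the list above, here is a minimal sketch of contraction expansion. It is not the project's actual component: the lookup table `CONTRACTIONS` and the function `expand_contractions` are hypothetical names, the table is deliberately tiny, and the sketch assumes expansion runs as plain-text preprocessing before the text reaches the spaCy pipeline.

```python
import re

# Hypothetical, deliberately small lookup table (an assumption for this sketch).
# Note that "it's" is ambiguous ("it is" vs "it has"); we pick "it is" here.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "doesn't": "does not",
    "isn't": "is not",
    "it's": "it is",
    "you're": "you are",
    "they're": "they are",
}

# One alternation over all known contractions, matched case-insensitively.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace known contractions with their expanded forms."""
    # Normalise typographic apostrophes so the lookup keys match.
    text = text.replace("\u2019", "'")

    def _replace(match: re.Match) -> str:
        found = match.group(0)
        expanded = CONTRACTIONS[found.lower()]
        # Preserve sentence-initial capitalisation.
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)


example = expand_contractions("Don't count your chickens before they're hatched.")
print(example)
```

Expanding contractions first makes the negation explicit as a separate token ("not"), which is one plausible reason to run this step ahead of negated-verb detection.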

In PART 2, we start with analytics at the part-of-speech level. Then, we use various embedding techniques (static vectors from spaCy, Node2Vec on WordNet, BERT embeddings of dynamic word contexts) and evaluate how well they capture the semantic similarity between the nouns in our dataset. Finally, we train a TensorFlow model on BERT embeddings to classify a proverb as figurative (metaphoric) vs realistic.
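
One common way to evaluate how well embeddings capture noun similarity is to compare the cosine similarity of each noun pair with a reference score, using Spearman rank correlation. The sketch below assumes that setup; it is not necessarily the evaluation used in the project. The vectors and reference scores are made-up toy values, and `cosine`, `ranks` and `spearman` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Toy embeddings and reference similarities: illustrative values only.
vectors = {
    "dog": [0.9, 0.1, 0.0],
    "wolf": [0.8, 0.2, 0.1],
    "stone": [0.0, 0.1, 0.9],
}
pairs = [("dog", "wolf"), ("dog", "stone"), ("wolf", "stone")]
reference = [0.90, 0.10, 0.15]  # e.g. scores from a lexical resource (assumed)

model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rho = spearman(model_scores, reference)
print(f"Spearman correlation: {rho:.3f}")
```

A rank correlation is used rather than a direct comparison of values because cosine scores from different embedding methods live on different scales; only the ordering of the pairs is comparable across methods.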

Results:
