sayarghoshroy/Bengali_Anaphora_Resolution_Challenges

Bengali Anaphora Resolution Challenges

Outline of a Rule-Based Anaphora Resolution System for Bengali

We present an anaphora resolution system for Bengali that uses linguistic features to disambiguate the possible antecedents of an anaphor. The system accepts a sentence in UTF-8 Bengali script as input, identifies the pronominal anaphors, and detects all possible antecedents for each one. Using a set of hierarchical rules, it then selects the best contextual antecedent from that set. The system is evaluated on a manually tagged dataset.
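
The first two stages described above (finding pronominal anaphors and gathering their candidate antecedents) can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the `Token` type, the tag names `PRON`, `NOUN`, and `PROPN`, and the function `collect_candidates` are all hypothetical, and the input is assumed to be POS-tagged already.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    index: int  # position of the token in the sentence
    text: str   # surface form in Bengali script
    pos: str    # assumed tag set: "PRON", "NOUN", "PROPN", ...


# Assumption: common nouns and proper nouns are the only antecedent candidates.
NOMINAL_TAGS = {"NOUN", "PROPN"}


def collect_candidates(tokens: List[Token]) -> Dict[int, List[Token]]:
    """Map each pronoun's index to the nominals that precede it."""
    candidates: Dict[int, List[Token]] = {}
    seen_nominals: List[Token] = []
    for tok in tokens:
        if tok.pos == "PRON":
            # An anaphor refers backwards, so only earlier nominals qualify.
            candidates[tok.index] = list(seen_nominals)
        elif tok.pos in NOMINAL_TAGS:
            seen_nominals.append(tok)
    return candidates


if __name__ == "__main__":
    # Hypothetical tagged input: "রাম" (Ram), "বই" (book), "সে" (he/she).
    sentence = [
        Token(0, "রাম", "PROPN"),
        Token(1, "বই", "NOUN"),
        Token(2, "সে", "PRON"),
    ]
    for pronoun_index, nominals in collect_candidates(sentence).items():
        print(pronoun_index, [t.text for t in nominals])
```

Every earlier nominal is kept as a candidate at this stage; choosing among them is left to the disambiguation rules.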

This study shows how far a system built on linguistic rules can go and illustrates the point beyond which core knowledge becomes paramount. Even so, for a simple rule-based system, it performs reasonably well, often narrowing a set of possible antecedents down to the most obvious and correct choice.

Disambiguation Features

  1. Part-Of-Speech information: The POS tag not only identifies pronouns and nouns, but also helps us recognize named entities in text.

  2. Number: The pronoun and its antecedent must agree in number, i.e., a singular pronoun takes a singular antecedent and a plural pronoun takes a plural one.

  3. Person: Person agreement helps rank one antecedent as more probable than another when other features are equivalent.

  4. Status: The honorifics used give clues as to which particular person the pronoun refers to.

  5. Morphology: Morphological features are used to identify number in nouns and the type of referent in the case of pronouns.
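
One way the features above could combine into a hierarchy of rules is sketched below. It is an assumption-laden illustration rather than the system's actual rule set: the `Mention` type and `resolve` function are hypothetical, the ordering of person before status is a guess, and the final recency tie-breaker is not among the listed features. Number agreement is treated as a hard filter because the list states that it must hold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    text: str
    number: str      # "sg" or "pl", e.g. derived from morphology
    person: int      # 1, 2, or 3
    honorific: bool  # True if the form carries honorific status
    position: int    # token index in the sentence


def resolve(pronoun: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Pick the most plausible antecedent, or None if no candidate survives."""
    # Hard constraint: the antecedent precedes the pronoun and agrees in number.
    agreeing = [
        c for c in candidates
        if c.position < pronoun.position and c.number == pronoun.number
    ]
    if not agreeing:
        return None
    # Soft preferences, compared in order (assumed ordering):
    # person agreement, then honorific agreement, then recency.
    return max(
        agreeing,
        key=lambda c: (
            c.person == pronoun.person,
            c.honorific == pronoun.honorific,
            c.position,
        ),
    )
```

Because the preferences are compared as a tuple, a later preference only matters when all earlier ones tie, which is what makes the rules hierarchical. For instance, an honorific pronoun would prefer a more distant honorific noun over a nearer non-honorific one.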

For details, refer to the full report.

