Rule-based part-of-speech tagger software tools

11/8/2023

Part-of-speech tagging (POS tagging henceforth) is the process of assigning a sequence of tags to a sequence of words in order to mark their word classes. Traditional methodologies involve rule-based and statistical POS taggers and, more recently, machine learning algorithms.

Rule-based POS taggers (Brill, 1992, 1994; Sadredini et al., 2018) rely on a set of deterministic transformation rules, such as the association of a word with a POS. The ruleset is often coupled with a set of constraints the tagger must follow, e.g., an article cannot be followed by another article. Rule-based approaches are especially suitable for building multilingual and non-English taggers (Garg et al., 2012; Megyesi, 1998; Rashel et al., 2014), which often cannot benefit from annotated corpora: each additional language requires its own set of rules, but neither training data nor training is needed. While powerful enough to achieve high accuracy on benchmark datasets, rule-based taggers show inherent limitations in uncontrolled experimental environments, because they lack an understanding of context and are rigid towards unexpected cases.

Statistical POS taggers work by finding the sequence of POS tags that most likely fits the input sentence, by means of either hidden Markov models (Brants, 2000; Carlberger & Kann, 1999; Cutting et al., 1992) or maximum entropy models (Ratnaparkhi, 1996; Toutanova & Manning, 2000). POS taggers built upon machine learning algorithms, such as support vector machines (Giménez & Màrquez, 2004) and neural networks (Schmid, 1994), are very powerful; however, many machine learning algorithms are not interpretable, which means that it is not possible to understand what motivated the tagger's choices.

The usefulness of POS tagging lies in automation, and hence in speeding up searches for specific word classes (e.g. nouns, pronouns, verbs, adverbs) and combinations of them. To enable automatic POS searches across the 32 film dialogues collected for the anglophone section of the Pavia Corpus of Film Dialogue (PCFD henceforth), we chose to conduct a pilot study on the dialogues of the film Thelma & Louise (Ridley Scott, 1991), which at the time was the latest film to be added to the corpus. The POS tagger CLAWS4 was selected among the available software.
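Once a text has been tagged, searching for word classes and their combinations comes down to matching tag sequences. The sketch below is a minimal illustration, not CLAWS4's actual output handling: it assumes space-separated tokens in a word_TAG form and a made-up tag set (PRON, ADV, VERB, ART, NOUN), and the helper names parse_tagged and find_sequence are hypothetical.

```python
import re
from typing import List, Tuple


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split space-separated word_TAG tokens (assumed format) into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last underscore so a word containing "_" keeps its text.
        word, sep, tag = token.rpartition("_")
        if sep:  # skip tokens that carry no tag
            pairs.append((word, tag))
    return pairs


def find_sequence(pairs: List[Tuple[str, str]], patterns: List[str]) -> List[str]:
    """Return every run of consecutive words whose tags fully match the regex patterns in order."""
    compiled = [re.compile(p) for p in patterns]
    n = len(compiled)
    if n == 0:
        return []
    hits = []
    for i in range(len(pairs) - n + 1):
        window = pairs[i:i + n]
        if all(rx.fullmatch(tag) for rx, (_, tag) in zip(compiled, window)):
            hits.append(" ".join(word for word, _ in window))
    return hits


# Toy example sentence with hypothetical tags.
tagged = "we_PRON really_ADV need_VERB a_ART car_NOUN"
pairs = parse_tagged(tagged)
print(find_sequence(pairs, ["ADV", "VERB"]))  # ['really need']
print(find_sequence(pairs, ["PRON|NOUN"]))    # ['we', 'car']
```

Because each pattern is a regular expression, an alternation such as PRON|NOUN lets a single position stand for several word classes.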

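The rule-based approach described above, in which words are first associated with a POS, then adjusted by deterministic transformation rules and finally checked against constraints, can be sketched in a few lines. This is a toy illustration under stated assumptions, not Brill's tagger or any specific tool: the lexicon, the tag names, the default tag for unknown words and the single transformation rule are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon associating each word with a single default POS.
LEXICON: Dict[str, str] = {
    "the": "ART", "a": "ART", "they": "PRON", "can": "AUX",
    "run": "VERB", "fast": "ADV", "dog": "NOUN",
}
DEFAULT_TAG = "NOUN"  # assumption: unknown words are guessed to be nouns

# Hypothetical Brill-style transformation: (old tag, new tag, required previous tag).
# Here: "change VERB to NOUN when the previous word is an article" ("the run").
TRANSFORMATIONS: List[Tuple[str, str, str]] = [("VERB", "NOUN", "ART")]


def initial_tags(words: List[str]) -> List[str]:
    """Lexical step: look each word up in the lexicon."""
    return [LEXICON.get(w.lower(), DEFAULT_TAG) for w in words]


def apply_transformations(tags: List[str]) -> List[str]:
    """Contextual step: apply each rule in turn."""
    tags = list(tags)
    for old, new, prev in TRANSFORMATIONS:
        before = list(tags)  # match each rule against the tags as they were before it ran
        for i in range(1, len(tags)):
            if before[i] == old and before[i - 1] == prev:
                tags[i] = new
    return tags


def constraint_violations(tags: List[str]) -> List[int]:
    """Constraint: an article cannot be followed by another article."""
    return [i for i in range(1, len(tags)) if tags[i - 1] == "ART" and tags[i] == "ART"]


def tag(sentence: str) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Tag a whitespace-tokenized sentence and report positions that break a constraint."""
    words = sentence.split()
    tags = apply_transformations(initial_tags(words))
    return list(zip(words, tags)), constraint_violations(tags)


print(tag("They can run fast"))  # ([('They', 'PRON'), ('can', 'AUX'), ('run', 'VERB'), ('fast', 'ADV')], [])
print(tag("the run"))            # ([('the', 'ART'), ('run', 'NOUN')], [])
print(tag("the a dog"))          # ([('the', 'ART'), ('a', 'ART'), ('dog', 'NOUN')], [1])
```

Every decision here can be traced back to a lexicon entry or a named rule, which is the interpretability that many machine learning taggers lack; the flip side is the rigidity mentioned above, since a word missing from the lexicon simply falls back to the default tag.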

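For the statistical taggers mentioned above, finding the sequence of POS tags that most likely fits the input sentence under a first-order hidden Markov model is typically done with the Viterbi algorithm, which maximizes the product of transition probabilities P(t_i | t_{i-1}) and emission probabilities P(w_i | t_i) over all tag sequences. The sketch below assumes hand-set toy probabilities rather than ones estimated from a corpus, plus an arbitrary floor probability for unseen word/tag pairs; it works with log probabilities to avoid numerical underflow.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy parameters, set by hand for illustration (not estimated from a corpus).
TAGS = ["PRON", "AUX", "VERB", "NOUN"]
START: Dict[str, float] = {"PRON": 0.6, "AUX": 0.1, "VERB": 0.1, "NOUN": 0.2}
TRANS: Dict[str, Dict[str, float]] = {  # P(next tag | previous tag)
    "PRON": {"PRON": 0.05, "AUX": 0.5, "VERB": 0.4, "NOUN": 0.05},
    "AUX": {"PRON": 0.1, "AUX": 0.05, "VERB": 0.8, "NOUN": 0.05},
    "VERB": {"PRON": 0.3, "AUX": 0.1, "VERB": 0.1, "NOUN": 0.5},
    "NOUN": {"PRON": 0.1, "AUX": 0.3, "VERB": 0.4, "NOUN": 0.2},
}
EMIT: Dict[str, Dict[str, float]] = {  # P(word | tag), only for the toy vocabulary
    "PRON": {"they": 0.9},
    "AUX": {"can": 0.9},
    "VERB": {"can": 0.1, "fish": 0.4},
    "NOUN": {"can": 0.3, "fish": 0.6},
}
UNSEEN = 1e-6  # assumed floor probability for word/tag pairs missing from EMIT


def log_emit(tag: str, word: str) -> float:
    return math.log(EMIT[tag].get(word, UNSEEN))


def viterbi(words: List[str]) -> List[str]:
    """Return the most probable tag sequence for words under the toy HMM."""
    if not words:
        return []
    # best[i][t] = (log-prob of the best path ending in tag t at position i, previous tag)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {t: (math.log(START[t]) + log_emit(t, words[0]), None) for t in TAGS}
    ]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            # Pick the predecessor tag that maximizes path score times transition probability.
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_emit(t, words[i]), prev)
        best.append(column)
    # Backtrack from the highest-scoring final tag.
    path = [max(TAGS, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


print(viterbi(["they", "can", "fish"]))  # ['PRON', 'AUX', 'VERB']
```

With these toy numbers, the ambiguous "can" and "fish" are resolved by context: after a pronoun the auxiliary reading of "can" scores highest, which in turn favours the verb reading of "fish".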