Some Challenges of Automated Annotation in a Multilingual Scenario

Arindam Roy; Sunita Sarkar; B. S. Purkayastha

Abstract

Annotation is a key ingredient of present-day NLP, and this paper discusses the challenges involved in one of the toughest annotation tasks: sense marking. A large amount of data must be sense-marked accurately by human annotators in order to train machines to understand human languages. Sense-marked corpora for various languages facilitate Word Sense Disambiguation (WSD), which is required for machine translation. Accurately sense-marking voluminous data requires a standard, definitive lexicon. In the work reported here, the corpus is drawn from the newspaper and tourism domains. Princeton WordNet (version 2.1) is used as the sense repertoire for English text, while the Hindi and Nepali WordNets are used for Hindi and Nepali text, respectively. The corpus was tagged independently by different annotators, and the agreement level on word sense disambiguation was found to be about 85% across the three languages (English, Hindi and Nepali). Although the senses listed for a word in WordNet are quite specific, there were cases in which the available senses had limitations that posed challenges for the human sense markers.
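The abstract reports an agreement level of about 85% but does not say which measure was used. As a minimal sketch, assuming "agreement" means the share of tokens on which two annotators chose the same sense, the Python below computes that raw percentage and, for comparison, Cohen's kappa, which corrects for agreement expected by chance. The function names and the sense labels are hypothetical, and the toy data are illustrative only, not taken from the paper's corpus.

```python
from collections import Counter


def percent_agreement(tags_a, tags_b):
    """Share of tokens on which two annotators chose the same sense label."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("annotations must be non-empty and aligned token by token")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def cohens_kappa(tags_a, tags_b):
    """Chance-corrected agreement (Cohen's kappa) between two annotators."""
    p_o = percent_agreement(tags_a, tags_b)
    n = len(tags_a)
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    # Expected agreement if each annotator labelled independently,
    # following their own observed distribution of senses.
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical sense everywhere; kappa is
        # undefined here, so by convention report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical sense labels for five occurrences of "bank".
    annotator_1 = ["bank%1", "bank%2", "bank%1", "bank%1", "bank%2"]
    annotator_2 = ["bank%1", "bank%2", "bank%2", "bank%1", "bank%2"]
    print(percent_agreement(annotator_1, annotator_2))  # 0.8
    print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.615
```

In the toy example, 80% raw agreement corresponds to a kappa of about 0.62, because part of the observed agreement would be expected by chance given each annotator's sense distribution; reporting both figures makes agreement levels easier to compare across languages.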

International Journal of Innovative Research in Science, Engineering and Technology
