Lexical chain

A lexical chain is a sequence of semantically related words in a text.[1] It may span a narrow context (adjacent words or sentences) or a wide one (the entire text). A lexical chain is independent of the grammatical structure of the text; in effect, it is a list of words that captures a portion of the text's cohesive structure. A lexical chain can provide context for resolving an ambiguous term and so help disambiguate the concept that the term represents. For example:

  • Rome → capital → city → inhabitant
  • Wikipedia → resource → web

About

Morris and Hirst[1] introduce the term lexical chain as an expansion of lexical cohesion.[2] A text in which many sentences are semantically connected tends to show continuity in its ideas, which gives good cohesion among its sentences. The definition used for lexical cohesion states that coherence is a result of cohesion, not the other way around.[2][3] Cohesion concerns a set of words that belong together because of an abstract or concrete relation. Coherence, on the other hand, concerns the actual meaning of the whole text.[1]

Morris and Hirst[1] state that lexical chains make use of semantic context for interpreting words, concepts, and sentences. In contrast, lexical cohesion focuses on the relationships between word pairs. Lexical chains extend this notion to a series of adjacent words. Lexical chains are essential for two main reasons:[1]

  • They provide a feasible context for resolving ambiguity and narrowing a word down to a specific meaning; and
  • They provide clues for determining coherence and discourse structure, and thus the deeper semantic-structural meaning of the text.

The method presented by Morris and Hirst[1] is the first to bring the concept of lexical cohesion to computer systems via lexical chains. Using their intuition, they identified lexical chains in text documents and built their structure considering Halliday and Hasan's[2] observations. For this task, they considered five text documents, totaling 183 sentences, from different and non-specific sources. Repetitive words (e.g., high-frequency words, pronouns, prepositions, verbal auxiliaries) were not considered as prospective chain elements, since they add little semantic value to the structure.

Lexical chains are built according to a series of relationships between words in a text document. In their seminal work, Morris and Hirst[1] use an external thesaurus (Roget's Thesaurus) as the lexical database from which these relations are extracted. A lexical chain is formed by a sequence of words in their order of appearance, such that any two consecutive words satisfy at least one of the following properties (i.e., attributes such as category, indexes, and pointers in the lexical database):[1][4]

  • the two words share a common category in their index;
  • the category of one of the words points to the other word;
  • one of the words belongs to the other word's entry or category;
  • the two words are semantically related; or
  • their categories both point to a common category.
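
The chaining idea above can be illustrated with a minimal sketch. It is not the procedure of Morris and Hirst: it assumes a tiny hand-made thesaurus (`TOY_THESAURUS`, hypothetical data rather than Roget's Thesaurus), checks only the first property (a shared category), and links each candidate word to the first chain whose last word is related to it. The stop-word list and the function names are likewise assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy thesaurus: word -> category labels (not real Roget's data).
TOY_THESAURUS: Dict[str, Set[str]] = {
    "rome": {"place"},
    "capital": {"place", "money"},
    "city": {"place"},
    "inhabitant": {"place", "person"},
    "bank": {"money"},
    "loan": {"money"},
}

# Small illustrative stop-word list; such words are not chain candidates.
STOP_WORDS = {"the", "a", "is", "and", "every"}


def related(a: str, b: str) -> bool:
    """True if both words share at least one category in the toy thesaurus."""
    return bool(TOY_THESAURUS.get(a, set()) & TOY_THESAURUS.get(b, set()))


def build_chains(tokens: List[str]) -> List[List[str]]:
    """Greedily attach each candidate word to the first chain whose last
    word is related to it; otherwise start a new chain."""
    chains: List[List[str]] = []
    for token in tokens:
        word = token.lower()
        # Skip stop words and words the toy thesaurus does not cover.
        if word in STOP_WORDS or word not in TOY_THESAURUS:
            continue
        for chain in chains:
            if related(chain[-1], word):
                chain.append(word)
                break
        else:
            chains.append([word])
    return chains


text = "Rome is the capital city and every inhabitant gave the bank a loan"
print(build_chains(text.split()))
# [['rome', 'capital', 'city', 'inhabitant'], ['bank', 'loan']]
```

Because only consecutive chain members are compared, "bank" starts a new chain even though it shares a category with "capital", which appears earlier in the first chain.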

Approaches and Methods

The use of lexical chains in natural language processing tasks (e.g., text similarity, word sense disambiguation, document clustering) has been widely studied in the literature. Barzilay et al.[5] use lexical chains to produce summaries from texts. They propose a technique based on four steps: segmentation of the original text, construction of lexical chains, identification of reliable chains, and extraction of significant sentences. Silber and McCoy[6] also investigate text summarization, but their approach for constructing the lexical chains runs in linear time.

Some authors use WordNet[7][8] to improve the search and evaluation of lexical chains. Budanitsky and Hirst[9][10] compare several measures of semantic distance and relatedness using lexical chains in conjunction with WordNet. Their study concludes that the similarity measure of Jiang and Conrath[11] presents the best overall result. Moldovan and Novischi[12] study the use of lexical chains for finding topically related words for question answering systems. This is done by considering the glosses for each synset in WordNet. According to their findings, topical relations via lexical chains improve the performance of question answering systems when combined with WordNet. McCarthy et al.[13] present a methodology to categorize and find the most predominant synsets in unlabeled texts using WordNet. Unlike traditional approaches (e.g., bag-of-words), they consider relationships between terms that do not occur explicitly. Ercan and Cicekli[14] explore the effects of lexical chains in the keyword extraction task from a supervised machine learning perspective. Wei et al.[15] combine lexical chains and WordNet to extract a set of semantically related words from texts and use them for clustering. Their approach uses an ontological hierarchical structure to provide a more accurate assessment of similarity between terms during the word sense disambiguation task.
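
As an illustration of the kind of measure compared in these studies, the Jiang and Conrath distance between two concepts is commonly written as IC(c1) + IC(c2) − 2·IC(lcs(c1, c2)), where IC(c) = −log p(c) is the information content of a concept and lcs is the lowest common subsumer of the two concepts in the taxonomy. The sketch below computes it over a hypothetical toy taxonomy with made-up probabilities; the names `PARENT` and `PROB`, the data, and the choice of base-2 logarithms are assumptions for illustration, not WordNet data or the setup used in the cited evaluations.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "place": "entity",
    "city": "place",
    "capital": "city",
    "village": "place",
}

# Made-up concept probabilities; a parent is at least as probable as its children.
PROB: Dict[str, float] = {
    "entity": 1.0,
    "place": 0.5,
    "city": 0.25,
    "capital": 0.125,
    "village": 0.125,
}


def information_content(concept: str) -> float:
    """IC(c) = -log2 p(c); rarer concepts carry more information."""
    return -math.log2(PROB[concept])


def ancestors(concept: str) -> List[str]:
    """Path from the concept up to the root, the concept itself included."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_subsumer(a: str, b: str) -> str:
    """Most specific concept that is an ancestor of both a and b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    raise ValueError("concepts share no ancestor")


def jiang_conrath_distance(a: str, b: str) -> float:
    """IC(a) + IC(b) - 2 * IC(lcs(a, b)); 0 for identical concepts."""
    lcs = lowest_common_subsumer(a, b)
    return (information_content(a) + information_content(b)
            - 2 * information_content(lcs))


print(jiang_conrath_distance("capital", "city"))     # 1.0
print(jiang_conrath_distance("capital", "village"))  # 4.0
```

A smaller distance indicates closer concepts: here "capital" is nearer to "city", its direct parent, than to "village", with which it shares only the more general concept "place".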

Lexical Chain and Word Embedding

Even though the applicability of lexical chains is diverse, there is little work exploring them with recent advances in NLP, more specifically with word embeddings. Simov et al.[16] build lexical chains using specific patterns found in WordNet[7] and use them for learning word embeddings. The resulting vectors are validated on the document similarity task. Rios Gonzales et al.[17] use word-sense embeddings to produce lexical chains that are integrated with a neural machine translation model. Mascarell[18] proposes a model that uses lexical chains to improve statistical machine translation by using a document encoder. Instead of using an external lexical database, the model uses word embeddings to detect the lexical chains in the source text.

Ruas et al.[4] propose two techniques that combine lexical databases, lexical chains, and word embeddings, namely Flexible Lexical Chain II (FLLC II) and Fixed Lexical Chain II (FXLC II). The main goal of both FLLC II and FXLC II is to represent a collection of words by their semantic values more concisely. In FLLC II, the lexical chains are assembled dynamically according to the semantic content of each term evaluated and its relationship with its adjacent neighbors. As long as there is a semantic relation that connects two or more words, they are combined into a single concept. The semantic relationship is obtained through WordNet, which works as a ground truth to indicate which lexical structure connects two words (e.g., hypernyms, hyponyms, meronyms). If a word without any semantic affinity with the current chain presents itself, a new lexical chain is initialized. FXLC II, on the other hand, breaks text segments into pre-defined chunks, each with a specific number of words. Unlike FLLC II, the FXLC II technique groups a fixed number of words into the same structure, regardless of the semantic relatedness expressed in the lexical database. In both methods, each formed chain is represented by the word whose pre-trained word embedding vector is most similar to the average vector of the constituent words in that same chain.
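
The difference between the two grouping strategies, and the choice of a representative word, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `RELATED_PAIRS` is a hypothetical stand-in for the relations that would be looked up in WordNet, `EMB` holds made-up two-dimensional vectors in place of pre-trained embeddings, and a word is assumed to join the current chain when it is related to any of its members.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Made-up 2-d vectors standing in for pre-trained word embeddings.
EMB: Dict[str, Tuple[float, float]] = {
    "rome": (1.0, 0.0),
    "capital": (0.9, 0.1),
    "city": (0.8, 0.2),
    "web": (0.0, 1.0),
    "wiki": (0.1, 0.9),
}

# Hypothetical stand-in for semantic relations found in a lexical database.
RELATED_PAIRS = {
    frozenset(pair)
    for pair in [("rome", "capital"), ("capital", "city"),
                 ("rome", "city"), ("web", "wiki")]
}


def related(a: str, b: str) -> bool:
    return frozenset((a, b)) in RELATED_PAIRS


def flexible_chains(words: List[str]) -> List[List[str]]:
    """FLLC-style grouping: extend the current chain while the next word is
    related to one of its members; otherwise start a new chain."""
    chains: List[List[str]] = []
    for word in words:
        if chains and any(related(word, member) for member in chains[-1]):
            chains[-1].append(word)
        else:
            chains.append([word])
    return chains


def fixed_chains(words: List[str], size: int) -> List[List[str]]:
    """FXLC-style grouping: consecutive chunks of a pre-defined size,
    ignoring semantic relatedness."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def representative(chain: List[str]) -> str:
    """Word whose vector is most similar to the chain's average vector."""
    mean = [sum(col) / len(chain) for col in zip(*(EMB[w] for w in chain))]
    return max(chain, key=lambda w: cosine(EMB[w], mean))


words = ["rome", "capital", "city", "web", "wiki"]
print(flexible_chains(words))  # [['rome', 'capital', 'city'], ['web', 'wiki']]
print(fixed_chains(words, 2))  # [['rome', 'capital'], ['city', 'web'], ['wiki']]
print(representative(["rome", "capital", "city"]))  # capital
```

In this toy input, the flexible grouping follows the topic shift from places to the web, whereas the fixed grouping places "city" and "web" in the same chunk because it considers only position.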

References

  1. ^ a b c d e f g h Morris, Jane; Hirst, Graeme (1991-03-01). "Lexical cohesion computed by thesaural relations as an indicator of the structure of text". Computational Linguistics.
  2. ^ a b c Halliday, Michael Alexander Kirkwood (1976). Cohesion in English. Hasan, Ruqaiya. London: Longman. ISBN 0-582-55031-9. OCLC 2323723.
  3. ^ Carrell, Patricia L. (1982). "Cohesion Is Not Coherence". TESOL Quarterly. 16 (4): 479–488. doi:10.2307/3586466. ISSN 0039-8322. JSTOR 3586466.
  4. ^ a b Ruas, Terry; Ferreira, Charles Henrique Porto; Grosky, William; de França, Fabrício Olivetti; de Medeiros, Débora Maria Rossi (2020-09-01). "Enhanced word embeddings using multi-semantic representation through lexical chains". Information Sciences. 532: 16–32. arXiv:2101.09023. doi:10.1016/j.ins.2020.04.048. ISSN 0020-0255. S2CID 218954068.
  5. ^ Barzilay, Regina; McKeown, Kathleen R.; Elhadad, Michael (1999). "Information fusion in the context of multi-document summarization". Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. College Park, Maryland: Association for Computational Linguistics: 550–557. doi:10.3115/1034678.1034760. ISBN 1558606092.
  6. ^ Silber, Gregory; McCoy, Kathleen (2001). "Efficient text summarization using lexical chains". Proceedings of the 5th International Conference on Intelligent User Interfaces: 252–255. doi:10.1145/325737.325861. S2CID 8403554.
  7. ^ a b "WordNet | A Lexical Database for English". wordnet.princeton.edu. Retrieved 2020-05-20.
  8. ^ WordNet: an electronic lexical database. Fellbaum, Christiane. Cambridge, Mass: MIT Press. 1998. ISBN 0-262-06197-X. OCLC 38104682.
  9. ^ Budanitsky, Alexander; Hirst, Graeme (2001). "Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures" (PDF). Proceedings of the Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001). pp. 24–29. Retrieved 2020-05-20.
  10. ^ Budanitsky, Alexander; Hirst, Graeme (2006). "Evaluating WordNet-based Measures of Lexical Semantic Relatedness". Computational Linguistics. 32 (1): 13–47. doi:10.1162/coli.2006.32.1.13. ISSN 0891-2017. S2CID 838777.
  11. ^ Jiang, Jay J.; Conrath, David W. (1997-09-20). "Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy". arXiv:cmp-lg/9709008.
  12. ^ Moldovan, Dan; Novischi, Adrian (2002). "Lexical chains for question answering". Proceedings of the 19th international conference on Computational linguistics. Vol. 1. Taipei, Taiwan: Association for Computational Linguistics. pp. 1–7. doi:10.3115/1072228.1072395.
  13. ^ McCarthy, Diana; Koeling, Rob; Weeds, Julie; Carroll, John (2004). "Finding predominant word senses in untagged text". Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04. Barcelona, Spain: Association for Computational Linguistics: 279–es. doi:10.3115/1218955.1218991.
  14. ^ Ercan, Gonenc; Cicekli, Ilyas (2007). "Using lexical chains for keyword extraction". Information Processing & Management. 43 (6): 1705–1714. doi:10.1016/j.ipm.2007.01.015. hdl:11693/23343.
  15. ^ Wei, Tingting; Lu, Yonghe; Chang, Huiyou; Zhou, Qiang; Bao, Xianyu (2015). "A semantic approach for text clustering using WordNet and lexical chains". Expert Systems with Applications. 42 (4): 2264–2275. doi:10.1016/j.eswa.2014.10.023.
  16. ^ Simov, Kiril; Boytcheva, Svetla; Osenova, Petya (2017-11-10). "Towards Lexical Chains for Knowledge-Graph-based Word Embeddings" (PDF). RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning. Incoma Ltd. Shoumen, Bulgaria: 679–685. doi:10.26615/978-954-452-049-6_087. ISBN 978-954-452-049-6. S2CID 41952796.
  17. ^ Rios Gonzales, Annette; Mascarell, Laura; Sennrich, Rico (2017). "Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings". Proceedings of the Second Conference on Machine Translation. Copenhagen, Denmark: Association for Computational Linguistics. pp. 11–19. doi:10.18653/v1/W17-4702.
  18. ^ Mascarell, Laura (2017). "Lexical Chains meet Word Embeddings in Document-level Statistical Machine Translation". Proceedings of the Third Workshop on Discourse in Machine Translation. Copenhagen, Denmark: Association for Computational Linguistics: 99–109. doi:10.18653/v1/W17-4813.