Confusion network

A confusion network (also called a word confusion network or, informally, a sausage) is a natural language processing method that combines outputs from multiple automatic speech recognition or machine translation systems.^[1]^[2] Confusion networks are simple linear directed acyclic graphs in which every path from the start node to the end node passes through all the other nodes. The set of words represented by the edges between two consecutive nodes is called a confusion set. In machine translation, the defining characteristic of confusion networks is that they allow multiple ambiguous inputs, deferring commitment to a particular translation until later stages of processing.^[3]^[4] This approach is used in the open-source machine translation software Moses^[5] and in the proprietary IBM Watson Speech to Text API, offered through IBM Bluemix.^[6]
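Because the graph is linear, a confusion network can be stored as an ordered list of confusion sets, one per pair of consecutive nodes. The Python sketch below assumes that representation: each word in a set is mapped to a probability, and a hypothetical `<eps>` marker stands for an empty edge (a position where a hypothesis has no word). This is an illustrative data structure, not the input format of Moses or any other tool. It counts the start-to-end paths and reads off the most probable one.

```python
from math import prod

# Hypothetical marker for an empty (epsilon) edge: "no word at this position".
EPS = "<eps>"

# Assumed representation: entry k maps each word on an edge between
# node k and node k+1 to that edge's probability.
ConfusionNetwork = list[dict[str, float]]


def num_paths(cn: ConfusionNetwork) -> int:
    """Every start-to-end path takes exactly one edge from each confusion set."""
    return prod(len(confusion_set) for confusion_set in cn)


def path_probability(cn: ConfusionNetwork, words: list[str]) -> float:
    """Probability of a path given as one word per confusion set (EPS included)."""
    if len(words) != len(cn):
        raise ValueError("a path must take exactly one edge per confusion set")
    return prod(cs[w] for cs, w in zip(cn, words))


def best_path(cn: ConfusionNetwork) -> list[str]:
    """Most probable word in each confusion set, with EPS edges removed."""
    chosen = [max(cs, key=cs.get) for cs in cn]
    return [w for w in chosen if w != EPS]


cn: ConfusionNetwork = [
    {"the": 0.7, "a": 0.3},
    {"cat": 0.5, "hat": 0.3, "bat": 0.2},
    {"sat": 0.6, EPS: 0.4},
    {"down": 0.9, "town": 0.1},
]

print(num_paths(cn))                # 24
print(" ".join(best_path(cn)))      # the cat sat down
print(round(path_probability(cn, ["the", "cat", "sat", "down"]), 3))  # 0.189
```

In this sketch a path's probability is the product of one probability per confusion set, so choosing the most probable word in each set independently also yields the most probable path through the whole network.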

Example of a confusion network
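A network like this can be built by combining the outputs of several systems, which is the system-combination setting of the works cited above: each hypothesis is aligned to a chosen backbone hypothesis, and each backbone position becomes a confusion set. The Python sketch below is a deliberately simplified version under stated assumptions: plain word-level Levenshtein alignment, the first hypothesis as the backbone, one vote per system, and inserted words discarded. The function names and the `<eps>` marker are hypothetical; the cited methods use more elaborate alignment and backbone selection.

```python
from collections import Counter

EPS = "<eps>"  # hypothetical marker for an empty edge (a word the system omitted)


def align_to_backbone(backbone: list[str], hyp: list[str]) -> list[str]:
    """Levenshtein-align hyp to backbone; return one entry per backbone word:
    the aligned hyp word (match or substitution) or EPS (deletion).
    Simplification: inserted hyp words have no backbone slot and are dropped."""
    n, m = len(backbone), len(hyp)
    # dist[i][j] = word edit distance between backbone[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if backbone[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match / substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Trace back one optimal alignment, preferring the diagonal move.
    aligned = [EPS] * n
    i, j = n, m
    while i > 0 and j > 0:
        cost = 0 if backbone[i - 1] == hyp[j - 1] else 1
        if dist[i][j] == dist[i - 1][j - 1] + cost:
            aligned[i - 1] = hyp[j - 1]
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # backbone word missing from hyp: slot keeps EPS
        else:
            j -= 1  # extra hyp word: dropped in this sketch
    return aligned


def build_confusion_network(hyps: list[list[str]]) -> list[dict[str, float]]:
    """Each system votes once per slot; votes are normalized to fractions."""
    backbone = hyps[0]  # assumption: the first system's output is the backbone
    votes = [Counter() for _ in backbone]
    for hyp in hyps:
        for slot, word in zip(votes, align_to_backbone(backbone, hyp)):
            slot[word] += 1
    return [{w: c / len(hyps) for w, c in slot.items()} for slot in votes]


hyps = [
    "the cat sat on the mat".split(),
    "the cat sat on a mat".split(),
    "a cat sat on the mat".split(),
    "the cat on the mat".split(),
]
cn = build_confusion_network(hyps)
print(cn[2])  # {'sat': 0.75, '<eps>': 0.25}
majority = [max(cs, key=cs.get) for cs in cn]
print(" ".join(w for w in majority if w != EPS))  # the cat sat on the mat
```

Here the fourth system omits "sat", so that confusion set receives an empty edge; taking the majority word in each set still recovers "the cat sat on the mat".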

References

[1] Rosti, Antti-Veikko I.; Zhang, Bing; Matsoukas, Spyros; Schwartz, Richard (2008). "Incremental Hypothesis Alignment for Building Confusion Networks with Application to Machine Translation System Combination". Proceedings of the Third Workshop on Statistical Machine Translation. StatMT '08. Stroudsburg, PA, USA: Association for Computational Linguistics: 183–186. ISBN 9781932432091.
[2] Matusov, Evgeny; Ueffing, Nicola; Ney, Hermann (2006). "Computing consensus translation from multiple machine translation systems using enhanced hypotheses alignment". In Proc. EACL. CiteSeerX 10.1.1.483.5417.
[3] Hoang, Hieu (2007). "Factored translation models". In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL): 868–876. CiteSeerX 10.1.1.80.3572.
[4] Koehn, Philipp; Hoang, Hieu; Birch, Alexandra; Callison-Burch, Chris; Federico, Marcello; Bertoldi, Nicola; Cowan, Brooke; Shen, Wade; Moran, Christine (2007). "Moses: Open Source Toolkit for Statistical Machine Translation". Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions. ACL '07. Stroudsburg, PA, USA: Association for Computational Linguistics: 177–180. doi:10.3115/1557769.1557821. S2CID 794019.
[5] "Moses - Moses/ConfusionNetworks". www.statmt.org. Retrieved 2017-11-09.
[6] "IBM® Speech to Text service provides an API Reference | IBM Watson Developer Cloud". www.ibm.com. Archived from the original on 2017-11-09. Retrieved 2017-11-09. "A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter."
