Template talk:Formal languages and grammars

From Wikipedia, the free encyclopedia
WikiProject Computer science (Template-class)
This template is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This template does not require a rating on Wikipedia's content assessment scale.

WikiProject Cognitive science (NA-class, inactive)
This article is within the scope of WikiProject Cognitive science, a project which is currently considered to be inactive.
This article has been rated as NA-class on Wikipedia's content assessment scale.

I disagree with Tyler McHenry's version of this table, and much prefer the earlier one by Chris Pressey. An unrestricted grammar as defined in the Chomsky hierarchy is well defined and explicit. Listing it for both the Turing Machine and Decider rows seems to imply that both are equivalent. I'm co-teaching a formal languages course this semester, and all the students found this table confusing when "unrestricted" is listed in multiple rows. Jim Mahoney 19:09, 27 March 2006 (UTC)

I just want to note that this has been addressed by now (I'm stating it clearly for future readers of this talk page). --Blaisorblade (talk) 15:07, 16 June 2008 (UTC)

Formal languages and grammars vs. Chomsky hierarchy

Since this template is about formal languages and grammars in general, and not strictly the Chomsky hierarchy (as specified in Chomsky (1959, 1963)), would anyone have a problem if we listed other well-documented proper subset formal languages and grammars that have been discovered since then? For example, indexed languages & grammars have been around since Aho (1968) and have been well studied since then, in e.g. Hopcroft & Ullman (1979), not to mention mildly context-sensitive (Joshi et al., 1975), deterministic context-free, and other major formal languages and grammars. –jonsafari 03:37, 21 September 2006 (UTC)

Yes, this table is too heavily tied up in the Chomsky hierarchy — an important classification scheme, to be sure, but not a good way of organizing the information this template needs to convey, seeing as this template needs to include many other kinds of classifications. Please be bold. :-)   —RuakhTALK 03:48, 21 September 2006 (UTC)

Subsets not proper

As far as I know, indexed grammars and tree adjoining grammars, as well as context-free and deterministic context-free grammars, generate the same languages. Therefore they are not proper subsets. Math1985 21:22, 7 August 2007 (UTC)

Where do you get this information from? I get my information from Hopcroft & Ullman (1979:233,390), Partee et al (1990:536-542), and Sipser (1997), not to mention many works by Vijay-Shanker & Weir. Most of these sources are cited in the respective language articles, where they should be. –jonsafari 21:18, 8 August 2007 (UTC)

I agree that the information proposed by Math1985 is incorrect.

Weir and Joshi actually proved that Combinatory Categorial Grammars generate the same languages as 3 other families already known to (weakly) generate the same class of languages: tree adjoining grammars, Head Grammars and linear indexed grammars. So there is no problem there.

However, in his book "Taking Scope: The Natural Semantics of Quantifiers", Mark Steedman states on page 105: "Full LCFRS and IG are still properly contained within, and much less expressive than, Context-Sensitive (Type 1) grammars. However, they characterize incommensurable overlapping sets of languages and do not stand in a containment relation." This contradicts the table, which suggests that the (unnamed) languages generated by Linear Context-Free Rewriting Systems (LCFRS) are properly contained in the indexed languages.

However, LCFRS is an important class and (as I recall) a lot more tractable for practical purposes than indexed languages, but that is of course a matter of opinion. I would suggest a mark in the table, referring to a note stating that there is no containment relation in that case. Bernard Lang (talk) 14:21, 15 February 2014 (UTC)

I'd prefer to have only proper containments in the table, and (to repeat my Nov 2013 posting) to add a link to a dedicated article discussing formalisms that are related in a more complex way than just set inclusion. Chomsky hierarchy is a possible place for the latter article. - Jochen Burghardt (talk) 12:19, 14 March 2014 (UTC)
Another reason to remove the LCFRS line is that it is (currently) not even clear from the involved articles that they are weakly equivalent to thread automata. - Jochen Burghardt (talk) 12:32, 14 March 2014 (UTC)
A source for the latter statement: The article Mildly_context-sensitive_language#Formalisms says: "The larger language class is generated by the following formalisms: ... LCFRS ... The larger class is a subset of the class of languages generated by thread automata, but whether this inclusion is proper is not known.[1]" - Jochen Burghardt (talk) 09:31, 25 May 2014 (UTC)
  1. Kallmeyer 2010, p. 216.

Range of Mildly context-sensitive languages

If I understand correctly, the term "mildly context-sensitive languages" refers to a range of languages broader than the TAL/LIL and EPDA (= L2 in Weir's Control Language Hierarchy). I'm referring to Joshi, Vijay-Shanker and Weir's "The Convergence of Mildly Context-Sensitive Grammar Formalisms" and Weir's "A Geometric hierarchy beyond context-free languages".--Ippei (talk) 16:13, 21 April 2008 (UTC)

If you would like to contribute this information to mildly context-sensitive languages, please do, always citing your reliable sources specifically. –jonsafari (talk) 06:25, 24 April 2008 (UTC)
I will definitely add a mention of the Control Language Hierarchy there when I get some time. Meanwhile, the mildly context-sensitive language article already says that TAL alone is not the MCSL. The mismatch in this template kind of bothers me, and I'm wondering whether anyone could come up with an alternative. Fortunately, (Weir 1992) has extended EPDA for his hierarchy of the same name. It's just that TAG does not correspond well to the MCSLs (or maybe the language column should say TAL).--Ippei (talk) 21:36, 3 May 2008 (UTC)
If TALs are indeed a proper subset of MCSLs, then both rows should appear in this template. –jonsafari (talk) 01:57, 5 May 2008 (UTC)
That's a good point. The four weakly equivalent grammars (TAG, CCG, LIG, HG) indeed define the language class that Weir's Control Language Hierarchy calls Level-2 (Level-1 is CFL). The properties of Level-k (for some finite k>1) correspond well with Joshi's "rough" definition of MCSL. Probably the problem is that MCSL is not defined very precisely, and as such it does not fit very well into this table. --Ippei (talk) 14:15, 5 May 2008 (UTC)
I've edited the table minimally, reflecting the facts but keeping the convenience. I hope it's the right way to deal with it. --Ippei (talk) 22:27, 8 May 2008 (UTC)

According to the article Mildly context-sensitive language#Formalisms, "The notion of mild context-sensitivity does not designate a single class of languages, but applies to any language class meeting the criteria in the definition". However, the template currently suggests that there is a single class called "mildly context-sensitive languages", which is recognized by thread automata and generated by linear context-free rewriting systems. I'd like to replace the "mildly context-sensitive" entry by "(no common name)"; possibly a note could say that this language class, as well as that of tree-adjoining languages below it, belongs to the set of language classes meeting the "mildly context-sensitive" criteria.

By the way: shouldn't the template somewhere explicitly state the horizontal correspondence (e.g. regular grammars generate regular languages, which are accepted by finite automata)? - Jochen Burghardt (talk) 22:38, 2 February 2014 (UTC)

Categorisation of further automata

Hello, this is a great template! There are, however, a significant number of (sometimes orphaned) articles pertaining to rather more obscure automata that are not included. Could the template be extended to include these? Here's a rough list of candidates, where indentation marks a possible subset example:

Parity automaton
Büchi automaton / Muller automaton / Streett automaton / Rabin automaton
Kripke structure
Tree walking automaton
Pebble automaton
Quantum finite automata
Learning Automata
Levenshtein automaton
Lattice gas automaton
Continuous spatial automaton
Semiautomaton
Probabilistic automaton
Continuous automaton

(Some of the above could be examples of cellular automata; I didn't investigate very far.) A fair few of these I am completely unfamiliar with. I plan to read up on the articles and draft a possible template extension, depending on consensus. --BlueNovember (talk) 12:07, 17 October 2008 (UTC)

Recursive Recursively Enumerable?

I wonder why the "Type 0" row is at the top. Aren't recursively enumerable languages a subset of recursive languages, so that the second row should be at the top? —Preceding unsigned comment added by 79.211.162.151 (talk) 22:53, 1 December 2009 (UTC)

I think the table is correct in having the Type 0 row on top and recursive languages next. Recursive languages are a subset of recursively enumerable languages, not the other way round. --Ippei (talk) 05:43, 11 January 2010 (UTC)

CFG CSG CSG

In other words: there are context-free grammars (in particular those with non-harmless ε-rules) that are not context-sensitive grammars (since the latter permit only harmless ε-rules in order to produce the empty string). Thus, the line saying "languages or grammars are proper subsets of ..." should be corrected. Instead, grammars should be added to the sentence about automata. --Zahnradzacken (talk) 18:05, 27 February 2010 (UTC)
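A minimal sketch of the grammar-level condition discussed above, under assumptions not taken from the template: a context-free grammar is a list of (left-hand side, right-hand side) pairs, and it counts as context-sensitive only if every right-hand side is non-empty, except for a "harmless" ε-rule S → ε whose start symbol S occurs on no right-hand side. The function name `is_context_sensitive_cfg` is hypothetical.

```python
from typing import List, Tuple

# A context-free rule A -> X1 ... Xn is stored as (A, (X1, ..., Xn));
# an empty right-hand side tuple represents an epsilon-rule A -> ε.
Rule = Tuple[str, Tuple[str, ...]]


def is_context_sensitive_cfg(rules: List[Rule], start: str) -> bool:
    """Return True if the context-free grammar is also context-sensitive.

    Assumed convention: every right-hand side must be non-empty, except
    the single "harmless" rule start -> ε, which is allowed only when the
    start symbol never occurs on any right-hand side.
    """
    start_on_rhs = any(start in rhs for _, rhs in rules)
    for lhs, rhs in rules:
        if rhs:
            continue  # non-contracting rule: always allowed
        if lhs != start or start_on_rhs:
            return False  # a "non-harmless" epsilon-rule
    return True


# S -> aSb | ε generates { a^n b^n : n >= 0 } but is not context-sensitive,
# because the start symbol S occurs on a right-hand side.
g1 = [("S", ("a", "S", "b")), ("S", ())]

# S -> T | ε, T -> aTb | ab generates the same language and passes the check.
g2 = [("S", ("T",)), ("S", ()), ("T", ("a", "T", "b")), ("T", ("a", "b"))]

print(is_context_sensitive_cfg(g1, "S"))  # False
print(is_context_sensitive_cfg(g2, "S"))  # True
```

Since `g1` and `g2` generate the same language, the example illustrates that the restriction concerns grammars rather than languages.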

Well then, it's correct now. --Zahnradzacken (talk) 18:53, 19 March 2010 (UTC)

Star-free grammar?

Rather than offering no name for the grammar of a star-free language, could it not unambiguously be referred to as a Star-free grammar and be linked directly to the Star-free language article? --24.26.130.82 (talk) 21:03, 22 May 2011 (UTC)

Undescribable languages

Over any given finite alphabet, there are uncountably many possible formal languages but only countably many possible formal grammars and/or Turing machines. This implies that there are formal languages which cannot be described by any formal grammar or recursively enumerated by any Turing machine. Is it worth recording the existence of these languages? This would be the highest row in the table if so. 195.212.29.92 (talk) 13:09, 27 May 2011 (UTC)
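The counting argument above rests on Cantor-style diagonalization. A minimal sketch, assuming languages are given as membership predicates and that some listing `enum(i)` of languages is available (for example, the languages of all grammars taken in some order); `strings`, `rank`, `diagonal` and the toy listing are hypothetical names used only for illustration. The diagonal language differs from the i-th listed language on the i-th string, so it cannot appear anywhere in the listing.

```python
from itertools import count, islice, product
from typing import Callable, Iterator

Language = Callable[[str], bool]         # a language, given by its membership predicate
Enumeration = Callable[[int], Language]  # i -> the i-th language in some listing


def strings(alphabet: str) -> Iterator[str]:
    """Yield all strings over `alphabet` in length-lexicographic order."""
    for n in count():
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def rank(w: str, alphabet: str) -> int:
    """Return the position of `w` in the order produced by strings(alphabet)."""
    k = len(alphabet)
    shorter = sum(k ** j for j in range(len(w)))  # number of strictly shorter strings
    value = 0
    for c in w:
        value = value * k + alphabet.index(c)  # read w as a base-k numeral
    return shorter + value


def diagonal(enum: Enumeration, alphabet: str) -> Language:
    """Return D = { w_i : w_i not in L_i }, which differs from every L_i on w_i."""
    return lambda w: not enum(rank(w, alphabet))(w)


# Toy listing (hypothetical): L_i = strings whose length is divisible by i + 2.
def toy(i: int) -> Language:
    return lambda w: len(w) % (i + 2) == 0


D = diagonal(toy, "ab")
for i, w in enumerate(islice(strings("ab"), 200)):
    assert D(w) != toy(i)(w)  # D disagrees with L_i on the i-th string
print("D differs from each of the first 200 listed languages")
```

With an enumeration of all grammars in place of `toy`, the same construction yields a language that no listed grammar generates; the sketch itself only checks a finite prefix.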

I consider undescribable languages worth mentioning in some article(s). However, they wouldn't fit into the template, as they do not form a superset of any language class there, but are disjoint from each of them. Concerning the complementary notion "describable language", some care must be taken not to run into paradoxes like "the smallest language class that cannot be described in fewer than fourteen words".
Referring to the section #Categorisation of further automata above, I agree that an overview of the automata mentioned there (plus "Van Wijngaarden grammar", which came to my mind) should be given somewhere outside this talk page. Maybe a dedicated article should be devoted to the discussion of formalisms related in a more complex way than just set inclusion. The template should remain restricted to the main simple-inclusion hierarchy, but link to such a full-overview article. - Jochen Burghardt (talk) 11:04, 6 November 2013 (UTC)

Problem with table rendering

There is a layout problem in this table with at least two browsers: IE7 and Google Chrome. It could be purely a browser bug, but it could also be a problem in the Wikimedia system.

Problem: When text in a table cell is too long and must span more than one line, the table rows lose their alignment, which makes the table confusing. The cell that does this is "Linear context-free rewriting systems etc."

IE7 seems to have a fixed limit on cell width, while in Google Chrome 16 the line break occurs only if you make the window narrow (not wide enough). In both cases the table layout is broken. 83.226.178.77 (talk) 22:41, 8 January 2012 (UTC)

I can confirm this occurs on Firefox as well whenever the window isn't too wide (narrower than 1000px or so on my setup). Definitely needs to be fixed. /blahedo (t) 06:18, 10 February 2012 (UTC)
I don't get this bug on Chromium. A temporary fix could be to remove the "rewriting systems" part. Andreas vc (talk) 23:44, 16 June 2012 (UTC)

Removing "recursive grammar"

I am going to revert a recent edit that added "recursive grammar" to the hierarchy. As far as I know, there is no such category in the hierarchy for grammars. This issue is being discussed at Wikipedia:Articles for deletion/Recursive grammar. If there is such a category and you wish to revert my edit, please provide a reference to that effect from a reliable source. Thanks, --Mark viking (talk) 00:44, 31 March 2013 (UTC)

Today, such an entry has been added again in good faith. Apparently, the name "Recursive" in column "Languages" tempts editors to draw such a connection. However, a "recursive language" (meaning a decidable language) does not correspond to a "recursive grammar" (meaning just a grammar with recursive rules).
In order to avoid a repetition of this confusion, I changed the entry "Recursive" in column "Languages" to "Decidable", which
  • redirects to "recursive language",
  • fits well with "Decider" in column "Minimal automaton", and
  • doesn't invite the insertion of "recursive grammar" (an article "Decidable grammar" does not exist, and need not exist).
Jochen Burghardt (talk) 18:11, 11 January 2014 (UTC)
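To make the distinction concrete: having recursive rules is a purely syntactic property of a grammar, unrelated to decidability. A minimal sketch, assuming "recursive" means that some nonterminal A can derive a sentential form containing A again, and using a hypothetical (lhs, rhs) rule representation; `recursive_nonterminals` is an illustrative name, not an existing API. The grammar S → aS | a is recursive in this sense, yet it generates the regular language of one or more a's.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (lhs, rhs), as in A -> X1 ... Xn


def recursive_nonterminals(rules: List[Rule], nonterminals: Set[str]) -> Set[str]:
    """Return the nonterminals A with A =>+ ...A..., i.e. A is reachable from itself
    in the graph that has an edge A -> B whenever B occurs in a right-hand side of A."""
    edges: Dict[str, Set[str]] = {a: set() for a in nonterminals}
    for lhs, rhs in rules:
        edges[lhs].update(s for s in rhs if s in nonterminals)

    result: Set[str] = set()
    for a in nonterminals:
        stack, seen = list(edges[a]), set()
        while stack:  # depth-first search starting from the successors of a
            b = stack.pop()
            if b == a:
                result.add(a)
                break
            if b not in seen:
                seen.add(b)
                stack.extend(edges[b])
    return result


# S -> aS | a: recursive rules, but the language (one or more a's) is regular.
print(recursive_nonterminals([("S", ("a", "S")), ("S", ("a",))], {"S"}))  # {'S'}
# S -> AB, A -> a, B -> b: no recursion; the language is just {"ab"}.
print(recursive_nonterminals([("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))], {"S", "A", "B"}))  # set()
```

Both example grammars generate regular, hence decidable ("recursive"), languages, whether or not their rules are recursive.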
This looks good to me, thanks. --Mark viking (talk) 22:33, 19 January 2014 (UTC)
I changed it purely as part of a clean-up exercise on navboxes. Among other things, redirects should, in general (redirects to sections of another article are less clear-cut), be avoided in navboxes, so that the link can be displayed (automatically) in bold when the reader is looking at that article - this helps in navigation, which is what navboxes are designed for. You can pipe "decidable" to "recursive language" if you want, but it should not be left as a redirect. --NSH001 (talk) 20:21, 21 January 2014 (UTC)
Sorry, I wasn't aware of that redirect problem in navboxes. I have now piped "decidable" to "recursive language" as you suggested. - Jochen Burghardt (talk) 22:02, 21 January 2014 (UTC)

Linear Languages and one-turn PDA

Somewhere between regular and context-free languages lies another class of languages worth mentioning, the class of linear languages, often denoted LIN. There is a corresponding automaton, the one-turn pushdown automaton, and there is a deterministic variant, DLIN, with its one-turn DPDA, which is a proper subset of LIN and its automaton.

It is the case that REG ⊊ DLIN ⊊ LIN ⊊ CFL and REG ⊊ DLIN ⊊ DCFL ⊊ CFL, and also LIN ⊄ DCFL and DCFL ⊄ LIN, but I still don't know how they relate to visibly pushdown automata, so I haven't added them to the template myself yet. Someone needs to sort that out. — Preceding unsigned comment added by 129.13.72.195 (talk) 10:09, 5 August 2014 (UTC)
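For reference, LIN is usually defined via linear grammars, in which every right-hand side contains at most one nonterminal; right-linear grammars additionally require that nonterminal to be the last symbol, and they generate exactly the regular languages. A minimal sketch of both syntactic checks, using a hypothetical (lhs, rhs) rule representation; the function names are illustrative only. These functions test grammars, not languages: a non-linear grammar may still happen to generate a linear language.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (lhs, rhs), as in A -> X1 ... Xn


def nonterminal_positions(rhs: Tuple[str, ...], nonterminals: Set[str]) -> List[int]:
    """Indices of the nonterminal symbols in a right-hand side."""
    return [i for i, s in enumerate(rhs) if s in nonterminals]


def is_linear(rules: List[Rule], nonterminals: Set[str]) -> bool:
    """At most one nonterminal on every right-hand side."""
    return all(len(nonterminal_positions(rhs, nonterminals)) <= 1 for _, rhs in rules)


def is_right_linear(rules: List[Rule], nonterminals: Set[str]) -> bool:
    """Linear, and any nonterminal is the last symbol of its right-hand side."""
    for _, rhs in rules:
        pos = nonterminal_positions(rhs, nonterminals)
        if len(pos) > 1 or (pos and pos[0] != len(rhs) - 1):
            return False
    return True


N = {"S"}
anbn = [("S", ("a", "S", "b")), ("S", ())]                     # { a^n b^n : n >= 0 }
astar_b = [("S", ("a", "S")), ("S", ("b",))]                   # a*b, a regular language
dyck = [("S", ("S", "S")), ("S", ("(", "S", ")")), ("S", ())]  # balanced parentheses

print(is_linear(anbn, N), is_right_linear(anbn, N))        # True False
print(is_linear(astar_b, N), is_right_linear(astar_b, N))  # True True
print(is_linear(dyck, N))                                  # False
```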

No common name for languages corresponding to LCFRS?

I was responsible for the entry "(no common name)" in the "Languages" column next to (the "Grammar" column entry) "Linear context-free rewriting systems", which JMP EAX has now flagged as "dubious". Of course, I won't insist on the non-existence of a name; if anybody knows one (and preferably a source), please insert it. However, I still think that the former text (before my edit of 17:10, 10 February 2014), viz. "Mildly context-sensitive", would be wrong, see #Range of Mildly context-sensitive languages. - Jochen Burghardt (talk) 12:37, 17 August 2014 (UTC)

It's true that there is no full account of mildly context-sensitive languages in the sense of a formalism that can generate precisely the full class of such languages (and not more). From Kallmeyer's book (p. 24): "As already mentioned, mild context-sensitivity is introduced as a property of a set of languages. So far, it has not been possible to identify a grammar formalism that generates the largest possible mildly context-sensitive set of string languages. The closest approximation we know of are Linear Context-Free Rewriting Systems (LCFRSs), introduced in (Vijay-Shanker, Weir, and Joshi, 1987; Weir, 1988), and equivalent formalisms such as set-local Multicomponent Tree Adjoining Grammars (MCTAGs) (Weir, 1988), Multiple Context-Free Grammars (MCFGs) (Seki et al., 1991) and simple Range Concatenation Grammars (simple RCGs) (Boullier, 2000b). However, recent research on certain types of MCTAG suggests that there might be mildly context-sensitive grammar formalisms that are not comparable with LCFRS and equivalent formalisms, i.e., that generate languages that cannot be generated by LCFRS and vice versa (Kallmeyer and Satta, 2009)." JMP EAX (talk) 12:46, 17 August 2014 (UTC)
So if we put a line for mildly context-sensitive language in the languages column, it should (presently) have no grammar next to it. JMP EAX (talk) 12:48, 17 August 2014 (UTC)
As for the language name for LCFRS-generated languages, the usual convention of replacing G (in this case S) with L turns out to hold, i.e. LCFRL [1]. JMP EAX (talk) 12:52, 17 August 2014 (UTC)

Are you going to create the page Linear context-free rewriting language, or rather Linear context-free rewriting systems, which it redirects to? Or shall it remain a red link for the time being?

With respect to mild context-sensitivity, my point is that a class of languages should be distinguished from a class of language classes. "Context-free" (like all other entries currently in the "Languages" column) is an example of the former, but "mildly context-sensitive" is an example of the latter (according to the Wikipedia article as well as to your Kallmeyer citation). - Jochen Burghardt (talk) 13:13, 17 August 2014 (UTC)

You're not making any sense to me in the paragraph just above. JMP EAX (talk) 13:27, 17 August 2014 (UTC)

The first sentence of Mildly context-sensitive language says: "In formal language theory, a class of languages is mildly context-sensitive if ..." In contrast, Context-free language says: "In formal language theory, a context-free language is a language generated by ..."

Similarly, Kallmeyer says in your citation: "... mild context-sensitivity is introduced as a property of a set of languages ...", while (I would add) being context-free is a property of a single language.

In my view, a single language cannot be called "mildly context-sensitive". As an analogy, being non-empty is a property of sets, so saying "{3} is nonempty" makes sense, but saying "3 is nonempty" does not. - Jochen Burghardt (talk) 13:47, 17 August 2014 (UTC)

Yeah, I see you have propagated your misunderstandings to the article space too. But that doesn't make them true or in accordance with the source(s). Kallmeyer's writing might not be the most English-like... but is unfortunately the sole book about this. I was trying to figure out what Threaded Automata actually correspond to from her book and I read "TA were developed in order to specify an automaton model for mildly context-sensitive languages. In fact, they accept all LCFRLs, the largest mildly context-sensitive class of languages we know of. However, as mentioned earlier, there are probably other grammar formalisms that generate also only mildly context-sensitive languages and that generate languages that are outside LCFRL." (p. 204) If you manage to parse the last sentence [because her English is written in a sort of German grammar], it's talking about "mildly context-sensitive languages", so presumably it's possible to have such a property assigned to a single language. JMP EAX (talk) 15:37, 17 August 2014 (UTC)
Actually you were right. The catch is that the obscure phrase "it admits limited cross-serial dependencies" actually has a technical meaning [2] in terms of some languages that must be contained in the class/family, but not necessarily that those strings need to be part of another mildly CS language! This is in fact the only bit that prevents it from being a property of a single language. But this was never explained properly in the wiki article. K also mentions it on p. 23 in her book, although the technical details are a bit different. JMP EAX (talk) 15:50, 17 August 2014 (UTC)

Oh, I should note that the article on MCS got nuked/rewritten by another editor in the meantime! JMP EAX (talk) 22:02, 17 August 2014 (UTC)

wikitext/HTML guru needed

The rows lose alignment when the browser window is shrunk enough, due to word wrapping inside each column. I don't know how to fix that with template wikitext. I suppose we could convert the whole thing to a normal wikitext table. JMP EAX (talk) 13:35, 17 August 2014 (UTC)

I couldn't verify that effect. I'm not sure what browser and/or skin you use, but with a current Firefox and the Vector skin, the text within the columns is not wrapped. If that should really be required (and as I said I don't see the need), we could use the code from the last line, <span style = "white-space:nowrap;">, to force no-wrapping. Huon (talk) 14:40, 17 August 2014 (UTC)
They wrap with Google Chrome at browser widths below approximately 1000 pixels. JMP EAX (talk) 15:22, 17 August 2014 (UTC)


Positive range concatenation languages

It is said in the table that they are not a proper subset of context-sensitive languages. So, what is known about the relationship between RCLs and CSLs? I can't find anything in the main references (Boullier 1998, Kallmeyer 2010). 186.108.146.134 (talk) 06:18, 6 July 2016 (UTC)