Talk:LR parser

From Wikipedia, the free encyclopedia

Article length[edit]

It's a bit long now. Perhaps there should be a separate article about LR(0) parsers that deals with the LR(0) algorithm. -- Jan Hidders 10:01 Aug 18, 2002 (PDT)

This article is useless for a beginner[edit]

Two key notions used in the article "reduction" and "derivation" are not explained. Without these notions the article is useless for a beginner. —Preceding unsigned comment added by 82.240.122.196 (talk) 19:08, 25 April 2009 (UTC)[reply]

Yes, the things any user of parsers & generators cares about should be well explained before diving into arcane details of LR(0) state machines, closures, and parse tables. The notions of reductions and bottom-up parsing are now described in a separate user-oriented article on Shift-reduce parsers. I've added to the LR article a new intro that attempts to cover the high-level concepts at a user level. DBSand (talk) 05:45, 3 May 2012 (UTC)[reply]
I believe this topic is best explained to beginners by starting with concrete examples of parse trees and shift/reduce parse steps for a nontrivial but small example, and then showing a concrete parse table with intuitive state names, and then discussing LR(1) conflicts. And only then diving into the deep end of how states are defined and computed. DBSand (talk) 00:06, 9 May 2012 (UTC)[reply]
What this Wikipedia needs is well-delimited material defining the reduction of sentential forms using productions of a formal grammar. AHMartin (talk) 23:30, 10 March 2017 (UTC)[reply]

Parse position indicator[edit]

This article is one of the better explanations of LR parsing I have seen. My only complaint is the dual use of the "*" character as (1) the current parse position in an LR(0) item, and (2) the multiplication operator. That leads to items like "E * * B" (an actual example from the text), which is confusing. Perhaps we could just use a period? --Doradus 22:35, 23 Aug 2003 (UTC)

(Scratch that; apparently Konqueror renders · as an asterisk. --Doradus 23:16, 23 Aug 2003 (UTC))

SLR and LALR overlap[edit]

Looks like the "note on SLR and LALR" overlaps a bit with the section on conflicts. Perhaps they could be combined. --Doradus 01:38, 26 Oct 2003 (UTC)

E and B nonterminals[edit]

It's a really minor thing, but could we maybe change the grammars which use nonterminals "E" and "B" to use some other letters? They often look awfully similar on my screen. Marnanel 05:53, 4 Feb 2004 (UTC)

Programming Languages not LR[edit]

I'm looking for a reference for the claim that Perl and C cannot be parsed by an LR parser. Does somebody know one? -- Jan Hidders 14:12, 21 Jul 2004 (UTC)

The one thing that I know of is the dangling else problem. Because of this problem, not only can C not be parsed by a (pure) LR parser, its usual grammar is not even unambiguous. I'm sure a Google search for dangling else will produce a large number of references, and if you're lucky, some of them may even have references regarding other reasons C cannot be parsed by an LR parser (if there are indeed other reasons; I would imagine there are, but I don't know). CyborgTosser (Only half the battle) 07:51, 17 Apr 2005 (UTC)
It's possible to create an unambiguous grammar for languages with an if-else construct, but it requires introducing a new nonterminal symbol: all_instructions_but_if_with_else. However, such a grammar becomes unnatural and less readable, which is why most parsers use other ambiguity-handling techniques, like associativity declarations.
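For reference, the standard unambiguous formulation splits statements into "matched" and "unmatched" forms (a sketch only, in the grammar notation used elsewhere on this page; the nonterminal matched_stmt plays the role of all_instructions_but_if_with_else above):

 stmt           ::= matched_stmt
 stmt           ::= unmatched_stmt
 matched_stmt   ::= 'if' expr 'then' matched_stmt 'else' matched_stmt
 matched_stmt   ::= other_stmt
 unmatched_stmt ::= 'if' expr 'then' stmt
 unmatched_stmt ::= 'if' expr 'then' matched_stmt 'else' unmatched_stmt

Every 'else' then attaches to the nearest unmatched 'then', which is the conventional resolution of the ambiguity.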
Typedefs are another issue with C. They make the language context-dependent. It's discussed a little bit here [1], but I think there's a better reference somewhere. --Werdnagreb 19:06, 5 May 2005 (UTC)[reply]
The article now refers to Perl and C++. C++ is probably a better example than C, as I know C parsers get around both of the points noted above with fairly innocuous kludges, whereas C++ is trickier. It might be worth mentioning that early versions of Fortran were even bigger offenders, although the problems go beyond difficulty of parsing. CyborgTosser (Only half the battle) 01:07, 8 May 2005 (UTC)[reply]
Perl is worse, says Perl Cannot Be Parsed: A Formal Proof, but I haven't read it. Rp (talk) 09:50, 10 February 2009 (UTC)[reply]

Minor improvements to the article[edit]

This is, indeed, a great explanation of the LR(0) algorithm. Nevertheless, I'd like to suggest a few changes that may make it better:

a) Change the term "begin state" to "start state", as it seems to be the most correct form to reference the, err, start state of a grammar.

b) the section "Conflicts in the constructed tables" needs some editing, as it seems. For example, it begins with the phrase "The automaton that is constructed is by the way it is constructed always a deterministic automaton", which, although correct, sounds confusing (at least to me). A better wording might be: "The steps presented so far always construct a deterministic automaton." or "The automaton constructed by looking for transitions between item sets will always be a deterministic one."

c) Also (still in the section referred to above), we find several references to the grammar not being LL(0). Because we're discussing LR(0), it looks like a typo. It would be better if an explanation were given of how the non-LL(0)-ness of the grammar affects the LR(0) parser.

d) Finally -- and maybe this is just my system -- I guess that using • instead of · to represent the dot would make it more visible.

Great article. --Branco 13:39, 23 October 2005 (UTC)[reply]

Go for it. --Doradus 22:57, 13 July 2007 (UTC)[reply]

Discussion of LR(1) Construction Needed[edit]

Although LR(0) item set construction is important because it is the basis for the SLR(1) parser construction algorithm as well as most LALR(1) construction algorithms, it should be pointed out that very few parsers are based on LR(0) itself. Indeed, LALR(1) parsers such as those produced by Yacc and GNU bison seem to be the most commonly used LR(k)-based parsers. Discussing LR(1) item set construction would not only describe the LR(k) methodology when lookahead is a component of an item (as it always is except for the trivial case of k = 0), it would provide the basis for understanding the LALR(1) construction method and why an LR(1) grammar may not be LALR(1) or why an LALR(1) grammar may not be SLR(1).

It might also be pointed out that although a grammar might be LR(k) for some k > 1, there will always be LR(1), LALR(1), and SLR(1) grammars for the same language. --Raaronson 16:58, 27 December 2005 (UTC)[reply]

Discussion of construction of grammars for LR parsers?[edit]

It might be worthwhile to have a section about what sort of grammars work best with LR parsers: yes, they can parse all LR grammars, which is a strictly defined set, but some work better than others. E.g.:

 list ::= item ',' list
 list ::= item '.' 

is an LR(1) grammar that would need to shift the entire input onto the stack before reducing it, whereas:

 list ::= list-items '.'
 list-items ::= item
 list-items ::= list-items ',' item

is a grammar for the same language that allows the parser to reduce after each item. I just added a note to left recursion that mentions this, although not in as much detail. The same reference [2] is probably useful here. JulesH 08:06, 4 August 2006 (UTC)[reply]
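To make the stack behavior concrete, here is a small hypothetical Python simulation (the function names and hard-wired reduction logic are invented for this sketch, not part of any parser generator) that tracks how deep the parse stack grows on the input item , item , item . under each of the two grammars above:

```python
def parse_right(tokens):
    # Right-recursive grammar: list ::= item ',' list | item '.'
    # An LR parser must shift the whole input before the first reduction.
    stack, max_depth = [], 0
    for t in tokens:
        stack.append(t)                      # shift
        max_depth = max(max_depth, len(stack))
    while len(stack) > 1:                    # reductions cascade at the end
        if stack[-2:] == ['item', '.']:
            stack[-2:] = ['list']            # list ::= item '.'
        else:
            stack[-3:] = ['list']            # list ::= item ',' list
    return stack, max_depth

def parse_left(tokens):
    # Left-recursive grammar: list ::= list-items '.'
    #                         list-items ::= item | list-items ',' item
    # The parser can reduce after every item, so the stack stays shallow.
    stack, max_depth = [], 0
    for t in tokens:
        stack.append(t)                      # shift
        max_depth = max(max_depth, len(stack))
        if stack == ['item']:
            stack[:] = ['list-items']        # list-items ::= item
        elif stack[-3:] == ['list-items', ',', 'item']:
            stack[-3:] = ['list-items']      # list-items ::= list-items ',' item
        elif stack[-2:] == ['list-items', '.']:
            stack[-2:] = ['list']            # list ::= list-items '.'
    return stack, max_depth
```

On item , item , item . the right-recursive version reaches stack depth 6 while the left-recursive one never exceeds 3, and the gap grows linearly with list length.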

removed sentence[edit]

I removed from section 1.2:

In practice the parser will usually try to recover from a syntax error by ignoring unexpected symbols and/or inserting missing symbols, and continue parsing, but this is outside the scope of this article.

This paragraph reads like it was taken from a tutorial. Well, I imagine it's only because of the "outside of scope" thing. Whatever. --euyyn 22:14, 4 September 2006 (UTC)[reply]

It's also not outside the scope of this article. If someone wants to describe common techniques for error recovery, that's perfectly on-topic. --Doradus 22:55, 13 July 2007 (UTC)[reply]

Web resource for LR(k) parser construction[edit]

These are what I found when searching on the Internet for my project on implementing an LR(1) yacc:

- Practical LR(k) Parser Construction [3]. By David R. Tribble, 12/12/2004. This is a long and complete introduction, although many places are labeled as "+INCOMPLETE".

- The Honalee LR(k) Algorithm [4]. By David R. Tribble, 4/27/2006.

BTW, in "E → E + B", maybe B can be changed to T so it looks different from E. "Item" is called "Configuration" in some early papers in the 1970s.

-- oxc 10:04, 7 December 2006 (UTC)[reply]

Yes, the Honalee algorithm is an LR(k) parsing algorithm I invented a while back. It creates LR parsing tables for what I called Merged LR (MLR) grammars, which lie somewhere between LALR and LR grammars in complexity. I have since discovered that it does not, however, handle all LR grammars. There are LR grammars that have mutually conflicting item sets that cannot be merged, but which my algorithm is not smart enough to detect. At some point when I have more time, I'd like to go back and complete the web pages. In the meantime, feel free to paraphrase portions of it for these WP articles. | David R. Tribble / Loadmaster (talk) 15:23, 10 February 2009 (UTC)[reply]

Closure of item sets[edit]

I am not convinced that the example is entirely consistent with the given definition of closure.

In particular, in construction of the item sets the rule "S → E •" is added to Item set 3, and two others are added to complete the closure. However, this rule has an empty follow set, so closure as defined (as I understand it) will not add a rule such as "E → E • * B" to the item set.

It is entirely possible that I have misunderstood some part of this -- if so perhaps this is an indicator that the explanation needs clearing up a little (or that I should pay attention in lectures :p ), but a comment either way by someone more knowledgeable than I would be appreciated. JoeKearney 19:52, 6 April 2007 (UTC)[reply]

With a bit more thought, I believe that what we're trying to say is this:

If we're at the state of "S → E •" then we want (i.e. the closure includes) all rules to parse anything in the follow-set of E.

In particular it includes rules 1 and 2: "E → E • * B" and "E → E • + B".
I believe the definition of the closure doesn't fully account for this, but I'm not certain. If I have a spark of inspiration, or (more likely) if someone would care to point one way or the other I'll edit to make this more clear. JoeKearney 20:19, 6 April 2007 (UTC)[reply]
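A small hypothetical Python sketch (the data layout is invented here) makes the point above concrete: closure only adds items whose dot stands before a nonterminal, so "S → E •" contributes nothing by itself, and items like "E → E • * B" must enter item set 3 as kernel items via the goto step rather than via closure:

```python
# Grammar from the article: S -> E;  E -> E * B | E + B | B;  B -> 0 | 1.
GRAMMAR = {
    'S': [('E',)],
    'E': [('E', '*', 'B'), ('E', '+', 'B'), ('B',)],
    'B': [('0',), ('1',)],
}

def closure(items):
    """items: a set of (lhs, rhs, dot) triples; rhs is a tuple, dot an index."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            # Closure fires only when the dot stands before a nonterminal.
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    new_item = (rhs[dot], prod, 0)
                    if new_item not in items:
                        items.add(new_item)
                        changed = True
    return items
```

Closing the start item (S, (E,), 0) yields all six initial items, while closing the completed item (S, (E,), 1) adds nothing at all, matching the closure definition as JoeKearney reads it.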

Parser is NOT a finite state machine[edit]

The article states that "The parser is a finite state machine". This statement is false, as the parser has a stack, which means it has a potentially infinite number of states. Perhaps the parser is a Pushdown Automaton, but I am not sure enough to change it. Someone who does know for sure, please change this. 62.131.189.89 07:55, 22 May 2007 (UTC)[reply]

I don't think your assertion is correct. A state within a state machine is a node in a directed graph that represents the progress of the finite state machine up to that point. Unless the machine has an infinite number of nodes, it is indeed a finite state machine. — Loadmaster 20:07, 11 July 2007 (UTC)[reply]
No, he's right. An LR parser requires a pushdown automaton (PDA), which is a FSM plus a stack. A PDA can't be modelled by an FSM. It would need an "infinite state machine". --Doradus 22:54, 13 July 2007 (UTC)[reply]
My take on this has always been a little different. It is true that a straightforward execution of an FSM cannot be used to recognize sentences of an LR language (except for the special case of a regular language). But the parser does use an FSM to recognize viable prefixes of right sentential forms. When a final state of this FSM has been reached, the last N symbols scanned are the handle of the right sentential form and are replaced by a new (non-terminal) symbol creating a new right sentential form (the next step in the derivation in reverse). This replacement transformation is determined by tables that have been generated for the grammar. Conceptually, the FSM is then re-executed and the newly transformed sentential form re-scanned from the beginning looking for the next handle. This scanning and transforming is repeated until the transformed input consists only of the goal symbol. If you take a closer look at what happens with two successive executions of the FSM, you realize that the restarted FSM will make the same state transitions as the prior execution until it encounters the non-terminal symbol that was just replaced for the handle detected during the prior execution of the FSM. As an efficiency, then, the entire re-scan can be avoided by remembering (by using a stack) what those state transitions had been. So an LR language can be recognized by repeated executions of an FSM with input transformation performed between the executions (which, admittedly, is an altogether different thing than a simple application of an FSM). The use of a stack can be viewed as an optimization. -- Raaronson (talk) 13:01, 13 March 2008 (UTC)[reply]
The process you describe is not a useful one. Transforming and re-parsing the entire input after each reduction step would render the parsing process uselessly inefficient, requiring unlimited memory and O(n^2) time. Parsing a CFG can't be done online with a DFA or NFA. --Doradus (talk) 23:29, 11 February 2010 (UTC)[reply]
The LR parsing algorithm is driven by a parser table, which is a data structure with finite states and state transitions. The parser uses a stack during its operation to keep track of the states it has passed through, so it is a pushdown automaton (PDA). I think "finite-state machine" (FSM) is a more general term that can refer to an NFA, DFA or PDA. Paul B Mann
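The FSM-plus-stack structure under discussion is easy to see in code. Below is a minimal hypothetical Python sketch of the table-driven loop, with ACTION/GOTO tables hand-built for a cut-down version of the article's grammar (E → E + B | B, B → 1); the finite tables are the FSM part, and the stack of states is exactly the pushdown store that lifts it to a PDA:

```python
# Hand-built SLR tables for:  (1) E -> E + B   (2) E -> B   (3) B -> 1
ACTION = {
    (0, '1'): ('s', 3),
    (1, '+'): ('s', 4), (1, '$'): ('acc', None),
    (2, '+'): ('r', 2), (2, '$'): ('r', 2),
    (3, '+'): ('r', 3), (3, '$'): ('r', 3),
    (4, '1'): ('s', 3),
    (5, '+'): ('r', 1), (5, '$'): ('r', 1),
}
GOTO = {(0, 'E'): 1, (0, 'B'): 2, (4, 'B'): 5}
RULES = {1: ('E', 3), 2: ('E', 1), 3: ('B', 1)}  # rule -> (lhs, rhs length)

def parse(tokens):
    stack = [0]                      # stack of states: this is the PDA part
    tokens = tokens + ['$']          # end-of-input marker
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False             # no table entry: syntax error
        kind, arg = act
        if kind == 's':              # shift: consume token, push next state
            stack.append(arg)
            i += 1
        elif kind == 'r':            # reduce: pop |rhs| states, push GOTO state
            lhs, n = RULES[arg]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:                        # accept
            return True
```

parse(['1', '+', '1']) accepts, while parse(['1', '+']) hits a missing ACTION entry and reports a syntax error; the reduce step is where the stack (not the finite table) carries the unbounded nesting.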

Recursive ascent parsing[edit]

Prefaced the description of the stack machine with a remark about the alternative, functional, recursive ascent, view on LR parsing, and added a corresponding reference 81.5.0.80 18:45, 11 July 2007 (UTC).[reply]

Dangling else and typedef[edit]

I made a few corrections. A parser does not deal with a grammar. It reads computer languages. It's the parser generator that "deals" with a grammar.

About the "dangling else" problem ... A BNF grammar that contains the dangling else problem is ambiguous and NO parser can handle it, unless the ambiguity is resolved! An LR parser can and does handle the dangling else problem just fine, because the ambiguity (or conflict) gets resolved at parser generation time by choosing to shift the "else" onto the parse stack instead of making a reduction before accepting the "else". So, it is a misconception to say that an LR parser cannot handle the dangling else problem.

About C++ and typedef ... A simple or pure LR parser cannot handle the context-sensitive problems with C++ and the typedef in C. However, a smart LR parser which uses the symbol table to classify <identifier>s as "typedef" or just <identifier> can successfully parse C++ and C typedef situations. So, again, one should not generalize that LR parsers cannot handle C++, because in practice it is possible. Theoretical computer scientists might disagree, but then changing the definition of LR parser can decide the argument if you want to get mathematical. I just don't like to put limitations on LR parsing; besides, what other technique are you going to use -- recursive descent (what a nightmare)?

Paul B Mann —Preceding undated comment added 00:36, 10 September 2009 (UTC).[reply]
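The symbol-table trick described above is commonly called the "lexer hack". A minimal hypothetical Python sketch (names invented here, not taken from any real compiler):

```python
# The parser records typedef names as it reduces typedef declarations;
# the lexer then consults that table to pick each identifier's token kind.
typedef_names = set()

def record_typedef(name):
    # Called from the semantic action of the typedef-declaration rule.
    typedef_names.add(name)

def classify(identifier):
    # Called by the lexer for every identifier token.
    return 'TYPE_NAME' if identifier in typedef_names else 'IDENTIFIER'

record_typedef('size_t')   # as if "typedef unsigned long size_t;" was just parsed
```

After the typedef is recorded, classify('size_t') yields TYPE_NAME while other identifiers stay IDENTIFIER, which is what lets an otherwise context-free LR grammar distinguish a declaration like "size_t x;" from an expression like "a * b;".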

History[edit]

This article is sorely lacking a History section. Does anyone know enough to add one? --Doradus (talk) 16:46, 3 February 2010 (UTC)[reply]

I have the "Dragon Book" (Aho, Sethi, Ullman) on my bookshelf. I will consult it later to see if there is sufficient information there to provide a history. It's been years since I used it, so I don't remember right off the top of my head. Dead Horsey (talk) 01:38, 4 February 2010 (UTC)[reply]
I checked my Dragon Book, and there isn't enough information there to write a history without violating WP:NOR. Dead Horsey (talk) 23:35, 21 February 2010 (UTC)[reply]
See History of compiler construction Paul Foxworthy (talk) 12:47, 11 March 2012 (UTC)[reply]

I believe Knuth invented LR parsers. At least that information should be in the article and a reference to the original publication. 89.153.132.2 (talk) 23:22, 8 May 2011 (UTC)[reply]

Yep, he did, in 1965. The LALR parser article states that:
LR parsing was invented by Donald Knuth in 1965 in a paper, "On the Translation of Languages from Left to Right".
So I added it to this article. — Loadmaster (talk) 16:33, 9 May 2011 (UTC)[reply]

LR(k) == LR(1)?[edit]

Article says "Every LR(k) grammar for k > 1 can be mechanically transformed into an LR(1) grammar", then later it says "most programming languages can be expressed with LR(k) grammars, where k is a small constant (usually 1)." Surely, given the first quote, 'usually 1' makes no sense? —Preceding unsigned comment added by 80.0.167.213 (talk) 17:27, 3 August 2010 (UTC)[reply]


I addressed this issue and clarified and referenced this part of the article.
Espadrine (talk) 12:56, 29 May 2011 (UTC)[reply]

Arbitrary context-free-languages without penalty?[edit]

“LR parsing can be generalized as arbitrary context-free language parsing without a performance penalty, even for LR(k) grammars.” What? This sentence does not make sense, I thought, LR(k) ⊂ CFL, and in the following sentences a performance-penalty is mentioned. --Chricho (talk) 12:07, 11 September 2010 (UTC)[reply]

Example grammar and input are too small[edit]

The example parse for '1 + 1' is too trivial to give any insight into bottom-up versus top-down parses, or the handling of delimiters and non-infix operators, or necessary recursion, or the meaning of individual LR(0) states. DBSand (talk) 06:57, 3 May 2012 (UTC)[reply]

Second thoughts. The example '1 + 1' is rich enough to show bottom-up versus top-down parsers, if the parse tree is graphically shown in a diagram (not yet present). It is rich enough to show the meaning of LR(0) states. (Showing the core items in the state chart would help.) And it does show one important use of grammar recursion. Showing use of non-operator delimiters or prefix unary operators would be nice, in a separate article about grammars, but would needlessly complicate this short article about LR.

The '1 + 1' example and grammar has three disadvantages for this article:

1) Its grammar is only LR(0) so the parse table has no examples where lookahead is needed to pick between a shift action and a reduce action. That's very important for explaining SLR(1) and LALR(1) parsing as used in practice; there are very few practical grammars that are merely LR(0). The existing article handled the shift/reduce topic by later working with a separate 2-rule grammar. But the actual table for that second example is not shown. In the alternate combined grammar examples I've considered, the cases needing lookahead had two separate precedence levels for products and sums. Is there some simple 1-level expression grammar that does need lookahead?

2) The example '1 + 1' has no multiple-symbol reductions other than the final one at the end of the entire input. The article should give an example where the parse stack is clearly acting like a stack. Also, with reductions reaching back to a state that was not already on the top of the stack. This now occurs at the end, but that can give the impression that this happens only then. It is somewhat helpful if there is some multiple-symbol reduction before reaching the end of the entire input. That could be done with example "1 + 1 + 1" or "1*1 + 1" or "-1 + 1".

3) Having both * and + at the same precedence level is counter to actual common practice. This is minor, and could be fixed by using - instead of *.

DBSand (talk) 15:50, 13 May 2012 (UTC)[reply]

Tables and Charts[edit]

The example Action/Goto table should show the meaning of each state (ie its unclosed item set), not just the state number.

The example parsing steps chart should have the stack (symbols & state#s) as left column, the decoded top state (ie its unclosed item set) as second column, the lookahead and unscanned input as third column, and parser action as fourth column. The action column should show the actual grammar rule being applied, not just a rule number. The 'output stream' column should be discarded; it is just the column of applied rules.

DBSand (talk) 06:57, 3 May 2012 (UTC)[reply]

Prolog?[edit]

This Prolog implementation belongs in some separate specialty article for Prolog fans, not here. The backtracking pattern-matching mechanism underlying Prolog is much more complicated, slower, and more opaque than LR's own deterministic, linear pattern-matching methods. Using Prolog to describe LR does not explain why LR is fast, or why LR grammars must be deterministic. Other editors will not know how to correctly update the Prolog code if the article's main example changes. And finally, Prolog is unfamiliar to most readers. I doubt very much that even 1% of readers of this article will get any understanding from a Prolog description of the LR tables and parser algorithm.

An imperative description of the parser's inner loop, in some widely understood language, with helpful names, could be helpful. DBSand (talk) 06:57, 3 May 2012 (UTC)[reply]

Article entirely rewritten on May 8 2012[edit]

The goal of this major revision was to make LR parsers better understood by programmers who are users of LR parser generators, and give an intuitive understanding of the generator's work and parser's work without going into the mathematical details and formalisms of a parsing theory textbook. Where possible, the parser's actions are explained in terms of the basic shift-reduce framework, or by showing a concrete, realistic (but small) example. The progression from concrete actions, to tables, to state generation, to theory, will likely feel backwards to some. I hope this ordering is helpful to beginners. The example input and example grammar are bigger than before. None of the existing commentary based on the old examples was retained as is. The Prolog implementation was similarly dropped.

I welcome feedback from previous editors. I wonder if the old and new articles should both remain visible for awhile?

DBSand (talk) 06:00, 9 May 2012 (UTC)[reply]


DBSand, thank you for your contribution. It looks like you've spent a lot of time researching this topic and have presented an in-depth example for the Wikipedia community and the general public. I have not read the article in its entirety, but it is clear that the primary example is substantially more intricate than the original "1+1" grammar.
Having said that, I am not sure that the Wikipedia article "LR Parser" is the correct resting place for this fine work. Please consider the possibility that there may be other areas of Wikipedia itself and the Wikipedia support structure including Wikibooks that may be a more appropriate recipient of your work.
The reason I bring this up is that it seems that the new article is rather more in-depth than the previous one and attempts to educate the reader. This is a purpose more suited to Wikibooks than Wikipedia. I personally have used this very Wikipedia page to implement a SLR parser on two occasions, and for this the "1+1" grammar works well. The sample grammar on the new page seems more appropriate for a researcher or student who wants to gain a deeper understanding of the theory of LR parsers. In my opinion, both "cookbook engineering" and "in-depth study" are topics outside the scope of Wikipedia, but it is more or less the nature of Wikipedia to attract and retain information that is well outside the scope of Wikipedia...
Let me come out and offer my critical remarks on the new version: it is of less use to an implementer such as myself than the original one due to the increased size and complexity of the example. However, I am entirely confident that there are some out there who found the original example to be so simple as to be useless for understanding the theory of LR parsers, and the new example substantially more informative.
My first concern is to retain the original content of the article somehow. There are a variety of possibilities such as reverting the article, merging the old article with the new one, and using a system of sub-pages to let the reader decide.
Awaiting remarks. 173.239.78.54 (talk) 01:17, 12 May 2012 (UTC)[reply]
I looked briefly at Wikibooks, but that seems to be for 100+-page textbooks rather than for 10-page introductions to single topics.
Wikipedia is for education at several levels. Wikipedia itself needs, IMO, (1) a quick gentle introduction to the major kinds of parsers now used in practice (mostly, recursive descent, and table-driven LALR(1), and operator precedence) without scaring away the reader with the formalisms of automata theory; (2) A gentle introduction to LALR(1) parsing sufficient for new users of existing generators; (3) An introduction to LR generators for the curious, and to motivate how to deal with conflicts on new grammars; and finally (4) a Wikipedia equivalent to a week of theoretical college course material on how to re-implement existing parsers & generators. I aimed to help 1,2, and 3.
The tutorial section and the generator-magic section could be factored out to other articles. But there is some gain in using a single example and terminology throughout both.
A good thing about the small scale of the 1+1 example, was that it was possible that the very curious would read and confirm every detail of the charts. That's unfortunately not likely with the new larger example. But the larger example does allow seeing by eyeball, how information flows into the parse stack and gets chunked there. That's essential for understanding the big picture. And the larger example shows interesting things happening, that never happened for 1+1. I initially had an even larger example, and also looked at something in between 1+1 and the current A=B+C*2 example.
I don't know the Wikipedia etiquette for handling large changes. But the advice pages said "go bold", not to hide the new stuff where it wouldn't be seen. My regret is in making the former article invisible and inaccessible to non-editors.
In the two SLR parsers you implemented, was that two grammars with an existing generator and existing parser skeleton? That's exactly the target audience I wanted to help.
A fun use of Wikibooks would be the easily-executed Python source code for an actual LALR(1) generator such as Ply, instrumented to explain what it is doing.
DBSand (talk) 06:14, 12 May 2012 (UTC)[reply]
(173.239.78.54 here, I just created the account with the memorable name Throwaway8493) Ok, so it sounds like you want to keep the content of the new version in Wikipedia, which is fine, but basically you have deleted a bunch of content that a lot of people have spent a lot of time on, content that was still evolving. We aren't really in a position where we can say that one version is better than another because there are many different people, and some will prefer the new while others will prefer the old. This is because the old one has some content the new one lacks. I would suggest merging the content (without cluttering it or sacrificing good organization) in a way that allows the article to do the greatest service to the greatest number of readers.
Fortunately we have some tools to handle the situation. I have two ideas, the slow process-oriented one and the quick-and-dirty one. Here is the process-oriented version:
1. Try to maintain continuity with the original article and avoid confusion. This is a process that seeks to incorporate feedback from editors before replacing the main article with the rewrite. This can be done by temporarily moving the new version to LR_parser/Rewrite, reverting the main article to the last version prior to your edits, and sticking a "This article is being rewritten at LR_parser/Rewrite" template at the top of the article. When the new article has stabilized and there is general agreement that the new article is better, the main article is replaced with the rewrite. This gives editors warning that the rewrite is underway.
2. Find some way to stick the content of the old version in the new version. This could involve the creation of, say, LR_parser/Sample_code for the prolog code, or other sub-pages such as LR_parser/Examples. Or it may be simpler to keep everything on the main page. Basically we don't want to delete content. Moving the content around is fine, we just want to retain the 1+1 example somehow, which accounted for most of the content of the original article. Simple is good, so appending the original article to the new one and removing the redundant "Theory" section so there aren't two identical ones would be perfectly acceptable for a first draft.
3. This does not have to be perfect, and the article will surely go through many more iterations in the future. Many people will have ideas about how to deal with the examples, how to serve readers who don't care about the examples or source code, etc... I would say keep the rewrite in /Rewrite for at least a few days. If it is getting a lot of edits, it can stay there, but I wouldn't keep /Rewrite around for a long time, a month is way too long I think.
Anyway, that is the idea. The alternative quick-and-dirty approach is to just append the original article to the main article, delete the duplicate "Theory" section, forget about the process, and start hacking away. Throwaway8493 (talk) 08:32, 12 May 2012 (UTC)[reply]
I've added a link to the former article. I prefer to keep the new article visible & up front awhile, to get immediate feedback, especially about errors and major omissions. This weekend, I plan to reduce the size of the examples by removing the Assign stuff. If former editors are around now, I'd like to work with them on creating a joint article that best serves the readers' needs. DBSand (talk) 18:39, 12 May 2012 (UTC)[reply]
I removed Assign from the example. This shrank the over-wide LR parse steps chart and made the other charts and diagrams a little smaller. I'm now looking at whether the grammar can be further flattened without losing its need for LR(1) lookahead.
Revising the parse tree diagram and the various charts is tediously hard. Whatever examples are used, they are likely to lock down the explanatory content of the article for years, in the same way the original article did. DBSand (talk) 16:01, 13 May 2012 (UTC)[reply]

I reverted this major change and restored the main LR parser article back to its original form. The proposed replacement article is instead at a temporarily-separate article User:DBSand/LR parser/Rewrite. This separate location allows the original article to continue evolving incrementally for awhile without trying to share one Wikipedia change history before a final merge. Please review the reworked article for correctness, omissions, etc and suggest ways to merge the two articles together. Thanks! DBSand (talk) 19:16, 13 May 2012 (UTC)[reply]

My plan to support separate edits of the original and replacement articles has hit a snag. Some Wikipedia admins unconditionally toss all new articles that look like forks, into userspace, and suggest that further work occur there in some individual's userspace until the article is stable and no-one resists it replacing the original. But regular links from mainspace articles to personal space are immediately deleted. Other admins strongly recommend that any multi-author articles get edited only in Wikipedia main space, so that Wikipedia has a complete legal record of who had personal copyrights when the article text entered main space and was revised. So collaborative editing in user space is impractical. DBSand (talk) 01:06, 14 May 2012 (UTC)[reply]

The new plan is to put the new article online and visible, with the example contents of the original article retained as another section in the combined article. That will be a starting point for subsequent edits, removing overlaps, using a common style, etc. DBSand (talk) 17:36, 14 May 2012 (UTC)[reply]

Please don't change the article. It is well written and understandable to non-experts as well. Having two examples is important to reinforce the understanding of the subject, as one example is not enough. The introduction to the article was also excellent. Do not worry about the length; it is very manageable. Vettukal (talk) 13:51, 8 February 2014 (UTC)[reply]

Forks Discouraged[edit]

Within minutes of my creating a separate article LR parser/Rewrite in the main Wiki namespace, two Wikipedia admins moved it into a personal namespace User:DBSand. And no links from articles in the main namespace to personal namespaces will persist longer than a day or two. Wikipedia has defense mechanisms against what it perceives as permanent forking of articles with similar topic coverage. Don't know how to proceed. I don't want to lock up the original article and prevent it from getting incremental changes as part of a slow merge. DBSand (talk) 19:49, 13 May 2012 (UTC)[reply]

I've left you a note on your talk page: maybe moving it to a subpage of the article's talk page is the best solution. Writ Keeper 19:57, 13 May 2012 (UTC)[reply]
One option, suggested by admins, is to stick the new article underneath Talk:LR parser in main space. But the new article would remain invisible to readers and search engines.
A second option, suggested twice by Throwaway8493, is to append the old article to the end of the new article for a while, until all merge issues are resolved. It is slightly ugly in the short run, but gives readers and search engines equal treatment of the original and new content, and shows the need for further merge work in a very clear way. This method involves less Wikipedia magic than the first option. Preserves all history, all collaborations. Avoids untimely interference from random admins unfamiliar with this topic. Sounds good! DBSand (talk) 01:30, 14 May 2012 (UTC)[reply]

Comment[edit]

The article lacks inline citations and is not structured in accordance with the WP:MOS. These would appear to be major problems. Jezhotwells (talk) 19:00, 13 May 2012 (UTC)[reply]

From the timestamps, this appears to be about the original article, restored just a few minutes before. The relative lack of inline refs is a known issue. The comment about MOS structure is unclear. What in particular was Jezhotwells objecting to, other than the amount of inline refs? DBSand (talk) 00:43, 14 May 2012 (UTC)[reply]
  • Well for a start we do not have comments such as "A rewrite of this article with more tutorial focus and a larger example is under community review at User:DBSand/LR parser/Rewrite." within articles.
Fixed, using comment note visible only to editors.DBSand (talk) 17:28, 14 May 2012 (UTC)[reply]
  • The lead has too many paragraphs, see WP:LEAD.
Fixed.DBSand (talk) 17:28, 14 May 2012 (UTC)[reply]
  • Phrases such as "From this table and the found item sets we construct the action and goto table as follows:" are not encyclopaedic, reads like an instruction manual.
  • "Note that only step 4 of the above procedure produces reduce actions, and so all reduce actions must occupy an entire table row, causing the reduction to occur regardless of the next symbol in the input stream. This is why these are LR(0) parse tables: they don't do any lookahead (that is, they look ahead zero symbols) before deciding which reduction to perform. A grammar that needs lookahead to disambiguate reductions would require a parse table row containing different reduce actions in different columns, and the above procedure is not capable of creating such rows." Paragraphs such as this should be rewritten in good plain English; at the moment this is gobbledegook to the average reader. Jezhotwells (talk) 13:16, 14 May 2012 (UTC)[reply]
Thanks; that's advice we can certainly apply.
These comments definitely apply to the pre-rewrite contents of the article.
DBSand (talk) 15:09, 14 May 2012 (UTC)[reply]
I think parts of this article will almost necessarily sound like an instruction manual - this is going to happen with any article describing an algorithm, for example Euclidean Algorithm. So I'm not really sure where that comment was going. Throwaway8493 (talk) 08:16, 16 May 2012 (UTC)[reply]
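For what it's worth, the quoted step 4 is easy to make concrete: a completed LR(0) item fills its state's entire table row with the same reduce action, because there is no lookahead to select between reductions. A minimal sketch (the terminal set and rule number here are made up for illustration, not taken from the article's tables):

```python
# Illustrative terminal alphabet for an expression grammar plus end marker.
TERMINALS = ['0', '1', '+', '*', '$']

def lr0_reduce_row(rule_number):
    """Step 4 of LR(0) table construction: a state whose item set contains
    a completed rule gets the same reduce action in *every* column, i.e.
    the reduction happens regardless of the next input symbol."""
    return {t: ('r', rule_number) for t in TERMINALS}

row = lr0_reduce_row(3)
print(row)   # every terminal maps to ('r', 3)
```

An LR(1)-style table, by contrast, could place different reduce actions (or none) in different columns of the same row, which is exactly what the quoted paragraph says the LR(0) procedure cannot produce.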

Major rewrite of article, May 14 2012[edit]

A combined version of the original article and a major rewrite is now online at the LR parser article's main public page. The merging process is incomplete. The intro, theory, and reference sections are similar to before. All other content from the original article is now in the section "Additional Examples"; this section needs further adjustments.

The new section LR Tutorial is aimed at beginner readers, and at programmers who are users of LALR generators but are not concerned with the details of LR construction. The section LR Generator Analysis is aimed at users who need to understand grammar conflict complaints from their generator. Both sections are built around a single example. The example and grammar are bigger than those of the original article, so that 1) the table has LR(0) conflicts as in most SLR(1) and LALR(1) grammars, and 2) some multi-symbol parse stack reductions occur before the end of input. DBSand (talk) 20:30, 14 May 2012 (UTC)[reply]

Thank you very much for understanding, I took a look at http://en.wikipedia.org/wiki/User:DBSand/LR_parser/Rewrite and it looks good. The main thing right now is to have all of the pieces of the puzzle in one place, I'm sure more parts will be added and the way the parts fit together will evolve, etc... Throwaway8493 (talk) 08:03, 16 May 2012 (UTC)[reply]

Removed sections[edit]

In the section "Architecture of LR Parsers", now removed,

  • Theoretical LR parsers and most actual (table-driven or generated-code) LR parsers are iterative, not recursive programs.
  • The recursive ascent variant is a minor footnote, covered elsewhere in the article, and should not lead this section. And recursive ascent has the same O(n) time and space efficiency as table-driven parsing, and is slower than explicit-stack non-table parsers.
  • Provable correctness is a dubious claim, since every yacc-like generator that accepts disambiguating rules makes parsers that can't be proven correct. Those parsers can only be demonstrated to work as expected by testing. And no generator or parser skeleton has been proven correct.
  • The mention of rightmost derivation is unnecessary jargon.

In the section "General Case", now removed,

  • Calling the parser a state machine is misleading, unless you also go into theoretical jargon of PDAs and FSMs and their differences.
  • An input buffer is unnecessary detail.
  • The lookahead item is not mentioned.
  • The goto table handles only some (nonterminal) moves to next state, not all moves.
  • The grammar itself is usually omitted from the parser.
  • The paragraph about rightmost derivations is better handled by prior discussions of bottom-up parses.

The diagram "Architecture of a table-based bottom-up parser" could be improved by showing the symbols & parse subtrees on the stack, not just the state numbers of the 1+1 example; and by showing the stack growing rightwards, not upwards; and by omitting the output line.

The metacode description of the LR parser loop should move upwards into the main tutorial section.

DBSand (talk) 22:50, 20 May 2012 (UTC)[reply]
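To anchor the points above (the parser is iterative, not recursive, and is driven by action and goto tables), here is a minimal table-driven driver loop. This is an illustration only: the tables are hand-built for a cut-down version of the article's example grammar, and the encoding is not the article's exact metacode.

```python
# Hand-built tables for the toy grammar:
#   (1) E -> E + B    (2) E -> B    (3) B -> 1
# ('s', n) = shift to state n, ('r', k) = reduce by rule k, 'acc' = accept.
ACTION = {
    (0, '1'): ('s', 3),
    (1, '+'): ('s', 4), (1, '$'): 'acc',
    (2, '+'): ('r', 2), (2, '$'): ('r', 2),
    (3, '+'): ('r', 3), (3, '$'): ('r', 3),
    (4, '1'): ('s', 3),
    (5, '+'): ('r', 1), (5, '$'): ('r', 1),
}
GOTO = {(0, 'E'): 1, (0, 'B'): 2, (4, 'B'): 5}
RULES = {1: ('E', 3), 2: ('E', 1), 3: ('B', 1)}   # rule: (lhs, rhs length)

def parse(tokens):
    """Iterative table-driven LR loop: one state stack, no recursion."""
    stack = [0]
    pos = 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act == 'acc':
            return True
        if act is None:
            return False              # syntax error detected at tokens[pos]
        kind, n = act
        if kind == 's':               # shift: push next state, consume token
            stack.append(n)
            pos += 1
        else:                         # reduce: pop |rhs| states, then GOTO
            lhs, length = RULES[n]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])

print(parse(list("1+1") + ['$']))     # True
print(parse(list("1+") + ['$']))      # False
```

Note the loop never calls itself: all nesting of the grammar lives in the explicit state stack, which is the point of the first removed-section bullet above.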

Merging TODO's[edit]

The original and new sections use different styles for

  • Grammar rule symbol ← or →
New sections now use standard →. DBSand (talk) 21:21, 10 June 2012 (UTC)[reply]
  • Grammar rule number r1: or (1)
  • Item position marker ♦ or •
New sections now use standard • ; now colored as in all sections DBSand (talk) 21:21, 10 June 2012 (UTC)[reply]
  • Input end marker eof or $
  • Stacked state numbers as 0 or 0
  • Stacked terminal symbols as + or '+'
  • Stack contents with or without [...] brackets
  • Nonterminals as names Sums or single letter E
  • Terminals as int or digits (easily misread as state numbers)
  • Successful-parse action as Done or acc
  • Next-state numbers in Action table (but not Goto table) as 9 or s9
  • LR(0) items called 'partially parsed rule' or 'item'.
  • LR(0) states called 'parse state' or 'item set'.
  • Tables show each state's LR(0) core, or just the state number.
  • Parse steps chart showing parse stack to left of input stream, or to its right.

These notational & terminology differences should eventually be eliminated.

DBSand (talk) 18:20, 22 May 2012 (UTC)[reply]

I agree. I also think we should create a navigation bar for some of the important parsing-related entries so that it's easier to achieve topic-wide notational convergence. — Kennyluck (talk) 09:48, 5 January 2013 (UTC)[reply]

The original and new sections use different styles for

...
  • Successful-parse action as Done or acc
Well, I've just made it consistently use "acc[ept]". AHMartin (talk) 04:58, 5 October 2023 (UTC)[reply]

All occurences of "you" should be removed[edit]

The use of "you" in the article is a violation of WP:TONE. We should remove all these "you"s. I propose we also remove the paragraph starting with "When using an LR parser within some larger program, ...". — Kennyluck (talk) 10:17, 5 January 2013 (UTC)[reply]

what are the pink dots denoting?![edit]

what are the pink dots denoting?! — Preceding unsigned comment added by 89.134.223.11 (talk) 06:53, 1 July 2013 (UTC)[reply]

They denote the position of the parser within a production. The article mentions this, but it is a ways down:

The item E → E • + B, for example, indicates that the parser has recognized a string corresponding with E on the input stream and now expects to read a '+' followed by another string corresponding with B.

Deathanatos (talk) 02:35, 14 August 2013 (UTC)[reply]
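To make that quoted sentence concrete, an LR(0) item can be modeled as a rule plus a dot position; everything left of the dot has been recognized, everything right of it is still expected. (The helper names below are illustrative, not from the article.)

```python
def item_str(lhs, rhs, dot):
    """Render an LR(0) item with the • position marker."""
    return f"{lhs} -> {' '.join(rhs[:dot])} \u2022 {' '.join(rhs[dot:])}".strip()

def advance(lhs, rhs, dot):
    """Move the dot one symbol right once that symbol is recognized."""
    return (lhs, rhs, dot + 1)

item = ('E', ('E', '+', 'B'), 1)     # the quoted item: E -> E • + B
print(item_str(*item))               # parser has seen E, expects '+' then B
print(item_str(*advance(*item)))     # after shifting '+': E -> E + • B
```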

Is there any specific reason for using pink (or any other color)? I was having a hard time discerning whether it was a speck on my LED display, while reading this page in non-optimal lighting conditions. I imagine it would be a "bad" color from a usability POV. I will watch here, and unless I see any arguments I will probably change it to at least a darker color. --Lasse Hillerøe Petersen (talk) 22:18, 5 November 2014 (UTC)[reply]

1 + 1 Example is not LR(0)![edit]

The third section of the article in its current form references an example grammar that is labeled explicitly as LR(0), but whose language isn't prefix-free. You can easily create "1 + 1" (E → E + B → B + B → 1 + 1), which is, of course, a prefix of "1 + 1 + 1" (E → E + B → E + B + B → B + B + B → 1 + 1 + 1). Yet that should not be possible, since LR(0) grammars are supposed to be prefix-free. [1]

Or am I missing something here?

03:29, 19 January 2014 (UTC)

Yeah, it's "1+1$" and "1+1+1$". The termination symbol is what makes it prefix-free. Basically, this is a confusion arising from the fact that it is possible to convert an arbitrary formal language on N letters to a prefix-free formal language on N+1 letters by adding the termination symbol. In this article, this is called "augmenting the grammar", which is why there is an extra production whose only job is to accept the start symbol, which in this case is E. So your derivations are missing the initial step; you would write "1+1$" (__start__ → E "$" → E + B "$" → ...). Hope this helps. 173.239.78.54 (talk) 21:07, 16 February 2014 (UTC)[reply]
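The termination-symbol point can be checked mechanically: once every sentence ends in a '$' that occurs nowhere else, no terminated sentence can be a proper prefix of another. A small sketch on plain strings (not a full parser):

```python
def is_proper_prefix(a, b):
    """True if string a is a proper prefix of string b."""
    return b.startswith(a) and a != b

# Without the end marker, one sentence is a prefix of the other:
print(is_proper_prefix("1+1", "1+1+1"))      # True: not prefix-free
# With the augmented grammar's '$' appended, the prefix relation breaks:
print(is_proper_prefix("1+1$", "1+1+1$"))    # False: '$' blocks it
```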

References

  1. ^ Reghizzi, Stefano (2009). Formal Languages and Compilation. Springer. p. 215.

Incorrect/misleading statement about error reporting[edit]

Currently the introduction ends with the sentences "LR is also better at error reporting. It detects syntax errors as early in the input stream as possible.", comparing LR to LL parsers. However, LL(1) parsers have the immediate error detection property. Strong LL(1) parsers only have the valid prefix property by default, but a small change to the algorithm can restore the IED property. The literature strongly implies that LL(k) has the IED property for all k, though I haven't found anywhere that states that outright. I think the statement should be removed, especially in light of the fact that the LL parser page only shows a pure LL(1) parser, which is likely to lead anyone reading both pages to think that the example parser has worse error detection properties than an LR parser, which is not correct. Klkblake (talk) 11:05, 24 February 2016 (UTC)[reply]

Worst case complexity of other parsers[edit]

According to the linked articles, CYK algorithm#lead, Earley parser#lead, and GLR parser#Advantages, they all have O(n3) worst case complexity. So they can't need exponential time. Conversely, an algorithm that does need exponential time due to bad guesses doesn't deserve any of the three algorithm names. - Jochen Burghardt (talk) 13:51, 1 April 2016 (UTC)[reply]
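For reference, the O(n3) bound for CYK comes directly from its three nested loops over span length, span start, and split point (the inner loop over grammar rules is bounded by the constant grammar size). A minimal recognizer sketch for a made-up CNF grammar, not taken from the linked articles:

```python
def cyk(word, terminal_rules, binary_rules, start):
    """O(n^3) CYK recognizer for a grammar in Chomsky normal form."""
    n = len(word)
    # table[i][l] = set of nonterminals that derive word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {A for A, t in terminal_rules if t == ch}
    for length in range(2, n + 1):           # span length   \
        for i in range(n - length + 1):      # span start     > the n^3 factor
            for split in range(1, length):   # split point   /
                for A, (B, C) in binary_rules:
                    if B in table[i][split] and C in table[i + split][length - split]:
                        table[i][length].add(A)
    return start in table[0][n]

# Toy CNF grammar for sums of 1s:  E -> E C | '1',  C -> P B,  P -> '+',  B -> '1'
terminal_rules = [('E', '1'), ('B', '1'), ('P', '+')]
binary_rules = [('E', ('E', 'C')), ('C', ('P', 'B'))]
print(cyk("1+1", terminal_rules, binary_rules, 'E'))    # True
print(cyk("1+", terminal_rules, binary_rules, 'E'))     # False
```

No guessing is involved: the algorithm deterministically fills the whole table, which is why the worst case stays cubic rather than exponential.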

Symbols and tokens[edit]

Some registered user might want to further explain - or define somewhere - that whenever the article talks about 'symbols', it means tokens, not necessarily single characters. This might confuse some beginners. --2A02:8070:9288:6B00:6515:51E4:6551:5F5A (talk) 22:14, 27 February 2017 (UTC)[reply]

I just wikified "symbol". I wish there was either a cleanly delimited subsection (or a separate article) for the definition used in formal language theory. But the correct definition is in the formal symbol page. AHMartin (talk) 23:23, 10 March 2017 (UTC)[reply]
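A tiny sketch of the token-versus-character distinction discussed above (the lexer pattern here is illustrative): to the parser, each token from the lexer is one terminal symbol, however many characters it spans.

```python
import re

def tokenize(text):
    """Illustrative lexer: integers and operators become single tokens."""
    return re.findall(r"\d+|[+*()]", text)

print(tokenize("12+34"))   # ['12', '+', '34'] -- three symbols for the parser
print(len("12+34"))        # 5 characters
```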

External links modified[edit]

Hello fellow Wikipedians,

I have just modified one external link on LR parser. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 15:12, 14 December 2017 (UTC)[reply]

Error reporting[edit]

In the article it is claimed that "LR is also better at error reporting. It detects syntax errors as early in the input stream as possible.", a claim that is disputed in this Stack Overflow answer. I'm not an expert on the topic, but I'm more inclined to believe a well-voted-up SO answer than an unsourced claim on Wikipedia. Should these two sentences be removed?

Explicitly show production rules[edit]

The article immediately begins talking about `Product` and `Sum` productions without ever having defined the grammar whose sentences we are parsing. This makes it extremely confusing to follow the diagrams and explanation of the algorithm. It would be useful to precisely and clearly state the grammar that is being parsed at the outset. — Preceding unsigned comment added by 173.177.218.122 (talk) 13:58, 11 March 2022 (UTC)[reply]