User:Art LaPella/Proposed Main Page proofreading bot

The purpose of this proposed bot is to automate most of the Main Page proofreading I do, some of which needs to be done several times a day. It sounds trivial, until we remember how many millions of times each triviality is viewed. I intend to submit it at Wikipedia:Bot requests rather than try to code it myself; I do a lot of PowerBASIC coding in my job, but I only wish I knew how to make my program move among Internet pages gathering data. My present idea is to schedule this bot to run automatically every few hours.

Specific tasks that could be automated:

Today’s featured article

  • Make sure the bolded article link and the “more” link point to the same article.
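
A minimal sketch of this check, assuming the blurb is available as wikitext, that the bolded link is written '''[[Target]]''' or '''[[Target|label]]''', and that the “more” link is written [[Target|more...]]. Both patterns and the function names are illustrative assumptions, not a description of how the section is actually formatted.

```python
import re

# Assumed markup for the bolded article link and the "more" link.
BOLD_LINK = re.compile(r"'''\[\[([^\]|#]+)[^\]]*\]\]'''")
MORE_LINK = re.compile(
    r"\[\[([^\]|#]+)(?:#[^\]|]*)?\|\s*more(?:\.\.\.|…)?\s*\]\]",
    re.IGNORECASE,
)


def normalize_title(title):
    """Trim whitespace, treat underscores as spaces, capitalize the first letter."""
    title = title.strip().replace("_", " ")
    return title[:1].upper() + title[1:]


def tfa_links_match(blurb):
    """Return True if the first bolded link and the "more" link share a target."""
    bold = BOLD_LINK.search(blurb)
    more = MORE_LINK.search(blurb)
    if bold is None or more is None:
        return False  # a missing link is reported as a mismatch
    return normalize_title(bold.group(1)) == normalize_title(more.group(1))
```

A mismatch here would be reported rather than fixed, since the bot cannot know which of the two links is the intended one.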

Did you know...

  • Each entry should have at least one bolded link.
  • Each entry starts with {{*mp}}…
  • A space is required immediately after the ...
  • The ... should be exactly 3 dots, not 2 and not 4 or more.
  • Sometimes one character is used that prints as 3 dots. I’m in the habit of changing it to 3 real dots when I notice, to make sure it shows the same for everyone, but no one has ever complained so it might not matter. Anyway, either change the character or recognize it as a substitute.
  • Each entry should include a question mark. It doesn’t necessarily come at the end because occasionally there are 2 sentences. There should not be a space immediately before the question mark.
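
The checks in this list could be sketched roughly as follows. This assumes one entry per line of wikitext and that a bolded link is written '''[[...]]'''; the function name and the wording of the messages are hypothetical.

```python
import re

PREFIX = "{{*mp}}"
# Assumed form of a bolded link: '''[[Target]]''' or '''[[Target|label]]'''.
BOLD_LINK = re.compile(r"'''\[\[.+?\]\]'''")


def check_dyk_entry(line):
    """Return a list of problems found in one Did you know entry."""
    problems = []
    if line.startswith(PREFIX):
        body = line[len(PREFIX):]
    else:
        problems.append("does not start with {{*mp}}")
        body = line

    if body.startswith("…"):
        # The single character that prints as 3 dots.
        problems.append("one-character ellipsis instead of three dots")
        rest = body[1:]
    else:
        dots = re.match(r"\.*", body).group()
        if len(dots) != 3:
            problems.append(f"{len(dots)} leading dots instead of 3")
        rest = body[len(dots):]

    if not rest.startswith(" "):
        problems.append("no space after the dots")
    if not BOLD_LINK.search(body):
        problems.append("no bolded link")
    if "?" not in body:
        problems.append("no question mark")
    elif " ?" in body:
        problems.append("space before the question mark")
    return problems
```

For example, check_dyk_entry("{{*mp}}..that [[blue whale]]s are large ?") reports four problems, while a well-formed entry returns an empty list.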

In the news

  • Each entry should start with {{*mp|month day}}.
  • The dates should be in order.
  • Each entry should have at least one bolded link.
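
A rough sketch of the first two checks, under several assumptions: each entry is one line of wikitext, “month day” is an English month name followed by a day number, and the expected direction (newest first or oldest first) is passed in, because the list above doesn't state it. Since the entries carry no year, the dates are parsed against a fixed leap year, so a list that spans New Year would be misjudged.

```python
import re
from datetime import datetime

# Assumed prefix form: {{*mp|March 5}}
ENTRY = re.compile(r"^\{\{\*mp\|([A-Z][a-z]+ \d{1,2})\}\}")


def parse_itn_dates(lines):
    """Return (dates, bad_lines); bad_lines lack a usable {{*mp|month day}}."""
    dates, bad_lines = [], []
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            bad_lines.append(line)
            continue
        try:
            # 2000 is a leap year, so "February 29" parses.
            parsed = datetime.strptime(match.group(1) + " 2000", "%B %d %Y")
        except ValueError:
            bad_lines.append(line)
            continue
        dates.append(parsed.date())
    return dates, bad_lines


def in_order(dates, newest_first=True):
    """Return True if the dates never step the wrong way."""
    pairs = list(zip(dates, dates[1:]))
    if newest_first:
        return all(a >= b for a, b in pairs)
    return all(a <= b for a, b in pairs)
```

The same in_order helper could serve any section whose entries can be reduced to comparable dates.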

On this day

  • The dates should be in order.

All of the above, plus Today’s featured picture

  • This would use more server time, but it would be nice if every link could be checked to make sure it doesn’t go to a disambiguation page. That situation is among the most frequently reported at WP:ERRORS, even though I manually check links to simple words because simple words are more likely to be disambiguations. An automated check could only detect links to disambiguation pages that correctly include the template {{disambig}} or one of the alternatives listed at Wikipedia:template messages/General#On disambiguation pages. Or see the "Dab" section of the associated talk page.
  • (pictured) should be (pictured)
  • (bla bla bla pictured) should be (bla bla bla pictured)
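
The disambiguation check might look something like the sketch below. It assumes the bot already has some way to retrieve a page's wikitext; that is represented by a hypothetical fetch callable (title in, wikitext or None out), so no real API is implied. Only {{disambig}} is listed; the alternative template names would have to be added from the list mentioned above. Redirects, and links such as images, are not handled.

```python
import re

# Only the template named above; extend with the listed alternatives.
DAB_TEMPLATES = {"disambig"}

LINK = re.compile(r"\[\[([^\]|#]+)")
TEMPLATE = re.compile(r"\{\{\s*([^|}]+?)\s*[|}]")


def link_targets(wikitext):
    """Return the target of every [[...]] link, without label or section."""
    return [m.group(1).strip() for m in LINK.finditer(wikitext)]


def is_disambiguation(page_text, templates=DAB_TEMPLATES):
    """Return True if the page transcludes one of the given templates."""
    names = {m.group(1).lower() for m in TEMPLATE.finditer(page_text)}
    return not names.isdisjoint(templates)


def links_to_disambiguation(section_text, fetch):
    """Return linked titles whose pages carry a disambiguation template.

    fetch is a hypothetical callable: title -> wikitext, or None if missing.
    """
    flagged = []
    for title in link_targets(section_text):
        text = fetch(title)
        if text is not None and is_disambiguation(text):
            flagged.append(title)
    return flagged
```

For a dry run, a plain dictionary's get method can stand in for fetch, which also keeps the server-time cost out of testing.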

Tomorrow’s Main Page

It would be nice if these edits could be run by administrators after updating a section of the Main Page. But it’s easier to say it should be done than to see that it does get done, and if everybody did what we expected then I wouldn't keep finding these errors. So although I've never updated an entire Main Page section myself, my sense is that we need to edit tomorrow’s text before it becomes today’s. That is, edit tomorrow’s featured article, selected anniversaries, picture of the day, and Template:Did you know/Next update. For the last of these, distinguish “Suggestions” from “Credits”, and give a character count if a suggestion is over 200 characters plus a fudge factor of 50. WP:ITN/C might use a similar edit for text bound for the Main Page, although I don't have much experience with it.
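
The length rule is simple enough to sketch directly. This assumes the count is taken over the raw text of the suggestion; whether markup should be excluded from the count isn't settled above, and the function name is hypothetical.

```python
LIMIT = 200  # nominal maximum length of a suggestion
FUDGE = 50   # tolerance before the bot says anything


def length_report(suggestion):
    """Return a character-count message if the suggestion is too long, else None."""
    count = len(suggestion)
    if count > LIMIT + FUDGE:
        return f"{count} characters (limit {LIMIT})"
    return None
```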

Reporting errors

For an error like forgetting to bold a main article, it wouldn’t be practical to ask the computer to guess which link to bold, so WP:ERRORS would be a good place to report such errors. We could either have a new heading called “Automated error report” or use the existing headings. But some errors, especially a missing space after … in Did you know, would appear several times a day if I didn’t manually, routinely search for “. that” at Template talk:Did you know (if indeed that is even an error – others occasionally report or correct it). So it would help a lot if the computer could automatically fix such routine errors that it’s unlikely to mistake for something else.
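
As an illustration of a fix that is unlikely to be mistaken for something else, the sketch below normalizes only the start of an entry: exactly three dots (or the one-character ellipsis) right after {{*mp}} become three real dots followed by one space. Anything else, such as 2 or 4 dots, is left untouched so that it can be reported instead. The function name is hypothetical.

```python
import re

# {{*mp}} followed by exactly three dots or the one-character ellipsis,
# then any number of spaces. The lookahead rejects a fourth dot.
LEAD = re.compile(r"^(\{\{\*mp\}\})(?:…|\.\.\.)(?!\.)[ ]*")


def fix_leading_dots(entry):
    """Rewrite the start of an entry as '{{*mp}}... ' when that is unambiguous."""
    return LEAD.sub(r"\1... ", entry, count=1)
```

An entry that is already correct comes back unchanged, so the bot can compare input and output to decide whether an edit is needed at all.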

Template talk:Did you know

If my bot idea is popular, Template talk:Did you know could be Phase 2, but not yet. We might want to reformat the page so the bot can more confidently distinguish the hooks.

A nomination with an error can be moved to Next update and on to the Main Page so quickly that no schedule could catch the error before it shows. If a suggestion is too long, it would be nice to give the nominator a chance to rephrase it. It would also be nice to let the nominator fix routine problems rather than leaving them to the administrator. And we might automate the day headings, “Expired noms”, and deleting expired noms after n days.

But Template talk:Did you know isn’t designed for a computer to read at all. There’s no easy, foolproof way to distinguish hooks from the comments that come afterwards (the question mark usually works if someone remembers it, but overly long hooks more often contain multiple sentences), or from comment lines, or from comment lines like “How about…” that contain revised hooks whose errors may be bound for the Main Page, or from “How about…(short rewritten sub-phrase)?”

The alternative is elaborate, fallible rules like these: a hook with both a question mark and a period continues after the question mark, if the next character (not counting spaces, “(” and “[”) is a capital letter other than “Nom”, “Nomination”, “Self nom”, “Selfnom”, “Self-nom”, “Shameless”, or “User:”. And a comment line is distinguished because it has no more than one of the following identifying marks: “*” (but not “**” or “:*”), “…”, “?”, and a pair of “'''”. Even that doesn’t always work.
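
The second of these rules, the one for comment lines, could be sketched as follows. It assumes the “*” is a bullet at the very start of the line and that either three real dots or the one-character ellipsis counts as the “…” mark; both are readings of the rule rather than settled facts, and the function names are hypothetical.

```python
import re


def identifying_marks(line):
    """Count how many of the four identifying marks appear in a line."""
    marks = 0
    # A single leading "*" counts; "**" and ":*" do not.
    if re.match(r"\*(?!\*)", line):
        marks += 1
    if "..." in line or "…" in line:
        marks += 1
    if "?" in line:
        marks += 1
    # A pair of ''' means bold markup opens and closes at least once.
    if line.count("'''") >= 2:
        marks += 1
    return marks


def looks_like_comment(line):
    """Apply the rule: a comment line has no more than one identifying mark."""
    return identifying_marks(line) <= 1
```

A line such as “**How about a shorter hook?” has only one mark and so is classified as a comment, which is exactly the kind of case where the rule can go wrong if that line contains a revised hook.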

Other things I haven’t included

  • 1500-character articles. I don’t think there’s a practical way to ask a bot to exclude things like lists. If a bot excludes a list item because it starts with a *, # or number, then where does the item end, and what if a list has no such punctuation at all?
  • (''pictured'') or ''(pictured)''. See this edit. But if it just says (pictured), then which way should we fix it?
  • Some editors add the word “that” to the beginning of Did you know entries, which makes them consistent, but I stopped doing that because I don’t think it’s a consensus.
  • Main Page protection. I haven’t studied it, but I know it breaks down once in a while. Perhaps there are aspects of page protection that could be similarly automated.
  • "Whole numbers from zero to ten" (nine in the talk page version) "should be spelled out as words" (WP:MOSNUM). A frequent edit, but all the exceptions would be hard to automate.