User:Ryan Postlethwaite/Processing article text

From Wikipedia, the free encyclopedia

Simple enough: if a parser runs across "On [[October 11]] [[2007]] a new type of automobile was launched that runs on banana peels." it can safely assume that "October 11 2007" is a date because of the brackets. Could it parse for dates and assume they refer to an event in the article? Sure. But it becomes harder with quoted dates (that is, dates within a quote which may have nothing to do with the article subject). Special:Whatlinkshere also becomes more useful for looking for events (any event, no matter how minor or insignificant) that occur on a given date or in a given year. Can Google help with some of this? Certainly, but for an automated system which doesn't want to perform a full text search for dates, it's more useful to have Whatlinkshere as an index to these events.
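To illustrate the point about brackets, here is a minimal sketch of a parser that only trusts dates written as a linked month-day followed by a linked year. The pattern, the function name `find_linked_dates`, and the choice to skip impossible dates are all assumptions made for this example; it is not how MediaWiki or any existing bot actually does it.

```python
import re
from datetime import date

# Month names mapped to their numbers (English names only, by assumption).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches "[[October 11]] [[2007]]" or "[[October 11]], [[2007]]".
LINKED_DATE = re.compile(
    r"\[\[(?P<month>" + "|".join(MONTHS) + r") (?P<day>\d{1,2})\]\]"
    r",? \[\[(?P<year>\d{1,4})\]\]"
)


def find_linked_dates(wikitext):
    """Return the dates that an editor explicitly marked with link brackets."""
    found = []
    for match in LINKED_DATE.finditer(wikitext):
        try:
            found.append(
                date(int(match["year"]), MONTHS[match["month"]], int(match["day"]))
            )
        except ValueError:
            # Something like "[[February 30]] [[2007]]": linked, but not a real date.
            continue
    return found
```

Unlinked text such as "October 11 2007" inside a quotation is deliberately ignored, which is the whole appeal: the brackets carry the editor's judgment that this is a date worth indexing.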

As an aside, I'm sure there are other uses for this in the syntax; that's the funny thing about markup, people will find uses for it that you didn't think of when you added it. :P —Locke Coletc 15:18, 29 March 2009 (UTC)


  • There is a difference of both performance and quality between a search using a parsing algorithm (i.e., one trying to recognise data by pattern-matching the data itself) and one using metadata. Something that has been marked by a human editor as a date is more informative, machine-wise, than the machine's own guesswork as to what might be a date. This is true even if the text so flagged doesn't follow any standard convention beyond being humanly readable as a date. If <tag>Oct 18 45</tag> is allowed, as well as <tag>Eighteenth of October, 1945</tag>, and even <tag>in October of that year</tag>, the existence of the tags does nothing to detract from the presented data, and allows the development of future applications which might well present useful data to the user. Consider, for example, a parser which was able to resolve that last example, from the article context, as being a date concurrent with the first two - that might be a useful research feature, and one whose operation could only be helped by date tagging. Or imagine a historical article in which the author finds it useful to use the early, local calendar in order to relate the sequence of events. If each date is tagged, an application might offer automatic pop-up conversions of each date into other relevant calendars.
  • The argument that most current users don't see any difference is relevant only to the existing applications, which nobody seems to think useful. If a future application can exploit this metadata to useful purpose, such an application might become part of the standard interface, rather than being optionally configured on a per-user basis.
  • Whilst date tagging as described above would be potentially machine-useful whilst being mostly user-neutral, far MORE machine-useful would be the addition of a field to the tag specifying the date in a standard format, whilst the enclosed text continues to display as written. This would allow bot-tagging without affecting primary content (e.g. quotations), and allow existing proponents of the optional autoformatter to continue to play with it.
  • There needn't be a requirement that all, or indeed any, dates are tagged in an article, and as long as no "killer app" appears which makes editors want tagged dates, it's possible that most articles won't have any which aren't inserted by bot-tagging. With the appearance of such an app would likely come a surge of retrospective date-tagging.
  • Of course, the duplication of the date in the tag involves the risk that the two dates may end up different, but this strikes me as nothing new to Wikipedia editors - almost every fact in the encyclopedia can be found in more than one place, and in many cases in hundreds of different articles. Avoiding the possibility of inconsistency is neither a realistic nor a necessary aim.
  • The extra work involved in creating pages shouldn't be a problem: editors unconvinced of the worth of date tags may simply omit them. Provided their choice of format isn't too obscure ("on the third moon after Michaelmas, in the year of the long winter"), it shouldn't be too difficult for subsequent editors and bots to add them, should they desire.
Nyelvmark (talk) 01:30, 1 April 2009 (UTC)
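
The suggestion above of a tag carrying the date in a standard format, while the enclosed text displays as written, could be sketched roughly as follows. The `<date value="YYYY-MM-DD">` syntax is purely hypothetical (the comment only speaks of a generic `<tag>`), as are the function names; this is an illustration under those assumptions, not a proposal for actual markup.

```python
import re
from datetime import date

# Hypothetical markup: <date value="1945-10-18">any human-readable text</date>
DATE_TAG = re.compile(
    r'<date value="(?P<iso>\d{4}-\d{2}-\d{2})">(?P<text>.*?)</date>',
    re.DOTALL,
)


def extract_tagged_dates(markup):
    """Return (displayed text, machine-readable date) pairs for each tag."""
    results = []
    for match in DATE_TAG.finditer(markup):
        try:
            results.append((match["text"], date.fromisoformat(match["iso"])))
        except ValueError:
            # The value has the right shape but is not a real calendar date.
            continue
    return results


def strip_date_tags(markup):
    """Return the text as a reader sees it: the tags vanish, the wording stays."""
    return DATE_TAG.sub(lambda match: match["text"], markup)
```

For instance, given `'Signed <date value="1945-10-18">Oct 18 45</date>.'`, `strip_date_tags` returns the sentence exactly as the editor wrote it, while `extract_tagged_dates` yields the pair `("Oct 18 45", date(1945, 10, 18))`. A quotation can therefore be tagged by a bot without altering a single visible character.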