User:Charles Matthews/WikiProject DNBMerge

Now of historical interest. See Wikipedia:WikiProject Missing encyclopedic articles/DNB and CAT:DNBPROJ.

WikiProject DNBMerge has two aims. The first, which is conventional enough, is to push ahead with merging into Wikipedia content from the Victorian-era Dictionary of National Biography (DNB), a major British biographical dictionary that is now in the public domain.

The second aim is to pilot a methodology for such merges, taking into account Wikipedia:Merging encyclopedias. This, potentially, is more significant.

Main motivation

The importing of DNB text and its adaptation to Wikipedia's needs can of course happen anyway, without a project. Projects can supply backbone to what is otherwise an invertebrate, casual procedure. They can document progress and indicate neglected areas. They can provide motivation and a degree of recognition to those who do the work. Where such projects have fallen down in the past is in their self-conscious role as guardians of important knowledge about the work. Their pages can look rather like "talk pages", but they have lacked the equivalent of "page history" and "guidelines". The operative word has been "haphazard", even if (as for the celebrated 1911 Britannica merge) there has been major progress in terms of quality additions to the site.

It is therefore timely enough to reconsider the whole merging business, and to do "proof of concept" in the form of a working example of a more systematic merge. The main planned features of DNBMerge will be the following:

  • Table-based project pages, to replace conventional list-based methods.
  • A standard of at least seven columns in tables.
  • Table columns to cover
    1. Reference number
    2. Title
    3. Status assigned by DNBMerge
    4. Abstract
    5. Wikisource status
    6. Orphan?
    7. DNBMerge rating of merged article.
  • The full project history should be accessible in the project pages' editing histories, with no need for further documentation.
  • The procedure for updating the status column, from an initial "unchecked" status, should follow an explicitly given flowchart, that should largely carry over to other merge projects, and have general relevance to list maintenance.
  • There should be proper integration with Wikisource in the matter of having the original DNB text uploaded there, in a parallel "sister merge project".
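The seven-column layout listed above could be sketched in Python along the following lines. This is only an illustration: the class name `Entry`, its field names, the short column headers and the sample title are hypothetical, and nothing here is taken from an actual DNBMerge page. The only detail drawn from the text is the initial "unchecked" status.

```python
from dataclasses import dataclass

# Headers for the seven columns listed above (wording is an assumption).
COLUMNS = [
    "Ref", "Title", "Status", "Abstract",
    "Wikisource status", "Orphan?", "Rating",
]


@dataclass
class Entry:
    """One row of a project table (hypothetical field names)."""
    ref: int
    title: str
    status: str = "unchecked"  # initial status named in the text
    abstract: str = ""
    wikisource: str = ""
    orphan: str = ""
    rating: str = ""

    def cells(self):
        # The title is wikilinked so it shows up as a redlink or bluelink.
        return [str(self.ref), f"[[{self.title}]]", self.status,
                self.abstract, self.wikisource, self.orphan, self.rating]


def render_table(entries):
    """Render the entries as a sortable MediaWiki table."""
    lines = ['{| class="wikitable sortable"']
    lines.append("! " + " !! ".join(COLUMNS))
    for entry in entries:
        lines.append("|-")
        lines.append("| " + " || ".join(entry.cells()))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table([Entry(ref=1, title="Example Person")]))
```

Keeping each row as a record and generating the wikitext from it means that a change of status is a one-cell edit, which then shows up cleanly in the page history.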

Biographical dictionaries are a little easier

The simplifications inherent in a biographical dictionary will mean that DNBMerge is not quite the general case.

  • A general encyclopedia merge requires the tracking of splitting of articles (for example, an article on "family F" ought often to be split into Wikipedia articles on each family member, and generally encyclopedias tend to place together under one title composite articles where we would set up a dab page and split various meanings apart). DNBMerge will be able to ignore the decisions inherent in the situation that article A in the source would best be divided into A1, A2 and A3, and then these parts should be merged into Wikipedia articles W, X and Y, respectively. This methodological issue can easily come up, with A really comprising "A in ancient times", "A in the Middle Ages", "A in modern times", for example; but not when A is a person.
  • Another simplification is the "orphan" issue. Links can be installed on surname pages, to deorphan most biography articles. It is in bad taste for a merge project to create orphans, because that suggests that getting ahead with the project has priority over maintaining the integrity of the existing hypertext on Wikipedia. It is acceptable to have further names added to surname pages, or to create them, as surname pages are a type of disambiguation page, a status that makes them a special case in the hypertext. The merged articles will then not have any negative effect in hypertext terms, and the creation of new surname pages (where there is more than one instance) is acceptable.

Suggested procedure

Starting from User:Magnus Manske/Dictionary of National Biography, create new table pages of 100 titles at a time. Without rushing ahead, use comparison with the Concise Dictionary of National Biography to make those tables and correct OCR errors. This is to be a pilot; the aim is not to grasp too quickly at the "percentage complete" figure. For each page we will be interested in the numbers at each status (this could be automated). But the idea is to edit the status column explicitly as we go along, and to refer to a flowchart in so doing. The pilot pages could be few in number initially, but the work should be systematic. The idea is to demonstrate, not (to start with) to create large numbers of new articles, though of course that is a side effect and bonus.
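As a rough illustration of the two mechanical steps in this procedure (pages of 100 titles, and the per-page numbers at each status), here is a minimal Python sketch. The function names `paginate` and `status_counts` are hypothetical, and the sketch assumes the titles and statuses have already been extracted into plain lists; it does not read or write wiki pages.

```python
from collections import Counter

PAGE_SIZE = 100  # "100 titles at a time", as in the procedure above


def paginate(titles, size=PAGE_SIZE):
    """Split a list of titles into consecutive pages of at most `size`."""
    return [titles[i:i + size] for i in range(0, len(titles), size)]


def status_counts(statuses):
    """Count how many entries on one page carry each status."""
    return Counter(statuses)


if __name__ == "__main__":
    pages = paginate([f"Title {n}" for n in range(250)])
    print([len(page) for page in pages])
    print(status_counts(["unchecked", "cl", "unchecked"]))
```

A tally of this kind reports progress by status rather than as a single "percentage complete" figure, which fits the pilot's emphasis.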

Other techniques

Not mentioned so far are "on-page" and "on-talk page" techniques, which can use templates and (possibly hidden) categories in an auxiliary fashion.

Current discussion

Extract (corrected):

I have been thinking of a general schematic for the associated list maintenance. It is quite a complex flow diagram, in fact. The fundamentals are the redlink/bluelink distinction, and then the labelling of bluelinks by {{dn}} and {{mnl}}. There should also be a template or templates for "correct" entries. These are going to be related to merges. Suppose then that {{cl}} is a template for "correct link": [...] human judgement should then refine to either {{clex}} (expand the existing article from the DNB) or {{clne}} (not to expand from DNB). I prefer changing templates to removing list entries, but {{clne}} entries could be moved to a "done" list.
Updating {{mnl}} (misleading name link) entries should be a check: either a correct link can be found, so change the link and mark with {{cl}}; or a dab page is a better target (tag with {{mld}} for "misleading link, disambiguate") by creation of an entry on a dab page; or there is need to add a hatnote to the misleading target page ({{mlhat}}). Updating {{dn}} links is classic disambiguation, and there are two cases, {{dnav}} where the correct page is available through the dab page (update link and tag with {{cl}}), and {{dnna}} where the correct page is not available, and the bluelink on the list should become a redlink (tag with {{dncr}} for "disambiguation done, this article should be created at this title").
Anyway, there is more to say about redlinks in this business [...]

Trying to list the different templates here:

{{cl}}
correct link
{{clex}}
expand the existing article from the DNB
{{clne}}
not to expand from DNB
{{mnl}}
misleading name link
{{cl}}
???
{{mld}}
misleading link, disambiguate
{{mlhat}}
hatnote
{{dn}}
link is classic disambiguation
{{dnav}}
where the correct page is available through the dab page
{{dnna}}
where the correct page is not available
{{dncr}}
disambiguation done, this article should be created at this title
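The retagging steps described in the extract can be written down as a small transition table. The Python sketch below is one possible reading of it, not the project's flowchart: the names `TRANSITIONS`, `retag` and `is_terminal` are hypothetical, tags are written without their braces, and the redlink side (which the extract leaves open) is deliberately not modelled.

```python
# Allowed moves between tags, as read from the extract above.
# This is an interpretation; the redlink cases are not covered.
TRANSITIONS = {
    "cl": {"clex", "clne"},         # human judgement: expand from DNB or not
    "mnl": {"cl", "mld", "mlhat"},  # fix link, use dab page, or add hatnote
    "dn": {"dnav", "dnna"},         # correct page on the dab page, or not
    "dnav": {"cl"},                 # link updated, now a correct link
    "dnna": {"dncr"},               # bluelink becomes a redlink to create
}


def retag(current, new):
    """Return `new` if the sketch allows moving from `current` to it."""
    allowed = TRANSITIONS.get(current, set())
    if new not in allowed:
        raise ValueError(f"{current!r} -> {new!r} is not in the sketch")
    return new


def is_terminal(tag):
    """True if the extract describes no further retagging from `tag`."""
    return tag not in TRANSITIONS


if __name__ == "__main__":
    tag = retag("dn", "dnav")
    tag = retag(tag, "cl")
    tag = retag(tag, "clne")
    print(tag, is_terminal(tag))
```

Writing the moves as data rather than as nested conditions keeps the table easy to compare against a drawn flowchart, and easy to extend once the redlink cases are settled.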

Test 1

I've been bold and have created a first test here. --Magnus Manske (talk) 20:29, 4 December 2008 (UTC)

Addendum: My current method would yield ~10.500 articles that have an ODNB ID. --Magnus Manske (talk) 20:33, 4 December 2008 (UTC)