Wikipedia:WikiProject Molecular Biology/Genetics/Gene Wiki/SWL proposal


Background

Some of you are already familiar with the semantic wikilink ({{SWL}}) template, created a couple of years ago as a way to encode semantic "meaning" into ordinary wikilinks. The aim of encoding this information is to make the content of Wikipedia articles more useful. For instance, even though most biologists could tell you that CDK7 is involved in DNA repair and acts via phosphorylation, a computer has no way of knowing that from the raw text. As a result, if one wanted to search for, say, kinases involved in DNA repair, that information would be unavailable. With the {{SWL}} template, an editor can encode that information in a simple, consistent way that makes it amenable to computational processing and thus much more available to users. If we were to add it to CDK7, we could change the sentence

It is an essential component of the transcription factor [[TFIIH]], that is involved in transcription initiation and [[DNA repair]].

to

It is an essential component of the transcription factor [[TFIIH]], that is involved in transcription initiation and 
{{SWL|type=involved_in|target=DNA repair}}.

which would render like this:

"It is an essential component of the transcription factor TFIIH, that is involved in transcription initiation and DNA repair."

From a typical user's perspective, the only difference between the two final renderings is the faint underline on the SWL, which shows a tooltip on hover that clarifies the relationship. (The tooltip uses the current page title, which is why this example's tooltip looks off here; on the CDK7 article it would read "Cyclin-dependent kinase 7 involved_in DNA repair".)
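To illustrate how a third-party tool might read this markup, here is a minimal Python sketch that pulls (page title, type, target) triples out of raw wikitext. It assumes the template takes only the `type` and `target` named parameters shown in the example above and contains no nested templates; the function name `extract_swl` is hypothetical and not part of any existing tool.

```python
import re

# Assumption: an SWL call looks like {{SWL|type=...|target=...}} and does not
# contain nested templates, so its body has no curly braces.
SWL_PATTERN = re.compile(r"\{\{\s*SWL\s*\|([^{}]*)\}\}", re.IGNORECASE)


def extract_swl(page_title, wikitext):
    """Return (subject, relation, target) triples, one per SWL template.

    The subject is the page title, mirroring how the tooltip is built.
    """
    triples = []
    for match in SWL_PATTERN.finditer(wikitext):
        params = {}
        # Split the template body into name=value pairs.
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep:
                params[key.strip()] = value.strip()
        # Skip templates that lack either parameter.
        if "type" in params and "target" in params:
            triples.append((page_title, params["type"], params["target"]))
    return triples


text = (
    "It is an essential component of the transcription factor [[TFIIH]], "
    "that is involved in transcription initiation and "
    "{{SWL|type=involved_in|target=DNA repair}}."
)
triples = extract_swl("Cyclin-dependent kinase 7", text)
print(triples)
```

A regular expression is enough for the simple form shown here; wikitext with nested templates would probably need a real wikitext parser instead.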

At this point, we have encoded the semantic relationship between CDK7 and DNA repair. If we had a system capable of understanding the meaning of what we had just written, we could search for the term, or for the properties of CDK7. We could combine search properties into real queries, opening up new ways of finding connections between articles. Unfortunately, as you all probably know, Wikipedia doesn't natively support these semantic links yet, so as it stands we can't do much with SWL templates besides add more markup to articles. However, we've come up with a couple of solutions to this problem that may make these templates more useful.
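As a rough sketch of what "combining search properties" could look like once the relationships have been collected, the Python example below filters a small in-memory list of triples by several (type, target) constraints at once. The triples are illustrative and paraphrase statements made in this proposal; only `involved_in` appears in the template example, so the other relation names (`acts_via`, `negatively_regulates`) and the function name `find_subjects` are assumptions.

```python
# Illustrative data only. "involved_in" comes from the template example;
# the other relation names are assumed for the sake of the sketch.
TRIPLES = [
    ("Cyclin-dependent kinase 7", "involved_in", "DNA repair"),
    ("Cyclin-dependent kinase 7", "acts_via", "Phosphorylation"),
    ("Phospholamban", "involved_in", "Cardiomyopathy"),
    ("Phospholamban", "negatively_regulates", "SERCA"),
]


def find_subjects(triples, *constraints):
    """Return the subjects that satisfy every (relation, target) constraint."""
    matches = None
    for relation, target in constraints:
        # Subjects that satisfy this single constraint.
        subjects = {s for s, r, t in triples if r == relation and t == target}
        # Intersect with the subjects that satisfied all earlier constraints.
        matches = subjects if matches is None else matches & subjects
    return sorted(matches or [])


result = find_subjects(
    TRIPLES,
    ("involved_in", "DNA repair"),
    ("acts_via", "Phosphorylation"),
)
print(result)  # ['Cyclin-dependent kinase 7']
```

A real system would more likely load such triples into a proper store with a query language; the set intersection here only shows the idea.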

The Userscript

We wrote a series of Wikipedia:Userscripts to get around Wikipedia's limitations with regard to these semantic links. Our first, available here, facilitates writing SWL templates while editing. The second, really neat one, here, lets you use the SWL links on a page as a kind of secondary infobox. It pulls out all the relationships and shows them at the top of the page, so that when you browse to CDK7 or Phospholamban for the first time, you can see instantly that CDK7 is involved in DNA repair, and that phospholamban is a substrate for PKA, negatively regulates SERCA, and is involved in cardiomyopathy (specifically, congestive heart failure). This kind of instantly accessible information is ideal both as a summary of function and as a set of queryable properties. Information on installing userscripts is available on the Wikipedia:Userscripts page; we encourage you to try them out, since they make the {{SWL}} templates immediately useful. Here's a screenshot:

Thoughts

We think, and believe others out there do as well, that it's a worthwhile aim to have machine-readable information in Wikipedia, and that it makes sense for this information to cover not just infoboxes but also the more subtle relationships found in the inline text. The userscript makes these relationships apparent at first glance, complementing the infoboxes and article summary, and the template makes it easy for third-party utilities to extract this information and use it without having to resort to natural-language processing or heuristics. It's a step towards making everything we've written here available to a larger audience and ultimately more useful, and we think it's worth some discussion.

Cheers, Pleiotrope (talk) 17:14, 19 October 2011 (UTC)