User:PerfektesChaos/js/WikiSyntaxTextMod/flow/link

From Wikipedia, the free encyclopedia

Links

The fourth step of syntax polishing processes all links. Candidate links are detected by searching for [ first and afterwards for the string ://.

One goal is to adapt link targets; another is to format links in a common, readable manner that other scripts and bots can detect easily.

Wikilink

If not explicitly mentioned, in this section the term “bracket” means square brackets [[]].

Syntax correction

  • In certain unambiguous cases of wikilinks, missing single brackets are added and superfluous brackets are removed.
    • More than two opening brackets [[ prevent link rendering and are fixed (reduced to two brackets). An intended visible opening bracket can be provided in escaped form, such as &#91;.
  • If a wikilink contains multiple adjacent pipe symbols instead of a single one, they are reduced to one.
    • If any other additional pipe symbol is found within the link title, the intended separation between link target and link title cannot be guessed; only an error message is issued.
  • A line break (which is not permitted) within a bracketed region of reasonable length is turned into a space character.
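
The corrections listed above can be illustrated with a minimal Python sketch. It is not the script's actual code: the function name, the regular expressions, and the limit of 200 characters standing in for a "reasonable length" are assumptions made for this example, and adding missing brackets is not covered.

```python
import re

# Assumption: a bracketed region of up to 200 characters counts as one wikilink.
LINK = re.compile(r"\[\[([^\[\]]{1,200})\]\]")


def fix_wikilink_syntax(text: str) -> str:
    """Illustrative repair of a few unambiguous wikilink errors."""
    # Three or more opening brackets are reduced to two.
    text = re.sub(r"\[{3,}", "[[", text)

    def repair(match: re.Match) -> str:
        inner = match.group(1)
        # Adjacent pipe symbols collapse into a single one.
        inner = re.sub(r"\|{2,}", "|", inner)
        # A line break inside the link becomes a single space.
        inner = re.sub(r" *\n *", " ", inner)
        return "[[" + inner + "]]"

    return LINK.sub(repair, text)
```

A link with a second, non-adjacent pipe is left untouched by this sketch; the script described here reports an error in that case.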

Wikilink by http

Sometimes an external URL is used, like

[http://en.wikipedia.org/wiki/Main_page

as well as [https: and protocol-relative URLs.

This is turned into wikilink format if possible.

Links given as URLs do not appear on WhatLinksHere and GlobalUsage.
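
As a rough illustration of this conversion, the following Python sketch rewrites a bracketed URL that points to an article on the local wiki. The function name, the default host, and the restriction to /wiki/ paths without a query string are assumptions made for the example; the actual script may decide differently about when a conversion is possible.

```python
import re
from urllib.parse import unquote


def url_to_wikilink(text: str, host: str = "en.wikipedia.org") -> str:
    """Turn [http://host/wiki/Page title] into [[Page|title]] (sketch only)."""
    pattern = re.compile(
        r"\[(?:https?:)?//" + re.escape(host)
        + r"/wiki/([^\s\[\]?]+)(?: +([^\[\]\n]+))?\]"
    )

    def repl(match: re.Match) -> str:
        # Decode percent escapes and turn underscores into spaces.
        target = unquote(match.group(1)).replace("_", " ")
        title = match.group(2)
        if title:
            return "[[" + target + "|" + title.strip() + "]]"
        return "[[" + target + "]]"

    return pattern.sub(repl, text)
```

URLs carrying a query string (for example ?action=history) are deliberately not matched here, since they have no plain wikilink equivalent.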

Wikilink with scripting direction (left-right)

If a (usually invisible) bidi character is present directly before or after a wikilink target, it will be discarded. This does not affect the functionality. For a link or an old-fashioned interlanguage link into the Arabic-language Wikipedia, the link target begins with :ar: and is not affected anyway.
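
A minimal Python sketch of this cleanup follows. The set of code points treated as bidi characters (U+200E, U+200F, U+202A to U+202E, U+2066 to U+2069) is an assumption for the example; the page does not state which characters the script actually removes.

```python
import re

# Assumed set of bidi control characters (marks, embeddings, overrides, isolates).
BIDI = "\u200e\u200f\u202a-\u202e\u2066-\u2069"

TARGET = re.compile(
    r"\[\[[" + BIDI + r"]*([^\[\]|]*?)[" + BIDI + r"]*(\||\]\])"
)


def strip_bidi(text: str) -> str:
    """Drop bidi characters directly before or after a wikilink target."""
    return TARGET.sub(r"[[\1\2", text)
```

Bidi characters inside the link title are not touched by this sketch, only those adjoining the target.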

Wikipedia in other languages and major sister projects

Correct external links like

[http://de.wikipedia.org/wiki/Schur%E2%80%93Zassenhaus-Theorem

are not enclosed in <ref> or moved as external link into other sections by this script.

Not only Wikipedia, but also other major sister projects (with a shortcut) linked by URL are detected and transformed into wikilink format.

A uniform format is used, with a project shortcut p (one letter, or wikt or meta):[1][2]

  • p:Lemma – same language, other project type
  • p:lang:Lemma – other language, other project type
  • :lang:Lemma – other language, same project

A leading colon ahead of the project identifier is used by some authors, but it is redundant and will be discarded.

The inverted order :lang:p:Lemma is quite rare and will be brought into the usual sequence, although it works both ways.[3]
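
The prefix rules above can be sketched in Python as follows. The function name, the small table of long project names, and the pattern used to recognise a language code are assumptions for this example; the list of projects is only an illustrative subset.

```python
import re

# Illustrative subset of long names mapped to shortcuts; m: becomes meta: (remark 2).
SHORTCUT = {
    "wikisource": "s", "wiktionary": "wikt", "wikibooks": "b",
    "wikiquote": "q", "wikinews": "n", "wikiversity": "v", "m": "meta",
}
PROJECTS = {"b", "n", "q", "s", "v", "w", "wikt", "meta"}
# Assumption: a language code is two or three letters plus optional subtags.
LANG = re.compile(r"[a-z]{2,3}(?:-[a-z]+)*")


def _project(prefix: str) -> str:
    key = prefix.strip().lower()
    return SHORTCUT.get(key, key)


def normalize_prefix(target: str) -> str:
    """Bring an interwiki link target into the p:lang:Lemma form (sketch)."""
    parts = target.split(":")
    leading_colon = parts[0] == ""
    if leading_colon:
        parts = parts[1:]
    if len(parts) < 2:
        return target
    first = _project(parts[0])
    if first in PROJECTS:
        # p:Lemma or p:lang:Lemma; a redundant leading colon is dropped.
        return ":".join([first] + parts[1:])
    if (leading_colon and len(parts) >= 3
            and LANG.fullmatch(parts[0]) and _project(parts[1]) in PROJECTS):
        # Inverted order :lang:p:Lemma becomes p:lang:Lemma.
        return ":".join([_project(parts[1]), parts[0]] + parts[2:])
    return target
```

A target such as :de:Faust (other language, same project) passes through unchanged, including its leading colon.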

URL as wikilink

This means something like

Gem%C3%A4ldegalerie_%28Berlin%29#Die_Gem.C3.A4ldegalerie_in_Dahlem

This brew of URL escapes and UTF-8 is made more pleasant to read.

This typically arises when authors copy the URL of the target page into a wikilink. Underscores are replaced by spaces. Escape sequences are identified and replaced by the corresponding UCS characters.
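
The decoding can be sketched in Python with the standard library. The function name is hypothetical, and treating every dot followed by two uppercase hex digits in the fragment as an escaped byte is an assumption of this example.

```python
import re
from urllib.parse import unquote


def prettify_target(target: str) -> str:
    """Decode a link target that was copied from a URL (sketch only)."""
    page, sep, fragment = target.partition("#")
    # Page part: ordinary percent escapes in UTF-8.
    page = unquote(page)
    # Fragment part: the example above escapes bytes as .XX instead of %XX.
    fragment = unquote(re.sub(r"\.([0-9A-F]{2})", r"%\1", fragment))
    # Underscores are replaced by spaces.
    return (page + sep + fragment).replace("_", " ")
```

For the example above the result is Gemäldegalerie (Berlin)#Die Gemäldegalerie in Dahlem. A real implementation would need to guard against section names that legitimately contain a dot followed by two hex digits.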

Wikilink on itself

This means a wikilink targeting the current page (self):

[[self]]

will be unlinked; with a differing link title,

[[self|Alter Ego]]

shall become

Alter Ego

This often occurs as

[[self#section|

to be replaced by

[[#section|

Within an includeonly or onlyinclude region, a link on the page itself is permitted and may be required, so it is kept.
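
The three cases can be sketched in Python like this. The function and its parameter self_title are hypothetical; the sketch compares titles literally and ignores both first-letter case and the includeonly and onlyinclude exception described above.

```python
import re


def unlink_self(text: str, self_title: str) -> str:
    """Remove links that point to the page itself (sketch only)."""
    title = re.escape(self_title)
    # [[self#section|...  becomes  [[#section|...
    text = re.sub(r"\[\[\s*" + title + r"\s*(#[^\[\]|]*)", r"[[\1", text)
    # [[self|Alter Ego]]  becomes  Alter Ego
    text = re.sub(r"\[\[\s*" + title + r"\s*\|([^\[\]|]*)\]\]", r"\1", text)
    # [[self]]  becomes  self
    text = re.sub(r"\[\[\s*(" + title + r")\s*\]\]", r"\1", text)
    return text
```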

Simplify your wikilink

Titled wikilinks to other pages like

[[pointing device|pointing devices]]

are simplified as

[[pointing device]]s

The same rules as implemented in the parser are applied here, so the rendered appearance does not change.

This goes especially for

  • [[target|target]]
    which is just
    [[target]]

Sometimes the coinciding target word would split the matching link title at positions that look strange to a human reader and do not follow syllable boundaries.

For titled links the resulting clickable (blue) part shall be the same as the bracketed title, merging

[[Component (software)|component]]s

into

[[Component (software)|components]]
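
Both directions of this simplification can be sketched in Python. The link trail is assumed to consist of lowercase ASCII letters, as on English-language wikis; other wikis use different trail rules, and first-letter case handling as well as file or category links are ignored here.

```python
import re

# Assumption: the link trail is a run of lowercase ASCII letters.
TITLED = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]([a-z]*)")


def simplify_links(text: str) -> str:
    """Simplify titled wikilinks without changing the rendered text (sketch)."""

    def repl(match: re.Match) -> str:
        target, title, trail = match.groups()
        if title == target:
            # [[target|target]] is just [[target]].
            return "[[" + target + "]]" + trail
        rest = title[len(target):]
        if title.startswith(target) and re.fullmatch(r"[a-z]+", rest):
            # [[device|devices]] becomes [[device]]s.
            return "[[" + target + "]]" + rest + trail
        if trail:
            # Merge the trail so the bracketed title equals the blue part.
            return "[[" + target + "|" + title + trail + "]]"
        return match.group(0)

    return TITLED.sub(repl, text)
```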

Pipe trick

In the early days of Wikipedia the pipe trick was invented: if a link target contains an expression in round parentheses () or a comma, the part before it will be displayed as link title if an empty link title is given, that is, if the pipe symbol is immediately followed by the closing brackets |]].

This was supposed to reduce typing. However, only a few authors are familiar with this notation, and the small pipe symbol might be overlooked easily. This script evaluates the construct by the same rules as the parser does and inserts the resulting displayed link title explicitly.

Even authors who swear by the abbreviated format are often unaware that the pipe trick does not work within “tag extensions” like <ref> or <gallery> (and other delicacies won’t work there either). In this case the explicit title produces the intended behaviour for the first time.
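
A simplified Python sketch of the expansion follows. It covers only the two cases named above, a trailing parenthesised expression and a comma; giving the parentheses precedence over the comma is an assumption of this example, and the parser's further rules (for instance for namespace prefixes) are not reproduced.

```python
import re

EMPTY_TITLE = re.compile(r"\[\[([^\[\]|]+)\|\]\]")


def expand_pipe_trick(text: str) -> str:
    """Write out the title that the pipe trick would generate (sketch)."""

    def repl(match: re.Match) -> str:
        target = match.group(1)
        paren = re.fullmatch(r"(.+?) *\([^()]*\)", target)
        if paren:
            # Title is the part before the trailing parenthesis.
            title = paren.group(1)
        elif "," in target:
            # Title is the part before the first comma.
            title = target.split(",", 1)[0]
        else:
            return match.group(0)
        return "[[" + target + "|" + title.strip() + "]]"

    return EMPTY_TITLE.sub(repl, text)
```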

Formatting

One of the general rules later text search may rely on:

  • There is no remaining space between [[ and link target or around pipe symbol | or ahead of ]].

Weblink (external link)

For recognition of URLs only the following protocols are used: http, https, ftp, git, mms, svn, and protocol-relative [//. Other schemes are permitted in wikitext but are quite rare.

If not explicitly mentioned, in this section the term “bracket” means square brackets [].

Weblink correction

  • Weblink with \n
    If a URL after the opening bracket is immediately followed by a \n line break, that will be replaced by a space, since the link won’t be displayed if spread over multiple lines.
    • If anything else follows after the link title but the closing bracket is missing, nothing will be changed, since it cannot be determined where the link title is intended to end. The closing bracket might be absent until the end of the paragraph. An error message is displayed.
  • Weblink in double square brackets [[]].
    If double square brackets enclose a URL starting with a protocol, like [[http:// or [[https://, the brackets are reduced to single ones. This is a common mistake and the correction is unambiguous.
  • If pairs of square brackets are detected within a URL, they will be escaped automatically if there is no doubt:
    • tx_ttnews[tt_news]= etc. result from TYPO3.
    • The entities &#91;&#93; are used rather than URL encoding %5B%5D; this keeps the original notation of the web server. Not every server (especially applications of the last century) supports percent decoding, nor is any server obliged to obey URL rules for its GET access. With entities the functionality is therefore not endangered, whereas a percent-encoded URL would need to be tested. The MediaWiki software converts the encoding when displaying the page, but this is not the business of the underlying wikitext source.
    • An error message is always issued. If the change appears to be unsafe, nothing is modified.
  • If a URL contains or adjoins special characters, a warning message is issued:
    • The characters " { } will break the link; they need to be escaped.
    • A pipe symbol | or apostrophes '' might originate from wiki syntax with a different intention: separation of the link title, or italic or bold decoration, when a space character got lost.
    • If a URL is terminated by a punctuation character (, . ; ?), this is suspicious, since without brackets the MediaWiki software assumes that the character does not belong to the URL. Links without brackets should be enclosed in brackets and get an appropriate title to make this absolutely clear. Inside brackets, the character might have been copied by mistake along with everything up to the adjacent space.
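
Two of the corrections above can be sketched in Python. The function names and the balance check that stands in for "no doubt" are assumptions of this example; the warnings and error messages of the script are not modelled.

```python
import re

# Protocols recognised according to this section.
SCHEME = r"(?:https?|ftp|git|mms|svn)://"


def fix_double_brackets(text: str) -> str:
    """Reduce [[http://...]] to [http://...] (sketch only)."""
    return re.sub(r"\[\[(" + SCHEME + r"[^\[\]]*)\]\]", r"[\1]", text)


def escape_url_brackets(url: str) -> str:
    """Escape square brackets inside a URL as entities if they are balanced."""
    depth = 0
    for char in url:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
            if depth < 0:
                # Closing bracket without an opening one: considered unsafe.
                return url
    if depth != 0:
        return url
    return url.replace("[", "&#91;").replace("]", "&#93;")
```

For a TYPO3-style address, escape_url_brackets("http://example.org/?tx_ttnews[tt_news]=5") yields http://example.org/?tx_ttnews&#91;tt_news&#93;=5, where example.org is a placeholder host.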

Weblink formatting

Two of the general rules later text search may rely on:

  • There is no remaining space between [ and http:// etc.
  • There is exactly one space between URL and link title.

URL formatting

  • In general, a URL which points to a domain only is terminated by a slash /. It also works without the slash, since HTTP defaults to the path /, but this slash is the path of the “home” resource. Web servers return their own URL in this format. For search processes it makes clearer where the host part ends.
  • The domain name (host) is turned into lowercase, as is the protocol.
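
Both formatting rules can be sketched in Python as follows. The function name and the regular expression are assumptions of this example; user information before an @ sign is left untouched because it may be case-sensitive.

```python
import re

URL_HEAD = re.compile(
    r"^((?:[A-Za-z][A-Za-z0-9+.-]*:)?//)([^/?#]+)(.*)$", re.S
)


def normalize_url(url: str) -> str:
    """Lowercase protocol and host; end a domain-only URL with a slash."""
    match = URL_HEAD.match(url)
    if not match:
        return url
    scheme, authority, rest = match.groups()
    # Keep any user information as written, lowercase host and port only.
    userinfo, at, hostport = authority.rpartition("@")
    if rest == "":
        # Domain only: add the path of the "home" resource.
        rest = "/"
    return scheme.lower() + userinfo + at + hostport.lower() + rest
```

Path, query, and fragment are kept as written, since they may be case-sensitive on the server.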

Weblink on wiki project

For bracketed weblinks related to wiki projects the following action is taken:

  • If conversion into a wikilink is possible, this will be done.
  • Otherwise, for many known WMF domains a protocol-relative form is built. If certain subdomains are available by https only, the protocol is changed to secure access.
  • The secure.wikimedia.org domain has been obsolete since fall 2011; an equivalent URL will be created.

For a WMF URL without brackets which might be formatted as a wikilink nothing is changed, but a warning will be issued.

Modification of link target or environment

User-defined modifications of wikilinks, URLs, or the adjacent text segments are applied immediately to any detected link target.

If needed, the link target will be protected against textual modification.

Remarks

  1. ^ A longer project name is replaced by the common shortcut.
    Instead of [[wikisource:lang:Title, something like
    ''[[s:lang:Title|Title]]'' for the *** language [[Wikisource]]
    etc. should be written to show every reader clearly into which language a link will lead.
  2. ^ Both m: and meta: are possible, but meta: is used for easier readability.
  3. ^ See also the recommendations at meta:Help:Interwiki linking #Prefixes.
