
User:PerfektesChaos/js/WikiSyntaxTextMod/flow/tag

From Wikipedia, the free encyclopedia

WikiSyntaxTextMod → Syntax polishing → Step 2

Tags

The second step of syntax polishing standardizes tags such as <tag> (and also comments) and detects errors.

Scope

Tags are given one common, uniform appearance. Human authors should not be confused by varying formatting styles, and bots and scripts can identify structures simply and reliably.

Only well-known elements are processed:

a applet area audio b base bdi big blockquote body br button center code command dfn div em embed font form frame frameset gallery h1 h2 h3 h4 h5 h6 head hiddentext hiero hr html i iframe imagemap img includeonly input inputbox isindex kbd layer link map math meta noinclude nowiki object onlyinclude option pages poem pre rb rbc ref references rp rt rtc ruby s samp score script select small source span strike strong style sub sup syntaxhighlight templatedata textarea timeline title tt u wbr xml

Comments are considered here, too.

All unknown tags will be ignored.

Formatting

The following format is expected after polishing:

  • A known tag opened by < is to be closed by > and no other < or > is permitted inside.
  • There is no whitespace directly after the opening < or directly before the closing >.
  • All known tags as enumerated above consist of lowercase letters only.
  • If a backslash \ is detected just after < or just before >, a typing mistake is assumed and the backslash is turned into a regular slash.
  • An end tag is written in compact notation: </sup>.
  • A unary tag (like <references />) is written with exactly one space between name (or attribute) and slash.
  • Elements that HTML permits only in unary form (br, hr and wbr) are forced into a unary tag, wherever and in whatever form a slash might be present.
  • Empty elements (like <nowiki></nowiki> and <references></references>) will be turned into one unary tag.
    • If there is only whitespace (spaces or line breaks) between the tags, they are regarded as empty, too. <pre>\n</pre> does have a visible effect, but one that is meaningful only for the Whitespace language. However, <syntaxhighlight> keeps any content unchanged. In other cases an empty tag pair is to be filled with some content.
    • For <div></div> an exception is made.
  • All attribute names are turned into lowercase letters.
  • Every attribute is permitted only once; multiple occurrences cause an error message.
  • Attribute assignments are written as attr="Val" in compact notation:
    • Whitespace around the equal sign will be removed.
    • The value is enclosed in quotation marks ".
    • If the value itself contains a ", the apostrophe ' is kept as the delimiter.
    • A value in wikitext is not expected to contain both a quotation mark and an apostrophe; if it does, a syntax error (missing delimiter) is assumed and an error message is triggered.
    • < or > enclosed in quotation marks are not accepted.
    • Leading and trailing whitespace within the value enclosed by quotation marks will be removed.
    • Assignments of empty values are invalid and cause an error message. This does not apply to the occasional single attribute without an equal sign (which is quite rare).
  • Before and after an attribute assignment there is exactly one space.
    • In case of multi-line tags line breaks are kept.
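
The formatting rules above can be illustrated with a minimal Python sketch. It is not the JavaScript of WikiSyntaxTextMod; the names normalize_tag, normalize_attributes, ATTR_RE and VOID are hypothetical, the tag is assumed to be a known one, and multi-line tags are simplified (line breaks are not preserved here).

```python
import re

# Hypothetical names for illustration; not taken from WikiSyntaxTextMod.
VOID = {"br", "hr", "wbr"}          # elements that HTML permits as unary only

ATTR_RE = re.compile(
    r"""([A-Za-z][\w:-]*)                     # attribute name
        (?:\s*=\s*                            # "=" with optional whitespace
           ("[^"]*"|'[^']*'|[^\s"'>/]+)       # quoted or bare value
        )?""",
    re.VERBOSE,
)


def normalize_attributes(raw):
    """Return the attributes of one tag as a list of name="value" strings."""
    seen = set()
    result = []
    for match in ATTR_RE.finditer(raw):
        name = match.group(1).lower()
        value = match.group(2)
        if name in seen:
            raise ValueError(f"attribute {name!r} occurs more than once")
        seen.add(name)
        if value is None:
            result.append(name)               # single attribute without "="
            continue
        if value[0] in ('"', "'"):
            value = value[1:-1]               # drop the original delimiters
        value = value.strip()
        if not value:
            raise ValueError(f"attribute {name!r} has an empty value")
        if "<" in value or ">" in value:
            raise ValueError(f"attribute {name!r} contains < or >")
        if '"' in value and "'" in value:
            raise ValueError(f"attribute {name!r}: missing delimiter assumed")
        quote = "'" if '"' in value else '"'
        result.append(f"{name}={quote}{value}{quote}")
    return result


def normalize_tag(tag):
    """Normalize one known tag that is given including its < and >."""
    inner = tag[1:-1].strip()
    if inner.startswith("\\"):                # backslash just after <
        inner = "/" + inner[1:]
    if inner.endswith("\\"):                  # backslash just before >
        inner = inner[:-1] + "/"
    closing = inner.startswith("/")
    unary = inner.endswith("/")
    inner = inner.strip("/").strip()
    match = re.match(r"([A-Za-z][A-Za-z0-9]*)(.*)", inner, re.DOTALL)
    if match is None:
        raise ValueError(f"no tag name found in {tag!r}")
    name = match.group(1).lower()
    if name in VOID:                          # always unary, whatever the slash
        closing, unary = False, True
    if closing:
        return f"</{name}>"                   # compact end tag
    text = " ".join([name] + normalize_attributes(match.group(2)))
    return f"<{text} />" if unary else f"<{text}>"
```

For instance, under these assumptions normalize_tag('<ref NAME = foo >') yields <ref name="foo">, and both <BR> and </br> come out as <br />.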

Nesting

Associated opening and closing tags are identified.

Correct nesting is checked; if end tags are missing or superfluous at some level, an error message is raised.

Some elements are processed immediately from opening until closing tag.
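
A nesting check of this kind is commonly done with a stack. The following Python sketch assumes tags have already been polished as described under Formatting (lowercase names, unary tags ending in />); check_nesting, TAG_RE and the wording of the messages are hypothetical and not taken from the script.

```python
import re

# Hypothetical sketch; assumes tags are already in polished form.
TAG_RE = re.compile(r"<(/?)([a-z][a-z0-9]*)\b[^<>]*?(/?)>")


def check_nesting(text, known):
    """Return a list of error messages for missing or superfluous end tags."""
    stack = []
    errors = []
    for match in TAG_RE.finditer(text):
        closing, name, unary = match.groups()
        if name not in known or unary:
            continue                          # unknown tags are ignored
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()                       # properly matched pair
        elif name in stack:
            # everything opened after `name` lacks its end tag
            while stack[-1] != name:
                errors.append(f"missing </{stack.pop()}>")
            stack.pop()
        else:
            errors.append(f"superfluous </{name}>")
    # whatever is still open at the end was never closed
    errors.extend(f"missing </{name}>" for name in reversed(stack))
    return errors
```

With known = {"div", "b"}, the text <div><b>x</div> would be reported as missing </b>, while a stray </sup> with known = {"sup"} would be reported as superfluous.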

Content analysis

  • Immediately after commented-out regions have been handled, nowiki ranges and some (unary) elements will be protected.
  • syntaxhighlight areas will be protected next and entirely.
    • If possible (keyword "syntaxhighlight" not within the range), the obsolete source is turned into syntaxhighlight. In passing, the strike tag is standardized as <s>.
  • For security reasons, HTML elements with URLs pointing outside the wiki projects (like <a href= or <img src=) are blocked in the generated HTML page. Within wikitext the script deactivates them by transforming the leading < into &lt;, which yields the same visual appearance.
  • If typographical tags that are meaningful only as a pair (like <b />, <em />, <i />, <span /> etc.) occur in unary form, a certain bad habit is assumed and they are turned into <nowiki />. Parameters would be pointless and will be removed.
  • A <br /> that uses the CSS property style="clear:… or contains the non-standard clear=… only works as the block element <div />, so br is transformed accordingly. Non-standard forms in <div /> are interpreted, and the proper style="clear:both" etc. is assigned according to the intention.
    • In order to ensure valid HTML <div … /> is written as empty <div …></div>.[1]
  • If a mandatory attribute assignment is missing or an assignment is not permitted, an error message is shown.
    • For the elements gallery, ref and references only well-known parameters are tolerated.
  • If the kind of element suggests more specific processing (whitespace formatting, syntax analysis or possibly content protection), this is done or scheduled.
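
Three of the transformations above can be sketched in Python as follows. The function names are hypothetical, and the checks are deliberately simplified: modernize looks at the whole text rather than at one source range, and clear_break_to_div handles a single tag only. Mapping the legacy value "all" to "both" reflects the usual meaning of the old clear attribute.

```python
import re


def deactivate_external(text):
    """Turn the leading < of <a href= and <img src= into &lt;."""
    pattern = r"<(?=(?:a\s+href|img\s+src)\s*=)"
    return re.sub(pattern, "&lt;", text, flags=re.IGNORECASE)


def modernize(text):
    """Replace obsolete source by syntaxhighlight, and strike by s."""
    # Simplification: the keyword is searched in the whole text,
    # not only within the affected range.
    if "syntaxhighlight" not in text:
        text = re.sub(r"<(/?)source\b", r"<\1syntaxhighlight", text)
    return re.sub(r"<(/?)strike\b", r"<\1s", text)


def clear_break_to_div(tag):
    """Rewrite a br tag that carries a clear request as an empty div pair."""
    found = re.search(r"""clear\s*[:=]\s*["']?\s*(\w+)""", tag,
                      flags=re.IGNORECASE)
    if not tag.lower().startswith("<br") or found is None:
        return tag                            # nothing to transform
    side = found.group(1).lower()
    if side == "all":                         # legacy attribute value
        side = "both"
    return f'<div style="clear:{side}"></div>'
```

Under these assumptions, clear_break_to_div('<br clear="all" />') gives <div style="clear:both"></div>, and a plain <br /> is returned unchanged.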

Comments

  • For each comment start <!-- the matching end --> is searched. If the end cannot be found, or a space is detected within the comment start, an error message is displayed.
  • A comment may be subject to a user-defined comment modification.
  • All comments will be protected against any further searching and replacement.
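
One common way to protect comments against later search and replace is to swap them for placeholders and restore them at the end. The Python sketch below assumes this approach; protect_comments, restore_comments, BROKEN_START and the NUL-delimited placeholder format are hypothetical and not taken from the script, and the check for a broken comment start is applied to the whole text for simplicity.

```python
import re

# Hypothetical sketch: a space inside the comment start, e.g. "<! --".
BROKEN_START = re.compile(r"<!\s+--|<\s+!--|<!-\s+-")


def protect_comments(text):
    """Replace every comment by a placeholder; return (text, comments)."""
    if BROKEN_START.search(text):
        raise ValueError("space within the beginning of a comment")
    store = []
    out = []
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start < 0:
            out.append(text[pos:])            # no further comment
            break
        end = text.find("-->", start + 4)
        if end < 0:
            raise ValueError(f"comment at offset {start} is never closed")
        out.append(text[pos:start])
        out.append(f"\x00{len(store)}\x00")   # placeholder with index
        store.append(text[start:end + 3])
        pos = end + 3
    return "".join(out), store


def restore_comments(text, store):
    """Put the protected comments back in place of their placeholders."""
    return re.sub(r"\x00(\d+)\x00", lambda m: store[int(m.group(1))], text)
```

Any replacement run on the protected text cannot touch comment content, since the comments are no longer part of it until restore_comments is called.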

Remarks

  1. ^ The inner tags of wikisyntax are not kept in the HTML document and may be provided as unary XML.
