User:Krauss/arXiv-1

From Wikipedia, the free encyclopedia

This draft "user space article" was started in 2006 to detail some of the formalism behind the Web template engine and Template system articles, for the benefit of readers who desire a more rigorous treatment of the material than is accessible to a general audience. The draft was stopped to avoid conflicts with the No original research policy. The author was looking for sources...

...Four years later, no good additional source had been found. We believe that it is possible to organize what Wikipedians have presented as "template systems" in a more rigorous and consistent way. Now that there is a more consensual and mature view of cases and concepts, this draft is adopted for the collaborative construction of an arXiv.org article by users Krauss, Gabriel.scapin, and any other Wikipedian who understands the need for this work, with a special invitation to Dreftymac.

Introduction

There are many systems promoted as being template systems. Wikipedia listed about 60 "template engines" [note 1] and a dozen "template processors" [note 2], classified into at least three types of use (document generation, dynamic web page generation, and source code generation).

There is a high diversity of cases, applications, concepts and terminology associated with this "fuzzy field of study". There is therefore a demand for objective criteria to decide "what is and what is not a template system", and to classify template systems. This situation is not new for programmers and software engineers: Franklin and Graesser[1] dealt with a similar problem and offered a formal definition as the solution. The diversity of database query languages before the ANSI SQL standard was a similar problem[note 3].

A first formal syntactic definition of templates was given by Hsu and Yih[2], [3]. At that time (1997) the diversity of template engines and template languages had not yet emerged as a problem, and the focus was on extracting information from documents generated by templates. A seminal and widely accepted work is T. Parr's "Enforcing Strict Model-View Separation in Template Engines" of 2004[4], which offers a formal linguistic approach to defining templates and template engines. Parr's article has been cited mainly in web application and programming language contexts, and its focus is the template syntax and the MVC (Model-View-Controller software pattern, [ref]) use case.

The Hsu-Yih template definition is more generic about template processing and more restrictive about output documents, which it assumes to be structured. Parr's processing restrictions exclude from the definition some important cases that are "intuitively accepted as templates", such as XSLT[5], the W3C standard for templates. A general-purpose template definition is needed: one that integrates the two definitions and their use cases, and reconciles them with the intuitive view.

We understand that the parts (processor, template, input and output) are components of a whole system, and that the main concepts, template language and template processor, only have meaning within the whole-system view. Our proposal is to "evaluate a candidate system" by abstracting it as a black box and checking whether it meets a predetermined set of properties based on its input and output characteristics. This methodology ensures that our formal definition is based solely on general aspects of observable elements (input and output). Only after this characterization do we apply the syntactic considerations, where we extend Parr's and Hsu-Yih's definitions and suggest an approach to classification by syntax and black-box aspects.

We suggest that a more general definition and integrated models can delimit a field of study, which helps research fields (Information Extraction, Information Retrieval, Computational Linguistics, Software Engineering, etc.) to deal with template systems and the "template generation hypothesis".

Simplest templates

Parr (2006)[6] suggested a simple way to express templates, as string functions. Let,

k1, k2, ..., km string constants
c1, c2, ..., cn, attributes (string inputs)

a template function is any function tpl that only concatenates these elements,

tpl(c1, c2, ..., cn) = k1 | c1 | k2 | ... | cn | km

where the symbol "|" is the string concatenation operator, n and m (with m ≤ n+1) can range from 0 (no attributes or no constants) to any number, and the indexed variables are references that may appear in any order and combination. Example (different templates with the same constants and attributes),

tpl1(c1, c2) = k1 | c1 | k2 | c2 | k3
tpl2(c1, c2) = c1 | k1

Virtually all popular programming languages have a facility to express this kind of function. Usually the ki parameters are inline constants (string literals), for example:

tpl(c) = "Hello " | c | "!"
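The definition above can be sketched in Python. This is only an illustration under stated assumptions: the helper name make_template, the ("k", text) / ("c", index) encoding and the sample constants are hypothetical and do not come from Parr's paper.

```python
def make_template(*parts):
    """Build a template function from a sequence of parts.

    Each part is either ("k", text), a string constant, or
    ("c", index), a reference to the attribute at that position.
    """
    def tpl(*attrs):
        pieces = []
        for kind, value in parts:
            # constants are copied as-is, references are looked up in the inputs
            pieces.append(value if kind == "k" else attrs[value])
        return "".join(pieces)  # the "|" concatenation operator
    return tpl

# tpl1(c1, c2) = k1 | c1 | k2 | c2 | k3
tpl1 = make_template(("k", "Dear "), ("c", 0), ("k", ", order "), ("c", 1), ("k", " shipped."))
# tpl2(c1, c2) = c1 | k1   (same constant k1, different arrangement)
tpl2 = make_template(("c", 0), ("k", "Dear "))
# tpl(c) = "Hello " | c | "!"
hello = make_template(("k", "Hello "), ("c", 0), ("k", "!"))

print(hello("World"))  # Hello World!
```

The function only concatenates: it performs no computation on the attributes, which is what distinguishes this simplest kind of template from a general string function.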

Many languages also incorporate this functionality in a print function. It does something more, which template systems must also do: send the final result to an output[note 4]. A well-known example[7] is the printf function of the standard C programming language[8],

printf ("Hello %s!", c);

Now the string literals "Hello " and "!" are compacted into a single k argument, using the strategy of "placing placeholder marks". Another way to see this template function[7] is to name the constant part ("Hello %s!") the template, the attribute reference (c) the input, and the function the template processor. Without a formal definition, these terms are a little confusing: is the "template" a function, or is it the string constant with placeholders?

printf also has a sophisticated formatter based on data types (see the %i, %f, etc. marks). This is a good approach for general-purpose output.

Another common strategy to simplify the programmer's life is to merge the placeholder mark and the attribute reference into a single mark. Perl, PHP and others use this simple syntax:

print "Hello $c!";

To apply a format function, PHP uses a more complex syntax, including the format directive as an explicit function call,

print "Hello {$c->format()}!";
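For comparison, the same placeholder strategies can be sketched with Python's standard library. The variable names and sample values are assumptions made for illustration; only string.Template, the % operator and str.format are existing Python facilities.

```python
from string import Template

# The constant string with a placeholder plays the role of the "template";
# substitute() plays the role of the "template processor".
template = Template("Hello $c!")
print(template.substitute(c="World"))  # Hello World!

# printf style: the placeholder mark (%s) and the attribute reference are separate.
printf_style = "Hello %s!" % "World"

# A formatting directive inside the placeholder, similar in spirit to %f in printf.
formatted = "Total: {amount:.2f}".format(amount=3.14159)
```

In the first form the mark "$c" is at once the placeholder and the attribute reference, as in the Perl and PHP syntax above; in the second they are separate, as in printf.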

These concatenation and print examples allow us to understand, on a micro scale, the concepts and problems of templates. A desirable scale of work, to simplify the life of designers and programmers in the use cases of the section below, is (to output) the whole document. This change of scale, which is also a change of context, changes the requirements that characterize template systems.

Use cases and contexts

{figure here: content1+template1=content2; content1+template2=content3; content4+template1=content5}

In the process of generating "new content", starting with content embodied in a template or supplied as input to a template system, content is reused (see illustration).

A book, a letter, a song or a set of pixels on a screen all carry information that can be interpreted as content by humans; this content may or may not have been authored by a human (a database-generated report, for example, is not). In any case, authoring costs and the risk of misunderstanding (due to an inadequate form) might be reduced if digital content could be reused across different forms of use. Reuse can be defined as "the use of existing digital content to produce new content, or the application of existing content to a new context or setting"[9].

A short chronology of the automated reuse of digital contents:

In the 1960s: digital memory was very expensive for content storage, but software (assembler) source code was content of human interest. A strategy for reusing it was presented in 1959 as the macro instruction: "... to save time and clerical-type errors in writing sequence of instructions which are often repeated ..."[10]. A macro is like a subtemplate with optional parameters and without input attributes.

In the 1970s: some popular typewriters allowed repetitive typing[note 5], with a one-line display for editing single lines. The popular IBM Selectric models stored content for temporary use on digital media (tape or card), permitting reuse for corrections and for (hand-processed) form letters, e.g. changing the addressee name: "Dear Mr. John", "Dear Mrs. Mary". These processes allowed reuse not only in new letters, but also in invitations, adhesion contracts, memorandums, legal boilerplate, and any other type of repeated-content document. In the same decade, relational database management systems (R-DBMS) showed one of the best ways to "organize for reuse" fragments of content on mainframes. For raw-text databases, simple tools such as the AWK programming language arrived as an effective solution for raw-text report writing.

In the 1980s: word processors (WordStar, WordPerfect, XyWrite, MS-Word, Workwriter, etc.) and desktop publishing solutions (MacPublisher, PageMaker, LaTeX, QuarkXPress) arrived with personal computers (PCs) and printers. Reuse by "starting with a stored document as a model" was the simplest and most popular strategy. The modern user-interface paradigms of find/replace and copy/paste (the modern solution for repetitive typing) started with these publishing tools, and offer a "hand-processed template system". Automated mail merge (reuse of the main content in a form letter) and reuse of letterheads (reuse of the layout for a letter frame) were also possible, and increasingly used. Desktop publishing solutions offered the first style sheet languages, allowing wide reuse of layout in structured documents. By this time, macro languages were a standard part of many programming languages, as in the C preprocessor[11]. Object-oriented languages (e.g. C++) arrived offering a concept of "syntactical templates". Database (R-DBMS) report writers, such as Oracle ReportWriter[12] and Quik Reports[note 6], began to be used.

In the 1990s: the production of digital documents, for printers and for the growing range of digital media (HDs, CDs, LANs, WANs and the Internet), was consolidated. With the inception of the World Wide Web around the middle of the decade, the diffusion of digital content grew exponentially each year. The web page arrived as an important type of document and of network-connected resource. It can be seen as a letter or report "living" at a web address (URL). Behind the page's URL there is an HTTP server and, if the server can produce a fresh page for each visitor "perceived as different", we say that the web page is a dynamic web page.

In the 2000s: standards for digital content, such as XML, CSS, HTML (ISO 15445), PDF (ISO 32000), ODF (ISO 26300) and others, consolidated the use of template systems and associated "XML publishing" solutions. Standards for complex querying, such as SQL3 (ISO 9075:1999), XSLT[13], XQuery[14] and RDF[15], with better integration with standard output formats, make them better suited for use as report writers... Query languages are also used as template languages.

The highlighted terms can be organized into a sketch of a typology of the typical uses of template systems:

 FIG: 3 typical kinds of simple placeholder templates.
Document generation
With one template (e.g. a letter template) and many inputs, many documents are produced (e.g. the final letters). The result is a set of "template-generated documents". Main cases (see illustration):
  • Form letter: office memorandums, standard form (adhesion) contracts and legal boilerplate (reused fragments of law) are typical text candidates for this kind of template. They allow mass production of documents with similar content. Specific examples: appendix "Simplest examples", OpenOffice[16] or MS-Word[17].
  • Letter frame: also known as skin, letterhead or "header and footer frame" templates. Any content is "filled into the blank" of this kind of template. It allows mass production of documents with dissimilar content inside a "standard frame". A sample document with a corporate-identity letterhead, or a document model for a corporate memorandum, are well-known letter frames. Specific examples: appendix "Simplest examples", OpenOffice frames[18].
  • Query report: a report document produced from input information, such as an input form or a database query tool. Typical report templates are like a letter frame template with many placeholders. Examples: the report writers of OpenOffice Base, MS-Access, Oracle, etc.; a "confirm your data" report after a "please fill the form" interface; a report spreadsheet (like OpenOffice-Calc or MS-Excel) calculated from a spreadsheet of raw data.
Dynamic web page generation
With one template (e.g. a letter template) and one input (e.g. a posted form), one document is produced (e.g. the final letter). There are many possible inputs, so many different documents can be produced. This set of "possible documents" differs from the set of "concrete documents" of document generation. Form letter, letter frame and query report can be used as typical examples of content generation in a web page.
Source code generation
With a template language augmenting a programming language, software source code can be expanded and easily reused. New source code is generated by the template.[19] It is analogous to the document generation use cases, with documents replaced by software source code. Example: a macro processor, such as the C preprocessor[8] (UNIX cpp). Source code generation also illustrates the case where the script language and the output language are the same (or of the same kind), and the main cases where the criterion for separating the template language is not lexical but syntactical[20] (see metaprogramming[21]).
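The form letter and letter frame cases can be sketched as follows. This is a minimal illustration using Python's string.Template; the template texts, the generate helper and the sample addressees are hypothetical.

```python
from string import Template

# Form letter: the template carries the main content, the inputs vary.
form_letter = Template("Dear $title $name,\nYour contract is ready to sign.")

# Letter frame: the template carries only the frame, any content fills the blank.
letter_frame = Template("ACME Corp.\n---\n$body\n---\nPage footer")

def generate(template, inputs):
    """Document generation: one template, many inputs, many documents."""
    return [template.substitute(item) for item in inputs]

addressees = [
    {"title": "Mr.", "name": "John"},
    {"title": "Mrs.", "name": "Mary"},
]
letters = generate(form_letter, addressees)
framed = generate(letter_frame, [{"body": text} for text in letters])
```

In the dynamic web page case the same generate step would run once per request, with a single input, rather than once over a stored set of inputs.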

Template generation hypothesis

Observing two or more documents with very similar content, we can imagine that they were produced by a templating process. Typical boilerplate texts can be generated in at least two of the illustrative cases above:

  • Copies of the input content: input content containing the boilerplate text is replicated with many different letter frame templates, delivering many similar output documents.
  • Copies of the template content: each different input is combined with the same form letter template (which contains the boilerplate text), with a small customized variation in delivery.

So there are at least two situations where, given a set of documents with very similar content, we can imagine a common production process that explains this high similarity. There are many other cases where the similarity is not in the content but in the document structure, and many more of a "mixed kind".

Template systems are very popular and have been generating digital documents for years. Add to them the many other mass-produced documents created "by hand", with copy/paste and find/replace editing procedures. There are so many template-generated documents that the "template generation hypothesis" is hardly ever null. There are two main ways to use this hypothesis:

  1. In a statistical context (unknown templates): document clustering[22], classification[23],[note 7], linguistic analysis[24], plagiarism analysis[refs], cloned code detection[25], and many other document analyses that compare documents can refine their methods by making assertions about the probability of occurrence of template-generated documents.
  2. In an information extraction context (known templates): first proposed by Hsu and Yih[2] (1997). Template-based information extraction methodologies and algorithms assume the existence of similar (template-generated) documents in their working set. They perform a kind of "reverse engineering of the template" and extract data according to this template.
    An example, in an extreme case of information extraction, is the conversion of full-text scientific journal articles (raw text HTML, TXT, converted PDF or other) into XML NLM articles[26], undertaken by publishers that do not have an XML publishing pipeline but want to deposit articles in PubMed Central[27].
    Semi-automated tools like INERA's[28] do the conversion to the NLM markup. Since scientific articles have a rigid structure (with parts such as title, authors, affiliations, abstract, body, reference list, etc.), and journals maintain a stable style, it is always possible to use the template generation hypothesis on a large number of articles.

Both uses need a formal reference model of template systems.

Objectives

The goals of this article are:

  • Provide more inclusive concepts and definitions (than Parr's and Hsu-Yih's);
  • Provide support for the identification and classification of template systems;
  • Establish a reference model for the template generation hypothesis;
  • Provide support for standardizing the "field limits" and a consistent terminology in this field of study.

Black-box system characterization

Elements (C,T,P,R) on the dataflow representation.

... summary of arXiv-2 ...

Architecture characterization

There are several alternatives for placing the template system black box in a computer network context, characterizing architecture decisions. A client-server reference model is the most general and natural way to express the architecture context.

Template systems in different architecture contexts need different kinds of implementation and different rules of use, so the architecture context is also a criterion for organizing the diversity of template systems. The three groups illustrated below were first proposed in a 2006 Wikipedia article[29], which organized a list of web template systems: Outside server systems, Server-side systems, and Distributed systems.

A formal characterization of the network context "plugs into" the formal black-box model, and avoids mistakes about systems with cache strategies and remote references. Using the system notation (definitions for R, T, C, and P above) and adding a network notation:

Notation | Definition
A@X | "A is at X": the resource for A is at machine X, or in the same LAN (Local Area Network) as X. @X can be @C (at a client), @S (at a server, or within the same high-performance LAN), or @O (at another LAN, outside the server and outside the client).
A@X ← B@Y | "Information A, at X, is transferred from B, at Y": a message is sent and recorded.
A@X := B@X | Copy: no send process is required.
Outside (or cached) server architecture.

Outside server systems (or "local systems")

R@O := P@O(L@O,C@O)

The system acts only on the local transfer process. The "global transfer process" needs two steps:

  1. R@O := P@O(L@O,C@O)    Output production, with the template system.
  2. R@C ← R@S ← R@O    Publication (using another system or something like manual FTP) and distribution (e.g. HTTP browsing).
Server-side architecture.

Server-side systems   There is no flow between networks; everything is on the server-side network (or server machine).

R@C ← P@S(L@S,C@S)    "On-the-fly publication".

Or, with caching on the server, two steps:

  1. R@C1 ← Rcache@S := P@S(L@S,C@S)    "On-demand production" of R (first request) and caching.
  2. R@C2 ← Rcache@S    (next request), using the cache.
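The two-step cached case can be sketched as below. This is an assumption-laden illustration: the class CachingServer, its request method and the renders counter are hypothetical names, and Python's string.Template stands in for the processor P.

```python
from string import Template

class CachingServer:
    """Server-side template system with on-demand production and caching."""

    def __init__(self, library):
        self.library = library  # L@S: template name -> template text
        self.cache = {}         # Rcache@S
        self.renders = 0        # how many times P@S actually ran

    def request(self, name, content):
        """Serve R to a client; run P@S(L@S, C@S) only on a cache miss."""
        key = (name, tuple(sorted(content.items())))
        if key not in self.cache:
            self.renders += 1
            self.cache[key] = Template(self.library[name]).substitute(content)
        return self.cache[key]

server = CachingServer({"hello": "Hello $c!"})
first = server.request("hello", {"c": "World"})   # step 1: production and caching
second = server.request("hello", {"c": "World"})  # step 2: served from the cache
```

The cache key includes the content, so a different input C still triggers a fresh production.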
Client-side and distributed (decentralized) architectures.

Distributed systems   All other combinations, with one or more elements, but not all, on the server:

R@C := P@C(L@S,C@S)    Typical client-side case.
R@C := P@C(L@S1,C@S2)    Generic client-side case.
R@C ← P@X(L@Y,C@Z)    Generic distributed case. Any of these combinations characterizes a distributed case: (X,Y,Z) ∈ {(S1,S2,S3), (C,S1,S2), ..., (S,C,C)}.

In distributed systems there is also the possibility of using a "distributed library", L, where the templates are not all at the same resource. It is obtained simply by exchanging local references for remote ones:
L@* = {T1@S1, T2@S2, … Ti@Si, … TN@SN}.
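The three architecture groups can be told apart from the locations of P, L and C alone. The sketch below is one possible reading, not part of the original notation: the function name classify and the string labels are hypothetical, and it applies "all other combinations" literally, so anything that is not entirely at O or entirely at one server S is reported as distributed.

```python
def classify(p_at, l_at, c_at):
    """Classify a template system by where processor, library and content live.

    Locations are labels such as "O", "S", "C", or "S1", "S2" for distinct servers.
    """
    places = {p_at, l_at, c_at}
    if places == {"O"}:
        return "outside server"   # R@O := P@O(L@O,C@O)
    if places == {"S"}:
        return "server-side"      # R@C <- P@S(L@S,C@S)
    return "distributed"          # every other combination

print(classify("C", "S", "S"))  # distributed
```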

Dynamic interface considerations   The AJAX approach[30] uses page interface events (like a mouse click) to trigger a "refresh a document portion" process. A new template evaluation is requested and, treating the document portion as a document, we can use the template system model. For an elegant description of this context, however, we add one more constraint: the document was generated by a template system, and the output document structure (valid document schema) can be mapped onto templates.

Step-by-step, the AJAX approach is,

  1. R@C ← P@S(L,C1)    The browser receives the server page. A portion of the document was rendered by Ti ∈ L.
  2. The user event triggers a new request, which is sent to the server.
  3. Portion(R@C,i) ← P@S({Ti},C2)    The portion i (associated with Ti) of the document is replaced by the new one. The template system is at a web service (not a usual page server service).

Other combinations are possible. This "refreshed portion" strategy is of growing interest in web applications, ranging from "get a value" (where no template system is necessary) or "get a little piece" up to a whole-page refresh. However, portion refresh and strategies of distributed "fine-grained" processing (e.g. one template processor for each template of L) are out of the scope of this article. Our suggestion is not to generalize the template system definition in this way.

Template syntax characterization

The use of hooks is exemplified by the red marks. It permits the separation between logic (hidden in the blue designer's view) and design.

Informally, a simple template T is a "document with holes", where the holes are placeholders or macro references. A template T, from the "black-box system characterization", is an input itself, or an element of the library.


Output documents

The output data model, S, is not necessarily an explicit data model specification within the scope of the template system. For black-box modelling, an abstraction from an implicit definition is sufficient. For example, a textual document may have only an informal characterization, but its language (fixed, e.g., by the attribution of a language identifier[31]) and the language rules (dictionary, grammar, etc.) are part of the document data model. If the output is software source code, on the other hand, there is a formal and explicit data model, supplied by the compiler specifications.

A generic way to formally express the "output type", in a Transformational Linguistics view, is as a grammar[32].

Any human-readable output (for human analysers) is a valid output for the black-box approach but, for the characterization discussion below, we always assume "source code information", such as an OpenOffice XML packed document, instead of a "binary output", such as a PDF document, which can be converted (with loss of information) to source code (e.g. PDF text to raw TXT text).

The primary type of content is the TXT file: raw content with no structure and no associated type. A source code with this content is a literal...

"Source code" outputs, such as documents with document schemas (like DTD, XML Schema, or RELAX NG) and software source code (with an associated compiler or formal language definition), have this ...

Holes and logic

... description of what the template is and what the template script is ... transforming input content into output document...

Logic separation

... holes ... hooks ... hooks as delimiters...

Finally, if the template T is a document with holes, then from the lexical point of view these holes need to be expressed ...

Split model

A template T is a string that can be split (using a "hook criterion") into two distinct, non-empty token types:

  • t: contiguous fragments of the output document.
  • s: contiguous script fragments, such as expressions or instructions (simple instructions, statements, directives, or blocks of them).
    Note: a sequence of repeated s, as occurs with XSLT or ColdFusion, is merged into a single "contiguous s" block.

The resulting sequence of tokens is not arbitrary: theoretically, the "contiguous hypothesis" enforces a pattern that makes validation unnecessary. Technically, it can be validated by a regular expression: /^((t(st)*s?)|(s(ts)*t?))$/.
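The split and its validation can be sketched in Python. The hook syntax "<? ... ?>", the function names and the ";" used to join merged script fragments are assumptions made for the example; only the regular expression is taken from the text.

```python
import re

HOOK = re.compile(r"<\?(.*?)\?>", re.S)  # hypothetical hook criterion: <? script ?>
SPLIT_MODEL = re.compile(r"^((t(st)*s?)|(s(ts)*t?))$")

def split_template(text):
    """Split a template into ("t", fragment) and ("s", fragment) tokens."""
    tokens = []
    pos = 0
    for match in HOOK.finditer(text):
        if match.start() > pos:
            tokens.append(("t", text[pos:match.start()]))
        script = match.group(1).strip()
        if tokens and tokens[-1][0] == "s":
            # repeated s fragments are merged into one contiguous s block
            tokens[-1] = ("s", tokens[-1][1] + ";" + script)
        else:
            tokens.append(("s", script))
        pos = match.end()
    if pos < len(text):
        tokens.append(("t", text[pos:]))
    return tokens

def signature(tokens):
    """Reduce a token list to its string of token types, e.g. "tst"."""
    return "".join(kind for kind, _ in tokens)

sig = signature(split_template("Hello <?name?>!"))
print(sig, bool(SPLIT_MODEL.match(sig)))  # tst True
```

Because adjacent script fragments are merged and output fragments are contiguous by construction, the signature of any non-empty template produced by this splitter alternates between t and s, which is what the regular expression checks.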

Formally it is given by a generative grammar G, with nonterminals N = {T, A, B}, terminals Σ = {t, s}, T the start symbol, and the following production rules:   T → tA | sB ;   A → sB | ε ;   B → tA | ε .

Notes:

About the convention for "embed" terminology: if the template T starts with t, it is a template with an "output language embedded with the script"; otherwise (it starts with s) it is a template with a "script embedded with the output language". Languages like XQuery permit both of these "template embedding modes".
About the point of view: designers see the script fragments as "holes", so designers always see (by a background effect or a viewer/editor choice) a template as an "output language embedded with a script".
About Parr's definition: this definition is a generalization of the "Parr split model" [4], which must start with t and is not subject to system context considerations.

Affinity between script and output languages

The resulting pattern (a [st]+ sequence) does not need to reflect a well-balanced XML structure, or a script with nested loops. But this kind of behaviour reflects the level of affinity between the languages.

The paramount characteristic of a template scripting language "is whether it operates at the lexical or syntactical level" [20].

Conceptually, lexical processing by P precedes "output language parsing", and is thus ignorant of the syntax of the underlying output language. The t fragments are "transparent" to s and vice versa; they have no affinity.

Typically, "lexically embedded scripts" such as ASP, PHP and JSP can be lexically transformed into a full script: output language fragments (t) are wrapped in invocations of print-like output instructions.
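This lexical transformation can be sketched as follows. The example is a simplification under stated assumptions: the hook syntax "<? ... ?>", the emit function and the helper names are hypothetical, script fragments are restricted to single Python statements without blocks, and exec should only ever be used on trusted templates.

```python
import re

HOOK = re.compile(r"<\?(.*?)\?>", re.S)  # hypothetical hook syntax

def to_script(template):
    """Turn a lexically embedded template into a full script.

    Output fragments (t) are wrapped in emit(...) calls;
    script fragments (s) are copied unchanged.
    """
    lines = []
    pos = 0
    for match in HOOK.finditer(template):
        if match.start() > pos:
            lines.append("emit(%r)" % template[pos:match.start()])
        lines.append(match.group(1).strip())
        pos = match.end()
    if pos < len(template):
        lines.append("emit(%r)" % template[pos:])
    return "\n".join(lines)

def run(template, content):
    """Execute the generated script with the content as its variables."""
    out = []
    env = dict(content, emit=out.append)
    exec(to_script(template), env)
    return "".join(out)

source = "Hello <? emit(name.upper()) ?>!"
script = to_script(source)
result = run(source, {"name": "World"})
print(result)  # Hello WORLD!
```

The transformation never inspects the content of the t fragments, which illustrates why lexical processing is ignorant of the output language syntax.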

"In contrast, syntactical languages operate on parse trees (...) which of course requires knowledge of the host language and its grammar. (...) the syntax may help convey the meaning of and reflect the nature of the abstraction."[20].

There are also hybrid levels. XSLT, XQuery, the TeX macro language, and Haml offer more affinity than lexical languages, recognizing the basic rules of the output language: s and t can be balanced and/or complemented.

To add? Types of affinity:

  • Lexical isolation: neither of them (script or document language) recognizes the hook as a construct, so it is a good separator.
  • Lexical affinity: they recognize the hooks, as when both script and document are XML languages (see XSLT and XHTML).
  • Syntactical affinity: the hooks are not lexical but natural constructs of the languages, see template metaprogramming... These kinds of languages have problems with the "split model" (to be reviewed).

Template script language types

T is a grammar G where the s script fragments are specifications to the engine, to generate output using the content, C, and the output fragments, t.

The simplest script type only makes scalar variable references. Parr defined[4] four other types of template:

  • Regular (Parr's def.2): has an "internal grammar" restricted to two sub-token types, a (scalar or multi-valued) variable reference and a sub-template reference. Both references are side-effect free, and may iterate over a set of multi-valued variables (from content C) or literals (t).
  • Context-free (Parr's def.3): limited to referencing scalar variables and sub-templates, but more general than a regular language "(...) since it can handle balanced tree structures"[33].
  • Context-sensitive (Parr's def.4): a context-free template augmented to allow predicated template application; that is, template references or inclusion of sub-templates only in certain grammatical contexts. Predicates operate on variables and on the template tree structure itself, and are limited to operations on cj and the surrounding template (t). Actions and predicates are side-effect free.
  • Unrestricted (Parr's def.1): like context-sensitive, but unrestricted computationally and syntactically. Script fragments behave as Turing machines.

Languages hierarchy

There are two levels of abstraction for the template language definition:

  1. Template instance grammar: the template text is split into elements of a generative grammar, and the output is analysed. T is a string characterized as a template grammar and, by analysing its output behaviour, it can also be characterized as an instance of a grammar type.
  2. Template language: a language generated by a "split model grammar", with a fixed standard meta-grammar determined by the specific "script language". The same scheme of types may be used to define generic language types (groups of standards).

Template languages can be grouped in a hierarchy:

Language class | Template type (formal name) | Notes
Recursively enumerable | Complex (unrestricted template) | Turing complete.
Recursive | "Near Complex" (recursive) | Sub-templates and template references. Grouped with Programmable. Does not exist in the Chomsky scheme.
Context-sensitive | Programmable (context-sensitive) | Uses IFs.
Context-free | Iterable (context-free) | There are loops.
Regular | (regular template) | Grouped with Iterable.
(not a language) | Simple | Does not exist in the Parr scheme.
A Venn diagram showing the template language types as sets of features.

The main dividing line, from the perspective of good separation principles, lies between Programmable and Iterable (context-sensitive / context-free). From the algorithmic perspective, there is an upper line (complex languages have the power to express any algorithm) and a lower line (simple languages express no algorithm).

These template language groups,

Level 3 - Complex template language,
Level 2 - Programmable template language,
Level 1 - Iterable template language,
Level 0 - Simple template language;

also form a hierarchy of feature sets. The logic of the hierarchy concerns minimal features: if a language has the "minimal features" of a level N > 0, it must also have all the minimal features of level N-1.
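The "minimal features" rule can be sketched as a small classifier. The feature names below are assumptions chosen to mirror the notes of the table (scalar references, loops, IFs, Turing completeness); they are not a fixed vocabulary of this article.

```python
LEVEL_NAMES = ["Simple", "Iterable", "Programmable", "Complex"]

# Hypothetical minimal feature of each level, indexed by level number.
MINIMAL_FEATURES = [
    {"scalar_reference"},  # Level 0 - Simple
    {"loop"},              # Level 1 - Iterable
    {"conditional"},       # Level 2 - Programmable
    {"turing_complete"},   # Level 3 - Complex
]

def level(features):
    """Return the highest level N whose minimal features, and those of
    every level below N, are all present; -1 if not even level 0 is met."""
    reached = -1
    for n, required in enumerate(MINIMAL_FEATURES):
        if required <= features:
            reached = n
        else:
            break  # a missing lower level blocks all higher levels
    return reached

n = level({"scalar_reference", "loop", "conditional"})
print(n, LEVEL_NAMES[n])  # 2 Programmable
```

Under this reading, a language with conditionals but without loops stays at level 0, since the minimal features of level 1 are missing.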

Loose characterization

... Loose criteria for hook or language classification: the self-discipline of the programmer determines what kind of hooks and/or language is wanted. ... Any system that satisfies the black-box criteria can satisfy the "loose syntax criteria"... A simple C program will be a "strict sense template" if the programmer's self-discipline controls the syntax.

Similarly, Parr's suggestion of MVC separation in unrestricted languages ... is a loose MVC template characterization.

Template "decision driven types"

For designers and programmers specifying projects or dividing tasks, the template library L, when n(L)>1, needs some organization. They need to make choices about the template script language and the arrangement of the template set.

The first consideration is how a template system with a specific library makes decisions about template selection. From a black-box perspective, a template library L is equipped with this decision power if, for any input contents C1, C2 that differ only by a flag, C1=(C,true), C2=(C,false), it can internalize the choice "use L1 if flag else use L2". The behaviour is expressed by the property: P(L,C1)=P(L1,C) and P(L,C2)=P(L2,C), with P(L1,C)≠P(L2,C).

There are two main paradigms for arranging templates:

  • Script-driven template arrangements: the script has explicit IF commands.
      • Black-box characteristic: needs a default template (for sub-template selection), or there is more than one library, so that the system's controller makes the choice.
      • Syntactic characteristic: equipped with conditional template blocks and/or sub-template calls (references). In a library, the default template expresses the logic of template calls.
      • Designer's perception: the template processor "selects template fragments and fills them with content".
      • Programmer's perception: all the logic about template decisions (if/then or switch/case logic) is explicit in the script.
      • Examples: users of SSI[34], XQuery[35] and Smarty[36] prefer this arrangement.
  • Content-driven template arrangements: the decision is made "by data matching".
      • Black-box characteristic: there is no default template; the input content "makes the decision" about the first one. The library internalizes the control of the default template.
      • Syntactic characteristic: has "data matching" or some associative mechanism for sub-template calls. To supply this feature in a script-driven language, the programmer can use a dispatcher or another event-driven pattern.
      • Programmer's perception: part of the logic is implicit (not expressed in the script) and resides in the processor as predefined rules.
      • Examples: XSLT[37], and attribute languages like those of Zope[38] (TAL and METAL specifications).
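
The contrast between the two arrangements can be sketched in Python. This is a rough illustration under assumptions: the function names, the `RULES` table and the sample contents are hypothetical, and `string.Template` stands in for a real template language. The rule table only imitates the "data matching" idea; it does not reproduce XSLT or TAL semantics.

```python
from string import Template

# Script-driven: the default template logic contains an explicit IF.
def render_script_driven(content: dict) -> str:
    if content["kind"] == "invoice":
        return Template("Invoice no. $number").substitute(content)
    else:
        return Template("Note: $text").substitute(content)

# Content-driven: each rule pairs a match condition with a template;
# the processor (not the script) decides which rule fires.
RULES = [
    (lambda c: "number" in c, Template("Invoice no. $number")),
    (lambda c: "text" in c, Template("Note: $text")),
]

def render_content_driven(content: dict) -> str:
    for matches, template in RULES:
        if matches(content):
            return template.substitute(content)
    raise LookupError("no template matches this content")

invoice = {"kind": "invoice", "number": "42"}
note = {"kind": "note", "text": "call back"}

print(render_script_driven(invoice))
print(render_content_driven(invoice))
print(render_content_driven(note))
```

Both functions produce the same output for the same content; what differs is where the selection logic is written, which is the point of the two perceptions listed above.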

As the examples show, some template languages do not support one or the other type of arrangement. When the script language allows it, a mixed arrangement can be used, and some care with organization is recommended.

For the purposes of template system classification, each "driven type" may or may not be explicitly (and conveniently) supported.

Vocation of template systems

Shannon's diagram of a general communications system[39]. ...
... see arXiv-2 ...

A template system is part of a communication system, so the template system's goals are characterized by their relationships with the communication system. These goals are not "requirements" (of a system software development process), but observed goals of a large number of existing systems; thus, we seek to express the "vocational aspects" of existing template systems.

Template systems are specialized tools for uniformization of the form and reuse of the content of documents, in a "communicating by documents" context. To generalize this vocation, we need some further detail.

There are two significant and illustrative types of reuse achieved by template systems, as shown in the introduction section and in the modus operandi property:

  • reuse of the same input content: many outputs are possible, each one customizing a specific delivery pack for the (reused) content. See the use of letter frame templates as a typical use case.
  • reuse of the same template content: each different input is combined with the same template (to replicate the same main content in the output), with a small customized delivery variation. See the use of form letter templates as a typical use case.
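
The two types of reuse can be sketched in Python as follows. This is an illustrative sketch only: the template texts, the recipient data and the frame names are assumptions introduced here, and `string.Template` stands in for a template processor.

```python
from string import Template

# Reuse of the same template content (form letter): one template, many inputs.
form_letter = Template("Dear $name, your order $order has shipped.")
recipients = [
    {"name": "Ana", "order": "A-17"},
    {"name": "Bruno", "order": "B-02"},
]
letters = [form_letter.substitute(r) for r in recipients]

# Reuse of the same input content (letter frame): one content, many frames.
frames = {
    "plain": Template("$body\n"),
    "html": Template("<p>$body</p>"),
}
content = {"body": "The meeting starts at noon."}
deliveries = {name: frame.substitute(content) for name, frame in frames.items()}

for letter in letters:
    print(letter)
for name, text in deliveries.items():
    print(name, repr(text))
```

In the first half the template is the constant and the input varies; in the second half the input is the constant and the "delivery pack" varies.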

Seen in the broader context of a communication process, the "reuse by template system" strategy, compared with other strategies, ensures some advantages:

  1. Information integrity: "reuse by replication" (copies of pieces of information and/or exact compliance with an information structure) has greater integrity than interpretations, translations, summaries, and other strategies.
  2. Transport adequacy: a template system "packs" the information (of the input and/or template) more appropriately for its channel. Other strategies, without customized transport, can compromise the final reception of the information.
  3. Information objectivity: a template system allows filtering of inputs, ensuring that only the relevant portion of the information is delivered. The templating strategy offers less traffic and less "time for reading".

These three aspects generalize and detail the reuse vocation.

Uniformization of a set of documents, such as repeated structure, layout and page composition, has advantages that can be generalized as a solution to coordination problems[40], which are very frequent in a communication context.

...

Finally, as "specialized tools", all relevant template system use cases play an important role in the division of labor between humans and computers, which reinforces the delimitation of the system. In this context, the primary goal behind using a template system "is to separate logic and data computations from the display of such data in both thought and mechanism"[6].

Templates in design patterns

.... just recall the types of language (based only on copies and simple comparisons) that guarantee MVC ... and that features such as number formatting, etc. are loose enforcement in an unrestricted language ...

MVC enforcement can be characterized partially in the black box (copy-only operation, but there is no way to objectively verify a complex comparison), and fully in the choice of language (simple, regular, context-free, etc.) ...

Strict sense template systems

Alternative definitions, for didactic simplification and comparison.

Strict black box

Only minor changes and constraints must be added to the general black-box characterization.

  • Instead of a library, L, use only one template, T. This restriction removes languages like XSLT but simplifies the scope of analysis (how to select which templates to use).
  • Principle 3, "No information generated by P", must be satisfied. It removes all unrestricted template languages.
  • A new principle, 5, "No recycling templates". It enforces the assumption that "all is done in one step".

Then, a strict template system is:

R = P(T, C)

where

 ... see arXiv-2 ...

#  | Property                          | Notes
1  | P(ε,C) = ε                        | Analogous to the general black-box.
2  | ...                               | Analogous to the general black-box.
3  | Info[P(T, C)] ⊆ Info(T) ∪ Info(C) | ...
4  |                                   | Analogous to the general black-box.
5  | Clean(R) = R                      | No recycling templates.
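
Properties 1 and 3 can be checked on a toy processor with the Python sketch below. It rests on loud assumptions: Info is defined elsewhere (see arXiv-2), so the `info` function here is only a stand-in that takes the set of word tokens of a text; `P` is a hypothetical copy-only processor built on `string.Template`; and property 5 is left out because Clean is not defined in this section.

```python
import re
from string import Template

def P(template: str, content: dict) -> str:
    """Hypothetical strict processor: copy-only substitution."""
    return Template(template).safe_substitute(content)

def info(text: str) -> set:
    """Assumed stand-in for Info: the set of word tokens in a text."""
    return set(re.findall(r"\w+", text))

def satisfies_property_3(result: str, template: str, content: dict) -> bool:
    """Check Info[P(T, C)] ⊆ Info(T) ∪ Info(C) under the token-set assumption."""
    content_info = set()
    for value in content.values():
        content_info |= info(str(value))
    return info(result) <= info(template) | content_info

T = "Dear $name, welcome."
C = {"name": "Ana"}
R = P(T, C)

print(P("", C) == "")                                       # property 1
print(satisfies_property_3(R, T, C))                        # copy-only output
print(satisfies_property_3(R + " Generated today.", T, C))  # output with added text
```

The last call models a processor that adds information of its own; under the token-set assumption it fails the check, which is the kind of behaviour principle 3 is meant to exclude.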

Strict syntax

... Parr and others require that the template always begins and ends (balanced) with document text, not with script (this eliminates XSLT) ... Although not formalized by Parr, it is implicit that the hooks are evident (so one can assume that strict hooks are syntactically invariant) (this eliminates the C code example) ... finally, the script language cannot be Turing-complete (this eliminates PHP and the like) ... Give examples of template systems that are strict syntax.

Use in hypothesis tests and classification

... The "Template generation hypothesis" ... the only thing we have is the final document ... so the only and best hypothesis is that the documents were generated by the simplest possible processes ... Strict templates are the simplest.

... the classes are "form letter", "letter frame", and "another".

... the test of the "Template generation hypothesis" becomes more evident for the form letter ....

Discussion

... return to the examples from the introduction ...

... the discussion revolves around the application of the classifications, degree of adherence, advantages and limitations of the definitions, etc. ...

Notes

  1. ^ Under the entry "Template engine (web)", a list of approximately 60 engines was compiled. The list and the article have remained stable since. http://en.wikipedia.org/wiki/Template_engine_%28web%29
  2. ^ See http://en.wikipedia.org/wiki/Template_processor
  3. ^ ... See reference on the evolution of query languages ...
  4. ^ The "process output" concept of the operating systems (OS).
  5. ^ IBM's Mag Tape Selectric Typewriter (MTST) and later Mag Card Selectric (MCST) is cited as "early devices of this kind" by Wikipedia http://en.wikipedia.org/w/index.php?title=Word_processor&oldid=384228695
  6. ^ "Quik Reports" originated "Crystal Reports", see 1990s use at http://www.accessmylibrary.com/coms2/summary_0286-9259653_ITM
  7. ^ van Rijsbergen C. J. ("Information retrieval", Butterworths, 1979) cited by ISP states that "documents having similar contents are also relevant to the same query", in an information retrieval context.

References

  1. ^ "Is it an Agent, or just a Program?: A Taxonomy for Autonomous Agents", Stan Franklin and Art Graesser (1996), Springer-Verlag, pp. 21-35. psu.edu 10.1.1.52.1255
  2. ^ a b "Template-Based Information Mining from HTML Documents", Jane Yung-jen Hsu and Wen-tau Yih (1997). psu.edu 10.1.1.43.7968
  3. ^ Yih's thesis, 1997
  4. ^ a b c "Enforcing Strict Model-View Separation in Template Engines", T. Parr (2004). In Proceedings International WWW Conference, New York, USA. Access also at USFCA University.
  5. ^ ? http://www.w3c.org/...xslt ? defines the language; check whether there are also recommendations on processing
  6. ^ a b Terence Parr (2006), "A Functional Language For Generating Structured Text". ACM. pdf
  7. ^ a b Mario Blaževic (2009), "Composable Templates". OmniMark Developer Resources. psu:10.1.1.84.2413.
  8. ^ a b Defined by the ISO/IEC 9899:1999 standard of the C language.
  9. ^ Michael G. Shanley et al. (2009) "The prospects for increasing the reuse of digital training content". RAND Corporation. ISBN 978-0-8330-4661-1. http://www.rand.org/pubs/monographs/2009/RAND_MG732.pdf
  10. ^ Irwin D. Greenwald, Maureen Kane (1959). "The Share 709 System: Programming and Modification". Journal of the ACM 6 (2): 128-133.
  11. ^ see ref C-P
  12. ^ http://download.oracle.com/docs/html/B13895_01/orbr_formletter.htm#g1012114
  13. ^ http://www.w3.org/TR/xslt20
  14. ^ http://www.w3.org/TR/xquery
  15. ^ http://www.w3.org/RDF
  16. ^ Gurdy Leete, Ellen Finkelstein, Mary Leete (2004), "OpenOffice.org for dummies", 359 pp. ISBN 978-0764542220.
  17. ^ Gary B. Shelly, Thomas J. Cashman, Misty E. Vermaat (2007), "Microsoft Office Word 2007: Complete Concepts and Techniques". Cengage Learning, 560 pp.
  18. ^ Andy Channelle (2009), "Beginning OpenOffice 3: From Novice to Professional". Apress, ISBN 978-1-4302-1590-5
  19. ^ http://en.wikipedia.org/wiki/Automatic_programming
  20. ^ a b c "Growing Languages with Metamorphic Syntax Macros", C. Brabrand & M. I. Schwartzbach. (2000). University of Aarhus, Denmark.
  21. ^ "Generative Programming: Methods, Tools, and Applications", Krzysztof Czarnecki and Ulrich W. Eisenecker; Addison Wesley, 2000.
  22. ^ Thomas Gottron (2008), "Clustering template based web documents", In "Proceedings of the IR research, 30th European conference on Advances in information retrieval". portal.acm.
  23. ^ x
  24. ^ Christopher D. Manning and Hinrich Schütze (1999) "Foundations of Statistical Natural Language Processing", MIT Press. ISBN 978-0-262-13360-9
  25. ^ Hamid Abdul Basit and Stan Jarzabek (2007), "A Data Mining Approach for Detecting Higher-level Clones in Software". IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, TSE-0079-0207.R2 1. comp.nus.edu.sg
  26. ^ http://dtd.nlm.nih.gov/publishing
  27. ^ http://www.ncbi.nlm.nih.gov/pmc
  28. ^ http://www.inera.com/extylesinfo.shtml
  29. ^ See "Kinds of template systems" at http://en.wikipedia.org/wiki/Web_template_system
    Tools like Microsoft FrontPage and Adobe/Macromedia Dreamweaver are classified as "Outside server template systems"; and "AJAX system tools" like Mjt or an "on browser XSLT processor" are classified as "Client side template systems". All other popular web template systems are "Server side template systems".
  30. ^ See e.g. http://en.wikipedia.org/wiki/Ajax_%28programming%29
  31. ^ http://www.w3.org/TR/REC-xml/#sec-lang-tag
  32. ^ Jackendoff, Ray (1974). Semantic Interpretation in Generative Grammar. MIT Press.
  33. ^ "Domain Specific Languages for Interactive Web Services", C. Brabrand (2002). PhD Dissertation, University of Aarhus, Denmark.
  34. ^ Server Side Includes. http://httpd.apache.org/docs/2.2/mod/mod_include.html
  35. ^ http://www.w3.org/XML/Query
  36. ^ http://www.smarty.net
  37. ^ http://www.w3.org/TR/xslt
  38. ^ http://www.zope.org/Documentation/Books/ZopeBook/2_6Edition/AppendixC.stx
  39. ^ C.E. Shannon and W. Weaver (1963), "The Mathematical Theory of Communication". ISBN 0-25-272548-4.
  40. ^ Edna Ullmann-Margalit (1977), "The Emergence of Norms". Oxford University Press.

Appendix