User talk:D'oh!/sandbox0

From Wikipedia, the free encyclopedia

Comments

Rename "Query" to "Select"? I know it's not an actual command to use, but for clarity. It will sound an awful lot like MySQL, but "select" is more "natural". —  HELLKNOWZ  ▎TALK 17:38, 15 September 2010 (UTC)

Bah, I only read half of it and typed before reading all of it. Also preview and save are too close... —  HELLKNOWZ  ▎TALK 17:43, 15 September 2010 (UTC)
What do you mean by "preview and save are too close"? d'oh! talk 11:17, 16 September 2010 (UTC)
The <form> buttons "Preview" and "Save Page" are too close. I got frustrated by yesterday's wiki timeouts and misclicked. P.S. I have this page on watchlist, so you don't need to post {{tb}}. —  HELLKNOWZ  ▎TALK 11:21, 16 September 2010 (UTC)
No problem, I used the {{tb}} just in case you weren't watching the page. Also I started writing the doc on how Pallet will function. d'oh! talk 11:47, 16 September 2010 (UTC)

Suggestion: Making namespace optional and default to articles (i.e. 0). So "Get "Main page";" —  HELLKNOWZ  ▎TALK 13:56, 16 September 2010 (UTC)

Suggestion: Page and page list separation, as in, different commands for retrieving one page and a page list. E.g. "Read "Main Page";", but "Pages in "Category:Random";". Then when applicable "Delete (Pages in "Category:Random");" —  HELLKNOWZ  ▎TALK 14:03, 16 September 2010 (UTC)

Both done. d'oh! talk 15:52, 16 September 2010 (UTC)

"when a variable isn't wrapped with quotation marks and one of the variables contains a reserved word (e.g. in, where, delete) the statement can cause unexpected results." I would say remove this option. Keep the syntax strict. Such as, "all variables have to be wrapped in quotation marks." —  HELLKNOWZ  ▎TALK 16:48, 17 September 2010 (UTC)

Done. d'oh! talk 17:49, 17 September 2010 (UTC)
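
The strict-quoting rule agreed above can be illustrated with a small tokenizer. This is only a sketch in Python, not Pallet's actual implementation: the reserved-word set, the token kinds and the `tokenize` function are hypothetical names chosen for illustration.

```python
import re

# Hypothetical reserved words, taken from the examples mentioned in this thread.
RESERVED = {"GET", "READ", "DELETE", "IN", "WHERE", "PAGES", "FROM"}

# A token is a double-quoted string, a bare word, or a statement terminator.
TOKEN_RE = re.compile(r'\s*(?:"([^"]*)"|([A-Za-z_]+)|(;))')


def tokenize(statement):
    """Split a statement into (kind, value) pairs, rejecting unquoted values."""
    tokens = []
    pos = 0
    while pos < len(statement):
        if statement[pos:].strip() == "":
            break  # only trailing whitespace is left
        match = TOKEN_RE.match(statement, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        quoted, word, terminator = match.groups()
        if quoted is not None:
            # Anything inside quotation marks is always a value, even "in".
            tokens.append(("value", quoted))
        elif word is not None:
            # A bare word must be a reserved word; values must be quoted.
            if word.upper() not in RESERVED:
                raise SyntaxError(f"value {word!r} must be wrapped in quotation marks")
            tokens.append(("keyword", word.upper()))
        else:
            tokens.append(("end", terminator))
        pos = match.end()
    return tokens
```

With this rule, `DELETE "in";` is unambiguous because the quoted `in` can only be a value, while an unquoted page name such as `DELETE Sandbox;` is rejected outright instead of being guessed at.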

{{1:variable}} syntax seems very complex, and I can foresee enough issues when implementing such recursive things. I think the framework is best kept with more statements and not one super-long one with parentheses and level-specific variables/calls. Consider the following as an example: "Define myPage as Page "Sandbox"; Read myPage. {#if "content" = "" | Delete myPage; };". I'm not sure if this can all be written in one line, but if it can, I guess it would be rather complex. —  HELLKNOWZ  ▎TALK 16:48, 17 September 2010 (UTC)

Extra whitespace and line breaks can be used throughout. I was going to do both magic variables and defining variables, but if magic variables are too complex I will drop them. d'oh! talk 17:49, 17 September 2010 (UTC)

Also do note that while most of the syntax is straightforward for "coder-type" editors, non-natural language and reserved word order may not sit well with people from a non-programming background. Think of SQL (which I can tell you know well enough). "SELECT name,address FROM stuff WHERE age=18" seems more or less natural. I suggest having these commands the same way, "GET 10 PAGES FROM Category:Stuff" reads better than "PAGES 10 IN Category:Stuff". —  HELLKNOWZ  ▎TALK 16:48, 17 September 2010 (UTC)

I have started this. d'oh! talk 17:49, 17 September 2010 (UTC)
Done, it should read more easily now. d'oh! talk 12:28, 18 September 2010 (UTC)

On a side note, only your documentation will have code highlighting, meaning all underlines, boldfaces, etc. will be stripped when users write their own scripts. Again, borrowing from MySQL, capital reserved words work very nicely - "Select name from table" versus "SELECT name FROM table". Just a thought on how I would do it. —  HELLKNOWZ  ▎TALK 16:48, 17 September 2010 (UTC)

The code highlighting is used to help explain the items in the command definition. The code highlighting is not required when the code is interpreted. But I will use capital letters for the reserved words, as it does make it easier to read. d'oh! talk 17:49, 17 September 2010 (UTC)

Implementation

Are you not doing this via api.php? I notice you suggest using index.php?...&action=raw. —  HELLKNOWZ  ▎TALK 11:54, 18 September 2010 (UTC)

Yes, I am coding it up with api.php. But for the READ command I am using action=raw from index.php. action=raw uses less bandwidth and doesn't cause high server load, unlike the api method. The api method comes wrapped in JSON (or one of the many other formats), which uses more bandwidth and causes a high load on the server when decoding the JSON. d'oh! talk 12:25, 18 September 2010 (UTC)
Well, yeah, squid will return the page faster with action=raw. The wrapping in XML (which is what I use) doesn't take much time; it is a simple matter of appending the appropriate string to the MySQL result. The live query itself is slower on servers, yes. But I assume you need more than just page content: you need a timestamp and an edit token at least, and any other info the user might want to use. —  HELLKNOWZ  ▎TALK 12:58, 18 September 2010 (UTC)
*d'oh!* You are right, I forgot about that. Thanks, here's your cookie. :) d'oh! talk 14:15, 18 September 2010 (UTC)
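
For comparison, the two request styles discussed above can be sketched as follows. This is a minimal Python illustration that only builds the URLs and does not contact any server; the base URLs assume English Wikipedia, and the helper names `raw_url` and `api_url` are hypothetical. Fetching an edit token would need a further API request, which is not shown here.

```python
from urllib.parse import urlencode

# Assumed base URLs for English Wikipedia; another wiki would use its own paths.
INDEX_URL = "https://en.wikipedia.org/w/index.php"
API_URL = "https://en.wikipedia.org/w/api.php"


def raw_url(title):
    """URL that returns only the page's wikitext, with no metadata."""
    return INDEX_URL + "?" + urlencode({"title": title, "action": "raw"})


def api_url(title):
    """URL that returns the wikitext plus the revision timestamp, wrapped in JSON."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content|timestamp",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)
```

The action=raw response is just the page text, whereas the api.php response has to be decoded but carries the timestamp alongside the content in a single request.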

Cookieeee... Another question — what language are you planning to use? Are you licensing this under GNU/CCbySA/PD/CR? —  HELLKNOWZ  ▎TALK 15:12, 18 September 2010 (UTC)

I am pro-open-source, so PHP, and I am thinking of using CCbySA since the spec is under CCbySA already. d'oh! talk 15:39, 18 September 2010 (UTC)
I would support PHP and can help with coding if needed. I have decent experience. —  HELLKNOWZ  ▎TALK 17:00, 18 September 2010 (UTC)
When the spec is completed and a community starts to build around the project, I would like to see developers come on board. But I will leave choosing developers and trusted editors up to the community. d'oh! talk 13:49, 19 September 2010 (UTC)

Few ideas

This looks like a really interesting idea, but I have a few suggestions about the language:

  • I think this language should be written in a way that makes it easy to use for its purpose, so I think having some real tasks that could be expressed using it would be good (and keep them in mind when designing the language). Otherwise, features may need to be added at a later time, resulting in a mess similar to the current MediaWiki syntax.
  • I think much more information about pages is needed, e.g. about revisions, categories they are in etc. and not just in a SELECT. I would like object-like syntax (e.g. #page.revisions[0].user.is_anon), but maybe that would complicate the syntax too much.
  • I don't like the syntax for tools at all. I don't think basing them on ParserFunctions is a good idea. While the syntax may be familiar to template-editors, it's already quite different (and I'm sure it's going to be more and more), which is confusing. And there's no need for such complicated syntax at all. Something like IF test THEN action ELSE another action seems much better to me. Svick (talk) 14:34, 18 September 2010 (UTC)
    • I concur that "IF something THEN thisthing ELSE thatthing" (like Pascal) reads significantly better. And the point about editors not familiar with templates is also true. —  HELLKNOWZ  ▎TALK 15:09, 18 September 2010 (UTC)
  • I don't see the need for #expr: Why shouldn't arithmetic expressions work directly? The ambiguity of #var1 + #var2 could be resolved by using another operator for concatenation (e.g. .).
  • Why is there #ifeq? Isn't ordinary if with = enough?
  • I think for is useful too, in addition to foreach. (Or at least some way to generate numbers.) Svick (talk) 14:34, 18 September 2010 (UTC)
    • I think FOREACH should be the priority in syntax, it is much less prone to screwing up than using FOR. E.g.
      FOREACH page IN PAGES FROM CATEGORY "Video games"
        DELETE page;
      END.
      reads nicely (and is influenced by Pascal a lot). To further improve readability, shortcuts can be added, such as, FOREACH page IN CATEGORY "Video games". Simple and friendly. —  HELLKNOWZ  ▎TALK 15:09, 18 September 2010 (UTC)
  • As already mentioned above, magic variables, especially with the ability to access higher levels using the number prefixes, seem to be too complicated to me.

Svick (talk) 14:34, 18 September 2010 (UTC)

I like the idea of having an object-like syntax. Although it will be hard to explain to users with limited programming experience, it will be functional and contain all the page information. Plus the end result will be very clean:

#pages = articles.category("Computer jargon").limit(10);

foreach (#pages as #page) {
  #page.remove_category("Jargon");
  #page.remove_category("Computer jargon");
}

On the other hand, if you want to do the same thing in the current syntax, it's not much better:

DEFINE pages AS (SELECT 10 articles FROM "Category:Computer jargon");

{#foreach:#pages AS page | EDIT #page {#regexp: "{\[\[Category:(Computer jargon|Jargon)\]\]}" | {#contents#} | "" } }

What do you both think of using object-like syntax instead? d'oh! talk 15:29, 18 September 2010 (UTC)
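
The #regexp step in the current-syntax example boils down to an ordinary regular-expression substitution. As a rough sketch in Python, assuming plain category links without sort keys (the `remove_categories` helper is a hypothetical name, and swallowing the trailing newline is an addition not present in the pattern above):

```python
import re

# Matches [[Category:Computer jargon]] or [[Category:Jargon]],
# plus one trailing newline if present so no blank line is left behind.
CATEGORY_RE = re.compile(r"\[\[Category:(Computer jargon|Jargon)\]\]\n?")


def remove_categories(wikitext):
    """Return the wikitext with the two category links removed."""
    return CATEGORY_RE.sub("", wikitext)
```

A link written with a sort key, such as [[Category:Jargon|key]], would not match this pattern, which is one reason a dedicated remove_category operation may be less error-prone than a hand-written expression.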

Object oriented syntax as shown above is not true to simplicity and ease of use by new users. I thought the goal of this project is to provide a platform where a first-time editor can write his bot task in 30 min with near-natural language. I think we ought to define a proper goal and syntax direction here. Below is what I meant:
#pages = GET 10 PAGES FROM CATEGORY "Computer jargon"

FOR EACH #page IN #pages
    IF NOT #page->protected AND NOT #page->semiprotected THEN
        READ #page
        REMOVE CATEGORY "Jargon" FROM #page
        SAVE #page
    END
END

You will see me strongly supporting this kind of syntax, which is as close to natural language as possible. I do realize how backwards "REMOVE CATEGORY "Jargon" FROM #page" looks to a programmer, as it "should" be "#page -> REMOVE CATEGORY "Jargon"" or similar. But the goal here is the end-user, right? —  HELLKNOWZ  ▎TALK 16:58, 18 September 2010 (UTC)
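
To show what an interpreter would have to do for the natural-language script above, here is a rough Python equivalent that works on in-memory data only. It is a sketch under stated assumptions: the `Page` class, `remove_category` and `run` are hypothetical names, and the READ and SAVE steps are stand-ins rather than real wiki requests.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical in-memory stand-in for a wiki page."""
    title: str
    text: str
    protected: bool = False
    semiprotected: bool = False


def remove_category(page, category):
    """Drop the [[Category:...]] link for `category` from the page text."""
    page.text = page.text.replace(f"[[Category:{category}]]", "")


def run(pages):
    """Mirror the FOR EACH / IF NOT ... THEN structure of the script."""
    saved = []
    for page in pages:
        # Protected and semi-protected pages are skipped, as in the IF line.
        if not page.protected and not page.semiprotected:
            remove_category(page, "Jargon")
            saved.append(page.title)  # stands in for SAVE #page
    return saved
```

Each statement of the script maps onto one line of ordinary code, which suggests the choice between the two syntaxes is mostly about readability for the intended audience rather than expressive power.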

The goals of the project have changed a few times, and they were not clear to begin with. So I took both code examples to non-programmers and received a blank expression for both. After I told them not to think of it as code but as sentences from a person telling them what is going to happen, they surprisingly pointed to the object-like syntax as "less complicated". They also asked why I am creating a language for someone who doesn't want to create bots, which got me questioning the target audience for this language. Maybe instead of targeting editors with no programming experience, we should target editors who have little to good programming experience but are scared off from writing a bot by the mundane tasks that go with it, such as writing the code to edit the common parts of pages (e.g. categories), working with the MediaWiki APIs, writing code to connect to the APIs, and getting hold of a server to run the bot. If these mundane tasks were removed, it would be easier to create bots. In this case, I think the goals for this project should now be:

  1. Remove the mundane tasks.
  2. A nice and easy syntax where users can still create bots without knowing the language in full, think of PHP.
  3. A functional language full of useful features, so experienced programmers can create complex bots.

d'oh! talk 09:35, 19 September 2010 (UTC)

You can only go so far in complexity; an experienced programmer needing a big task would rarely use a scripting language, because it does not allow any code of their own. You cannot edit images, browse websites, parse special files or dumps, do tricky calculations, etc. The framework is limited to its available list of commands. What you are proposing is, I think, closer to a library for a real language, such as PHP or C. Then it becomes an API framework, like WP:Peachy.
From an implementation/programming perspective, you would have the same functions and loops as the corresponding scripting statements. There seems to be little benefit in moving from PHP's $page = $site->loadPage("Main Page"); $page->edit("Ha-ha!", "Updating."); to #page = article.load("Main Page"); #page.edit("Ha-ha!", "Updating.");.
Though don't take my comments as (too much of) criticism. I am just too fixed on the "natural language" thing. I can see your proposal as (very) useful for implementing mundane bot requests that would otherwise need full BRFAs and such. Though I must say, I would expect a very high level of scrutiny for a framework that allows changing stuff this quickly and easily. —  HELLKNOWZ  ▎TALK 11:05, 19 September 2010 (UTC)
True, this language cannot do tricky calculations or run bots requiring large server power, but don't be so quick to rule out anything else. Remember, PHP is a scripting language too. I know I am not going to win over most experienced programmers from our current tools, but this language is a new tool that some programmers may use. Everyone who commented, including you, has given (or led me to) very good ideas, so please keep them coming. I know you like the natural language idea, but I just see it becoming a mess. The object-like syntax will become a mess too, but it will be easier to manage and easier to expand when new features come out, e.g. Pending changes (if that comes about). I know I will receive a high level of scrutiny from BRFA and other groups, and I am ready for it. d'oh! talk 13:49, 19 September 2010 (UTC)

Comment

D'oh, good idea. This looks like an online version of AWB. AWB allows adding custom modules and plugins for non-standard tasks. Ganeshk (talk) 13:20, 9 October 2010 (UTC)