User talk:WP 1.0 bot/Third generation
This page discusses the new version of the WP 1.0 bot that is under development. This will replace the current version. Please add your discussion points below. Walkerma (talk) 03:02, 11 October 2018 (UTC)
More eyes needed
The 1.0 bot is now a pretty major piece of the editorial infrastructure of the encyclopedia. Would it make sense to advertise this as an RFC, or ask about it on WP:CENT and/or WP:VPT? Titoxd(?!?) 22:21, 11 October 2018 (UTC)
- @Titoxd: Hi there! So glad to see your name there! Sorry I was slow to follow up - things are just very busy IRL. Yes, it would be a great idea. Do you know how to do that? I've never done one before. Walkerma (talk) 04:57, 17 October 2018 (UTC)
Bot features
Below is a first list of useful features for the new assessment bot. JoeHebda (talk) 14:38, 18 October 2018 (UTC)
Feature: Bot status page
Status | Message |
---|---|
In progress | Working |
Completed | Done; custom message |
Warning | - |
Feature: Progress bar
Feature: Lockup or error notification
Feature: Force bot to create log file
In addition to on-request assessment table generation, add the ability to force the bot to create a log file.
Feature: Collision avoidance
Simple flagging logic for every WP:
- Flag-inuse-assessment
- Flag-inuse-log
- Flag-inuse-popular-pages
1. Community Tech bot clears all Popular-pages flags at the start of its monthly job. For each WP, it checks the two other flags before working on that WP.
If neither flag is in use, set the Popular-pages "In-use" flag ON. After that WP's Popular pages table is created, set its "In-use" flag off.
If either flag is in use, skip to the next WP. After the first pass through all WPs, go back and retry the skipped WPs until all are done.
---
2. WP 3.0 bot, Daily job (for assessments and logs)
---
3. WP 3.0 bot, On Request jobs (for assessments only)
---
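A minimal Python sketch of the collision-avoidance flags for job 1 (the Community Tech bot side): clear its own flag at the start, skip any WP whose assessment or log flag is on, and retry skipped WPs on later passes. The flag names, the in-memory `FlagStore`, the `build_table` callback and the `max_passes` limit are all hypothetical; where the flags would really be stored is not specified in the proposal.

```python
from dataclasses import dataclass, field

# Hypothetical flag names, one per job type, mirroring the three flags listed above.
ASSESSMENT = "inuse-assessment"
LOG = "inuse-log"
POPULAR = "inuse-popular-pages"


@dataclass
class FlagStore:
    """In-memory stand-in for wherever the per-WP flags would really be stored (assumed)."""
    flags: dict = field(default_factory=dict)  # WP name -> set of flags currently ON

    def is_on(self, wp, flag):
        return flag in self.flags.get(wp, set())

    def turn_on(self, wp, flag):
        self.flags.setdefault(wp, set()).add(flag)

    def turn_off(self, wp, flag):
        self.flags.get(wp, set()).discard(flag)

    def clear_everywhere(self, flag):
        for wp_flags in self.flags.values():
            wp_flags.discard(flag)


def run_popular_pages(wikiprojects, store, build_table, max_passes=10):
    """Build each WP's table once, skipping WPs the other jobs are using and retrying them."""
    store.clear_everywhere(POPULAR)  # clear our own flag at the start of the monthly job
    pending = list(wikiprojects)
    done = []
    for _ in range(max_passes):
        if not pending:
            break
        skipped = []
        for wp in pending:
            if store.is_on(wp, ASSESSMENT) or store.is_on(wp, LOG):
                skipped.append(wp)  # in use by the WP 1.0 bot: retry on the next pass
                continue
            store.turn_on(wp, POPULAR)  # mark this WP as in use
            try:
                build_table(wp)
                done.append(wp)
            finally:
                store.turn_off(wp, POPULAR)  # always release the flag, even on error
        pending = skipped
    return done, pending  # pending lists WPs still blocked after max_passes
```

A real job would presumably wait between passes instead of retrying immediately; `max_passes` is only there so the sketch cannot loop forever if another job never releases its flag.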
Feature: Daily job schedule
Instead of a daily job with one fixed start time, create a job that reads 7 database records (one per day of the week), finds the record matching the current day of the week, and starts processing at that record's time. JoeHebda (talk) 13:29, 22 October 2018 (UTC)
For example:
Day | Job start | Comment |
---|---|---|
Monday | 02:00 | - |
Tuesday | 01:30 | - |
Wednesday | 01:15 | - |
Thursday | 00:45 | - |
Friday | 00:05 | - |
Saturday | 00:05 | - |
Sunday | 00:05 | - |
- Before setting this schedule, we need to investigate peak workload times at the Wikipedia data centers.
- We need to know the approximate total processing runtime, assuming the bot ever runs through without stopping because of errors. This will really help with scheduling.
JoeHebda (talk) 13:40, 22 October 2018 (UTC)
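A minimal sketch of the day-of-week schedule, assuming the 7 records are simple (day, start time) pairs and that the times in the example table are UTC (the table does not say). `SCHEDULE`, `start_time_for` and `seconds_until_next_start` are hypothetical names, not part of any existing bot.

```python
from datetime import datetime, time, timedelta

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# The 7 "database records" from the example table, held here as a plain dict (an assumption).
# The table gives no time zone; naive datetimes are assumed to be UTC.
SCHEDULE = {
    "Monday": time(2, 0),
    "Tuesday": time(1, 30),
    "Wednesday": time(1, 15),
    "Thursday": time(0, 45),
    "Friday": time(0, 5),
    "Saturday": time(0, 5),
    "Sunday": time(0, 5),
}


def start_time_for(day):
    """Return the start datetime from the record matching the given day's weekday."""
    record = SCHEDULE[DAYS[day.weekday()]]  # weekday(): Monday == 0 ... Sunday == 6
    return datetime.combine(day.date(), record)


def seconds_until_next_start(now):
    """Seconds to wait before the next daily job should begin."""
    start = start_time_for(now)
    if start <= now:
        # Today's slot has already passed, so use tomorrow's record instead.
        start = start_time_for(now + timedelta(days=1))
    return (start - now).total_seconds()
```

Using `weekday()` rather than a formatted day name avoids depending on the server's locale settings.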
Feature: Category tree
The WP 1.0 bot uses the category data in this updated gist. Recently the bot started processing WikiProject Military history multiple times (4 or 8 times) and eventually stops or stalls after many hours. A search of the category data shows this WP is listed 61 times when all its task forces are included.
The new bot needs to be driven by a different or better category tree. For example, looking at Wikipedia:WikiProject Catholicism/Popular pages, there is another bot (Community Tech bot) doing assessments, and I have not heard any comments about it halting or failing.
Other considerations:
- Skip task forces? This would avoid complex interconnections between WPs.
- Skip inactive WikiProjects? These can always be updated manually as needed.
- Split the A to Z alphabet so the bot workload is spread from Monday to Sunday? Weekly updates are better than none.
- Schedule WP Biography processing as a single job?
JoeHebda (talk) 16:15, 23 October 2018 (UTC)
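One possible way to explore the first and third considerations, sketched in Python under stated assumptions: the category data is reduced to hypothetical (category, WikiProject) pairs, so a project listed many times (such as Military history with its task forces) collapses to a single entry, and the A to Z split divides the alphabet into 7 contiguous letter ranges. Splitting by letters does not balance the actual workload, which depends on project size (WP Biography, for example).

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def unique_projects(category_entries):
    """Collapse hypothetical (category, project) pairs so each WikiProject appears once."""
    # dict.fromkeys keeps first-seen order and drops repeats
    return list(dict.fromkeys(project for _category, project in category_entries))


def split_by_weekday(projects):
    """Assign projects to days by first letter, using 7 contiguous ranges of A to Z."""
    buckets = {day: [] for day in DAYS}
    for name in sorted(projects):
        first = name[:1].upper()
        if "A" <= first <= "Z":
            # A..Z (0..25) scaled onto 0..6: Monday gets A-D, ..., Sunday gets X-Z
            index = (ord(first) - ord("A")) * len(DAYS) // 26
        else:
            index = 0  # digits, punctuation, etc. go to Monday (arbitrary choice)
        buckets[DAYS[index]].append(name)
    return buckets
```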
Feature: Tags on "User contributions" log file
Distinguish between daily and on-request processing.
- For daily WP 1.0 bot processing, add a "D" to each line of the log.
- For on-request enwp10 tool processing, add an "R" to each line of the log.
The advantage is being able to identify when on-request processing occurs during the daily update. Alternative codes could be "B" for bot and "T" for tool. JoeHebda (talk) 20:18, 20 December 2018 (UTC)
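A small sketch of the proposed "D"/"R" tags. The log line format and the function names are assumptions; the codes could just as easily be "B" and "T". The second function illustrates the stated advantage: picking out on-request lines that fall inside a daily run.

```python
from enum import Enum


class RunType(Enum):
    DAILY = "D"    # scheduled daily WP 1.0 bot run
    REQUEST = "R"  # on-request run started from the enwp10 tool


def tag_log_lines(lines, run_type):
    """Prefix every log line with the one-letter code for the kind of run."""
    return [f"{run_type.value} {line}" for line in lines]


def on_request_during_daily(tagged_lines):
    """Return the R lines that fall between the first and last D line of a log."""
    daily = [i for i, line in enumerate(tagged_lines) if line.startswith("D ")]
    if not daily:
        return []
    first, last = daily[0], daily[-1]
    return [line for line in tagged_lines[first:last + 1] if line.startswith("R ")]
```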
General need for WP1.0
The first and second generation bots are great at compiling metadata about articles and article collections. However, turning those lists of articles into offline collections still requires a lot of manual intervention. Probably this work would require a different piece of code, separate from the current WP 1.0 bot, but I want to list the work so we can consider it. The necessary steps for turning metadata into usable offline collections are as follows:
- Compute the WikiProject "relative importance" score for each WikiProject, based on the lead article(s) for that project.
- Use the bot's data to generate lists of articles like this one, using a combination of the quality, importance and "external interest points score" for each article. The bot should be able to compile these WikiProject lists into one grand master list, while taking into account the following:
- Make sure to adjust scores to allow for WikiProject "relative importance" (see above).
- Allow the user to customise the collection list to meet their needs. For example, a user putting together a collection for use in Zimbabwe may choose to give African WikiProjects a higher importance than others, to meet the local needs and interest. A medical group might want to emphasise articles involving medicine, chemistry and biology, but still include some non-medical content for their target end-users.
- Remove duplication between lists (e.g., Marie Curie might appear under WP:Physics and WP:Chemistry), and make sure that each article is given the highest score possible (e.g., Curie might be only mid-importance for Poland, but top importance for chemistry, and we need to base it on the latter).
- Once a complete list of scores has been created, remove lower score articles based on a cutoff score.
- With the master collection list in hand, we need to find the best recent RevID for each article using ORES (currently active) and/or WikiTrust (inactive but may be re-activated).
- Create the ZIM file using the list of articles and related RevIDs.
Once this process has been completed, the ZIM file can be shared on the Kiwix website, and through end-user groups such as Internet-in-a-Box. Walkerma (talk) 17:09, 18 October 2018 (UTC)
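The scoring and merging steps above (WikiProject relative importance, user customisation, de-duplication keeping the highest score, and a cutoff) can be sketched in Python. The point values, the additive formula and all names here are hypothetical placeholders, not the actual WP 1.0 selection scores; the RevID selection and ZIM creation steps are not covered.

```python
from dataclasses import dataclass

# Hypothetical point values; the real WP 1.0 scoring is not given in this thread.
QUALITY_POINTS = {"FA": 500, "GA": 400, "B": 300, "C": 225, "Start": 150, "Stub": 50}
IMPORTANCE_POINTS = {"Top": 400, "High": 300, "Mid": 200, "Low": 100}


@dataclass
class Assessment:
    article: str
    project: str
    quality: str
    importance: str
    external_interest: float  # stand-in for the "external interest points score"


def article_score(a, project_weight, user_weight):
    """Combine quality, importance and external interest, then apply both weights."""
    base = QUALITY_POINTS[a.quality] + IMPORTANCE_POINTS[a.importance] + a.external_interest
    return base * project_weight * user_weight


def build_master_list(assessments, project_weights, user_weights=None, cutoff=0.0):
    """Merge per-project lists, keep each article's best score, and drop low scorers."""
    user_weights = user_weights or {}
    best = {}
    for a in assessments:
        score = article_score(
            a,
            project_weights.get(a.project, 1.0),  # WikiProject "relative importance"
            user_weights.get(a.project, 1.0),     # user customisation, e.g. boost Africa
        )
        if score > best.get(a.article, float("-inf")):
            best[a.article] = score  # de-duplicate, keeping the highest score
    kept = [(title, s) for title, s in best.items() if s >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Keeping the maximum, rather than summing across projects, follows the Curie example: the article is judged by its best (chemistry) assessment, not penalised or double-counted for appearing in several lists.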
WP BOT 3.0 - Processing summary
Step 1: Startup - At the beginning of the process, set all WP status lights to a plain black circle.
---
Step 2: In use - While a WP is being processed, set its status to yellow.
---
Step 3a: Done - When WP processing completes successfully, set its status to green and skip to the next WP.
---
Step 3b: Error - When WP processing has an error, set its status to red and skip to the next WP.
---
When all WP processing is finished, every status light will be green, or some may be red.
Step 4: Post a list of the red (error) WPs to the "Bot User" page, including the timestamp of each error.
---
JoeHebda (talk) 02:30, 20 October 2018 (UTC)
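A minimal sketch of the status-light steps, assuming a per-WP `process_one` callback (hypothetical) and an in-memory status dict. The key point is that an error in one WP turns it red and the run continues, instead of stopping the whole bot. The report format for the "Bot User" page is also an assumption.

```python
from datetime import datetime, timezone
from enum import Enum


class Light(Enum):
    BLACK = "not started"
    YELLOW = "in use"
    GREEN = "done"
    RED = "error"


def process_all(wikiprojects, process_one, clock=lambda: datetime.now(timezone.utc)):
    """Run every WP, recording a status light per WP and a timestamped error list."""
    status = {wp: Light.BLACK for wp in wikiprojects}  # Step 1: all lights black
    errors = []
    for wp in wikiprojects:
        status[wp] = Light.YELLOW                      # Step 2: in use
        try:
            process_one(wp)
        except Exception as exc:
            status[wp] = Light.RED                     # Step 3b: error, move on
            errors.append((wp, clock(), str(exc)))
        else:
            status[wp] = Light.GREEN                   # Step 3a: done
    return status, errors


def error_report(errors):
    """Step 4: format the red WPs, with error timestamps, as wikitext list lines."""
    return "\n".join(
        f"* {wp}: {when:%H:%M, %d %B %Y} (UTC) - {message}" for wp, when, message in errors
    )
```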
Audit report - talk page WikiProjects
To clean up errors on talk pages effectively, an audit report needs to be run. Criteria to be checked:
- Duplicate WikiProject lines
- Duplicates of workgroup names & wikiprojects for same workgroup
- Standard spelling of WP names, for example: WPMILHIST, WPmilhist, WikiProject Military history
- Duplicate WPs in the category tree
There may be more criteria which need to be added. Regards, JoeHebda (talk) 15:53, 1 November 2018 (UTC)
- What can I do to help with this BOT? Adamdaley (talk) 03:24, 16 November 2018 (UTC)
- @Adamdaley: - The WP 1.0 bot is being "rewritten" with Python so if you know anyone on WP with Python programming skillset please send them the link to this page so they can collaborate & contribute their talent. Regards, JoeHebda (talk) 14:51, 16 November 2018 (UTC)
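A rough sketch of two of the audit criteria above: duplicate WikiProject banners on one talk page, and variant spellings such as WPMILHIST, WPmilhist and WikiProject Military history. The alias table and the simple regex are assumptions; a real audit would need the full list of banner redirects and a proper wikitext parser.

```python
import re
from collections import Counter

# Hypothetical alias table mapping lower-cased variants to one canonical banner name.
ALIASES = {
    "wpmilhist": "WikiProject Military history",
    "wikiproject military history": "WikiProject Military history",
}

# Captures a template name: "{{", the name, then either "|" or "}}".
BANNER = re.compile(r"\{\{\s*([^|}]+?)\s*(?:\||\}\})")


def canonical(name):
    """Map a banner name to its canonical spelling, if it is a known variant."""
    return ALIASES.get(name.strip().lower(), name.strip())


def duplicate_banners(wikitext):
    """Return canonical banner names that appear more than once on a talk page."""
    names = [canonical(m.group(1)) for m in BANNER.finditer(wikitext)]
    return sorted(name for name, count in Counter(names).items() if count > 1)
```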
Unblock for 1 hour?
Can I request that the Bot 1.0 be unblocked for 1 hour so I can update WP:MILHIST/Biographies and WP:MILHIST – Please? Adamdaley (talk) 03:08, 16 November 2018 (UTC)
- @Adamdaley: - Since the bot was blocked at Wikipedia:Bots/Noticeboard I will copy-and-paste your question there. I myself don't know how to block or unblock. Wish there was a button for those two functions.
- Also be aware that when you request an assessment with enwp10, it updates only the assessment tables, not the quality logs. JoeHebda (talk) 14:37, 16 November 2018 (UTC)
Status update: 2019-07-24
Greetings @Audiodude: - Lately I have not been checking the logs for WP 1.0 bot. For the last few days it has not been running for all WPs. For example:
- 19 July - 212 wikiprojects processed
- 20 July - 212 wikiprojects processed
- 21 July - 216 wikiprojects processed
- 22 July - 210 wikiprojects processed
- 23 July - 39 wikiprojects processed
I'm wondering if this is a bug in the new bot version, or just a smaller test set?
Will the log show the "Requester"? Is this feature going to be added? For today (24 July): after User:WP 1.0 bot/Tables/Custom/Roads-1 at 20:31, 23 July 2019, User:WP 1.0 bot/Tables/Project/Biography (military) was run ELEVEN times consecutively between 23:36, 23 July 2019 and 06:39, 24 July 2019, without any other WPs in between. Could this be a live person on that WikiProject repeatedly requesting updates, rather than a bug in the bot? Or articles with that WP coded incorrectly, driving the bot into repeats? Regards, JoeHebda (talk) 13:28, 24 July 2019 (UTC)
- @JoeHebda: Hi Joe, I moved your comment to the talk page, as the article page is more for "official" announcements about the status of the bot and not a suitable place for discussion. Hope you don't mind. Anyways, I'm not sure what you mean by "not running for all WikiProjects". Running how? We have been experimenting with the scope of the new bot versus the old bot over the past week or two, but that should be transparent to the end users. Specifically, if you recall from the discussions on this talk page, the bot is currently banned from creating logs and only creates tables.
- Currently, no work has been done on the uploading part of the bot, which actually generates the tables, and no work has been done on the web interface, the page where you "kick off" an update. When that happens, we should be able to accommodate your feature request to mark manually requested updates in the edit summary. Hope this helps, audiodude (talk) 06:16, 26 July 2019 (UTC)
Number of WikiProjects processed
@Audiodude: - In addition to WP 1.0 bot, for the last several weeks I've been involved with reporting issues/errors with Community Tech bot, which also processes all WikiProjects (monthly) to produce the "Popular pages" table. CTb completed its July run and processed 795 WikiProjects here.
Sorry, but I'm confused as to why WP 1.0 bot is only processing about 200 WikiProjects daily. For example, the revision history of User:WP 1.0 bot/Tables/Project/Saints shows processing at 20:34, 14 July 2019, then nothing until 20:03, 24 July 2019. JoeHebda (talk) 13:31, 26 July 2019 (UTC)
- Hi @JoeHebda: What you're saying makes a lot of sense. I think during that period (Jul 14 - Jul 24) there were errors in the old bot and we were trying to get the new bot up to speed, so we were tinkering with it, which led to some instability. I think as of today the bot updated tables for about 1000 projects, which seems pretty accurate given that I believe there are 2100 projects, but many of them are dead/empty/have no updates.
- As of today, the new bot is processing all of the updates (not uploads/table posting) for all projects, so it should be more consistent going forward. User:Walkerma is planning on an announcement of that fact in some capacity soon, I believe. Hope this helps, audiodude (talk) 07:44, 1 August 2019 (UTC)