Wikipedia talk:Database reports/Archive 3
This is an archive of past discussions about Wikipedia:Database reports. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Request: Files with 'conflicted' licensing
In other words, files that are currently tagged as having both a 'free' and 'non-free' license, or which have a 'free' license but also have a 'fair-use' rationale present.
I'm currently working through an earlier list someone provided on the toolserver here: http://toolserver.org/~betacommand/reports/miss_tagged_files.txt
but feel that doing this via a weekly report would greatly assist cleanup efforts.
I am currently tagging the ones I find with {{conflicted-license}}, which categorises them into Category:Wikipedia files with conflicted copyright information, a category whose contents could reasonably be excluded from the report.
Sfan00 IMG (talk) 12:08, 4 January 2010 (UTC)
- Is there some way to identify images using a free license? Is there a master category of free license templates somewhere? I know all non-free files go into Category:All non-free media, but there doesn't seem to be a corresponding category for free media. --MZMcBride (talk) 17:55, 4 January 2010 (UTC)
- Category:Free_images and its subcategories is a possibility, but it's subcategories, not just one great big one...
Sfan00 IMG (talk) 22:19, 5 January 2010 (UTC)
- OK, I've looked into this further: potentially conflicted files are ones in both Category:All free media and Category:All non-free media. Sfan00 IMG (talk) 11:14, 16 January 2010 (UTC)
- Further to the last message, someone called erwin on #wikimedia-toolserver came up with the following -
SELECT page_title
FROM page
JOIN categorylinks AS c1
ON c1.cl_from = page_id
JOIN categorylinks AS c2
ON c2.cl_from = page_id
WHERE page_namespace = 6
AND c1.cl_to = 'All_free_media'
AND c2.cl_to = 'All_non-free_media'
It would need a report to go with it but in essence the query is straightforward. :) Sfan00 IMG (talk) 11:28, 18 January 2010 (UTC)
- Done: Files with conflicting categorization (configuration). --MZMcBride (talk) 05:14, 25 April 2010 (UTC)
Broken reports
- Articles containing red-linked files (configuration) — Commons subquery issue. Fixed!
- File description pages containing no templates (configuration) — Commons subquery issue. Fixed!
- File description pages containing no templates or categories (configuration) — Commons subquery issue. Fixed!
- Indefinitely semi-protected articles (configuration) — needs to be paginated. Fixed!
- Mistagged non-free content (configuration) — Commons subquery issue; also needs a rename to match other reports. Fixed!
- Templates containing red-linked files (configuration) — Commons subquery issue. Fixed!
--MZMcBride (talk) 19:33, 9 March 2010 (UTC)
Update: The Commons subquery issue is described here: https://jira.toolserver.org/browse/TS-549. --MZMcBride (talk) 05:33, 11 March 2010 (UTC)
- Could I prod you into running the indef semi prot one? There's an ongoing WP:AN about indef semi's and I'd like to see the current data. –xenotalk 18:42, 23 March 2010 (UTC)
- Cats are evil. --MZMcBride (talk) 07:19, 24 March 2010 (UTC)
- Lol. Just dogs for future, then =p Cheers, –xenotalk 15:27, 25 March 2010 (UTC)
Candidates for Autoreview privileges
I would like to generate a report on editors who are eligible for autoreview privileges (and don't already have them):
- Search for all editors with more than 50 created non-redirect articles.
- Remove entries which already have administrator or autoreview privileges.
I suppose this report could be run quarterly—once every three months or so. I asked around to see if anyone would be willing to make a bot to do this, and I was pointed here by Tim1357. —Preceding unsigned comment added by Cryptic C62 (talk • contribs)
- I think someone beat me to this: jira:DBQ-87. I can still make a report with the provided query if you'd like. If not, we'll count this as resolved. --MZMcBride (talk) 03:11, 17 March 2010 (UTC)
- Yay! I am more than slightly clueless, so bear with me here: Is there a way to use DBQ-87 to generate a new report every few months? Or is it just a static report? In the latter case, I think it would be helpful to make the report here. On the other hand, this query generated over 900 results, so it may not even be necessary to run it again any time soon considering how long it would take to process that many candidates. Your call. --Cryptic C62 · Talk 04:01, 17 March 2010 (UTC)
- The DBQ is static. The ticket has been resolved and isn't set to re-run in the future. A report here would be set to update every X weeks or months or whatever. (This is actually how most queries got run before I set up "Database reports" with automation.) If/when you need the query re-run, just leave a note here and I'll use that code that Mauro wrote to make something that updates however often you'd like. --MZMcBride (talk) 04:15, 17 March 2010 (UTC)
- That sounds like an excellent plan to me, thanks mate! I'll come back in a few months when this batch is done. In the meantime, here's something to think about: Of the candidates that these queries generate, some will probably not be granted autoreview privileges. In order to avoid having these users show up in every subsequent report, I could maintain a subpage that lists the rejected candidates. I assume it would be fairly straightforward for the automated report to perform an additional check to see if the user is on that list, correct? --Cryptic C62 · Talk 04:21, 17 March 2010 (UTC)
- Simple enough to program. Something similar is already done with the edit count list (currently living at User:MZMcBride/Sandbox 3), which has both an opt out list (for unflagged bots) and an anonymize list. You may run into political issues making a list of users, though. Sometimes people get testy with such things (look up "User:Kelly Martin/B" sometime if you're bored). Just let me know sometime in the future and we'll figure out a decent solution then. --MZMcBride (talk) 04:29, 17 March 2010 (UTC)
- Yup yup, sounds good. --Cryptic C62 · Talk 04:33, 17 March 2010 (UTC)
Yikes, there's a problem! It seems that the line of code r2.rev_timestamp > '20091231235959'
isn't working. Many of the editors on the list are no longer editing. For now I can just check their most recent contributions, but what can be done to rectify this permanently? --Cryptic C62 · Talk 02:18, 18 March 2010 (UTC)
- So... I guess I didn't get beat out after all? Can you give me an example user that's wrongly listed in the current list and an example user that's correctly listed? I should be able to run this report properly this week. Thanks. --MZMcBride (talk) 02:22, 18 March 2010 (UTC)
- Certainly. 2fort5r, 45ossington, and 4meter4 were all correctly listed and approved, as they meet all the requirements and have edited in 2010. 40fifw0, 6afraidof7, and 9Nak were all incorrectly listed, as they meet the requirements but have not edited in 2010. On a side-note, it may be useful to add a check that the user is not currently blocked. I assume this would also be easy to implement? --Cryptic C62 · Talk 02:33, 18 March 2010 (UTC)
- The static report is becoming less useful over time. In the last batch of 6 candidates, 2 had already been given the autoreview privilege since the report was first compiled, plus there's always the problem of inactive users. I've started an exclusion list at User:Cryptic C62/Exclusion, any chance ya'll could run the report again using the new criteria? --Cryptic C62 · Talk 16:05, 23 April 2010 (UTC)
- What are your exclusion criteria for that user subpage? The same as listed above? --MZMcBride (talk) 05:36, 25 April 2010 (UTC)
If a user is not suitable for the autoreview privilege but might become suitable later (for example, if they haven't edited/created articles in a while), then I don't put them on the exclusion list. If a user is not suitable for the privilege and they are unlikely to become suitable later (for example, if they are indefinitely blocked or have a number of article warnings on their talk page), then I do put them on the exclusion page. Also, if I nominate a user but an admin rejects the nomination, then I will put that user on the exclusion list. Basically, I just want to avoid having to process the same users over and over again if there's no chance that the later nominations will be successful. The ideal report would produce users with all of the following attributes:
- 50+ created articles
- Not autoreview, not administrator
- Most recent edit after 1 April 2010
- Not blocked
- Not on exclusion list
Does this help? --Cryptic C62 · Talk 18:08, 25 April 2010 (UTC)
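For reference, a rough, untested sketch of how such a query might look against the toolserver replica (the 50-article threshold and the 1 April 2010 cutoff are placeholders taken from the criteria above; counting page-creating revisions via rev_parent_id = 0 is only an approximation, and the exclusion-list check would still have to be done by the bot, since that list lives on a wiki page rather than in the database):
SELECT user_name, COUNT(*) AS articles_created
FROM user
JOIN revision ON rev_user = user_id
JOIN page ON page_id = rev_page
WHERE page_namespace = 0
AND page_is_redirect = 0
AND rev_parent_id = 0
AND NOT EXISTS (SELECT 1 FROM user_groups WHERE ug_user = user_id AND ug_group IN ('sysop', 'autoreviewer'))
AND NOT EXISTS (SELECT 1 FROM ipblocks WHERE ipb_user = user_id)
AND EXISTS (SELECT 1 FROM revision AS r2 WHERE r2.rev_user = user_id AND r2.rev_timestamp > '20100401000000')
GROUP BY user_id
HAVING COUNT(*) >= 50;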
Start articles
A report that lists Start-class articles that have doubled in size since they were last rated. Sole Soul (talk) 17:16, 26 March 2010 (UTC)
- For which wikiProject? –xenotalk 17:19, 26 March 2010 (UTC)
- (edit conflict) Do you have any example pages? I don't think this is possible to (easily) query, but I won't know for sure until I try it out on a few example pages. Also, is this something you'd like a regularly updated report for or just a one-time thing? What's the ultimate goal of this report (there might be a better way to find similar data)? --MZMcBride (talk) 17:21, 26 March 2010 (UTC)
- This is not WikiProject-specific. I will try to find examples; I was thinking that the hard part is knowing when an article was last rated, which does not need examples, am I wrong? A one-time report is enough, I think. Sole Soul (talk) 19:29, 26 March 2010 (UTC)
- If it's not WikiProject specific, the report is going to be huge - and who is going to go through it? Most projects already have trouble keeping their unassessed articles out of backlog - let alone re-rating existing ones... –xenotalk 19:39, 26 March 2010 (UTC)
- This is only a report for Start-class pages that have doubled in size since they were last rated; I doubt that it is gonna be huge. If that is the case then we can start with the first 100. An outdated rating is less useful than no rating. Sole Soul (talk) 19:54, 26 March 2010 (UTC)
I'm no programmer, but maybe this will help:
- Step 1: Find all talk pages that contain class=start.
- Step 2: Search through revisions backwards to find the timestamp of the earliest revision with class=start.
- Step 3: Search through revisions of the actual article to find the revision with the largest timestamp that is older than the aforementioned talk page timestamp (the revision of the article that was reviewed when class=start was added).
- Step 4: Find prose size of that article revision.
- Step 5: Find prose size of current revision.
- Step 6: If current prose ≥ 2x old prose, add to report.
Cheers. --Cryptic C62 · Talk 17:32, 27 March 2010 (UTC)
- Yeah, you're clearly not a programmer. :-) --MZMcBride (talk) 18:20, 28 March 2010 (UTC)
- Lol, why do you say that? --Cryptic C62 · Talk 18:25, 28 March 2010 (UTC)
User subpages of indefinitely blocked users
It could be useful to have a report listing user subpages, and maybe also user talk subpages, of indefinitely blocked users blocked over a year ago, as they often contain inappropriate content. A report on main userpages and usertalkpages would need plenty of exceptions to become usable but it's probably not the case for subpages. Cenarium (talk) 22:57, 11 April 2010 (UTC)
- Done. I removed all pages ending in .css and .js. Cheers! Tim1357 (talk) 02:49, 21 April 2010 (UTC)
The SQL
SELECT page_title FROM page
JOIN user ON user_name = SUBSTRING_INDEX(REPLACE(page_title,'_',' '), '/', 1)
JOIN ipblocks ON user_id = ipb_user
WHERE page_namespace = 2
AND ipb_expiry = 'infinity'
AND page_title LIKE '%/%';
Cenarium: Do you want a regularly updated report for this? If so, how often? --MZMcBride (talk) 05:19, 25 April 2010 (UTC)
Very old user subpages of long-term inactive users
Per discussion at Wikipedia:Village_pump_(development)#Bot_to_delete_very_old_userspace_pages, would it be possible to get a report listing very old user subpages of long-term inactive users? We could start with the criteria (on an AND filter basis) i) subpage older than 5 years ii) user's last edit (to any page) more than 5 years ago. It's quite possible that this listing would contain nothing at all, in which case we could try it with 3 years instead of 5. PS Providing a diff and date of the last edit of the subpage could be helpful. Rd232 talk 12:45, 12 April 2010 (UTC)
- It's running now. When it finishes, it will be saved here. The SQL:
Extended content
SELECT
DISTINCT(user_name)
FROM user
JOIN page
ON REPLACE(page_title,'_',' ') = user_name
AND page_namespace = 2 WHERE (SELECT MAX(rev_timestamp) FROM revision WHERE rev_user= user_id) < DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 5 YEAR),'%Y%m%d%H%i%s');
- I think that's what you want. Tim1357 (talk) 02:54, 17 April 2010 (UTC)
- Hah, parts of that query look pretty familiar. --MZMcBride (talk) 02:57, 17 April 2010 (UTC)
- Thanks, but that's a list of old userpages. That may be of interest as well, but less so. I'm after user subpages. Also (and this must be a perennial question) can't the report provide clickable links to the pages in question? Copying and pasting will waste a lot of time here. Rd232 talk 07:32, 17 April 2010 (UTC)
- Yeah, that list from Tim1357 is useless to you; it doesn't contain any subpages. I'll look at making a proper report. --MZMcBride (talk) 17:34, 17 April 2010 (UTC)
- Wow I fail at reading. I didn't take anybody else's SQL, so maybe just great minds think alike? I'll do another query for subpages and post a link when I am done. Tim1357 (talk) 17:59, 17 April 2010 (UTC)
- Great minds read documentation, I think you mean. I wasn't accusing you of anything, but the REPLACE(), DATE_FORMAT(), and SELECT MAX(rev_timestamp) bits all look familiar. Anyway, your problem is that you're JOINing page ON user, which will inherently eliminate all subpages. I wouldn't worry about this one, I'm about to write it. --MZMcBride (talk) 18:04, 17 April 2010 (UTC)
- I already did. Running and will save to here when done. New SQL:
Extended content
SELECT
DISTINCT(REPLACE(page_title,'_',' ')) as usersubpage
FROM user
JOIN page
ON user_name=SUBSTRING_INDEX(REPLACE(page_title,'_',' '), '/', 1)
AND page_namespace = 2 WHERE (SELECT MAX(rev_timestamp) FROM revision WHERE rev_user= user_id) < DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 5 YEAR),'%Y%m%d%H%i%s') and page_title LIKE '%/%';
- Cheers! Tim1357 (talk) 18:30, 17 April 2010 (UTC)
- Thanks... but is there really no way for the reports to be linked to the pages in question, so they can be seen with one click? Rd232 talk 22:12, 19 April 2010 (UTC)
- Ha, of course! That is just the way the query outputs it. Here ya go. Tim1357 (talk) 01:13, 21 April 2010 (UTC)
- Thank you. This way of presenting the info is so much more user friendly! :) Rd232 talk 06:52, 21 April 2010 (UTC)
Rd232: Do you want a regularly updated report for this? If so, how often? --MZMcBride (talk) 05:19, 25 April 2010 (UTC)
Pages with four or more WikiProject banners, no shell, and at least one heading
I would like a list of talk pages with four or more WikiProject banners (criteria 1), no shell (criteria 2), and at least one ==heading== (criteria 3). –xenotalk 18:38, 14 April 2010 (UTC)
Criteria 1
\{\{.*?(WikiProject|WP|\|[ ]class[ ]*=).*?\}\}.*?\{\{.*?(WikiProject|WP|\|[ ]class[ ]*=).*?\}\}.*?\{\{(.*?WikiProject|WP|\|[ ]class[ ]*=).*?\}\}.*?\{\{(.*?WikiProject|WP|\|[ ]class[ ]*=).*?\}\}
Criteria 2
\{\{[ ]*(Template:|)(W(iki|)p(roject|)[ ]*banner[ ]*s(hell|)|(WPBS|WPB|WBS|Shell)[ ]*\|)
Criteria 3
==
- I imagine the first iteration of this will be rather large, so you can just store it somewhere before making it into an auto-refreshing report. –xenotalk 18:39, 14 April 2010 (UTC)
- Sounds like you're posting to the wrong page. --MZMcBride (talk) 18:51, 14 April 2010 (UTC)
- bah. you ignore that page like a red-headed stepchild. –xenotalk 18:55, 14 April 2010 (UTC)
- Toolserver accounts are free and I publish nearly all of the code I write. :-) --MZMcBride (talk) 19:14, 14 April 2010 (UTC)
- I was applying for a toolserver account and ran into a snag (couldn't log into JIRA). Emailed the people who were responsible and never got a reply. =\ –xenotalk 19:23, 14 April 2010 (UTC)
- Try again? I don't know of any known issues regarding account creation in JIRA. If you hit a wall, post the error message (here or on my talk page) and I'll see that it gets looked at. --MZMcBride (talk) 19:25, 14 April 2010 (UTC)
- Why are you suggesting regex on the page text here? You're interested in pages not using Template:WikiProjectBannerShell that contain 5 or more instances of templates that start with "WP" or "Wikiproject"? That seems mostly like database work (the database will be far more accurate than your shitty regex, at least). The additional == header requirement isn't really too difficult to implement in a database report as well. Much as I loathe dealing with page text, some of the current reports do so. Would the implementation I suggested work, and if so, do you have a report title in mind? --MZMcBride (talk) 19:29, 14 April 2010 (UTC)
- It would probably be more robust to find pages transcluding 4 or more members of Category:WikiProject banners (recursive), not transcluding {{WikiProjectBannerShell}}, but I am not a wizard like you. The == part can be left off if it's easier, but I'll probably still be ignoring pages without at least a header (no need to save space when there's nothing below to see). The above is the hack I came up with =)
- For the name: Pages needing WikiProject banner shells? I am terrible at this name stuff. –xenotalk 19:31, 14 April 2010 (UTC)
- That's one shell of a name. Killiondude (talk) 00:54, 15 April 2010 (UTC)
Xeno, I'm running a query that matches the first 2 criteria on your list. My guess is that parsing for a section header won't be too hard for you. When the query is done, the output will be saved here and I'll make it pretty after that. Tim1357 (talk) 16:40, 21 April 2010 (UTC)
The SQL
Create temporary table u_tim1357.temp (ttl varchar(230));
INSERT into u_tim1357.temp Select tl.page_title FROM page as tl
JOIN categorylinks on cl_to IN ('WikiProject_banners_with_quality_assessment','WikiProject_banners_without_quality_assessment') AND cl_from = tl.page_id
WHERE
tl.page_namespace = 10;
SELECT page_namespace, page_title FROM page
WHERE page_namespace IN (1,3,7,11,101)
AND
(SELECT COUNT(*) FROM templatelinks JOIN u_tim1357.temp ON tl_title = ttl WHERE tl_from = page_id AND tl_namespace=10) >3
AND NOT EXISTS (SELECT 1 FROM templatelinks WHERE tl_from = page_id AND tl_title = 'WikiProjectBannerShell' AND tl_from = page_id AND tl_namespace = 10);
- Thanks! Don't worry about making it pretty as long as it's an importable list. And yes, I can look for section headers easily enough. –xenotalk 16:59, 21 April 2010 (UTC)
- Grr, it got stuck last night. I slapped a LIMIT 500 on there for now and restarted it. MZMcBride, can you find anything that I am doing wrong? Tim1357 (talk) 22:59, 22 April 2010 (UTC)
- The 500 talk pages finished pretty quickly. I'll try 5,000 tonight. If it works, I'll leave another link here. Cheers! Tim1357 (talk) 00:27, 23 April 2010 (UTC)
- Here are the first 5,000. Damn this list is big! Do you still want more? Tim1357 (talk) 18:27, 24 April 2010 (UTC)
- Let me run with this 5000 and I'll bother you when I need more. Thanks, –xenotalk 18:28, 24 April 2010 (UTC)
wmf: conversion
Could you run a report similar to [1], but excluding User:, User talk:, and /Archive pages? Thanks. MBisanz talk 19:15, 20 April 2010 (UTC)
Templates that cause citation errors
Templates that contain inline citations occasionally cause cite errors if transcluded in articles that do not contain {{Reflist}} or if the template was placed below a {{Reflist}}. AWB does not fix this error. The conditions are:
- Templates that contain inline citations, AND
- Transcluded in mainspace articles that display "Cite error"
example. Sole Soul (talk) 11:47, 21 April 2010 (UTC)
Not done Sorry, but it is impossible to use the database to find this information, as MzMcbride and I cannot do queries on the page text. I would suggest requesting a dump scan instead. Thanks! Tim1357 (talk) 13:15, 21 April 2010 (UTC)
- If this only concerns the first condition then I may have an alternative suggestion, but if the "Cite error" is also considered text, then maybe not. Sole Soul (talk) 13:40, 21 April 2010 (UTC)
- Category:Pages with broken reference names, Category:Pages with missing references list, and Category:Pages with incorrect ref formatting. If you want a report of these (I'm not sure why you would...), just let me know. --MZMcBride (talk) 15:49, 21 April 2010 (UTC)
- Cool, looks like I was wrong: I was not aware that there were maintenance categories that list pages with broken refs. I thought you wanted us to parse the text for broken refs, which we of course cannot do. Tim1357 (talk) 16:22, 21 April 2010 (UTC)
- Yes please, but I want to change my request slightly. I want a list of articles, not templates. So, articles that appear in Category:Pages with missing references list and transclude these templates. If possible, a weekly report. Sole Soul (talk) 21:26, 21 April 2010 (UTC)
The SQL
Create TEMPORARY TABLE u_tim1357.temporary (template_name varchar (150));
INSERT INTO u_tim1357.temporary
SELECT page_title FROM page
JOIN categorylinks on cl_from = page_id AND cl_to = 'Pages_with_missing_references_list'
AND page_namespace = 0;
SELECT DISTINCT page_title, b.tl_title FROM page
JOIN u_tim1357.temporary on page_title = template_name
JOIN templatelinks as b on b.tl_from = page_id AND b.tl_namespace = 10
WHERE page_namespace=0
AND b.tl_title != 'Broken_ref'
AND b.tl_title != 'Reflist'
AND EXISTS(Select 1 from templatelinks as ttl JOIN page as tl on tl.page_id = ttl.tl_from WHERE ttl.tl_title = 'Reflist' and ttl.tl_namespace = 10 and tl.page_namespace = 10 and tl.page_title = b.tl_title);
- Thank you very much. Sole Soul (talk) 00:43, 22 April 2010 (UTC)
I have no intention of making this a regularly updated report as I don't understand Tim's output. I don't understand why there are two columns (article title and template title) and why there are results being listed multiple times. It's fairly trivial to obtain a list of pages in namespace 0 that are in Category:Pages with missing references list, if by no other means than by using yanker. --MZMcBride (talk) 05:42, 25 April 2010 (UTC)
request
Sorry, but I tried to use the scripts on the Italian Wikipedia; I have an account on the toolserver! What can I do to use these scripts? :( --.snoopy. 10:22, 23 April 2010 (UTC)
- Hi. You need to install wikitools and check out the git repository. Links are available here: Wikipedia:Database reports/Configuration. Let me know if you have any questions. --MZMcBride (talk) 18:36, 23 April 2010 (UTC)
- Hi, thanks for the reply. OK, yesterday morning I downloaded wikitools and then ran these commands on the toolserver:
- tar -xf wikitools-1.1.1.tar.gz
- and in the wikitools-1.1.1 directory I ran python setup.py install.
- but I can't use the wikitools module in Python on the toolserver :|
- sorry, but my English is not good enough --.snoopy. 18:56, 24 April 2010 (UTC)
There is no setup.py. Here's what you need to do:
Navigate to the directory where you want to put the wikitools folder. Then run this command:
$ svn co http://python-wikitools.googlecode.com/svn/trunk/wikitools/ wikitools
Then you need to modify your PYTHONPATH. Edit your .bash_profile:
$ nano ~/.bash_profile
Add the following line to it (where the path is wherever your wikitools/ directory is):
export PYTHONPATH="/home/<username goes here>/path/to/directory"
For me, this is "/home/mzmcbride/scripts" because my "wikitools" directory is in /home/mzmcbride/scripts.
Save the .bash_profile file and then you need to source it.
$ source ~/.bash_profile
You can check your PYTHONPATH by running this command:
$ echo $PYTHONPATH
Now you need to test that wikitools is working properly:
$ python -c 'import wikitools'
If this fails, paste the full error message onto this page.
You can also test wikitools by creating a file like this:
#! /usr/bin/env python
import wikitools
wiki = wikitools.Wiki()
print 'Success!'
Save this file as "wikitools-test.py" and then run:
$ python wikitools-test.py
It should output "Success!". If it doesn't, paste the full error message onto this page.
Hope that helps. --MZMcBride (talk) 04:55, 25 April 2010 (UTC)
- Yes, now it's all OK. Thanks a lot MZMcBride!! :-)) --.snoopy. 06:48, 26 April 2010 (UTC)
- Sorry, but I tried to use one of your scripts, brokenredirects.py, and inserted itwiki_p where I found enwiki_p, but in .my.cnf I found only my username dario_vet and the password; I don't find any configuration for my .snoopybot. --.snoopy. 07:59, 26 April 2010 (UTC)
- You need to rename "settings.sample" to "settings.py" and then modify it appropriately (example file here). Be sure to run "chmod 600 settings.py" and "chmod 600 settings.pyc" after modifying the file to prevent others from being able to read it. --MZMcBride (talk) 08:02, 26 April 2010 (UTC)
OK. I created the file ~/wikitools/setting.py:
# Wiki login details
username = '.snoopybot.'
password = 'XXX'
apiurl = 'http://it.wikipedia.org/w/api.php'
# Toolserver details
host = 'login.toolserver.org'
dbname = 'itwiki_p'
# Report details (generic)
rootpage = 'Utente:.snoopy./Sandbox/bot'
editsumm = 'test'
# Report details (multi-page)
blankcontent = ''
blanksumm = 'none'
Do you think it is OK? --.snoopy. 08:41, 26 April 2010 (UTC)
You speak about settings.pyc, but I don't have any settings.pyc; thanks for your help anyway --.snoopy. 08:55, 26 April 2010 (UTC)
- The file looks fine, except the "host" line and the "rootpage" line. The host line needs to be 'sql-s2'. The rootpage line needs to end with "/". When you run brokenredirects.py (or any of the other scripts), it will create the settings.pyc file. --MZMcBride (talk) 12:07, 26 April 2010 (UTC)
- It seems to be OK (link) if I copy it line by line into a Python shell on the toolserver, but if I run python2.5 -c "import wikitool" wiki/brokenredirect.py it doesn't work; I must write "import wikitool" in the command. I hope it's OK :| --.snoopy. 12:39, 26 April 2010 (UTC)
- You want "python ~/block/brokenredirect.py" --MZMcBride (talk) 15:05, 26 April 2010 (UTC)
- OK, now it's OK, but the script runs only under the home directory, not in ~/block; I think that it is a problem with the export. Now my configuration for the export is:
- You want "python ~/block/brokenredirect.py" --MZMcBride (talk) 15:05, 26 April 2010 (UTC)
- seems to be ok (link) if I copy line for line on a python shell on toolserver, but if I run python2.5 -c "import wikitool" wiki/brokenredirect.py don't work, I must write "import wikitool" in the command, I hope it's ok :| --.snoopy. 12:39, 26 April 2010 (UTC)
export PYTHONPATH="/home/dario_vet/wikitools"
The script runs in ~ but not in ~/block. Thanks for the help. --.snoopy. 15:24, 26 April 2010 (UTC)
- You could try setting your PYTHONPATH to "/home/dario_vet". Mine is "/home/mzmcbride/scripts" because my wikitools/ directory is inside the scripts folder. Your wikitools directory isn't. Hope that helps. --MZMcBride (talk) 15:26, 26 April 2010 (UTC)
- I moved all the files to /home/dario_vet/wikitools/block but it doesn't work; hmm, I don't see where I made an error :| --.snoopy. 15:55, 26 April 2010 (UTC)
$ echo $PYTHONPATH
should output:
/home/dario_vet
If it does not, you need to modify your PYTHONPATH in ~/.bash_profile and then source the file (see above). --MZMcBride (talk) 16:04, 26 April 2010 (UTC)
OK, now I think I have a directory tree like yours; before, I read it wrong. Now I have this situation:
dario_vet@nightshade:~/scripts$ echo $PYTHONPATH
/home/dario_vet/scripts
dario_vet@nightshade:~/scripts$
This confirms the path, and wikitools is in the scripts directory:
dario_vet@nightshade:~/scripts$ pwd
/home/dario_vet/scripts
dario_vet@nightshade:~/scripts$ ls -l
total 4
-rwxr-xr-x 1 dario_vet users 2724 2010-04-26 19:51 broken.py
drwxr-xr-x 3 dario_vet users 1024 2010-04-26 19:57 wikitools
dario_vet@nightshade:~/scripts$
Listing the files in wikitools:
dario_vet@nightshade:~/scripts/wikitools$ pwd
/home/dario_vet/scripts/wikitools
dario_vet@nightshade:~/scripts/wikitools$ ls -lsa
total 171
1 drwxr-xr-x 3 dario_vet users 1024 2010-04-26 19:57 .
0 drwxr-xr-x 3 dario_vet users 96 2010-04-26 20:06 ..
12 -rw-r--r-- 1 dario_vet users 11983 2010-04-26 19:55 api.py
11 -rw-r--r-- 1 dario_vet users 10795 2010-04-26 19:57 api.pyc
4 -rw-r--r-- 1 dario_vet users 3917 2010-04-26 19:55 category.py
4 -rw-r--r-- 1 dario_vet users 3765 2010-04-26 19:57 category.pyc
35 -rw-r--r-- 1 dario_vet users 35147 2010-04-26 19:55 COPYING
1 -rw-r--r-- 1 dario_vet users 226 2010-04-26 19:55 __init__.py
1 -rw-r--r-- 1 dario_vet users 361 2010-04-26 19:57 __init__.pyc
6 -rw-r--r-- 1 dario_vet users 5222 2010-04-26 19:55 pagelist.py
22 -rw-r--r-- 1 dario_vet users 22283 2010-04-26 19:55 page.py
22 -rw-r--r-- 1 dario_vet users 21751 2010-04-26 19:57 page.pyc
1 -rw------- 1 dario_vet users 321 2010-04-26 19:55 settings.py
1 drwxr-xr-x 6 dario_vet users 1024 2010-04-26 20:05 .svn
6 -rw-r--r-- 1 dario_vet users 5177 2010-04-26 19:55 user.py
6 -rw-r--r-- 1 dario_vet users 5521 2010-04-26 19:57 user.pyc
8 -rw-r--r-- 1 dario_vet users 7396 2010-04-26 19:55 wikifile.py
8 -rw-r--r-- 1 dario_vet users 7920 2010-04-26 19:57 wikifile.pyc
10 -rw-r--r-- 1 dario_vet users 9398 2010-04-26 19:55 wiki.py
12 -rw-r--r-- 1 dario_vet users 11351 2010-04-26 19:57 wiki.pyc
dario_vet@nightshade:~/scripts/wikitools$
Now I start broken.py; the output is:
dario_vet@nightshade:~/scripts$ pwd
/home/dario_vet/scripts
dario_vet@nightshade:~/scripts$ python broken.py
Traceback (most recent call last):
File "broken.py", line 23, in <module>
import settings
ImportError: No module named settings
dario_vet@nightshade:~/scripts$
I'll try another time, and now I'm going to bed. Thanks a lot for your help! --.snoopy. 20:11, 26 April 2010 (UTC)
- I replaced settings with my variables and it's OK... it imports wikitools correctly but not settings :-) --.snoopy. 20:20, 26 April 2010 (UTC)
- You want "settings.py" to be in the same directory as whatever file you're running is in. It should not be in the wikitools/ directory unless your other scripts are also in that directory. So, for example, "broken.py" is in /home/dario_vet/scripts. "settings.py" should also be in /home/dario_vet/scripts. Then it should import properly. --MZMcBride (talk) 06:36, 27 April 2010 (UTC)
- OK, now it is all OK. Thanks very, very much --.snoopy. 11:39, 27 April 2010 (UTC)
articles transcluding Infobox settlement without coordinates_display
Per Template talk:Infobox settlement#coordinates display= and Wikipedia:BOTREQ#add coordinates display= to Infobox settlement transclusions, we are looking to identify articles that
- Transclude Template:Infobox settlement
- Do not transclude Template:Coord/display/inline,title
- Do not transclude Template:Coord/display/title
Not sure if this belongs here or WT:DBR. I imagine the initial run can be taken from a dump, and future cases being in a database report. If possible, please prioritize over my above requests. –xenotalk 15:24, 4 May 2010 (UTC)
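For reference, one possible shape for such a query (an untested sketch; it assumes the templates are transcluded under exactly the titles listed above):
SELECT page_title
FROM page
JOIN templatelinks ON tl_from = page_id
WHERE page_namespace = 0
AND tl_namespace = 10
AND tl_title = 'Infobox_settlement'
AND NOT EXISTS (SELECT 1 FROM templatelinks AS c
WHERE c.tl_from = page_id
AND c.tl_namespace = 10
AND c.tl_title IN ('Coord/display/inline,title', 'Coord/display/title'));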
- This is trivial. I'll try to get to it tonight. You just want pages in ns:0? Do you have a report title in mind and an update frequency? --MZMcBride (talk) 22:19, 4 May 2010 (UTC)
- Articles lacking title coordinates? I think... Yes, just mainspace-thanks. –xenotalk 22:25, 4 May 2010 (UTC)
- You say "ugh," but someone has to come up with a report title. This isn't WP:DDR, after all! And I'm already doing all the heavy lifting! --MZMcBride (talk) 22:27, 4 May 2010 (UTC)
- My original name was the header - but I guess you didn't like it =) –xenotalk 22:31, 4 May 2010 (UTC)
- It seems like this job could be simple enough for Catscan 2.0. However, the tool has yet to work for me. Tim1357 talk 23:28, 4 May 2010 (UTC)
- That thing looks cool. –xenotalk 23:32, 4 May 2010 (UTC)
65,000 results. Do you want them in a text file or something? --MZMcBride (talk) 02:00, 5 May 2010 (UTC)
- Awesome! Text file is fine. –xenotalk 02:01, 5 May 2010 (UTC)
- tools:~mzmcbride/xeno-settlement-coord-2010-05-04.txt --MZMcBride (talk) 02:03, 5 May 2010 (UTC)
- Thanks again. This is a huge help. AWB was choking trying to compare a 200,000 list to a 356,000 one =) [I wonder if JIRA is fixed yet...] –xenotalk 02:05, 5 May 2010 (UTC)
- No problem. And yes. --MZMcBride (talk) 02:07, 5 May 2010 (UTC)
Report Suggestion : Orphaned Fair Use
List all media in File: namespace having a 'Fair Use' tag but which is not indicated as actually being used in an article.
Basic query suggested by carl-m on the toolserver IRC channel:
select concat('File:', page_title)
from page
join categorylinks on page_id = cl_from
where page_namespace = 6
and cl_to = 'All_non-free_media'
and not exists (select 1 from imagelinks where il_to = page_title)
limit 4;
Sfan00 IMG (talk) 13:44, 9 May 2010 (UTC)
- My bot, DASHBot does this every night. I could have the list be saved somewhere, but I see no point because the bot tags them immediately. Tim1357 talk 16:11, 9 May 2010 (UTC)
- Done: Unused non-free files (configuration). --MZMcBride (talk) 17:48, 9 May 2010 (UTC)
Request: 'Free' files lacking {{information}}
Should be a relatively straightforward report...
Find and display by date all files that are in Category:All free media, but do not currently have a {{information}} block.
Reason for the request is to further identify items for Commons and resolve some of the problems before they get transferred (and deleted :( ).
Sfan00 IMG (talk) 11:23, 14 May 2010 (UTC)
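A rough sketch of what the underlying query might look like (untested; it assumes the files transclude {{information}} directly, and the ORDER BY on the image table's upload timestamp is just one way to get the by-date listing asked for above):
SELECT page_title, img_timestamp
FROM page
JOIN categorylinks ON cl_from = page_id
JOIN image ON img_name = page_title
WHERE page_namespace = 6
AND cl_to = 'All_free_media'
AND NOT EXISTS (SELECT 1 FROM templatelinks
WHERE tl_from = page_id
AND tl_namespace = 10
AND tl_title = 'Information')
ORDER BY img_timestamp;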
- You don't need a database report because you can do it with CatScan. Tim1357 talk 17:17, 14 May 2010 (UTC)
- Well, CatScan is being stupid so I went ahead and made the list for you. Mind you, it's friggin' huge. Tim1357 talk 20:47, 14 May 2010 (UTC)
- Thanks :) Looks Like a considerable backlog to clear :) Sfan00 IMG (talk) 15:55, 19 May 2010 (UTC)
PRODed pages with deletion logs
I think we could use a daily report of all pages in Category:All articles proposed for deletion (alternatively, all articles containing a {{dated prod}} tag) which have deletion logs. Frequently, these pages are recreated versions of previously PRODed pages, where the recreation is contesting the previous deletion. עוד מישהו Od Mishehu 13:37, 31 May 2010 (UTC)
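For reference, a rough, untested sketch of a query along those lines (it checks the logging table for at least one prior deletion of the same title; other log refinements, such as distinguishing speedy deletions, are ignored here):
SELECT DISTINCT page_title
FROM page
JOIN categorylinks ON cl_from = page_id
JOIN logging ON log_namespace = page_namespace
AND log_title = page_title
WHERE page_namespace = 0
AND cl_to = 'All_articles_proposed_for_deletion'
AND log_type = 'delete'
AND log_action = 'delete';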
- I don't yet have approval to have a bot update a list, but I have a list that updates every night here. Tim1357 talk 22:04, 31 May 2010 (UTC)
- I'm working on making them clickable links. Tim1357 talk 01:47, 1 June 2010 (UTC)
Dab pages with two or less entries
A once-a-month DBR reporting all pages that have {{disambiguation}} but two or fewer wikilinks. Thanks, —fetch·comms 02:03, 3 June 2010 (UTC)
- Do you realize that all pagelinks count and that there's no way to separate meta-data from legitimate page content? For example, the {{disambiguation}} template alone will +1 all counts (due to [[Help:Disambiguation|disambiguation]]). --MZMcBride (talk) 02:18, 3 June 2010 (UTC)
- I limited the count of the page_links to only links to the main namespace. Do you want a regular report? Tim1357 talk 03:55, 3 June 2010 (UTC)
The First 200
The query:
select page_title
from page
join templatelinks on tl_from = page_id and tl_namespace = 10 and tl_title = 'Disambiguation'
where page_namespace=0
and (select count(*) from pagelinks where pl_from = page_id and pl_namespace = 0) <= 2 limit 200;
The 200 rows returned (5.75 sec): Transport_phenomena, Solute, Orientalist, Indrani, Web_TV, Shirako, SDK_(disambiguation), War_of_Religion, Algazel, Komunistická_strana_Slovenska, Frente_Amplio, Leopoldville, Southfield, Braces, Partido_dos_Trabalhadores, Harmodius, United_Telecommunications, Mouth_harp, Zhang_Bao, Trove, Vara_(length), Nikita_Sergeyevich_Khrushchev, Realistic, Bolckow, Open-loop, Beos, Front_Populaire, Honey_berry, Ackerly, Sfls, LPH, Van_de_wetering, Quality_engineering, Voron, DSDP, OFI, Det_Liberale_Folkeparti, Battle_of_Fei, Basting, Fanmail, ITRF, Acción_Popular, English_Wars, Marxist-Leninist_Party_of_Canada, Pocket_Hercules, Sleeping_pills, Mavado, CPI(ML), Lord_Buckhurst, Front_de_Libération_Nationale, Susan_Moore, Unión_Republicana, Partido_Acción_Ciudadana, Tarentel, Tripling, Eidi, Rehab, KMDB, Demokrat_Parti, F-Zero:_GP_Legend, Bouzoukia, Abyssinians, Sakkara, Juk, Record_(audio), Funky_Chicken, Ancient_Chinese, Harikrishna, Globo, Garey, Palkar, Iverson_notation, Access_Point, Studentification, Chogye, Oligo, Derechos, B58, Bradley_Field, Murren, Jatni, IFBB, Cordillera_Mountains, Fleming_Museum, Amusement_arcade, PZP, Hakkinen, Aggtelek, Luminis, Mosk, Lask, Banjo_Frog, ESDS, New_Liberalism, Party_of_the_Left, Morris_and_Company, Dosco, Ponoka, Mercantile_capitalism, Logical_Volume_Manager, Indian_Tobacco, Winlock, Perfect_society, Tetu, CPIF, Microsoft.net, OneTel, Banana_spider, ECWA, YYT, Longwang, Tube_frame, Wilson_Administration, Hard_metal, Fishing_Party, YLD, Rabbit_fever, ENSAM, MIAT, ARPS, S85, Talkshow, Vivandière_(disambiguation), National_football_team, Scarlet_Knight, Lynnfield, Macke, Jōshū, The_Ink_and_Paint_Club, Communist_Party_of_Spain_(international), Unity_List, Orok, AIYF, Clinching, CYRL, XED, Dwight_Baldwin, Combat_mission, Religious_drama, V21, EUB, Balot, Big_Thinkers, Partido_Progressista, Kalenda, Information_filter, CPTP, ^^, MLPC, BADD, G100, Menswear, S._S._Stevens, Rock_102, Nonanol, SURS, Windows_4.0, W16, The_Man_Comes_Around, Bundesgericht, Sam., Vedic_University, Gunkanjima, RSLC, Paulinerkirche, Drop_(album), Bezhta, Treaty_of_Dayton, OMLE, Bishi, IIUC, NVM, Posterior_intercostal, B29, Parti_Socialiste_Unifié, BJCC, Sea_Holly, UNLP, Red_Shirt_Movement, Cruiserweight_Championship, D_Series, Academy_for_Jewish_Religion, Rasse, Margareta_Eriksdotter, W27, Well-tempered, Helen_Gould, Movimiento_Socialista_de_los_Trabajadores, NAMB, Mazandarani, LTUC, Cordis, Partido_Social-Democrata, VLK, Sagal_Twins, Serious_Fraud_Office, EESI, Isildur's_Bane, MTUC, NSIC. (200 rows in set, 5.75 sec)
- Oh, duh. Of course you do. I'll set up a regular report. Tim1357 talk 03:57, 3 June 2010 (UTC)
Something is broken with that report. The pasted results seem fine, so you must've used a different query for the automated report. The results are simply wrong in the automated report.
There should probably be a better way of adding new reports.... --MZMcBride (talk) 05:49, 3 June 2010 (UTC)
- Oh, fixed. Tim1357 talk 22:05, 3 June 2010 (UTC)
- Thanks, I'll try to deal with the pages sometime. Didn't realize there would be 500. —fetch·comms 02:41, 4 June 2010 (UTC)
- It's capped at 500, there could be many more than that. Tim1357 talk 03:55, 4 June 2010 (UTC)
- Here is the full list (a bit over 1,000). I just thought 500 would be the most practical. Tim1357 talk 03:57, 4 June 2010 (UTC)
I've been collecting MOS:DAB issues, numbered according to Josh Parris's list, when people use the Dab solver. The file is TSV formatted, and best worked with using filtered tables in Excel and then converted to wiki code in WikEd. The revision id (which increases with time, hint) is included so, in principle, we can prune updated pages. However, I will need to redo it to gather the number of occurrences of each issue; this allows identifying pages which are in a particularly bad state. — Dispenser 05:59, 4 June 2010 (UTC)
Reviewers
It seems the flagged protection trial is planned for implementation next week, so it will be useful to have a list of candidates for the reviewer usergroup. So I ask for an update of Wikipedia:Database reports/Potential reviewer candidates (with links). Since that covers quite a lot of users, we should probably use the 'most obvious first' strategy, by using higher standards for the first few reports. I suggest 2,500 edits, their first edit more than a year ago, and their latest edit within the past month for now. It can be one-time for now, then daily or every 2-3 days when we start granting the rights. Cenarium (talk) 15:10, 9 June 2010 (UTC)
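A rough, untested sketch of a query for the 'most obvious first' pass (the 2,500-edit and one-month figures are the placeholders suggested above; registration date is used as a stand-in for "first edit more than a year ago", which is not quite the same thing since user_registration is empty for very old accounts):
SELECT user_name, user_editcount, user_registration
FROM user
WHERE user_editcount >= 2500
AND user_registration < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 YEAR), '%Y%m%d%H%i%s')
AND NOT EXISTS (SELECT 1 FROM user_groups WHERE ug_user = user_id AND ug_group IN ('sysop', 'reviewer'))
AND EXISTS (SELECT 1 FROM revision
WHERE rev_user = user_id
AND rev_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 MONTH), '%Y%m%d%H%i%s'));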
- Easy enough to update the report. Personally, I am curious about this, though. I thought RobLa, et al. had declared that all autoconfirmed users were getting review rights. Did that change or am I mistaken? --MZMcBride (talk) 16:33, 9 June 2010 (UTC)
- I believe that's been quashed. –xenotalk 16:35, 9 June 2010 (UTC)
- I'm running the updated report now. It may fail due to such a large page size (things like the motherfucking AbuseFilter like to kill large edits, even if they're completely legitimate). If it fails, I can add pagination easily enough or we can limit output, whichever. --MZMcBride (talk) 16:38, 9 June 2010 (UTC)
- One way to reduce the size would be to exclude: rollbackers, account creators, autoreviewers, these people are likely trusted enough by virtue of those rights to receive the right implicitly or explicitly. –xenotalk 16:43, 9 June 2010 (UTC)
- Cenarium: Do you have a preference between pagination or limitation? Putting all the results on one page simply isn't feasible. --MZMcBride (talk) 20:50, 10 June 2010 (UTC)
- Ok, we probably need to do several reports. The reports should of course exclude admins and users in the reviewer usergroup, when it comes live. A possibility is
- a report listing users in the rollbacker, account creator or autoreviewer usergroups whose latest edit is within the past month
- a report listing users with X edits, their first edit more than Y ago, and their latest edit within the past month, and who are not in any of the above usergroups; where X and Y are gradually downgraded as we grant the rights. We can start with X=2,500, Y= 1 year, and if there are still too many, then paginate. Cenarium (talk) 23:57, 10 June 2010 (UTC)
- Haven't all kinds of auto-promotion been rejected by the community? Amalthea 13:34, 11 June 2010 (UTC)
- Autopromotion by the software, yes. But admins will make their own determination when parsing the lists. Cenarium (talk) 15:52, 11 June 2010 (UTC)
- Ah, using the lists as strong suggestions is of course fine. I should have actually read your initial post, sorry about that. Amalthea 16:01, 11 June 2010 (UTC)
- I think one report is fine as the proposed report 1 is naturally a subset of the proposed report 2. Adding a "Groups" column that lists the user's current user groups might be nice, though. --MZMcBride (talk) 20:17, 11 June 2010 (UTC)
- I'll also be implementing an "Exceptions" subpage for users who wish to not be listed (and therefore not be in the "reviewer" user group), like me. --MZMcBride (talk) 20:31, 11 June 2010 (UTC)
- Thanks, it has been requested at AN too. Cenarium (talk) 03:50, 12 June 2010 (UTC)
This should be done now. The report isn't particularly cheap to update, though. --MZMcBride (talk) 22:28, 12 June 2010 (UTC)
- Indef-blocked editors are currently on the list; it would of course be best if they could be excluded.
But what's happening now? If you need me to mass-add people to the group, say the word. Personally I'd just add everyone from the lists and deal with hypothetical false positives once they are reported, but I'd certainly go ahead and add all rollbackers, autoreviewers, and account-creators since they all previously had someone look at them before. Amalthea 09:32, 13 June 2010 (UTC)
- The vast majority of users probably meet the (not yet established) criteria. There was a suggestion that users should be recently sufficiently active, so more than one edit in the latest month, with how many being subject to interpretation, and recent blocks should also be considered. Maybe users who had not requested the right should be given some explanatory notice, so either a talk page notice, or we wait until the trial begins, when we'll have the sitenotice. Cenarium (talk) 04:09, 14 June 2010 (UTC)
- Hmm, if people are still not happy with any kind of fixed criteria then I'll hold still for now (won't get around to doing any sanity checks myself in the next few days). But as indicated, feel free to give me a list of usernames and I'll semi-automatically put them into the group. Amalthea 09:48, 14 June 2010 (UTC)
I manually updated this list this morning. As expected, there was a large drop in the output. --MZMcBride (talk) 15:19, 16 June 2010 (UTC)
Images uploaded from Flickr without renaming
Would it be possible to generate a weekly report of English Wikipedia files whose names follow, more or less, the regex /^[0-9]{7,11} [0-9a-f]{10}.*\.jpe?g$/
(don't count the slashes, case-sensitive is OK, and convert to SQL format as appropriate). This would be very useful to identify both files that need renaming and also copyvios (you can prepend http://www.flickr.com/photo.gne?id= to the first space-separated part to get the original description page including licensing information). Thanks, PleaseStand (talk) 05:22, 12 June 2010 (UTC)
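For what it's worth, a minimal sketch of the query side (untested; page_title stores spaces as underscores, so the space in the regex becomes an underscore, and the binary collation of page_title should keep the match case-sensitive). A separate run against the Commons database would be needed for the Commons section:
SELECT page_title
FROM page
WHERE page_namespace = 6
AND page_title REGEXP '^[0-9]{7,11}_[0-9a-f]{10}.*\\.jpe?g$';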
- Simple enough to do; this is pretty much the same methodology that User categories (configuration) uses. Do you want to query Commons as well? Or avoid Commons images? How often do you want the report updated? And do you have a good report title in mind? For the link bit, I'll probably just use a wrapper template that you can customize to meet your needs. --MZMcBride (talk) 05:51, 12 June 2010 (UTC)
- Commons images should be in a separate section. Both reports should be run weekly. A good report title would be "Files with Flickr-generated filenames". PleaseStand (talk) 06:16, 12 June 2010 (UTC)
- To clarify: this large backlog (much from 2006 and 2007) remains from before such filenames were title-blacklisted, so the backlog should not be growing. I am not sure how many such filenames exist on Commons, but I know many exist here. PleaseStand (talk) 06:29, 12 June 2010 (UTC)
- Teach a man to fish... http://toolserver.org/~nikola/grep.php?pattern=%5E%5B0-9%5D%7B7%2C11%7D+%5B0-9a-f%5D%7B10%7D.*%5C.jpe%3Fg%24&lang=en&wiki=wikipedia&ns=6 --MZMcBride (talk) 14:10, 4 July 2010 (UTC)
- Shit, sorry, I just noticed it's been over two weeks since I looked at this. The month flew by. Silly life, always getting in the way. --MZMcBride (talk) 14:12, 4 July 2010 (UTC)
- Thank you! I never knew that there was an article title grep program on the toolserver. It is good that the list is manageably short. PleaseStand (talk) 14:31, 4 July 2010 (UTC)
Edit filters warnings effectiveness
I'm not sure if this is doable. A report that lists edit filters with "warn" as an action (excluding filters that have "disallow" in addition). For each of the filters: 1) the number of edits that produced a warning in the last 30 days; 2) the portion of those edits that were not saved after the warning; 3) the resulting percentage. Thank you. Sole Soul (talk) 23:57, 21 June 2010 (UTC)
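If the AbuseFilter tables (abuse_filter and abuse_filter_log) are available on the replica, a crude, untested sketch might look like the following; it simply treats every non-warn log entry for a warn-only filter as an edit saved after a warning, rather than matching up individual attempts:
SELECT afl_filter,
SUM(afl_actions LIKE '%warn%') AS warnings_shown,
SUM(afl_actions NOT LIKE '%warn%') AS saved_after_warning
FROM abuse_filter_log
JOIN abuse_filter ON af_id = afl_filter
WHERE af_actions LIKE '%warn%'
AND af_actions NOT LIKE '%disallow%'
AND afl_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 30 DAY), '%Y%m%d%H%i%s')
GROUP BY afl_filter;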
- I suppose you want number of edits (not users) that produced warning, right? Also, I'm not sure how to properly decide whether an edit actually followed a warning. But I think something like “warned user made some edit to the page he was warned about during one minute after that warning” should be pretty accurate. Svick (talk) 00:26, 22 June 2010 (UTC)
- Yes, that's what I meant, corrected. I think the warning filters log the edit initially with the action "warn", and a second time when saved with action "none" or "tag" etc. See this for example. Sole Soul (talk) 00:44, 22 June 2010 (UTC)
- Ok, I posted the result at User:SvickBOT/Edit filter effectiveness. If you find any inaccuracies, let me know. Svick (talk) 04:12, 22 June 2010 (UTC)
- Wow, that was fast. Thank you very much. The filters that warn and disable like 9 and 10 show 100% effectiveness, which is inaccurate. Apparently these filters warn users and then disallow them if they choose to save (which really doesn't make sense, either warn the user or disallow him). Thank you again. Sole Soul (talk) 05:27, 22 June 2010 (UTC)
- I added a column for warning messages links to the report. Sole Soul (talk) 06:31, 22 June 2010 (UTC)
- I removed the filters that warn and disallow, like you said, it didn't provide any useful data. BTW, it's usually not a good idea to edit page that is generated in some automatic way – the next time it is generated, your changes will be lost (I added the new column to the program that generates the table). Also, if you wanted, I could update the table periodically or on demand. Svick (talk) 13:53, 22 June 2010 (UTC)
- Yes, that would be great, maybe every month. Sole Soul (talk) 21:43, 22 June 2010 (UTC)
- Ok, done. Svick (talk) 22:33, 22 June 2010 (UTC)
- Very interesting report! One highlight... linkspammers ignored the warning; only 22% effective. --Rocksanddirt (talk) 22:41, 22 June 2010 (UTC)
Non free images without rationales
Is it possible to exclude images that have the following wording :
"It is held that the use of copyrighted images of individuals that are no longer living, are inherently unrepeatable, and where no new free-use image is available is permitted under US copyright law governing the fair use of copyrighted properties, and Wikipedia fair use policies. Its use is solely for the purpose of illustration and in no way limits the copyright holders rights."
in a permission field of an {{information}} template? Sfan00 IMG (talk) 09:58, 4 July 2010 (UTC)
- Sure, done. --MZMcBride (talk) 04:51, 13 July 2010 (UTC)
Non Free Images without Rationales (exclude from report)
Exclude from search-
- {{Short-Rationale}}
- {{Standard-Rationale}}
- {{MTG set symbol}}
- {{Non-free Wikimedia logo}} (and no other non-free tag)
This would assist greatly in finding the images that are actually without rationales (as opposed to having non-standard/non-templated ones.) Sfan00 IMG (talk) 10:36, 6 July 2010 (UTC)
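For illustration, the exclusion part might look something like this in SQL (an untested sketch that only handles the non-free category membership and the template exclusions above; the actual report also checks the page text for rationale strings, which cannot be done from the database):
SELECT page_title
FROM page
JOIN categorylinks ON cl_from = page_id
WHERE page_namespace = 6
AND cl_to = 'All_non-free_media'
AND NOT EXISTS (SELECT 1 FROM templatelinks
WHERE tl_from = page_id
AND tl_namespace = 10
AND tl_title IN ('Short-Rationale', 'Standard-Rationale', 'MTG_set_symbol', 'Non-free_Wikimedia_logo'));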
- I excluded the first three. I'm not sure about the last one, mostly due to this thread on my talk page. --MZMcBride (talk) 04:52, 13 July 2010 (UTC)
- Yes, it's technically non-free, but it is treated as free- we are allowed to use it in the userspace and rationales are not required. J Milburn (talk) 12:03, 14 July 2010 (UTC)
Related
The configuration for the report is available here: Wikipedia:Database reports/Non-free files missing a rationale/Configuration
If you search in your browser for "fair_use_strings", you can see the strings that the bot looks for to determine whether a file description page of a non-free file uses a proper rationale. --MZMcBride (talk) 06:10, 13 July 2010 (UTC)
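For illustration only, the template exclusions requested above (a separate matter from the text-based fair_use_strings check just described) might be expressed at the SQL level roughly as follows; this is a sketch, not the report's actual query:
SELECT page_title
FROM page
JOIN categorylinks
ON cl_from = page_id
WHERE page_namespace = 6
AND cl_to = 'All_non-free_media'
AND NOT EXISTS (SELECT 1
                FROM templatelinks
                WHERE tl_from = page_id
                AND tl_namespace = 10
                AND tl_title IN ('Short-Rationale', 'Standard-Rationale', 'MTG_set_symbol'))
LIMIT 100;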
Unprotected Templates by Number of Transclusions
Hello, it would be nice to see a list of templates that are not protected, sorted by the number of transclusions they have. That way some sysops could go through and protect the higher-risk templates. Tim1357 (talk) 06:54, 17 January 2010 (UTC)
- Would an extra column at Templates with the most transclusions (configuration) be sufficient or would you like a separate report? --MZMcBride (talk) 04:17, 11 March 2010 (UTC)
- A column would probably be ok, but if you sorted by protection status would it sub-sort by number of transclusions? –xenotalk 14:51, 23 March 2010 (UTC)
- Yeah, there may actually be an advantage to creating a separate report. Not the sorting issue, necessarily, but it would drastically cut down on the noise of the new report and allow one to be purely statistical (the current report) while the other could be more production (the new report). Though I'm not a huge fan of using template protection, so I'd want to avoid any mass protections simply due to being listed.... In the meantime, Indefinitely protected templates without many transclusions (configuration) desperately needs some love. --MZMcBride (talk) 15:06, 23 March 2010 (UTC)
- Gah! Can you exclude subpages of Template:Convert? Some of these are also protected because they are "frequently substituted"... Maybe a filter for that might be worthwhile as well. –xenotalk 15:22, 23 March 2010 (UTC)
- Yes, I can, but I'm not really inclined to. They're sorted alphabetically, just skip over them if you'd like. Or unprotect them, as they're not widely used and their protection is silly. --MZMcBride (talk) 05:24, 25 April 2010 (UTC)
- Need to pick a report name for this and come up with a limit for number of transclusions. I guess it only matters if editing is sysop-only. Nobody should care about move protection, surely. --MZMcBride (talk) 05:24, 25 April 2010 (UTC)
While I don't necessarily object to this report, it may be sensible to change the cut-off point given the mess it has inadvertently caused. Looking at the lower end of the report, do 501 transclusions really constitute "many"? PC78 (talk) 13:29, 13 September 2010 (UTC)
- What limit would you suggest? --MZMcBride (talk) 19:17, 13 September 2010 (UTC)
- 1000? Even that may be too low, but it would be enough to cut the report (as it is now) in half. Ultimately it depends on what this report is to be used for. I'll raise the question over at the WP:ANI discussion. PC78 (talk) 00:10, 14 September 2010 (UTC)
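For the record, a sketch of the sort of query such a report might run, using the sysop-only editing condition and the 1000-transclusion cut-off floated above (placeholder values; the live report would likely need a cheaper approach than counting every transclusion):
SELECT page_title,
       COUNT(*) AS transclusions
FROM page
JOIN templatelinks
ON tl_namespace = page_namespace
AND tl_title = page_title
WHERE page_namespace = 10
AND NOT EXISTS (SELECT 1
                FROM page_restrictions
                WHERE pr_page = page_id
                AND pr_type = 'edit'
                AND pr_level = 'sysop')
GROUP BY page_title
HAVING COUNT(*) > 1000
ORDER BY transclusions DESC;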
Potentially Commons
Would it be possible to have a report that lists Files in Category:All free media which are not currently in Category:Copy to Wikimedia Commons so that they may be more readily identified?
Someone called 'erwin' on the #wikimedia-toolserver IRC channel suggested the following via a pastebin:
SELECT page_title
FROM page
JOIN categorylinks
ON cl_from = page_id
WHERE page_namespace = 6
AND cl_to = 'All_free_media'
AND NOT EXISTS (SELECT *
                FROM templatelinks
                WHERE tl_from = page_id
                AND tl_namespace = 10
                AND tl_title IN ('Nocommons', 'Not commons')
                )
LIMIT 10
but they pointed out that it's only a basic starting point and would need a LOT of tweaking before it's usable for the intended purpose of finding Commons candidate material that is not currently tagged as such.
It would need to exclude files already in Category:Copy to Wikimedia Commons, as well as images with PUI, no-license or no-source tags, etc.
Sfan00 IMG (talk) 11:31, 18 January 2010 (UTC)
- Here are the first 100 results of pages in the File namespace that are not in Category:Copy to Wikimedia Commons or Category:Wikipedia files on Wikimedia Commons:
Extended content
mysql> SELECT
-> page_title
-> FROM page
-> JOIN categorylinks
-> ON cl_from = page_id
-> WHERE page_namespace = 6
-> AND cl_to = 'All_free_media'
-> AND NOT EXISTS (SELECT
-> 1
-> FROM categorylinks AS c2
-> WHERE c2.cl_from = page_id
-> AND c2.cl_to = 'Copy_to_Wikimedia_Commons')
-> AND NOT EXISTS (SELECT
-> 1
-> FROM categorylinks AS c3
-> WHERE c3.cl_from = page_id
-> AND c3.cl_to = 'Wikipedia_files_on_Wikimedia_Commons')
-> LIMIT 100;
+-----------------------------------------------------+
| page_title |
+-----------------------------------------------------+
| SalamancaMarkets.JPG |
| 1952_UMD_Navy_game.JPG |
| Sean_Theriault.png |
| BiharDistricts.svg |
| Gulf_of_Suez_from_orbit_2007.JPG |
| Irish_immigrants_1909.jpg |
| Eiseleyat15.jpg |
| Stmc25logo.png |
| Panela.JPG |
| Assamfish.jpg |
| Putumayam.jpg |
| Chappati.jpg |
| LcMain2.jpg |
| The_world_flag_flying.jpg |
| Lemar_image.jpg |
| MPLGelgel.gif |
| Rock_Ethos_Band.jpeg |
| Stalin_Hitler_photomontage.png |
| Rock_Ethos_Band_1.jpeg |
| Bungee_Jumping_at_Rock_Ethos_2008.jpeg |
| Stage_at_Rock_Ethos_2008.jpeg |
| Crowd_at_Rock_Ethos_2008.jpeg |
| Leo_Griffin_1912.jpg |
| CIS_logo.png |
| Rev_22_Scratch_Race.jpg |
| Vectorsoliton_polarization_rotation_in_2009.jpg |
| Mark_Wrighton.jpg |
| Spectraliquid.jpg |
| Vectorsoliton_High-order_in_2009.jpg |
| Painscastle_rads.png |
| Vectorsoliton_origin_in_2009.jpg |
| Jefflocke.JPG |
| Aphrodite_fountain.jpg |
| WADS.jpg |
| Dougflutie.JPG |
| HarthamWeirFloodStage.JPG |
| W_Thomas_West.jpg |
| Inscription_Grey_Monument.jpg |
| Hiram_Bell.jpg |
| Richgarces.JPG |
| Writeratworkbe.jpg |
| Newradnor_rads.png |
| Rinuccini.JPG |
| Middle_eastren_warrior.jpg |
| Popcornstand.JPG |
| Paso-IceCream-AmigaVsPC.png |
| Ergm_medium.gif |
| RO_licence_front.jpg |
| Dwightevans.JPG |
| RO_licence_back.jpg |
| Erasers.jpg |
| F-105-354-333tfs-kc135-takhli.jpg |
| F-105-4fw-mycoy-1962.jpg |
| F-105-b66-takhli.jpg |
| Fredlynn.JPG |
| F-105-george-nov73.jpg |
| F-105D-62-4375-12thTFS-18thTFW-Kadena-18May1971.jpg |
| Rhayader_rads.png |
| F-105G-63-8291-17wws.jpg |
| F-105G-63-8319-561tfs.jpg |
| Romanian_Identity_Card_2009.jpg |
| F-105d-36thtfw-bit-1962.jpg |
| F-105d-44tfs-1965.jpg |
| F-105d-562tfs-thai.jpg |
| F-105f-49tfw-spang.jpg |
| Richardsonbill.JPG |
| F-105g-561tfs-mc.jpg |
| F-105s-4thtfw-sj.jpg |
| F-105s-george-1975.jpg |
| F-105s-takhli-1964.jpg |
| F102_F86D.JPG |
| Hodgenville_Lincoln_statue.jpg |
| Blackwolddog.jpg |
| F000418.jpg |
| F0201002.gif |
| F0201005.gif |
| BPL-Main.jpg |
| F-89_Scorpion_Crash_IAE_Detroit_1952.jpg |
| F-111-27tfw-cannon.jpg |
| F-111-f-4-kc135-takhli-1974.jpg |
| F-111a-67-081-429tfs-takhli.jpg |
| F-111f-mountainhome.jpg |
| Sony_SRS-17.jpg |
| F_ertel_carlyle.jpg |
| F-117-acc.jpg |
| F-15D-79-008-525tfs-bit.jpg |
| F-15D-80-082-52FW-Spang.jpg |
| Knighton_rads.png |
| FAA_laser-free-zone.jpg |
| F-5E-ClarkAB.jpg |
| Richardibest.jpg |
| F-82e-27fw-kearnyne.jpg |
| F-82-b-29-kearneyne.jpg |
| Smoothed_ripple.svg |
| TeDeumIllust.jpg |
| World_happiness.png |
| Matpandm.jpg |
| Zorah_on_the_Terrace.jpg |
| Atelier_rouge_matisse_1.jpg |
| Matissetoits.gif |
+-----------------------------------------------------+
100 rows in set (0.04 sec)
- Please review this list and provide feedback regarding its results. Thanks. --MZMcBride (talk) 05:33, 25 April 2010 (UTC)
- So far it's looking good :), I'm tagging the problems I find as well :) Sfan00 IMG (talk) 11:39, 25 April 2010 (UTC)
- Is this still wanted? Is there a report name in mind? A report update frequency? --MZMcBride (talk) 19:25, 24 October 2010 (UTC)
- Belated follow-up: this can't proceed until some other issues relating to finding images tagged for Commons that don't have recognisable license tags are resolved. Sfan00 IMG (talk) 13:49, 5 January 2011 (UTC)
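For future reference, the extra exclusions mentioned above (files already tagged for or present on Commons, plus PUI / no-license / no-source tags) could be sketched like this; the template names in the last subquery are guesses and would need checking against the actual cleanup tags:
SELECT page_title
FROM page
JOIN categorylinks AS c1
ON c1.cl_from = page_id
WHERE page_namespace = 6
AND c1.cl_to = 'All_free_media'
AND NOT EXISTS (SELECT 1
                FROM categorylinks AS c2
                WHERE c2.cl_from = page_id
                AND c2.cl_to IN ('Copy_to_Wikimedia_Commons', 'Wikipedia_files_on_Wikimedia_Commons'))
AND NOT EXISTS (SELECT 1
                FROM templatelinks
                WHERE tl_from = page_id
                AND tl_namespace = 10
                AND tl_title IN ('Pui', 'Di-no_license', 'Di-no_source'))
LIMIT 100;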
Stale report?
Wikipedia:Database reports/Indefinitely semi-protected articles hasn't been updated since late October? –xenotalk 14:34, 1 March 2010 (UTC)
- Yar, it got to be too big for a single page and paginating it is an issue as it divides the content into redirects and non-redirects. This isn't the only broken report, see below. --MZMcBride (talk) 19:33, 9 March 2010 (UTC)
- What about splitting the reports? (Indefinitely semiprotected redirects). –xenotalk 19:35, 9 March 2010 (UTC)
- That might be an option, though it has two issues: (1) it would make the semi-protected report inconsistent with the fully protected report (though I guess that report could use some pagination too, heh), and (2) it would mean that the non-redirect report would use something clunky like "Indefinitely semi-protected non-redirects" or "Indefinitely semi-protected articles", neither of which are very nice. --MZMcBride (talk) 19:43, 9 March 2010 (UTC)
- Good god you're obsessive-compulsive like me, aren't you? ;p The two reports (Indefinitely semi-protected (articles|redirects)), when listed side-by-side, should be self-explanatory. Alternatively the landing page could disambiguate the viewer. –xenotalk 19:57, 9 March 2010 (UTC)
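For reference, the only difference between the proposed article and redirect reports is the page_is_redirect flag; a minimal sketch, assuming the usual page_restrictions columns:
SELECT page_title
FROM page
JOIN page_restrictions
ON pr_page = page_id
WHERE page_namespace = 0
AND page_is_redirect = 0 -- 1 for the redirects report
AND pr_type = 'edit'
AND pr_level = 'autoconfirmed'
AND pr_expiry = 'infinity';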
Uncategorized and Unreferenced BLPs
Could someone make a list of articles in Category:All unreferenced BLPs that are in no categories when one ignores all maintenance categories, Category:Living people and Category:XXXX births. Rettetast (talk) 19:50, 22 February 2010 (UTC)
- I made the above request at WP:BOTREQ a few weeks ago, but nothing happened. This is probably a better venue.
- I have chosen to only do Unreferenced BLPs at the moment, but the report could be expanded to do all articles. It depends on how large the output will be. Also, articles in Category:Category needed are already in a queue and do not need to be listed on this report. Rettetast (talk) 11:30, 15 March 2010 (UTC)
- Example page? --MZMcBride (talk) 16:43, 15 March 2010 (UTC)
- 1, 2, 3, 4, 5, 6, 7, 8. Rettetast (talk) 17:27, 15 March 2010 (UTC)
- Thanks for the examples. I spent about an hour working on this, but I think I need to re-think how to do this. --MZMcBride (talk) 23:40, 15 March 2010 (UTC)
- Done: Uncategorized and unreferenced biographies of living people (configuration). How often do you want the report updated? --MZMcBride (talk) 21:29, 18 March 2010 (UTC)
- Thank you. Once a week is enough. I'll leave a note here if I need the report to be expanded to all articles and not just the unreferenced BLPs. Rettetast (talk) 22:19, 18 March 2010 (UTC)
Could the report be expanded to all living people? And please exclude articles that are tagged with {{catimprove}}. The latter should also be done to Wikipedia:Database reports/Potential biographies of living people (4). Rettetast (talk) 11:38, 7 April 2010 (UTC)
- I'm not sure expanding this report would be accurate. If you use Category:Living people instead of Category:All unreferenced BLPs, the report title would be "Uncategorized biographies of living people," not "Uncategorized and unreferenced biographies of living people." Presumably the biographies not in Category:All unreferenced BLPs are referenced. Would a separate report work?
- I'm also not sure why {{cat improve}} should be applied to Potential biographies of living people (4) (configuration). The goal of this report is to find pages that need Category:Living people. Can you elaborate? --MZMcBride (talk) 12:01, 7 April 2010 (UTC)
- A separate report would be fine, but I was thinking of expanding the focus to Category:Living people, not moving it. Maybe a new report titled Biographies that need better categories (configuration), which would also include all articles in XXXX births/deaths, would be appropriate.
- The point with {{cat improve}} is that it already is in a maintenance category and is more likely to already be fixed when you get to it. Rettetast (talk) 12:24, 7 April 2010 (UTC)
- Bump. Rettetast (talk) 21:03, 22 May 2010 (UTC)
- Are you using Potential biographies of living people (4) (configuration) to find pages in need of better categorization or are you using it to find bios that need to be in Category:Living people? It sounds like you're using the report for the former, when its purpose is the latter. I think that's why your {{cat improve}} suggestion didn't make sense to me. --MZMcBride (talk) 00:01, 23 May 2010 (UTC)
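For illustration, the core of the original request at the top of this section might be sketched as follows (maintenance categories would still need excluding, e.g. via the 'hiddencat' entry in page_props; the live report's logic may well differ):
SELECT p.page_title
FROM page AS p
JOIN categorylinks AS blp
ON blp.cl_from = p.page_id
WHERE p.page_namespace = 0
AND blp.cl_to = 'All_unreferenced_BLPs'
AND NOT EXISTS (SELECT 1
                FROM categorylinks AS c
                WHERE c.cl_from = p.page_id
                AND c.cl_to NOT IN ('All_unreferenced_BLPs', 'Living_people')
                AND c.cl_to NOT LIKE '%\_births');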
Please get Wikipedia:Database reports/File description pages containing no templates and/or Wikipedia:Database reports/File description pages containing no templates or categories up and running again, and then remove the File: namespace from Wikipedia:Database reports/Blank single-author pages. There are too many files which were uploaded with empty description pages, and that has nothing to do with CSD G7. עוד מישהו Od Mishehu 10:41, 18 March 2010 (UTC)
- The broken reports are noted above. I just filed another bug in Wikimedia's bug tracker (bugzilla:22883) to accompany the Toolserver bug report (jira:TS-549). I could probably rewrite these reports to use a different method of comparing against Commons, but I'd really rather see this bug resolved properly (setting the data types as varchar or varbinary consistently). I'm not sure if this is likely to happen in the near future, though, so I may be forced to rewrite these reports in the next few weeks.
- As for Blank single-author pages (configuration), I don't think the File pages are really overwhelming the report, though if they're being listed elsewhere, I don't have an issue removing them. I'll do it if/when the other reports are fixed. --MZMcBride (talk) 15:05, 18 March 2010 (UTC)
Recently created pages with deleted titles
I think we could use an automatically generated list of recently created (30 days would make sense, based on Special:NewPages) pages, whose author isn't an admin, which have a deletion log entry and no undeletion entry. Pages like this are likely recreations of the original pages, which would frequently be speedy deletable (if the original was a speedy deletion, then under the same CSD; if the original was deleted per XFD, then CSD G4). Weekly seems like a good frequency for this one. עוד מישהו Od Mishehu 09:36, 14 April 2010 (UTC)
- Example page? --MZMcBride (talk) 16:21, 14 April 2010 (UTC)
- I think this would be a good idea, except I wouldn't necessarily limit it to those with no undeletion history. Some pages have a history of deletion/restoration only to be finally settled by an XfD resulting in delete. If such a category were recreated, filtering out ones with a restoration in history wouldn't catch this. An example of one that would fit Od Mishehu's original suggestion would be Category:Wikipedian cyclists, and an example of one that wouldn't be caught would be Category:Socialist Wikipedians. VegaDark (talk) 16:30, 14 April 2010 (UTC)
I looked at pages that were new (in the recentchanges table, which goes back about 30 days) and had a deletion entry. There were about 5,000 results total. I put 100 results below for analysis. These are only articles, but there are IP user talk pages, file pages, and other types of pages in the results as well. --MZMcBride (talk) 23:42, 14 April 2010 (UTC)
Extended content
mysql> SELECT
-> ns_name,
-> page_title
-> FROM page
-> JOIN toolserver.namespace
-> ON dbname = 'enwiki_p'
-> AND page_namespace = ns_id
-> JOIN recentchanges
-> ON rc_cur_id = page_id
-> WHERE rc_new = 1
-> AND page_is_redirect = 0
-> AND EXISTS (SELECT
-> 1
-> FROM logging_ts_alternative
-> WHERE log_title = page_title
-> AND log_namespace = page_namespace
-> AND log_type = 'delete'
-> AND log_action = 'delete')
-> LIMIT 100;
+---------+-----------------------------------------------+
| ns_name | page_title |
+---------+-----------------------------------------------+
| | Kim_Hak-Sung |
| | Atlassian |
| | Philadelphia_military_academy_at_leeds |
| | Payless_Cashways |
| | Big_Scoob |
| | Michael_Stevenson |
| | Manikura_Tikinau |
| | Andreu_Lacondeguy |
| | Youth_Outreach |
| | Nate_Lowman |
| | Tatrapan |
| | Papa_Niang |
| | Webberbus |
| | Muhammad_Akram_Khan |
| | Land_of_Clover |
| | The_Rat_Look |
| | New_Zealand_Top_50_Singles_of_2001 |
| | Nazneen_Contractor |
| | Nerm |
| | EUscreen |
| | Strong_House |
| | L.L.Bean_SIGNATURE |
| | Michael_Barlow |
| | Sergio_Petrelli |
| | Theo_Blake |
| | ClearTrial |
| | Lynda |
| | Paul_Duhart |
| | Multidark |
| | Break_The_Chair |
| | Chrysothrix |
| | Ikonika |
| | Casey_Clausen |
| | Denodo |
| | The_Original_Rabbit_Foot_Spasm_Band |
| | Emily_Ratajkowski |
| | VMAC |
| | Virtu-oso |
| | Stade_Klébert_Picard |
| | Soldier_Support_Institute |
| | Vladimir_Cuk |
| | Supermale_(novel) |
| | Uttoxeter,_Ontario |
| | Alexander_First_Nation |
| | Brothers_forever |
| | Supreme_Court_of_the_United_States,_televised |
| | Penelope_Gross |
| | Craig_Lowe |
| | Ganci |
| | Mann_(rapper) |
| | University_of_Connecticut_Marching_Band |
| | Radmila_Gorjanovic |
| | Royal_Lavanderia |
| | Mary_Wheeler |
| | Taped_With_Rabbi_Doug |
| | Jippii |
| | Crystal_King_(band) |
| | Globe_(album) |
| | WhizBase |
| | Moynat |
| | Hailesa |
| | Sunaila |
| | Rainbow_Entertainment |
| | Allyn_River |
| | Richard_Jay |
| | Nirosha |
| | Arve_Isdal |
| | Lev_Kuznetsov |
| | MindMeister |
| | Periphery_(band) |
| | Age_Concern_Cymru_and_Help_the_Aged_in_Wales |
| | Jorge_Gestoso |
| | DJ_Swamp |
| | Robert_Babicz |
| | Danny_Rhymes |
| | Cider_Bill_of_1763 |
| | Andrew_Garcia |
| | Blockdot |
| | Merry_Tales |
| | PMS_Asterisk |
| | Lourdes_grotto |
| | Keith_Norman |
| | Philipose_Vaidyar |
| | Kyle_Rudolph |
| | Celebrity_Skin_(band) |
| | Pseudochazara |
| | Huw_Jones |
| | Jon_Nolan |
| | Murray_Carter |
| | Jonathan_Hall |
| | Palaemonoidea |
| | Iwebkit |
| | The_So_So_Glos |
| | Karamoko_Koné |
| | ITVFest_(Independent_Television_Festival) |
| | Eddie_Lee_Sexton |
| | The_Case_of_the_Disappearing_Diamonds |
| | Google_TV |
| | Marooned_(band) |
| | Allen_Frizzell |
+---------+-----------------------------------------------+
100 rows in set (0.06 sec)
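A possible refinement, closer to the original request, would also skip titles that have since been restored (with the caveat VegaDark raises above); a sketch reusing the same tables (the non-admin-creator condition is omitted here, as it would need a join against user_groups):
SELECT ns_name,
       page_title
FROM page
JOIN toolserver.namespace
ON dbname = 'enwiki_p'
AND page_namespace = ns_id
JOIN recentchanges
ON rc_cur_id = page_id
WHERE rc_new = 1
AND page_is_redirect = 0
AND EXISTS (SELECT 1
            FROM logging_ts_alternative
            WHERE log_title = page_title
            AND log_namespace = page_namespace
            AND log_type = 'delete'
            AND log_action = 'delete')
AND NOT EXISTS (SELECT 1
                FROM logging_ts_alternative
                WHERE log_title = page_title
                AND log_namespace = page_namespace
                AND log_type = 'delete'
                AND log_action = 'restore')
LIMIT 100;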
Incomplete work
- Potential biographies of living people (5) (configuration) — the intention was to split off pages with 0 categories from the "(4)" report; (4) is no longer listing pages with 0 categories, but "(5)" still hasn't been created
- Non-free files missing a rationale (configuration) — looks like this has been committed and pulled, it just needs to be added to the crontab, WP:DBR, /Announcements, and it needs a /Configuration subpage created
--MZMcBride (talk) 15:24, 3 May 2010 (UTC)
- With regard to the second report, QCoder said here that there were some false positives, but he seems to enjoy the list nonetheless. I pointed him over here and asked if he could produce some common keywords for the false positives to make the report better. Killiondude (talk) 18:09, 3 May 2010 (UTC)
- Non-free... is resolved. Still need to work on (5) quite a bit. --MZMcBride (talk) 03:54, 9 May 2010 (UTC)
Categories linking to deleted categories
I think we could use a weekly report for categories linking to deleted categories. These are frequently the result of back-links to renamed categories not being fixed. עוד מישהו Od Mishehu 12:39, 23 May 2010 (UTC)
- There are Categories categorized in red-linked categories (configuration) and Deleted red-linked categories (configuration) already. Not sure what you want. --MZMcBride (talk) 13:52, 23 May 2010 (UTC)
- Not the same thing. I'm talking about a "See also Category:Foo", or a navbox for categories, which isn't necessarily updated when the category is moved. עוד מישהו Od Mishehu 04:43, 24 May 2010 (UTC)
- Ah, so you're talking about Categories containing deleted red-linked categories, pretty much. (There's a bit of ambiguity in that report title, but oh well.) I posted a list here. The results look pretty noisy. Some results appear to be related to {{#ifexist:}} parser magic voodoo. Or perhaps just bad rows in the pagelinks table, though that's less likely. Let me know what you think of the results. --MZMcBride (talk) 05:21, 24 May 2010 (UTC)
- I think we should exclude any case where the deletion reason contains the string "C1" (the CSD for empty categories), which seems to be filling up the report. עוד מישהו Od Mishehu 10:51, 26 May 2010 (UTC)
- Updated results. This report idea seems untenable. --MZMcBride (talk) 22:03, 26 May 2010 (UTC)
- This does contain some usable information (see this edit, where I fixed a link thanks to your report). To make this more useful, I think we should remove categories transcluding {{Progress box}} (these are maintenance categories, where the links don't need to be updated), and those transcluding {{cat class}} (these are WikiProject categories, where some of the corresponding levels have been deleted). עוד מישהו Od Mishehu 08:18, 27 May 2010 (UTC)
- And any transcluding {{Cfr full}}, which should catch categories being proposed for renaming to previously deleted titles. עוד מישהו Od Mishehu 08:30, 27 May 2010 (UTC)
Sorry, I seem to have missed your replies here (or I forgot about them). Are you still interested in getting an updated report with the new exclusions? --MZMcBride (talk) 05:01, 13 July 2010 (UTC)
- Yes. עוד מישהו Od Mishehu 10:07, 13 July 2010 (UTC)
- Here are the results with the three specified templates excluded: http://en.wikipedia.org/w/index.php?oldid=373363263
- By the way, you should ping me on my talk page if I don't respond to a thread for a few weeks. --MZMcBride (talk) 01:30, 14 July 2010 (UTC)
- Update: I would like it if the list would:
- include cases where the target is a category redirect, regardless of the existence of a deletion log for such pages;
- exclude all cases where the target begins with a digit, as there are too many false positives in that section to make it useful;
- exclude all source categories which transclude {{Articles by Quality}}, for the same reason as {{cat class}}.
- עוד מישהו Od Mishehu 08:38, 29 July 2010 (UTC)
- I'm confused now. This most recent request seems to have veered off of the original request. We haven't been looking at category redirects (or pseudo-redirects, as the case may be with {{category redirect}}). I'm not sure how I can modify the current query to accommodate the new request. It seems like it would have to go into a new list (with the exception of point 2, I suppose). --MZMcBride (talk) 01:37, 30 July 2010 (UTC)
- Category redirects are almost always categories which have been renamed, but at least one user thinks that the old name is one people might be looking for. As such, from a human perspective, it's similar to the main thing I'm looking for: links to categories which have been renamed, where the links haven't been updated. If you need to do it in a separate report, feel free to do so.
- Points 2 and 3 are consistent with my earlier request: point 2 removes a practically useless part of the list; point 3 removes a class of categories which, while they technically fit the original request, don't need anything done with them. עוד מישהו Od Mishehu 07:31, 30 July 2010 (UTC)
I need to chart this, because I don't remember what the original request was or what I wrote and rewrote.
Here's what I have so far:
- pages in the category namespace
- that don't transclude {{progress box}}, {{cat class}}, and {{cfr full}}
- that include links to deleted categories
- where the deleted categories' deletion reason isn't like ".*([Cc]1|[Ee]mpty|Datadump).*"
Now you want to:
- exclude pages beginning with a digit
- add {{Articles by Quality}} to the list of templates to exclude pages based upon
These two pieces are easy enough. The part about category redirects is confusing the shit out of me, so you'll need to copy and paste from this list and re-explain what exactly you're after with those. It'll have to be in a separate report, I think. --MZMcBride (talk) 02:54, 31 July 2010 (UTC)
Here you go: http://en.wikipedia.org/w/index.php?oldid=376370319 --MZMcBride (talk) 04:00, 31 July 2010 (UTC)
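For reference, the pieces charted above correspond roughly to a query of this shape (a sketch only, not the report's actual code; it assumes the logging view exposes log_comment, and the deletion-reason filter is simplified):
SELECT p.page_title
FROM page AS p
JOIN pagelinks AS pl
ON pl.pl_from = p.page_id
WHERE p.page_namespace = 14
AND pl.pl_namespace = 14
AND pl.pl_title NOT REGEXP '^[0-9]'
AND NOT EXISTS (SELECT 1
                FROM page AS target
                WHERE target.page_namespace = 14
                AND target.page_title = pl.pl_title)
AND EXISTS (SELECT 1
            FROM logging_ts_alternative
            WHERE log_namespace = 14
            AND log_title = pl.pl_title
            AND log_type = 'delete'
            AND log_action = 'delete'
            AND log_comment NOT REGEXP '[Cc]1|[Ee]mpty|Datadump')
AND NOT EXISTS (SELECT 1
                FROM templatelinks
                WHERE tl_from = p.page_id
                AND tl_namespace = 10
                AND tl_title IN ('Progress_box', 'Cat_class', 'Cfr_full', 'Articles_by_Quality'))
LIMIT 100;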
- After further thought, I think that links to category redirects may be a different report. What I want there is any case where:
- Category:A links to category:B
- Category:B is a category redirect (defined as containing a {{Category redirect}} template), or a category disambiguation (defined as containing a {{Category ambiguous}} template).
- עוד מישהו Od Mishehu 10:12, 1 August 2010 (UTC)
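As a starting point for that separate report, a sketch along the lines described above (assuming {{Category redirect}} and {{Category ambiguous}} are indeed the templates involved):
SELECT p.page_title AS source_category,
       target.page_title AS target_category
FROM page AS p
JOIN pagelinks AS pl
ON pl.pl_from = p.page_id
JOIN page AS target
ON target.page_namespace = 14
AND target.page_title = pl.pl_title
WHERE p.page_namespace = 14
AND pl.pl_namespace = 14
AND EXISTS (SELECT 1
            FROM templatelinks
            WHERE tl_from = target.page_id
            AND tl_namespace = 10
            AND tl_title IN ('Category_redirect', 'Category_ambiguous'))
LIMIT 100;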
Actually, a red-link in a progress box means that the category should be created, so there's still an action required. However, I seem to remember using AWB's red-links function to try to pick these up, and getting the empty red-linked categories which {{Progress box}} knows about but is not showing, as well as the ones I wanted. Rich Farmbrough, 09:22, 15 September 2010 (UTC).