MediaWiki talk:Robots.txt

Some suggestions

Perhaps these should be included?

Along with the associated talk pages and all archives? rootology (C)(T) 13:17, 13 September 2008 (UTC)

Why Fringe? The usual reason for search exclusion is potential harm to real-life identified people. Can you give an example where googling people connects back to Fringe? Dragons flight (talk) 15:36, 13 September 2008 (UTC)

Administrator's Noticeboard

I strongly disagree with the inclusion of all of AN. Very few AN discussions involve people identifiable in real life. In previous discussions I asked for examples where a Google search on just someone's name showed a major result from AN (top 20 or so) and no one could give me any examples. Being able to search AN is useful, and I don't believe there is sufficient evidence that content on AN creates harm for people to justify this inclusion. Dragons flight (talk) 15:21, 13 September 2008 (UTC)

Indeed, someone tried to do this previously through another method and it was rejected. WP:AN discussions are often important later on; people need to be able to find them even if not certain of their locations in the archives. Geni 21:25, 13 September 2008 (UTC)

User and user talk

{{edit protected}} Can you disallow my user page and user talk page? NonvocalScream (talk) 16:40, 13 September 2008 (UTC)

Please just use {{NOINDEX}} on them. Also note that user-talk pages are automatically noindexed. - Rjd0060 (talk) 16:42, 13 September 2008 (UTC)
So, if User:Geo Swan or User:Geo Swan/Guantanamo had a {{NOINDEX}}, is that {{NOINDEX}} supposed to automatically apply to all subpages? Geo Swan (talk) 00:10, 11 December 2009 (UTC)
Most likely. --174.53.247.29 (talk) 17:28, 13 August 2012 (UTC)

Bugzilla entries

Is there a reason to keep the bugzilla links? As far as I can see, they are in the original robots.txt to show why various entries have been added and who requested them, etc. Here we don't need to file bugzilla reports, of course; we can use the talk page or just edit the page ourselves, so I don't really see why we should keep those links. We should probably use the comments to explain why the various pages are in the list instead. --Conti| 01:25, 15 September 2008 (UTC)

I think they should be kept for the same reason that we have edit history and edit summaries, and talk page discussion histories. The links point to those bugzilla discussions. I think it would be a mistake to remove that history. They're references to reasons for the additions. (Which may be useful for future discussions.) - jc37 01:45, 15 September 2008 (UTC)
Hmm, couldn't we just link to the original robots.txt instead, then? My point is that this page will be edited again and again, and soon enough those bugzilla entries will say one thing and our local robots.txt another. http://bugzilla.wikimedia.org/show_bug.cgi?id=12111 is, for example, about the German de:Wikipedia:Checkuser, and has nothing to do with what can be seen in the local list below the bugzilla URL. --Conti| 13:29, 15 September 2008 (UTC)
This page goes in place of the default robots.txt. Mr.Z-man 16:18, 15 September 2008 (UTC)

This page (and its subpages and related pages) is being indexed by Google and probably shouldn't be. --MZMcBride (talk) 00:56, 8 September 2009 (UTC)

WikiProject_Deletion_sorting?

Does WikiProject_Deletion_sorting need to be here? It only contains current and very recently closed deletion discussions, and I was surprised when I couldn't find one by Googling for e.g. 'deletion sorting china'. --Apoc2400 (talk) 23:02, 10 January 2010 (UTC)

Update and addition

{{editprotected}} Could someone update TFD, as it has been renamed to "Templates for discussion" (though I'm not sure it's really needed, as templates would hardly end up as the number-one Google hit for some person), and add Files for deletion and Possibly unfree files, where I see more danger than in templates. An image of oneself might be something to avoid in Google results. However, file deletion discussions don't seem to be too popular (e.g. they weren't in Xfd today until recently), so no one has added them yet. In the box below, you'll find the new syntax:

code to add
Disallow: /wiki/Wikipedia:Templates_for_deletion/
Disallow: /wiki/Wikipedia%3ATemplates_for_deletion/
Disallow: /wiki/Wikipedia:Files_for_deletion/
Disallow: /wiki/Wikipedia%3AFiles_for_deletion/
Disallow: /wiki/Wikipedia:Possibly_unfree_files/
Disallow: /wiki/Wikipedia%3APossibly_unfree_files/

Thank you, --The Evil IP address (talk) 20:27, 4 April 2010 (UTC)

 Done, as well as the talk pages. Nakon 21:12, 4 April 2010 (UTC)

Syntax highlighting

{{editprotected}} Could you replace the <pre> tag with <source lang="robots">? This would highlight the syntax and thus make it easier to read. Thanks, --The Evil IP address (talk) 20:28, 3 June 2010 (UTC)

 Done exactly as requested. Do I need to change the tag at the bottom? HJ Mitchell | Penny for your thoughts? 20:50, 3 June 2010 (UTC)
Yes, you should: pre → source. Peachey88 (Talk Page · Contribs) 05:40, 4 June 2010 (UTC)
Did that sort it? HJ Mitchell | Penny for your thoughts? 03:27, 7 June 2010 (UTC)
Note: This was not using valid parameters; reverted to pre. Feel free to suggest new improvements — xaosflux Talk 03:36, 10 June 2017 (UTC)

Help

Hi! Could someone more experienced take a look at b:pt:MediaWiki:Robots.txt to see if I've created it right, please? Any suggestions? Helder 15:23, 21 November 2010 (UTC)

Robot Exclusion for Wikimedia images

Disallow Googlebot-images

Curtis J Neeley v NAMEMEDIA INC et al, (5:09-cv-05151-JLH) https://ecf.arwd.uscourts.gov/cgi-bin/DktRpt.pl?33207
Curtis J Neeley has been ordered to attempt to see if googlebot-images can be directed to stay out of the images donated here. The Plaintiff removed them from the articles, but they were reverted back in overnight by others. Must Curtis J Neeley sue the Wikipedia Foundation to force the googlebot-images exclusion? Either this bot is voluntarily excluded, or Curtis J Neeley will ask that the Wikipedia Foundation be added for US Title 17 § 106A violations in the ongoing litigation with Google et al. CurtisNeeley (talk) 21:14, 7 December 2010 (UTC)

Requests for action based on ongoing legal cases should be sent to the Wikimedia Foundation directly. Please see this page for contact info. Dragons flight (talk) 05:42, 8 December 2010 (UTC)

Check my work?

Can someone confirm that the subpages of WP:Copyright problems and WP:Suspected copyright violations (which I just added) won't get indexed? It seems that the CP pages (which have been listed here for a while) don't show up on Google/elsewhere but I'd just like someone else to confirm that the daily subpages won't be indexed individually. Help? VernoWhitney (talk) 00:09, 20 March 2011 (UTC)

Subpages of disallowed pages in robots.txt are disallowed by default (i.e. simply "/" would disallow the entire site). Edokter (talk) — 15:04, 2 April 2011 (UTC)
And does that apply even if the entries don't end in '/', as in this case? It's been a while since I messed around with robots.txt configs. VernoWhitney (talk) 16:01, 2 April 2011 (UTC)
To be honest, for me too. Pages are treated as files, so it wouldn't hurt to duplicate those entries with a trailing '/'. Edokter (talk) — 20:50, 2 April 2011 (UTC)
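
A quick way to sanity-check that prefix behaviour is Python's standard-library robots.txt parser, used here as a stand-in for a well-behaved crawler (a minimal sketch; the subpage name is only illustrative, and real crawlers may differ in edge cases):

from urllib.robotparser import RobotFileParser

# Disallow rules match by simple prefix, so an entry without a
# trailing '/' covers the page itself and everything beneath it.
rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /wiki/Wikipedia:Copyright_problems
""".splitlines())

base = "https://en.wikipedia.org/wiki/Wikipedia:Copyright_problems"
print(rp.can_fetch("*", base))                     # False: the page itself
print(rp.can_fetch("*", base + "/2011_March_20"))  # False: a daily subpage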

Wikipedia:Mediation_Cabal/Cases/

Suggested addition. –xenotalk 19:17, 6 April 2011 (UTC)

Sounds good to me. It would be nice if we could exclude MedCom cases as well, but unfortunately that doesn't have a cases subpage. I've added Wikipedia:Mediation_Committee/Nominations, though. Feezo (send a signal | watch the sky) 11:38, 27 June 2011 (UTC)
I've actually changed this to disallow all mediation committee pages, by analogy with the arbitration committee. Also added MedCab cases as suggested. Feezo (send a signal | watch the sky) 23:17, 30 June 2011 (UTC)

Sockpuppet categories

Please disallow Category:Wikipedia sock puppetry and all of its subcategories. –xenotalk 12:50, 16 September 2011 (UTC)

  • Category:SPI cases*
  • Category:SPI Cases*
  • Category:SPI requests*
  • Category:Wikipedia sockpuppet*
  • Category:Suspected Wikipedia sockpuppets*

Clarification?

Could we have a short description at MediaWiki:Robots.txt making it clear whether this is the robots.txt file for en.wikipedia.org or the robots.txt file for www.mediawiki.org? Even if the two are identical now, that may change in the future.

Also, the text does not match http://en.wikipedia.org/robots.txt or http://www.wikimedia.org/robots.txt. Is this an old version? If so, is there a way to keep it synchronized with the actual robots.txt file? --Guy Macon (talk) 18:58, 2 May 2012 (UTC)

It says right at the top that this is the "Localisable part of robots.txt for en.wikipedia.org" and also states as much at the top of the talk page. Specifically, these are areas denied to all robots (user-agent *); the other parts are not controllable by us. Do you have a suggestion on how to make the text clearer? --ThaddeusB (talk) 14:46, 7 May 2012 (UTC)
Are we limited to making hash comments in the body of the .txt file, or can we place a plain-English descriptive sentence or two above the .txt box? --Guy Macon (talk) 19:39, 10 May 2012 (UTC)
I think, but am not certain, that anything on the MediaWiki page is copied exactly into the robots.txt file (if you scroll down far enough in the main robots.txt file, you'll see all the text from here). I.e. trying to <noinclude> documentation won't work. --ThaddeusB (talk) 16:22, 11 May 2012 (UTC)
If that is the case, the downside of adding a few bytes to millions and millions of pageviews is far larger than the upside of giving the hundreds who look here a better explanation. If it turns out to be possible to do so, I would like to place a paragraph on top explaining to the reader what he is looking at, but not in the file. --Guy Macon (talk) 17:42, 11 May 2012 (UTC)
Even assuming it's correct that bots retrieve robots.txt millions and millions of times (remembering that AFAIK any decent bot only loads the robots.txt once every so often rather than for every page it retrieves), I'm not sure that it should really be our concern that they need to load a few more bytes. I presume our server people aren't worried about it or they would have told us, and if you're running a bot you need to accept you'll incur some extra bandwidth because of it. Remember of course that those just browsing the page normally should not retrieve robots.txt unless there's something weird with their browser or they intentionally choose to visit it, in which case they're likely to actually want to see the comments. Nil Einne (talk) 06:09, 27 December 2013 (UTC)

There are a few typos in en robots.txt

Would be cool if an admin could fix these:

 /wiki/Wikipedia%3Mediation_Committee/
 /wiki/Wikipedia_talk%3Mediation_Committee/
 /wiki/Wikipedia%3Mediation_Cabal/Cases/

There has to be an A after all the `%3`s for these to be valid encoded URLs:

 /wiki/Wikipedia%3AMediation_Committee/
 /wiki/Wikipedia_talk%3AMediation_Committee/
 /wiki/Wikipedia%3AMediation_Cabal/Cases/

Thanks! — Preceding unsigned comment added by Cebe.cc (talk · contribs) 21:47, 6 January 2013 (UTC)

Done. Thanks for spotting that. Edokter (talk) — 23:41, 6 January 2013 (UTC)
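
For the record, the required %3A form is just the standard percent-encoding of the colon; a minimal sketch in Python (the title is the one from this request):

from urllib.parse import quote

# ':' percent-encodes to '%3A', never a bare '%3', which is why the
# 'A' was missing from the broken entries above.
print(quote("/wiki/Wikipedia:Mediation_Committee/"))
# -> /wiki/Wikipedia%3AMediation_Committee/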

wayback-maschine

http://en.wikipedia.org/robots.txt

Line 122: is this an error, or...? Shouldn't it be "Wayback Machine" instead of "maschine"? It doesn't show up on this page either, just on the actual robots.txt page. --108.211.193.185 (talk) 14:39, 12 May 2013 (UTC)

That looks to be in the non-localised portion, so you'd probably need to file a bugzilla. Since it's just in the comment, I don't know if it matters enough to bother, but it's up to you. Nil Einne (talk) 06:12, 27 December 2013 (UTC)

Duplicate entries

There are two entries for each of these:

  • Disallow: /wiki/Wikipedia:Templates_for_deletion/
  • Disallow: /wiki/Wikipedia%3ATemplates_for_deletion/

One of them can be removed safely. -- Dalba 16:40, 26 November 2013 (UTC)

Same for
  • Disallow: /wiki/Wikipedia_talk:Templates_for_deletion/
  • Disallow: /wiki/Wikipedia_talk%3ATemplates_for_deletion/
So, Done; see here. --Redrose64 (talk) 17:41, 26 November 2013 (UTC)

Protected edit request on 13 May 2014

I am requesting a change in this site's robots.txt due to an error in it. Here is a description of it:

Line 145 (Allow: /w/api.php?action=mobileview&): Unknown command. Acceptable commands are "User-agent" and "Disallow". A robots.txt file doesn't say what files/directories you can allow, just what you can disallow.

Thank you for your time. Aarongaming100 (talk) 16:03, 13 May 2014 (UTC)

This is only the editable portion of robots.txt, which has no Allow: directives. You may be referring to the server version, which cannot be edited from here. Edokter (talk) — 18:59, 13 May 2014 (UTC)

Protected edit request on 28 February 2015

The old bugzilla.wikimedia.org links need to be replaced with the new phabricator.wikimedia.org links, given that Bugzilla was replaced with Phabricator. Basically, the old links of the form http://bugzilla.wikimedia.org/show_bug.cgi?id=[id] need to be replaced with links of the form https://phabricator.wikimedia.org/T[id+2000]. Gparyani (talk) 21:35, 28 February 2015 (UTC)

Hi Gparyani. "Need" is a strong word to use... the old links should work fine now and will continue to work indefinitely. If you want to create a draft of MediaWiki:Robots.txt with improved links, I'm sure an admin will be happy to sync. --MZMcBride (talk) 22:19, 28 February 2015 (UTC)
Looks like you caught me in the middle of this edit. I've reverted it for now, but the new links would all be working properly. Nakon 22:27, 28 February 2015 (UTC)
Your edit looked acceptable to me. My point was mostly that these types of edits/edit requests are basically along the lines of Wikipedia:Redirect#Do not "fix" links to redirects that are not broken. If someone really cares, there's not much harm in updating bugzilla.wikimedia.org links to phabricator.wikimedia.org links. However, it was an intentional decision to keep bugzilla.wikimedia.org links working so that editors would not be required (read: need) to fix bugzilla.wikimedia.org links. --MZMcBride (talk) 17:11, 1 March 2015 (UTC)
These are not internal wiki redirects. And while maintaining the old links is fine, it should not be regarded as a prohibition to update these links. -- [[User:Edokter]] {{talk}} 20:04, 1 March 2015 (UTC)
No kidding. And nobody suggested otherwise; in fact, the opposite. :-) --MZMcBride (talk) 22:41, 1 March 2015 (UTC)

Exclusion of sandbox content...

It is noted that a number of users sensibly use their userspace to develop article drafts and to create sandbox content for test edits.

It would therefore be appreciated if consideration be given to adding such sandboxes and drafts to the exclusions here.

The alternative is to place a {{user sandbox}} or {{userspace draft}} manually, which I've been informed upsets people who like to treat their userspace with a degree of privacy. Sfan00 IMG (talk) 12:02, 26 April 2015 (UTC)

Disallow /?title=

Lately a large part of my Google searches give URLs like https://en.wikipedia.org/?title=Denmark and https://en.wikipedia.org/?title=Woman while our preferred /wiki/ URL is not listed. I guess it's removed as duplicate content of /?title=. Is it possible to disallow /?title= without a big risk of not having it replaced by another URL like /wiki/? So far I only see /?title= for en.wikipedia.org, so I haven't posted to meta:MediaWiki talk:Robots.txt. PrimeHunter (talk) 15:02, 22 June 2015 (UTC)

@PrimeHunter: I've been unable to reproduce. Is this still an issue? Mdann52 (talk) 11:47, 5 October 2015 (UTC)
@Mdann52: I haven't noticed it in a long time and couldn't reproduce it now, so I guess Google has either fixed it or their bot no longer finds such URLs. The specific search site:en.wikipedia.org/%3ftitle claims "About 939,000 results" but there are only 55 when the last result page is clicked. site:en.wikipedia.org/?title doesn't seem to work. PrimeHunter (talk) 12:06, 5 October 2015 (UTC)

Protected edit request on 5 July 2015

Per Wikipedia:Village pump (proposals)/Archive_126#Userpage drafts shown in search engines, there is a consensus to disable indexing for userspace. This is most easily done by adding Disallow: /wiki/User: immediately below the last entry in the list (Disallow: /wiki/Category%3ANoindexed_pages).

Thanks, Mdann52 (talk) 10:40, 5 July 2015 (UTC)

Ehm, that seems rather drastic to me... Also note that NOINDEX != Disallow, these days. Dan, any thoughts on this? —TheDJ (talk · contribs) 22:53, 5 July 2015 (UTC)
For changing an entire namespace, it's better to change wgNamespaceRobotPolicies in the server config, by the way; then you can set noindex,follow, for instance. For that, file a phabricator ticket. —TheDJ (talk · contribs) 23:11, 5 July 2015 (UTC)

Thanks for the ping, TheDJ! Is the indexing of user space a recent change? In the distant past I added the __INDEX__ magic word to my user page since I didn't mind having it indexed by search engines, which would imply that something's changed since then. If there's some other cause of this, then it'd be good to know what it is rather than piling quick hacks on top of some other problem, as this doesn't seem urgent enough to require immediate action. I'll start a thread on wikitech-l to see if anyone knows. With respect to this specific request, I have no issue with it, as the __NOINDEX__ magic word doesn't affect our search functionality at all, so you can still patrol the projects in that way. Per TheDJ's recommendation, we can do this via a configuration change; I can have an engineer in the Search Team take a look at that after some initial investigation is performed. --Dan Garry, Wikimedia Foundation (talk) 17:11, 6 July 2015 (UTC)

I've tracked this in phabricator as phab:T104797. Mdann52 (talk) 17:15, 6 July 2015 (UTC)
I've started a wikitech-l thread to see whether something's changed on our end. --Dan Garry, Wikimedia Foundation (talk) 17:33, 6 July 2015 (UTC)

Comment vs code (Wayback Machine)

Isn't this bit treated as a comment rather than a rule since it's preceded by #'s?

Wayback Machine entry
# Don't allow the Wayback Machine to index user-pages
#User-agent: ia_archiver
#Disallow: /wiki/User
#Disallow: /wiki/Benutzer

86.90.39.63 (talk) 22:32, 3 October 2015 (UTC)

Yes. It seems to be disabled on purpose. -- [[User:Edokter]] {{talk}} 22:36, 3 October 2015 (UTC)
There is a patch for review to change it to this:
User-agent: archive.org_bot
Disallow: /wiki/User:
Disallow: /wiki/Benutzer:
See phab:T104949 and gerrit diff. PrimeHunter (talk) 22:54, 3 October 2015 (UTC)
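
Both points are easy to check with Python's standard-library parser (a sketch only; it just approximates how a well-behaved bot reads the file, and User:Example is a made-up page):

from urllib.robotparser import RobotFileParser

# 1. Lines starting with '#' are comments, so the old block imposes
#    no restriction at all.
commented = RobotFileParser()
commented.parse("""\
#User-agent: ia_archiver
#Disallow: /wiki/User
""".splitlines())
print(commented.can_fetch("ia_archiver", "https://en.wikipedia.org/wiki/User:Example"))  # True

# 2. The patched block binds only the named user agent; other bots
#    are unaffected.
patched = RobotFileParser()
patched.parse("""\
User-agent: archive.org_bot
Disallow: /wiki/User:
Disallow: /wiki/Benutzer:
""".splitlines())
print(patched.can_fetch("archive.org_bot", "https://en.wikipedia.org/wiki/User:Example"))  # False
print(patched.can_fetch("SomeOtherBot", "https://en.wikipedia.org/wiki/User:Example"))     # True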

Protected edit request on 26 February 2016

YO 1.23.216.65 (talk) 19:27, 26 February 2016 (UTC)

 Not done: malformed request. — xaosflux Talk 19:59, 26 February 2016 (UTC)

April 2016 request

immediate edit request

The Disallow: /wiki/Wikipedia:Long_term_abuse section needs to be updated to refer to Wikipedia:Long-term abuse, due to a mass rename of the entire project years ago. The difference is a hyphen, but search engines are now picking up the reports which were previously excluded. The same goes for the Disallow: /wiki/Wikipedia:Abuse_reports/ section, which was renamed to Wikipedia:Abuse response years ago. It might be better to keep both the old and new names, because there are some straggler subpages under both names. The corresponding talk pages and subpages would also need to be updated. Pteroinae (talk) 07:08, 3 April 2016 (UTC)

 Done — xaosflux Talk 03:00, 4 April 2016 (UTC)
Sorry, forgot my first account's password. Wikipedia:Abuse reports is also listed on the NOINDEX page, but that project too has been renamed to Wikipedia:Abuse response, and some of the reports were moved and some weren't. As with Long-term abuse, it seems two sets of entries are needed because there are some straggler subpages. Thanks. Pteroinae alternate (talk) 06:22, 7 April 2016 (UTC)

Split for discussion

Could we also perhaps add Wikipedia:Wikiquette assistance and its subpages/talk pages to NOINDEX? It's materially similar to the other noticeboards already NOINDEXed and could out, or pose a privacy concern to, those being discussed (or who were being discussed, since the place is inactive). Pteroinae (talk) 07:08, 3 April 2016 (UTC)

I've split this for further discussion - a community consensus must be demonstrated first. — xaosflux Talk 03:00, 4 April 2016 (UTC)
 Done — xaosflux Talk 23:05, 10 April 2016 (UTC)

Disallow: /wiki/Wikipedia:Archive.is_RFC_4

Could you please add

Disallow: /wiki/Wikipedia:Archive.is_RFC
Disallow: /wiki/Wikipedia_talk:Archive.is_RFC

These RFC pages were mistakenly not placed under the already disallowed folders:

Disallow: /wiki/Wikipedia:Requests_for_comment/
Disallow: /wiki/Wikipedia_talk:Requests_for_comment/

PS. I added RFC_5. There is no such page yet; it is just to avoid extra work when it is created.

PPS. I read the robots.txt spec and removed lines like "Disallow: /wiki/Wikipedia:Archive.is_RFC_4"; "Disallow: /wiki/Wikipedia:Archive.is_RFC" should cover all pages with this prefix, including "Wikipedia:Archive.is_RFC_4". Only the two lines above are needed. — Preceding unsigned comment added by 78.139.174.106 (talk) 13:30, 26 May 2016 (UTC)
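
That reading matches plain prefix matching; a minimal sketch (the numbered page names are only illustrative):

rule = "/wiki/Wikipedia:Archive.is_RFC"
pages = [
    "/wiki/Wikipedia:Archive.is_RFC",
    "/wiki/Wikipedia:Archive.is_RFC_4",
    "/wiki/Wikipedia:Archive.is_RFC_5",
]
# A Disallow rule matches every path that starts with it, so the
# suffixed RFC pages are all covered by the one shorter rule.
for page in pages:
    print(page, page.startswith(rule))  # True for all three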

How about just MOVING these pages? — xaosflux Talk 17:24, 26 May 2016 (UTC)
It would not be enough. The pages have been moved, but the old (indexable) locations still have the content, not a redirect.
The new location is protected by robots.txt, so you are not able to archive the page using the new URL: http://web.archive.org/save/https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archive.is_RFC
You are still able to archive the page this way: http://web.archive.org/save/https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC
Ironically, the Wikipedia servers do not redirect robots to the new location of the MOVED pages; they serve the same content at both the old and new locations. Only meat users are redirected.
In the robot's eyes it is not a MOVE but a COPY.
78.139.174.106 (talk) 17:58, 26 May 2016 (UTC)

  1. Look, ma: I created a redirect to /wiki/MediaWiki:Spam-blacklist on a talk page: https://en.wikipedia.org/w/index.php?title=User_talk:178.137.146.212&redirect=no
  2. The IP's talk page has NOINDEX.
  3. /wiki/MediaWiki:Spam-blacklist is protected in robots.txt.
  4. The resulting page https://en.wikipedia.org/wiki/User_talk:178.137.146.212 has all the content of /wiki/MediaWiki:Spam-blacklist, yet it does not have NOINDEX and is not protected by robots.txt. You can archive it: http://web.archive.org/save/https://en.wikipedia.org/wiki/User_talk:178.137.146.212 or submit it to Google, whatever. 178.137.146.212 (talk) 04:26, 27 May 2016 (UTC)
The RFC pages could add {{NOINDEX}}, but there was disagreement about that at Wikipedia:Requests for comment/Archive.is RFC 4#NOINDEX. I added {{NOINDEX}} to User talk:178.137.146.212 [1] but that apparently doesn't prevent indexing when it redirects to a page without NOINDEX. PrimeHunter (talk) 11:08, 27 May 2016 (UTC)
You are absolutely right; NOINDEX (placed on the content pages, not on pages with #REDIRECT) is the solution technically. NOINDEX has been on those pages for years. But yesterday an admin, Beetstra, removed the NOINDEX from all the pages for an ungrounded reason: Wikipedia_talk:Requests_for_comment/Archive.is_RFC_4#NOINDEX, Wikipedia:Requests_for_comment/Archive.is_RFC_4#NOINDEX. If I put NOINDEX back, she or he will undo my changes. 78.139.174.106 (talk) 14:40, 27 May 2016 (UTC)
I tend to think that even robots.txt is not a solution here. Even with those lines added to robots.txt, one is still able to create a redirect page (or a huge farm of such pages) in her or his user space and thus circumvent robots.txt. The solution is expected to be in fixing the MediaWiki code, as using redirects to circumvent robots.txt makes many of the solutions above futile: you wanted to prevent User_talk:Jimbo_Wales from being archived on the Wayback Machine? Anyone can create an indexable redirect (actually, not a redirect, but a live mirror in the robots' eyes) page and save it instead. 78.139.174.106 (talk) 14:46, 27 May 2016 (UTC)

Google thinks it's cute, we need to blacklist Wikipedia%3AArticles_for_deletion%2F

Google results for "Sarah Beck Mather" bring up this link: Sarah Beck Mather, which is technically not disallowed, because the slash is escaped. If I am not reading this correctly, I'd like a pointer to what's actually happening, but if I'm right, please add %2F counterparts for the appropriate rules with slashes.

Note: This was brought up in the #wikipedia-en-help channel on IRC, and while the article is now blanked (thanks, User:DragonflySixtyseven), other AfDs may be indexed in this manner, against our wishes.

Thanks! --MarkTraceur (talk) 16:20, 8 December 2016 (UTC)

I think you are absolutely right, and other AfDs are indeed getting indexed in this manner. Mz7 (talk) 22:54, 31 December 2016 (UTC)
With that being said, I'm not quite sure if simply disallowing Wikipedia%3AArticles_for_deletion%2F would fix this. In my admittedly very basic understanding of this, by escaping the slash, we are now referring to the page Wikipedia%3AArticles_for_deletion%2FSarah_Beck_Mather as a subpage of wiki/, instead of Sarah_Beck_Mather as a subpage of Wikipedia%3AArticles_for_deletion/. In other words, for this to work, we would have to add Wikipedia%3AArticles_for_deletion%2FSarah_Beck_Mather to the robots.txt in order to pull it from Google. It would be easier if the MediaWiki developers could somehow prevent our URLs from being able to escape that slash with a %2F. Mz7 (talk) 23:15, 31 December 2016 (UTC)

I was just coming here to say/note the same. This search has the result:

Wikipedia:Articles for deletion/Anil Dash - Wikipedia
https://en.wikipedia.org/wiki/Wikipedia%3AArticles_for_deletion%2FAnil_Dash
This page is an archive of the discussion about the proposed deletion of the article below. This page is no longer live. Further comments should be made on the ...

We currently specify these lines:

Disallow: /wiki/Wikipedia:Articles_for_deletion/
Disallow: /wiki/Wikipedia%3AArticles_for_deletion/

These lines do not match "Wikipedia%3AArticles_for_deletion%2FAnil_Dash".

Do we care about the root page (i.e., Wikipedia:Articles for deletion) being indexed? If not, we could just remove the trailing slashes from these two rules, which would then catch the Anil Dash case and others.

Otherwise, we'll need to add more permutations to the list of disallow directives, which is kind of gross. In either case, we need to act here. --MZMcBride (talk) 16:26, 9 January 2017 (UTC)

Should we change the second line to:
Disallow: /wiki/Wikipedia%3AArticles_for_deletion%2F
? Legoktm (talk) 02:07, 10 January 2017 (UTC)
I feel like changing the second line as you suggest will just result in future issues for "/wiki/Wikipedia%3AArticles_for_deletion/". We don't aggressively normalize these URLs.
We could manually mark pages such as Wikipedia:Articles for deletion/Anil Dash as noindex with a bot/script. And/or we could change MediaWiki to programmatically mark all pages with a specified prefix as noindex in their HTML outputs. (We already do this at a namespace level, but we could do it at a page title prefix level as a step further.)
I don't think there's much value in indexing the root Wikipedia:Articles for deletion page. I think the simplest solution is to remove the trailing "/"s from the existing rules. The harm in having a single false negative is surely outweighed by the harm of having many false positives. --MZMcBride (talk) 04:14, 10 January 2017 (UTC)
OK, done. I suppose we should do the same for the rest of the XfD types? What about all the other rules? Legoktm (talk) 04:29, 10 January 2017 (UTC)
In skimming the list again, I think it's fine to remove all the trailing slashes. In some cases, such as "Disallow: /wiki/Wikipedia:Copyright_problems", we've already done this. For the cases where we're currently including the trailing slash, for example "/wiki/Wikipedia:Neutral_point_of_view/Noticeboard/", I don't think there's any real value in indexing the root noticeboard page. In cases like this, by including the trailing slash, we're actually allowing the current content to be indexed. If we've decided to not index the archives and other subpages of noticeboards, I don't really see how encouraging indexing of the current open topics makes sense. This is also true of cases where the root page transcludes content from subpages. --MZMcBride (talk) 05:58, 10 January 2017 (UTC)
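
The failure and the fix can both be reproduced with literal prefix matching, which appears to be how the escaped URLs slip through (a sketch assuming the crawler compares the raw, still-encoded path; the exact normalization Google applies isn't documented here):

rules_with_slash = [
    "/wiki/Wikipedia:Articles_for_deletion/",
    "/wiki/Wikipedia%3AArticles_for_deletion/",
]
escaped = "/wiki/Wikipedia%3AArticles_for_deletion%2FAnil_Dash"

# With trailing slashes, neither rule is a prefix of the escaped URL,
# because the literal text "%2F" is not the character "/".
print(any(escaped.startswith(r) for r in rules_with_slash))           # False

# Dropping the trailing slash makes the rule a prefix of both forms.
print(escaped.startswith("/wiki/Wikipedia%3AArticles_for_deletion"))  # True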

Hi Od Mishehu and Legoktm and any other passing admin. Thanks for the recent edits. Can someone please remove the trailing slashes from the other rules? I'm worried about cases like this search, which have <https://en.wikipedia.org/wiki/Wikipedia%3ARequests_for_comment%2FHipocrite> in the results. --MZMcBride (talk) 05:11, 11 January 2017 (UTC)

I think I got them all. Lemme know if you need anything else. ^demon[omg plz] 01:50, 12 January 2017 (UTC)
Cool, thank you. We may still see issues at some point with pages such as Wikipedia:Reliable sources/Noticeboard in search results, if the "/" gets converted to "%2F", but I'm not sure the risk is worth adding more variant directives. --MZMcBride (talk) 07:05, 12 January 2017 (UTC)

Archive Team's view on robots.txt

Hi. I found this piece interesting: <http://www.archiveteam.org/index.php?title=Robots.txt>. --MZMcBride (talk) 07:06, 12 January 2017 (UTC)

Protected edit request on 30 January 2017

My page Ujwal Ghimire needs indexing so search engines find it. Please add indexing. Thanks --Rohkum (talk) 18:36, 30 January 2017 (UTC)

Not done: The article is not noindexed due to Robots.txt. — JJMC89(T·C) 19:58, 30 January 2017 (UTC)

Sandbox modules

Please add:

Disallow: /wiki/Module:Sandbox
Disallow: /wiki/Module%3ASandbox

Unlike normal templates, Scribunto modules only work in the Module namespace, so what would otherwise be created in the User namespace gets created under Module:Sandbox/. Nardog (talk) 10:28, 2 January 2019 (UTC)

 Done — xaosflux Talk 16:40, 2 January 2019 (UTC)

Also add:

Disallow: /wiki/Template:TemplateStyles sandbox
Disallow: /wiki/Template%3ATemplateStyles sandbox

for a similar reason. Nardog (talk) 05:30, 6 September 2020 (UTC)

Added that in the same section. Jo-Jo Eumerus (talk) 06:55, 10 September 2020 (UTC)
@Jo-Jo Eumerus: Thanks—and, my bad, it should have been underscores instead of spaces (TemplateStyles sandbox → TemplateStyles_sandbox). Apologies for the inconvenience. Nardog (talk) 11:09, 12 September 2020 (UTC)
Done, thus. Jo-Jo Eumerus (talk) 11:16, 12 September 2020 (UTC)
Thanks! Nardog (talk) 11:19, 12 September 2020 (UTC)

Protected edit request on 19 August 2020

Please add Disallow: /wiki/Talk: and Disallow: /wiki/Talk%3A per Wikipedia:Village_pump_(proposals)/Archive_169#Thoughts_on_deindexing_(some)_non-content_namespaces. (I know the discussion is from a month ago and I did not know if there was consensus for the move, but looking a second time, it appears that there is a rough consensus to deindex article talk pages.) Aasim 05:57, 19 August 2020 (UTC)

 Not done @Awesome Aasim: not doing this, for many reasons. For something extremely broad like this: that RfC was never really "closed"; it was also not well-attended; it was not well-advertised; finally, entire namespace indexing control should be done with meta tags and the $wgNamespaceRobotPolicies parameters, which will require a phab request, which in turn will require a well-attended, strongly supported discussion. — xaosflux Talk 14:15, 19 August 2020 (UTC)

Syntax validator in comments

Someone may want to remove the syntax validator URL from the comments, as it now redirects to a completely different site. Trivialist (talk) 16:06, 29 May 2021 (UTC)

 Done — xaosflux Talk 18:14, 29 May 2021 (UTC)

COIBot report

COIBot is creating reports related to spamming/link abuse which are currently {{NOINDEX}}ed by the addition of a template. User:Asartea suggested having them added here; therefore, can the following four pages and their subpages (thousands of reports) be NOINDEXed through robots.txt, please: Wikipedia:WikiProject Spam/COIReports, Wikipedia:WikiProject Spam/LinkReports, Wikipedia:WikiProject Spam/UserReports, and Wikipedia:WikiProject Spam/PageReports? Dirk Beetstra T C 12:48, 23 January 2022 (UTC)

@Beetstra and Asartea: Is this such a good idea? I thought that robots.txt, unlike {{NOINDEX}}, doesn't prevent the page from showing in Google search results. It only prevents the content from showing, but there will still be a link. The owner of someinnocentsite.com is not going to want to be associated with "Spam". From [2]: "Google can't index the content of pages which are disallowed for crawling, but it may still index the URL and show it in search results without a snippet". Suffusion of Yellow (talk) 20:10, 23 January 2022 (UTC)
Ah, I had missed that that was the case. In that case it's probably best to indeed continue to use {{NOINDEX}} (although maybe via another template), although we do use robots.txt for, say, XfD. -- Asartea Talk | Contribs 20:52, 23 January 2022 (UTC)
FWIW I can't find a single example of a Google search actually showing an AfD discussion. I could have sworn this came up before, though. Suffusion of Yellow (talk) 21:11, 23 January 2022 (UTC)
@Suffusion of Yellow and Asartea: hmm, OK. I've had reports showing up in Google in the past, even while we actually already have the path Wikipedia:WikiProject Spam in robots.txt. That was because some reports did not have {{NOINDEX}} (they, as SoY says, show up without a snippet). I've once or twice asked Google (after NOINDEXing the report) to remove the report from their results. Maybe better as is (but I am willing to consider another template, {{User:COIBot/noindex}}, for the future; note that it may need a change in the code of the bot, and I'm not sure if it is all regulated through m:User:COIBot/Settings and User:COIBot/Settings, but that is quick enough to figure out). Dirk Beetstra T C 05:29, 24 January 2022 (UTC)
But if robots.txt forbids access, how can GoogleBot (or any well-behaved bot) even see the <meta name="robots" content="noindex,follow"/> in the {{NOINDEX}}ed page? It's not supposed to access it. Suffusion of Yellow (talk) 20:46, 24 January 2022 (UTC)
Yeah, that's a good question; shouldn't the existing WikiProject Spam line just auto-forbid the COIBot pages anyway? -- Asartea Talk | Contribs 20:56, 24 January 2022 (UTC)
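
For what it's worth, the tension discussed above fits a toy model (an illustration only, not any search engine's actual pipeline): a crawler that honors Disallow never fetches the page, so it never sees a noindex meta tag inside it, yet it may still index the bare URL if other sites link there.

def crawler_outcome(disallowed, has_noindex, linked_from_elsewhere):
    """Toy model of one page's fate with a well-behaved crawler."""
    if disallowed:
        # Never fetched, so any <meta name="robots" content="noindex">
        # in the page goes unseen.
        return "URL-only listing possible" if linked_from_elsewhere else "absent"
    if has_noindex:
        return "fetched, then kept out of the index"
    return "fetched and indexed"

print(crawler_outcome(True, True, True))   # Disallow hides the NOINDEX tag
print(crawler_outcome(False, True, True))  # NOINDEX alone keeps the page out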