Wikipedia talk:Search engine test/Archive 1

From Wikipedia, the free encyclopedia

Website redirections

The Alexa section points out that articles on certain websites should be linked to the corresponding entity behind them. But this makes no sense to me intuitively; the article on Microsoft has very little to do with microsoft.com, per se. What if the person looking it up wanted to know when the website was created, the design changes it's gone through, the extent of its resources, possible domain name conflicts, etc.? The Microsoft article is certainly relevant, and should be linked, but a company is not equivalent to its website.

Derrick Coetzee 15:32, 4 May 2004 (UTC)

Status dispute

The status of this page as a Wikipedia guideline is disputed by BlankVerse on Wikipedia talk:Semi-policy. I have reclassified it as a proposed policy for this reason.

If you think this page should be classified as a guideline, now would be a good time to speak up. If we have consensus, it can and should be reclassified. -- Beland 02:24, 19 May 2005 (UTC)

  • My opinion: I find this page to be mostly a "how-to" piece, more "help" or "advice" than weak or strong policy of any kind. If there is any policy content, it's something like, "The Google Test is to be used as a rough guide, and is not a substitute for human judgement and discussion." The rest is all FYI or perhaps a long-winded reasoning. I could see this page being classified as a "how-to" or "help" and leaving it at that. (The Wikipedia style and how-to categories do need some tidying up, by the way.) I would not object to it being classified as a "guideline" instead, though it would be nice if the "policy" aspect were clearly stated, perhaps as suggested above. -- Beland 02:24, 19 May 2005 (UTC)
  • Concur with Beland. We could classify it as a 'deletion tool' as well, maybe that would be acceptable? It's presently listed as 'proposed', btw, that doesn't quite feel right. Radiant_* 06:58, May 19, 2005 (UTC)
  • This page definitely should not be tagged with either a Policy or Guideline tag unless there can be shown significant Wikipedia:consensus for its use. I would support classifying it as a "how-to" or "help" page, but even then, it desperately needs to be rewritten (I've had the page on my watch list because it's on my to-do list). BlankVerse 08:08, 19 May 2005 (UTC)
  • Since it isn't 'proposed policy' per the previous tag, I've marked it as a how-to document. I hope there are no objections, and I'm looking forward to your rewrite and improvement of it. Radiant_* 08:12, May 19, 2005 (UTC)
  • No one on Wikipedia (not even this page) thinks that the Google Test is always valid, but it's part of our jargon and culture, often referenced as a piece of evidence, and is definitely something new Wikipedians should be able to learn about. I'm not sure if it was ever intended to be a policy of any sort. I would say the Google Test is a tool, rather than a requirement or recommendation (it is certainly not "proposed" - we are not extending the idea that articles should pass the test.) Deco 08:16, 19 May 2005 (UTC)
  • Also: My personal opinion is that it should NOT be listed as a "Deletion tool" since it is so often misused at WP:VFD. BlankVerse 08:20, 19 May 2005 (UTC)
  • How-to is a good choice for this one. I wouldn't call it specifically deletion tool, or anything. It's commonly useful, but I don't think we want to overemphasize its usefulness for any particular purpose in the heading. Zocky 09:46, 19 May 2005 (UTC)
    • Okay, I agree. No deletion tool, just a how-to. Sounds good to me. Radiant_* 07:57, May 20, 2005 (UTC)

The Google Test

Are there any guidelines as to how many Google hits a topic needs to pass? I just added a page on Mary Devenport O'Neill who gets 4 hits, two on Wikipedia, but I think she is important enough to merit inclusion. By the way, she died when I was a child, I never met her, and I'm not related to her, but she played an important bit part in the history of 20th century Irish poetry, my main field of interest. Bmills 17:02, 20 Nov 2003 (UTC)

I think the Google test is simply one of many heuristics used to determine if an article belongs on Wikipedia. Nor is Google the ultimate reference; it will skew to the popular and the general. Your entry seems a fine addition to Wikipedia. orthogonal 17:15, 20 Nov 2003 (UTC)
I just started Balaenoptera omurai, which has 0 Google hits (take it as read that I am only VERY distantly related to this whale species). There are no hard and fast rules to the Google Test, and some contributors actively dislike it as a guideline because of its limitations, particularly with matters of history. Pete 17:16, 20 Nov 2003 (UTC)
Google hits are just a guide. If you are knowledgeable in the field and say she is important, that should be good enough. Clearly not a vanity page, which is the biggest problem here with the obscure biographies (autobiographies). Also yours is well written (does not go into her pets' names, lifetime moves, childhood friends, etc.), explaining why she is significant. -- Marshman 17:18, 20 Nov 2003 (UTC)

I only asked because the Google test is quoted so often on Wikipedia:Votes for deletion. As a relative newcomer, I'm still feeling my way around these things. Bmills 17:23, 20 Nov 2003 (UTC)

Re Balaenoptera omurai:
I have checked in Copernic Agent. There are 14 results:
One each in National Geographic and Nature
One Japanese, one Polish, one Czech, one Argentinian, one Norwegian
Seven German results
Dieter Simon 01:15, 21 Nov 2003 (UTC)
That's not surprising - the news about that whale species did just come out yesterday, so we are very fast to include an article on that. Right now Google has 56 unique hits for that one, and the number will probably increase more. andy 12:41, 21 Nov 2003 (UTC)

It's only one guideline. It is perhaps most useful (but not limited to this use) for contemporary topics and for evaluating vanity pages. Daniel Quinlan 01:42, Nov 21, 2003 (UTC)

Yes. It does tend to get overused IMO. We don't want Wikipedia to reflect the bias already shown on the WWW, especially since a similar bias is probably produced by the demography of our editors as a population. But, both finding print media to cite, and verifying them when cited, are a lot more work than just typing a query into Google. Andrewa 03:23, 21 Nov 2003 (UTC)

Agreed, Andrewa. My perspective is probably skewed by the fact that I've been making contributions around writers and writing and in almost all cases with those writers' books to hand. I also try to add external links to provide as much verification as possible, but sometimes this is difficult as with Mary Devenport O'Neill. And sometimes the information on the Web is wrong, or slanted. Bmills 12:05, 21 Nov 2003 (UTC)

Wikipedia should be the means of getting lives on the web for which no other pages exist. If Google does not note it, that's all the more reason for putting it on Wikipedia! This is not a measure of their lack of importance, but more of the interests of people who post web pages. As to who ought to be noted, the index volume for the 1911 Britannica (not, alas, included in the online edition) has biographies classified by subject, so the names of all the painters, all the engineers, etc., are in one place. The original Dictionary of National Biography is out of copyright now, but not online. There are in-copyright biographical dictionaries which can form the basis for wikibiographies to be written with just a little work. Apwoolrich 13:27, 15 Sep 2004 (UTC)

Agree with the previous: 'Google testing' gives special privilege to certain subjects. It is an invalid method for determining the content of an encyclopedia. Think about it the other way round: if a subject has millions of WWW search hits does the world really need another page about it? Adambisset 14:53, 29 July 2005 (UTC)

Asymmetry

One aspect of Google that is IMO poorly understood on VfD in particular is that the test is far more useful for supporting a keep vote than for supporting a delete.

From Wikipedia:Votes for deletion/Professor Felina Ivy:

  • Keep. Well-written article on a subject of great interest to a substantial number of people. Another demonstration that the 'Google test' may be great at providing evidence for a keep vote, but it is absolutely useless as substantiation for a delete vote. The reason is simple: Many encyclopedic subjects are not well-represented on the web. And while it's easy to check when Google gives a false positive by providing unrelated hits, there is no easy way of checking these false negatives. Andrewa 23:47, 24 Feb 2005 (UTC)
    • Are you trying to tell me that the google test is flawed here because Pokemon characters are not well represented on the web? A search for "pokemon" and "character" gets 533 thousand hits. If this was a significant character, surely she'd get more than 38. Delete, or failing that Merge. DaveTheRed 01:10, 25 Feb 2005 (UTC)
      • Comment: Nope, I'm trying to tell you that this character may not be well represented, but that's not why I claim the Google test has limits, nor do I claim that the Google test is flawed. It's just misapplied. No change of vote. Andrewa 01:20, 25 Feb 2005 (UTC)
        • I think the google test can be misused both ways, depending on the subject. The subject matter is key, and is often overlooked by those who think it's all in the numbers. It's easy for many unencyclopedic things to get a substantial number of google hits, through self promotion, having names in common, as well as various things that exist only on the internet, hence displaying all references to a subject, rather than what is generally thought to be a mere sample from which one can extrapolate. Likewise many subjects are not well represented on the internet, mostly things that predated the internet and have not seen widespread discussion since its advent. Holding, say, minor historic figures of antiquity to the same significant google results as, say, porn stars, is ridiculous. The google test has its limits. Severe ones. And one must keep in mind not only the number of hits, but what the hits actually are. I've come across many pages that had no apparent mention of the subject I was searching for, and this must be taken into account. I will say this, however: something that turns up 0 google hits is very unlikely to be notable enough for an encyclopedia, and is somewhat unlikely to exist. Of course, there may be exceptions to this too. My point is false positives and false negatives are both common, and anyone who thinks any Pokemon character is underrepresented on the web is delusional. My vote is below. -R. fiend 04:58, 25 Feb 2005 (UTC)

Much of what is said here should IMO be incorporated into the Wikipedia:Google test writeup, but I think a little more discussion should happen first, and here is probably the place to do it. Andrewa 20:52, 25 Feb 2005 (UTC)

  • In my opinion there should be some sort of loose stated hierarchy of how many hits is significant for a subject. Something along these lines, starting with subjects that would require the highest number of google hits to be deemed a significant result:
    1. Porn stars. Any attractive woman willing to show off her (usually large) breasts can easily surpass most Nobel Prize winners in google hits. This does not in itself make them notable. There are thousands and thousands of people in the porn industry, and while quite a few are notable, Jenna Jameson is the exception more than the rule.
    2. Internet phenomena. Usually the results of a google search indicate only a small fraction of the number of times a subject has been mentioned in print, conversation, over the airwaves, etc. Then there are those things that exist almost only on the internet. Googling Winston Churchill gets me just under 2,000,000 google hits, which is still only a fraction of the times he has been mentioned in some form. Now googling slashdot gets me nearly 9,000,000. Is slashdot 4 times more significant and encyclopedic than Churchill? Clearly not. The slashdot hits basically represent slashdot in its entirety. Clearly both are encyclopedic, but it's easy to see how the google test favors slashdot.
    3. Figures in entertainment. Through promotion, and the fact that rather frivolous things such as movie stars get more than their share of mention on the internet, entertainment figures are overrepresented. The "information superhighway" is as much an "entertainment superhighway". Of course, many of these figures are discussed inordinately outside of the internet as well. And since fame generally means notability in some form, the google test here only slightly favors this category.
    4. Subjects that have been existent/active in the internet's heyday and Famous people who are alive/were alive at that time. This is many subjects, and what many people have in mind when doing the google test. Things to keep in mind are, for example, someone who served in the US House of Representatives from 1997-2001 will likely have more google hits than a comparable person who served from 1957-1961. This does not make the first person more notable, it's just that he was alive during the internet age.
    5. Converse of the above. In this category we have our second congressman. Another example is Gaius Gracchus, who gets less than 7,000 hits, and Dennis Kucinich, who gets 354,000.
    6. Obscure/esoteric subjects of an encyclopedic nature. Open any encyclopedia and it shouldn't take you long to find something/someone you've never heard of. Some of these will yield few google hits. Various technical/scientific subjects fall into this category. The bar is lower for these things because they aren't necessarily discussed in the mainstream. Of course, everything scientific is by no means encyclopedic, but one must not dismiss them because they don't have as many google hits as that guy who played "Man in Elevator" in that movie about the college kids on a panty raid, and his article was deleted.
  • This being said, anything that results in no google hits at all is very unlikely to be encyclopedic. If it exists, it's likely mentioned somewhere on the internet, but even that is not always true. None of this is ever meant to be set in stone, just as the google test itself should not be. But it should give some food for thought to those people who take the flat view that 3,000 hits passes the test but 400 fails, or whatever. It is much more complex than that. -R. fiend 22:26, 25 Feb 2005 (UTC)

Google Test?

These tests prove nothing. The page also seems to endorse only one search engine. Search results are most often inconclusive. "If it isn't on Google it must not exist", but it does exist. Use more to back up your claim than a random search engine; research the topic more.

Yahoo is a better search engine in my opinion. Dudtz 7/23/05 4:00 PM EST

Google is more of an authority than Yahoo, partly because it doesn't take payment for inclusion in its index. Most SE's do let webmasters include their sites (without being marked as advertisements) in their regular results, and "hitcounts". If you have the money, you can get *any* person, artist, business, product, or whatever you wish, on Yahoo Search in a few days; with the content refreshed frequently. Of course, Google can be spammed, but they don't personally profit from it. --rob 19:04, 20 August 2005 (UTC)


systemic bias

Google is useless for certain contexts because of the inbuilt pro-American usage bias of the Internet, and indeed the cataloguing system of Google itself. For example - In Britain and much of the rest of the world the light shiny metal is spelt aluminium (pronounced "alu min ee um") and not aluminum (pronounced "alu min um"). Absolutely no-one whatsoever in the UK pronounces or spells it as 'aluminum' and yet if you put "aluminum site:uk" into Google you get more than 400,000 hits; put "aluminium site:uk" into Google and you get 767,000 hits. Someone might conclude that Brits use aluminium 2/3rds of the time and aluminum 1/3rd. But this is clearly nonsense. Note that the top 2 hits with "aluminium site:uk" do not even include the word in the text! The Google cache states that "These terms only appear in links pointing to this page: aluminum". Put "allintext:" into the search and 10,000 hits disappear. This demonstrates the inbuilt bias of the Google engine itself. There still remains the question of why it still shows more than half as many hits for "aluminum" compared to "aluminium" - The reason is that much of the content of the Internet is cut and pasted from one site to another. Many American corporations do not bother to re-write their pages for non-American usage. For example on the front page of www.pricerunner.co.uk we find the words "Find the best price on your favorite music". Another factor is the ubiquitous use of Microsoft products. Quite often MS Word ships with the US English dictionary as default (this is how it was installed on my computer at my place of work for example). Frequently people don't know how to or can't be bothered to switch this dictionary to the British English one. Finally many sites use American spellings somewhere on their page because if they didn't Americans wouldn't be able to look up the product in a search engine. All in all Google only reflects the power of the US on the Internet and not usage in real-life.
Jooler 21:52, 24 August 2005 (UTC)

The internet is real-life. http://google.co.uk does in fact give more results for aluminium than aluminum, when doing a uk-only search. You are unfair to Google, who does recognize that TLDs are not perfect indicators of national origin. In fact, Google.co.uk is so good, the first result for "aluminum" was "World-Aluminium" (e.g. American spelling took me to a site with British spelling). This article does address the problems with language bias pretty well. Google is great. Your criticisms of the internet though, are valid, but Google has no responsibility for things beyond its control. --rob 23:16, 24 August 2005 (UTC)

You have precisely proved my point! "e.g. American spelling took me to a site with British spelling" - Exactly! This counts as a HIT (for the purposes of a Google test) for the spelling "aluminum" even though the page doesn't even contain the word! - You are wrong about the hit count on google.co.uk though - When using Google.co.uk and limiting the pages to those from the UK we get 869,000 for aluminum and 1,400,000 for aluminium - this suggests a ratio of 2/5 for the non-native spelling! One that literally no-one in the UK would use, that's real-life. Jooler 23:30, 24 August 2005 (UTC)
I mis-read what the first post said (my eyes have trouble seeing an "i" sometimes). Anyway, since this isn't real-life, I'll depart the conversation (p.s. look up the word "literally" in a dictionary). --rob 23:41, 24 August 2005 (UTC)
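
The arithmetic behind the ratios quoted in this thread can be checked with a short script. This is only a sketch: the hit counts are the figures quoted in the posts above (the first is given there as "more than 400,000", so it is approximate), and the names `share_of_hits`, `site_uk` and `google_co_uk` are made up for illustration. It computes each spelling's share of the combined hit count, which says nothing about actual usage.

```python
def share_of_hits(counts):
    """Return each term's share of the combined hit count as a fraction of 1."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}

# Hit counts as quoted in the thread (hypothetical variable names).
site_uk = {"aluminum": 400_000, "aluminium": 767_000}
google_co_uk = {"aluminum": 869_000, "aluminium": 1_400_000}

for label, counts in (("site:uk", site_uk), ("google.co.uk, UK pages", google_co_uk)):
    for term, share in share_of_hits(counts).items():
        print(f"{label}: {term} {share:.1%}")
```

With the quoted figures the American spelling comes to roughly a third of the combined hits in the first search and a little under two fifths in the second, which matches the 1/3 and 2/5 estimates in the posts.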

NPOV

The whole idea of the "Google Test" isn't neutral. It promotes Google more than any other search engine. It would be better to call it a search engine test. Dudtz 9/1/05 3:39 PM EST

NPOV applies only to articles. It's just the name that emerged from the community, no favoritism is intended (and no one would recognise the term you use). Deco 21:01, 1 September 2005 (UTC)

Google?

Google test? Are we trying to promote Google? I think we should change this to search engine test. — Stevey7788 (talk) 04:01, 13 September 2005 (UTC)

Invalid test

The whole notion of this Google test is ridiculous. Maybe one day when Google has scanned the majority of all written texts, THEN we can call something like that legitimate, but claiming that Google has access to all, or even close to half of the knowledge or thoughts that man has ever considered is ignorant. As I have realized from trying to Google for related content to that published in the 1911 Encyclopedia Britannica, Google is of little or no help. Is something that was notable enough for the 1911 Britannica no longer notable because an Internet search tool has never heard of it???

The only true value of the Google test is when testing the validity or notability of something which Google has scanned much: the Internet. Outside of this, Google cannot be used as a test of notability, only as a possible way of determining verifiability, but even if the test fails, the content is still not "unverifiable". The skeptic has to turn to the submitter for source references, or to the community as a whole, in the hope that one of them might have read a book in their lives. Pcb21 put it best: "I get frustrated by people using the Google 'test' as authoritative - if the web already knew it all there would be less need for Wikipedia!" BRIAN0918 • 2005-10-14 14:26

  • Can you give an example of something that's in the 1911 Encyclopedia Britannica that can't be found via Google? Dpbsmith (talk) 20:41, 14 October 2005 (UTC)

del.icio.us

I've also found del.icio.us useful in verifying external links and vanity articles. If a significant number of people have bothered to bookmark a site, and tagged it with relevant keywords, then that tells me the link or website is truly useful and informative. How about adding this site, as an added tool for "Google test"? --Aude 03:33, 16 December 2005 (UTC)

Alexa test for websites

What should the rank of a website (other than the official website of the subject or a very related one) be on Alexa to be considered notable enough to be included in the "External Links" section of a page? -- King of Hearts | (talk) 00:54, 5 January 2006 (UTC)

I haven't seen one in wide use. It's possible for a random non-notable person somewhere to have a particularly nice explanation / graphic / etc. on a more notable topic, so the person's notability or traffic doesn't matter so much, so Alexa perhaps isn't the best thing to use? I mostly see WP:SPAM or talk pages being used to whittle down external links to a manageable number. --Interiot 01:43, 5 January 2006 (UTC)
Please note that Alexa is also a small bit of spyware. Every URL you go to while Alexa is running is sent back to the Alexa servers. Also, Alexa is available ONLY for Microsoft Internet Explorer, which does not mean it's accurate by any means. People using something within the UNIX family of systems, the MacOS family of systems (not Mac OS X, that's UNIX), or the BeOS family of systems won't be using Alexa, thus, Alexa's results are skewed. Alexa has zero accuracy, as such, it should not be used as a viable means of any sort by Wikipedia. If it comes down to a vote, I vote to strike Alexa from the policies and guidelines of Wikipedia. (Lady Serena 02:55, 1 February 2006 (UTC))

The Google test, continued...

I agree with the comment that the term 'search engine test' would be more appropriate than 'Google' test. Personally, I use Clusty (clusty.com) - it has the very useful feature of clustering similar topics, which saves you from clawing through 2.8 million hits (in 0.0146 seconds).

Yes yes. The name is biased, but that's the traditional term Wikipedians use. You might as well say "laugh" should be spelled "laf". Deco 04:03, 19 January 2006 (UTC)

I question the reliability of google news to provide untainted results anymore. A massive number of internet "news" sources have arisen, for purposes ranging from feeding the celebrity gossip craze to "news" links pages designed to support multiple ads in order to generate income for the owner. Some blogs are now being included in google news searches as well. Does anyone object to my modifying that section of the article to reflect this? Would appreciate input and opinions. -Jmh123 17:40, 25 February 2006 (UTC)

Including search engine test data in articles?

Is this appropriate?: Jihadunspun.net#Site stats? Esquizombi 04:48, 16 March 2006 (UTC)

Unfair

I strongly disagree with this method of verification. What if EVERY site required a Google test??? And besides, maybe Wikipedia is that item's first foray into cyberspace. I propose to scrap this method or at least revamp it in a big way. —Preceding unsigned comment added by RDLP715 (talk • contribs)

Please see Wikipedia:Verifiability. Given that your edits otherwise look like patent nonsense, and you have not responded with verifiable proof that their subject (a) exists (b) is verifiable (c) is not "original research", I cannot see any reason why they should not be speedy deleted; therefore, as per Wikipedia:Criteria for speedy deletion CSD1, I have done so. Also, please read WP:NOT, particularly the bit about advertising. -- Karada 18:08, 31 March 2006 (UTC)

Official?

So is this an official policy, guideline or process? I'm kind of confused if it's something that's suggested or something that must be looked at during an AFD. Because if you can use it whenever it suits you it seems a little pointless. What am I missing? 128.143.63.86 06:26, 6 April 2006 (UTC)

Guideline?

I think this article should become a guideline. Everyone already uses it, so should I put up the tag? Vulcanstar6 01:31, 8 April 2006 (UTC)

No, not everyone uses it. Many see grave problems with the test, and there's much debate on its use. It's not really suited to being a guideline. It's just a page of information. -Rob 02:30, 8 April 2006 (UTC)

Biases

The discussion misses some of the other biases of Google and the web. Americans are still the largest and most prominent group of English web users so many websites follow American usages. However it is arguably incorrect to say this applies to world English speakers as there are many English speakers in countries such as India. Personally, I think you don't prove much by a Google search when you are debating American vs Commonwealth terms. You need to look at other issues... Nil Einne 18:29, 28 April 2006 (UTC)

New tool

Take a look at this site Fight. It shows a tool that compares two keywords by the number of sites after a search in Google. It might be nice to add it to the page. CG 09:23, 1 May 2006 (UTC)

Alexa test, Avril Lavigne example needs replacing

The Alexa test section gives as an example of the fact that alexa tests may not be workable, the fact that avrillavigne.com has an alexa ranking of only 1,261,091. Except, oops, the ranking is actually 122,615 as of now. Making this a terrible example. Either this website just got vastly more popular in the last month, or someone added an extra digit onto the end.

I'm going to delete this, but a new example would be a good idea. Here is the text as it currently stands, if anyone wants to find another example to replace it: "A number of unquestionably notable topics have corresponding web sites with a poor Alexa ranking. For instance, http://www.avrillavigne.com had a traffic ranking of 1,261,091 as of January 27, 2006[1], but nobody would question Avril Lavigne easily warrants an article, and its reasonable to assume the site is visited by more people than indicated by Alexa." --Xyzzyplugh 00:41, 11 March 2006 (UTC)

The claims in the article, for example "Alexa itself says ranks worse than 100,000 are not reliable" need citing. Also, the one citation that there is in that section, http://www.mediacollege.com/internet/utilities/alexa/ is dated 2004, and Alexa could have changed a lot (for better or worse) since then. An up-to-date citation is needed. Esquizombi 07:25, 12 March 2006 (UTC)

Actually, I added it, and I didn't put an extra digit. If you look at this you'll see it says:
    • Traffic rank today: 408,315
    • Traffic rank 1 week average: 88,464
    • Traffic rank 3-mth: 122,615
So, you see some wild numbers, that don't seem to fit, but that's the kind of variation that's normal with Alexa. The "page views" and "reach" have also swung wildly. When I saw that number I had to double-check, but it was real. Any number you take from Alexa will give you the same problem. Also, you can only see historical figures if it's better than 100,000. Alexa produces wildly inconsistent results, and therefore numbers used to demonstrate will seem to be wrong. As well, her site uses Flash, and I think at one time (not now) you didn't register different url's for each page visit, which affected Alexa rankings. Also the 122,615 ranking can be beat by a *single* person on a computer, using just a normal web browser, with no automation. --Rob 07:51, 12 March 2006 (UTC)
Added: Incidentally, when you said "Either this website just got vastly more popular in the last month", I have to laugh. I've personally seen a web site go a *bigger* distance in ranking, from *one* person visiting the pages, with a toolbar, over several weeks (no automation used). That web site didn't get "vastly more popular". It's so simple, and so easy. I find it really sad that people actually think there's a big difference between 1,000,000 and 100,000 rank. Why do people put more faith in Alexa than Amazon itself does? Also, while personal experience is not allowed in article space, I hope we can discuss personal experience here, in project space. People have to share their knowledge about flaws in things like the Alexa Test, or the Google Test. It's frankly scary to think Wikipedia has deleted many articles using these tests, by people who literally don't know what they mean. --Rob 08:11, 12 March 2006 (UTC)
Do the tests give any useful information, in your opinion? If a Google search turns up no results, would this not at least tend to mean, at best, that the term may be spelled wrong, or at worst that it may be made up? If an Alexa test has a high traffic rating, does this not at least mean the site may be popular, and an exceptionally low one like 1,000,000 or lower that it may be not notable? Are there other better tests that could be applied? Esquizombi 21:43, 13 March 2006 (UTC)
Oh, I agree that both tests have some value, in some circumstances. They're good for checking extremes on the low-number side. As you say, no Google hits is an indication the words, with the exact spelling, might not be used much. Also, with Alexa, a rank better (lower) than 10,000 is hard to "manufacture" manually. But, on the "high" side, they mean much less. An incredibly popular web site can easily and often have 1,000,000+ Alexa rank (for example, a hugely popular secure web site won't register). A personal blog, that nobody but the author visits, can easily have 1,000,000+ Google hits. Unfortunately, there aren't any great tests out there. I think ultimately, we have to recognize the fact that while we can sometimes determine something popular and/or notable, we can almost never, by any means, prove conclusively that something is non-notable/unpopular/unknown. I'm ok with people using these tests, but I hate it when I see somebody use one test and then be convinced something is "nn" with metaphysical certainty. --Rob 22:19, 13 March 2006 (UTC)
"Are there other better tests that could be applied?" - Might want to check out Marketleap.com. And, as mentioned, a very good idea to use more than one test (see my "comparison" of Alexa and Marketleap here: Wikipedia talk:List of ways to verify notability of articles). MikeBriggs 15:06, 11 April 2006 (UTC)
  • I believe that the Alexa test would be misapplied if trying to apply it to Avril Lavigne. Lavigne is notable as a singer and meets several of the criteria at WP:MUSIC, so she clearly merits a Wikipedia article. She's not particularly notable for her Internet activity, so we don't need a Wikipedia article about AvrilLavigne.com, just a link to that site from her article. (The current Alexa traffic rank for AvrilLavigne.com is 50,665.) Alexa should be used (if at all) to measure the notability of web sites -- not the notability of the subjects of those web sites. --Metropolitan90 03:26, 19 May 2006 (UTC)
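
The swing between the Alexa figures quoted in this thread can be put in numbers. A minimal sketch, using only the three ranks Rob lists above; the names `spread` and `alexa_ranks` are hypothetical and chosen for illustration.

```python
def spread(ranks):
    """Ratio of the worst (largest) rank to the best (smallest) rank."""
    return max(ranks.values()) / min(ranks.values())

# Ranks for avrillavigne.com as quoted in the thread (March 2006).
alexa_ranks = {
    "today": 408_315,
    "1 week average": 88_464,
    "3-month": 122_615,
}

print(f"worst/best rank ratio: {spread(alexa_ranks):.1f}")
```

The same site differs by a factor of more than four depending on which averaging window is read, which is the point made above about relying on a single Alexa number.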

Advertising

This title is an advertisement for Google. There are many search engines other than Google, and this article indicates that, but its title implies that Google is the only search engine that can be used. This page should be moved to Wikipedia:Search engine test. Polonium 00:39, 8 March 2006 (UTC)

Since there was no response to the comment after 2 days, I am moving the page from Wikipedia:Google test to Wikipedia:Search engine test to end a misleading name that advertises Google. Polonium 00:24, 10 March 2006 (UTC)
I'd hardly call some project namespace page called "Google Test" an "advertisement" for Google. Many people call it the "Google test" (most likely because many people use Google). I'm tempted to move it back, but I'll await comment before doing so. --Locke Cole (t • c) 20:31, 23 April 2006 (UTC)
Well, I use Google as well, but other search engines exist; it is unfair to exclude them. Polonium 20:06, 9 June 2006 (UTC)
I agree with Locke Cole (above) and Deco (below). "Google test" ("Google test" -GLAT -Wikipedia) is the common name, at 124,000 hits; "Search Engine Test" ("search engine test") has only 836 — and many of those are about testing search engines, rather than testing topics by use of search engine. (And, no, "Yahoo test" et al. don't do any better.)  –Aponar Kestrel (talk) 19:21, 10 June 2006 (UTC)

Title?

Can we please move this back to Wikipedia:Google test? I know it's biased, but that's what we call it. You can't change the jargon by changing the page title. Deco 17:13, 9 June 2006 (UTC)

  • Your suggested title favors one company over all the others. It would be like calling all operating systems "Windows".--Patchouli 04:40, 24 July 2006 (UTC)
    • Of course it does. But that's what we call it. It's like saying the word "woman" is sexist because it has "man" in it. You can't reinvent jargon with page moves. Deco 19:59, 4 August 2006 (UTC)
Actually, the jargon is changing; for example, on this page, "search engine test" is used instead of "Google test". The new page title is unbiased, and Google test still redirects here. Based on this, the new page title should be kept. Polonium 12:48, 12 August 2006 (UTC)
Okay, if you think this term is in use I guess it's okay. But I would suggest we update the many references on this page to the "Google test", and also update hit counts to specifically note what engine was used to obtain them. Deco 13:42, 12 August 2006 (UTC)

Google Test Is Worthless

There are over 7,690,000 results for "you was"[2].

This proves that we shouldn't create an article for everything that renders many results using search engines.--Patchouli 04:36, 24 July 2006 (UTC)

Nor did anyone claim this. To quote the article, "The Google test has always been and very likely always will remain an extremely inconsistent tool, which does not measure notability. It is not and should never be considered definitive." Deco 13:46, 12 August 2006 (UTC)

Alexa test section

"Also, because of Alexa's recent plan to sell access to their web index by the hour, many websites with 'noncommercial' licenses have begun blocking Alexa's crawler completely."

The above line is confusing to me. What is meant by "websites with 'noncommercial' licenses"? In the U.S. websites are not licensed by any local, state, or federal government so does this refer to a license given out by Alexa for noncommercial websites to use their database content? Also, why would such websites feel the need to block Alexa's crawler? --Cab88 22:28, 2 September 2006 (UTC)

Is using Google Test an example of Original Research?

I was recently in a dispute with another user at Talk:Vaccinium_vitis-idaea over the relative popularity of two names. I quoted Google-test figures to bolster my argument, and the other user claimed this was an example of Original Research. No amount of quoting WP policy pages would dissuade him from this view. The debate on the words' popularities has died down now, but I think it would be useful to address the question of whether Google Test is Original Research directly, perhaps with a comment on it in the main project page. What do other users think? Kaid100 18:46, 21 October 2006 (UTC)

Guideline misleading?

The guideline appears to suggest Google tests establish popular usage. This is of course complete bull. It's fairly well established that Google/the internet is biased toward American usage over other English usage, out of proportion to population, because Americans are over-represented. This is true even relative to other native English speakers such as Canadians, British, Australians, and New Zealanders, but it is even worse once we consider the large number of second-language speakers, especially from the developing world, such as in India. Google/the internet is also biased toward the young, male, l33t, and, well, the 'geek' population (all of which save l33t includes me). I propose adding the word "internet" to mentions of popular usage for this reason. Google establishes popular usage on the internet, not popular usage in general. Nil Einne 08:34, 28 November 2006 (UTC)

Alternatively, make it abundantly clear it may establish popular usage (rather than does establish). In theory, the "in a nutshell" summary should cover that, but a lot of people appear to be unaware of the systematic biases on the internet. BTW, Wikipedia talk:Search engine test#systemic bias has another issue worth considering. Nil Einne 08:41, 28 November 2006 (UTC)

Is this a guideline?

Maybe I'm missing something, but I'm not seeing where there was a consensus to make this a guideline. As far as I can tell, a "guideline nutshell" was added [3]; then, later, it was categorised and marked as a guideline [4]. But it seems that the initial nutshell was added (incorrectly) purely as a summary, not to imply that this is a guideline. Has there been any consensus to adopt this page as a guideline (personally, I feel it should not be/have been yet)? Trebor 00:39, 2 January 2007 (UTC)

I wasn't thinking that marking it as a guideline would be making a statement about its level of maturity or acceptance. What would you call it? E.g. is it an essay? Kla'quot 00:44, 2 January 2007 (UTC)
I'm not sure; I've only just come across this page. The discussion over half a year ago didn't seem to come to a conclusion either. At the moment, I think it is written as a guideline, because it is "actionable (i.e. it recommends, or recommends against, an action to be taken by editors)" - if it was cleaned up and improved a bit, I think it would be useful to be able to refer to. I was just concerned that it hadn't achieved consensus (or, in fact, that much discussion at all) and had been marked as one by accident. Trebor 00:55, 2 January 2007 (UTC)
I've marked it as a how-to topic. Thanks for bringing this up :) Kla'quot 01:06, 2 January 2007 (UTC)

Refactor of page

I'm refactoring this page that had a lot of text and a low proportion of bulleted immediately summarized information. I've got as far as the first two sections, but "real life" has intervened.

I've therefore left the original text but put it into a new structure, and will work on it a bit more later. I haven't deleted anything, so at present it's quite long....

When I finish it, I'll summarize what has been done, and so on. FT2 (Talk | email) 12:42, 31 July 2007 (UTC)

Update - done. Compare to before, which was an essay; now it's more a factual guide to search engine uses and issues: how to use them and what reliance to place on them. I've kept most of the old material, but a lot of it was in big paragraphs that readily abbreviated to a bullet point, or contained excess wordage. It now contains guidance on policy usage, a mini-tutorial on using search engines with Wikipedia, and so on. FT2 (Talk | email) 02:01, 1 August 2007 (UTC)

"How To" or "Essay"?

Some of this seems like it's a How To guide, but there's an awful lot of editorializing in between. Can we maybe separate out the instructions of how to use search engines to support arguments from the warnings on misuse of the results? I mean, nobody should point at a How To guide to dismiss another editor's argument. Torc2 (talk) 23:05, 11 January 2008 (UTC)

Jonathan de Boyne Pollard's revision

Large OR section "sourced" to a self-published blog? I don't think that's sufficient. Torc2 (talk) 02:30, 14 January 2008 (UTC)

I don't want to keep reverting this, but the guy refuses to discuss his change. He keeps linking to his own self-published website as a source. Torc2 (talk) 21:49, 18 January 2008 (UTC)
Don't keep reverting it. Discuss it here and try to come to a consensus with all of the editors who read this talk page. I'll start small by stating: links to homepages.tesco.net/~J.deBoynePollard don't seem appropriate here, especially considering they're being added by the author of the article. Do others agree? ~a (user • talk • contribs) 00:51, 23 January 2008 (UTC)

Suggestion - instead of changing/reverting many paragraphs at the same time, try just one section at a time. At first glance, it seems that many of the changes in this edit war are simple wording or formatting differences and may be non-controversial. Try to identify the specific points of disagreement. The link mentioned above and associated text probably is the main bone of contention. You may be able to agree on all the other differences. Sbowers3 (talk) 02:03, 23 January 2008 (UTC)

  • His initial change was much larger. I left about half of it because I didn't think it was controversial. At least, it didn't represent the complete change in philosophy that the recent changes do. I guess that's how the link to his page remained, since it wasn't there before his first edit. The 'further reading' section is also just links to some blog articles that probably wouldn't stand up to WP:EL or WP:RS requirements, but I don't know if the standards are the same here as they are for regular articles. Torc2 (talk) 02:53, 23 January 2008 (UTC)

I agree that the link is questionable. The linked page, as best I can tell from nosing around the site, is one of many opinions expressed by one person on a variety of subjects. Several of the "FGA"s look like they would make for interesting and thought-provoking reading but it is unclear what makes them any more authoritative or reliable than any other essay that might be found on the internet. (To the extent this particular FGA is authoritative because of the citations contained therein, then a link or links to the underlying research would seem to be a more direct way of making the point.) The fact that the editor has linked to his own opinions is not troubling in and of itself, but there is a kind of circularity about it. ("If you aren't certain that my edit is a sound one, then take a look at what I've said on the subject elsewhere.") JohnInDC (talk) 17:12, 23 January 2008 (UTC)

  • I hoped for a discussion here, but it didn't happen, and it seems from the comments above like there isn't consensus for the change. Torc2 (talk) 09:30, 30 January 2008 (UTC)
I much prefer the version that you support. I really don't like Pollard's wording. TimidGuy (talk) 12:44, 30 January 2008 (UTC)
Thanks, John, for reverting. I feel like the language in Pollard's version isn't appropriate. For example, I cringe when I read this: "Raw hit counts do not, in fact, measure anything at all. Search engines do not in fact give correct hit counts, as scientific researchers trying to use them as research tools have been disappointed to discover." I much prefer "Raw hit count is a very crude measure of importance." Pollard's language is too emphatic, such as twice saying "in fact." The comment on the disappointment of scientific researchers doesn't seem necessary. TimidGuy (talk) 12:44, 3 February 2008 (UTC)
There are all kinds of specific problems with it. It's false to make the assertion that hit counts don't measure anything at all; they just don't measure what the user sometimes claims they do. However, for example, replying to a claim that a term like "shindiggining" is a common term in widespread usage by pointing out that it registers zero hits on any search engine is entirely valid. If it comes up with a few million hits, that's not a complete argument in itself, but does give cause to do more investigation before writing the claim off as false. Torc2 (talk) 19:13, 4 February 2008 (UTC)

To elaborate a bit on my concerns about the self-published link: I said above that it is circular, that it doesn't really add to the authority of one's edits to simply cite to a different place on the internet where you yourself once said the same thing. I also find it singularly unhelpful to encounter a link in support of a proposition and then discover, upon going to the link, that it is just some person's synthesis of what the scholars are (may be) saying, and furthermore that if I want to see what the scholars themselves are saying, I have to continue on to them. I suppose, finally, I figure that anyone who can write an opinion piece summarizing some category of research and put it on his personal web page ought to be able to edit the thing to render it appropriate for the Wikipedia page itself by taking out the POV and citing directly to the scholars on whose research the entry is based. It's up to the editors of the Wikipedia pages, not the readers, to conduct that exercise. JohnInDC (talk) 20:32, 4 February 2008 (UTC)

No, you're right. That was my main concern too. And following the links really seems to me like the scholars are saying something closer to what we've already got in the essay than an all-encompassing "totally useless" argument. Incidentally, the link to the FGA page is still there, and the "Further Reading" section is just links to two more blogs. Should we replace those with links to the actual peer-reviewed material instead? Torc2 (talk) 20:47, 4 February 2008 (UTC)

Sure, if the ultimate sources say more or less what he says they say. (While you're at it you might tone down some of that hyperbole too, e.g., "raw hits measure nothing".) JohnInDC (talk) 21:25, 4 February 2008 (UTC)

I believe a consensus was reached to remove the J.deBoynePollard link and the hyperbole. However, J de Boyne Pollard keeps re-adding it to the project page. He still hasn't chimed in here for some reason. I'm going to revert his change unless somebody has an issue. ~a (user • talk • contribs) 15:27, 5 February 2008 (UTC)

  • I did it this time. I think the question has been posed long enough that somebody would have posted objections by now. Torc2 (talk) 21:12, 6 February 2008 (UTC)