Wikipedia talk:Search engine test/Archive 2

From Wikipedia, the free encyclopedia

Claiming shortcut

Wikipedia:Good topics would like to claim the WP:GT shortcut. I'm in the process of changing any pages that link to it already to link to WP:GOOGLE instead. Any objections? If not, I'll collect it soon - rst20xx (talk) 00:59, 9 September 2008 (UTC)

Depopulated. Consider it claimed - rst20xx (talk) 17:30, 10 September 2008 (UTC)

Power search options.

There are two options you can use for Google searches: | (this replaces OR) and * (Google's word wild card).

So instead of (flavor OR flavour) (quark OR quantum OR physics) you can search for (flavor|flavour) (quark|quantum|physics). Similarly piometra OR pieometra OR pyametra OR pymetra can be replaced by piometra|pieometra|pyametra|pymetra.

If you are uncertain about a single word, the * can be used as a wild card. harry * truman will find Harry Randall Truman as well as Harry S. Truman. Note that using quotes drastically changes the results of using this wild card.--BruceGrubb (talk) 19:09, 11 July 2009 (UTC)
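As an illustration of the two operators described above, here is a minimal Python sketch that only assembles query strings. The helper names `or_group` and `wildcard_phrase` are made up for this example, and nothing here contacts a search engine or guarantees how a given engine will interpret the result.

```python
def or_group(alternatives):
    """Join alternative spellings with the | shorthand for OR."""
    return "(" + "|".join(alternatives) + ")"


def wildcard_phrase(words, quoted=False):
    """Join words into a query; a '*' entry stands in for one unknown word.

    Quoting the phrase changes how the wildcard is treated, so it is
    left as an explicit choice for the caller.
    """
    phrase = " ".join(words)
    return f'"{phrase}"' if quoted else phrase


# Equivalent of: (flavor OR flavour) (quark OR quantum OR physics)
query = " ".join([
    or_group(["flavor", "flavour"]),
    or_group(["quark", "quantum", "physics"]),
])
print(query)

# Unquoted and quoted forms of the wildcard search
print(wildcard_phrase(["harry", "*", "truman"]))
print(wildcard_phrase(["harry", "*", "truman"], quoted=True))
```

Running both the quoted and unquoted forms side by side is probably the easiest way to see how much the quotes change the results.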

Bing

Should we perhaps add a mention of Bing? I've found lately that it is tuned differently enough from Google to make it a distinct, useful tool. - Jmabel | Talk 04:29, 14 July 2009 (UTC)

Sensationalism

I propose adding sensationalism to the undue weight paragraph under general biases, as this is a problem I am continually running into on Wikipedia. The popular media routinely uses sensationalism to label or categorize a topic, and inexperienced editors either do not possess the critical thinking skills necessary to objectively analyze the issue or accept without question the repeated use of a term because it was "widely reported". Viriditas (talk) 01:16, 4 August 2009 (UTC)

Dictionaries and other reliable sources should be more clearly preferred to Googling!

Alternative spellings and usages can have their relative frequencies checked (eg, for a debate which is the more common of two equally neutral and acceptable terms).

This section is a big problem and a major source of the many amateur discussions and much original research about spelling and usage on Wikipedia. As written, this section encourages original research and doesn't point out that modern dictionaries describe usage and no longer prescribe it, as they did in the past. (A very commonly heard incorrect claim in discussions on WP is that dictionaries prescribe usage.) This section should point out that dictionaries use very large databases of citations from a large variety of mainly printed but also other sources and use professional skills in evaluating the results from these databases. We need to add explanations of what flags like "also" etc. and other usage labels and the order of different spellings in dictionary entries mean.

This page should have a note at the beginning pointing out that dictionaries and other reference works are a much more reliable way of checking for spelling and other usage frequencies than any amount of Googling we can do. The only things that search engine tests can add are information about common spoken usage that is possibly different than in printed sources and information about possible new trends not yet recorded even in online dictionaries. --Espoo (talk) 08:26, 25 October 2009 (UTC)

Agree. This weird "advice" causes more confusion than collaboration, see most notably Talk:Aluminium/Spelling#Google stats again the end comment of that section proposing Aluminium to be renamed to "Butts". I think that statement should be deleted. Google (Search engine) tests could be used only for vaguely negative indications, such as "there are few sites supporting/treating... and therefore there is very little evidence that... ". Google tests could never ever be used to prove anything, they can just add some emphasis to other arguments that in and by themselves aren't dependent on Google tests to hold. ... said: Rursus (mbork³) 11:09, 14 November 2009 (UTC)

Common search engines

I think Scirus should be added to "Professional research indexes", they cover some ground Google Scholar has no access to. And let's not forget ADS and arXiv. Paradoctor (talk) 14:53, 15 November 2009 (UTC)

4icu?

Is the table row devoted to the 4icu university ranking site appropriate here? Note that the 4icu site mainly seems to be a specialized combination of other search engines, and relies on google custom search as the method for users to search its own site. It doesn't even seem to rise to the level of notability to include in a university ranking page on wikipedia (see Talk:College and university rankings#4icu.org_Web_Popularity_Ranking:_evidence_of_notability), and it isn't clear what common uses would be made of it to make it worth including here. I dare say there are lots of such narrowly specialized (and possibly biased) search engines out there to be included if this one is appropriate. ★NealMcB★ (talk) 13:12, 15 October 2012 (UTC)

Tamerlane vs Timur

Over at Timur there is some debate whether to title the page Timur or Tamerlane. Someone was citing that the most used term was Timur. But I was arguing that these results vs a Tamerlane search are not apples to apples because there are other notable people with the first name Timur so it would presumably inflate the results. So is there some way to sort this out using the search engine test? Or is there another way? Thanks. PortlandOregon97217 (talk) 04:19, 18 November 2012 (UTC)

Google News and press agencies

There's a potential bias regarding Google News which isn't mentioned here and I think should be. Press agencies syndicate their content, so that a piece written by staff can end up being published on multiple news sites.

Google News is frequently used in discussion relating to WP:COMMONNAME, but I think it should be used with greater care.

A hundred articles produced by the New York Times, the Guardian and the Sydney Morning Herald gets you a hundred returns on Google News. But a hundred articles produced by AP, AFP and Reuters could get you thousands of returns. So, let's say the first three prefer "tomayto" and the second three prefer "tomahto". Google News would tell you that "tomahto" wins by many hundred percent. But that's not a real reflection of the reality that the sources are split fairly evenly. It's actually a wildly unreliable place to look, at least in cases where house style is likely to be an issue. --FormerIP (talk) 23:54, 20 October 2011 (UTC)
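The arithmetic behind the syndication argument above can be sketched in a few lines of Python. All data here is invented for illustration: the outlet names, the figure of 20 reprints per agency story, and the record fields (`origin`, `story_id`, `term`, `site`) are assumptions, not measurements of Google News.

```python
from collections import Counter


def count_by_hits(articles):
    """Count every published copy, the way a raw results count would."""
    return Counter(a["term"] for a in articles)


def count_by_origin(articles):
    """Count each originating piece once, however many sites reprinted it."""
    unique = {(a["origin"], a["story_id"]): a["term"] for a in articles}
    return Counter(unique.values())


articles = []
# Three newspapers, each publishing one story on its own site only.
for paper in ["paper_a", "paper_b", "paper_c"]:
    articles.append(
        {"origin": paper, "story_id": 1, "term": "tomayto", "site": paper}
    )
# Three agencies, each with one story reprinted on 20 subscriber sites.
for agency in ["agency_x", "agency_y", "agency_z"]:
    for n in range(20):
        articles.append(
            {"origin": agency, "story_id": 1, "term": "tomahto", "site": f"site_{n}"}
        )

hits = count_by_hits(articles)
origins = count_by_origin(articles)
print(hits)
print(origins)
```

With these made-up numbers the raw hit count favours "tomahto" twenty to one, while counting by originating source shows an even three-to-three split.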

Off-topic insertion

I've noticed that the sub-section 'Specialized options, including searches to include or exclude Wikipedia itself', under 'Using search engines', has recently had quite a lot of material relating to the use of diacritics added to it. Leaving aside my own feelings about this topic, this looks like an attempt to use a 'how to' page to promote a policy position which does not have anything intrinsically to do with the topic of this page, and about which there is no consensus. Would anyone mind if I trimmed that section to remove the most irrelevant bits? AlexTiefling (talk) 15:09, 18 December 2012 (UTC)

  • The tutorial explanation was expanded back around Nov. 20, with the Google template examples added around Nov. 25.
  • It explains (1) how to do specialized searches, and (2) how to use Wikipedia Google templates to greatly simplify searching multiple reliable sources simultaneously. Surely this explanation of how to use templates like Template:Google RS is appropriate here?
  • This explanation provides practical examples corresponding to the phrase below the table (in the section "Specialized options, including searches to include or exclude Wikipedia itself") that says (quote):
"Site inclusion/exclusion is often very useful to get views either from a named website, or from any other websites—e.g. it can be used:
  • To find pages on Microsoft terminology that are not self-published by Microsoft (not ending in microsoft.com),
  • To find pages that are official US or UK government sources (end in .gov and .gov.uk accordingly),
  • To find sites from a given country (more likely to end with that country's initials, such as ".fr" for France),
  • Or particular media publishers (eg, "cnn.com" or "bbc.co.uk")" (unquote)   LittleBen (talk) 16:06, 1 January 2013 (UTC)
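The site inclusion and exclusion cases in the quoted list can be sketched as query strings built around the `site:` and `-site:` operators. This is only an illustrative Python helper: the function name `site_query` and the sample search terms are invented, and it does not reproduce what Template:Google RS actually emits.

```python
def site_query(terms, include=(), exclude=()):
    """Build a query restricted to, or excluding, particular sites or domains."""
    parts = [terms]
    if include:
        sites = " OR ".join(f"site:{s}" for s in include)
        # Parenthesise only when there is more than one alternative.
        parts.append(f"({sites})" if len(include) > 1 else sites)
    parts.extend(f"-site:{s}" for s in exclude)
    return " ".join(parts)


# Pages on Microsoft terminology that are not published by Microsoft itself
print(site_query("Microsoft terminology", exclude=["microsoft.com"]))

# Official US or UK government sources (sample topic is hypothetical)
print(site_query("coastal erosion", include=["gov", "gov.uk"]))

# A particular media publisher
print(site_query("coastal erosion", include=["bbc.co.uk"]))
```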
I see you've reinserted the diacritics-related material that I removed. Do you fancy abiding by your topic ban on the subject of diacritics? AlexTiefling (talk) 23:58, 2 January 2013 (UTC)
  • As I explained, this was inserted on Nov. 20, before my topic ban. The research methodology—how to research terms of usage—is what is being explained, even if you object to the examples. The template—developed in my user space in November—is a very useful tool for Googling multiple reliable sources simultaneously. It is useful because—for each search—Google ranks the sources in order in the search results, with the more widely respected results at the top. Wikipedia is supposed to be about using reliable sources, and the template makes it so easy that surely there is no excuse for not properly researching accepted usage or terminology. LittleBen (talk) 00:07, 3 January 2013 (UTC)
I have no objection to the section or the general methodology. It's the undue weight given to the topic of diacritics that bugs me. I did not remove the entire section; only the examples that were diacritics-related. As explained, they appeared to endorse particular conclusions which are not a matter of WP policy, and thus distracted from the generally strong nature of the section. The one about Arguello was also way too long. (After edit conflict): Please learn how to use preview. Your constant re-amendment of the same pages is quite disruptive. You have been asked to amend this behaviour previously. AlexTiefling (talk) 00:13, 3 January 2013 (UTC)
  • I agree that the explanation of Template:Google LC is longer than the explanation of Template:Google RS—because the former example shows how to research a little-known topic (a soccer player) by selectively removing a flood of results for a much more well-known tennis player with the same name. This is essentially a much more complex version of the Madonna of the Rocks vs. Madonna (singer) example.
  • I have moved the longer and more complex Template:Google LC example below the relatively simple Template:Google RS example.
  • When I compose a reply, I often copy already-written items from other pages. The easiest way to do this is to save an intermediate copy of the reply and then add to it. If you answer before I have finished writing a reply then that interrupts the process. LittleBen (talk) 00:30, 3 January 2013 (UTC)
The Arguello examples refer to an ongoing debate elsewhere on Wikipedia, and you can't pretend not to know that. You included the text "The search results suggest that, in major English sources, the preferred way of spelling the name is to omit the diacritics." in three examples you wrote, each of which refers to a person whose name is debated elsewhere. This gives the false impression that your search technique is authoritative. English-language sources are not more reliable or more neutral than non-English ones, especially for information about people whose first language is not English. By directing people to use this highly biased method, you are pushing your own point of view. You have had this explained to you before.
And for the love of all that's holy, how dare you claim that me posting one correctly written reply 'interrupts the process' of you using the live version of the page as your personal scratch pad? Your hamfisted editing technique disrupts other people's replies to you. Get it right first time; this is what preview is for. AlexTiefling (talk) 08:38, 3 January 2013 (UTC)
  • For the record, I think I have only once "added an unaccented variation of a name or other word as an alternate form to one with diacritics" (in the lede of the Walesa article) and probably never "converted any diacritical mark to its basic glyph on any article or other page". This was all a sham to stop any discussion about doing proper research in a broad range of reliable sources, and to stop any discussion about keeping English Wikipedia widely accessible and useful to the majority of people who can't read foreign languages—a trustworthy resource as to English usage and spelling in a great majority of reliable sources in the real world—while properly catering to the minority of people who want to see both versions together, which I support. LittleBen (talk) 00:45, 3 January 2013 (UTC)
The 'real world' includes plenty of people who use Latin scripts and don't speak English, and you know it. You can't know that only a minority of people want to see both versions together. And you're deliberately quoting from the narrow parts of your ban, when what you were told to do was to stay off the topic. The smokescreen of templates and 'reliable sources tools' doesn't really get you away from the fact that what you're doing is trying to build a tool for pushing your POV into WP's guidance to its editors. AlexTiefling (talk) 08:38, 3 January 2013 (UTC)
  • I really care very little about diacritics, but I do feel very strongly about NPOV, about keeping Wikipedia as a reliable and trustworthy source—and about minimizing or eliminating edit warring, move warring, and intimidation by a few ultra-nationalists. As for how names of people and places should be written in English, I would think that governments would be among the most reliable sources as to proper usage. (I may write an expanded version of the Google RS template that searches official Government sites corresponding to all the major language versions of Wikipedia). In the case of Poland, there's this government site. Surely it's ridiculous to say that the official government site is not a reliable source, and that the majority English usage should be completely stripped out of an article—not appearing even once in the lede—because it's "not encyclopedic" or it's "unethical" to use English in English Wikipedia. LittleBen (talk) 03:10, 3 January 2013 (UTC)
I have never claimed that 'majority English usage should be completely stripped out', nor will I. Indeed, there are plenty of cases - mainly historical - where I would argue for English forms which are quite different to the native ones to be used more or less throughout. But this is something which can be determined on a case by case basis, per WP:COMMONNAME, rather than by using your bias-promoting templates. But I still think it is crazy to suggest that the Polish government's English-language site is more reliable than their Polish-language site. AlexTiefling (talk) 08:38, 3 January 2013 (UTC)
  • <Quote> they appeared to endorse particular conclusions which are not a matter of WP policy <unquote>. I understand that official WP policy is to be neutral—I haven't seen any policy that requires official English versions of names to be completely stripped out of English Wikipedia, have you? LittleBen (talk) 03:33, 3 January 2013 (UTC)
No, and what's more, I'd oppose such a policy. But what you're doing, with your little notes that "The search results suggest that, in major English sources, the preferred way of spelling the name is to omit the diacritics", is to try to introduce the reverse of such a policy by stealth. Please stop. AlexTiefling (talk) 08:38, 3 January 2013 (UTC)
Addendum: There is no such thing as 'official English', so the idea that there is reliably an 'official English' version of anything at all is nonsense. The appearance of variant spellings in official English-language versions of documents produced by non-English authorities is as much an artefact of the translation process as anything else. That doesn't mean we can't or shouldn't use them, but I think you give far too much weight to a feature which is incidental, rather than essential, to the media in which it occurs. AlexTiefling (talk) 08:42, 3 January 2013 (UTC)
  • <Quote>I think you give far too much weight to a feature which is incidental, rather than essential, to the media in which it occurs<unquote>. Do you mean that you think that the use of English is incidental, rather than essential, in English media? LittleBen (talk) 10:45, 3 January 2013 (UTC)
If you think English is so important, please take more time to read my own writing more closely for comprehension. I do not believe that establishing definitive English variants of personal names is an intentional function of the majority of English-language sources produced by agencies whose primary language is not English. And I still think that sources in the native language of a subject are generally at least as reliable as those in a foreign language such as English. Your gambit with the templates here deliberately undermines that. AlexTiefling (talk) 11:18, 3 January 2013 (UTC)
  • So who should have the final say as to how Chinese, Japanese, and Korean words and names should be written in English, if not the corresponding governments and the majority of reliable news media who tend to respect what governments and major international organizations like the UN, the Nobel Prize Committee, and the Olympic committee decide? (Of course names of famous living persons, as recorded by such organizations, are usually the same as the "international" version of their names as shown in their passports—passports usually show the name in the local language plus the "international romanized version" of the name). Doesn't the same argument apply to other countries too? LittleBen (talk) 14:55, 3 January 2013 (UTC)
No, the situation with countries and languages that do not use Latin alphabets at all is not a useful parallel. I know that's your actual preferred area of interest, and I have no quarrel with you about it. But I don't think it's relevant here. AlexTiefling (talk) 15:20, 3 January 2013 (UTC)
  • I don't know what the Latin alphabet has to do with it. Are you suggesting that English Wikipedia should be written in the Latin alphabet? The several extended Latin alphabets have little or nothing to do with English. LittleBen (talk) 05:05, 4 January 2013 (UTC)
  • <Quote> I still think it is crazy to suggest that the Polish government's English-language site is more reliable than their Polish-language site.<Unquote> Are you suggesting that the Polish language site is a reliable source for English usage relating to Poland? Because Polish usage is surely incidental, and English usage is surely primary, to English Wikipedia. LittleBen (talk) 15:01, 3 January 2013 (UTC)
Statements by the Polish government in Polish are likely to be more comprehensive, relevant and accurate than their statements in English, yes. This is true whether we are on the English, Polish, Latin or Japanese Wikipedia. AlexTiefling (talk) 15:20, 3 January 2013 (UTC)
  • The three encyclopedias and seven news media sources that are searched simultaneously by Template:Google RS are generally considered to be the most authoritative guides to accepted English usage. Are you contesting this? LittleBen (talk) 17:58, 3 January 2013 (UTC)
(outdent) I'm not contesting it, although I didn't know that such a canonical list formed part of our WP:RS policy. I'm disputing that such sources are automatically more authoritative for any purpose, whether naming conventions or any other, than ones originated by reliable creators using the same language as the subject. AlexTiefling (talk) 18:02, 3 January 2013 (UTC)
  • As I mentioned above, I will probably write an expanded version of the Google RS template that also searches official English-language Government websites corresponding to all the major language versions of Wikipedia. Names of famous living persons—as recorded by major international organizations such as the UN, the Nobel Prize Committee, and the Olympic committee—are usually the same as the "international" version of their names as shown in their passports: passports usually show the name in the local language plus the "international anglicized version" of the name. Such "anglicized (passport) names" of famous nationals of a country will also appear on the government English-language web site of that country, as well as being reported in the same form by the majority of reliable news media. LittleBen (talk) 08:08, 4 January 2013 (UTC)
  • The major foreign-language versions of Wikipedia appear to include German, French, Italian, Dutch, Polish, Swedish, Romanian, and Serbian. I have omitted Spanish and Portuguese from this list because I'm still looking for the most reliable sources. I have omitted Russian for the same reason, and I haven't yet researched Czech sources. I have already created a template for Vietnamese—it searches the ten reliable sources in the Google RS template, plus another nine official and major English sites in Vietnam. Chinese and Japanese also deserve separate templates. LittleBen (talk) 09:33, 4 January 2013 (UTC)
All you have done is to convince me that you are wilfully missing the point. I've tried to assume good faith, but I can no longer believe that you really don't understand me. You're choosing to ask me to restate my point, or feigning incredulity that I'm really suggesting particular things, because you prefer that to taking my point seriously, even in disagreement. I'm through here. I'm going to remove the specific POV-pushing lines from the guidance, but not the needlessly wordy and POINTy examples, and leave it at that. AlexTiefling (talk) 10:34, 4 January 2013 (UTC)
  • I have put together a starter version of the template at {{User:LittleBenW/Template test4|Test term}}. This searches official government web sites and official government tourist websites for Germany, France, Italy, Holland, Poland, Czechoslovakia, Sweden, Romania and Serbia—about 15 web sites in addition to the ten of the Google RS template. It should be useful for researching names of people and places. It still needs to be tested and documented. LittleBen (talk) 11:10, 4 January 2013 (UTC)

Another caveat?

There's discussion at Talk:Coypu to move that article to Nutria, where # of Google results has been mentioned in support of the move. I was commenting on the problematic nature of search engine tests, and wanted to give an arbitrary example of Google's behavior when additional search terms are provided which should vastly narrow the number of results. Anyway, I did [this search], and was surprised by the results. Although I searched for "coypu", Google apparently now knows that "nutria" is a synonym, so I got many results where Google had bolded "nutria", as if it were my search term (some results were even on a prison food called "nutria" that had nothing to do with the animal). While having Google automatically search for synonyms of the entered search term is useful to the users of Google, it further diminishes the extent to which a count of Google results can be used to determine which of two synonymous terms is more common. Plantdrew (talk) 00:40, 25 April 2014 (UTC)

If you wish to search for an exact word or phrase, use quotes.[1] Then the count for "coypu" "benton county" reduced to 468 from 17,200.―― Phoenix7777 (talk) 01:02, 25 April 2014 (UTC)
Exact word. That's good to know. I knew about using quotes for an exact phrase. My point was not about the reported number of results per se, but the content of the individual results. Many of the search results when coypu isn't in quotes had the word "nutria" only, and lacked "coypu" (the term I actually searched). Enclosing "coypu" in quotes gets rid of the pages that only used nutria. The Wikipedia:Search engine test page doesn't mention using quotes to force Google to exactly match a single word; maybe something about that should be included.
I'm playing around with it a little more. Reported result number is the same if I'm just searching for the single word "coypu", with or without quotes. If I add a second search term, quotes start making a difference in reported number of results. I tried a couple more places where the animal occurs as secondary search terms (Shreveport and Louisiana); without quotes I get some results that use nutria but not coypu without quotes, and the reported result number went down with quotes. I also tried rodent/Myocastor coypus as secondary search terms. With rodent, reported results went up when coypu was in quotes. Enclosing the secondary search term in quotes also made a difference; results are 10x higher for "coypu"+"rodent" than "coypu"+rodent. My conclusions: search engine tests are very unreliable and exquisitely sensitive to how the search is constructed (and that conclusion is nothing new). Plantdrew (talk) 03:07, 25 April 2014 (UTC)
I wouldn't trust any result numbers above 1000. I'd try to filter the search until it gets under 1000, then click to the end and see how many real URLs are returned. This will better mimic what the user may potentially see. Even if there are 5 billion pages that say nutria, a user will never be able to reach them.--Obi-Wan Kenobi (talk) 03:11, 25 April 2014 (UTC)
*sigh* I was trying to construct a query that would get less than 1000 results when I searched for coypu+"Benton County" in the first place. And coypu+"Benton County" or nutria+"Benton County" is way too specific a search to answer the question of whether the animal is more commonly known globally as nutria or coypu (although it's clear that the common name in Benton County OR is "nutria"). Are there any notable topics that don't have a raw count of more than 1000 Google results? I'm not notable at all, but I just Googled myself and have an almost 5k result count for my (globally unique) legal name. Is it even possible to construct a query that comes in under 1000 results where the choice of additional search terms to reduce the result number doesn't introduce bias? Plantdrew (talk) 04:18, 25 April 2014 (UTC)
Limit the search to reliable sources - choose for example 10 major newspapers, and only search against those urls.--Obi-Wan Kenobi (talk) 04:24, 25 April 2014 (UTC)
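Two of the suggestions in this thread, quoting every term to force an exact match and counting the URLs actually reachable rather than trusting the headline estimate, can be sketched as follows. This is an assumption-laden Python illustration: `build_exact_query` and `count_reachable` are invented names, and the page data is a placeholder for URLs a reader would collect by paging through real results by hand.

```python
def build_exact_query(*terms):
    """Wrap each term in double quotes so it is matched exactly."""
    return " ".join('"' + t.replace('"', "") + '"' for t in terms)


def count_reachable(pages):
    """Count distinct URLs across the result pages actually retrieved.

    `pages` is a list of lists of URLs, one inner list per results page.
    """
    seen = set()
    for page in pages:
        seen.update(page)
    return len(seen)


print(build_exact_query("coypu", "Benton County"))

# Placeholder URLs standing in for results gathered by paging to the end.
collected_pages = [
    ["https://example.org/a", "https://example.org/b"],
    ["https://example.org/b", "https://example.org/c"],
]
print(count_reachable(collected_pages))
```

The distinct-URL count is a floor on what a reader can actually reach, which is a different quantity from the estimated total shown at the top of a results page.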

A challenge to the accuracy of this page

This page is being discussed at Wikipedia:Reliable sources/Noticeboard#Google search result as a direct source. --Guy Macon (talk) 22:23, 16 May 2015 (UTC)

Google scholar in "Specific uses of search engines in Wikipedia"

I don't know what is being implied by the content on google scholar: "Google Scholar provides evidence of how many times a publication, document, or author has been cited or quoted by others. Best for scientific or academic topics. Can include Masters and Doctorate thesis papers, patents, and legal documents. Google Scholar search." I would never say something like "X's papers have received 8 bazillion citations on google scholar" cited to the search result, and I would delete it, if i found it. What is the point of this section? Thanks. Jytdog (talk) 00:29, 17 May 2015 (UTC)

Exact phrase searches - more explicit recommendation, if not requirement-to-use, needed.

Having just seen a well-argued move-request fail because the opposers were intent on giving value to searches that did *NOT* use exact-phrase-search in evaluating different phrases for an article name, I believe there needs to be much more explicit recommendation, even requirement, to use exact phrase searches when evaluating, ehhm, phrases as potential article names.

I see this article is not locked, so I suppose I could just add it, but as I'm a newbie around here, I'm opening a conversation here first.

For the failed move request in question, see Talk:Lucas_Roberts#Requested move 27 February 2017 [EDITED to add: after I raised the matter with the closer, they changed their decision to MOVE] In this instance exact phrase searching indicated that WP:COMMONNAME was the proposed new location Lucas Horton, and not the current old name Lucas Roberts. However, opposers insisted on giving value to searches that did not search on the exact phrases "Lucas Roberts" and "Lucas Horton", and which report hits even when "Lucas" and "Roberts" (or "Horton") don't occur together. Due to the sheer commonness of the surname Roberts (almost eight times as common on the world wide web as Horton), these word-only searches produce stacks of results for Roberts, and the opposers held these results out as valid indication of WP:COMMONNAME, even when I carefully explained to them exactly why they are not. Furthermore, the closer also apparently gave credit to these invalid non-phrase searches - which leads me to believe that Wikipedia is not giving sufficient guidance for this.

Hence, I propose that the paragraph on exact phrase searching be far more explicit, and state a requirement to use an exact phrase search when evaluating a potential title that comprises more than one word. @Born2Cycle: @TAnthony: Aliveness Cascade (talk) 21:20, 21 May 2017 (UTC)
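The difference between word-only matching and exact-phrase matching described above can be shown on a toy corpus. The sentences below are invented for illustration and the matching functions are a simplified Python stand-in; they are not how any real search engine ranks or counts pages.

```python
import re


def tokens(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_words(text, phrase):
    """True if every word of the phrase occurs somewhere in the text."""
    words = set(tokens(text))
    return all(w in words for w in tokens(phrase))


def matches_exact_phrase(text, phrase):
    """True only if the words of the phrase occur adjacent and in order."""
    doc, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(doc[i:i + n] == target for i in range(len(doc) - n + 1))


# Invented sample documents
docs = [
    "Lucas Horton appears in this episode.",
    "Lucas Roberts was the name used in this summary.",
    "George Lucas met Julia Roberts at the premiere.",
    "Roberts praised the work of Lucas and others.",
]

for name in ["Lucas Roberts", "Lucas Horton"]:
    loose = sum(matches_all_words(d, name) for d in docs)
    exact = sum(matches_exact_phrase(d, name) for d in docs)
    print(name, "word-only:", loose, "exact phrase:", exact)
```

In this toy corpus the word-only count for "Lucas Roberts" is inflated by two documents that mention an unrelated Lucas and an unrelated Roberts, while the exact-phrase counts for the two names are equal.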

From the note on the top of the page, it seems that the proposal for an exact-phrase requirement should be made elsewhere, as this is simply an information page, but I am also new, not so much to Wikipedia editing as to discussions of policy. I have big problems with an internet search engine being used as a definer, particularly the heavy emphasis here on Google. I do think that the page provides good information on searches, the possible biases in their results, and how to maximize the relevance of results for a particular purpose. I like it as a “how-to” page. Sallijane (talk) 20:10, 11 June 2017 (UTC)

Semi-protected edit request on 7 May 2019

I really think this article should mention that some search engines opt out of the filter bubble and provide unbiased results equally for all of their users, and provide examples of some popular ones that do so; specifically, I think that DuckDuckGo, Qwant, and Startpage.com are (in no particular order) the most popular ones. In section 3, the article says that "search engines often will not be neutral."; I think an explanation can be added that some search engines try to provide unbiased results equally for all of their users (though the results might not be unbiased because of the sources the engine uses in the backend to provide results (for an explanation by DuckDuckGo see this), or because of other technical details (this Twitter thread could be generalized to apply not just to Google so it's always relevant)). Additionally, DuckDuckGo, Qwant, and Startpage.com should be mentioned in the table in section 8, common search engines, next to Google, Bing, and Yahoo! (in the "general search engines" examples). 85.64.33.163 (talk) 21:59, 7 May 2019 (UTC)

 Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format and provide a reliable source if appropriate. DannyS712 (talk) 21:40, 19 May 2019 (UTC)