Wikipedia:Reference desk/Archives/Computing/2017 February 1



February 1

Need Boolean minus for popular search engine

I'm trying to search for something while excluding something else. In Google, that would be a search like Dog -vet. But the minus doesn't seem to work on the very popular Chinese search engine https://www.baidu.com/. What am I doing wrong? Anna Frodesiak (talk) 07:26, 1 February 2017 (UTC)

Ehm, according to this it should work. (((The Quixotic Potato))) (talk) 12:05, 1 February 2017 (UTC)
I tested it and it seems to work. I used the search queries "trump -donald" and "trump" (without quotes) and there was a noticeable difference. (((The Quixotic Potato))) (talk) 12:06, 1 February 2017 (UTC)
It also seems to work for '糯 米' vs '糯 -米' to me. Nil Einne (talk) 15:31, 1 February 2017 (UTC)
God, I just put "trump" in Google and every single one of the top 100 hits was about this guy, or (at #98) an apartment in a tower named after him. You know Trump is trump when he trumps the freaking ace of spades. :( Wnt (talk) 15:43, 1 February 2017 (UTC)
Nil Einne, I think the problem is words made of multiple characters. Here is Hainan (province) minus Sanya (city in the province): 海南 -三亚. [1] Anna Frodesiak (talk) 00:47, 2 February 2017 (UTC)
I never learnt spoken or written Chinese in any form, but my understanding, which seems to be supported by Chinese characters, is that there are no word boundaries, so 海南 -三亚 could always be interpreted in various ways. The above help guide suggests it should work similarly to Google, but it seems it doesn't. That said, I would expect 海南 -"三亚" to work, but it doesn't. However, 海南 -三 -亚 seems to work to some degree (at least it produces significantly different results). I also tried –“三亚” and –三亚 and confirmed they don't seem to work. Nil Einne (talk) 05:01, 2 February 2017 (UTC)
When writing, extra characters are used to make words more obvious. You used 三亚 instead of 三亚市. I assume you meant to search for Hainan and exclude touristy Sanya. The difference is between spoken and written. When speaking, we say Sanya. When writing, we write Sanyashi. 209.149.113.5 (talk) 13:23, 2 February 2017 (UTC)
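
For anyone who wants to compare the variants tried in this thread side by side, here is a rough Python sketch that builds the query strings. It only assembles text. It says nothing about how Baidu actually segments or interprets the terms. The helper names are made up for illustration, and the `wd` URL parameter is an assumption about Baidu's search URL rather than something confirmed above.

```python
from urllib.parse import urlencode


def build_exclusion_query(include, exclude):
    """Join the wanted term with each unwanted term prefixed by an ASCII '-'."""
    parts = [include.strip()]
    for term in exclude:
        # Drop any dash the caller already typed (ASCII hyphen, en dash,
        # em dash, full-width hyphen), then add one ASCII hyphen-minus.
        cleaned = term.strip().lstrip("-\u2013\u2014\uff0d")
        if cleaned:
            parts.append("-" + cleaned)
    # A space goes before each minus so it is not read as a hyphen.
    return " ".join(parts)


def baidu_search_url(query):
    # Assumption: the query is passed in the "wd" parameter of /s.
    return "https://www.baidu.com/s?" + urlencode({"wd": query})


whole_word = build_exclusion_query("海南", ["三亚"])          # exclude the word
per_character = build_exclusion_query("海南", list("三亚"))   # exclude each character
longer_form = build_exclusion_query("海南", ["三亚市"])       # written form suggested above

for q in (whole_word, per_character, longer_form):
    print(q, "->", baidu_search_url(q))
```

Normalising the dash matters here because the en dash variants were reported not to work. Whether the per-character or the 三亚市 form gives the intended results still has to be checked by eye.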

Identifying the article patron from URL

Hi,

I have a list of Wikipedia URLs, which I fetched from the page below using the wikipedia library in Python.

https://en.wikipedia.org/wiki/List_of_American_mathematicians

But I am also getting URLs that point to non-mathematicians, e.g. https://en.wikipedia.org/wiki/Clark_College_(Washington)

Now I need to extract only the URLs that point to mathematicians. But I can't find any keywords or sections that only an article about a person would contain and that a university, publishing, or history article would not.

Can you please help me? --Preceding unsigned comment added by 175.137.69.57 (talk) 17:45, 1 February 2017 (UTC)

It appears you pulled all links from the page. You want ONLY the first child link from each LI object. 209.149.113.5 (talk) 19:00, 1 February 2017 (UTC)
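
One way the "first link per LI" idea could look, using only Python's standard `html.parser` instead of the wikipedia library. This is a sketch, not a tested scraper for the real page. The class name and the sample HTML are invented for illustration, and it assumes every `<li>` is explicitly closed.

```python
from html.parser import HTMLParser


class FirstLinkPerItem(HTMLParser):
    """Collect the href of the first <a> found inside each <li>."""

    def __init__(self):
        super().__init__()
        self.links = []
        # One flag per currently open <li>: has its first link been recorded?
        self._taken = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._taken.append(False)
        elif tag == "a" and self._taken and not self._taken[-1]:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
                self._taken[-1] = True

    def handle_endtag(self, tag):
        if tag == "li" and self._taken:
            self._taken.pop()


# Made-up sample in the shape of a list page entry: person first, institution second.
sample = """
<ul>
  <li><a href="/wiki/Jane_Doe">Jane Doe</a>, taught at
      <a href="/wiki/Example_College">Example College</a></li>
  <li><a href="/wiki/John_Roe">John Roe</a>
      (<a href="/wiki/Example_Society">Example Society</a>)</li>
</ul>
<p><a href="/wiki/Main_Page">Main Page</a></p>
"""

parser = FirstLinkPerItem()
parser.feed(sample)
print(parser.links)
```

On a real page the sidebar and navigation menus are also built from `<li>` elements, so the input would need to be limited to the article body before feeding it in.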
Category:20th-century_American_mathematicians may be of interest to you. (((The Quixotic Potato))) (talk) 19:46, 1 February 2017 (UTC)
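
If the category route is used, the member list can be requested from the MediaWiki API through `list=categorymembers`. The sketch below only builds the request URL and reads titles out of a response. It makes no network call, the function names are illustrative, and the sample response is hand-written in the documented shape rather than real data.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, cmcontinue=None):
    """Build a categorymembers query URL for one page of results."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 0,   # main namespace only, so subcategories are skipped
        "cmlimit": 500,
        "format": "json",
    }
    if cmcontinue:
        # Token returned by the previous response when more results remain.
        params["cmcontinue"] = cmcontinue
    return API + "?" + urlencode(params)


def titles_from_response(data):
    """Pull article titles out of a decoded JSON response."""
    members = data.get("query", {}).get("categorymembers", [])
    return [m["title"] for m in members]


url = category_members_url("Category:20th-century_American_mathematicians")
print(url)

# Hand-written stand-in for a decoded response, not real API output.
fake_response = {"query": {"categorymembers": [
    {"pageid": 1, "ns": 0, "title": "Jane Doe"},
    {"pageid": 2, "ns": 0, "title": "John Roe"},
]}}
print(titles_from_response(fake_response))
```

This category covers only one century, so it will not match the list page exactly. Intersecting its titles with the links already collected is one way to use it as a filter.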
If you need a quick and dirty hack, you can filter out all links that contain the terms "university", "laboratory", "college", "society", "association" (case-insensitively). That's not a general solution, but should work. --Stephan Schulz (talk) 20:36, 1 February 2017 (UTC)
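
The quick and dirty filter could be sketched like this in Python. The term list is the one given above. The function names and the `Jane_Doe` URL are placeholders for illustration, and as noted this is a heuristic, not a general solution.

```python
from urllib.parse import unquote, urlparse

EXCLUDED_TERMS = ("university", "laboratory", "college", "society", "association")


def looks_like_institution(url):
    """True if the article title in the URL contains an excluded term."""
    # Decode percent-escapes and lowercase so the match is case-insensitive.
    title = unquote(urlparse(url).path).lower()
    return any(term in title for term in EXCLUDED_TERMS)


def keep_probable_people(urls):
    """Drop every URL whose title matches one of the excluded terms."""
    return [u for u in urls if not looks_like_institution(u)]


urls = [
    "https://en.wikipedia.org/wiki/Clark_College_(Washington)",
    "https://en.wikipedia.org/wiki/Jane_Doe",  # placeholder person article
]
print(keep_probable_people(urls))
```

Anything the term list does not anticipate (journals, prizes, cities) will still slip through, so the result is worth spot-checking.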