Wikipedia talk:WikiProject Linguistics/Archive 22

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Long-term OR abuser, now inactive

Thanks go to Doric Loon for this removal of an entirely unsourced OR paragraph at English phrasal verbs. Unfortunately, that paragraph survived ten years in the article, added in 2012 by Tjo3ya (talk · contribs), now inactive. Glancing at their contrib history, they were a heavy contributor to linguistics-related articles, and I notice an unusual proportion of their edits being reverted by other editors, along with some fat cuts and restores. Where content is added, it's either unsourced (diff1, diff2) or appears to have a citation or two that often doesn't back the preceding article content; instead, the citations are more of a forward-looking, "see-also"-style explanatory note within <ref> tags, of the "See Foo & Bar (2000) for a debate" type. There are a lot of big cuts of 5, 10, or 20kb of content, indicating a bold style, but that bothers me less, as at least they don't introduce OR content (well, one can't be sure without examining the diff, but probably not) and mostly they are not reverted.

The 2012 OR paragraph at English phrasal verbs is the first time I've encountered Tjo3ya, so I don't really know how much damage they may have done. I wonder if anyone who enjoys gnoming articles for old OR content would like to try and tackle this, or at least, provide a better idea of the scope of the problem? I notice that Botterweg14 appears to have tangled with them in April 2021 at Predicate (grammar), and had edits at half a dozen other linguistics articles around the same time, so perhaps they will recollect those edits and be able to give their impressions about this editor, in order to better scope the extent of the problem, if indeed there is a problem. Thanks, Mathglot (talk) 22:45, 14 December 2023 (UTC)

The problem is generally WP:NPOV more than WP:OR. Their contributions often argued in Wikivoice for their idiosyncratic version of dependency grammar, and even their less argumentative contributions still give undue weight. I removed some blatant instances, as did Kaĉjo, but there's still a lot out there. This isn't trivial to fix, since this editor was the main person working on syntax articles for quite a while, and their problematic contributions are often intertwined with good ones. I'll do what I can when I have time, but unfortunately their battleground behavior contributed to an unwelcoming environment for many of the people best positioned to fix it. Botterweg14 (talk) 03:51, 15 December 2023 (UTC)

To add to this: their idiosyncratic version of dependency grammar is based on the notion of a catena, so I used Special:WhatLinksHere/Catena_(linguistics) a couple of times. The trouble was that, when edited to give a neutral point of view on catenas, the passages generally looked OK to me, but gave undue weight to this theory in the context of the article. So the best solution (I thought, not being very familiar with wiki guidelines) would be not so much to cut down on the catena content but to add content about other approaches to make articles more representative. Unfortunately this would take much more work than simply cutting down on catena content, and should also be done by someone with much more knowledge of syntax than myself. Kaĉjo (talk) 16:01, 15 December 2023 (UTC)

Perhaps some of this text can be moved to catena, since it wouldn't be undue weight in that context. Botterweg14 (talk) 04:48, 16 December 2023 (UTC)

Pinging Mundart and RM_Dechaine, since their syntax expertise goes far beyond mine. Botterweg14 (talk) 03:59, 15 December 2023 (UTC)

Your diagnosis of the effects of Tjo3ya's edits is exactly right: a very large weighting of a very marginal (in the literature) theory using the catena, and a driving away of editors who didn't want to engage in endless small skirmishes over how to make the catena theory appropriately cited, and not give it undue weight. It was exhausting, and more than one of us simply decided to cut back on editing wikipedia. Really, as you say, all the syntax articles need a pretty thorough eye to rebalancing. Perhaps once I retire! Mundart (talk) 16:37, 23 December 2023 (UTC)

Difrasismo and Dvandva

There's a discussion regarding a merge between Difrasismo and Dvandva at Talk:Difrasismo#Merge? that could do with some input (there). The key current query is whether there is a suitable over-arching article into which both could be merged, but please also consider the reasonableness of the primary proposal. Klbrain (talk) 18:45, 27 December 2023 (UTC)

Komi languages

According to Oxford Guide to the Uralic Languages (2022), there is a single Komi language, for which two literary languages, Komi-Permyak and Komi-Zyryan, were created. Neither of these seems to be the primary one in any sense or to deserve the designation 'the Komi language', but for some reason Komi-Zyryan now holds that title. Also, Komi-Permyak is under the name Permyak, contrary to the reliable sources. I am not familiar with linguistics articles in Wikipedia, so I am asking for opinions on what should be done.

Should we move Permyak > Komi-Permyak and Komi > Komi-Zyryan, and make Komi language into a disambiguation page or a short article explaining the variants and the historical reason for their existence? This would probably affect many links. Jähmefyysikko (talk) 23:17, 29 December 2023 (UTC)

I expect the move to Komi-Permyak to be the least controversial step so I opened a discussion about it at Talk:Permyak language#Requested move 30 December 2023. Jähmefyysikko (talk) 14:01, 30 December 2023 (UTC)

Discussion at Talk:Voiced palatal approximant § Do not undo the alveolo-palatal approximant

You are invited to join the discussion at Talk:Voiced palatal approximant § Do not undo the alveolo-palatal approximant. Nardog (talk) 15:54, 30 December 2023 (UTC)

Linguistic input could be useful in a WP:V wording matter

Wikipedia talk:Verifiability#Merge WP:SELFSOURCE and WP:BLPSELFPUB to WP:ABOUTSELF has stalled out, with stonewalling by a single party, who claims that the syntactic problems in the policy material's opening sentence, which I've outlined in considerable detail, are just "[my] opinion" and that doing anything about them is "not needed" and is "WP:CREEP". I think these grammatical-meaning and parseability issues are objectively factual and not a matter of subjective opinion, but that editor will not engage on the matter further, there or in user talk [1], where I demonstrated that the revision actually is not against the goals of the CREEP essay.

The discussion has too few active participants (despite "advertising" the thread to WP:VPPOL) to move past this issue. Either I'm correct that the sentence is syntactically faulty or I am not, and additional voices should get us past this blockage one way or the other. If I'm simply wrong about the problems I see in the original wording, then feel free to say so.

It's basically come down to a choice between the versions in the last two subthreads there (unless someone wants to propose a new revision); no real need to pore over the entire revision process. — SMcCandlish ☏ ¢ 😼 23:26, 31 December 2023 (UTC)

Query about IPA transcription of the Latin word "tricolor"

I was reading the article Rubus tricolor and thought the IPA transcription was interesting: /ˈruːbəs ˈtraɪkʌlər/

I am no expert, but it was my understanding that the difference between the vowel sounds ʌ and ə was simply that the former is under stress, and the latter is not under stress. But in this word, the main stress falls on the first syllable, meaning that the second syllable must be unstressed (unless there is secondary stress?). Therefore, we should have both unstressed syllables (i.e., second and third syllables) rendered as schwa, correct? Anyway, I am not sure if it is correct, or if my previous understanding was not accurate. Many thanks, Moribundum (talk) 17:51, 8 January 2024 (UTC)

The second syllable has a subordinated stress like the second syllable of "homemaker" etc. I don't think the second and third syllables have the same degree of stress in the most usual pronunciation of the word... AnonMoos (talk) 18:09, 8 January 2024 (UTC)

Unsourced edits

Someone might want to review the edits from 90.241.160.140 and 84.68.219.93. The editor has changed many articles without any sourcing, mostly related to letters and alphabets (especially Armenian, Cyrillic, Glagolitic, and IPA ones), and the edit summaries range from vague to patent nonsense. Daniel Quinlan (talk) 21:24, 8 January 2024 (UTC)

I would block if they don't heed the warning to start sourcing. Just spot checking a few, they seem more in the realm of theories or original research. For example, here the editor adds some dubious origins of the grapheme ⟨X⟩ to the article on its descendant ⟨X̂⟩ but that same claim was removed from ⟨X⟩ a few weeks before for failing verification. Here they add some sister graphemes to ⟨U⟩, including ⟨उ⟩ which has a completely separate history from the Latin ⟨U⟩ and only shares a name. The 84... IP is slightly better in the sense that this edit at least is partially supported by the article text but we also get edits like this where the summary is just a skibidi toilet reference and the edit is unsourced. I'll probably just go through and revert them in bulk when I have some spare time since I don't have faith these are going to stand up to scrutiny if we took the time to research them (and the editor[s] should really be providing that if it's not in the text). — Wug·a·po·des 04:00, 9 January 2024 (UTC)

Based on those examples and the other reverts, it's likely that the best course of action is to undo every one of these edits, but we could wait another day to see if anyone here can make sense of some of them. I wouldn't object to undoing them now, though. Daniel Quinlan (talk) 06:36, 9 January 2024 (UTC)

I have undone most using this edit note: rv dubious edits by user:90.241.160.140 per Wikipedia talk:WikiProject Linguistics#Unsourced edits and left a note at their talk page inviting them to explain why they consider their edits to have been valid. The facetious tone of many of their edit notes does not inspire confidence. --𝕁𝕄𝔽 (talk) 14:05, 9 January 2024 (UTC)

All my edits are constructive, I simply just ran out of ideas for edit summaries and went for silly things I’m sorry. 90.241.160.140 (talk) 16:13, 9 January 2024 (UTC)

Which, apart from showing your immaturity, doesn't respond to Wugapodes's demand that you produce evidence to support your changes. --𝕁𝕄𝔽 (talk) 23:23, 9 January 2024 (UTC)

90.241.160.140 is now reverting my reversions. I have no inclination to get bogged down in an edit war so if anybody cares about these topics, they will need to open a WP:ANI report and redo the reversions. --𝕁𝕄𝔽 (talk) 23:23, 9 January 2024 (UTC)

@JMF: I blocked them from article space for a week which prevents further problems but allows them to still discuss the changes. They were warned twice (three times if you count me saying above that I'd block them) and clearly knew there was an ongoing discussion, but they kept going. I don't see the need for ANI when it's that clear cut. If they want to discuss, they can do so here or make edit requests. — Wug·a·po·des 02:39, 10 January 2024 (UTC)

The IP address has resumed adding unsourced changes, coupled with inappropriate edit summaries, upon the release of their block. I've filed a report and am working on reverting their edits. Panian513 16:04, 17 January 2024 (UTC)

sourcing for the etymology of "whore" as well as potential etymological missing link

Prostitution#Etymology_and_terminology I noticed the section here and thought that going from Proto-Germanic *hōrōn to PIE *keh₂- was strange, so I took a look at our sister site for a source. While I didn't find one, the entry there suggests that a missing link between the two was another PIE word, *kéh₂ros, and I can see the connection better that way, if it can be sourced. It would be lovely if anyone more familiar with sourcing etymology could take a look into this. Akaibu (talk) 15:20, 25 January 2024 (UTC)

Help on untangling some Alaska-Eskimo scripts

It appears that the Commons images used in Yugtun script, among other articles, are mislabeled, but I can't figure out at all from the christusrex source which script image corresponds to which language/dialect, which script, and which script inventor -- each of which may have their own article and each of which may be scrambled. (It is also used in ru:Эскимосская_письменность among others -- that page seems to have a better organization of how some of the scripts correspond to dialects.) If someone is willing to take a couple of hours to dive into (or already has a background in) the differences between several Eskimo dialects + phonetics, scripts, and transcriptions, their efforts on this would be appreciated.

[Addendum:] I'd also appreciate ideas on how to verify the photo of Uyaquq /(Uyaqoq?) on Rovenchak 2011 (p. 8), which unfortunately seems like a very cruddy article. (That said, it passes WP:V and a very-most superficial reading of WP:RS, so it'd only be a matter of licensure to get the photo, else one could just link to it. However, I think it'd be irresponsible if we didn't try to independently verify ourselves.) SamuelRiv (talk) 04:21, 30 January 2024 (UTC)

As for the second part, Rovenchak 2011 takes the image from http://uyaquk.com/, which was archived via the Wayback Machine in 2005: [2]. That page has an invitation to "Contact the author with comments or to request a full set of bibliographic references/footnoted article"; that Yahoo email address was also used as the contact for doi:10.2307/1357795. I found her LinkedIn page, which lists that BASOR article and provides her personal website with a different, but available, email address. I'm not sure how much I can spell out directly, but it might be worth emailing her to ask if she recalls where she got the image of Uyaquk from. Umimmak (talk) 05:07, 30 January 2024 (UTC)

ALL-CAPS for "keywords for lexical sets"?

See [3]: An anon is putting various words in ALL-CAPS (misusing {{sc2}} in the form {{sc2|FOOT}} which simply outputs regular all-caps not small caps), insists this is proper for "keywords for lexical sets", and claims that this is how they "are generally represented ... across Wikipedia", yet I have never encountered this before here, and it is not to be found in MOS:ALLCAPS or any other guideline I'm aware of. The anon seems to want to do this for any word containing a sound that is under discussion in the article, such as the ʊ in foot, to be rendered FOOT. I can't see any rationale for doing that instead of just writing foot. If there's a good reason to do it after all, then it needs to be accounted for at MOS:ALLCAPS. However, it seems to conflict with a specialized linguistic use already codified there:

* In linguistics and philology, glossing of text or speech uses small caps for the standardized abbreviations of functional morpheme types (e.g. PL, AUX) ....

The only thing like this I'm finding elsewhere on-site is at Help:IPA/English, where it has been done seemingly to random words, then veering back into lower-case, e.g.:

ɔː — THOUGHT, audacious, caught

— SMcCandlish ☏ ¢ 😼 20:50, 20 October 2023 (UTC)

It is standard (on Wikipedia and elsewhere discussing English phonology) to have keywords for lexical sets in all caps, see Lexical set, Fronting (sound change) (See "GOOSE-fronting"), the alternate name LOT–THOUGHT merger in Cot–caught merger, throughout in English phonology, New Zealand English phonology, Rhoticity in English etc., etc. Umimmak (talk) 23:09, 20 October 2023 (UTC)

Also see: Wikipedia talk:Manual of Style/Capital letters/Archive 26#I'm still confused on difference between sc and sc2 templates Umimmak (talk) 01:34, 21 October 2023 (UTC)

Well, there's some followup discussion at Talk:Hiberno-English#Merger of monophthong and diphthong sections (which is rather confusingly trying to address two things at once, but this is one of them). Anyway, the fact that some people write a lexical set this way doesn't seem to imply that it is "standard" that WP has to follow, especially when it is not likely to signify anything to more than a vanishingly small fraction of readers. Where is this standard published, and what body issued it? Also, doing {{sc2|GOOSE}} seems to serve no purpose at all, since it renders and copy-pastes the same as just typing GOOSE without a template. If we're certain we want to render lexical sets in all-caps, then this should be accounted for at MOS:ALLCAPS. — SMcCandlish ☏ ¢ 😼 01:05, 22 October 2023 (UTC)

"some people write a lexical set this way": everyone writes the keywords for lexical sets in small caps (or all caps if there are typographical limitations). The IP editor and I have both provided a few of the many Wikipedia pages already doing this, because the sources used for writing the articles also do this, because everyone who refers to keywords for lexical sets does so in capital letters. See myriad sources noting this explicitly if you search Wells lexical sets "small caps" in Google Books.

These are J.C. Wells’ lexical sets, so if people make use of his sets they follow his typographical conventions (1982, p. xviii):

Words written in capitals
Throughout the work, use is made of the concept of standard lexical sets. These enable one to refer concisely to large groups of words which tend to share the same vowel, and to the vowel which they share. They are based on the vowel correspondences which apply between British Received Pronunciation and (a variety of) General American, and make use of keywords intended to be unmistakable no matter what accent one says them in. Thus 'the KIT words' refers to 'ship, bridge, milk . . .'; 'the KIT vowel' refers to the vowel these words have (in most accents, /ɪ/); both may just be referred to as KIT.

Note this isn’t in violation of MOS:WAW because GOOSE is referring to more than just the word goose.

Also {{sc2|GOOSE}} and GOOSE do appear differently so I’m confused what you mean by them rendering the same? Umimmak (talk) 12:22, 22 October 2023 (UTC)

They were pretty close to indistinguishable in a particular browser, that's all. Anyway, I've (belatedly – forgot about this for several months) updated MOS:SMALLCAPS to account for this use of them [4], and hopefully avoid another revert war about it as happened at Hiberno-English in October last year. I'm honestly skeptical this is a good idea, because it's based on the style used in a particular primary source, and smallcaps are already used for at least two other unrelated linguistics markup purposes (ones I was already familiar with from my own university linguistics department days). But if there's already a strong consensus among people who care about it that it should be done this way, and we're already doing it consistently in articles and even in documentation like Help:IPA/English, then it should be accounted for in the guideline.

PS: In the same MoS section is an HTML comment reading: This next part does not appear to actually be applicable on Wikipedia; will get clarification from WT:LINGUISTICS: Transcription of logograms (as opposed to phonograms) can also be done with small caps or all caps. Not really sure what to do with this. Is there anything Wikipedia-important that needs to be accounted for here? — SMcCandlish ☏ ¢ 😼 10:56, 3 February 2024 (UTC)

Thanks for that. And as for the second point, articles with transliteration of Sumerian text make that distinction; see NIN (cuneiform), EN (cuneiform), and Sumerogram#Transliteration and examples. It might be used in other languages too, but I mostly associate it with Sumerian. Umimmak (talk) 15:44, 3 February 2024 (UTC)

Should these be in full-size ALL-CAPS or SMALL-CAPS? — SMcCandlish ☏ ¢ 😼 00:21, 5 February 2024 (UTC)

I guess we never got a clear answer when you asked before: Wikipedia talk:WikiProject Ancient Near East/Archive 6#MoS cleanup point: all caps and small caps. But I'm seeing all caps in journals:

Dalley speculates whether gišṭû (GEŠ.DA) is to be distinguished from the Sumerogram GEŠ.ZU, Journal of Ancient Near Eastern History doi:10.1515/janeh-2023-0010
BARA₂-mar is an alternative spelling of BARA₂.DUMU, Journal of Cuneiform Studies doi:10.1086/725217
The original writing is ^dPA₄.SIG₇.NUN.ME = ^disimu₄(-d). In this NUN.ME is a semantic marker, which had no consequences for the pronunciation, IRAQ doi:10.1017/irq.2022.7

And Foxvog's textbook on Sumerian [5] writes:

In unilingual Sumerian contexts, Sumerian words are normally written in lower case roman letters. Upper case (capital) letters (CAPS) are used:
When the exact meaning of a sign is unknown or unclear. Many signs are polyvalent, that is, they have more than one value or reading. When the particular reading of a sign is in doubt, one may indicate this doubt by choosing its most common value and writing this in CAPS. For example, in the sentence KA-ĝu₁₀ ma-gig 'My KA hurts me' a body part is intended. But the KA sign can be read ka 'mouth', kìri 'nose' or zú 'tooth', and the exact part of the face might not be clear from the context. By writing KA one clearly identifies the sign to the reader without committing oneself to any of its specific readings.

When the exact pronunciation of a sign is unknown or unclear. For example, in the phrase a-SIS 'brackish water', the pronunciation of the second sign is still not completely clear: ses, or sis? Rather than commit oneself to a possibly incorrect choice, CAPS can be used to tell the reader that the choice is being left open.

When one wishes to identify a non-standard or "x"-value of a sign. In this case, the x-value is immediately followed by a known standard value of the sign in CAPS placed within parentheses, for example da_x(Á) ‘side’.

When one wishes to spell out the components of a compound logogram, for example énsi(PA.TE.SI) 'governor' or ugnim(KI.KUŠ.LU.ÚB.ĜAR) 'army'.

When referring to a sign in the abstract, as in “the ŠU sign is the picture of a hand.”

In bilingual or Akkadian contexts, a variety of conventions exist. Very commonly Akkadian words are written in lower case roman or italic letters with Sumerian logograms in CAPS: a-na É.GAL-šu 'to his palace'. In some publications one also sees Sumerian words written in s p a c e d r o m a n letters, with Akkadian in either lower case roman letters or italics. In other newer publications Sumerian is even printed in boldface type.

So it definitely seems to be in ALL CAPS over SMALL CAPS, and that seems to track with usage on Wikipedia. Again still under the assumption this is about Sumerian/Sumerograms; might be worth asking Wikipedia talk:WikiProject Writing systems as well. Umimmak (talk) 01:06, 5 February 2024 (UTC)