Wikipedia talk:WikiProject Linguistics/Archive 18

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Is an "alphabet" and a "script" same?

Is an "alphabet" and a "script" the same thing? I know this is probably not a strictly linguistic issue, but I can think of no other expert group that can help with this. If this is not the forum, please point me to the right one.

Context: Today we have Bengali-Assamese script and the two alphabets: Bengali alphabet and Assamese alphabet. The "script" article came about because the then Bengali script was too language specific and after some discussion it was decided that a "parent" article was required (Talk:Bengali_alphabet#Merge_with_Assamese_script?, 2006-2007). After some meandering the article name settled on "Bengali-Assamese script".

The immediate context is the discussion at Talk:Rangpuri_language#Writing_system. In short the discussion is on whether we should link the script of the Rangpuri language as [[Bengali-Assamese script]] or [[Bengali alphabet|Bengali script]].

I am tagging the other interested parties: user:Za-ari-masen and user:Msasag. And also user:SameerKhan who was instrumental in the 2006/2007 decision.

Thank you!

Chaipau (talk) 12:53, 29 August 2020 (UTC)

Terminology varies. A good starting point might be the Glossary of Unicode Terms. In their strict terminology the Bengali-Assamese script is a script ("Bengali script") of the abugida type (and not of the alphabet type), and the Bengali writing system as well as the Assamese writing system use the Bengali script. HTH. Love —LiliCharlie (talk) 13:08, 29 August 2020 (UTC)

Does Unicode provide names of scripts or blocks of codes? There is in fact a proposal to change the block to "Bengali-Assamese" ("It may be possible to change the block header name, though the block property values cannot. The most neutral and least disruptive name would be “Bengali-Assamese”. This is an editorial, not a normative, matter." [1]) Nevertheless, the script is already called "Bengali-Assamese" in Salomon (1998) Bengali–Assamese_script#cite_note-1. Chaipau (talk) 13:34, 29 August 2020 (UTC)

Unicode provides a lot. The latest standard has over 1000 pages. And they also host ISO 15924. Love —LiliCharlie (talk) 13:50, 29 August 2020 (UTC)

I don't think Unicode's stability policy allows script or block name changes, not even if the names contain obvious spelling errors, but formal name aliases are allowed. Love —LiliCharlie (talk) 14:07, 29 August 2020 (UTC)

But Unicode does not encode scripts per se, according to their FAQ. For instance, Bengali uses the "danda" defined in the Devanagari block, so does it mean that Bengali encoded in Unicode uses a hybrid Bengali-Devanagari script? Yes we started with Unicode, but we need to move on. Chaipau (talk) 14:24, 29 August 2020 (UTC)

Blocks are handy, but they don't determine script. Characters have character properties, and one value of the script property is Zyyy for "undetermined script" aka "common". (Many scripts share punctuation, numerals, diacritics, etc.) Love —LiliCharlie (talk) 14:35, 29 August 2020 (UTC)

I respect the knowledge and opinions of the editors who joined that discussion of 2006-07 that Chaipau showed, but it just appears to be a case of WP:OR where the editors came up with the term "Bengali-Assamese script" which now seems like a WP:NEOLOGISM as there are some visible efforts to popularize the term. All the relevant sources call it "Bengali script" including the Unicode glossary that LiliCharlie showed and in the context of Rangpuri language, Ethnologue states the writing system of Rangpuri as "Bengali script". The sources use "Bengali script" and "Bengali alphabet" interchangeably, even the article on Bengali alphabet uses "script" numerous times in its description. I think the best way to solve this issue is to rename Bengali-Assamese script to Bengali script. Za-ari-masen (talk) 09:34, 30 August 2020 (UTC)

A script and an alphabet aren't the same imo. A script is a set of characters and an alphabet is based on one or more scripts and the characters have certain sound values and other rules. This script we are talking about isn't just used for Assamese and Bengali but also for Maithili, Meitei Manipuri, Kamtapuri, Bishnupriya Manipuri, Sylheti, Hajong, Santali, Chittagonian etc. And this script is known by many different names. This script currently has two Unicode blocks: Bengali and Tirhuta. The Tirhuta block is, as of now, only used for the Maithili language; it's also known as Mithilakshar. And the Bengali block is used for many different languages like Assamese, Bengali, Rangpuri/Kamtapuri etc. Unicode has three names for the script: Tirhuta script, Bengali script and Assamese script. Though since this is one script, a unified name should be used. I prefer the name Eastern Nagari. The Siddham script has two descendants, 1) Nagari or Devanagari or Western Nagari and 2) Eastern Nagari. So the term Eastern Nagari is suitable for the script since it's used in the Eastern region. Scripts like Odia and Nepalese script came from early Eastern Nagari that emerged in the 13th-14th century. We cannot choose any of the regional names like Bengali or Assamese or Tirhuta. That is because people from other regions don't accept any specific regional name. For example, if it's renamed as Bengali script, then people from Assam, Bihar, Jharkhand, Kamtapur region will feel offended. They feel offended and disadvantaged when their script, languages, culture etc are mistaken to be Bengali. This leads to hatred among different groups. So this issue will never be solved, people will keep demanding to change the name "Bengali script". So I think it's best not to favour any specific regional term and we should use a unified term like "Eastern Nagari" for the script. Outside Wikipedia, the unified name "Eastern Nagari" or "Purvinagari" is quite accepted, as far as I've seen. Only a few people opposed this term; it seems that they prefer a term with which their cultural identity is associated. I'm a Bengali and I've many Bengali friends from Bangladesh and West Bengal who have no issues using the term Eastern Nagari. Msasag (talk) 10:41, 30 August 2020 (UTC)

@Za-ari-masen: No. When user:SameerKhan suggested the name "Bengali-Assamese" in 2006 it was already prevalent ("Indian Epigraphy" Salomon 1998). It was to accommodate the non-Assamese/Bengali languages that for a time this article was named "Eastern Nagari script". (Manipuri language uses the Bengali র and the Assamese ৱ for example. - addendum) We know from Brandt 2014 that the academic community rightly prefers "Eastern Nagari script" for the very same reason. [2] Chaipau (talk) 14:24, 30 August 2020 (UTC)

Ethnic pride is not among our five criteria for article titles and we will continue to call Serbo-Croatian by that name in spite of animosities between fervent Serbs and Croats who hate their linguistic varieties to be described as varieties of a common language, and in spite of Bosnians and Montenegrins who hate not to be mentioned. We should be guided by our existing policy and not invent new ad hoc rules to cater for the taste of people who lack scientific objectivity (i.e., maximum distance between observer and the observed). Love —LiliCharlie (talk) 15:30, 30 August 2020 (UTC)

Chaipau, these are just one or two sources where the terms "Bengali-Assamese" or "Eastern Nagari" are mentioned but there are thousands of sources that describe the script as "Bengali script". Even Brandt himself notes that "Bengali script" is the most common and popular term for this script, hence, it seems to be the most suitable title per WP:COMMONNAME. You should see what LiliCharlie stated above: ethnic pride is not a criterion for article titles. Za-ari-masen (talk) 09:13, 31 August 2020 (UTC)

@Za-ari-masen: What applies here is WP:NAMINGCRITERIA, not WP:COMMONNAME. The point is that academics and others have recognized the name "Bengali script" is problematic. You are misquoting Brandt—this is what she says: "In fact, the term 'Eastern Nagari' seems to be the only designation which does not favour one or the other language. However, it is only applied in academic discourses, whereas the name 'Bengali script' dominates the global public sphere." In other words, she (and the academic community) is rejecting the dominant name ("Bengali script") and is preferring quite another name ("Eastern Nagari script").

And the claim to WP:COMMONNAME is a little misleading. In determining what is WP:COMMONNAME, it recommends: "In determining which of several alternative names is most frequently used, it is useful to observe the usage of major international organizations, major English-language media outlets, quality encyclopedias, geographic name servers, major scientific bodies, and notable scientific journals." Just a search on the web is not enough. Again, this points back to Brandt's statement preferring "Eastern Nagari script" over the popular "Bengali script".

Chaipau (talk) 11:23, 31 August 2020 (UTC)

Addendum: Using solely WP:COMMONNAME, one should use "Bengali-Assamese script" rather than "Bengali script". This is because there are significant works that mention the script as Assamese or Asamiya script (e.g. in "Indo-Aryan Languages, Cardona") and it improves recognizability. Chaipau (talk) 11:33, 31 August 2020 (UTC)

Za-ari-masen I don't see any reason to consider the "popularity" of a word, rather we should use a name that is acceptable to all (not just an individual). We should also keep in mind the publication dates of those "thousand" sources. Mohsin274 (talk) 10:20, 31 August 2020 (UTC)

For what it is worth, Unicode does call it "Bengali and Assamese"[3]. Chaipau (talk) 12:06, 31 August 2020 (UTC)

"It"? No. Unicode calls the script "Bengali script", and the block starting at U+0980 "Bengali", cf. chapter 12.2 of the current standard which also mentions "Bangla script", "Asamiya", and "Assamese" as synonyms for the script. What you are citing is a page to help users find charts of Unicode blocks rather than scripts. Love —LiliCharlie (talk) 12:45, 31 August 2020 (UTC)

@LiliCharlie: I think we have addressed these issues earlier.

The block header name will never change in Unicode. It will break too many things and it was designed not to change.
Blocks encode codes, not scripts (look at the FAQ link I provided above). It says they do not encode scripts, per se. I also gave you an example why every complete sentence used in Bengali Unicode is a hybrid Devanagari-Bengali code.
Furthermore, look up the answer to the FAQ "Can I determine the script of a character by the character or block name?" Ans: "No, not at all. The character names and block names are not reliable indicators of the script of a character." In other words, the name "Bengali script" may or may not determine the name of the script to which the characters in the block belong. Take, for example, the letter ৱ, which is called "BENGALI LETTER RA WITH LOWER DIAGONAL". This letter does not even exist in the Bengali alphabet, and it is not "RA" but "WO".

In this case at least, we cannot go by Unicode naming conventions.

Chaipau (talk) 14:07, 31 August 2020 (UTC)
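
The distinction drawn above between a character's block, its character name, and the script it is actually used for can be inspected with Python's standard `unicodedata` module. This is only a minimal sketch: the standard library exposes neither block nor Script property data, so the two block ranges below are hard-coded by hand for the blocks under discussion, and the helper name `block_of` is made up for illustration.

```python
import unicodedata

# Only the two blocks discussed in this thread. The ranges are typed in by
# hand (an assumption of this sketch), since the stdlib has no block lookup.
BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
}

def block_of(ch: str) -> str:
    """Return the name of the hard-coded block containing ch, or 'other'."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return "other"

# U+09B0 (Bengali ra), U+09F1 (the letter cited above), U+0964 (danda)
for ch in ("\u09b0", "\u09f1", "\u0964"):
    print(f"U+{ord(ch):04X}", block_of(ch), unicodedata.name(ch))
```

Running this lists the danda under the Devanagari block even though Bengali text uses it, and shows U+09F1 carrying a character name built on "RA", which is the mismatch the FAQ answer warns about. Determining the Script property itself would need the Unicode data files or a third-party library, which this sketch deliberately leaves out.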

@LiliCharlie: I don't know if these will help but you should read these news articles once: [4], [5], [6] [7]. Mohsin274 (talk) 14:19, 31 August 2020 (UTC)

@Mohsin274: You are aware that the whining Sentinel editorial is utter BS that conflates script with language? –Austronesier (talk) 14:48, 31 August 2020 (UTC)

@Austronesier: I don't know. You may/may not be right, but I personally don't have any issues with Sentinel editorial. I am just showing few articles about "The London sitting of the International Organization for Standardization... held between June 18 and June 22, 2018." And, if you think the article from Sentinel is unreliable or biased then you can read the other 3 from The Assam Tribune, NE Now, and Indian Express. Mohsin274 (talk) 15:17, 31 August 2020 (UTC)

Don't get me wrong, but we need peer-reviewed scholarly articles rather than newspaper articles by people who seem involved. Love —LiliCharlie (talk) 15:27, 31 August 2020 (UTC)

@LiliCharlie: I agree. Opinion columns are the bane of Wikipedia in many instances. The Indian Express reports are also too opinionated. It would be better to look at the Unicode ad hoc committee report, which is some kind of a peer-review of the submission made by the BIS. Here they are:

The proposal: [8]
The Ad Hoc Committee report: [9]
The Working Group Report: [10]

Please note the Recommendation M67.25b from the Working Group (page 5): Change the block header from Bengali to Bengali-Assamese. Obviously, the WG did not accept everything the BIS submitted.

Chaipau (talk) 15:47, 31 August 2020 (UTC)

I never said we should follow Unicode or ISO 15924 practice. What I said was we should be guided by our own five criteria for article titles, and not consider ethnic pride. And I now add: We shouldn't try to settle any political issues. Love —LiliCharlie (talk) 14:46, 31 August 2020 (UTC)

@LiliCharlie: Yes, I agree with you. We should apply WP:NAMINGCRITERIA diligently here. We have seen that the old usage has some problems, and the Unicode, the academics and scholars are moving in a certain direction. We are best off being mindful of that direction. Not doing so is political. We should not overstep them either. This debate has been going on for some time in different talk pages, and I believe the experts in this Linguistic forum are possibly the best equipped to take the nuances into consideration and resolve the issue. Chaipau (talk) 15:28, 31 August 2020 (UTC)

My above example of Serbo-Croat was chosen because issues are involved that led to atrocious wars with massacres and many casualties. I refuse to fuel tensions by taking sides; neither the Bengali-speaking, nor the Assamese-speaking, nor any other linguistic or ethnic group has the right to demand considerateness that might result in hurting somebody else's feelings. I prefer to remain completely neutral by not agreeing with any of the parties involved. And certainly not with the loudest one. Love —LiliCharlie (talk) 16:04, 31 August 2020 (UTC)

@LiliCharlie: According to user:Za-ari-masen, reliable sources like Ethnologue use "Bengali script" (not "Bengali-Assamese script"). And, according to Ethnologue, they use ISO Standard 15924 for identifying writing systems or scripts (as stated here). Therefore, we are indirectly following ISO 15924. But if ISO renamed "Bengali script" to "Bengali-Assamese script", then we should use the same. Mohsin274 (talk) 15:42, 31 August 2020 (UTC)

Discussion about Late Greek

There is an ongoing discussion about Late Greek in Talk:Late Greek -- is it a "period" of Greek? is it a "register"? should it have a standalone article, or be part of some other article? Kindly help us out! --Macrakis (talk) 17:05, 31 August 2020 (UTC)

Is Ethnologue reliable for the Kamta group of languages?

The Ethnologue seems to give classifications and names in a very different system, at variance with accepted knowledge and recent findings. Here are some examples:

Ethnologue calls Rangpuri language a language [11], whereas Masica 1991, p 25 calls it Rajbangsi ("Thus the Rajbangsi dialect of the Rangpur District (Bangladesh), and the adjacent Indian Districts of Jalpaiguri and Cooch Behar, has been classed with Bengali because its speakers identify with the Bengali culture and literary language, although it is linguistically closer to Assamese.")
Ethnologue, on the other hand, calls Rajbangsi a different language from Nepal [12].
Ethnologue calls Kamtapuri an alternative name for Rangpuri [13], whereas Toulmin (PhD 2006) finds "However, with a sizeable number of speakers now located within a different country to Rangpur, and lacking any special historical reason for choosing Rangpuri over Kamta, it is unlikely that this term will catch on further afield."

It seems Ethnologue is at complete variance with linguists and their findings and reports.

Could we then consider Ethnologue, at least for these entries, reliable?

Chaipau (talk) 17:39, 1 September 2020 (UTC)

The Ethnologue is a tertiary source because it is a compendium of other secondary sources (which in turn rely on primary sources). As such, it may be helpful, but proper secondary sources should be preferred. For our policy regarding primary, secondary and tertiary sources, see WP:PSTS.

Regarding the Ethnologue, I do not know about the Kamta group of languages, but I know cases where the Ethnologue does not reflect the best consensus in Linguistics, namely when it comes to the differentiation between Western Upper German varieties (which is what I know about the most), which includes entries such as “Swiss German” (not a linguistic division, but rather a cultural or national one), “Walser” (various Highest Alemannic German varieties, but not the only ones), but scandalously lacks Alsatian.

I think it is problematic that the ISO has basically copied the Ethnologue classifications. Of course, a hard classification scheme like ISO 639-3 is a necessity for computers, and it has many benefits. However, it obscures the inherent fuzziness of linguistic classifications and perpetuates one classification system, in this case the Ethnologue’s. Also, the Ethnologue now has a hard paywall. --mach 🙈🙉🙊 18:54, 1 September 2020 (UTC)

"I think it is problematic that the ISO has basically copied the Ethnologue classifications."

It's the other way round, see Ethnologue's The Problem of Language Identification page where it says: "Since the fifteenth edition (2005), Ethnologue has followed the ISO 639-3 inventory of identified languages (http://iso639-3.sil.org/) as the basis for our listing of distinct languages." (A more direct link to the language identification policy of ISO 639-3 is https://iso639-3.sil.org/about/scope. See articles SIL International, Ethnologue, and ISO 639-3 for the relationship between Ethnologue and ISO 639-3.) Love —LiliCharlie (talk) 19:39, 1 September 2020 (UTC)

P.S. The starting point to request an ISO 639-3 entry for Alsatian is their Introduction to the Code Change Process page. Love —LiliCharlie (talk) 19:57, 1 September 2020 (UTC)

Oh-oh, shows that it’s better to research first and rant later. Thanks for the corrections. --mach 🙈🙉🙊 22:10, 1 September 2020 (UTC)

@LiliCharlie: Yes. In the Indo-Aryan context, where "The speech of each village differs slightly from the next, without loss of mutual intelligibility, all the way from Assam to Afghanistan.", Masica 1991 p.21 has a very comprehensive description of the language/dialect problem. This is a much bigger problem that cannot be adequately captured by the mutually exclusive categories of Ethnologue. Chaipau (talk) 10:04, 2 September 2020 (UTC)

This is typical of dialect continua, of course, and by no means restricted to Indo-Aryan. Mach's Western Upper German example within the Continental West Germanic continuum is of the same kind. A language is a dialect with an army and navy. Love —LiliCharlie (talk) 10:30, 2 September 2020 (UTC)

LiliCharlie, J. 'mach' wust could an unpublished thesis be considered a reliable source over Ethnologue? Za-ari-masen (talk) 09:26, 2 September 2020 (UTC)

Sources are required to be verifiable, and our verifiability policy rules that "content is determined by previously published information". Love —LiliCharlie (talk) 09:43, 2 September 2020 (UTC)

What do you mean by "unpublished"? If you're talking about a PhD thesis that has been submitted and accepted then it counts as published (WP:SCHOLARSHIP). Nardog (talk) 10:05, 2 September 2020 (UTC)

The current setup on Ethnologue for the Rajbanshi dates to 2008, and like with other recent changes it's got a paper trail that you can follow [14] (you will recognise the name of Toulmin somewhere in there). My experience with similar code changes in this part of the world is that they're usually based on the results of a sociolinguistic survey. Of course, conclusions could be different if other methods were used, and even the same sociolinguistic data is often open to different interpretations. Also, a recent survey can paint a different picture from the one gleaned from a three-decades-old reference text. – Uanfala (talk) 10:30, 2 September 2020 (UTC)

Yes, I agree with user:Nardog on the general principle. Furthermore, the PhD thesis in question, Toulmin 2006, is open-access published by the University: [15]. Therefore, it satisfies WP:V too, as required by user:LiliCharlie. Chaipau (talk) 10:40, 2 September 2020 (UTC)

So it looks like Toulmin himself was part of the team at Ethnologue to create the database for Rangpuri, so shouldn't we follow Ethnologue over Toulmin's earlier thesis? Za-ari-masen (talk) 11:09, 2 September 2020 (UTC)

Feedback requested at Portuguese vocabulary

Your feedback would be appreciated at Talk:Portuguese vocabulary#Examples and article title. Thanks, Mathglot (talk) 01:35, 8 September 2020 (UTC)

Comments requested

Please come and make your voice heard at Talk:Eskimo#Racial slur?. Trying to discuss what direction, if any, the article should take. I have notified all projects listed at the top of Talk:Eskimo. CambridgeBayWeather, Uqaqtuq (talk), Huliva 22:42, 16 September 2020 (UTC)

Request for example numbering

Thought I'd let the community know I've made a feature request for an example numbering tool that would generate numbers automatically and allow cross-referencing. I'd appreciate any comments on my proposal, or just editors chiming in with their support if they agree that this would be useful. Botterweg14 (talk) 12:28, 21 September 2020 (UTC)

CFD for neologisms chronology categories

You are invited to join the discussion at Wikipedia:Categories for discussion/Log/2020 October 4 § Neologisms, words and phases introduced in time periods. —⁠andrybak (talk) 02:06, 4 October 2020 (UTC)

RfC at Cracker (term)

Hello, there is an RfC that y'all might be interested in here. It is in regards to the lead of Cracker (term). Bait30 ^{Talk 2 me pls?} 22:10, 5 October 2020 (UTC)

What is WP:SYNTH when using multiple sources about language classification?

Although the point was brought up mainly as a result of a WP:BATTLEGROUND situation in a range of articles about languages in NE South Asia, I want to elicit your thoughts about the general problem.

Building an article based on multiple sources is obviously not synthesis, as long as we do not draw new conclusions based on the material in various reliable sources. Drawing new conclusions is SYNTH. Typical cases are

Families A and B are members of the proposed macro-family AB; another source says that family C is related to B: including C in macro-family AB is SYNTH.
Paleolinguists propose that the distribution of language family A is associated with the spread of Haplogroup Foo. A paleogenetic paper claims that Haplogroup Foo originates from area X. Saying that family A originates from area X is SYNTH.

But what about "vertical" ~~grafting~~ piping of trees? Consider this situation:

Source A discusses the division of a language family X into larger subgroups, one of them is "Fooic". Source B deals with the internal classification of Fooic. NB, there is no disagreement about the validity of Fooic. Is the combination of this information already synthesis? Or more concretely: Is the tree information Family X → Fooic → Southwest-Foo language SYNTH?

I dare to say that we do this everywhere on WP. I can hardly think of a source that provides the full tree information e.g. of Yorkshire dialect.

Another, more problematic example:

R. M. W. Dixon is a staunch opponent of the Pama–Nyungan family. OTOH, he has contributed a lot to the classification of smaller units of Australian languages usually included in Pama–Nyungan. Now, the mainstream of specialists accepts Pama–Nyungan. Are we then barred (per SYNTH) from using Dixon's micro-classifications in the presentation of the internal classification of Pama–Nyungan, because he opposes the latter? E.g. there is no disagreement between Dixon and the rest about the validity of Yolngu.

We all know that full tree information is easiest to get from Ethnologue and Glottolog. We also all know that this is just a default choice, but where better sources exist, we should make use of them. Specialized sources will not always provide full tree data. I want to use such sources without running the risk of producing contestable synth content.

PS: Is taking the birth date and death date for Alfred E. Neuman from two different sources SYNTH? –Austronesier (talk) 10:46, 6 October 2020 (UTC)

A very pertinent issue!

SYNTH should be OK as long as no new information is generated.
For a concrete example, where is the new information if we grafted the Gauda-Kamarupa tree from Glottolog to the Bengali-Assamese node of Ethnologue? Given that Ethnologue gives no sub-tree to Bengali-Assamese and Gauda-Kamarupa (Glottolog) = Bengali-Assamese (Ethnologue) is accepted.

Chaipau (talk) 11:44, 6 October 2020 (UTC)

Allow me to give another example. Say a source describes how a volcanic basalt rock formation has lain beneath basin X since Jurassic times, and another source says the basalt formation beneath basin X yields phosphate minerals; it is okay to write - the Jurassic era volcanic rock formation beneath basin X yields phosphate minerals - because the connection is explicit.

It should be alright to use one tree up to Bengali-Assamese and another from Gauda-Kamarupa, if it is explicitly asserted by an RS that Bengali-Assamese is indeed Gauda-Kamarupa (especially so if the first one doesn't follow the tree beyond Bengali-Assamese). But it also needs to establish that Bengali-Assamese is Gauda-Kamarupa. Aditya^{(talk • contribs)} 09:03, 7 October 2020 (UTC)

@Aditya Kabir: I think "Bengali-Assamese = Gauda-Kamarupa" can be established from Toulmin's dissertation (who btw considers it a weakly supported clade), but we can discuss this in depth in Talk:Bengali-Assamese languages once it becomes more peaceful there (#prayforNESouthAsia); you can bring darjeeling, I'll provide some bandrek. –Austronesier (talk) 18:36, 8 October 2020 (UTC)

The other side of the WP:SYNTH coin is WP:NOTSYNTH. Not all synthesis is against WP:OR, and I think it's more productive to discuss what constitutes "original research" when listing classifications. Say we have source 1 which says "if A then B" and source 2 says "if B then C" then saying "A -> B -> C" and citing 1 and 2 should be fine unless there is some other prominent hypothesis in the literature. If you submit an article with that claiming it's original research, you'll get laughed at because that's the obvious logical conclusion. It would not be okay to say "A -> C" since neither source connects A and C without B and claiming A directly implies C would constitute an original claim. If you sent that to a journal, explicitly cutting out B, that requires evidence and is OR. — Wug·a·po·des 00:31, 8 October 2020 (UTC)

Excellent explanation. Can I offer you a cup of hot darjeeling in appreciation? Or would you prefer beer instead? Aditya^{(talk • contribs)} 17:46, 8 October 2020 (UTC)
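
The "A -> B -> C" reasoning in this thread can be pictured as chaining parent links taken from different sources while keeping every intermediate node. The following is a rough sketch only, not a statement of policy: the function name `lineage` is invented here, and the node names are the placeholder names from Austronesier's Fooic example above.

```python
def lineage(node, *parent_maps):
    """Walk upward from node, merging parent links from several sources.

    Each parent_map is {child: parent} as stated by one source. The merge
    fails if two sources give different parents for the same child, i.e.
    when the literature disagrees and the chain is not uncontroversial.
    """
    merged = {}
    for pm in parent_maps:
        for child, parent in pm.items():
            if merged.setdefault(child, parent) != parent:
                raise ValueError(f"sources disagree about {child!r}")
    path = [node]
    while path[-1] in merged:
        parent = merged[path[-1]]
        if parent in path:  # guard against circular parent links
            raise ValueError(f"circular classification at {parent!r}")
        path.append(parent)
    return list(reversed(path))

source_a = {"Fooic": "Family X"}       # source A: top-level subgrouping
source_b = {"Southwest-Foo": "Fooic"}  # source B: internal classification

print(" → ".join(lineage("Southwest-Foo", source_a, source_b)))
```

The result always passes through "Fooic", the node both sources share; the sketch never produces a direct "Family X → Southwest-Foo" link, which corresponds to the "A -> C" shortcut described above as an original claim. Grafting a Glottolog subtree under a differently named Ethnologue node would additionally need an explicit, sourced equivalence between the two names, which this sketch does not model.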

Requested article: Language Question (Italy)

I recently wrote an article about the Language Question (Malta), and while I was researching it I came across a somewhat similar linguistic debate which took place in Italy. This is covered by a fairly decent article on the Italian Wikipedia (it:Questione della lingua) but there's no article about it on the English Wikipedia. Would someone from this project be interested in translating the article from Italian and perhaps improving upon it?

I am also making this request at WikiProject Languages and WikiProject Italy. --Xwejnusgozo (talk) 19:36, 9 October 2020 (UTC)

Discussion at Wikipedia talk:WikiProject English Language § Inconsistent examples in talk page templates

You are invited to join the discussion at Wikipedia talk:WikiProject English Language § Inconsistent examples in talk page templates. Soumya-8974 ^talk _contribs ^subpages 12:21, 10 October 2020 (UTC)

Limburgish short close-mid front rounded vowel

Hello. I'm about to change ⟨ʏ⟩ to ⟨ø⟩ in our Limburgish transcriptions. The reason for that is [ʏ] is heard by Dutchmen and Belgians as a variant of /y/, rather than /ʏ/ which is phonetically [ø] or [ɵ] in Limburgish (as it is in Standard Dutch). In conversations with native speakers of Dutch, at least two of them have complained to me about the misleading use of ⟨ʏ⟩ in IPA transcriptions of Dutch.

Gussenhoven (1992) reports a lowered [ʉ̞] as the norm in Northern Standard Dutch, whereas Collins & Mees (2003) report [ʏ] as the norm. Both describe /ʏ/ as close-mid, the former source describes it as closer to central [ɵ], whereas the latter closer to front [ø]. [ʉ̞, ɵ] for /y, ʏ/ have been reported to occur in the Limburgish dialect of Hamont (by Verhoeven 2007), whereas [ʏ, ɵ] for /y, ʏ/ (with [y] being a word-final allophone of the former) have been reported to occur in the Ripuarian dialect of Kerkrade (by SKD 1987), which is often treated as a Limburgish variety.

It's clear to me that wherever there's a contrast between a short /y/ and a short /ʏ/, the latter is typically not closer than close-mid, whereas the former is not necessarily fully close.

The symbol I've chosen is ⟨ø⟩, used by Peters (2006). It's also in line with how the related West Frisian vowel is transcribed. In order to completely bring the transcription of the close-mid vowels in line with West Frisian, I'm also going to change ⟨ʊ⟩ to ⟨o⟩ and leave ⟨ɪ⟩ as it is. I'm sure that both vowel symbols (meaning ⟨ʊ⟩ and ⟨o⟩) are used in Limburgish dialectology, as they are in Dutch dialectology.

⟨ø⟩ is superior to ⟨ɵ⟩ in that it clearly shows that the vowel in question is the phonological short counterpart of /øː/. I also haven't seen ⟨ɵ⟩ used for the Limburgish vowel, though it has been used for the Dutch vowel and even for the West Frisian vowel. Even if it tends to be more central than front, in fact any of the so-called front rounded vowels in Limburgish can be central, and so can /œy/.

Full citations can be found on the following pages: Dutch phonology, Hamont-Achel dialect, Kerkrade dialect and Hasselt dialect.

I'm now going to WP:BOLDLY introduce the changes (⟨ʏ⟩ -> ⟨ø⟩ and ⟨ʊ⟩ -> ⟨o⟩). Sol505000 (talk) 12:11, 10 October 2020 (UTC)
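
For anyone wanting to apply the same two substitutions mechanically, a character translation table is one way to do it. This is just an illustrative sketch: the name `respell` is made up, and the sample string is a bare list of symbols rather than a real Limburgish word.

```python
# Symbol replacements announced above: ⟨ʏ⟩ → ⟨ø⟩ and ⟨ʊ⟩ → ⟨o⟩.
# ⟨ɪ⟩ is deliberately not in the table, so it is left as it is.
RESPELL = str.maketrans({"\u028f": "\u00f8", "\u028a": "o"})

def respell(transcription: str) -> str:
    """Apply the two vowel-symbol replacements to one transcription string."""
    return transcription.translate(RESPELL)

print(respell("\u028f \u028a \u026a"))  # input is ʏ ʊ ɪ
```

A blanket replacement like this would also rewrite ⟨ʏ⟩ and ⟨ʊ⟩ in transcriptions of other languages, so it would only be safe on strings already known to be Limburgish.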

I'm neutral on this, but I notice we don't have a dedicated {{IPA-li}} transcription template, which would help pave the way to creating an IPA help page and allow for a centralized place for discussions like this related to Limburgish. I'm a little short on time, but anyone who feels up to it can create the template and even look at Sol505000's recent edits (whose edit summaries so graciously link to this talk page section) to convert the transcriptions they've identified as Limburgish and change them from IPA-all to IPA-li. — Æµ§œš¹ _{[lɛts b̥iː pʰəˈlaɪˀt]} 16:06, 10 October 2020 (UTC)

@Aeusoes1: An IPA help page would be great, but I'm not sure how to deal with the pitch accent as it varies from region to region (or at least the way it's analyzed varies from source to source, if that makes sense). I understand that superscript numbers are disliked by many of the editors, and I think that I share that sentiment as well. Ooswesthoesbes could be of help here. Sol505000 (talk) 16:45, 10 October 2020 (UTC)

Before we create a help page, we generally start out by creating the IPA transcription template so that we can have an idea if there are enough transcriptions to merit the creation of such a help page. One step at a time. — Æµ§œš¹ _{[lɛts b̥iː pʰəˈlaɪˀt]} 17:16, 10 October 2020 (UTC)

@Aeusoes1: I see. I can create it, that's no problem. Just out of curiosity: how many transcriptions would have to exist to merit a help page? Sol505000 (talk) 17:46, 10 October 2020 (UTC)

I don't think we have a hard and fast agreement on that. IMHO, one or two transcriptions wouldn't be enough, but a couple of dozen would be enough to pass such a threshold. — Æµ§œš¹ _{[lɛts b̥iː pʰəˈlaɪˀt]} 18:08, 10 October 2020 (UTC)

The true vowel [ʏ] does not exist anywhere in the Limburgish linguistic area (even in the broad sense which includes Kleverlandic) as far as I know; it is, however, used very often due to tradition and alignment with Dutch. Another reason is that the short "u" is not pronounced the same in all places. My knowledge of Belgian Limburg is limited, so I can only talk about the Dutch dialects. Maastrichts, Roermonds and Weerts have a clear tendency towards [ɵ], while the rural dialects of Midden-Limburg use a simple short version of eu, that is [ø]. /ɪ/ is generally more closed than in Dutch, so it would be best to stick with that. The use of [o] over [ʊ] is debatable, as it actually can be very close to [u] in some dialects. For simplicity's sake, [o] is a good choice, however, as it would make the phonology table more streamlined: (/i i: y y: u u: - ɪ (odd one out) - e̞ e: ʊ o: - ə - æ ɛ: œ œ: ɒ ɒ: - ɑ a:/) The big sidenote to this is that there are eccentric dialects that have a three- to four-way contrast /æ a (ɒ) ɑ/ or even contrast /ɪ (e:) e̞ ɛ æ/. The transcriptions I have created thus far are mainly based on my own dialect, Montforts, which is also well described by Pierre Bakkes. Here, I chose to use [ø] but [ʊ] (rather than [o]), as in some words/derivatives [ʊ] and short [u] are interchangeable due to their proximity (bók "buck" > boekketig "buck-like" etc.). And indeed, [y] does not exist either; the best way to describe it would be [ʉ] ([ʉ̞] or [ʉ̜] specifically, depending on the dialect), but again, the Dutch tradition is to use [y]. Another note: /øy/ in Dutch usually ends in /øi/ in Limburgish; an often-cited example is the pronunciation of the Dutch word truien as /truiwen/ in Northern Dutch vs. /truijen/ in Limburg.

When it comes to pitch accent, the most neutral way is to use simple diacritics: á for sleiptoean/drag tone, à for stoeattoean/push tone, and none for a neutral accent. The exact pronunciation varies, and in some cases is even reversed, e.g. in Venlo the tones seem to be the opposite of those in Roermond. In the literature, "a~" for drag tone and "a\" for push tone are often used, but they do not work well in combination with IPA (as the slash actually appears to close (\ vs. /) the transcription in IPA). --OosWesThoesBes (talk) 06:38, 11 October 2020 (UTC)

Zeugma (and syllepsis)

Wanting a quick working definition of zeugma and being too lazy to dig out my copy of Crystal's excellent Dictionary of Linguistics and Phonetics (let alone essay a definition by myself), I looked in Wikipedia for Zeugma. What I found amazed me, and not in a good way. I suppose that this is what happens when editors entrust linguistics matters to vaguely literary sources that demonstrate no knowledge of (post-18th-century) linguistics. I mean as a ferinstance:

"He works his work, I mine" (Tennyson, "Ulysses") [...] is ungrammatical from a grammarian's viewpoint, because "works" does not grammatically agree with "I": the sentence "I works mine" would be ungrammatical.

which I might rephrase as

"He works his work, I mine" (Tennyson, "Ulysses") [...] is ungrammatical from the viewpoint of a grammar-obsessed ignoramus, because the sentence "I works mine" would be ungrammatical.

The talk page sports a template that says something-something about applied linguistics. How zeugma (or syllepsis) is a matter of applied linguistics eludes me, and I hope that the content of articles such as this one isn't applied anywhere. (Rant over.) -- Hoary (talk) 00:12, 15 October 2020 (UTC)

Error (Linguistics)

Does anyone know if someone is currently working on this article? It seems unfinished. Below is my evaluation of it.

https://en.wikipedia.org/wiki/Error_(linguistics)

I chose this article because I thought Error meant that there was something wrong with the article that needed to be fixed. Once I started reading it, I found the topic interesting.

Lead

The article includes an introductory paragraph that explains the main topic and briefly refers to content covered by the sections in the outline. This paragraph could be more concise, and it does include information that is not later covered in the article.

Content

The content covered is relevant to the topic, but some of the information could be more up to date and the author relies heavily on one source. The introduction refers to social perceptions and value claims that are not covered anywhere else in the article. The article does not address Wikipedia’s equity gaps.

Tone and Balance

The article appears to be neutral in tone.

Sources and References:

Sources are cited for the facts presented in the article, several are fairly recent. One link is broken and most of the citations are for books, not journal articles. I am unsure how to discern if the individual works cited are from historically marginalized individuals.

Organization:

What is provided in the article is written clearly and is easy to understand, with no grammatical or spelling errors noted. It is quite short and only includes 2 sections, leaving the impression that it is unfinished.

Images and Media:

There are no images or media included.

Talk page:

There is no talk page for the article. Users are directed to the WikiProject Linguistics portal to leave feedback. The article is rated start class on the quality scale and has not been rated on the importance scale.

Overall Impressions:

The article is a good start, but seems incomplete. The comment about social perceptions and value claims should be removed if the statement isn't going to be expanded on by adding another section. Canonlvr (talk) 07:47, 15 October 2020 (UTC)

Hi Canonlvr, and thanks for the feedback! In general, it's better to post feedback like this on the article talk page since it will be easier to find when an interested editor wants to improve the article. But no worries this time, I copied it over there for you. Looking at the page history, no one is actively working on it. Based on your feedback, you might want to try copyediting the introduction or adding the {{One source}} banner at the top of the article. — Wug·a·po·des 22:02, 15 October 2020 (UTC)

Does the history of the first print type for Bengali language belong in the history of the Bengali-Assamese script?

Bengali-Assamese script is used to write both Bengali and Assamese besides a host of other languages. The first printable types were produced for the Bengali language. Does it mean that since the types were made for the Bengali language, an account of this part of the history does not belong in the article? The issue here seems to be "nationalistic" in the sense that what "belongs" to Bengali cannot belong to anything that is named "Bengali-Assamese". Chaipau (talk) 10:32, 17 October 2020 (UTC)

It looks like there's a discussion on the talk page about this. If you would like community input, you might want to ask plainly. — Æµ§œš¹ _{[lɛts b̥iː pʰəˈlaɪˀt]} 22:35, 17 October 2020 (UTC)

Indeed and since I asked, we have made some progress towards resolution in the talk page: Talk:Bengali–Assamese_script#Printing. But I apologize if this was not done plainly. Chaipau (talk) 07:54, 18 October 2020 (UTC)

RfC on Sylheti language - Family tree

What could the family tree be for the Sylheti language? Ethnologue uses the following tree:

Indo-European→Indo-Iranian→Indo-Aryan→Outer Languages→Eastern→Bengali-Assamese→Sylheti

Chatterji (1926), on the other hand, uses a combination of names and regions to come up with this tree:

Magadhi Prakrit and Apabhramsa→Vanga Dialects

Here he splits Vanga Dialects into two parts and names Sylhet (probably the region, not the language) in two different branches (we can probably assume that E Sylhet represents Sylheti language, but I am giving out both the branches for reference):

Western and S W Vanga in which he includes NW Sylhet
Eastern and S E Vanga in which he includes E Sylhet

(Chatterji's tree is reproduced for reference in Toulmin's thesis (2006) p302)

Could we combine these two different sources, insert Vangiya in the tree from Ethnologue, and come up with a tree as follows?

Indo-European→Indo-Iranian→Indo-Aryan→Outer Languages→Eastern→Bengali-Assamese→Vangiya→Sylheti

@Za-ari-masen:, UserNumber, Kmzayeem, Aditya Kabir, Austronesier.

Chaipau (talk) 17:48, 4 October 2020 (UTC)

@Chaipau: I think you need to start the RfC at the article talk page, and leave a message here leading people to the discussion there. (see: Wikipedia:Requests for comment) Aditya^{(talk • contribs)} 18:21, 4 October 2020 (UTC)

@Aditya Kabir: Pinging somebody requires you to add new lines of text and sign your contribution in the same edit, see the "Usage" section of {{Ping}}. — Chaipau is watching this page anyway, I think. Love —LiliCharlie (talk) 18:48, 4 October 2020 (UTC)

Thanks. I guess if I add a ping later, I would also have to sign the comment again. I re-sign a lot anyway (because of a poor connection and strong ADHD). Here, let me pour you a hot cup of fine darjeeling. See you at the RfC. Aditya^{(talk • contribs)} 18:59, 4 October 2020 (UTC)

@Aditya Kabir: this is a technical issue (in Linguistics) that might involve different language databases and the relative weights experts give them. If we sort it out here, we may not have to go through the formal RfC. Chaipau (talk) 19:13, 4 October 2020 (UTC)

Seriously!? I don't think I need to be a linguistic specialist to understand that a "language" (i.e. Sylheti) can't be a subset of a "dialect supercluster" (i.e. Bengali/Vangiya, whatever that is). Also, obsolete taxonomy belongs in the history section or alternatives section, not the infobox. In my humble but slightly tickled opinion, not-being-a-moron is talent enough to deal with this "technical issue".

Aditya^{(talk • contribs)} 19:25, 4 October 2020 (UTC)

This is not a language/dialect issue; Sylheti's status as a language/dialect is itself disputed, but that's not the point. I don't see any harm in combining the two sources to form a family tree, as has been done on Rangpuri language by combining Chatterji, Toulmin and Ethnologue to insert "Kamrupic" and "Western Kamrupic", which the OP himself has supported. If there are problems with such family trees, they should be avoided on both Sylheti language and Rangpuri language. Za-ari-masen (talk) 08:53, 5 October 2020 (UTC)

Comment at the risk of not-being-not-a-moron per Aditya Kabir: It takes a layman to believe that a "language" (e.g. Sylheti) can't be a subset of a "dialect supercluster". "Language" and "dialect" are fluid concepts. "Bengali-Assamese" is a complex dialect continuum, with some of its varieties having a literary tradition and thus being considered "languages" (Bengali, Assamese, Sylheti). Many varieties don't have a literary tradition, and as a rule, these non-literary (including "aspiring" literary) variants are "roofed" by a traditional literary language: e.g. Chittagongian, Rangpuri/Kamta by Bengali, Kamrupi by Assamese, or Surjapuri by Hindi.

Ethnologue is agnostic with regard to the internal classification of Bengali-Assamese. This does not, however, mean that earlier classification proposals are invalid/obsolete. Chatterji has divided the Bengali-Assamese dialect continuum into four branches ("Radha", "Varendra", "Kamarupa", "Vanga"). Only "Kam(a)rupa" has been studied in detail by Toulmin (who btw calls Bengali-Assamese "Gauda-Kamrupa", with a question mark because of its unclear relation to Odia). Unlike Chatterji, he preliminarily proposes that all non-Kamrupa varieties can be assigned to a single sister branch of Kamrupa, viz. "Gauda-Baŋga". Note that the internal structure of "Gauda-Baŋga" is not discussed by Toulmin at all. Toulmin's classification of non-Kamrupa varieties of Bengali-Assamese does not invalidate Chatterji's classification; the matter clearly requires further research.

Since the internal structure of Bengali-Assamese is not yet fully understood, and Chatterji's and Toulmin's classifications of non-Kamrupa variants conflict, I suggest placing Sylheti directly under "Bengali-Assamese" in the infobox and mentioning the details in prose. –Austronesier (talk) 09:00, 5 October 2020 (UTC)

So shouldn't we be consistent on both Sylheti and Rangpuri if we are to mention the details in prose and keep the family tree up to Bengali-Assamese? Za-ari-masen (talk) 09:18, 5 October 2020 (UTC)

We don't have to be consistent between apples and pears. The place of Rangpuri is uncontroversial (Kamrupa is "established"), so there's no harm to have more solid info in the infobox. –Austronesier (talk) 09:28, 5 October 2020 (UTC)

Austronesier Does this mean that we can use dialects and languages interchangeably? What is the purpose of structuring a taxonomic tree for concepts that are inherently unstructured? In a layman's view this looks like building with bricks made of water. Aditya^{(talk • contribs)} 11:49, 5 October 2020 (UTC)

@Aditya Kabir: Trees are not about taxonomy, nor about languages vs. dialects, but about the historical relations between individual language varieties. For "language" vs. "dialect" see: Abstand and ausbau languages, A language is a dialect with an army and navy, Dialect#Dialect_or_language –Austronesier (talk) 12:56, 5 October 2020 (UTC)

Aren't those historical relations of a mother-daughter variety? As for the army and navy... every Bangladeshi village talks a little differently than the next one, and there are over 100 thousand of them. Thank the lord that they don't all have access to armies and navies. While I understand the lack of research, a structure can't be a way to explain things that are not structured. Can it? (By the way, I must state that my comments from the last one onwards have nothing to do with the dispute. With new enlightenment I am just wondering about the extreme subjectivity of the thing in dispute.) Here, a cup of nice hot darjeeling to compensate for the distraction. Aditya^{(talk • contribs)} 13:18, 5 October 2020 (UTC)

@Aditya Kabir: I don't see any subjectivity here. Historical linguistics is pretty rigorous and can be safely relied upon. It is technical and unfortunately any advanced technology looks like magic. Chaipau (talk) 15:45, 5 October 2020 (UTC)

Comment - Are there any established guidelines or a manual of style specifically for such linguistic articles to create the language family trees? --Zayeem ^(talk) 16:43, 5 October 2020 (UTC)

I will pocket the insult and remind you that any magic looks like science to believers. When all the definitions of things and their relations depend upon fluidity open to interpretation and not established facts or accepted hypotheses, it really looks more like an interpretation of scripture than an assertion of facts. The high attitude against laymen is also not uncommon among scriptural interpreters. That kind of interpretation also has a valid claim to rigour. (By the way, history without linguistics, including historiology and historiography, happens to be my key interest, and that discipline has no pretension to be a science. My comment still has nothing to do with the dispute.) Apologising again for further distraction. Aditya^{(talk • contribs)} 17:34, 5 October 2020 (UTC)

Unless there is a verifiable scholarly consensus on which classification is better, we should report both per WP:WEIGHT. We may not combine the two hypotheses as suggested as it violates WP:SYNTH. — Wug·a·po·des 17:46, 5 October 2020 (UTC)
- Or neither, if we don't want to overload the infobox. –Austronesier (talk) 18:02, 5 October 2020 (UTC)

Wugapodes, I presume this also applies to Rangpuri language which also has a classification combining two hypotheses? Za-ari-masen (talk) 18:11, 5 October 2020 (UTC)

@Austronesier: oh right, infoboxes. Including neither might not be the best course, but I don't know much about Indo-Aryan languages. IIUC, it looks like both hypotheses agree with classification up to Indo-Aryan, so we may want to include that part of the tree and then refer to the text for further hypothetical sub-classifications. @Za-ari-masen: if the infobox at Rangpuri language combines two classification hypotheses to produce a new hypothesis that does not exist in the literature, then it is likely original research (synthesis of sources) and should be revised to comply with WP:V and WP:OR. As mentioned, I'm not familiar with this language group, so I trust your decision-making on the specifics. — Wug·a·po·des 19:08, 5 October 2020 (UTC)

@Za-ari-masen: FWIW, Rangpuri has a classification based on non-conflicting sources. It's not synthesis when multiple sources state the same. –Austronesier (talk) 19:12, 5 October 2020 (UTC)

@Wugapodes: Yes, that was my idea, to cut off the tree at the bottom where the hypotheses diverge. The upper consensus part can fit in the infobox, while conflicting proposals must be explained in the prose part of the article. –Austronesier (talk) 19:18, 5 October 2020 (UTC)

Agreed, combining the two trees is synthesis; simply list both unless a reliable source that combines them is found. Gbear605 (talk) 18:24, 5 October 2020 (UTC)
A tentative summary of independent comments so far:
- We cannot combine the two trees (SYN)
- We need to give due weight to the two trees (WEIGHT)
The question remains—do we give both the trees in the Infobox? The suggestions seem to be:
- Provide the Ethnologue tree in the Infobox
- Explain the two in the text.

Also, Za-ari-masen, Rangpuri is out of scope for this. Please use WP:LOP to find a solution for Rangpuri, and do not use that solution as input to the problem here. We cannot have a chain of individual solutions to determine resolution.

Chaipau (talk) 18:49, 5 October 2020 (UTC)

@Chaipau: see this edit I made. I listed the IE and IA macro-families and replaced the disputed parts with "Disputed, see text" and a link to the classification section. What do others think? — Wug·a·po·des 19:16, 5 October 2020 (UTC)

@Wugapodes: thank you. As Austronesier has pointed out, there is no conflict between the two sources up to Bengali-Assamese languages. So maybe we can retain it up to that point and then say "disputed"? Also, is "disputed" too strong? Maybe "[more details in text]" or something? Chaipau (talk) 20:53, 5 October 2020 (UTC)

@Chaipau: And then link the "see [section name]" text in the infobox to lead to the section. When something doesn't have one answer, it is prudent to lead readers from the infobox to a section that discusses the differing opinions in greater detail. Aditya^{(talk • contribs)} 02:02, 6 October 2020 (UTC)

Additional Comment: It seems Chatterji's grouping hypothesis has been reconstructed by Pattanayak (1966). From Toulmin (2009) p212 "Chatterji’s subgrouping hypothesis has been subjected to detailed comparative reconstruction by Pattanayak (1966)." The reason there is a disagreement between Ethnologue and Chatterji is that Ethnologue (and Glottolog) likely follow Pattanayak, which is the more updated tree. At this point I wonder whether we should mention Chatterji at all in the Sylheti article in this context. Chaipau (talk) 10:13, 6 October 2020 (UTC)

Implementing the solution

I have implemented a solution: [16]. Effectively, I have removed the "Vangiya", as defined in Chatterji, and replaced it with "Eastern Bengali" from Glottolog. This definition of Glottolog is based on the identification of Bengali-Assamese (Ethnologue) with Gauda-Kamrupa (Glottolog), as we have discussed in this section: Wikipedia_talk:WikiProject_Linguistics#What_is_WP:SYNTH_when_using_multiple_sources_about_language_classification?. Chaipau (talk) 10:33, 18 October 2020 (UTC)

I have removed Glottolog from the infobox since it is still WP:SYNTHESIS, combining two sources to form a family tree. The consensus in the discussion was to keep the family tree up to Bengali-Assamese languages; I have added "Disputed" after it, as recommended by Wugapodes. Za-ari-masen (talk) 16:30, 18 October 2020 (UTC)

Sharp contradiction between Middle Tamil and Old Malayalam articles

FYI

– Pointer to relevant discussion elsewhere.

Please see: Talk:Middle Tamil#Sharp contradiction between Middle Tamil and Old Malayalam articles. — SMcCandlish ☏ ¢ 😼 23:37, 20 October 2020 (UTC)