User talk:Tzusheng/sandbox/Wikipedia:Wikibench/Campaign:Editquality
What does "damaging" / "not damaging" mean in general?
@PriusGod @Robertsky I am talking with @Tzusheng now and I wondered how I can make a decision about (for example) this diff while it's unclear what "damaging" means. In the case of this diff it was clearly overlinking, but deciding whether it damages the article or not is a bit hard if there is no general consensus. TenWhile6 (talk | SWMT) 15:23, 1 July 2023 (UTC)
- I think we have to take the entire article into context. At that time of the edit, the article contains a list of alumni with their occupations stated as well. Is it overlinking? Sure is, but the article is already overlinked, what more damage can these additional links do to an already overlinked article? (Nonetheless, a correction should be carried out to remove the non-notable entries (if any), and maybe categorisation like that on List of Hwa Chong Institution people may reduce the amount of overlinkage.) – robertsky (talk) 15:45, 1 July 2023 (UTC)
- I think in this case, overlinking is particularly damaging (relative to overlinking in general). The section is almost entirely wikilinks, it reads very awkwardly, and this is exacerbating the problem. "Musician" and "Screenwriter" are not particularly uncommon or technical terms, either. It would be good to have a general definition of "damaging" on the campaign page; something like
"Introduces new problems or exacerbates existing issues within the context of the article."
Thoughts? Actualcpscm (talk) 17:20, 1 July 2023 (UTC)
- Yep, that’s why I created this talk page; we definitely need this general definition as soon as possible. TenWhile6 (talk | SWMT) 18:01, 1 July 2023 (UTC)
- Thank you all for initiating and participating in this discussion! Please feel absolutely free to update the content on the campaign page where you see fit the most for definitions or anything else. There are only two places surrounded by the <div> ...</div> tags where it will be beneficial to pay attention not to change them, given that they are where Wikibench dynamically loads data curation progress barcharts and the table. Other than that, please be bold for editing. If anything breaks, please feel free to drop me an email anytime. I'll try to fix it as soon as possible. Thank you very much! Tzusheng (talk) 19:33, 1 July 2023 (UTC)
- Tzu-sheng and I had a couple of conversations like this - my interpretation in this case is that there exists an overlinking issue, and anything that would cause it to be more overlinked (or underlinked, if it was completely stripped of links I think it would not be great) exacerbates the problem, leading to my judgement of "damaging."
- In the general case, I read "damaging" to be 'anything that, to the labeling editor's eyes, introduces or exacerbates a policy-related issue' and "not damaging" to be 'anything that does not create or intensify a policy-related issue, or resolves such an issue.' You can see my interpretation with these two diffs - for the one I marked non-damaging, I noted that there was an issue with tone, but in my opinion it's not hurtful to the article in the end, whereas for the one I marked damaging, I noted that while the new section was okay, external links are generally not used in article bodies except in specific circumstances.
- You can see that I altered a label upon review, while writing this very post - this is because, as you note, it's not always clear one way or the other - it's important IMO to pass on diffs that you can't positively identify (and most importantly, explain your reasoning if it isn't immediately obvious) as damaging or non-damaging. PriusGod (talk) 17:47, 1 July 2023 (UTC)
Label definitions
@PriusGod @Robertsky @TenWhile6 @Tzusheng @Illusion Flame I added provisional label definitions to the campaign page, in line with the discussion I've seen so far. I think it's really important to have clear definitions for user intent and edit damage, especially as this project scales up and more editors become actively involved. What do you think of the current definitions? Actualcpscm (talk) 10:19, 2 July 2023 (UTC)
- Tzusheng, the "edit damage" definition includes the external factors cases we had briefly discussed – do you think the wording is alright in its current form? I hope I'm representing your input accurately :) Actualcpscm (talk) 10:29, 2 July 2023 (UTC)
- @Actualcpscm First, I'd like to say thank you for being bold and adding a definition. TenWhile6 (talk | SWMT) 10:39, 2 July 2023 (UTC)
- @Actualcpscm Thank you for being bold and initiating the effort! Please feel free to edit based on what you believe is the consensus rather than a single input of mine :) Tzusheng (talk) 16:31, 2 July 2023 (UTC)
- Thanks for adding that. I agree with @TenWhile6. - 🔥𝑰𝒍𝒍𝒖𝒔𝒊𝒐𝒏 𝑭𝒍𝒂𝒎𝒆 (𝒕𝒂𝒍𝒌)🔥 12:47, 2 July 2023 (UTC)
- @Actualcpscm I also agree with the content / the definition. TenWhile6 (talk | SWMT) 11:54, 3 July 2023 (UTC)
How should labels on diffs that have been revdel’d since they were labeled be handled? I ask because only editors such as sysops can see those diffs, which rules out further discussion of, or labels on, the diff. One example of this is here. Ⓩⓟⓟⓘⓧ Talk 16:37, 2 July 2023 (UTC)
- @Actualcpscm and @robertsky also kindly brought this issue up for this edit. A potential short-term solution for this 1-week field study is that I can request speedy deletion of these Wikibench entity pages, given that they are currently in my sandbox and should satisfy the requirement for speedy deletion. However, in the longer term, I may need to update Wikibench to catch these deleted revisions automatically and remove them from the table or move them somewhere else. Since it will likely take longer than our field study to fix it, please feel free to ping me whenever you stumble upon these entity pages, and I can request speedy deletion after receiving a notification. What do you all think about this short-term solution? Tzusheng (talk) 16:49, 2 July 2023 (UTC)
- Continuing the discussion from here, I had offered to move such pages to an archive while suppressing the creation of redirects (using WP:PMRC#2 with @Tzusheng's agreement) to take such entries out of Wikibench. I have done batched work like this before, e.g. Talk:222nd_Broadcast_Operations_Detachment#Requested_move_15_November_2022, where I single-handedly followed up with all the necessary moves. For a start, we can identify all the pages with revdel'ed edits and batch move them. For subsequent ones, just ping me to get the page moved, and I will have it done as soon as I can after seeing the notification. ;) This is of course until Wikibench's code has been updated to deal with such pages. – robertsky (talk) 17:13, 2 July 2023 (UTC)
- Question? Should we label edits that we think might be eligible for RevDel or suppression? — FenrisAureus ▼ (she/they) (talk) 05:35, 3 July 2023 (UTC)
- Probably not, I think we should go through the usual channels for that. Neither edits subject to RD 2-4 nor edits subject to oversight should ever be publicly labelled as such. Once an edit is RD'd or oversighted, we can list it here and let robertsky do his thing - much appreciated, by the way! Actualcpscm (talk) 06:45, 3 July 2023 (UTC)
- I was more talking about labeling them as damaging or bad faith in wikibench — FenrisAureus ▼ (she/they) (talk) 06:47, 3 July 2023 (UTC)
- I'm not sure the research team have access to deletion logs. Even if they do, I think it might be better to have the dataset used for training and evaluating AI be entirely transparent. I don't like the idea of training / evaluating AI against a reference group that only a handful of people can fully access.
- My suggestion would be to just archive them and exclude them from the study / from use in AI training and evaluation. Actualcpscm (talk) 11:01, 3 July 2023 (UTC)
- I personally do not have access to deletion logs. I appreciate the idea of keeping the entire dataset transparent. Adding to @Actualcpscm's suggestion, maybe we could archive them if that doesn't create too much trouble for @Robertsky?
- By the way, based on @TenWhile6's suggestion, I just added a warning message on the entity page for revdel’d diffs. Tzusheng (talk) 20:44, 3 July 2023 (UTC)
- Seems like a good solution to me. Actualcpscm (talk) 20:46, 3 July 2023 (UTC)
- @Tzusheng I have moved the pages to User:Tzusheng/sandbox/Wikipedia:Wikibench/Archive/Entity:Diff/*. – robertsky (talk) 04:29, 4 July 2023 (UTC)
- @Robertsky Thank you! Tzusheng (talk) 04:33, 4 July 2023 (UTC)
- @Tzusheng, the revdel'ed message box is wrongly applied on User:Tzusheng/sandbox/Wikipedia:Wikibench/Entity:Diff/false/703976291. See #Other cases. – robertsky (talk) 04:33, 4 July 2023 (UTC)
- @Robertsky Thanks for the ping! While I didn't expect that Wikibench could record labels for new page creations, this can be a good thing because it means that Wikibench will be able to support new page patrol labeling with only a few tweaks on the technical infrastructure. Before actually making changes to Wikibench, it might be helpful to discuss the following three questions:
- Are there AI models for new page patrols that will benefit from evaluation through Wikibench?
- What are the appropriate labels for new pages? While edit damage and user intent may work for identifying vandalism on general edits, other labels might be more helpful to new page creations.
- The entity page title for new pages should differ from that for diffs. Maybe something like Wikipedia:Wikibench/Entity:Newpage/revisionID could work? Tzusheng (talk) 04:47, 4 July 2023 (UTC)
- I remember @Alpha3031 mentioned the new page patrol in our chat and might have some thoughts on this as well. Tzusheng (talk) 04:53, 4 July 2023 (UTC)
- I actually do AfC (drafts) at the moment (could probably apply for NPP but the queue is about the same and AfC is easier) but they pretty much use the same infra, except what they (and we) get from ORES at the moment is for specific issues (Spam/Copyvio/Attack page, though Copyvio is a different bot), article assessment and article topic classifications rather than a straight good/bad. The standard likelydamaging etc. might be useful in unsubmitted drafts, but the volume of drafts (and actual articles) that are actually vandalism is low enough that human review is probably easier. Alpha3031 (t • c) 05:06, 4 July 2023 (UTC)
- @Robertsky @Alpha3031 Thanks so much for your feedback! If AI prediction and evaluation are not essential for AfC and NPP at this moment, we could probably archive this new page here. What do you all think? Tzusheng (talk) 21:52, 5 July 2023 (UTC)
- Being a new page doesn't necessarily mean it's out of scope for recent changes patrollers; vandalism would just be tagged for speedy deletion in those cases. I'm fairly sure I've done more CSD tagging while in an RCP workflow than I have outside of it. Though, it does mean that the bad revisions will be deleted instead of just being reverted in those cases. But yeah, probably more useful if we could wait for more tagging options. Alpha3031 (t • c) 02:25, 6 July 2023 (UTC)
- @Robertsky The warning message box has been updated for new pages! Tzusheng (talk) 02:10, 6 July 2023 (UTC)
I have compiled a list of pages affected below. – robertsky (talk) 18:05, 2 July 2023 (UTC)
- Thanks so much for compiling the list! Tzusheng (talk) 01:37, 3 July 2023 (UTC)
- @Tzusheng maybe you can add a warning instead of “nothing” under the headline, if the rev is deleted? TenWhile6 (talk | SWMT) 11:52, 3 July 2023 (UTC)
- @TenWhile6 That's a good idea! I just updated Wikibench's script. How does the message look? Tzusheng (talk) 18:25, 3 July 2023 (UTC)
- @Tzusheng very Good! Thank you! TenWhile6 (talk | SWMT) 19:54, 3 July 2023 (UTC)
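Catching deleted revisions automatically, as raised earlier in this thread, could start from the flags the MediaWiki API attaches to hidden revisions. What follows is only a minimal sketch under assumptions: the response is taken to come from an action=query request with prop=revisions, an explicit revids list, and rvprop=ids|user|comment|content; classifyRevisions and its status names are hypothetical and are not part of Wikibench.

```javascript
// Hypothetical helper (not Wikibench code): classify revisions from an
// already-fetched action=query&prop=revisions response.

// Flags are `true` in formatversion=2 and an empty string in formatversion=1,
// so treat anything other than undefined/false as "set".
function isFlagSet(value) {
  return value !== undefined && value !== false;
}

function classifyRevisions(response) {
  const status = {};
  const query = (response && response.query) || {};

  // Revision IDs the API could not resolve are listed under badrevids
  // (assumed here to be either an object keyed by ID or an array).
  for (const bad of Object.values(query.badrevids || {})) {
    status[bad.revid] = 'missing';
  }

  // pages is an array in formatversion=2 and an object in formatversion=1.
  for (const page of Object.values(query.pages || {})) {
    for (const rev of page.revisions || []) {
      const hidden =
        isFlagSet(rev.texthidden) ||
        isFlagSet(rev.userhidden) ||
        isFlagSet(rev.commenthidden);
      status[rev.revid] = hidden ? 'revdel' : 'visible';
    }
  }
  return status;
}
```

An entity page whose revision comes back as 'revdel' or 'missing' could then be given the warning message or queued for archiving; how Wikibench would actually act on the result is left open here.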
List of revdel entity pages
Other cases
There are other possible cases where the revisions won't appear:
- User:Tzusheng/sandbox/Wikipedia:Wikibench/Entity:Diff/false/703976291 - New page created. Can be accessed/assessed by going to Special:Diff/703976291.
- Edit that logs page move in the edit summary/comment.
Display of revdel'd entries
@Tzusheng: I'm wondering if it would be possible to have revdel'd entries on the labeled data table struck out so people don't click on them, which is kind of annoying. — FenrisAureus ▼ (she/they) (talk) 02:15, 6 July 2023 (UTC)
- @FenrisAureus Yes, it's possible! But similar to the discussion label for available talk pages, it requires rewriting the infrastructure to avoid drastically increasing the loading time of the table, which already takes some time. I'll definitely take note of this as well for the next iteration. Thanks again for the great suggestion! Tzusheng (talk) 17:56, 6 July 2023 (UTC)
Discussion Label
@Tzusheng It’s not easy to see where there is a local diff discussion. Maybe you can add a label “discussion: no/yes” to the big table so you can see where there is a discussion and join it. The disagreement column is a good way to find discussions, but it is not perfect. (There are diffs with disagreement and no talk, and diffs with no disagreement but a talk.) TenWhile6 (talk | SWMT) 11:56, 3 July 2023 (UTC)
- Agree. The inclusion of such a link would be helpful, perhaps checking if said diff has a talk page or not. 1TWO3Writer (talk) 14:44, 3 July 2023 (UTC)
- Comment: I think the "disagreement" column is meant for this, and I think the buttons provide a great way to navigate the table currently. Not sure if this is necessary. Actualcpscm (talk) 14:55, 3 July 2023 (UTC)
- Comment: That's a great suggestion! I think it is technically possible, but it may take some time for me to figure out the best way to implement it. Let's continue the discussion here to see what others think about it. Meanwhile, I will try to find ways to make it happen if people generally think adding this feature would be helpful. Tzusheng (talk) 19:04, 3 July 2023 (UTC)
- Agree. — FenrisAureus ▼ (she/they) (talk) 00:02, 4 July 2023 (UTC)
- Comment: I did some research on the implementation details for this feature. It turns out the required technical change to the infrastructure is overly complicated to be done before our scheduled exit interviews this and next week. However, I'll definitely take notes and integrate this feature into Wikibench's next iteration. Thanks again for the great suggestion! Tzusheng (talk) 21:40, 5 July 2023 (UTC)
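One way to keep a discussion column from adding much to the table's loading time is to ask about many entity talk pages per request rather than one at a time. This is a sketch under assumptions, not the planned implementation: the response is taken to come from action=query with a titles list (the API accepts up to 50 titles per request for ordinary accounts), and nonexistent pages are assumed to carry a missing flag. The names chunk and existingTitles are hypothetical.

```javascript
// Hypothetical helpers (not Wikibench code) for a "discussion: yes/no" column.

// Split a list of talk page titles into batches of at most `size` items,
// so each batch fits into one API request.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Flags are `true` in formatversion=2 and an empty string in formatversion=1.
function isFlagSet(value) {
  return value !== undefined && value !== false;
}

// Given an already-fetched action=query response, return the set of
// titles that exist, i.e. talk pages where a discussion has been started.
function existingTitles(response) {
  const pages = Object.values(((response && response.query) || {}).pages || {});
  return new Set(
    pages
      .filter((page) => !isFlagSet(page.missing) && !isFlagSet(page.invalid))
      .map((page) => page.title)
  );
}
```

With batching, a table of a few hundred entities would need on the order of ten requests instead of several hundred, which is the main design point of the sketch.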
Timestamped edit quality table
Perhaps the table should be sectioned with the time the primary edit took place so it is not so long to scroll through. 1TWO3Writer (talk) 13:44, 3 July 2023 (UTC)
- @1TWO3Writer That might be a great idea! Which timestamp are you thinking of? For example, is it the timestamp of the latest change of the primary label or any edits to the entity page (including individual labels, discussions, etc.)? Tzusheng (talk) 14:29, 4 July 2023 (UTC)
- I think a timestamp for the latest change of the primary label would work! The change of the primary label would mean there is active discussion going on. 1TWO3Writer (talk) 15:37, 4 July 2023 (UTC)
- Yes! As others also suggested in this thread, showing whether an active discussion is available along with a timestamp will be great! I likely will not be able to implement the change before our exit interview, but I will integrate it into the next iteration. Thanks again for the great suggestion! Tzusheng (talk) 21:57, 5 July 2023 (UTC)
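Sectioning the table by the latest change of the primary label could be done with a simple grouping step before rendering. The sketch below assumes each table row already carries an ISO 8601 timestamp in a field called primaryLabelChangedAt; that field name, the row shape, and groupByDay are all hypothetical.

```javascript
// Hypothetical helper (not Wikibench code): group table rows into sections
// by the UTC day on which their primary label last changed.
function groupByDay(rows) {
  const groups = new Map();
  for (const row of rows) {
    // Normalise to UTC and keep only the YYYY-MM-DD part.
    const day = new Date(row.primaryLabelChangedAt).toISOString().slice(0, 10);
    if (!groups.has(day)) {
      groups.set(day, []);
    }
    groups.get(day).push(row);
  }
  // Most recent day first, so active discussions appear at the top.
  return [...groups.entries()]
    .sort(([a], [b]) => (a < b ? 1 : a > b ? -1 : 0))
    .map(([day, items]) => ({ day, rows: items }));
}
```

Each returned group could become one collapsible section of the table; grouping by week or by hour would only change the key computed from the timestamp.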
Paid-contribution disclosure
@Actualcpscm Thank you for adding the section about WP:PAID policy. Given that declaration in edit summaries is not possible for technical reasons, what do you think about creating a contribution list for people who participate in the paid study to sign up? Wikibench currently has a contact list for people interested in future updates. We may create another list specifically for paid contributions for disclosure purposes. Tzusheng (talk) 18:42, 3 July 2023 (UTC)
- Alternatively, will adding Wikibench's user box suffice? Tzusheng (talk) 19:12, 3 July 2023 (UTC)
- That works for me, although it might raise some privacy questions. Still, I think it would be legitimate for you to publish a list of paid participants. It would be a nice convenience, but it doesn't replace user disclosure as required by the terms of use. Same thing for the userbox; the terms of use are very specific about how paid editing should be disclosed. Maybe if the userbox is edited to say "This user was paid by CMU (or whoever is funding this, is it CMU?) to participate in Wikibench, a research project about etc etc". I personally don't think this project really belongs in the same category as conventional paid editing, and I don't think remunerated research participation is frowned upon the same way that conventional paid editing commonly is, but the terms of use don't make that distinction. Actualcpscm (talk) 19:28, 3 July 2023 (UTC)
- I agree that we should definitely be cautious about the privacy concern. Let's continue the discussion here to see what others think. I am happy to create a Wikibench userbox specifically for paid contributions if people generally find it necessary in addition to the existing userbox. Thank you again for bringing up this important issue! Tzusheng (talk) 20:54, 3 July 2023 (UTC)
- I went ahead and made a new one, tweaking the wording of {{paid}} to fit the situation:
{{User:FenrisAureus/wikibench-disclose}}
This user, in accordance with the Terms of Use, discloses that they have been paid by CMU for their contributions to the development of Wikibench's Edit Quality Campaign.
- Feel free to move it to the project space. — FenrisAureus ▼ (she/they) (talk) 00:27, 4 July 2023 (UTC)
- Thank you, @FenrisAureus! I also created one that is more general and added both to the project page here. Tzusheng (talk) 01:08, 4 July 2023 (UTC)
- I made a minor edit to both of them to include CMU as the employer, as required by ToU. That way, the userbox is sufficient on its own with regards to the disclosure requirements. Actualcpscm (talk) 09:32, 4 July 2023 (UTC)
- Thanks! — FenrisAureus ▼ (she/they) (talk) 13:56, 4 July 2023 (UTC)
- Thanks so much! Tzusheng (talk) 14:10, 4 July 2023 (UTC)
- @Tzusheng Side note: It might be good to notify paid contributors that they are obliged to disclose the payments; I'm not sure everyone is intricately familiar with WP:PAID, and it's not unreasonable to believe that research projects like Wikibench would be exempt from disclosure requirements. Actualcpscm (talk) 15:11, 4 July 2023 (UTC)
- Hell, I started a thread on the village pump and some people there seem to think that. It might be a good idea to just flat out say in the onboarding: "If you are compensated for your participation, you are obligated to disclose." — FenrisAureus ▼ (she/they) (talk) 15:16, 4 July 2023 (UTC)
- Thank you for initiating the discussion! Yes, I will definitely remind all the paid contributors during the upcoming exit interviews this and next week. Tzusheng (talk) 15:47, 4 July 2023 (UTC)
- I've updated the section on this page to reflect the consensus I read at that VP discussion. It seems to me that the general sentiment is something like this: "Yes, it's technically paid editing and technically needs to be disclosed, but it's not substantially the same thing as being paid to edit in the article namespace, and probably very few people would care." Actualcpscm (talk) 19:48, 4 July 2023 (UTC)
- The updated text looks wonderful! Thank you for following up on the VP discussion! Tzusheng (talk) 21:35, 5 July 2023 (UTC)
Seeking feedback on data statement
I added a section called the data statement to prevent people from inappropriately using the dataset in ways that the data is not labeled and curated for. What do you all think about the statement? For example, is it necessary or appropriate? Please feel free to edit the statement to make it better! Tzusheng (talk) 22:32, 3 July 2023 (UTC)
- Good stuff! I edited the wording a bit, what do you think of this version? Actualcpscm (talk) 22:36, 3 July 2023 (UTC)
- Looks wonderful to me! Thanks so much for the revision. Tzusheng (talk) 22:39, 3 July 2023 (UTC)
Small QOL code suggestion
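By adding the following code to Wikibench, a direct link to the project page will appear in the user's personal tools section after "Sandbox":
// mw.util is provided by the mediawiki.util module, so wait for it to load first.
mw.loader.using( 'mediawiki.util' ).then( function () {
    mw.util.addPortletLink(
        'p-personal', // portlet to add to: the personal tools menu
        'https://en.wikipedia.org/wiki/User:Tzusheng/sandbox/Wikipedia:Wikibench/Campaign:Editquality', // link target
        'Wikibench', // link text
        'pt-wikibench', // id of the new list item
        'wikibench', // tooltip
        null, // no access key
        '#pt-preferences' // insert before the Preferences link
    );
} );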
— FenrisAureus ▼ (she/they) (talk) 02:47, 4 July 2023 (UTC)
- Yes please! Actualcpscm (talk) 08:19, 4 July 2023 (UTC)
- @FenrisAureus @Actualcpscm Thanks for the suggestion! Would you mind sharing a bit more about what it does? I already added the code to Wikibench but didn't notice any change. Tzusheng (talk) 14:25, 4 July 2023 (UTC)
- The code adds a direct link to the project page to the personal toolbar at the top. — FenrisAureus ▼ (she/they) (talk) 14:34, 4 July 2023 (UTC)
- It functionally replaces a browser bookmark, which is useful when so much other stuff is already bookmarked :) Actualcpscm (talk) 14:46, 4 July 2023 (UTC)
- That's wonderful! Thanks for the clarification! Tzusheng (talk) 15:55, 4 July 2023 (UTC)
Test edits
I've come across a few cases (such as this one) where it was argued that an edit was made in good faith because it might have been a test edit. I don't think test edits should be classified as being made in good faith; the intention "I will try and see if I can make this article worse by removing content / introducing nonsense" is not meaningfully better than "I will make this article worse by removing content / introducing nonsense." I think if a test edit makes damaging changes, that should be classified as bad faith in so far as the editor intended to publish damaging changes, even if they were unsure of their technical ability to do so. What does everyone think? Actualcpscm (talk) 15:06, 4 July 2023 (UTC)
- I agree that test edits should be classified as damaging/bad faith. I don’t believe I’ve ever run into a case with test edits where good faith is evident. Ⓩⓟⓟⓘⓧ Talk 19:13, 4 July 2023 (UTC)
- WP:IDTEST:
Remember that vandalism is "any addition, removal, or change of content, in a deliberate attempt to damage Wikipedia". While editing tests are certainly deliberate, the intention of editing tests are usually to "see what this will do" or "see if this works", and are usually not made with the intention of damaging Wikipedia.
— FenrisAureus ▼ (she/they) (talk) 01:34, 5 July 2023 (UTC)
- I agree with Fenris here. While such actions don't make that much sense to experienced editors who are aware of the policies and guidelines, new users aren't and might just type some random thing just to see it on the page. 1TWO3Writer (talk) 07:14, 5 July 2023 (UTC)
- Although it's a well-written essay, I can't say I agree with that conclusion. There are plenty of ways to test if "this will work" that don't involve damaging an article, such as correcting grammar or spelling mistakes or even just rewriting a sentence in a slightly better way. If the only intention is to test if editing works, there's no need to make that test a damaging edit. Actualcpscm (talk) 10:10, 5 July 2023 (UTC)
- If it's anything to go by, Twinkle does come with a warning for users who make and then revert test edits (Template:uw-selfrevert). But when the selection is confined to a singular diff, then it may be best to mark it as damaging. Loafiewa (talk) 23:35, 9 July 2023 (UTC)
Selection bias
@Tzusheng This is particularly relevant to you, since you're managing this study (even if you're allowing consensus to form organically on most issues, which is very nice).
I wonder how useful this dataset will be for training and evaluating AI given that it does not accurately represent the totality of edits on Wikipedia, or even in the main namespace. Surely there are significant biases in how editors select edits to evaluate; for example, damaging and bad-faith edits are significantly overrepresented, edits outside the main and Talk: namespaces are probably underrepresented, and there is significant recency bias. Were those initial 15 diffs you provided to each editor chosen randomly to account for that? Seems like it would be good to have a representative dataset alongside the curated one, which largely consists of this biased selection. What are your thoughts? Actualcpscm (talk) 20:49, 4 July 2023 (UTC)
- Also, since most RC patrollers (including myself) come across bad edits ID'd through the preexisting ORES model, and the aim of this project is to evaluate the effectiveness of the preexisting ORES model, there seems to be a significant potential for dataset bias here. — FenrisAureus ▼ (she/they) (talk) 14:42, 5 July 2023 (UTC)
- @Actualcpscm @FenrisAureus Thank you for the very good question! Yes, as both of you mentioned, simply using the entire curated dataset for evaluation will not be a good idea because of the potential selection bias. The accuracy, or even other metrics, calculated using the whole dataset will not reflect the actual quality of the AI models. However, there are multiple approaches to account for this issue, including but not limited to the one @Actualcpscm suggested. In fact, I plan to share a few more proposals with you during the exit interview to learn about your feedback on which might be most helpful to patrollers and the Wikimedia community. If you have any other suggestions, please feel free to share them here in advance! Tzusheng (talk) 21:31, 5 July 2023 (UTC)
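To illustrate why accuracy computed on the whole curated dataset can mislead, here is a small sketch of one possible correction, post-stratification: accuracy is computed per stratum and then weighted by each stratum's assumed share of all edits. This is not a method proposed in the discussion above, only an example of the general idea; the function names and all numbers are invented for illustration.

```javascript
// Hypothetical illustration (not the study's method). Each stratum records
// how many labeled edits it holds, how many the model got right, and its
// assumed share of all edits on the wiki (shares sum to 1).

// Plain accuracy over the curated dataset, ignoring how it was selected.
function rawAccuracy(strata) {
  let correct = 0;
  let total = 0;
  for (const s of strata) {
    correct += s.correct;
    total += s.total;
  }
  return correct / total;
}

// Post-stratified accuracy: per-stratum accuracy weighted by the stratum's
// assumed share of the real edit population.
function weightedAccuracy(strata) {
  let sum = 0;
  for (const s of strata) {
    sum += s.populationShare * (s.correct / s.total);
  }
  return sum;
}

// Invented numbers: damaging edits are heavily overrepresented in the
// curated set (80 of 100) compared with their assumed real share (5%).
const exampleStrata = [
  { name: 'damaging', total: 80, correct: 72, populationShare: 0.05 },
  { name: 'not damaging', total: 20, correct: 10, populationShare: 0.95 }
];
```

With these invented numbers, the raw accuracy is 82/100 = 0.82, while the weighted estimate is 0.05 × 0.9 + 0.95 × 0.5 = 0.52, showing how far an overrepresented stratum can pull the headline figure. The approach depends on knowing the real population shares, which is itself an assumption.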
Talk in article / promo
I came across these edits, which seem to be personal opinion commentary from COI editors: 1, 2. To me, this is bad faith editing; I find it hard to imagine that these were honest attempts to improve the articles. However, @FenrisAureus seems to disagree. Thoughts on this? Actualcpscm (talk) 11:36, 7 July 2023 (UTC)
- Agreed. I don't remember my thought process on this, but I was clearly wrong. — FenrisAureus ▲ (she/they) (talk) 12:46, 7 July 2023 (UTC)
- I've changed my labels on both of them, thanks! — FenrisAureus ▲ (she/they) (talk) 12:53, 7 July 2023 (UTC)
Use of entity talk pages
During an off-wiki convo with @Tzusheng, we developed a more coherent idea of where discussions should take place within Wikibench. To summarise our conclusion: entity talk pages (i.e. the individual diffs' talk pages) should be used only for discussion specifically about that individual edit. Any discussion that is or might become generally relevant should be moved to the campaign talk page right here. This is necessary because entity talk pages are not usually accessed by editors who aren't concerned with that specific edit, and consensus that develops on entity talk pages will not be broadly apparent. A suggested implementation of that was to automatically add editnotices to entity talk pages that summarise this principle. What does everyone think? Do you agree with this characterisation of the different talk pages? Actualcpscm (talk) 20:15, 11 July 2023 (UTC)