Wikipedia talk:Prosesize

From Wikipedia, the free encyclopedia

How do I use it?

@Galobtter: I cannot find where the toolbox for this gadget is supposed to be and it is not mentioned in the documentation. Veverve (talk) 04:00, 26 November 2021 (UTC)[reply]

Idem. Joshua Jonathan -Let's talk! 09:57, 5 January 2022 (UTC)[reply]

@Kusma: could you help us figure out how to use this gadget? Veverve (talk) 22:13, 17 February 2022 (UTC)[reply]

@Veverve, @Joshua Jonathan, you should find the link "Page size" in the "Tools" box on the left of the screen if you are using the Vector or Monobook skin on a desktop with a sufficiently large window. In desktop mode on a phone, it may be hidden behind a "tools" symbol that you need to click to open. I don't use mobile mode, so I don't know whether it is accessible there. If you can't get it to work at all, describe your preferences settings, browser, and window size at the help desk, or at the technical village pump if the help desk can't help you. —Kusma (talk) 22:23, 17 February 2022 (UTC)[reply]
@Kusma: found it, thanks a lot! Veverve (talk) 22:38, 17 February 2022 (UTC)[reply]

Bug? Looks like something is counted that should not be

At Ludwig Ferdinand Huber, prosesize gives me 4257 B. (Xtools has 4281 B). After this edit, prosesize counts 2352 B and Xtools articleinfo gives 2368 B. While it is normal that removing the EB template by adding a * should reduce the prose size by around 300 B, there seem to be about 1700 B of extra material that is invisible, yet is being counted. @Izno suggested on Discord that this could be the <style> content.

Happy to explain more if this bug report is unclear, please let me know. —Kusma (talk) 22:10, 17 February 2022 (UTC)[reply]

Pinging experts @Galobtter, @Dr pda in case this is not a good place to raise the issue. —Kusma (talk) 09:59, 18 February 2022 (UTC)[reply]
Yeah it looks like the text inside the style tag is being counted. I'll update the code to exclude that tag. Galobtter (pingó mió) 23:01, 18 February 2022 (UTC)[reply]
@Kusma  Done. Galobtter (pingó mió) 06:15, 30 April 2022 (UTC)[reply]
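For readers following along, here is a minimal sketch of the effect described in this thread: stripping tags from HTML leaves the text inside a style element behind, so it inflates the byte count unless the whole element is removed first. This is a string-based illustration only. The function name `proseBytes` and its `skipStyle` option are hypothetical, and the gadget itself works on the page DOM, so its actual fix may look quite different.

```javascript
// Hypothetical helper: count UTF-8 bytes of the visible text in an HTML string.
// With skipStyle set, <style>...</style> elements are dropped before tags are stripped.
function proseBytes(html, options) {
  let source = html;
  if (options && options.skipStyle) {
    source = source.replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, '');
  }
  // Strip the remaining tags, keeping only their text content.
  const text = source.replace(/<[^>]+>/g, '');
  return new TextEncoder().encode(text).length;
}

const sample = '<p><style>.a{color:red}</style>Huber</p>';
console.log(proseBytes(sample, { skipStyle: false })); // CSS text is counted too
console.log(proseBytes(sample, { skipStyle: true }));  // only "Huber" is counted
```

The sample markup is made up for the sketch and is not taken from the Ludwig Ferdinand Huber article.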

Extra returns attached to topicons

Run the tool on J. R. R. Tolkien for an example of what I mean. For every icon at the top of a page, another instance of the script result appears. I asked in the Discord server and was told it had something to do with the ".mw-parser-output" causing multiple returns. Is this unique to me? –♠Vamí_IV†♠ 10:53, 21 September 2022 (UTC)[reply]

No, sometimes I get duplicated instances as well. Thinker78 (talk) 15:33, 22 September 2022 (UTC)[reply]
@Vami IV, @Thinker78  Fixed Galobtter (pingó mió) 06:56, 27 January 2023 (UTC)[reply]
Awesome, thanks. –♠Vamí_IV†♠ 08:30, 27 January 2023 (UTC)[reply]
@Galobtter: Can you check if this patch added any JavaScript errors? I'm not a coder; I've noticed that previewing an article I'm working on sends me back to the top of the article in the edit window rather than leaving me where I was. When I asked around I was told that it was probably because of a busted userscript gadget and I can't recall this happening before this ping. –♠Vamí_IV†♠ 17:52, 1 February 2023 (UTC)[reply]
No, shouldn't be able to cause any javascript errors, and the code I changed doesn't run unless the page size is requested. Galobtter (pingó mió) 23:03, 1 February 2023 (UTC)[reply]

Math handling bug?

On Prototype filter, I'm getting 31 kB prosesize on an article that doesn't crack 15 kB wikitext. theleekycauldron (talkcontribs) (she/her) 04:38, 27 January 2023 (UTC)[reply]

@Theleekycauldron: you can use the new prosesize tool to get an accurate count (8.2 kB), which properly ignores math blocks. Legoktm (talk) 04:38, 25 February 2023 (UTC)[reply]
Hi Legoktm, this just came up on another page. Would you happen to know if there is a reason this addition has not been pushed to the main prosesize script? Alternatively/in the meantime, is there a reason not to add this exception to the Prose size notes here? Best, CMD (talk) 06:07, 26 October 2023 (UTC)[reply]

Pull numbers from prosesize.toolforge.org tool?

I've created a new prosesize tool that provides more accurate counts (see the above section about math). It has an API, and I think it would be better if this gadget pulled from the tool to centralize the counting logic in one place. The tool uses the same logic as the Featured articles by size database report; I've explained a bit more on my blog. Legoktm (talk) 04:40, 25 February 2023 (UTC)[reply]

+1. Current version of this gadget is bugged and gives 5-6 times the actual word count!  wolfRAMM  21:05, 16 May 2023 (UTC)[reply]
I think my only concern with using the tool is losing the highlighting of readable prose, but I don't know how much of a concern that is. Galobtter (talk) 21:49, 23 November 2023 (UTC)[reply]
@Legoktm getting CORS issues when trying to query the API - I think you might need to set Access-Control-Allow-Origin to allow requests to your API? Galobtter (talk) 22:50, 23 November 2023 (UTC)[reply]
@Galobtter: done, should now be accessible over CORS. I suspect the highlighting can be done with just CSS, here's the stylesheet I use against the Parsoid HTML output. I can try to write one for the current page view HTML after the holiday weekend if no one beats me to it. Legoktm (talk) 04:44, 24 November 2023 (UTC)[reply]

Fix for conflict with WikiEdit tool

Hi! As reported here and here, there's a conflict between this gadget and the Wikipedia:WikiEdit tool. First I tried to fix it from WikiEdit, but after some trying and thinking, I figured it's much easier to fix it from here, by adding this exception to the ones already contemplated. Can it be added to MediaWiki:Gadget-Prosesize.js? Thanks! Sophivorus (talk) 23:40, 6 July 2023 (UTC)[reply]

Have you considered a more sophisticated word count tool?

The current one, i.e. just wordCount += this.innerHTML.replace( /(<([^>]+)>)/ig, '' ).split( ' ' ).length;, does a substantial overcount across most articles I have tested, though I haven't tried tracing through to figure out what parts are getting counted as "words" that shouldn't be.

Relatedly, has anyone considered making up a list of test cases with their "ideal" (i.e. consensus count if manually counted by humans) word counts? There are a lot of wonky interactions in wiki pages between templates, footnotes, manually entered html, math formulas, mediawiki features, etc., and it seems unlikely any tool is going to get all of these right at first try without some kind of set of tests to target and measure current performance against.

jacobolus (t) 06:54, 26 October 2023 (UTC)[reply]
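As a concrete starting point for the test-case idea above, here is a minimal sketch showing two ways the quoted expression can over-count on its own: splitting on a single space produces empty strings for runs of spaces, and an empty string still yields a count of one. `naiveWordCount` wraps the quoted expression for a single HTML string. `whitespaceWordCount` is a hypothetical alternative written for this sketch; it is not the gadget's code or the Toolforge tool's logic, and it makes no claim about which markup actually causes the overcount on real articles.

```javascript
// The expression quoted above, applied to one HTML string.
function naiveWordCount(html) {
  return html.replace(/(<([^>]+)>)/ig, '').split(' ').length;
}

// Hypothetical alternative: drop <style>/<script> bodies, replace tags with
// spaces, then split on runs of whitespace and ignore empty input.
function whitespaceWordCount(html) {
  const text = html
    .replace(/<(style|script)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .trim();
  return text === '' ? 0 : text.split(/\s+/).length;
}

// A tiny table of test cases with hand-counted expectations.
const cases = [
  { html: '', expected: 0 },
  { html: 'Hello  <b>world</b>', expected: 2 },
  { html: '<p>one\ntwo three</p>', expected: 3 },
];
for (const { html, expected } of cases) {
  console.log(JSON.stringify(html), 'expected', expected,
    'naive', naiveWordCount(html), 'alternative', whitespaceWordCount(html));
}
```

A shared table like `cases`, extended with real wiki markup (templates, footnotes, math), would give any counting tool a fixed target to measure against.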

Now uses prosesize toolforge tool for word counts

Thanks Legoktm for creating the prosesize tool - this script now uses that API to get word counts. This fixes pages with a lot of <math> tags like Prototype filter or Galois cohomology that had bad word counts before. The only issue right now is that the highlighting is no longer in sync with the word count on those kinds of pages, although I figured it was more important to have an accurate word count than the highlighting.

Note: this is only used and works when viewing the current revision of a page - we fall back to the old word count on page preview and old revisions.

Ping @Theleekycauldron, Chipmunkdavis, WolfRAMM, and Jacobolus: as people who had issues with the word count before - how do things look now? Galobtter (talk) 21:04, 24 November 2023 (UTC)[reply]

Thanks @Galobtter! I've added support for older revisions to the tool, just add a ?revision= parameter (example) and I'm thinking about how to handle page previews... let me know if other features are needed. Legoktm (talk) 04:12, 25 November 2023 (UTC)[reply]
Added that in, thanks. Galobtter (talk) 16:14, 25 November 2023 (UTC)[reply]
@Galobtter: It appears there's a slight issue with this. It doesn't seem to work if the page's name contains characters like : or /, so it doesn't work in userspace or, say, on Speakerboxxx/The Love Below. The API page it's trying to access returns 404 unless these characters are percent-encoded. AstonishingTunesAdmirer 連絡 23:27, 25 November 2023 (UTC)[reply]
Fixed below. Galobtter (talk) 00:08, 26 November 2023 (UTC)[reply]
Thank you! Works great now. AstonishingTunesAdmirer 連絡 04:03, 26 November 2023 (UTC)[reply]
@Galobtter Word count seems more or less accurate, but the "Prose size (text only)" is off. For instance Directed information gives 4280 B (692 words) "readable prose size" while xtools result for the same page is Bytes: 4,395 Words: 664.  wolfRAMM  03:48, 11 December 2023 (UTC)[reply]
I didn't notice that prosesize subtracts the refmarksize from the prosesize it calculates - @Legoktm: I assume your prosesize count excludes the ref mark text, right? If so, then I will not subtract the ref mark size when using the prosesize from your tool. Galobtter (talk) 03:57, 11 December 2023 (UTC)[reply]
@Galobtter: correct. Legoktm (talk) 17:32, 12 January 2024 (UTC)[reply]
 Fixed Galobtter (talk) 03:30, 16 January 2024 (UTC)[reply]
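The adjustment discussed in this thread can be sketched as follows, assuming (as confirmed above) that the tool's byte count already excludes ref mark text while the locally computed count does not. The function name `readableProseBytes`, its parameters, and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not taken from the gadget or from any article.

```javascript
// Hypothetical helper: decide whether to subtract the ref mark size.
// source is 'tool' when the byte count came from the Toolforge API,
// or 'local' when the gadget computed it from the page itself.
function readableProseBytes(proseBytes, refMarkBytes, source) {
  if (source === 'tool') {
    // The tool already leaves ref marks out, so subtracting again would undercount.
    return proseBytes;
  }
  return proseBytes - refMarkBytes;
}

// Made-up numbers for illustration only.
console.log(readableProseBytes(5000, 200, 'local')); // 4800
console.log(readableProseBytes(4800, 200, 'tool'));  // 4800
```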

Interface-protected edit request on 25 November 2023

On line 40, please wrap mw.config.get( 'wgPageName' ) in mw.Uri.encode(), to prevent the gadget from not working on pages with special characters like colons and forward slashes.

Diff:

- mw.config.get( 'wgPageName' ) + '?revision=' + mw.config.get( 'wgRevisionId' ) );
+ mw.Uri.encode( mw.config.get( 'wgPageName' ) ) + '?revision=' + mw.config.get( 'wgRevisionId' ) );

mw (talk) (contribs) 23:27, 25 November 2023 (UTC)[reply]

Done, using JavaScript's encodeURIComponent; I don't think there is a need to add a dependency on mediawiki.Uri. Galobtter (talk) 00:05, 26 November 2023 (UTC)[reply]
I've moved this request from MediaWiki talk:Gadget-Prosesize.js to centralize discussion. Galobtter (talk) 00:06, 26 November 2023 (UTC)[reply]
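For reference, a minimal sketch of the encoding fix discussed in this section, using the standard `encodeURIComponent`. The function name `buildProsesizeUrl` and the `apiBase` value in the usage lines are hypothetical placeholders, since the tool's actual API path is not given on this page; only the `?revision=` parameter comes from the discussion above.

```javascript
// Hypothetical helper: build a request URL with the page name percent-encoded,
// so that ":" and "/" in titles do not get read as extra path segments.
function buildProsesizeUrl(apiBase, pageName, revisionId) {
  return apiBase + encodeURIComponent(pageName) +
    '?revision=' + encodeURIComponent(String(revisionId));
}

// Placeholder base URL and revision IDs, for illustration only.
const apiBase = 'https://example.invalid/api/';
console.log(buildProsesizeUrl(apiBase, 'Speakerboxxx/The_Love_Below', 123));
console.log(buildProsesizeUrl(apiBase, 'User:Example/sandbox', 456));
```

In the gadget itself, the page name and revision ID would come from `mw.config.get( 'wgPageName' )` and `mw.config.get( 'wgRevisionId' )`, as in the diff above.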

Getting prose sizes for a *group* of articles

Hello! Is it possible to use this tool to get a list of articles ordered by prose size? Ideally, this would replace User talk:Dr pda/generatestats.js so that we can see the longest/shortest articles that use a good or featured article template. Having access to this info could significantly impact the proposal at Wikipedia talk:Good article nominations#Proposal: mandate compliance with WP:TOOBIG in GA criterion 3b.

Currently, Dr pda's script still works but will only pull 500 articles (see examples 1, 2). Petscan will only return info based on the article's total size in wikicode, which can be very different from a word count (e.g. Philippines).

Thanks for any help y'all can provide! cc Galobtter, Legoktm. Ed [talk] [OMT] 08:52, 12 January 2024 (UTC)[reply]

Hi Ed! I set up Wikipedia:Database reports/Featured articles by size a while back based on a similar request, do you want the same thing for good articles? There are 38k, so presumably it'll be split over multiple pages. Or are you looking for something else? Legoktm (talk) 17:27, 12 January 2024 (UTC)[reply]
Hello Legoktm! Hope you're doing well. :-) I completely missed that page. Thanks for sharing it. So, I'm not sure what the effort level is to accomplish this. For the purposes of the linked discussion + for the GA project more broadly, my thought is that they'd only really utilize lists of the shortest/longest GAs. If that's a trivial ask, fantastic. But if it's simpler and possible to ctrl+F for "featured" and replace with "good" in your existing code, a list of all of them would work great. Ed [talk] [OMT] 20:15, 12 January 2024 (UTC)[reply]
Fixing ping Legoktm. Ed [talk] [OMT] 21:23, 12 January 2024 (UTC)[reply]
The main thing is that to create a list of the shortest and longest, we also need to calculate the sizes of everything in the middle :) So I swapped "Featured" for "Good" and copied most of the code, here you go: Wikipedia:Database reports/Good articles by size. The longest are on page 1 and the shortest at the end of page 4. Let me know if this works for you and/or if there's other stuff you want! Legoktm (talk) 03:13, 13 January 2024 (UTC)[reply]
@Legoktm: Huh. That just makes too much sense for me to have realized that by myself. :-)) That is perfect! Thanks very much. I appreciate the work and owe you one. Ed [talk] [OMT] 19:18, 13 January 2024 (UTC)[reply]
You're welcome! Legoktm (talk) 17:30, 14 January 2024 (UTC)[reply]