Ratings by category

Writer61

I have read a lot about how ratings differ between categories. With some time on my hands, I decided to have a look. The method was to copy and paste from the category list into an Excel sheet with some custom formulae to parse each entry. FWIW, writing the functions took the longest.

Taking the first 750 titles from each of the categories below, the average scores were as follows:

Ratings.png
Anecdotally, I was expecting Loving Wives to be the lowest, but Group Sex is significantly lower than the others.

Does anybody have a good way to download stats from the site?
 
I don't think there's an API or way to really do that beyond what you're doing.

Not saying this is likely to result in something, but I don't see a downside to sending Manu a DM asking pretty please if he could run this on the whole DB. I have to think he's already written a query for it, and even if not that's like 90 seconds work if you already know the schema.
 
You could write a python script to scrape the pages with BeautifulSoup. That's what I would do.

It turns out there IS an API, at least one, but it's totally clear to me that it wasn't intended for public use and it would be a cumbersome, inefficient way to get the data you wanted.
 
I stand corrected ☺️

I could write the query if I could see the schema. That's the extent of my skill in that arena, lol.
 
Ratings.png

Interesting. I write almost exclusively in GS, and do much better than what your table implies. OTOH, have read very few works in that category, so really can't judge the overall picture. What I do know is that it is challenging to portray multidimensional relationships that make any sense, so possibly I inadvertently stumbled into something that works.
 
I have written in all those categories, and the relative ordering in your table is very different from my experience. GS is my highest and LW is my lowest.

@8letters did the table here from 7-year-old data, which agrees with my more recent personal experience.
 
Not sure how I missed that thread.
 
I took the first 750 stories alphabetically, which may/may not be a representative sample. The aim was to see whether there was a difference between categories.

If your story titles begin with anything other than a punctuation mark, a number, or an A, they are not in the data set.
 
Thanks for the details. I was guessing it was sorted chronologically (the most recent 750, which might or might not go back far enough). Something feels wonky with the results to me, presumably something in the sample I am not imagining.
 
Minimum zero is presumably for stories that never got rated at all. The minimum actual rating is 1.
Correct. Some stories do not achieve a rating. It is actually the mode for the data set I extracted with 71 of 3675 records.
 
Adding a similar anecdote of my own. GS is the category of my highest scoring story (teetering around 4.8), whereas R and E&V are second and third worst, respectively (around 4.2 - 4.3).

I suspect that your alphabetic sample has some kind of systemic bias in it. Perhaps what you have actually demonstrated is that stories whose titles are in the format of "A Something" correlate with lower scores overall :)
 
It would be if there are an extraordinary number of 0s. Can you filter the zeros out and see what it gives you?
 
It is a comparison between categories. That I started alphabetically only matters if some "A" stories in some categories score lower than in others. That may be true, but I am not going to bother proving that.

FWIW, I deliberately started at A to eliminate variation. YMMV.
 
1751055296305.png
Excluding 0s.

The only clear message there is that LW readers are less inclined to score.
 
You might check your procedure. I don't think the median should be that far below the average.
Yup, I screwed up the calculation. Excel isn't so helpful with calculating the median of a subset.

Let me get back to you tomorrow.
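For anyone fighting the same median-of-a-subset problem, here is a rough Python sketch of one way to do it outside Excel. It assumes the data has already been reduced to (category, rating) pairs and that a rating of 0 means the story was never rated. The function name `summarize` and the sample values are made up for illustration; they are not taken from the spreadsheet above.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(records):
    """Per-category mean and median of rated stories.

    records: iterable of (category, rating) pairs.
    Assumption: a rating of 0 means "never rated", so it is counted
    separately instead of being averaged in.
    """
    rated = defaultdict(list)
    unrated = defaultdict(int)
    for category, rating in records:
        if rating == 0:
            unrated[category] += 1
        else:
            rated[category].append(rating)

    summary = {}
    for category in set(rated) | set(unrated):
        scores = rated.get(category, [])
        summary[category] = {
            "rated": len(scores),
            "unrated": unrated.get(category, 0),
            # mean/median are undefined when nothing was rated
            "mean": mean(scores) if scores else None,
            "median": median(scores) if scores else None,
        }
    return summary


# Made-up placeholder data, only to show the shape of the input.
sample = [
    ("LW", 0), ("LW", 3.0), ("LW", 4.0),
    ("GS", 4.5), ("GS", 4.7), ("GS", 0), ("GS", 4.6),
]
summary = summarize(sample)
for category, stats in sorted(summary.items()):
    print(category, stats)
```

Keeping the unrated count next to the averages also makes it easy to see whether one category simply has more unscored stories than another.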
 
But the categories are now ordered in the way 8letters' data suggested, and as several of our experiences reflect. Romance now has the highest average and LW the lowest, by a significant margin.
Bloody Excel, I sorted by # of stories and it garbled the output.
 
I took on the challenge of examining whether there is any difference in ratings based on the first letter of the story title. As expected, there is no significant difference.
1751117275085.png
Note: ~ is any title starting with a number or a symbol

I focused on E/V, no reason to think other categories would differ.
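If anyone wants to repeat the first-letter check on another category, this is a minimal Python sketch of the grouping step, assuming the stories are available as (title, rating) pairs. As in the chart, `~` stands for any title starting with a number or a symbol. The function names and the example titles are hypothetical, and the sketch only computes the per-letter averages; it does not test whether the differences are significant.

```python
from collections import defaultdict
from statistics import mean


def first_letter_bucket(title):
    """Return the upper-cased first letter, or '~' for digits/symbols/empty."""
    first = title.strip()[:1].upper()
    return first if first.isalpha() else "~"


def mean_by_first_letter(stories):
    """Average rating per first-letter bucket.

    stories: iterable of (title, rating) pairs.
    Assumption: rating 0 means the story was never rated, so it is skipped.
    """
    buckets = defaultdict(list)
    for title, rating in stories:
        if rating > 0:
            buckets[first_letter_bucket(title)].append(rating)
    return {letter: mean(ratings) for letter, ratings in sorted(buckets.items())}


# Made-up titles and scores, only to show the input format.
example = [("A Story", 4.0), ("another", 5.0), ("2 Much", 3.0), ("Zed", 0)]
result = mean_by_first_letter(example)
print(result)
```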
 