BobbyBrandt
"You're right, it could be done, but to what point?"

Well, since you ask:
Step 0: decide what you want your normalised scores to look like. (e.g. do you want equal numbers of 1s, 2s, 3s, 4s, 5s? Do you want a bell-shaped distribution with 3 being the average? Do you want something close to the current distribution of scores on Literotica, for familiarity?) Pick whichever you like of those.
For the sake of example, I'll assume we decide to give scores from 1-5 and to distribute them uniformly, so there are as many stories scoring 1-2 as there are scoring 2-3, 3-4, and 4-5, and likewise at finer subdivisions; the same approach can be applied to other scales.
Step 1: find the actual distribution of raw scores for each category: e.g. maybe in LW the bottom 50% of stories score 4.1 or lower, the bottom 80% score 4.4 or lower, 90% score 4.5 or lower, 99% score 4.6 or lower. (Numbers made up, I haven't counted.) You could do this reasonably well via random sample, but there are "only" about half a million stories on Literotica, so it wouldn't be impossible to pull all the scores.
Step 2: for each story, check the raw score against its category distribution, convert that to a percentile, then refer to the normalised distribution we chose in step 0 and convert the percentile back to a score. For instance, if an LW story has a raw score of 4.4 (putting it at the 80th percentile), then on the scale I described above its normalised score sits at the 80th percentile of the 1-5 range, which works out to 1 + 0.8 × 4 = 4.2.
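For concreteness, steps 1 and 2 might look something like this rough Python sketch (the pandas data frame and the `category`/`score` column names are made up purely for illustration):

```python
import pandas as pd

# Hypothetical table of stories: one row per story, with its
# category and raw Literotica score.
stories = pd.DataFrame({
    "category": ["LW", "LW", "LW", "SF", "SF"],
    "score":    [4.10, 4.40, 4.55, 4.30, 4.60],
})

# Step 1: each story's percentile within its own category
# (rank / count, giving a value in (0, 1]).
stories["percentile"] = stories.groupby("category")["score"].rank(pct=True)

# Step 2: map that percentile onto the target distribution from step 0.
# With a uniform 1-5 target, the 80th percentile lands at 1 + 0.8 * 4 = 4.2.
LOW, HIGH = 1.0, 5.0
stories["normalised"] = LOW + stories["percentile"] * (HIGH - LOW)

print(stories)
```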
So far, this is just "grading on a curve", as done in many a high school or college class. It does make a few assumptions, some of which you've already identified:
- Assumes no systematic quality difference between categories (but what if the stories in Category X really are better written than those in Category Y, on average?)
- Ignores effects of story/series length on scores.
- Ignores possibility that rating behaviours change over time (maybe readers are softer than they used to be?)

But with a bit more work, we can shed those assumptions. For instance, say we find that Category X has an average score of 4.5 and Category Y has an average score of 4.1. How can we figure out whether this difference of 0.4 is a matter of "Category X gets better authors" or "Category Y has tougher voters"?
Well, we probably have a bunch of authors who've written in both categories. If we look at those authors and find their average scores for Category X are typically about 0.1 points higher than their scores in Y, then we can conclude that about 0.1 of that difference is due to tougher voting in Y, and the other 0.3 is due to higher author quality in X. We can then adjust scores accordingly, e.g. bump the Category X scores down 0.05 and Category Y up 0.05 to compensate for that voting difference.
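As a rough sketch (not a polished implementation), the paired-author comparison could be done like this, again assuming a made-up data frame with `author`, `category`, and `score` columns:

```python
import pandas as pd

def voting_offset(stories: pd.DataFrame, cat_x: str, cat_y: str) -> float:
    """Estimate how much of the X-vs-Y gap is down to the voters,
    using only authors with stories in both categories."""
    # Average score per author per category.
    per_author = (
        stories.groupby(["author", "category"])["score"]
               .mean()
               .unstack("category")
    )
    # Keep only authors who appear in both categories.
    both = per_author.dropna(subset=[cat_x, cat_y])
    # Mean within-author gap: positive means X's voters are softer.
    return (both[cat_x] - both[cat_y]).mean()

# If this comes out around 0.1, bump X down by half of it and Y up by
# half of it (or, better, do the equivalent in logit space - see below).
```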
(I wouldn't actually do it quite that way, because a simple additive adjustment produces some weirdness at the ends of the scales - e.g. if somebody does manage to get a 4.97 in Category Y, then we'd be adjusting it to an impossible 5.02. Instead, you'd probably apply something like a logit transform to make the results more sensible across the scale. But I don't want to drown people in detail here.)
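If anyone does want the detail, the idea is roughly this (rescaling 1-5 onto (0, 1) before taking logits is my choice for the sketch, nothing sacred about it):

```python
import numpy as np

def to_unit(score, low=1.0, high=5.0, eps=1e-3):
    # Squash a 1-5 score into (0, 1), keeping clear of the exact endpoints.
    return np.clip((score - low) / (high - low), eps, 1 - eps)

def from_unit(p, low=1.0, high=5.0):
    return low + p * (high - low)

def adjust(score, shift):
    """Apply an additive shift in logit space rather than raw-score space,
    so the adjusted score can never escape the 1-5 range."""
    p = to_unit(score)
    logit = np.log(p / (1 - p)) + shift
    return from_unit(1 / (1 + np.exp(-logit)))

# A 4.97 nudged upward creeps closer to 5 but never reaches it,
# instead of becoming an impossible 5.02.
print(adjust(4.97, shift=0.3))
```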
Story/series length can be handled via similar tricks: pick variables like "number of previous chapters", "chapter length", and "total length of all chapters up to this one", then find which ones have good explanatory power for scores (both within a series and across different works by the same author) and what the relationship looks like. I'd expect something like "logit(score) increases by k per 1000 words published before the scoring point, up to a maximum of K" to do reasonably well. Age can be handled similarly.
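A toy version of that fit, with invented numbers and the logit rescaling from the sketch above:

```python
import numpy as np

# Hypothetical per-chapter data: thousands of words published before this
# chapter's scoring point, and the chapter's logit-transformed score.
prior_kwords = np.array([0.0, 12.5, 30.0, 55.0, 80.0])
logit_scores = np.array([2.1, 2.4, 2.7, 2.8, 2.8])

# "logit(score) rises by k per 1000 prior words, up to a cap K": try a
# handful of caps, fit a straight line to the capped predictor, and keep
# whichever cap explains the scores best.
best = None
for cap in np.arange(10.0, 100.0, 5.0):
    x = np.minimum(prior_kwords, cap)
    slope, intercept = np.polyfit(x, logit_scores, 1)
    sse = np.sum((slope * x + intercept - logit_scores) ** 2)
    if best is None or sse < best[0]:
        best = (sse, cap, slope, intercept)

sse, cap, slope, intercept = best
print(f"logit(score) ≈ {intercept:.2f} + {slope:.3f} per 1000 words, capped at {cap:.0f}k")
```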
For bonus points, you can look at the number of votes contributing to each score and weight its importance in the model accordingly - a score of 4.9 from 500 votes means a lot more than 4.9 off 10 votes!
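One simple way to do that weighting (a common trick, not necessarily the one I'd settle on) is to shrink low-vote scores towards the category average before feeding them into anything else:

```python
def shrunk_score(raw_score, n_votes, category_mean, prior_votes=50):
    """Pull low-vote scores towards the category average; prior_votes
    controls how many votes it takes to mostly trust the raw score.
    (prior_votes=50 is an arbitrary illustration, not a recommendation.)"""
    return (raw_score * n_votes + category_mean * prior_votes) / (n_votes + prior_votes)

# 4.9 off 500 votes barely moves; 4.9 off 10 votes gets pulled well
# back towards a category average of, say, 4.3.
print(shrunk_score(4.9, 500, 4.3))  # ≈ 4.85
print(shrunk_score(4.9, 10, 4.3))   # ≈ 4.40
```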
That might seem like a lot if you haven't worked with this kind of problem before, but it's well-trodden ground. Sports nerds do this sort of thing when trying to answer questions like "who was the greatest batsman of all time?" - even though Don Bradman and Sachin Tendulkar never overlapped, so you can't compare them directly, there are thousands of smaller overlaps that can be stitched together into a decent comparison.
Again, I'd note that I don't think it would be terribly useful to the site to actually do this; it's just interesting as a thought exercise.
I was talking strictly about story scores, and votes only to the extent that more votes = less noise in the ratings. If you want to get into favourite counts, you need to consider views first - e.g. is 100 favourites off 1000 views more impressive than 10k favourites off a million views? One can keep on adding complications forever; I'm just saying that it's not particularly hard to do that sort of normalisation if one believes a universal ranking of Lit stories is a useful thing to have. Most of it would be a matter of finding people who've already tackled similar problems, and modifying their code.
Also, don't forget to account for multi-part stories whose chapters cross over into different categories. What a fuster cluck that would bring to the data.