Author rating

Well, since you ask:

Step 0: decide what you want your normalised scores to look like. (e.g. do you want equal numbers of 1s, 2s, 3s, 4s, 5s? Do you want a bell-shaped distribution with 3 being the average? Do you want something close to the current distribution of scores on Literotica, for familiarity?) Pick whichever you like of those.

For the sake of example, I'll assume we decide to give scores from 1-5 and to have it so that there are as many stories in 1-2 as there are from 2-3, 3-4, and 4-5, and so on at finer levels, but the same approach can be applied for other scales.

Step 1: find the actual distribution of raw scores for each category: e.g. maybe in LW the bottom 50% of stories score 4.1 or lower, the bottom 80% score 4.4 or lower, 90% score 4.5 or lower, 99% score 4.6 or lower. (Numbers made up, I haven't counted.) You could do this reasonably well via random sample, but there are "only" about half a million stories on Literotica, so it wouldn't be impossible to pull all the scores.

Step 2: for each story, check the raw score against its category distribution, convert that to a percentile, then refer to the normalised distribution we chose in step 0 and convert the percentile back to a score. For instance, if an LW story has a raw score of 4.4 (putting it at the 80th percentile) then using the scale I mentioned above, its normalised score ends up at the 80th percentile of that 1-5 scale, which is a 4.2.
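
A minimal sketch of steps 1 and 2 in Python, assuming the flat 1-5 target scale from the example. The function names and the ten made-up LW scores are purely illustrative, not real site data:

```python
from bisect import bisect_right

def percentile_of(raw_score, category_scores_sorted):
    """Fraction of stories in the category scoring at or below raw_score."""
    return bisect_right(category_scores_sorted, raw_score) / len(category_scores_sorted)

def normalise(raw_score, category_scores_sorted, low=1.0, high=5.0):
    """Map a raw score onto a flat low-high scale via its category percentile."""
    p = percentile_of(raw_score, category_scores_sorted)
    return low + p * (high - low)

# Made-up category of ten stories; 4.4 sits at the 80th percentile here.
lw_scores = sorted([3.2, 3.8, 4.0, 4.1, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6])
print(round(normalise(4.4, lw_scores), 2))  # 4.2
```

A different target distribution (bell-shaped, say) would only change the last line of `normalise`: you'd swap the straight-line mapping for that distribution's inverse CDF.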

So far, this is just "grading on a curve", as done in many a high school or college class. It does make a few assumptions, some of which you've already identified:
  • Assumes no systematic quality difference between categories (but what if the stories in Category X really are better written than those in Category Y, on average?)
  • Ignores effects of story/series length on scores.
  • Ignores possibility that rating behaviours change over time (maybe readers are softer than they used to be?)
But with a bit more work, we can shed those assumptions. For instance, say we find that Category X has an average score of 4.5 and Category Y has an average score of 4.1. How can we figure out whether this difference of 0.4 is a matter of "Category X gets better authors" or "Category Y has tougher voters"?

Well, we probably have a bunch of authors who've written in both categories. If we look at those authors and find their average scores for Category X are typically about 0.1 points higher than their scores in Y, then we can conclude that about 0.1 of that difference is due to tougher voting in Y, and the other 0.3 is due to higher author quality in X. We can then adjust scores accordingly, e.g. bump the Category X scores down 0.05 and Category Y up 0.05 to compensate for that voting difference.
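
Roughly, the shared-author comparison could look like the sketch below. The author names and averages are invented, and `voting_gap` is a hypothetical helper; a real version would want to weight authors by how many stories they have in each category:

```python
from statistics import mean

def voting_gap(author_means_x, author_means_y):
    """Average within-author difference (X minus Y), using only authors in both."""
    shared = author_means_x.keys() & author_means_y.keys()
    if not shared:
        raise ValueError("no authors have written in both categories")
    return mean(author_means_x[a] - author_means_y[a] for a in shared)

# Invented per-author average scores in each category.
x = {"ann": 4.6, "bob": 4.4, "cat": 4.7}
y = {"ann": 4.5, "bob": 4.3, "dee": 4.0}

gap = voting_gap(x, y)        # about 0.1: the part attributed to tougher voting in Y
x_shift = -gap / 2            # bump Category X down by about 0.05
y_shift = gap / 2             # bump Category Y up by about 0.05
print(round(gap, 2), round(x_shift, 2), round(y_shift, 2))
```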

(I wouldn't actually do it quite that way, because a simple additive adjustment produces some weirdness at the ends of the scales - e.g. if somebody does manage to get a 4.97 in Category Y, then we'd be adjusting it to an impossible 5.02. Instead, you'd probably apply something like a logit transform to make the results more sensible across the scale. But I don't want to drown people in detail here.)
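
For the curious, a sketch of what the logit version might look like. The assumptions here are mine: scores are rescaled from 1-5 to 0-1 before the transform, the `eps` clamp keeps a perfect 1.0 or 5.0 from blowing up, and the shift is expressed in logit units rather than score points:

```python
import math

def to_logit(score, low=1.0, high=5.0, eps=1e-6):
    """Rescale a low-high score to (0, 1) and take its log-odds."""
    p = (score - low) / (high - low)
    p = min(max(p, eps), 1 - eps)   # avoid log(0) at the ends of the scale
    return math.log(p / (1 - p))

def from_logit(z, low=1.0, high=5.0):
    """Inverse of to_logit: map log-odds back onto the low-high scale."""
    return low + (high - low) / (1 + math.exp(-z))

def adjust(score, shift):
    """Apply an additive shift on the logit scale, so results stay inside 1-5."""
    return from_logit(to_logit(score) + shift)

bumped = adjust(4.97, 0.5)
print(4.97 < bumped < 5.0)  # True: nudged upward, but never past 5
```

The same shift moves a mid-range score much further than one near the ceiling, which is the behaviour you want at the ends of the scale.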

Story/series length can be handled via similar tricks: pick variables like "number of previous chapters", "chapter length", "total length of all chapters up to this one", find which ones have good explanatory power for scores (both within a series and across different works by the same author) and the nature of that relationship. I'd expect something like "logit(score) increases by k per 1000 words before score point, up to maximum of K" would do reasonably. Age, similarly.
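
As a sketch, that capped relationship is a one-liner. The values of `k` and `cap` below are placeholders I've picked for illustration; the real ones would have to be fitted from the data:

```python
def length_bonus(words_before, k=0.02, cap=0.5):
    """Expected lift in logit(score): k per 1000 prior words, up to a maximum of cap."""
    return min(k * words_before / 1000.0, cap)

print(length_bonus(10_000))     # lift after 10k words of earlier chapters
print(length_bonus(1_000_000))  # capped at the maximum
```

To compare chapters fairly, you'd subtract this bonus from each chapter's logit-scale score before converting back to the 1-5 scale.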

For bonus points, you can look at the number of votes contributing to each score and weight its importance in the model accordingly - a score of 4.9 from 500 votes means a lot more than 4.9 off 10 votes!
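
The simplest form of that weighting is a vote-weighted average, sketched below with made-up numbers. In a fuller model the vote counts would go in as observation weights in the regression instead:

```python
def weighted_mean(scored):
    """Average of scores, each weighted by its number of votes.

    `scored` is an iterable of (score, votes) pairs.
    """
    scored = list(scored)
    total_votes = sum(votes for _, votes in scored)
    if total_votes == 0:
        raise ValueError("no votes to weight by")
    return sum(score * votes for score, votes in scored) / total_votes

# Made-up example: one heavily voted story and one barely voted story.
stories = [(4.9, 500), (3.0, 10)]
print(round(weighted_mean(stories), 2))
```

Here the 500-vote story dominates, so the result lands much closer to 4.9 than the plain average of the two scores would.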

That might seem like a lot if you haven't worked with this kind of problem previously, but it's something that's been researched. Sports nerds do stuff like this when trying to answer questions like "who was the greatest batsman of all time?" - even though Don Bradman and Sachin Tendulkar never overlapped one another, so you can't compare directly, there are thousands of smaller overlaps that can be put together to get a better comparison.

Again, I'd note that I don't think it would be terribly useful to the site to actually do this, it's just interesting as a thought exercise.



I was talking strictly about story scores, and votes only to the extent that more votes = less noise in the ratings. If you want to get into favourite counts, you need to consider views first - e.g. is 100 favourites off 1000 views more impressive than 10k favourites off a million views? One can keep on adding complications forever; I'm just saying that it's not particularly hard to do that sort of normalisation if one believes a universal ranking of Lit stories is a useful thing to have. Most of it would be a matter of finding people who've already tackled similar problems, and modifying their code.
You're right, it could be done, but to what point?

Also, don't forget to account for multi-part stories that have chapters cross over into different categories. What a fuster cluck that would bring to the data.
 
This is all interesting and stuff, but you seem to be saying "This is easy if I assume someone else does all this work." There's no "someone else". Having done projects like this, I find what you proposed massive and extremely difficult. But hey, I'm just a self-taught gerbil trainer. Prove me wrong! Knock this project out in your spare time! I hear python scripting is really easy. Someone with your background and your brains could probably do this in a weekend.
 
Just FYI, if you want to know your own average without doing the math, you can get that from the spreadsheet you can download from your control panel.

This is what I do. You can download the data file so it is automatically converted into an Excel spreadsheet, making it very easy to use.
 
I totally agree. I'm not sure exactly how the algorithm works to determine who gets on these lists, but it has very little to do with consistent success (views, stories, or high scores) within a particular category. I just published my first story in the Humor and Satire category a week ago, and I'm currently on the list of "Top" Humor and Satire authors, despite the fact my story is currently sitting with a 3.83 score and it's my only one in that category! It has something to do with total favorites or views or something, combined with newness on the list.
I am on one list, and have no clue how I got there.

How is popularity judged? If it is simply by followers, as an example, it would make sense that someone with 500 stories would have more followers than someone with far fewer stories. (The same should hold true with favorites.)

SamuelX is one of the more prolific writers here with 3,700 submissions and 1,958 followers. JustPlainBob has 851 submissions and 8,200 followers. Who would you judge as the more popular writer with readers?
 
If you go strictly by the definition of the word "popular" that's exactly how it would be weighed. From Dictionary.com:

"prevailing among the people generally"
By definition, JustPlainBob would be more popular than SamuelX. Does it mean that Bob is a better writer than Samuel? I have never read a work by either one, but just because a writer has more stories written, or more followers, or a higher vote average doesn't mean they are a better writer. Popularity is one of those fickle human traits that is determined primarily by personal taste.

Unless you are in pursuit of the top spot in the "most popular author" list (which is an okay aim if that's your gig) I don't think it matters much.

Comshaw
 
Note that the hubs give a different heading and different data for their "top author" listings. The old view calls this "most popular authors" and the new one calls it "top story authors." And the names and their ordering differ between the two at the same time. So, whatever the formula is, it's different between the two category hub looks. It's no longer called "most popular" on the new hub look, though.

(But back to the original snarky e-mail I got on this, my account name sr71plt is on both versions of the GM list today.)
 
This thread is just another reminder that this forum is far more concerned with numbers than writing.
 
Popularity is certainly fickle, and I have no desire for it.

I simply mentioned that the metric, as applied here, involves inconsistent data for proper measurement. There simply isn't enough available to make the determinations that the site implies with many of their "Top List" choices.

For example, should the "Most Productive" list be ranked on the number of stories a writer posts or the number of words written for the stories posted? Who's more productive, the writer with one-thousand 750-word stories or the writer with fifty 100,000 word stories? Who's more popular, the writer with fifty followers per story posted or the writer with nine followers per story posted, regardless of the number of stories counted?
 
And yet you've spent tens of thousands of hours with them, and you'll spend tens of thousands more until you join us in the underground sauna.:geek:
I feel like I’m watching season 12 of a show, without the benefit of having watched seasons 1 - 11.

Em
 
Another question is what counts as a story. Is it a complete standalone, with the combined chapters of a series counted as one story, or does each posted chapter get a full count as a story?

I have no idea what goes into the stats given other than noting that in the new hub views, authors are getting listed on the top author list who don't really write much in that category at all.
 
As I said, and will reiterate: if you go by the standard definition of “popular”, it means the most widely accepted author, therefore the one with the most followers or the highest average vote. This site has plenty of hard data to determine either of those. If you want to redefine “popular” into another meaning altogether, I can't address the question because I have no reference.

As far as your last paragraph addressing a “most productive” category, it all depends on who's setting the metric. It could be either of those you mentioned or a hybrid of the two.

Comshaw
 
I'm getting the impression I've trodden on toes, which was not my intention, and if I've said something that gave that vibe I beg pardon.

I cannot remember the last time I had a free weekend, even over the Christmas break; my freelance gig comes in cycles and right now is Busy Time, which is nice for the bank balance but hell on hobbies. My D&D players haven't had a game in months.

But if somebody's willing to do the scraping and some light data wrangling for me, I can give it a go. What I'd need would be something like a CSV of stories from English-Language Literotica, one per line, with something like the following data in machine-friendly format:

URL
Title (optional)
Category
Publication date
Author numeric ID (e.g. I'm 1374399)
Author name (optional)
Word count
Rating
Number of views
Number of votes, if available. (Not sure if this is publicly visible for stories outside the top list though? If not, I can use views as a proxy.)
Story/chapter position in series (i.e. total number of chapters published, and what number this one is)
URLs of previous and following chapters in series, if any
Cumulative series word count (i.e. sum of word counts for all chapters up to and including this one)
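
To make "machine-friendly" concrete, here is a sketch of how I might read such a file in Python. The column names are my own invention (nothing here is an official Literotica export format), the sample row is fake, and the optional title, author name and previous/next URLs are left out to keep it short:

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class StoryRow:
    url: str
    category: str
    published: date
    author_id: int
    word_count: int
    rating: float
    views: int
    votes: Optional[int]      # may not be publicly visible, so allow blanks
    chapter_number: int       # position of this chapter in its series
    chapters_total: int       # total chapters published in the series
    cumulative_words: int     # words in all chapters up to and including this one

def parse_row(row):
    """Convert one csv.DictReader row (all strings) into a typed StoryRow."""
    votes = (row.get("votes") or "").strip()
    return StoryRow(
        url=row["url"],
        category=row["category"],
        published=date.fromisoformat(row["published"]),
        author_id=int(row["author_id"]),
        word_count=int(row["word_count"]),
        rating=float(row["rating"]),
        views=int(row["views"]),
        votes=int(votes) if votes else None,
        chapter_number=int(row["chapter_number"]),
        chapters_total=int(row["chapters_total"]),
        cumulative_words=int(row["cumulative_words"]),
    )

def load_stories(handle):
    """Read every row from an open CSV file (or any text stream)."""
    return [parse_row(r) for r in csv.DictReader(handle)]

# Fake one-row file standing in for the real scrape.
sample = io.StringIO(
    "url,category,published,author_id,word_count,rating,views,votes,"
    "chapter_number,chapters_total,cumulative_words\n"
    "https://example.invalid/s/demo-ch-02,Romance,2020-01-15,1374399,"
    "5200,4.55,12000,,2,3,9800\n"
)
stories = load_stories(sample)
print(stories[0].author_id, stories[0].votes)
```

ISO dates (YYYY-MM-DD) and plain integers without thousands separators are what make the parsing this painless.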

The approach I've outlined wouldn't be using tags, number of favourites, or number of comments, but it might not hurt to grab those as well in case somebody wants to look at those later.

I'm not sure whether it'd be worth extending this to poems, as many don't get enough votes to support meaningful analysis.

Could I do that scraping/initial wrangling myself? Yes, I've written a Lit scraper before to satisfy myself that I could do it, and modified it to investigate a couple of things that interested me or friends. But it's not my strength and not something I enjoy enough to be doing for free, in search of numbers that I wouldn't be using for anything other than looking at and saying "hmm, that's interesting". I'm only volunteering myself for the bits that I'd have fun doing.
 