Regression To The Mean

So much BS.

If there were anyone here who wanted to engage in this discussion with honesty and good faith, I'd take the time to refute most things that the last several posters said. There are so many strawman arguments.

But there isn't anyone. I've wasted my breath on these subjects many times before and found a lack of honest desire to discuss them.

Most of these Lit cultists make one post where they express strong impressions and then refuse to reply when their posts are challenged. That's why I didn't bother posting in this thread until now when I was mentioned.

It's easy to call BS; it's harder to prove it. You've been calling things out but not proving them for the entire time you've been posting to this thread. Others have treated your arguments seriously, rather than accusing you, as you do others, of acting in bad faith and being "cultists," and when they confront you you don't respond, you just go back to the name-calling.

I've pointed to your story set, because you've been one of the loudest critics of the site. You have nothing to be angry about. In under three years of publishing, in categories that don't typically get a great deal of views, you've acquired 684 followers, you have a very high mean story score, and you get plenty of favorable comments. What is there to complain about? Where is the injustice? What is your grievance?

Consider your 4-chapter A New Beginning series, published between January and June 2024. The AVERAGE story score is around 4.84, which is extremely high. I can't see your vote totals, but extrapolating from your view totals I'd guess you're getting around 90 to 200 votes for each story. Which means they're right around the number where scores start to settle down and become more long-term predictable, and at that point you've achieved excellent scores. The comments are favorable; you obviously have enthusiastic fans. The evidence of your own story set, where the story scores have been given time to settle down to stable numbers, shows the standard pattern at work, and it shows that what you are doing is working and succeeding under the existing system.
 
To argue the counterpoint, the near-universal flattening to 4.85 is not a natural statistical phenomenon. You should see an upper tail drifting off. But almost every story in every category that dares creep above that line gets knocked back down. I have been paying more attention to N&N lately because I have had several of my own reaching 100 votes recently (or soon). There is currently one story in total above 4.85, but hundreds (and I suspect probably over 1,000) of stories at exactly 4.85. That is not a natural statistical phenomenon; it can only come about through manipulation. When I look at my novel, half of the stories came into maturity (100 votes) above 4.85; the other half were in the lower 4.8s. Guess which half now has better scores? Why does the regression only impact the stories with the audacity to cross that magical barrier? It doesn't, because this is obviously not regression to the mean. The number of votes for the two halves does not vary substantially, apart from a handful of 1s the upper half received. I have yet to hear a rational explanation for the minimally higher-scoring stories dropping more than the lower ones, beyond "we have a serious problem with trolls attacking the top lists."

To be fair, I do not think any part of my novel deserves to be on the all-time top list. Many other (better) stories before mine have taken the same abuse already.

As I said in another thread, the rating system, as it works, is not well suited to distinguishing the better stories from each other. But that's not what the rating system needs to do to serve its primary market: the readers.
 
As I said in another thread, the rating system, as it works, is not well suited to distinguishing the better stories from each other. But that's not what the rating system needs to do to serve its primary market: the readers.

I agree with this, and I always have. The problems are a) the scoring system isn't finely graded enough, with only five choices, and b) the red H distorts the system artificially by incentivizing the giving of 5s and disincentivizing the giving of less than 5, so you have undue compression of scores at the top. But authors don't want their red Hs taken away. You can't have it both ways. I would either get rid of the red H or convert it to a percentile system, where only stories in the top X percent of a category get a red H. As it exists right now, it's silly, because a red H in one category means something completely different from a red H in another category.
 
To argue the counterpoint, the near-universal flattening to 4.85 is not a natural statistical phenomenon. You should see an upper tail drifting off. But almost every story in every category that dares creep above that line gets knocked back down. I have been paying more attention to N&N lately because I have had several of my own reaching 100 votes recently (or soon). There is currently one story in total above 4.85, but hundreds (and I suspect probably over 1,000) of stories at exactly 4.85. That is not a natural statistical phenomenon; it can only come about through manipulation. When I look at my novel, half of the stories came into maturity (100 votes) above 4.85; the other half were in the lower 4.8s. Guess which half now has better scores? Why does the regression only impact the stories with the audacity to cross that magical barrier? It doesn't, because this is obviously not regression to the mean. The number of votes for the two halves does not vary substantially, apart from a handful of 1s the upper half received. I have yet to hear a rational explanation for the minimally higher-scoring stories dropping more than the lower ones, beyond "we have a serious problem with trolls attacking the top lists."

I'm not convinced. I'd like to see more rigorous mathematical analysis, which I'm, admittedly, not capable of doing.

Consider the three datasets I analyzed. The all-time highest Exhibitionist story has a score of 4.85. For Lesbian Sex it's 4.87. For Sci Fi it's 4.9. Each category seems to have its own peculiar degree of pickiness or taste.

In Novels and Novellas, the top story has a score of 4.88, .03 higher than any other story, but also an amazing total of over 10,000 votes. The next 4 stories, all at 4.85, have between 3700 and 8100 votes. With numbers that high it seems at least IMPROBABLE that the scores are the result of malicious downvoting and that there's an active presence holding scores at that ceiling. Maybe, but it seems improbable to me.

I checked Incest, which has huge view numbers. The top all time story has a score of 4.86. In Mature it's 4.87. They have very high vote totals. They're completely different categories with different reader groups, so it's unlikely that we're seeing the same malicious behavior patterns in all of these categories. What seems MORE likely to me is that this is statistical. It's hard for a story with thousands of votes to get much over 4.85 because even a stray 1 vote, something that is bound to happen to anybody, will hold it down. I think this is the logical explanation. But I admit I'm not enough of a mathematical expert to set up a proof for that.
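Just to put rough numbers on that (a back-of-the-envelope sketch, not a proof, and it assumes purely for illustration that every vote is either a 5 or a 1):

```python
def score_with_ones(total_votes, ones):
    """Average score if `ones` of the votes are 1s and the rest are 5s."""
    fives = total_votes - ones
    return (5 * fives + 1 * ones) / total_votes

# A single stray 1 barely moves a large vote total...
for n in (100, 1000, 10000):
    print(n, round(score_with_ones(n, 1), 4))   # 4.96, 4.996, 4.9996

# ...but a small, steady trickle of them caps the score.
# Solving 5*(1 - x) + 1*x = 4.85 gives x = 0.0375:
print(0.15 / 4)   # ~3.75% of votes being 1s holds an otherwise all-5s story at 4.85
```

So the ceiling argument really rests on the small but steady share of low votes that any widely read story accumulates, rather than on any single vote.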
 
I agree with this, and I always have. The problems are a) the scoring system isn't finely graded enough, with only five choices, and b) the red H distorts the system artificially by incentivizing the giving of 5s and disincentivizing the giving of less than 5, so you have undue compression of scores at the top. But authors don't want their red Hs taken away. You can't have it both ways. I would either get rid of the red H or convert it to a percentile system, where only stories in the top X percent of a category get a red H. As it exists right now, it's silly, because a red H in one category means something completely different from a red H in another category.
The ratings themselves above about 4.8 are all suspect as well because of the relentless 1-bombing of stories on the all-time list, at least until you reach a very high vote total.

Back to the original regression to mean proposition, I do believe there is a phenomenon that happens. An author who has even a modest following has a biased early sampling of readers, because the followers are notified of new stories. Although they represent a small percentage of total views (for most of us at least), they have already demonstrated an interest in interacting with stories by choosing to follow, so I will hypothesize that they are also more willing to vote. And because they are preselected readers who like your work, you are likely to have a positively biased early voting pattern. That is somewhat offset by the trolls who seem to want to 1-bomb every new story.

For roughly the first half to two-thirds of my stories, I had a small enough following (<100 followers) that this balanced out around 20 or 25 votes, and that score tended to hold reasonably well. I have gained more followers now (closing in on 400), and the rating seems to stabilize around 80 votes. If I quadruple my followers again, maybe it goes up to 200 or something. But if the story is good and exceeds that magical 4.85 (for most categories), that will swamp any other effect.
 
because of the relentless 1-bombing of stories on the all-time list, at least until you reach a very high vote total.

I'd like to see proof of this. I don't disbelieve in it, if others think they are seeing this, but I'd like to see, in some detail, what the proof is, rather than just hearing people make vague allegations that they "know" this is happening.

Proof would be this: Give an example of a specific story, with a link to it, and give us specific evidence of the pattern of voting and scores. Characterizations of what's happening are not sufficient. We need data.

As I've said, I've been doing this for nine years, I've published 64 stories, AND I've published exactly the kinds of stories that the trolls allegedly want to bomb: stories where wives have sex outside marriage and have fun and get away with it. And I've seen plenty of 1 votes. But I haven't seen the patterns of deliberate 1-bombing that people allege. So, I'd like to see what the evidence is.
 
To argue the counterpoint, the near-universal flattening to 4.85 is not a natural statistical phenomenon. You should see an upper tail drifting off. But almost every story in every category that dares creep above that line gets knocked back down. I have been paying more attention to N&N lately because I have had several of my own reaching 100 votes recently (or soon). There is currently one story in total above 4.85, but hundreds (and I suspect probably over 1,000) of stories at exactly 4.85. That is not a natural statistical phenomenon; it can only come about through manipulation.

Hmm, that does look peculiar.

Looking at the toplist history, that flattening on N+N seems to be quite a recent development. The most recent archive, from September 18, has a much more natural looking distribution.

Looking at one of the high-rated stories on that list: on September 18 it had an average of 4.87 from 5864 votes, and as of today it has 4.85 from 6063 votes. Doing the math, that suggests that in that time it received 199 votes with an average somewhere between 3.96 and 4.56. (I can't tell the exact number because of rounding in the scores - that "4.87" could be anywhere between 4.865 and 4.8749999).
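For anyone who wants to reproduce that range, here's a minimal sketch (the vote counts are the ones quoted above; the displayed scores are only known to within their rounding intervals):

```python
old_n, new_n = 5864, 6063
delta_n = new_n - old_n   # 199 new votes

def implied_new_avg(old_avg, new_avg):
    """Average of the votes added between the two snapshots."""
    return (new_n * new_avg - old_n * old_avg) / delta_n

# Displayed "4.87" means a true average in [4.865, 4.875);
# displayed "4.85" means a true average in [4.845, 4.855).
low = implied_new_avg(4.875, 4.845)    # old score as high as possible, new as low
high = implied_new_avg(4.865, 4.855)   # old score as low as possible, new as high
print(round(low, 2), round(high, 2))   # roughly 3.96 to 4.56
```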

For a story whose long-run average is around 4.85, a sample of 199 votes would have an expected average of 4.85 with a standard deviation of approximately 0.025. (Stats nerds: treating votes as independent, identically distributed, and effectively Bernoulli - i.e., a mix of only 4s and 5s.) The observed average, even at the top bound of 4.56, is about 11 standard deviations below that expected value. That's...a lot.
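The standard-deviation figure can be reproduced under that same assumption. A sketch - the 85/15 split of 5s and 4s is my illustration of the Bernoulli model, chosen to give a 4.85 mean, not data pulled from the site:

```python
import math

p_five = 0.85                                    # share of 5s; the rest are 4s
mean = 5 * p_five + 4 * (1 - p_five)             # 4.85
per_vote_sd = math.sqrt(p_five * (1 - p_five))   # gap between 4 and 5 is 1 star

n = 199
sample_sd = per_vote_sd / math.sqrt(n)
print(round(sample_sd, 3))         # ~0.025

observed = 4.56                    # most generous end of the implied range above
z = (mean - observed) / sample_sd
print(round(z, 1))                 # ~11.5 standard deviations below expectation
```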

It doesn't automatically mean foul play. There are other ways something like this could happen - for instance, as stories rise and fall in the top lists the audience may change. (Stats nerds: i.i.d. assumptions break due to time-dependent effects.) And "weird site bug" is always a possibility. But I think this one would be worth reporting to @Laurel if you haven't already done so.

It's very late here and I don't have time to pursue this further, but if somebody were to compile a time series of vote counts and scores for some of the top stories on that list (and likewise other suspect categories) it might be interesting to analyse.

I checked Incest, which has huge view numbers. The top all time story has a score of 4.86. In Mature it's 4.87. They have very high vote totals. They're completely different categories with different reader groups, so it's unlikely that we're seeing the same malicious behavior patterns in all of these categories. What seems MORE likely to me is that this is statistical. It's hard for a story with thousands of votes to get much over 4.85 because even a stray 1 vote, something that is bound to happen to anybody, will hold it down.

I had the same reaction initially, but after looking at the score patterns for N+N a couple of months ago vs. now, I do think @iwatchus is right that something peculiar is going on there.
 
I will give another example, this one not a top-list story but still a spotlighted one: my Nude Day Contest prize winner.

It won at 4.91 (after the last round of sweeps). There were approximately 240 votes on it at the time (likely something like 220 5s and 20 4s). Within a few hours of the announcement, it was at 4.77. That represents seven or eight 1s. I was at a family outing that weekend, so I could not watch the votes come in, but it had 20 more votes in the morning, still at 4.77.

Fast forward five months. It has roughly doubled its votes now (currently at 484) and sits at 4.85. What average did those new 224 votes need to have to bring the rating back up to 4.85? About 4.92.
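A minimal sketch of that arithmetic, using my rounding of the figures above (roughly 260 votes at 4.77 the morning after the drop, 484 votes at 4.85 now), so the exact output depends a little on those approximations:

```python
after_attack_n, after_attack_avg = 260, 4.77   # approximate, the morning after the drop
now_n, now_avg = 484, 4.85                     # five months later

new_votes = now_n - after_attack_n             # ~224
implied = (now_n * now_avg - after_attack_n * after_attack_avg) / new_votes
print(new_votes, round(implied, 2))            # ~224 votes averaging in the low 4.9s
```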

This is not a regression. This is 100% the result of one or more trolls throwing a hissy fit at the winners.
 
I had the same reaction initially, but after looking at the score patterns for N+N a couple of months ago vs. now, I do think @iwatchus is right that something peculiar is going on there.
I don't believe it's limited to NN. Like a week ago there were 23 stories above 4.85 on the all-time Romance toplist; today there are 16, I believe. With more than 25,000 stories submitted to the category, I don't think you'd expect a natural distribution where one story is 4.90, one is 4.89, two are 4.88, four are 4.87, eight are 4.86, and at least 234 are 4.85. And it's a lot more than that, because the tiebreaker for identical scores is the number of ratings. Story #250 has 836 ratings and it only takes 100 to qualify.

You'd also expect to see stories break into the all-time toplist with 100+ votes, but that doesn't happen in Romance. None of the stories above 4.85 have fewer than 1000 ratings. The 12-month toplist has nine stories at 4.9 or above (and mine's been downvoted over the last month out of that range, so at one point quite recently it was 10). You'd expect at least one of those to stick, but they don't.

@Duleigh's Stormwatch Ch 11 is nine ratings away from the 100 threshold at a 4.91 score. We'll see if it actually makes an appearance on the toplist.
 
I don't believe it's limited to NN. Like a week ago there were 23 stories above 4.85 on the all-time Romance toplist; today there are 16, I believe.
Yeah, I had a quick glance at other categories, and some of them looked hinky but I didn't have the time to go in-depth on them all.

FWIW I've pinged Laurel and Manu on the stuff I discussed in my previous post.
 
I will give another example, this one not a top-list story but still a spotlighted one: my Nude Day Contest prize winner.

It won at 4.91 (after the last round of sweeps). There were approximately 240 votes on it at the time (likely something like 220 5s and 20 4s). Within a few hours of the announcement, it was at 4.77. That represents seven or eight 1s. I was at a family outing that weekend, so I could not watch the votes come in, but it had 20 more votes in the morning, still at 4.77.

Fast forward five months. It has roughly doubled its votes now (currently at 484) and sits at 4.85. What average did those new 224 votes need to have to bring the rating back up to 4.85? About 4.92.

This is not a regression. This is 100% the result of one or more trolls throwing a hissy fit at the winners.
An alternative explanation would be that, after the announcement of the winners, your story gets a great deal more views from readers with reasonably elevated expectations: your story is a prize winner. They expect it to impress them. Some of them don't like it, or they don't think it's good enough to merit a contest prize. Some, whether or not we approve of the practice, drop some 1s on it. The rating dips.

I say that not as a judgment of your story, just to be clear. The same thing happened to me.
 
It's easy to call BS; it's harder to prove it. You've been calling things out but not proving them for the entire time you've been posting to this thread. Others have treated your arguments seriously, rather than accusing you, as you do others, of acting in bad faith and being "cultists," and when they confront you you don't respond, you just go back to the name-calling.

I've pointed to your story set, because you've been one of the loudest critics of the site. You have nothing to be angry about. In under three years of publishing, in categories that don't typically get a great deal of views, you've acquired 684 followers, you have a very high mean story score, and you get plenty of favorable comments. What is there to complain about? Where is the injustice? What is your grievance?

Consider your 4-chapter A New Beginning series, published between January and June 2024. The AVERAGE story score is around 4.84, which is extremely high. I can't see your vote totals, but extrapolating from your view totals I'd guess you're getting around 90 to 200 votes for each story. Which means they're right around the number where scores start to settle down and become more long-term predictable, and at that point you've achieved excellent scores. The comments are favorable; you obviously have enthusiastic fans. The evidence of your own story set, where the story scores have been given time to settle down to stable numbers, shows the standard pattern at work, and it shows that what you are doing is working and succeeding under the existing system.
First of all, I didn't make any posts in this thread until you called me out in your previous post. So I've no idea what you're on about when you reference posts of mine I supposedly haven't been backing up. The post where I called BS was my first post in this thread. Try to at least browse through the thread before you make claims.

Second of all, I've been through these topics with you before. Your MO is to make sweeping judgments based on your impressions "in the last nine years I've been here." You do that all the time. And then you disengage from further discussion. I've found that trying to engage you in a proper discussion is a waste of time.

But let's humor a typical strawman of yours, one you used several times as some kind of proof in similar threads.
Man, what in the world do I care whether my average score is above yours? What kind of benchmark do you think that is? Do you really think I care whether my rating is above yours? Or do you consider yourself so special that everyone who goes above your average should dance around the room in elation?

Once again, I've made no posts about ratings in this thread, nor did I intend to until you called me out. It's a waste of time to waste time, you know.
 
An alternative explanation would be that, after the announcement of the winners, your story gets a great deal more views from readers with reasonably elevated expectations: your story is a prize winner. They expect it to impress them. Some of them don't like it, or they don't think it's good enough to warrant a contest prize. Some, whether or not we approve of the practice, drop some 1s on it. The rating dips.

I say that not as a judgment of your story, just to be clear. The same thing happened to me.
Except then you'd expect that to show up in the long-term average, but the average rating SINCE that first attack is 4.92. It only dropped in those first few hours, involving a small number of votes (so lots of 1s).
 
An alternative explanation would be that, after the announcement of the winners, your story gets a great deal more views from readers with reasonably elevated expectations: your story is a prize winner. They expect it to impress them. Some of them don't like it, or they don't think it's good enough to warrant a contest prize. Some, whether or not we approve of the practice, drop some 1s on it. The rating dips.

I say that not as a judgment of your story, just to be clear. The same thing happened to me.
Maybe -- one thing that would suggest that's not the case is how much the rating went up afterward. There's a contest winner that was mid-4.7s through the contest, 4.94 after the final sweep, and is mid-4.7s now. I think your explanation makes sense for something like that. It actually did regress to its mean after the sweep improperly cleared legitimate ratings. But for something that goes from a steady x to x-0.2 and back to x, regression is working in the other direction.
 
Except then you'd expect that to show up in the long-term average, but the average rating SINCE that first attack is 4.92. It only dropped in those first few hours, involving a small number of votes (so lots of 1s).

Maybe -- one thing that would suggest that's not the case is how much the rating went up afterward. There's a contest winner that was mid-4.7s through the contest, 4.94 after the final sweep, and is mid-4.7s now. I think your explanation makes sense for something like that. It actually did regress to its mean after the sweep improperly cleared legitimate ratings. But for something that goes from a steady x to x-0.2 and back to x, regression is working in the other direction.

I think it's reasonable to expect an influx of votes after a contest announcement, irrespective of category, followers, etc. There are readers, it seems, that watch those contests, and the contest list page doesn't give you a lot of (or any) information on what story you're clicking on beyond the title. In the immediate wake of the contest announcement, readers who might not find your story by category or other means will click on it, find it's not for them, and fairly or not drop a 1 on their way out. Those votes are not representative of the overall trends your story will see based on the usual means of readers finding their way to it.
 
I do not discount the possibility of a 1 as a valid vote. But what is the more likely explanation, honestly?

A story gets a 4.91 rating (after sweeps, granted) over 240 votes. It gets a sub-2 average over 20 votes in a few-hour period, and then gets a rating of 4.92 over another 220 votes (no sweeps involved). Is the "true" rating more likely to be 4.85, or is it more likely that those 20 votes were tainted?
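Putting rough numbers on it (these are the approximate figures above; the sub-2 average for the suspect batch is my guess, for illustration only):

```python
before_n, before_avg = 240, 4.91   # through the sweeps
batch_n, batch_avg = 20, 1.8       # the suspect few-hour window (guessed average)
after_n, after_avg = 220, 4.92     # everything since

without_batch = (before_n * before_avg + after_n * after_avg) / (before_n + after_n)
with_batch = (before_n * before_avg + batch_n * batch_avg + after_n * after_avg) / (
    before_n + batch_n + after_n
)
print(round(without_batch, 2), round(with_batch, 2))   # ~4.91 vs ~4.79
```

Set those 20 votes aside and everything else points at roughly 4.91; include them and the whole-story average drops by more than a tenth of a point.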

Many of us watch the vote totals for our first 100+ votes pretty carefully and know the exact numbers of each 1-5 we get. Once it gets well over 100 it can be hard to be sure. Before I lost track, this story was at a 4.89 without the 1s.

Again, it's painfully obvious that people vote 1s for malicious purposes outside of contests. (I had 12 straight stories get a 1 at approximately 7:45 AM EST on the day they were released. It would take a huge amount of convincing for me to believe that every one of those votes was legitimate.) Why is it so hard to accept that this is true for contests and all-time lists?

To summarize the evidence:

* Stories that are slightly below making the all-time list do better after crossing the 100-vote mark than ones that are slightly above it. I have yet to hear even a minimal attempt at explaining that, other than trolls.

* Most of the categories now have a very unnatural distribution of scores at the high end. I have yet to hear an explanation other than trolls for that distribution.

* A single, reasonably well-documented occurrence of a story having its entire "regression" occur in a single few-hour window, after which the new scoring returns to its previously established level. I'm still waiting to hear an explanation for how that is regression to the mean.
 
I do not discount the possibility of a 1 as a valid vote. But what is the more likely explanation, honestly?
I think what crooked is saying isn't that there aren't trolls. When contest winners are announced, people who wouldn't ordinarily click on your story are going to be routed to it, because there's no information for them about what kind of story it is. Ratings are a bit like CinemaScore; they're largely a measure of how well you met audience expectations. When the audience can't calibrate their expectations because they can't see the category beforehand, they're more likely to leave 1s.

I think that explains some, a little piece, of what happens. I don't think it explains all of it by any means.
 
I'm sure there are trolls in the voting system. They exist everywhere else, why wouldn't they be there? But I also think there are as many factors influencing readers' votes as there are readers voting.

My point above isn't to discount trolls or to try to explain all 1 votes. But when we talk about trends and regression to the mean and so on, and then also about contest winners and top-list entries, i.e. stories that garner added attention from a wider pool of readers, I think a muddled narrative gets even more muddled.

It's possible a contest winner is getting targeted by an insecure little turd who now has you in their sights because you have what they want. Those 1 votes might be because people hate you, for whatever reason. They also might be because people are driven to your story -- a great upside to your work being spotlighted -- and they hate your story, for reasons maybe just as unreasonable. They hate it because nobody got pregnant or there isn't enough butt stuff or they hate third-person present. Or it just isn't their category, and they can't leave without taking their parting shot.

My point, really, is we have no idea why most people vote the way they do.
 
Also, I'm not sure we know how many people are leaving ratings. We know how many ratings there are, but if you're skeptical about how the toplists work, I think you also have to accept that there's some level of automated rating happening; otherwise you're into conspiracy territory.
 
A legitimate vote can score a one; we don't like that, but some people don't like what we write. Well, they don't like what I write. But 15 in a row, and you've got a troll.
 
Stepping back from individual cases and examples, regression to the mean is a valid concept, mostly. But not always.

At the risk of getting into a Pirsig-esque debate about undefinable Quality, let's agree that each work of art does indeed have some intrinsic value, merit, or worth. High school art class daubs have relatively little, whereas works by Michelangelo, Picasso, or Rubens are possessed of a great deal; they are simply better. If all art were to be compared together, judged merely on merit, the masters' works would naturally rest at one end of the scale and crayoned works taken from the refrigerator at the other.

If large numbers of people continue to judge this broad array of artistic effort, is it reasonable to think that the relative assessed values of paintings of big-eyed children on black velvet and Rembrandt's Night Watch will converge, regress to the mean? You'd have to work very hard to convince me of that.

Back to dodging Pirsig. I think it fair to say that each artistic work (even our stories here) has an intrinsic value or worth. To wave off declining scores of works here in Lit as entirely normal flies in the face of that.

I will of course acknowledge things like individual preferences and changing tastes over time, but I will maintain that there is more at play here than statistical inevitability. Indeed, the mere fact that the site runs sweeps is solid proof that there are irregularities needing correction. Am I losing sleep over it? Hardly. E pur si muove!
 
My story, Just a Friendly Drink, has maintained a 4.81 rating with 183 votes (where it froze a long time ago), but has picked up seven new votes over the past few weeks and dropped to a 4.74 rating. I don't think that represents any 1-bombing, just a few people who didn't give it a five-star review. I can live with that. After all, it's my highest-rated story here.
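A quick check of what those seven new votes must have averaged (a sketch that takes the displayed, rounded scores at face value):

```python
old_n, old_avg = 183, 4.81
new_n, new_avg = 190, 4.74

implied = (new_n * new_avg - old_n * old_avg) / (new_n - old_n)
print(round(implied, 1))   # ~2.9, consistent with a few middling votes rather than uniform fives
```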
 
For the sake of argument, let's assume that for every 10 votes, you get one 4. That means the highest possible score you can have is a 4.9.

Let's build on that assumption and assume that for every 50 votes, you're going to get one 2, from someone who genuinely doesn't like your story but doesn't want to bomb it with a 1, along with 45 5s and four 4s. Not a single 1. In that case the highest possible score you can get is a 4.86.
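Spelling that out in a couple of lines (the vote mixes are just the hypothetical ones above, not measured data):

```python
def avg(votes):
    """Average score for a dict mapping star value -> vote count."""
    return sum(star * count for star, count in votes.items()) / sum(votes.values())

per_10 = {5: 9, 4: 1}          # one 4 in every 10 votes, the rest 5s
per_50 = {5: 45, 4: 4, 2: 1}   # add one 2 in every 50 votes

print(round(avg(per_10), 2))   # 4.9
print(round(avg(per_50), 2))   # 4.86
```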

I don't know if these assumptions are true, but they seem on their face reasonable. I know of no reason to assume they are unreasonable. If that's the case, it's not hard to see how stories in long-term all-time scoring lists with many, many votes tend to top out no higher than the mid 4.8s. And you can get to that without any assumptions about people lying in wait ready to downvote stories that get above a certain level to hold them down. It may happen (and I believe it does happen in some cases), but there's no particular reason to believe that it is a substantial explanation of the long-term score patterns we see.

The Incest category, which has the highest number of views and votes, appears to bear this out. The highest rated story on the all time list has a score of 4.86. It has 17,575 votes. That's a robust figure. That would indicate that with enough votes 4.86 is an approximate ceiling on how high a score one can expect to get.

Given these numbers, while it's annoying that some people vote for malicious reasons, it's hard to see how it's a long-term problem that the site should feel any obligation to solve through a significant change to voter eligibility or to the vote-counting system. The site's attitude would appear to be, "This happens, and we try to take care of it through our sweep system, but in the long term it's not a big deal and it's not worth setting up a system that will systematically eliminate perfectly legitimate votes AND reduce the overall number of votes, making the system in some ways MORE prone to abuse, because a large vote base is precisely what helps insulate the score from manipulation and makes the score produced more reliable as a source of information for prospective readers."

That's reasonable. If you controlled the voting system you might do it differently, but I think it's hard to say it's a "broken" system.
 
I don't know if these assumptions are true, but they seem on their face reasonable. I know of no reason to assume they are unreasonable. If that's the case, it's not hard to see how stories in long-term all-time scoring lists with many, many votes tend to top out no higher than the mid 4.8s. And you can get to that without any assumptions about people lying in wait ready to downvote stories that get above a certain level to hold them down. It may happen (and I believe it does happen in some cases), but there's no particular reason to believe that it is a substantial explanation of the long-term score patterns we see.
I think if that were the case, we'd see a smoother distribution of both ratings (again, Romance has ~16 stories right now between 4.9 and 4.86 and an undeterminable number significantly greater than 200 at 4.85) and the number of ratings. Look at Erotic Couplings, where the top three stories have 700, 153, and 14,000 ratings. But in Romance, nothing in that group above 4.85 has fewer than 1,000 ratings, because everything that breaks cover and enters that space gets downrated straight out. And that's a category where the ceiling is demonstrably higher; the #1 story has 50% more ratings than the #1 in Incest (nearly 23,000) and is at 4.9.

Is it broken? Meh. Yeah, probably. Is it fixable? Meh, probably not. Online polls are notoriously easy to manipulate, and no one's ever managed to put one together that can't be screwed with, short of requiring verification. It just is what it is.
 