Counting your votes using the TLAR method

If you want, you can calculate a distribution that might describe your votes, as long as you don't have too few votes or any of the other conditions that make patterns deviate.
I don’t need to do that. As I explained, I know the individual votes on dozens of my stories. They are all J curves, which is precisely what you’d expect given a 1-5 rating system.

How did @8letters determine the actual votes cast given the numbers? Or did he fit them to a curve, like @Duleigh?

OK - I’ve read what you said again. The above is NOT an analysis of the actual votes, it’s an exercise in curve fitting. I’m talking about actual vote values captured one by one in low traffic categories. That’s a totally different approach.
 
8letters published his method. As I recall, he tracked stories in the 30-day top list and backed out the individual results when he could. The data come mostly from slow categories where he could capture the vote. He broke the results down by category, and the votes in every category, except two with very few votes, matched the geometric distribution with a little extra bump in 1* votes.

By "J curve" do you mean that the 1* votes are higher than 2* votes, 3* votes are the normally the lowest, and 4* and 5* votes are progressively higher?

If I saw that pattern on my stories, I would ask myself what I did to offend my readers. I had a similar pattern in early voting of the fourth story I published here. It was because I put the story in the wrong category. I moved the story and subsequent voting has been more normal.
 
The data come mostly from slow categories where he could capture the vote.
I’d like to hear @8letters explain that. It takes an anally retentive data geek like me all my effort to track just the views on one of my stories at a time. I don’t see how he could possibly track actual votes cast across a portfolio of stories (even if we consider just the low-traffic categories, we are talking about hundreds of stories and thousands of votes). But maybe he has a way to track individual votes that doesn’t involve any upfront assumptions about their distribution. I’ll wait to hear.
 
My basic question is this:

Are the stats presented above based on an aggregation of actual individual votes on actual individual stories, or are they an analysis based on aggregates and some assumptions about distributions (that may or may not be valid)?

All of my comments are about counting actual votes, not starting with aggregated numbers and trying to disaggregate them in what is hoped to be a reasonable manner based on assumptions.
 
Here's the link to his original post. He's counting actual votes, but he isn't doing it manually.

Join the crowd of anally retentive data geeks. I also count actual votes, to the extent that I can. They're monitored at intervals that vary with the traffic, as short as 8 minutes and as long as 4 hours. The data are stored in an SQLite database. 8letters uses Access.

If I need to estimate a vote or disaggregate data, then I do it with probabilistic methods. The only distribution I assume is uniform (equal probability for all vote values).

I started doing this with my fourth story on Lit because I felt like the votes were coming in all 1s or 5s, and I didn't understand why. It turns out that impression was incorrect. My methods have continued to evolve since then, and they're still evolving.
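For anyone curious what that looks like in practice, here's a minimal sketch of the disaggregation step under the uniform assumption (my illustration, not NotWise's actual code; the snapshot numbers in the example are invented). When more than one vote lands between snapshots, the change in displayed score pins down the sum of the new votes, and every combination with that sum is treated as equally likely.

```python
from itertools import combinations_with_replacement

def possible_vote_sets(n1, s1, n2, s2):
    """All combinations of new votes that could explain the change
    between two snapshots: (n1 votes, score s1) -> (n2 votes, score s2).
    Displayed scores are rounded to two decimals, so each true total
    is only known to within +/- 0.005 * votes."""
    k = n2 - n1  # votes cast between the snapshots
    lo = (s2 - 0.005) * n2 - (s1 + 0.005) * n1  # smallest possible sum
    hi = (s2 + 0.005) * n2 - (s1 - 0.005) * n1  # largest possible sum
    return [combo for combo in combinations_with_replacement(range(1, 6), k)
            if lo <= sum(combo) <= hi]

# Two votes arrived between snapshots; which pairs fit?
print(possible_vote_sets(50, 4.38, 52, 4.38))  # -> [(4, 5)]
```

If several combinations fit, each is assigned equal probability; that is the only distributional assumption in play.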
 
Read it. His method is predicated on single votes coming through between snapshots (his N and N+1). That’s a biased sample. He’s counting a subset of votes which may not share the characteristics of the overall population.

The results might be interesting, but it’s hard to call them robust.
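For reference, the single-vote back-out being discussed here is at least exact: if precisely one vote arrives between snapshot N and snapshot N+1, the new total minus the old total is the vote. A minimal sketch (mine, not 8letters' code), allowing for the two-decimal rounding of the displayed score:

```python
def backed_out_vote(n, s_before, s_after):
    """Recover a single vote from two consecutive displayed scores.
    n is the vote count before the new vote; each displayed score is
    rounded to two decimals, so each true total has a rounding interval."""
    lo = (s_after - 0.005) * (n + 1) - (s_before + 0.005) * n
    hi = (s_after + 0.005) * (n + 1) - (s_before - 0.005) * n
    return [v for v in range(1, 6) if lo <= v <= hi]

print(backed_out_vote(40, 4.50, 4.51))  # -> [5]
```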
 
He admits problems, but the problems are mostly in high-volume categories. There is a potential for bias, but that doesn't establish that a bias exists. This is especially true if your interest is in the distribution, which I've found to be a little hard to mess up.

If you look at his table, there are few votes in T/I because he wasn't able to catch them. In low-volume categories he probably caught a large proportion of the total votes, especially those cast after the first day. If first-day voting is distributed differently than voting after the first day, then the incomplete sampling could lean toward the distribution of data after the first day.

I can't dissect his data more than he already has, but I can look at my own.

This is the day one distribution for a story from earlier this year.

[Image: Day_One.png]

This is the distribution of votes after the first day.

[Image: After_Day_One.png]
Both are geometric distributions, though there is a small difference in the score.

This proves nothing, but it does demonstrate that missing early data doesn't necessarily change any conclusion about the distribution of the data. I'm confident that I could do this same breakdown on other stories with a sufficient number of votes, and the distributions would be the same.
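For anyone who wants to run the same check on their own numbers, here's a minimal sketch of the comparison behind those charts (the observed counts below are invented for illustration): fit a truncated geometric distribution to the star counts and compare.

```python
# Sketch: compare observed star counts to a truncated geometric
# distribution. The counts here are invented for illustration.
observed = {5: 120, 4: 45, 3: 18, 2: 7, 1: 10}  # note the 1* bump
n = sum(observed.values())

def expected(p):
    """Truncated geometric: k steps down from 5*, each further step
    taken with probability (1 - p)."""
    weights = {5 - k: p * (1 - p) ** k for k in range(5)}
    scale = n / sum(weights.values())
    return {star: w * scale for star, w in weights.items()}

# Crude fit: grid-search p to minimize squared error.
best_p = min((x / 1000 for x in range(1, 1000)),
             key=lambda p: sum((expected(p)[s] - observed[s]) ** 2
                               for s in observed))
for star in (5, 4, 3, 2, 1):
    print(star, observed[star], round(expected(best_p)[star], 1))
```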
 
He is selecting a subset of votes - and particularly those where votes are trickling in. Most votes occur in a rush over the first two days that a story is live (a little longer for comp stories). Those are excluded. I don’t have to prove this method is unsound; it’s obviously unsound. To be clear, it’s not a robust way to draw conclusions on anything bar the subset of votes actually captured.

It’s perfectly fine to say - look at this interesting piece of modelling I did. Representing it as actually counting votes is not reasonable. It’s not doing that in any meaningful way.

Please feel free to have the last word, I’ll talk to @8letters directly about this.
 
You've all read this, right?
https://www.literotica.com/s/how-to-analyze-your-scores

It allows you to work out the maximum and minimum number of 5* and 1* votes you received for a score of S, where N is the total number of votes:
Five-star Principle:

Let n denote the number of five-star votes awarded to a story. Then

a) n is no smaller than (S – 4.005) × N, and

b) n is at most N × (S – 0.995)/4.

One-star Principle:

Let n denote the number of one-star votes awarded to a story. Then

a) n is no smaller than (1.995 – S) × N, and

b) n is at most N × (5.005 – S)/4.

The rest of the article goes into the Maths behind these formulas and gives worked examples.
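Those bounds are easy to evaluate with a few lines of code; a quick sketch that takes the formulas above at face value (the 0.005 terms allow for the score being displayed rounded to two decimals):

```python
import math

def five_star_bounds(score, n):
    """Five-star Principle: min and max possible number of 5* votes."""
    low = max(0, math.ceil((score - 4.005) * n))
    high = min(n, math.floor(n * (score - 0.995) / 4))
    return low, high

def one_star_bounds(score, n):
    """One-star Principle: min and max possible number of 1* votes."""
    low = max(0, math.ceil((1.995 - score) * n))
    high = min(n, math.floor(n * (5.005 - score) / 4))
    return low, high

print(five_star_bounds(4.71, 17))  # -> (12, 15)
print(one_star_bounds(4.71, 17))   # -> (0, 1)
```

Those example values match the 17-vote story discussed below: between 12 and 15 five-star votes, and at most one 1-star vote.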
 
It would be nice to see specifically how many one-star bombs we've collected. The problem is, even if a math equation could take the number of votes and the 1-5 rating and come up with a solid answer for how many 1, 2, 3, 4, and 5 stars we got, that kind of math is way too confusing for me. So if anyone figures it out, hopefully you can explain how to do it with a simple calculator, because that would be awesome information to have.

Also, I constantly get flooded and then unflooded with votes. Someone will go through and 1-star all my stories, then after a couple of days, the votes go away and the ratings bounce back to where they originally settled before the person did it. So I have to imagine the 'system', aka the website and the people running it, has the individual vote information, so I don't understand why they couldn't share it with us.

That could be really valuable feedback/data for writers who want to know if their story is getting a lot of 3's and 4's, or a bunch of 5's with an occasional 1 rating.

[UPDATE!!!]

I went and asked ChatGPT for help finding out. It told me it can't figure it out exactly, but it could tell me how many possible 1 and 5 stars I had with the two points of information I could provide it, and it can also calculate all the possible distributions. So I gave it the overall rating and the number of votes of my newest story, which is 4.71 stars with 17 votes.

It said the most 5-star votes I could have is 15, and the other two votes would have to be a 4-star and a 1-star.

The most 1-star votes I could have is 1, and the other 16 votes would have to be a 4-star and fifteen 5-stars, which I guess is the same thing backwards.

Then it said there were 6 possible distributions, and it showed them to me in (1-star, 2-star, 3-star, 4-star, 5-star) format. And those were...

(0,0,0,5,12)
(0,0,1,3,13)
(0,0,2,1,14)
(0,1,0,2,14)
(0,1,1,0,15)
(1,0,0,1,15)

It's kind of crazy that 17 votes averaging 4.71 can only be arranged six different ways, but I don't have the math skill to confirm or disprove this answer.
 
I can confirm that there are only six possible combinations. With a high score like 4.71 there aren't a lot of options.
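That's easy to reproduce by brute force; a minimal sketch that enumerates every (1*, 2*, 3*, 4*, 5*) split consistent with a displayed two-decimal score:

```python
def distributions(score, n):
    """All (n1, n2, n3, n4, n5) splits of n votes whose average
    rounds to the displayed two-decimal score."""
    found = []
    for n1 in range(n + 1):
        for n2 in range(n + 1 - n1):
            for n3 in range(n + 1 - n1 - n2):
                for n4 in range(n + 1 - n1 - n2 - n3):
                    n5 = n - n1 - n2 - n3 - n4
                    total = n1 + 2 * n2 + 3 * n3 + 4 * n4 + 5 * n5
                    if abs(total / n - score) <= 0.005:
                        found.append((n1, n2, n3, n4, n5))
    return found

for d in distributions(4.71, 17):
    print(d)  # prints exactly the six tuples listed above
```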
 
He is selecting a subset of votes - and particularly those where votes are trickling in. Most votes occur in a rush over the first two days that a story is live (a little longer for comp stories). Those are excluded. I don’t have to prove this method is unsound; it’s obviously unsound. To be clear, it’s not a robust way to draw conclusions on anything bar the subset of votes actually captured.

It's also assuming a set curve distribution. This curve can't be assumed for any polarizing story, nor for any story in a potentially polarizing category. It also can't be assumed for any story with fewer than, say, 50-100 votes. This method looks hopelessly flawed if you ask me.
 
8letters' compilation makes no assumptions about curves. I added curves to the graphs to demonstrate that the data follow a geometric distribution.
 
Like a lot of what passes for discussion here, this has become pointless. I’m not going to bloody my head by stating the obvious ad infinitum (and tedium). People can draw their own conclusions based on the stated method and indeed the stated outcome.

I know @8letters has his detractors here (I’m not going to get into the history) but we’ve generally got on fine and he’s let me use his [other] datasets to cross-check my own analyses. I’ll have a chat with him when he’s next around and suspect he is claiming less for his figures than others.
 
Thank you, @NotWise, for mentioning me in this discussion.

Reading authors discuss voting on this forum, it sounded to me like people thought that the voting distribution was U-shaped, with 5 being the most common and 1 being the second-most common. In case you haven't noticed, there's constant discussion of 1-bombs on this forum. Looking at discernible votes, that's not the voting pattern at all. As @NotWise beautifully demonstrated, it's a geometric distribution with one-votes a little more common than a geometric distribution would predict. The rating on your story was much more likely knocked down by a 3-vote than a 1-vote. I.e., voters knocked down the rating on your story because they read it and weren't impressed.

Is my analysis perfect? No. But it's a hell of a lot better than anything else I've seen. If you've got something better, I'd love to see it.
 
Like a lot of what passes for discussion here, this has become pointless. I’m not going to bloody my head by stating the obvious ad infinitum (and tedium). People can draw their own conclusions based on the stated method and indeed the stated outcome.

I know @8letters has his detractors here (I’m not going to get into the history) but we’ve generally got on fine and he’s let me use his [other] datasets to cross-check my own analyses. I’ll have a chat with him when he’s next around and suspect he is claiming less for his figures than others.
I usually check the AH every weekday. There's just so little that's worth replying to.
 
8letters' compilation makes no assumptions about curves. I added curves to the graphs to demonstrate that the data follow a geometric distribution.

I was talking about the OP. His formulas are a hard-coded curve. He's using a curve to figure out the vote distribution. He has it backwards: we need the vote distribution to calculate the curve. The endeavor defeats its own purpose. Why do we want to know the vote distribution? So that we can see how we got our score.

Here are some examples to compare.

4,4,4,4,4,4,4,4,4,4,4,4,4,4,4 ~ total = 60 / 15 votes = 4.00 (even distribution curve)
5,5,5,5,5,5,5,4,4,4,4,3,3,2,1 ~ total = 60 / 15 votes = 4.00 (typical lit distribution, say)
5,5,5,5,5,5,5,5,5,5,5,2,1,1,1 ~ total = 60 / 15 votes = 4.00 (bipolar distribution curve)

The whole point of being able to see the vote distro is to figure out if we genuinely scored a 4.00 or if we actually may have scored better but were dragged down by bombs, see? The OP's math simply assumes that we have a typical curve that all stories adhere to. So all the math does is tell us that our scores are typically distributed, without actually analyzing the distro at all. Instead of telling us how the votes were distributed, it tells us how they would look if they were distributed on that arbitrarily assumed hard-coded curve. It could not be more flawed.

Sorry to say but this math is completely useless. Actually worse than useless. It's misleading.
 
Like a lot of what passes for discussion here, this has become pointless. I’m not going to bloody my head by stating the obvious ad infinitum (and tedium). People can draw their own conclusions based on the stated method and indeed the stated outcome.

I see some value here. Sorry if you don't. Since I first realized that there was a consistent pattern to the distribution of the readers' votes I've asked myself what it means. Learning what it means is the value in knowing the distribution.

This is philosophy, but some might be able to draw from it to help their own work.

Here are two things I've learned from a variety of sources and experiences at Lit, including the voting results:

-> Very few readers are critics. Their response doesn't mean your writing is good or bad. They're telling you how much they enjoyed the story, and very little more.

-> Readers who vote are mostly positive and supportive. Their comments don't always appear that way, but the median score on Lit stories is close to 4.5 on a scale from 1 to 5. The readers are so supportive that they rate half of all the stories in the top 1/8th of the scale. Complain about the voters? Not me.

In addition, the typical distribution is similar to a geometric distribution, and the geometric distribution carries some implications.

-> The geometric distribution describes a serial process. Voting looks like a serial thought process where voters typically start at 5* and consider lower scores in succession until they make a choice. Basically, a 5* vote is yours to lose.

-> The odds that a voter will settle on the score they're currently considering are the same at every step, regardless of the value. It's the order in their thought process that makes the difference. This may not be true for individual voters, but it seems to describe voters as a group. Their decision isn't so much about how they score a story as it is about whether they will score the story at all. The more likely readers are to vote, the higher the score will be.

We know from @8letters' compilation that there is a small but fairly consistent bump in 1* votes. It's a small bump, but it's a consistent deviation from the geometric distribution. One possible explanation is that Lit has a small population of voters whose process starts at 1* instead of 5*, and they work their way up the scale instead of down it. Another possibility is that the bump is caused by punitive voting, where the voter is trying to punish the writer for something in the story they find offensive. There could be other reasons.
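That serial model is easy to simulate. In the sketch below, the stop probability and the share of bottom-up voters are invented numbers, purely to show that the mechanism produces a geometric-looking curve with a 1* bump:

```python
import random
from collections import Counter

def vote(p_stop=0.7, p_bottom_up=0.05):
    """One simulated voter. Most start at 5* and work down, settling
    at each step with probability p_stop; a small share start at 1*
    and work up, which produces the bump in 1* votes."""
    top_down = random.random() >= p_bottom_up
    order = (5, 4, 3, 2, 1) if top_down else (1, 2, 3, 4, 5)
    for star in order[:-1]:
        if random.random() < p_stop:
            return star
    return order[-1]

counts = Counter(vote() for _ in range(10_000))
for star in (5, 4, 3, 2, 1):
    print(star, counts[star])
```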

There's additional value for people who know the distribution of their results. If their distribution differs from the typical distribution then they can ask themselves why it differs, and maybe discover something about their own writing.

As to the method in @Duleigh's OP, it's an estimate that's probably intended more for visual effect than anything else. It's there to give a writer who has no other data some sense of how the distribution of votes might look. It starts with a basic distribution (that happens to look a lot like a geometric distribution) with a mean score of 4.55, and it adjusts the count for each possible score up or down to account for scores different from 4.55.

It's the coefficients used for that adjustment that are the magic in the process. From the "Baysian" (presumably Bayesian) in the name of the process, those values probably came from some external source, typically the opinion of one or more experts, possibly combined with further analysis.

The results probably aren't bad for scores near 4.55, but they wander off at higher or lower scores.
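Purely to illustrate the kind of estimator being described, here's a sketch. Everything in it is guesswork: the baseline shares and the adjustment coefficient are invented stand-ins, not Duleigh's actual values.

```python
# Hypothetical "baseline curve plus adjustment" estimator. The
# baseline shares below are invented; they average exactly 4.55.
BASELINE = {5: 0.71, 4: 0.19, 3: 0.06, 2: 0.02, 1: 0.02}

def estimated_counts(score, n, k=0.25):
    """Shift weight toward (or away from) the extremes in proportion
    to how far the score sits from the baseline mean of 4.55. The
    coefficient k stands in for the method's 'magic' values."""
    delta = score - 4.55
    weights = {s: max(0.0, BASELINE[s] + k * delta * (s - 3))
               for s in BASELINE}
    scale = n / sum(weights.values())
    return {s: round(w * scale) for s, w in weights.items()}

print(estimated_counts(4.71, 17))
```

As the post says, an estimate built this way behaves reasonably near 4.55 and drifts at higher or lower scores.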
 