Counting your votes using the TLAR method

If you want, you can calculate a distribution that might describe your votes, as long as you don't have a low number of votes or any of the other things that cause patterns to deviate.
I don’t need to do that. As I explained, I know the individual votes on dozens of my stories. They are all J curves, which is precisely what you’d expect given a 1-5 rating system.

How did @8letters determine the actual votes cast given the numbers? Or did he fit them to a curve, like @Duleigh?

OK - I’ve read what you said again. The above is NOT an analysis of the actual votes; it’s an exercise in curve fitting. I’m talking about actual vote values captured one by one in low-traffic categories. That’s a totally different approach.
 
8letters published his method. As I recall, he tracked stories in the 30-day top list and backed out the individual results when he could. The data come mostly from slow categories where he could capture the vote. He broke the results down by category, and in every category except two with very few votes, the votes matched a geometric distribution with a small extra bump in 1* votes.

By "J curve" do you mean that 1* votes are higher than 2* votes, 3* votes are normally the lowest, and 4* and 5* votes are progressively higher?

If I saw that pattern on my stories, I would ask myself what I did to offend my readers. I had a similar pattern in the early voting on the fourth story I published here. It was because I put the story in the wrong category. I moved the story, and subsequent voting has been more normal.
 
The data come mostly from slow categories where he could capture the vote.
I’d like to hear @8letters explain that. It takes an anally retentive data geek like me all my effort to track just the views on one of my stories at a time. I don’t see how he could possibly track actual votes cast across a portfolio of stories (even if we consider just the low-traffic categories, we are talking about hundreds of stories and thousands of votes). But maybe he has a way to track individual votes that doesn’t involve any upfront assumptions about their distribution. I’ll wait to hear.
 
My basic question is this:

Are the stats presented above based on an aggregation of actual individual votes on actual individual stories, or are they an analysis based on aggregates and some assumptions about distributions (that may or may not be valid)?

All of my comments are about counting actual votes, not starting with aggregated numbers and trying to disaggregate them in what is hoped to be a reasonable manner based on assumptions.
 
Here's the link to his original post. He's counting actual votes, but he isn't doing it manually.

Join the crowd of anally retentive data geeks. I also count actual votes, to the extent that I can. They're monitored at intervals that vary with the traffic, from as short as 8 minutes to as long as 4 hours. The data are stored in an SQLite database. 8letters uses Access.

If I need to estimate a vote or disaggregate data, then I do it with probabilistic methods. The only distribution I assume is uniform (equal probability of all votes).

I started doing this with my fourth story on Lit because I felt like the votes were coming in all 1s or 5s, and I didn't understand why. It turns out that impression was incorrect. My methods have continued to evolve since then, and they're still evolving.
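For anyone who wants to try the same thing, here's a minimal sketch of the snapshot-logging side, assuming a single table of (story_id, timestamp, vote count, score). The schema and names are illustrative, not my exact setup:

```python
import sqlite3

# Minimal sketch of a vote-snapshot logger (illustrative, not my exact setup).
# Each row records a story's public vote count and displayed score at one
# moment; individual votes are recovered later by comparing consecutive rows.
conn = sqlite3.connect("votes.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS snapshots (
        story_id  TEXT    NOT NULL,
        taken_at  TEXT    NOT NULL,  -- ISO-8601 timestamp
        num_votes INTEGER NOT NULL,
        score     REAL    NOT NULL,  -- displayed mean, rounded to 2 decimals
        PRIMARY KEY (story_id, taken_at)
    )
""")

def record_snapshot(story_id: str, taken_at: str, num_votes: int, score: float) -> None:
    """Store one observation of a story's public vote count and score."""
    conn.execute(
        "INSERT OR IGNORE INTO snapshots VALUES (?, ?, ?, ?)",
        (story_id, taken_at, num_votes, score),
    )
    conn.commit()
```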
 
Read it. His method is predicated on single votes coming through between snapshots (his N and N+1). That’s a biased sample. He’s counting a subset of votes which may not share the characteristics of the overall population.

The results might be interesting, but it’s hard to call them robust.
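For anyone following along, the "backing out" itself is just arithmetic: if a story goes from N votes with displayed mean S_N to N+1 votes with mean S_{N+1}, the new vote is (N+1)·S_{N+1} − N·S_N. A sketch (the function name and example numbers are mine):

```python
def back_out_vote(n_votes: int, old_score: float, new_score: float) -> int:
    """Recover a single new vote from two consecutive snapshots, assuming
    exactly one vote arrived between them: n_votes votes at mean old_score,
    then n_votes + 1 votes at mean new_score."""
    # The new vote is the change in the (approximate) vote total. Displayed
    # scores are rounded to two decimals, so round to the nearest whole star.
    return round((n_votes + 1) * new_score - n_votes * old_score)

# Hypothetical example: 40 votes at 4.50, then 41 votes at 4.51.
print(back_out_vote(40, 4.50, 4.51))  # 4.91 -> the new vote was a 5

# Note: because displayed scores are rounded to two decimals, the recovered
# value is guaranteed exact only up to roughly 50 votes; beyond that it can
# become ambiguous, which is one reason low-traffic stories are easier to
# capture.
```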
 
He admits problems, but the problems are mostly in high-volume categories. There is a potential for bias, but that doesn't establish that a bias exists. This is especially true if your interest is in the distribution, which I've found to be a little hard to mess up.

If you look at his table, there are few votes in T/I because he wasn't able to catch them. In low-volume categories he probably caught a large proportion of the total votes, especially those cast after the first day. If first-day voting is distributed differently than voting after the first day, then the incomplete sampling could lean toward the distribution of data after the first day.

I can't dissect his data more than he already has, but I can look at my own.

This is the day one distribution for a story from earlier this year.

[Image: Day_One.png]

This is the distribution of votes after the first day.

[Image: After_Day_One.png]
Both are geometric distributions, though there is a small difference in the score.

This proves nothing, but it does demonstrate that missing early data doesn't necessarily change any conclusion about the distribution of the data. I'm confident that I could do this same breakdown on other stories with a sufficient number of votes, and the distributions would be the same.
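For anyone who wants to run the same check on their own numbers, here's one simple way to test star counts against a geometric distribution. This is a sketch, not the code behind the graphs above, and the counts in the example are made up:

```python
import numpy as np

def fit_geometric(counts):
    """Fit a truncated geometric model to star-vote counts.

    counts: observed votes as [c5, c4, c3, c2, c1] (5-star first).
    Model: P(j steps below 5 stars) is proportional to r**j, j = 0..4.
    Returns the fitted ratio r and the expected counts under the model.
    """
    counts = np.asarray(counts, dtype=float)
    j = np.arange(5)

    def log_likelihood(r):
        p = r ** j
        p = p / p.sum()
        return float((counts * np.log(p)).sum())

    # Coarse grid search for the maximum-likelihood ratio; simple and robust.
    r = max(np.linspace(0.01, 0.99, 99), key=log_likelihood)
    p = r ** j
    expected = counts.sum() * p / p.sum()
    return r, expected

# Hypothetical counts: 5* most common, decaying toward 1* with a small bump.
observed = [60, 21, 8, 3, 4]
r, expected = fit_geometric(observed)
print(f"fitted ratio r = {r:.2f}")
print("expected counts:", np.round(expected, 1))
```

If the observed counts track the expected ones everywhere except for an excess at 1*, that's the "geometric with a bump on 1" pattern.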
 
He is selecting a subset of votes - particularly those where votes are trickling in. Most votes occur in a rush over the first two days that a story is live (a little longer for comp stories). Those are excluded. I don’t have to prove this method is unsound; it’s obviously unsound - to be clear, it’s not a robust way to draw conclusions about anything bar the subset of votes actually captured.

It’s perfectly fine to say - look at this interesting piece of modelling I did. Representing it as actually counting votes is not reasonable. It’s not doing that in any meaningful way.

Please feel free to have the last word, I’ll talk to @8letters directly about this.
 
You've all read this, right?
https://www.literotica.com/s/how-to-analyze-your-scores

It allows you to work out the maximum and minimum number of votes of each value you received for a score of S, where N is the total number of votes:
Five-star Principle:

Let n denote the number of five-star votes awarded to a story. Then

a) n is no smaller than (S – 4.005) × N, and

b) n is at most N × (S – 0.995)/4.

One-star Principle:

Let n denote the number of one-star votes awarded to a story. Then

a) n is no smaller than (1.995 – S) × N, and

b) n is at most N × (5.005 – S)/4.

The rest of the article goes into the maths behind these formulas and gives worked examples.
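If you'd rather not push the numbers through a calculator, the two principles transcribe directly into code. A minimal sketch (the clamping to the 0..N range is mine, not the article's):

```python
import math

def five_star_bounds(score: float, n_votes: int) -> tuple[int, int]:
    """Min and max number of 5* votes per the Five-star Principle."""
    lo = max(0, math.ceil((score - 4.005) * n_votes))
    hi = min(n_votes, math.floor(n_votes * (score - 0.995) / 4))
    return lo, hi

def one_star_bounds(score: float, n_votes: int) -> tuple[int, int]:
    """Min and max number of 1* votes per the One-star Principle."""
    lo = max(0, math.ceil((1.995 - score) * n_votes))
    hi = min(n_votes, math.floor(n_votes * (5.005 - score) / 4))
    return lo, hi

# Example: a story at 4.71 with 17 votes.
print(five_star_bounds(4.71, 17))  # (12, 15)
print(one_star_bounds(4.71, 17))   # (0, 1)
```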
 
It would be nice to see specifically how many one-star bombs we've collected. The problem is, even if a math equation could take the number of votes and the 1-5 rating and come up with a solid answer for how many 1, 2, 3, 4, and 5 stars we got, that kind of math is way too confusing for me. So if anyone figures it out, hopefully you can explain how to do it with a simple calculator, because that would be awesome information to have.

Also, I constantly get flooded and then unflooded with votes. Someone will go through and 1-star all my stories, then after a couple of days the votes go away and the ratings bounce back to where they originally settled before the person did it. So I have to imagine the 'system', aka the website and the people running it, has the information on individual votes, so I don't understand why they couldn't share it with us.

That could be really valuable feedback/data for writers who want to know if their story is getting a lot of 3's and 4's, or a bunch of 5's with an occasional 1 rating.

[UPDATE!!!]

I went and asked ChatGPT for help finding out. It told me it couldn't figure it out exactly, but it could tell me how many 1 and 5 stars I could possibly have from the two pieces of information I could provide, and it could also calculate all the possible distributions. So I gave it the overall rating and the number of votes for my newest story, which is 4.71 stars with 17 votes.

It said the most 5 stars I could have is 15, and the other two votes would have to be a 4 star and a 1 star.

The most 1 stars I could have is 1, and the other sixteen votes would have to be a 4 star and fifteen 5 stars, which I guess is the same thing backwards.

Then it said there were 6 possible distributions, and it showed them to me in (1 star, 2 star, 3 star, 4 star, 5 star) format. Those were...

(0,0,0,5,12)
(0,0,1,3,13)
(0,0,2,1,14)
(0,1,0,2,14)
(0,1,1,0,15)
(1,0,0,1,15)

It's kind of crazy that 17 votes can only be arranged six different ways, but I don't have the math skill to confirm or disprove this answer.
 
I can confirm that there are only six possible combinations. With a high score like 4.71 there aren't a lot of options.
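Here's a brute-force check anyone can run, assuming the displayed rating is the true mean rounded to two decimals. At 17 votes the search space is tiny:

```python
from itertools import combinations_with_replacement

def possible_distributions(score: float, n_votes: int):
    """Enumerate every (1*, 2*, 3*, 4*, 5*) vote split consistent with a
    displayed score (mean rounded to 2 decimals) and a total vote count."""
    results = []
    # Each combination is one multiset of votes, so no deduplication needed.
    for votes in combinations_with_replacement(range(1, 6), n_votes):
        if round(sum(votes) / n_votes, 2) == score:
            results.append(tuple(votes.count(star) for star in range(1, 6)))
    return results

for split in possible_distributions(4.71, 17):
    print(split)  # prints the same six splits listed above, in another order
```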
 
He is selecting a subset of votes - particularly those where votes are trickling in. Most votes occur in a rush over the first two days that a story is live (a little longer for comp stories). Those are excluded. I don’t have to prove this method is unsound; it’s obviously unsound - to be clear, it’s not a robust way to draw conclusions about anything bar the subset of votes actually captured.

It's also assuming a set curve distribution. This curve cannot be assumed for any polarizing story, nor for any story in a potentially polarizing category. It also can't be assumed for any story with fewer than, say, 50-100 votes. This method looks hopelessly flawed if you ask me.
 
8letters' compilation makes no assumptions about curves. I added curves to the graphs to demonstrate that the data follow a geometric distribution.
 
Like a lot of what passes for discussion here, this has become pointless. I’m not going to bloody my head by stating the obvious ad infinitum (and tedium). People can draw their own conclusions based on the stated method and indeed the stated outcome.

I know @8letters has his detractors here (I’m not going to get into the history), but we’ve generally got on fine and he’s let me use his [other] datasets to cross-check my own analyses. I’ll have a chat with him when he’s next around; I suspect he claims less for his figures than others do.
 
Thank you, @NotWise, for mentioning me in this discussion.

Reading authors discuss voting on this forum, it sounded to me like people thought that the voting distribution was U-shaped, with 5 being the most common and 1 being the second-most common. In case you haven't noticed, there's constant discussion of 1-bombs on this forum. Looking at discernible votes, that's not the voting pattern at all. As @NotWise beautifully demonstrated, it's a geometric distribution with one-votes a little more common than a geometric distribution would predict. The rating on your story was much more likely knocked down by a 3-vote than a 1-vote. I.e., voters knocked down the rating on your story because they read it and weren't impressed.

Is my analysis perfect? No. But it's a hell of a lot better than anything else I've seen. If you've got something better, I'd love to see it.
 
Like a lot of what passes for discussion here, this has become pointless. I’m not going to bloody my head by stating the obvious ad infinitum (and tedium). People can draw their own conclusions based on the stated method and indeed the stated outcome.

I know @8letters has his detractors here (I’m not going to get into the history), but we’ve generally got on fine and he’s let me use his [other] datasets to cross-check my own analyses. I’ll have a chat with him when he’s next around; I suspect he claims less for his figures than others do.
I usually check the AH every weekday. There's just so little that's worth replying to.
 