Statistical analysis of voting

MathGirl (Cogito; joined Aug 4, 2002; 5,825 posts)
Here's something that should send most everyone to sleep:

In most scientific disciplines, a difference between two sets of data is considered "statistically significant" if, assuming the two are really the same, there would be less than a 5% chance of seeing a difference that large (p<0.05). In many cases, results are not considered "significantly different" unless that chance is much smaller (e.g. p<0.01). These probabilities are calculated mathematically from the data.

The statistical significance depends on how far apart the two averages are, how spread out the individual data points are, and how many data points there are. I decided to see what the statistical significance of voting at Lit.com might be.
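One standard way to see how those three ingredients combine is the large-sample two-sample statistic below. The post doesn't say which tests were used, so treat this as one common choice rather than MathGirl's own calculation. The numerator is the gap between the two averages, s1 and s2 are the standard deviations of each story's individual votes, and n1 and n2 are the vote counts; the larger |z| is, the smaller the p-value.

```latex
z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```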

One of my dirty stories is in the top 500 (barely) with a cumulative mean vote of 4.74. There are several others with cumulative scores of 4.60 and higher. What I wanted to know was: "Is there any valid, significant difference between the score of the story that made the top 500 and one which didn't?"

Question: What is the likelihood that a story which had a mean vote score of 4.75 actually scored higher than a story which averaged 4.60? I assumed 50 votes for each story. Computational materials used included a #2 pencil, the back of a crossword puzzle, and a blonde head.

I analyzed the data using two of the simplest and most commonly used statistical tests.

I couldn't come up with exact numbers because of missing information: e.g. I didn't know the individual votes (how many 3s, 4s, 5s). I could, though, come up with an interesting conclusion, and both tests gave comparable results:

"There is no significant difference between the scores of 4.75 and 4.60 based on 50 votes for each" (p>0.05)

This means that, if an infinite number of readers cast an infinite number of votes, there is a greater than 5% chance that the voting averages would be the same. In fact, the chance appears to be in the 10-15% range.
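A minimal Python sketch of one such calculation follows. It is a large-sample two-sample z-test on summary statistics, a swapped-in choice since the post doesn't name its two tests, and the standard deviations (0.45 and 0.50) are assumptions because the individual votes are unknown. With those assumed spreads, the p-value comes out at roughly 0.11. Strictly, p is the probability of seeing a gap of 0.15 or more if the two stories were really rated the same, not the probability that they are the same.

```python
import math


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (z, two-sided p) for the difference between two sample means.

    Uses the large-sample normal approximation, which is reasonable for
    about 50 votes per story.
    """
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    z = (mean1 - mean2) / se
    # Two-sided tail probability: chance of |Z| >= |z| if there is no real difference.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Means and vote counts come from the post; the standard deviations are
# HYPOTHETICAL, since the individual votes (how many 3s, 4s, 5s) are unknown.
z, p = two_sample_z_test(4.75, 0.45, 50, 4.60, 0.50, 50)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
print("significant at p < 0.05" if p < 0.05 else "not significant at p < 0.05")
```

Shrinking the assumed spreads, or raising the vote counts, makes the same 0.15 gap more significant.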

In other words, on 50 votes each, there is no meaningful difference between average scores of 4.75 and 4.60.

What does all that mean? Nothing, really.

MG
 
MG, you have way too much time on your hands to have sat down and figured that out. I used to try to figure it out by checking the scores every few hours and working from the very beginning to determine the individual votes. It drove me mad, and I finally gave up, because if you miss it when two or three votes are cast, it's a total waste unless they are fives.

HEHE I once upon a time had a story that had a 5.0 ranking after 17 votes for sure. That one was easy to calculate lol.
 
I think this is perfectly natural for her. She is the math girl.

Sometimes I think she can read the stories in their base binary form. That would be so cool.
 
There is nothing in life quite so fascinating as a fanatic.
 
That is simply a terrifying thought, KM. Hehe. Just realized I am sandwiched between two really intelligent women. :D Lucky me.
 
MathGirl said:
This means that, if an infinite number of readers cast an infinite number of votes, there is a greater than 5% chance that the voting averages would be the same. In fact, the chance appears to be in the 10-15% range.

I hope you keep stats out of your dissertation, dahhling... Or statisticians out of your defense... One or the other... :devil:
hs
 
The ratio of votes to views is what I look at. People tend to vote for a better story and not vote at all on a poor one. Thus if you have 1 vote per 200 views, you have a hit. If you are getting 1 vote per 1000 views, you have a dud. So if your 4.75 story has a ratio of 1 vote per 200 views and your 4.60 story has a ratio of 1 vote per 500 views, you have a significant difference, even after you do the math, because of the human element involved.
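One way to "do the math" on vote-per-view ratios is a two-proportion z-test. This sketch is an assumption-laden illustration, not the poster's method: it treats every view as an independent chance to vote (repeat visits and readers who bail out early violate that), and it assumes 50 votes per story, which at 1 vote per 200 and 1 per 500 views means 10,000 and 25,000 views. Under those assumptions the gap in rates is far beyond chance (z of about 4.75).

```python
import math


def two_proportion_z_test(votes1, views1, votes2, views2):
    """Two-sided z-test for a difference between two vote-per-view rates.

    Assumes each view is an independent chance to cast a vote.
    """
    p1, p2 = votes1 / views1, votes2 / views2
    # Common rate we would expect if both stories drew votes equally often.
    pooled = (votes1 + votes2) / (views1 + views2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views1 + 1 / views2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# HYPOTHETICAL counts: 50 votes each, at 1 vote per 200 views and 1 per 500 views.
z_views, p_views = two_proportion_z_test(50, 10_000, 50, 25_000)
print(f"z = {z_views:.2f}, two-sided p = {p_views:.2g}")
```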

It's like bluefish. As the price goes down, the amount they sell goes up. However, when it gets below $2.00 a pound, people stop buying it because it is considered too cheap. You have to somehow gauge the human element. I use the voter ratio and feedback.
 
KillerMuffin said:
Sometimes I think she can read the stories in their base binary form. That would be so cool.
Lots of people can read ASCII from binary, octal or hex, with or without parity. Some of us can still make a fair fist of Gray code from five-hole paper tape.

Why do programmers always confuse Hallowe'en and Christmas Day? (MG, please don't answer because you would obviously know.)
 
Un-registered said:
Why do programmers always confuse Hallowe'en and Christmas Day? (MG, please don't answer because you would obviously know.)

Actually, I have no idea. I don't know a thing about computer programming. My approach to computers is from the opposite end (hardware). If you call something you can't even see "hardware."
MG
 
Master_Vassago said:
MG you have way too much time on your hands to have sat down and figured that out. .

Dear MV,
It took about five minutes. Much less time than it took to figure out how to post in here in an understandable manner.
MG
 
Dunno. It's got nothing to do with binary, that's for sure (says he who's just worked through the binary of 31/10 and 25/12). Put us out of our misery.

The Earl
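For anyone still in misery: the usual punchline is that OCT 31 = DEC 25, because 31 read as an octal (base-8) number is 3 × 8 + 1 = 25 in decimal. A quick check:

```python
# Read "31" as an octal (base-8) number and "25" as an ordinary decimal one.
oct_31 = int("31", 8)   # 3 * 8 + 1
dec_25 = int("25", 10)
print(oct_31, dec_25, oct_31 == dec_25)
```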
 
KillerMuffin said:
I think this is perfectly natural for her. She is the math girl.

Sometimes I think she can read the stories in their base binary form. That would be so cool.

Ones and Zeroes, Pegs and Holes, Male and Female, Works for me!!!! :D
 
Sheeeeeshhhhh

TheEarl said:
Put us out of our misery.

What I meant was: It took longer to write the explanation of my statistical analysis than it did to do it.
MG
 
What's the saying again, you can quote statistics to prove anything? I find statistics fascinating. True, I only got a B in it because one of the members on my team was an extremely hot woman who deliberately wore miniskirts to excite our professor, but I appreciate their use anyway.

But I don't see how you can say that there's "no significant difference between the scores of 4.75 and 4.60 based on 50 votes for each" when, well, you can actually measure the scores. The votes aren't random numbers; they're actually based on the quality of the story. A story with a score of 4.75 received more good votes than a story that scored 4.60.

And if you say that an infinite number of people cast an infinite number of votes...well, let's get rid of "infinite" and say, oh, 6 billion, which is a big enough number to be infinite for all intents and purposes. If every human being on the planet (6 billion) voted on both "Hamlet", by William Shakespeare, and "Those Autofellatio Blues", by, ahem, me, I would think that the greatest work in literature would score better than my disgusting tale, because it IS better.

True, my story might do better in a more limited sample, like people who get turned on reading about guys sucking themselves off, but I think that's a pretty small part of the overall population. Get a statistically significant sample, and I lose to Shakespeare. And that's OK; I've come to terms with that.
 
MathGirl said:
... and a blonde head. ...


This variable can't be statistically explained. At least that's what my statistics professor told me when I complained about my final grade. :confused:


Pookie :rose:
 
christo said:
What's the saying again, you can quote statistics to prove anything?

Dear Christo,
Your course in stats should have also taught you that the above comment is completely false, mathematically.
MG
 
MG:


"There is no significant difference between the scores of 4.75 and 4.60 based on 50 votes for each" (p>0.05)

This means that, if an infinite number of readers cast an infinite number of votes, there is a greater than 5% chance that the voting averages would be the same. In fact, the chance appears to be in the 10-15% range.

//[later]
to figure out how to post in here in an understandable manner.


I'm not sure it is understandable, since the initial assumption of about 50 votes was apparently changed to 'an infinite number of votes.'

If two sets of 50 votes were cast from the same population of voters (votes) with the same evaluation, and certain assumptions about scattering are made, then I can see how one would, say, 10% of the time, get two averages that differed by the amount specified.

In simple terms, there is first a certain 'noise' in each posted average, and it may amount to .5 or .15 or whatever, depending how you analyze it. In that case, both groups of voters hypothetically might feel the same way, but the means of the two groups differ by the amount of the 'noise' factor.
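The situation described above (two batches of 50 votes drawn from the same voters) can be simulated directly. In this sketch the vote mix of 70% fives, 25% fours and 5% threes is purely hypothetical, and `fraction_with_gap` is an illustrative helper; with that mix, identical voters produce averages 0.15 or more apart on the order of one time in five, and a different assumed mix will give a different fraction.

```python
import random


def fraction_with_gap(weights, n_votes=50, gap=0.15, trials=10_000, seed=1):
    """Estimate how often two independent batches of n_votes, drawn from the
    SAME vote distribution, end up with averages at least `gap` apart."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    scores = [1, 2, 3, 4, 5]
    hits = 0
    for _ in range(trials):
        a = rng.choices(scores, weights=weights, k=n_votes)
        b = rng.choices(scores, weights=weights, k=n_votes)
        if abs(sum(a) - sum(b)) / n_votes >= gap:
            hits += 1
    return hits / trials


# HYPOTHETICAL voter population: 0% ones, 0% twos, 5% threes, 25% fours, 70% fives.
frac = fraction_with_gap([0, 0, 5, 25, 70])
print(f"{frac:.1%} of paired 50-vote batches differ by 0.15 or more")
```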

Second, since votes are always whole numbers, the averages 'bounce' around more at low vote counts as results accumulate. At 50 votes, a single 4 still has an appreciable effect on the average, since no one can vote, say, 4.6. This affects how far apart the results of two voting processes can end up.
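The effect of one whole-number vote is easy to quantify: adding a vote v to an average m of n votes moves it by (v - m)/(n + 1). The hypothetical helper below evaluates that for a 4.75 average; at 50 votes a single 4 shifts it by about 0.015, which is larger than a 0.01 gap like the one between 4.61 and 4.60.

```python
def shift_from_one_vote(mean, n, vote):
    """Change in a running average of n votes when one more vote arrives.

    new_mean = (n * mean + vote) / (n + 1), so the shift is (vote - mean) / (n + 1).
    """
    return (vote - mean) / (n + 1)


# Illustrative 4.75 average at a few vote counts.
for n in (10, 50, 500):
    print(f"n = {n:3d}: a 4 moves a 4.75 average by {shift_from_one_vote(4.75, n, 4):+.3f}, "
          f"a 1 moves it by {shift_from_one_vote(4.75, n, 1):+.3f}")
```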

It is quite tricky to figure out a plausible range of 'true' values in this particular voting situation; the 'outliers', too, have vastly disproportionate effects and are never discarded. The only 'remedy' applied by Laurel is to let votes accumulate; if the count ever reaches, say, 500, one is pretty safe from odd fluctuations.
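Why 500 votes is safer than 50 can be sketched with the standard error of a mean, sd/sqrt(n). The spread of 0.5 per vote is an assumption, and the formula ignores the disproportionate outliers mentioned above, so it understates the wobble from a few 1s; still, ten times as many votes cut the typical noise by a factor of about 3.2 (the square root of 10).

```python
import math


def standard_error(sd, n):
    """Typical size of the random wobble in an average of n votes."""
    return sd / math.sqrt(n)


SD = 0.5  # HYPOTHETICAL spread of individual votes
for n in (17, 50, 500):
    print(f"{n:4d} votes: standard error ~ {standard_error(SD, n):.3f}")
```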

So, in all, I generally agree with your conclusion that small differences that affect the posted standings considerably, e.g. between 4.61 and 4.60, are in fact meaningless. In other words, a tiny fluctuation entirely due to 'noise' can drop a story's rank from, say, 38 to 78.
 
WWSJD?

Sub Joe said:
I don't get it. Are you saying that my life has no meaning?

Au contraire, SubJoe. You're our leader, our paragon, our ideal, the sine qua non.
MG
 
How to screw with the ratings...

HEHE I once upon a time had a story that had a 5.0 ranking after 17 votes for sure. That one was easy to calculate lol.

If after 17 votes you still have a 5 average, and along came a spider and sat down a 1, the average would then be 4.78, not a bad score at that. But if two more spiders decided to join the first and also gave you a 1 each, your score would be 4.4, and that puts you out of the running for a nice juicy fat H, let alone a chance to win any contests here. And one would think that after 17 5s in a row you had a great story, wouldn't you? But hey, who cares, it's only a stupid vote, right? Now add another spider, and you get 4.24. Even if the next ten votes are 5s, you'll still only be at about 4.48. And if you get two more spiders after that, you're down to 4.27. Of course, if you aren't hit by anything lower than a 5 after that, you should be able to reach a 4.5 average again sometime around when you have a total of 50 or so votes. But by then you are no longer new, nor on the top lists. But who cares, right? So what if some snerts have sabotaged your work, right? You can always write another one, submit it, and hope the snerts will be asleep next time. Right?
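The spider arithmetic can be replayed by keeping a running total and count; each posted average is just total divided by count. The `replay` helper and batch list are illustrative only, not anything the site runs. Worked exactly, the sequence gives 4.78, 4.40, 4.24, 4.48 and 4.27, and fifteen more 5s bring the average back to exactly 4.50 at 48 votes.

```python
from fractions import Fraction


def replay(batches):
    """Apply batches of (vote, count) in order and record (n, exact running mean)."""
    total, n, history = 0, 0, []
    for vote, count in batches:
        total += vote * count
        n += count
        history.append((n, Fraction(total, n)))
    return history


# Illustrative sequence from the post: 17 fives, then 1s and 5s, then fives until 4.5.
batches = [(5, 17), (1, 1), (1, 2), (1, 1), (5, 10), (1, 2), (5, 15)]
for n, mean in replay(batches):
    print(f"after {n:2d} votes: {float(mean):.2f}")
```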

As Always
I Am the Dirt Man
 
"There are three kinds of lies: lies, damned lies and statistics". Benjamin Disraeli. I think that's the quote I was thinking of. I think there's another one, something along the lines of the Devil can quote Scripture to his own purpose, but with statistics, but I forget it now.
 
Hemorrhoids

Dirt Man said:
If after 17 you still have a 5 average, and along came a spider, and sat down a 1. You can always write another one, and submit it, and hopefully the snerts will be asleep next time.

No, that's when you come here, start a thread, and piss and moan about how unfair it all is. I think I did that once.

MG

PS. The snerts never sleep.

PPS. I believe that votes depend, to a large extent, on whether the reader's piles are acting up.
 