Statistical Analysis of LitE Stories

8letters

I've been gathering statistics on stories published on LitE. Every day, I look at the story hubs to see what stories have been published and grab all the descriptive facts I can about each story. Once I've done that, I pull the views, number of comments, number of favorites and rating for stories published seven days before. I've got 28 days' worth of data, which I think is enough to be worth analyzing. I'm not going to analyze all the categories, so here are the ones I am going to analyze:
[tr][td]Category[/td][td]Count[/td][td]Average # of Views[/td][td]Average # of Favorites[/td][td]Average Rating[/td][td]Average # of Comments[/td][td]Average F-K Grade Level[/td][td]% Stories Red H[/td][td]% Stand Alone[/td][td]Average # of Pages[/td][td]Average Comments Per 100K Views[/td][/tr]
[tr][td]Anal[/td][td]13[/td][td]15.5K[/td][td]9.8[/td][td]3.93[/td][td]0.9[/td][td]7.9[/td][td]15%[/td][td]46%[/td][td]1.5[/td][td]6[/td][/tr]
[tr][td]BDSM[/td][td]138[/td][td]5.2K[/td][td]3.6[/td][td]4.19[/td][td]1.5[/td][td]8.6[/td][td]30%[/td][td]32%[/td][td]2.0[/td][td]29[/td][/tr]
[tr][td]Celebrities[/td][td]103[/td][td]2.1K[/td][td]6.0[/td][td]3.87[/td][td]1.4[/td][td]8.7[/td][td]14%[/td][td]38%[/td][td]1.8[/td][td]66[/td][/tr]
[tr][td]E Couplings[/td][td]200[/td][td]7.0K[/td][td]6.4[/td][td]4.19[/td][td]1.3[/td][td]8.0[/td][td]29%[/td][td]59%[/td][td]1.8[/td][td]19[/td][/tr]
[tr][td]E Horror[/td][td]21[/td][td]3.4K[/td][td]6.9[/td][td]4.04[/td][td]1.4[/td][td]8.3[/td][td]14%[/td][td]57%[/td][td]2.2[/td][td]40[/td][/tr]
[tr][td]Exhib & Voyeur[/td][td]92[/td][td]8.9K[/td][td]7.8[/td][td]4.35[/td][td]1.6[/td][td]7.9[/td][td]40%[/td][td]43%[/td][td]1.9[/td][td]17[/td][/tr]
[tr][td]Fetish[/td][td]99[/td][td]5.4K[/td][td]5.7[/td][td]4.09[/td][td]1.3[/td][td]8.4[/td][td]23%[/td][td]43%[/td][td]1.6[/td][td]25[/td][/tr]
[tr][td]First Time[/td][td]32[/td][td]13.1K[/td][td]9.3[/td][td]4.11[/td][td]2.5[/td][td]7.9[/td][td]13%[/td][td]72%[/td][td]1.5[/td][td]21[/td][/tr]
[tr][td]Gay Male[/td][td]108[/td][td]7.9K[/td][td]9.5[/td][td]4.32[/td][td]3.6[/td][td]7.8[/td][td]35%[/td][td]42%[/td][td]1.6[/td][td]47[/td][/tr]
[tr][td]Group Sex[/td][td]96[/td][td]12.0K[/td][td]10.1[/td][td]4.23[/td][td]1.4[/td][td]8.2[/td][td]39%[/td][td]46%[/td][td]2.4[/td][td]11[/td][/tr]
[tr][td]Humor & Satire[/td][td]9[/td][td]1.9K[/td][td]1.9[/td][td]3.92[/td][td]1.4[/td][td]7.9[/td][td]22%[/td][td]56%[/td][td]1.2[/td][td]91[/td][/tr]
[tr][td]Illustrated[/td][td]15[/td][td]17.6K[/td][td]0.3[/td][td]3.36[/td][td]0.7[/td][td]7.6[/td][td]0%[/td][td]20%[/td][td]1.2[/td][td]4[/td][/tr]
[tr][td]Incest/Taboo[/td][td]264[/td][td]26.1K[/td][td]40.7[/td][td]4.34[/td][td]6.3[/td][td]8.0[/td][td]38%[/td][td]35%[/td][td]2.6[/td][td]24[/td][/tr]
[tr][td]Interracial[/td][td]54[/td][td]7.9K[/td][td]10.5[/td][td]3.84[/td][td]3.6[/td][td]9.4[/td][td]16%[/td][td]57%[/td][td]2.2[/td][td]44[/td][/tr]
[tr][td]Lesbian Sex[/td][td]65[/td][td]9.2K[/td][td]10.5[/td][td]4.39[/td][td]5.0[/td][td]8.7[/td][td]44%[/td][td]49%[/td][td]2.1[/td][td]54[/td][/tr]
[tr][td]Letters & Trans[/td][td]9[/td][td]1.3K[/td][td]0.8[/td][td]4.50[/td][td]0.3[/td][td]10.9[/td][td]57%[/td][td]44%[/td][td]1.0[/td][td]26[/td][/tr]
[tr][td]Loving Wives[/td][td]131[/td][td]23.2K[/td][td]22.6[/td][td]3.44[/td][td]29.0[/td][td]7.5[/td][td]4%[/td][td]68%[/td][td]2.3[/td][td]127[/td][/tr]
[tr][td]Mature[/td][td]54[/td][td]17.7K[/td][td]16.0[/td][td]4.29[/td][td]3.6[/td][td]7.8[/td][td]26%[/td][td]61%[/td][td]2.0[/td][td]20[/td][/tr]
[tr][td]Mind Control[/td][td]96[/td][td]6.6K[/td][td]10.2[/td][td]4.31[/td][td]2.1[/td][td]8.8[/td][td]40%[/td][td]30%[/td][td]1.8[/td][td]32[/td][/tr]
[tr][td]NC/Reluctance[/td][td]126[/td][td]13.7K[/td][td]11.9[/td][td]4.18[/td][td]3.8[/td][td]8.1[/td][td]15%[/td][td]35%[/td][td]1.9[/td][td]28[/td][/tr]
[tr][td]NonHuman[/td][td]46[/td][td]3.1K[/td][td]9.8[/td][td]4.33[/td][td]3.3[/td][td]10.4[/td][td]46%[/td][td]37%[/td][td]3.0[/td][td]102[/td][/tr]
[tr][td]Novels Novellas[/td][td]53[/td][td]1.8K[/td][td]4.5[/td][td]4.53[/td][td]3.1[/td][td]10.8[/td][td]74%[/td][td]6%[/td][td]3.1[/td][td]172[/td][/tr]
[tr][td]Romance[/td][td]69[/td][td]4.4K[/td][td]10.4[/td][td]4.46[/td][td]5.0[/td][td]7.9[/td][td]57%[/td][td]42%[/td][td]2.8[/td][td]114[/td][/tr]
[tr][td]Sci-Fi Fantasy[/td][td]166[/td][td]2.3K[/td][td]7.4[/td][td]4.56[/td][td]2.9[/td][td]9.1[/td][td]64%[/td][td]16%[/td][td]3.5[/td][td]127[/td][/tr]
[tr][td]Toys Masturb[/td][td]11[/td][td]5.1K[/td][td]4.3[/td][td]4.23[/td][td]2.3[/td][td]8.7[/td][td]40%[/td][td]64%[/td][td]1.8[/td][td]45[/td][/tr]
[tr][td]Trans & Cross[/td][td]93[/td][td]8.5K[/td][td]14.5[/td][td]4.41[/td][td]3.8[/td][td]8.5[/td][td]51%[/td][td]40%[/td][td]2.0[/td][td]45[/td][/tr]
[tr][td]----------[/td][td]----[/td][td]-----[/td][td]----[/td][td]----[/td][td]---[/td][td]----[/td][td]---[/td][td]---[/td][td]---[/td][td]---[/td][/tr]
[tr][td]All Categories[/td][td]2163[/td][td]10.5K[/td][td]13.2[/td][td]4.21[/td][td]4.5[/td][td]8.4[/td][td]34%[/td][td]41%[/td][td]2.2[/td][td]44[/td][/tr]
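
For anyone wondering how the last column relates to the others, here is a rough sketch. It assumes "Comments Per 100K Views" is simply comments divided by views, scaled to 100,000; that is a guess, not something stated in the post. Because the table values are rounded, the results only approximately match the posted figures.

```python
# Rough check of the "Comments Per 100K Views" column from two other columns.
# Assumption: the column is (comments / views) * 100,000. The table values are
# rounded, so this only approximately reproduces the posted figures.

def comments_per_100k(avg_comments: float, avg_views: float) -> float:
    """Comments per 100,000 views, given per-story averages."""
    return avg_comments / avg_views * 100_000

# (average comments, average views) copied from the table above
rows = {
    "Anal": (0.9, 15_500),
    "BDSM": (1.5, 5_200),
    "Loving Wives": (29.0, 23_200),
}

for name, (comments, views) in rows.items():
    print(f"{name}: {comments_per_100k(comments, views):.0f}")
```

Small differences from the table (for example in Loving Wives) are expected if the original figure was computed from unrounded totals rather than from the rounded averages shown.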

Before you spend too much time looking at the numbers, let me say I think they aren't very meaningful.

I'd like to thank AwkwardMD for reviewing these numbers with me yesterday.

Some details on the numbers:
* If a submission was from the user Literotica, I ignored it
* If a story was pulled when I went to get its statistics, I ignored it
* I didn't get the statistics for 10/9. Bloody stupid
* There was one Illustrated story with over 260K reads after 7 days. I deleted it from the data as an outlier
* If a story had voting disabled, then I didn't consider it when calculating average rating
* If a story had comments disabled, then I didn't consider it when calculating average number of comments
* The F-K grade level is the Flesch-Kincaid grade level. I keep thinking the F-K grade level is random noise, but it looks significant in the data
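
To show how much the single deleted Illustrated story matters, here is a small sketch using only figures from the post: 15 Illustrated stories averaging 17.6K views after the deletion, and one removed story with "over 260K" views. The 260,000 figure is treated as a lower bound, so the real shift would be at least this large.

```python
# Effect of the one deleted Illustrated story on the category's average views.
# From the post: 15 Illustrated stories averaging 17.6K views after the
# deletion, and one removed story with "over 260K" views. 260_000 is used
# here as a lower bound, so the true shift would be at least this large.

kept_count = 15
kept_mean = 17_600
outlier_views = 260_000  # lower bound, per the post

# Mean of all 16 stories if the removed story were put back in.
mean_with_outlier = (kept_count * kept_mean + outlier_views) / (kept_count + 1)

print(f"mean without the story: {kept_mean:,.0f}")
print(f"mean with the story:    {mean_with_outlier:,.0f}")
```

With only 15 stories in the category, one story nearly doubles the mean, which is why small categories are so sensitive to individual hits.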
 
Why do I consider the numbers in the above table not very meaningful? Because of Chapters. First chapters in general don't get as many views as stand-alone stories, and subsequent chapters get fewer and fewer views while tending to get higher ratings. I went through the list of stories and identified which appeared to be a chapter and if it appeared to be the first chapter based on the title and description of each story. When I split the data out by that, I get:
[tr][td]Type[/td][td]Count[/td][td]Average # of Views[/td][td]Average # of Favorites[/td][td]Average Rating[/td][td]Average # of Comments[/td][td]% Stories Red H[/td][td]Average # of Pages[/td][td]Average Comments Per 100K Views[/td][/tr]
[tr][td]Ch01[/td][td]232[/td][td]11.6K[/td][td]18.1[/td][td]4.14[/td][td]4.2[/td][td]24%[/td][td]1.9[/td][td]37[/td][/tr]
[tr][td]Ch2+[/td][td]1036[/td][td]6.1K[/td][td]8.5[/td][td]4.35[/td][td]3.3[/td][td]47%[/td][td]2.4[/td][td]56[/td][/tr]
[tr][td]StAS[/td][td]895[/td][td]15.4K[/td][td]17.4[/td][td]4.07[/td][td]6.1[/td][td]20%[/td][td]2.0[/td][td]40[/td][/tr]

First chapters got 75% of the views that stand-alone stories got, and subsequent chapters got on average 40% of the views that stand-alone stories got.
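
The percentages above follow directly from the table. A short sketch of the arithmetic, using only the counts and average views posted for Ch01, Ch2+ and StAS, also confirms that the three groups recombine to roughly the overall 10.5K average:

```python
# Average views by story type, copied from the table above (K = thousands).
counts = {"Ch01": 232, "Ch2+": 1036, "StAS": 895}
avg_views = {"Ch01": 11_600, "Ch2+": 6_100, "StAS": 15_400}

# Views relative to stand-alone stories (StAS).
ratio_first = avg_views["Ch01"] / avg_views["StAS"]
ratio_later = avg_views["Ch2+"] / avg_views["StAS"]

# Count-weighted mean across the three types; should land near the
# "All Categories" average of 10.5K in the first table.
total = sum(counts.values())
overall = sum(counts[t] * avg_views[t] for t in counts) / total

print(f"Ch01 / StAS: {ratio_first:.0%}")
print(f"Ch2+ / StAS: {ratio_later:.0%}")
print(f"stories: {total}, weighted mean views: {overall:,.0f}")
```

Because later chapters make up almost half of all stories, they pull the overall average well below the stand-alone figure.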

So if you go back to the table in my first post, I'm sure many people were amazed to see the Incest/Taboo stories didn't get significantly more views than Loving Wives stories. Incest/Taboo in the 28 days I have data for got far fewer stand-alone stories than Loving Wives, so even though Incest/Taboo stand-alone stories got significantly more views than Loving Wives did, the average story didn't.

To me, for the data to be significant, it has to be restricted to stand-alone stories. The data will be a little wonky in such an analysis for the SF&F and N&N categories as they have so few stand-alone stories, but for everything else, I think that'll give a more meaningful analysis. I'll post such an analysis soon with the data sliced-and-diced lots of different ways.
 
Have you considered using the chapters data (the steadier state from chapter 3 through to end) as a better indicator of true reads versus back clicks from a story? On the assumption that the views on chapter stories (especially later chapters) are closer to the number of true reads, it's only by analysing chapter stories that you can make even vaguely credible guesses about reader patterns overall.

I have two long multiple chapter stories (23 chapters and 12 chapters - it's not a big sample - but they're both finished stories, not the great unfinished Lit thing) and in both, the second chapter has half the view count of the first, and the third chapter has around one-fifth the number of views, and that count (15 - 20%) tracks (with ups and downs) through to the end. It can reasonably be assumed that views in multiple chapter stories are those folk reading through to the end, and thus, a more realistic assessment of the reader count.
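
A minimal sketch of that calculation, for anyone who wants to try it on their own series. The view counts below are made up purely to mirror the pattern described (chapter 2 at about half of chapter 1, later chapters settling at 15 to 20%); they are not real data from either story.

```python
# Hypothetical view counts per chapter (illustrative only, not real data),
# shaped like the pattern described: chapter 2 near half of chapter 1,
# later chapters settling around 15-20%.
views = [10_000, 5_000, 2_000, 1_900, 1_700, 1_800, 1_600]

# Each chapter's views as a fraction of chapter 1.
retention = [v / views[0] for v in views]

# Treat chapter 3 onward as the "steady state" and average it.
steady = retention[2:]
steady_state = sum(steady) / len(steady)

for chapter, share in enumerate(retention, start=1):
    print(f"chapter {chapter}: {share:.0%} of chapter 1")
print(f"steady-state share (ch. 3+): {steady_state:.0%}")
```

Swapping in real per-chapter counts gives the steady-state share that the reads-versus-views argument depends on.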

For a stand-alone story, you can make zero assumptions on true reads, UNLESS you extrapolate (interpolate?) multiple chapter data back into the mix.

Based on my limited data (one an EH, the other a Sci-Fi & Fantasy story), my take is that reads : views is more likely 1 : 5, 1 : 6 something of that order, tops. Some writers, with massively strong fan bases, will be outliers from this, but for the majority of us, those who say, "Wow, just look at those numbers, aren't they wonderful?" need to take an Order of Magnitude pill, and calibrate down.

Good data set - but how did you get the F-K Grade ratings - by parsing every story?
 
I'm surprised at the average Flesch-Kincaid scores. My stories usually come out around grade 5 (often less).
 
Have you considered using the chapters data (the steadier state from chapter 3 through to end) as a better indicator of true reads versus back clicks from a story? On the assumption that the views on chapter stories (especially later chapters) are closer to the number of true reads, it's only by analysing chapter stories that you can make even vaguely credible guesses about reader patterns overall.
It would take a huge amount of data to make sense of a series of chapters. You want independent events, and I can justify considering each posting of a stand-alone story an independent event. But posting a chapter in a series is obviously not an independent event. Even to talk intelligently about the drop-off from Chapter 1 to Chapter 2 would take a huge amount of data.

Good data set - but how did you get the F-K Grade ratings - by parsing every story?
Something like that.
 
Something like that.

I'm suspicious of the Flesch-Kincaid scores. Novels and Novellas had one of the highest grade scores, so I went there and checked the last three days of stories. That's not a lot of stories.

Using the "style" tool on my desktop, the scores ranged between 3 and 7. Nothing approached the average of 10.8. I put some samples through on-line tests and came up with similar, but -- at least for Flesch-Kincaid -- slightly higher scores. Still, nothing approached 10.8.

Flesch-Kincaid uses the number of syllables/word and the number of words/sentence. Different automated tools are going to handle those variables differently.
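
For reference, the standard grade-level formula is 0.39 × (words/sentence) + 11.8 × (syllables/word) − 15.59. The sketch below applies it with deliberately crude tokenizing: the vowel-run syllable counter and the regex sentence splitter are assumptions made for illustration, not what any particular tool does, and they are exactly the parts where tools differ.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels, minimum one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level using simple regex tokenizing.

    Assumes the text contains at least one word and one sentence.
    """
    # Sentences: anything between runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

sample = "The cat sat on the mat. It was a sunny day."
print(round(fk_grade(sample), 2))
```

Changing either helper (a dictionary-based syllable count, a different sentence rule) will move the score, which is one reason two tools can disagree on the same story.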

I doubt that any of them actually count syllables/word. It's common to assume a representative number of letters per syllable and estimate syllables/word from the average length of the word.

The measure of words/sentence is probably more of a problem. If the software assumes that sentences end with a full stop, then sentences ending with a question mark or exclamation point will be lumped into adjoining sentences, giving a higher score. If the software uses question marks and exclamation points to end a sentence, then sentences including dialog are splintered, leading to lower scores.
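
A small sketch of that effect on a made-up snippet of dialog. The sample text and the two splitting rules are illustrative assumptions; real tools may use more elaborate rules.

```python
import re

# Made-up dialog sample for illustration.
text = 'He stopped. "Really?" she asked. "Yes!" He laughed.'

words = re.findall(r"[A-Za-z']+", text)

def words_per_sentence(pattern: str) -> float:
    """Average words per sentence when splitting on the given regex."""
    pieces = re.split(pattern, text)
    # Keep only pieces that contain at least one letter.
    sentences = [s for s in pieces if re.search(r"[A-Za-z]", s)]
    return len(words) / len(sentences)

full_stop_only = words_per_sentence(r"\.")      # only '.' ends a sentence
all_terminators = words_per_sentence(r"[.!?]")  # '.', '!' and '?' all end one

print(f"full stop only:  {full_stop_only:.2f} words/sentence")
print(f"all terminators: {all_terminators:.2f} words/sentence")
```

The full-stop-only rule gives the longer average sentence, and therefore the higher grade score, on the same text.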

Generally, dialog in the analyzed text should result in a lower score, because dialog is usually a set of short sentences.

Flesch-Kincaid is commonly used in the US, but the rating was based on Navy technical manuals, and it was not intended for use with fiction. You can expect variable results.
 
I had no idea incest stories were such a success.

I used to joke that only people who don't have sisters harbour the fetish of banging one.

To quote the eternal words of Dash from the Incredibles, "Wharrr! Bwa-hew-hewr." :)
 
It would take a huge amount of data to make sense of a series of chapters. You want independent events, and I can justify considering each posting of a stand-alone story an independent event. But posting a chapter in a series is obviously not an independent event. Even to talk intelligently about the drop-off from Chapter 1 to Chapter 2 would take a huge amount of data.
Agreed, but without some parameters on views vs true reads (which I contend only multiple chapter stories can get you even vaguely close to), any assessment of stand-alone stories will miss the fundamental question, "How many people read the whole thing?" Unless you bring in behaviour from a slightly better control sample (multiple chapter stories), you have zero way of knowing how many readers bail out after the first paragraph.
 
[This content has been removed due to a copyright violation.]
 
This is great work! Thank you for doing this. I agree it would be more useful if stand-alone story data were segregated, but it's really interesting even so.

A couple points.

John writes that people won't view a later chapter if they don't read the previous one. That seems intuitively right but it's not so. In both my series I have later chapters with more views than previous ones. There's a general pattern of view decline with chapters but it's not absolute.

I'd like to compare red H averages for longer time horizons with 1 month data to get scoring trends over time.
 
John writes that people won't view a later chapter if they don't read the previous one. That seems intuitively right but it's not so. In both my series I have later chapters with more views than previous ones. There's a general pattern of view decline with chapters but it's not absolute.

Can confirm. Here are the views vs. time for my 14-chapter story, running from blue (earliest posted) to red (latest). Chapter 14 now has more views than any of 2-13.
 

[Attachment: Screen Shot 2018-10-13 at 10.50.24 pm.jpg (views vs. time per chapter)]
John writes that people won't view a later chapter if they don't read the previous one. That seems intuitively right but it's not so. In both my series I have later chapters with more views than previous ones. There's a general pattern of view decline with chapters but it's not absolute.
Based on my micro sample of two long multi-chapter stories (one four years old but still being read, the other published this month), you can take the first two chapters out as distortions, and assume the steady state readers to be around the third chapter numbers.

I have assumed the higher read later chapters are people reading the chapter twice - I can see a slight correlation between higher view counts and higher scores where there is an upwards fluctuation for a particular chapter.
 
Based on my micro sample of two long multi-chapter stories (one four years old but still being read, the other published this month), you can take the first two chapters out as distortions, and assume the steady state readers to be around the third chapter numbers.

I have assumed the higher read later chapters are people reading the chapter twice - I can see a slight correlation between higher view counts and higher scores where there is an upwards fluctuation for a particular chapter.

Multiple reads makes sense as an explanation. In both cases where this happened to my stories the later one had a significantly higher score.
 
Flesch-Kincaid scores?

This one of mine puzzles the Flesch-Kincaid algorithm:

https://www.literotica.com/s/breathless-stargazing

Average grade level about 113 "Ooh, that's probably a bit too complicated. Have you thought about using smaller words and shorter sentences?"

Flesch Kincaid Reading Ease -319.4
Flesch Kincaid Grade Level 154.9
Gunning Fog Score 161.3
SMOG Index 40.4
Coleman Liau Index 11.6
Automated Readability Index 196

No. of sentences 2
No. of words 782
No. of complex words 100
Percent of complex words 12.79%
Average words per sentence 391.00
Average syllables per word 1.53
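
Those two Flesch-Kincaid figures can be reproduced from the counts listed, using the standard formulas. The syllables-per-word value is the rounded 1.53 from the report, so the results match only to within rounding.

```python
# Counts as reported by the readability tool.
sentences = 2
words = 782
syllables_per_word = 1.53  # rounded in the report

words_per_sentence = words / sentences

# Standard Flesch-Kincaid grade level and Flesch reading ease formulas.
grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

print(f"words per sentence: {words_per_sentence:.2f}")
print(f"grade level:        {grade:.1f}")
print(f"reading ease:       {ease:.1f}")
```

Almost all of the extreme score comes from the words-per-sentence term: with only two sentence breaks detected in 782 words, the formula has nothing else to work with.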
 
Based on my micro sample of two long multi-chapter stories (one four years old but still being read, the other published this month), you can take the first two chapters out as distortions, and assume the steady state readers to be around the third chapter numbers.

I have assumed the higher read later chapters are people reading the chapter twice - I can see a slight correlation between higher view counts and higher scores where there is an upwards fluctuation for a particular chapter.

Rereads could account for some, but my late chapters get very obvious bumps coinciding with when they made it onto the first page of the toplist, which seems to imply that people are jumping straight into those chapters from the toplist without going back to read the previous ones first.
 
Rereads could account for some, but my late chapters get very obvious bumps coinciding with when they made it onto the first page of the toplist, which seems to imply that people are jumping straight into those chapters from the toplist without going back to read the previous ones first.

Jim Morrison summed it up: people are strange.
 
It's entirely possible that someone clicks on a later chapter and either leaves 'cause they don't feel like trying to figure it out, or they actually link to the first chapter in the series. Both seem like possible actions, and it would be interesting to investigate them.
That makes completely logical sense - spotting a later chapter first, then going back to the first.

The frequency of issue and the category churn would be a factor - in my first case, 23 chapters over six months = roughly a chapter per week; in the second case 12 chapters released one a day. Both in slow moving categories, so at all times I had several chapters simultaneously in the top list as the story rolled through. So Bramble's stats, to me would be - spot latest chapter, click = one view, click straight to chapter one or click out based on first paragraph. Once in chapter one, usual capture rates apply (in my case, 15 - 20% get to chapter three and read through to the end, some chapters twice).

People are strange, but they are also predictable when moving in large numbers - and we have Gallup Poll and polling booth pool sizes here, so there's no reason not to see some validity in what's going on. In our micro sample of three drops in the ocean :).

But in my own context, these two stories represent 50% of my available data set, and that, statistically, must be significant (within my own data set). Enough for me to see the Pareto principle yet again - 80% of the views are of no consequence because they're most likely back-clicks, but 20% matter because they're my reader base.
 
Those Chapter 13 (I think that color is 13) and 14 lines are abnormal. They rise at much higher rates than any of your other chapters, indicating that there is some exceptional force acting on them.

Chapters 13 and 14 have been in and out of the top-25 list for that category. The times when they're getting faster views coincide exactly with their appearances in that list.

Without being able to explain it, they should be removed from the set as outliers.

"Without being able to explain it" is just about the worst time to be removing outliers. Apologies in advance if I get preachy here, but we're getting into stuff that I get paid to have opinions about...

The main reason for outliering is that when data values are so unusual as to indicate measurement error, then it's probably okay to exclude those data values. (Some caveats apply even here - is the mechanism creating measurement error dependent on the true data values? If so, then excluding outliers may not fix things.)

Getting rid of outliers makes the data look tidier - it reduces your variances, improves p-values, all that good stuff - so a lot of people have fallen into the bad habit of outliering anything vaguely unusual. But if you don't understand why your outliers are present, then you shouldn't be doing it. See the history of the Antarctic ozone hole for an example of why automatically outliering atypical values is a very bad habit.

Now, your examples are interesting and could definitely push someone into looking at the issue more deeply, but they aren't at all useful when it comes to the question of site statistics. We have three series, from two authors. This just isn't useful, from a statistical perspective. Even against the incredibly small sample size (compared to the total number of stories on Lit) of our theoretical study, thousands of stories, three individual stories aren't worth mentioning.

You can't extract general numbers from them, but they demonstrate that many people are willing to read a later chapter without having seen the previous ones. Toplist effects make it clear that this happens, but that doesn't mean that this is the only time it happens - just the easiest to detect.

The question that we're considering can only be answered based on macro-analysis, not micro-, and to try to contradict a premise on the basis of three pieces of preliminary data is, well, a little ridiculous.

On the contrary, even a single case study can point out when the macro-analysis is based on flawed assumptions. One of the most common reasons statistical analyses go wrong is that people jump into macro-level analysis without being sufficiently familiar with the micro-level processes that underlie macro behaviour.
 