Who decides what's a similar story?

Thucydides

Who decides what are "similar stories"? I posted a Romance story, and the "similar stories" list filled up with stuff about monsters, incest, and other stuff I'm not interested in. Eventually that list sorted itself out to where I could see a similarity, but it struck me as pretty odd.
 
It's automated code.

Manu has never said what determines it, and I've never been able to see a real pattern.

Most of us have looked at the similar stories and scratched our heads *laugh*
 
I don't have any insider knowledge about how Literotica does it, but I'm guessing they use a standard "recommender system" algorithm or something like it.

http://en.wikipedia.org/wiki/Recommender_system

In brief: rec systems work by looking at their users' rating histories, and looking for stories/songs/films/whatever whose ratings tend to be correlated. For instance, if you look at people who give "The Seven Samurai" a five-star rating, you'll probably find that a lot of them have also given high ratings to "Yojimbo", and a rec system will make recommendations accordingly. (It probably will NOT pay any heed to the fact that both of these are samurai films made by Kurosawa; all it cares about is how audiences rate them.)

So probably what happened is that one logged-in user rated your story highly, and that same user has also rated other stories highly in unrelated categories. As long as that one user is all the rec system has to work with, it's going to recommend the things that user liked.

Meanwhile, you might be getting plenty of other votes from anonymous readers, but unless the system is able to link them to votes on other stories, that doesn't help the rec system.

Eventually, as more logged-in users rate your story, the rec system gets a better picture of what's "similar". But at the start, it can be strongly influenced by one or two people's preferences.
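
To make that concrete, here's a minimal sketch in Python of the co-rating idea. This is not Literotica's actual code (nobody here knows what that is); the function name `similar_stories`, the user names, and the story names are all made up for illustration, and it counts simple "liked both" overlaps rather than doing a proper rating correlation.

```python
from collections import Counter

def similar_stories(target, likes_by_user, top_n=5):
    """Rank other stories by how many users liked both them and `target`."""
    counts = Counter()
    for liked in likes_by_user.values():
        if target in liked:
            # Every other story this user liked gets one co-occurrence point.
            counts.update(s for s in liked if s != target)
    return [story for story, _ in counts.most_common(top_n)]

# Hypothetical data: a single logged-in user has rated the new romance story.
likes = {"user_a": {"my_romance", "monster_tale", "taboo_tale"}}
early = similar_stories("my_romance", likes)

# Later, more logged-in romance readers rate it as well.
likes["user_b"] = {"my_romance", "romance_2", "romance_3"}
likes["user_c"] = {"my_romance", "romance_2", "romance_3"}
likes["user_d"] = {"my_romance", "romance_2"}
later = similar_stories("my_romance", likes, top_n=2)
```

With only user_a on record, `early` is just that user's other likes. Once a few more logged-in readers weigh in, the romance stories rise to the top of `later`.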
 
I *think* it might be similar phrases? :confused:

I often get links to my other stories in the similar stories section.
 
This is interesting because mine have always matched up pretty well. Then again, incest is fairly straightforward.
 
I assumed it was done with tags. Some pretty wild comparisons on mine, though. (And some hitting a bit too close to home--matching up with a story in an alt account.)
 
It's been a while, but I did some checking via tags, and it didn't seem to mesh up. Very limited matches on the stories I checked.
 
I *think* it might be similar phrases? :confused:

I often get links to my other stories in the similar stories section.

A recommender system is likely to do that - not from any comparison of content, but because somebody who rates your stories highly is likely to follow the author profile and find your others, then rate them highly as well.
 
As usual, the algorithm generates the specious conjunction. Whatever goes in determines what comes out.
 
Just did some random checking with a few of my stories.

One was part of my "Magic of the Wood" series. They don't group as a "series" because the shared part of the title is at the end, but the entire Similar Stories section on the one I looked at was filled with other stories in the series and there was a lot of keyword sharing. All were in themed contests, and thus saw a lot of activity.

A second was a Mature story. In this one, every similar story was in the same category, and the five stories matched 3, 1, 2, 3, and 2 of my tags respectively.

A third was a story in the Non-Human category. In this one, none of the similar stories were in the Non-Human category, and each story was in a different category. Not a single tag in any of the five similar stories matched mine.

Next, an incest story. All similar stories were in the same category, though the tags only matched at a rate of 2, 0, 0, 2, 0.

Seeing a pattern developing, I checked a Sci-Fi & Fantasy story with nearly 200 votes. Only two similar stories showed up. Both were in the same category, and only one tag matched mine across both stories.

Again, following the pattern I see, I checked a Chain Story with low vote totals. Only one story had the same category as mine, and it shared one keyword. It was actually another story in the same chain by a different author. Of the four that remained, one shared a single keyword.

Now jumping to the other end of the scale, I picked another high-vote Mature story. Category matched three times, with a poor keyword showing of only 3 matches on a single story.

And finally, another incest story with nearly 3k votes. Again, all matched category, and there was a poor keyword showing of 2 stories matching 2 keywords each.
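
The tag comparison above can also be done mechanically. Here's a rough Python sketch, using invented placeholder tags (not the real tags from the stories checked) and a hypothetical helper name, `shared_tag_counts`:

```python
def shared_tag_counts(my_tags, similar_story_tags):
    """For each listed story, count how many tags it shares with mine."""
    mine = {t.lower() for t in my_tags}
    return [len(mine & {t.lower() for t in tags}) for tags in similar_story_tags]

# Placeholder tags for one story and three entries from its "similar" list.
my_tags = ["forest", "magic", "nymph"]
listed = [
    ["Magic", "forest", "castle"],   # shares two tags (case-insensitive)
    ["office", "boss"],              # shares none
    ["nymph", "river"],              # shares one
]
matches = shared_tag_counts(my_tags, listed)
```

Each number in `matches` corresponds to one story in the list, in the same form as the per-story counts reported above.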

Conclusions:

  • Tags have nothing to do with it at all.
  • The more votes/views/favorites a story has, the more accurate the similar stories are (at least with respect to category).
  • Quick skims of the stories that match category on a high stat story show strong similarity to the stories I was checking.

Final Thoughts:

The suggestion seems accurate: a script tracks users' activity on a story, looks at the other stories where those users have activity, and distills all of that down into a final list of five.

The more activity your story has, the more likely you are to see a list of similar stories that are actually similar to your story.
 
Seems like system effort that could more helpfully be put to other functional needs, then. I'll have to say that some matchups made on my stories are laughably off.
 
From what I see (other than a few exceptions such as my Wood series), most anything that doesn't pull down at least a couple hundred votes is going to have a miserable "similar" listing.

To get anything really accurate, you're looking more in the range of 800+ votes.

Seems like the "similar stories" system could benefit from a multi-tiered system where the stories under a certain number of votes work off a combination of category and keywords rather than user activity.

However, mulling that over in my head, it's ticking a lot of "Crap, that's going to be a lot of work and server ticks" alarms.

So, we're probably stuck with crappy similar stories lists on stories that don't hit the high vote totals.
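
For what it's worth, the multi-tiered idea might look something like this in Python. Everything here is an assumption for illustration: the `Story` fields, the 200-vote cutoff (taken loosely from the "couple hundred votes" observation), the scoring weights, and `activity_ranked`, which stands in for whatever list the existing system already produces.

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    title: str
    category: str
    tags: set = field(default_factory=set)
    votes: int = 0

def content_score(a, b):
    """Same category counts for 3 points; each shared tag adds 1 (arbitrary weights)."""
    return (3 if a.category == b.category else 0) + len(a.tags & b.tags)

def similar(target, catalog, activity_ranked, min_votes=200, top_n=5):
    """High-vote stories keep the activity-based list; others use category + tags."""
    if target.votes >= min_votes:
        return activity_ranked[:top_n]
    others = [s for s in catalog if s is not target]
    others.sort(key=lambda s: content_score(target, s), reverse=True)
    return [s.title for s in others[:top_n]]

# Made-up catalog for demonstration only.
catalog = [
    Story("Wood Nymph", "Non-Human", {"nymph", "forest"}, votes=40),
    Story("Dryad's Gift", "Non-Human", {"forest", "magic"}, votes=900),
    Story("Office Hours", "Mature", {"boss"}, votes=1200),
]
low = similar(catalog[0], catalog, activity_ranked=["Office Hours"])
high = similar(catalog[2], catalog, activity_ranked=["Some Other Story"])
```

The low-vote story ignores the activity list and ranks by category and tag overlap, while the high-vote story keeps whatever the activity-based system returned. The extra sort over the catalog is the kind of added server work the alarms above are about.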
 
Here's what I've learned so far:

Take lovecraft68's story "Allison Scores First" for example. All the stories listed in the "similar stories" are Eternal_Midnight's favorites.
 