Similar Stories Window

SimonDoom

Does anyone know exactly how stories are selected for the "Similar Stories" window at the end of each story? Is it based on the use of similar tags? Something else? I think it must be something else, or something in addition, but I can't quite figure it out, because:

a) the stories in the window tend to have good ratings, so low-rated stories appear to be more likely to be excluded, and

b) the window seems to pick up stories with similar subject matter even if the subject matter isn't fully captured in the tags. For example, a mom on son's lap story seems likely to show other similar stories in the similar stories window even when the tags don't reflect the subject matter that specifically. So I wonder if the text or title is somehow searched.

Anyone know? Or is this another part of the Site's secret sauce, like sweeps?
 
I always assumed it's a mixture of category, tags and good rating.

That would at least make sense: the intention would be to keep people on Literotica, and offering good alternatives should support that goal.

But how these factors are weighted is a question I can't answer. I get the impression that certain tags/categories carry more weight than others, especially the rather exclusive ones (incest, non-con, gay, etc.) but that's not based on solid research.
 
Usually recommender systems are based on a "readers who liked X also liked Y" type of algorithm.

A quick and dirty method would look something like: select all the identifiable readers who loved the story (i.e. 5 stars and/or favourited), look at which other stories those same readers loved, and recommend the ones that show up most often. There are ways to refine it, but that's the basic idea.
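
Here's a minimal sketch of that quick and dirty method in Python. The `favorites` mapping (reader to the set of stories they loved) and all the reader and story names are made up for illustration. I have no idea how the site actually stores or computes any of this.

```python
from collections import Counter

def similar_stories(story, favorites, top_n=5):
    """Rank other stories by how many fans of `story` also loved them."""
    counts = Counter()
    for reader, loved in favorites.items():
        if story in loved:
            # This reader loved the story; tally everything else they loved.
            counts.update(loved - {story})
    # Highest count first; break ties by title so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:top_n]]

# Hypothetical data: reader -> stories they rated 5 stars or favourited.
favorites = {
    "reader_a": {"Story X", "Story Y", "Story Z"},
    "reader_b": {"Story X", "Story Y"},
    "reader_c": {"Story X", "Story W"},
    "reader_d": {"Story Y", "Story W"},
}

print(similar_stories("Story X", favorites))
```

Note that a story nobody has loved yet gets an empty list back, so a method like this needs some other source of suggestions for brand-new stories.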
 
That sounds like an effective system. But doesn't the list come up when the story first appears? Nobody has had a chance to vote on it yet then. So at least the initial list must be based on tags or key words or something like that.

Perhaps the lists are adjusted later as votes come in. Are the lists in fact changed from time to time? If not, then a story's list could only ever contain stories older than the story itself, which I suspect is not the case.
 
Sometimes there will be a most strange match that comes up in 'Similar Stories'.

For example, I wrote a Romance story for Christmas called 'Take Cover From Tracy', which is about a young couple who are caught up in Cyclone Tracy which completely destroyed the Australian city of Darwin at Christmas 1974. One of the similar stories listed was a Celebrity story called 'Hayden Panettiere's Feet'.

The lead female character in my romance story is a very pretty blonde, like the actress who is the subject of the Celebrity story, and she wears flimsy shoes and is obviously concerned about cutting her feet on the debris during and after the cyclone. Even so, how these two stories were linked as similar was a mystery to me.
 
The "readers who liked X also liked Y" idea makes some sense. It would explain why (a) the recommended "similar" stories usually have high scores (because people favorite well-scored stories) and (b) most, but not necessarily all (I checked), of the recommended similar stories are in the same category. Based on what I can see, the "tag" theory doesn't hold up.

The flaw in this theory is Hector Biden's point that the list is generated immediately, before one's story can have been favorited by many people. Does anyone know if the similar stories lists change over time?
 
So I just checked three stories on today's New List and two of my older stories.

For the new stories, the Similar list does seem to have been based on tags. Each "similar" story had one (or rarely two) tags identical to the tags of the main story. However, they ranged across categories. I would say the algorithm goes something like this: pick one of the story's tags, pick a random story that has that same tag, repeat six times.
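
To make that guess concrete, here is a rough Python sketch of "pick a tag, pick a random story with that tag, repeat six times." It is only my reading of what the lists look like, not the site's code. The `tags_by_story` mapping and the story names are invented, and skipping stories that were already picked is my own assumption.

```python
import random

def tag_based_picks(story, tags_by_story, n=6, rng=None):
    """Make up to n picks, each a random story sharing one randomly chosen tag."""
    if rng is None:
        rng = random.Random()
    story_tags = sorted(tags_by_story.get(story, set()))
    if not story_tags:
        return []  # no tags, nothing to match on
    picks = []
    for _ in range(n):
        tag = rng.choice(story_tags)
        # Other stories carrying this tag that have not been picked yet.
        candidates = sorted(
            other for other, tags in tags_by_story.items()
            if other != story and tag in tags and other not in picks
        )
        if candidates:
            picks.append(rng.choice(candidates))
    return picks  # may be shorter than n if candidates run out

# Hypothetical stories and tags.
tags_by_story = {
    "New Story": {"romance", "office"},
    "A": {"romance"},
    "B": {"office", "boss"},
    "C": {"fantasy"},
    "D": {"romance", "fantasy"},
}

print(tag_based_picks("New Story", tags_by_story, rng=random.Random(0)))
```

Because each pick only needs one shared tag and ignores category, a scheme like this would range across categories the way the new-story lists seem to.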

For the old stories, some of the "similar" stories were dated later than the stories themselves, so the lists must have changed since the stories were originally published. The lists cannot have been based on tags, because none of the "similar" stories' tags matched any of the tags of the stories themselves.

One of the old stories was not very popular, and two of its "similar" stories were also stories of mine. This lends credence to Bramblethorn's idea, because maybe some of the few weirdos who liked it had also liked other stories of mine.

The other old story was my only story in SciFi/Fantasy, and all of the "similar" stories were in SciFi or NonHuman. This also lends credence to Bramblethorn's idea, as these categories I suspect have a dedicated and probably overlapping readership.

So I think they probably put up the initial similar story lists based on tags, and then eventually (and continuously?) modify them based on some more sophisticated method like Bramblethorn's. I don't think there's any human intervention, which is why the strange matches described by RetroFan occur.
 
My story Riddle of the Copper Coin is a mix of "realistic" F/F romance and fantasy adventure with more F/F romance. I could've posted it in either Lesbian Sex or SF/F; in the end I went with SF/F. Tags are: lesbian romance – erotic poetry – arabian nights – lesbian fantasy.

All five of the "similar stories" are in Lesbian Sex, not in SF/F.

Two of them have the "lesbian romance" tag; I think that's the only overlap with my tags. Another has "romance" and "lesbian first time".

The fourth has "office - work - boss - employee" and the fifth is "A Benign Something" which has no tags at all, as far as I can see. (It was posted in 2005 - possibly tagging wasn't around then?)

Checking the "similar stories" for those five, they show up pretty often in one another's recommends. There are recommends for both earlier and later stories.

So, what can we conclude from this?

It's clearly not just based on category or just on tags. Three of my recommends were for stories with no tag in common with mine, and none were in the same category. It would be possible to write a system that joins the dots between the Lesbian category and stories with "lesbian" in the tags, but implementing that across all categories would be a lot of work, so I don't think this is likely.

From the ones I've read, and from skimming the others, I think all of them are slow-burn romances, thematically similar to mine. I think it's pretty plausible that they'd appeal to similar audiences.

This is pretty consistent with what I'd expect from the sort of recommender system I mentioned earlier. It's a pity I didn't see what the recommendations were when it was new. Checking the latest story on the New Stories List, I agree with Hector that it looks to be tag-based at that stage (but not category-based).
 
I checked a few stories vis a vis the "readers who liked X also liked Y" idea.

There's no way to tell who voted for any particular story, but it turns out that you can get the whole list of readers who favorited a story by hovering your cursor over the "N" in "N other people favorited this story" at the end of the story. [To download the list right click on the "N", select "Inspect", and copy from the html code brought up in the sidebar.]

For four of the five stories I checked, it was indeed the case that the stories on the "similar" lists had also been favorited by readers who had favorited the stories themselves. For example, Bramblethorn's "Riddle of the Copper Coin" has a "fan list" of 23 readers. 7 of these were also fans of "A Benign Something" on RCC's "similar" list. The fifth story I checked is one I found from Jan 2017 that still has not been favorited. Its "similar" list is still based on tags.

This suggests that a story's "similar" list is based on tags until it is favorited, and then on "readers who favorited X also favorited Y." The lists are presumably updated from time to time, since they often show stories that are newer than the story itself. It's not clear what refinement is done, although there does seem to be some effort to choose stories that have a wide fan overlap (e.g., 7 fans in common for RCC/BS).
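
A small Python sketch of that two-stage behavior, purely as a guess at the logic. The names `fans_by_story` and `tags_by_story` and all the data are hypothetical, and for simplicity the tag stage here ranks by number of shared tags rather than picking at random.

```python
def recommend(story, fans_by_story, tags_by_story, top_n=5):
    """Use fan overlap if the story has fans, otherwise fall back to shared tags."""
    fans = fans_by_story.get(story, set())
    if fans:
        basis = "fans"
        # Score every other story by how many fans it has in common.
        scores = {
            other: len(fans & other_fans)
            for other, other_fans in fans_by_story.items() if other != story
        }
    else:
        basis = "tags"
        tags = tags_by_story.get(story, set())
        # Not favorited yet: score by the number of shared tags instead.
        scores = {
            other: len(tags & other_tags)
            for other, other_tags in tags_by_story.items() if other != story
        }
    # Drop zero scores, then sort by score (descending) and title.
    ranked = sorted(
        ((s, c) for s, c in scores.items() if c > 0),
        key=lambda kv: (-kv[1], kv[0]),
    )
    return basis, [s for s, _ in ranked[:top_n]]

# Hypothetical data: story -> readers who favorited it, and story -> tags.
fans_by_story = {
    "Old Story": {"r1", "r2", "r3"},
    "Other A": {"r1", "r2"},
    "Other B": {"r3", "r9"},
    "Unfaved": set(),
}
tags_by_story = {
    "Old Story": {"romance"},
    "Other A": {"office"},
    "Other B": {"romance"},
    "Unfaved": {"romance", "office"},
}

print(recommend("Old Story", fans_by_story, tags_by_story))
print(recommend("Unfaved", fans_by_story, tags_by_story))
```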

One thing to keep in mind is that readers can differ considerably in terms of the selectivity of their recommendations. Some readers have relatively short favorite lists. They're presumably more discriminating, and so the stories on their lists might really be expected to have some similarities in terms of theme, tone, and quality. On the other hand, some readers have thousands of favorites, and saying that any two of them are similar is sort of like saying that they're similar just because they both appear on Literotica.

I imagine this is an unavoidable problem in recommender systems. It's mitigated somewhat for more popular stories. One of my stories has only one "fan," and all its "similar" stories were apparently chosen at random from his list of 3.5k favorites. Probably not all that helpful from a reader's point of view. On the other hand, 2 of the 7 RCC/BS fans were fairly discriminating (favorite lists of 33 and 100 stories). So it's not unreasonable to think that someone who has just read and liked RCC might find BS a worthwhile suggestion, despite the differences in category and theme.
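
One way a system could soften the problem, sketched in Python: count each shared fan for less the longer their favorites list is. This is my own illustration, in the spirit of the Adamic-Adar index from link prediction, and I have no evidence the site does anything like it. The reader names and `list_size` mapping are hypothetical; the list lengths borrow the 33, 100, and 3.5k figures above.

```python
import math

def weighted_overlap(fans_x, fans_y, list_size):
    """Sum over shared fans of 1 / log(1 + length of that fan's favorites list)."""
    score = 0.0
    for reader in fans_x & fans_y:
        # A reader with a short list is more selective, so counts for more.
        score += 1.0 / math.log(1 + list_size[reader])
    return score

# Hypothetical readers and the lengths of their favorites lists.
list_size = {"picky_1": 33, "picky_2": 100, "hoarder_1": 3500, "hoarder_2": 3500}

two_picky = weighted_overlap(
    {"picky_1", "picky_2"}, {"picky_1", "picky_2"}, list_size
)
two_hoarders = weighted_overlap(
    {"hoarder_1", "hoarder_2"}, {"hoarder_1", "hoarder_2"}, list_size
)

# Both pairs of stories share two fans, but the selective fans score higher.
print(two_picky > two_hoarders)
```

With plain counting the two cases would tie at 2 shared fans each; the weighting is what separates them.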

Bottom line: the "similar" lists are valuable from a reader's point of view. At worst they mean that at least one reader (although perhaps a wildly eclectic one) liked both stories. At best they mean that several readers did. I've certainly found stories I've enjoyed from "similar" lists.

They're also valuable from an author's point of view as advertisements. The way to get your stories onto other stories' "similar" lists would seem to be to write stories that are popular and to develop your fan base.
 