bots writing stories IV

astuffedshirt_perv

Literotica Guru
Joined
Jun 22, 2002
Posts
1,325
Rather than resurrect the old thread, here we are 7 months later with a newer iteration of writing bots.

OpenAI just released GPT-3. You can read samples at this link. Full paper available here. Or search Github or reddit for GPT-3.

Some lines from the executive summary (emphasis added by me):
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. ... Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. ...Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans.
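For anyone wondering what the paper's "few-shot setting" means in practice: instead of fine-tuning the model's weights on a task, a handful of solved examples are placed directly in the model's input, and the model simply continues the pattern. A minimal sketch of that prompt assembly (the function name is mine, and the actual model call is omitted):

```python
# Sketch of "few-shot" prompting as described in the GPT-3 paper:
# a few worked examples go straight into the model's input context,
# with no weight updates. Illustrative only -- no real model here.

def build_few_shot_prompt(task_description, examples, query):
    """Assemble a few-shot prompt: task description, K solved
    examples, then the unsolved query for the model to complete."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
)
print(prompt)
```

The model would then (ideally) complete the last line with the French word, purely from the in-context pattern.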

Good news, I guess, is that the horsepower needed to make this go is beyond most home hobbyists. Bad news, I guess, is that giving Amazon this kind of tool could end the self-publishing game. Amazon already tracks kindle romance novels down to which paragraph the characters should first kiss.
 
I will be "concerned" (I guess) when someone demonstrates to me that people are willing to pay money to purchase a bot-written story to a degree that threatens sales by humans. Perhaps that day is not far off, but I see no evidence that we are there yet.
 
Those links are meaningless though, because they don't show the raw content from which the bot compiled its narrative. For all we know, it's just cutting and pasting pre-existing content previously published by the school (in the first example) and applying a correct set of grammatical rules.

Show me a bot writing original content using only a dictionary as its raw content and I might pay attention. Until then, isn't it just a better grammar check program that still needs human raw content to work from?
 
Rather than resurrect the old thread, here we are 7 months later with a newer iteration of writing bots.

OpenAI just released GPT-3. You can read samples at this link. Full paper available here. Or search Github or reddit for GPT-3.

If the samples are typical of its output, this is impressive. These are vastly more coherent than previous iterations, and much harder to pick as machine-generated.

From the paper:

"On text synthesis, although as a whole the quality is high, GPT-3 samples still sometimes repeat themselves semantically at the document level, start to lose coherence over sufficiently long passages, contradict themselves, and occasionally contain non-sequitur sentences or paragraphs. We will release a collection of 500 uncurated unconditional samples to help provide a better sense of GPT-3’s limitations and strengths at text synthesis. "

One of the samples:

RACISM IN BRITAIN HAS BECOME SO entrenched that it is becoming a bigger problem than it was at the time of the Birmingham pub bombings, when the threat of violent nationalism in the late 1970s prompted the first official inquiry into racial attacks, according to a report published today by the Equality and Human Rights Commission (EHRC).

The report, We’re still struggling: the legacy of the Stephen Lawrence inquiry, says that despite years of measures to increase the representation of ethnic minorities in public life and protect them from discrimination, they are still “on the receiving end of racism”, with black people twice as likely to suffer police stop and search tactics as whites.

Figures from the Crown Prosecution Service (CPS) show that more than 40 per cent of cases involving charges of racially aggravated harassment are dropped, with charges reduced to more minor ones, or not pursued. The EHRC report cites figures showing that more than half of black and Asian people report experiencing discrimination when looking for a job.

Stephen Lawrence was a Black British teenager murdered in a racist attack, so the report title is very credible in this context. The only wrong note I spotted was the reference to the Birmingham pub bombings. These were conducted by the PIRA, so they're a bit of a non sequitur in a discussion about racial attacks. The UK did have a string of racially motivated bombing attacks, and I suspect the common theme of "terrorist bombing" and perhaps also police misconduct led GPT to think this would fit here.

It still falters when logic is required:

A positively charged particle has an excess of protons compared to the number of electrons. A negatively charged particle has an excess of electrons compared to the number of protons. The rest of the charged particles can be either positively or negatively charged depending on the number of protons and electrons that they have.

The first two sentences there are correct, and wouldn't be out of place in a physics/chemistry textbook. But the third doesn't fit - there isn't any "rest of the charged particles", positive and negative are the only options.
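To make that logic explicit: a particle's charge is fully determined by two counts, so there is no third category for the "rest of the charged particles" to fall into. A trivial sketch (the function name is mine, not from any textbook):

```python
def net_charge(protons, electrons):
    """Net charge in units of the elementary charge e:
    positive if protons outnumber electrons, negative if
    electrons outnumber protons, zero if they're equal."""
    return protons - electrons

# The two counts exhaust the possibilities -- which is exactly
# why GPT-3's third sentence is a non sequitur.
assert net_charge(11, 10) > 0   # positively charged (e.g. Na+)
assert net_charge(17, 18) < 0   # negatively charged (e.g. Cl-)
assert net_charge(10, 10) == 0  # neutral, i.e. not charged at all
```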

With a whopping 18 percent ABV, Triple Bock is not exactly what you’d call a session beer. That’s fine, though, since its flavor is so well-balanced that it can stand up to even the most robust food.

So what’s in this beer that makes it so strong? ...

At a mere $10 to $12 a bottle, it’s a great choice for an after-dinner drink, or maybe as an alternative for a low-alcohol beer (this would be a great choice for an alcohol-sensitive friend at a party).

18% alcohol by volume is not a "great choice for an alcohol-sensitive friend".

This sort of thing is an area where AIs often stumble. Phrases like "strong", "low-alcohol", "alcohol-sensitive", and "whopping 18 percent ABV" are related concepts, all connected to alcohol content, and AIs can recognise that. But they struggle to distinguish between synonymous and antonymous concepts.
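A toy illustration of why purely distributional models drift this way: words with opposite meanings occur in nearly identical contexts, so context-count vectors for them come out almost identical. This tiny invented corpus is nothing like how GPT-3 is actually trained, but it shows the failure mode:

```python
import math
from collections import Counter

# "strong" and "weak" are antonyms, yet they appear in the same
# places in text -- so a model learning from co-occurrence alone
# sees them as near-synonyms.
corpus = [
    "a strong beer with bold flavor",
    "a weak beer with mild flavor",
    "a strong coffee in the morning",
    "a weak coffee in the morning",
]

def context_vector(word, sentences, window=2):
    """Count words appearing within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, t in enumerate(tokens):
            if t == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

strong = context_vector("strong", corpus)
weak = context_vector("weak", corpus)
print(round(cosine(strong, weak), 2))  # near 1.0: the antonyms look alike
```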

I'm thinking I need to work on the main letters first and then do the banners. I'm going to be looking for about 5 letters that say "C O A" and 4 letters that say "S L E P"

..."C O A" is not 5 letters.

We met James when he was in his mid-thirties, a broke writer who married his childhood sweetheart, but felt that something was missing. At 38, he had a breakdown, left his wife and took up with a younger woman. Then he had a midlife crisis.

Doesn't recognise that "had a breakdown, left his wife and took up with a younger woman" is already a midlife crisis.

In 1995, the residents of Mexico had a choice to make. An election was held between a political novice and a political veteran. And that November, the people chose a newcomer, Ernesto Zedillo Ponce de León, as their president. The people had a choice. They could have decided that the government was a mess and the country needed a different leader. But that's not what they did.

Aside from an error in the date (it should be 1994), the individual statements in this paragraph are all more or less correct, but as written they form a confusing non sequitur. Normally we'd think that choosing a novice over a veteran is a change of direction.

The historical context is that while Zedillo himself was a newcomer, his party (PRI) had held power continuously since 1929. In the context of "the party that had been in power for 65 years got in again", then the last two sentences of GPT-3's version make sense. But GPT doesn't understand that the "newcomer" angle doesn't support the conclusion of the paragraph.

Marengo Cave, sometimes referred to as the "Cave of the Winds" is a cave in Marengo, Iowa, United States. ...

In 1901, a local banker named Frank Smith purchased the cave for $4500. He spent the next three years repairing the cave, expanding it, and bringing in artifacts. The cave is named after his wife, a relative of Napoléon Bonaparte.

...not after the county of the same name where it's located?

Warm air blanketed much of the United States, setting record-breaking highs across the country in some areas, the National Weather Service reported... as I have been pointing out in my weather report, the cooling trend has been continuing, and may have entered a new stage.

Currently, Microsoft is selling its console for $399, which is about $100 more than the PS4.

Now it may sound like a lot, but it's really not that much for a quality gaming system. I get that Sony's product is more expensive, but there's a value to the higher price point in terms of the ability to play certain games.

More examples where it's struggling with synonymous vs. antonymous concepts: the Microsoft console is "$100 more" than the [Sony] PS4, yet "Sony's product is more expensive".

I am really impressed by what they've done with this. I'm still not convinced that it has what it takes to write a complete story that makes sense.
 
Those links are meaningless though, because they don't show the raw content from which the bot compiled its narrative. For all we know, it's just cutting and pasting pre-existing content previously published by the school (in the first example) and applying a correct set of grammatical rules.

Section 2.2 of the paper discusses what they used for training.
 
Section 2.2 of the paper discusses what they used for training.
It's still working on previously written content though, isn't it? It might be writing grammatically sound content, but how is it original content? Isn't it just distilling human-originated content?

What does it do with a blank piece of (digital) paper?

It might be a fun academic exercise in software programming, but what's the application? Is there a point, something useful out the other end? I'm probably not thinking about it hard enough, I acknowledge that, but what does it DO apart from being clever?

Not that I care, because I know no bloody bot will ever understand the importance of buttons ;).
 
It's still working on previously written content though, isn't it? It might be writing grammatically sound content, but how is it original content? Isn't it just distilling human-originated content?

What does it do with a blank piece of (digital) paper?

That's a tricky philosophical question. Are there any humans who've learnt to write without being strongly influenced by previously written/spoken content? I think most of us would make some sort of distinction between work that's merely a remix of previous content with details changed, and fictional work that is merely influenced by previous content, but where does one draw that line?

My understanding is that GPT-3 is far more sophisticated than taking a single story from its training set and just changing the details in a grammatical sort of way. It has an ability to learn structures at multiple levels and then produce new text that uses similar structures, although not for the same reasons that a human writer might.
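As a much cruder cousin of the same idea, a word-level Markov chain "learns" only which words follow which, then emits new sequences with similar local structure. GPT-3 learns vastly deeper structure than this, but the toy sketch below (corpus and names are mine) shows the in-principle difference between copying a story and sampling from learned statistics:

```python
import random
from collections import defaultdict

def train(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, length, seed=0):
    """Emit up to `length` words, each sampled from the successors
    the previous word had in the training text."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        candidates = table.get(out[-1])
        if not candidates:
            break
        out.append(rng.choice(candidates))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
table = train(corpus)
print(generate(table, "the", 8))
```

The output is new word order, not a verbatim copy, yet every transition was seen in training. The "original vs. distilled" question sits exactly in that gap.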

It might be a fun academic exercise in software programming, but what's the application? Is there a point, something useful out the other end? I'm probably not thinking about it hard enough, I acknowledge that, but what does it DO apart from being clever?

There are some obvious malicious applications - impersonation, fake reviews, political bot accounts, that sort of thing. I can think of people who would love to be able to generate (say) thousands of fake emails written in the style of a political candidate, especially if they can require it to include some particular content, which I think is possible with these kinds of methods. Some of those people are probably working on this kind of capability already, so the public GPT work is useful as a warning of what may be possible, even though it also risks giving bad people ideas.

I am more pessimistic about non-malicious applications. Those usually involve trying to express some kind of truth, and AFAICT the concept of truth isn't really built into this - it's a bit like listening to a bright student trying to bluff their way through a report on a book they haven't read.

Meanwhile, this uses very different methods, but might be helpful for writers who want something to jog the creative processes: https://thispersondoesnotexist.com/
 
I am more pessimistic about non-malicious applications. Those usually involve trying to express some kind of truth, and AFAICT the concept of truth isn't really built into this - it's a bit like listening to a bright student trying to bluff their way through a report on a book they haven't read.
That's a good analogy.

When the programmers teach the bot to read a book and then write a coherent review, then I might be impressed. Right now, though, it seems to be a very elaborate party trick. Like an extremely good cover band who does note-perfect copies of the original act, but doesn't know how to write a song.
 
That's a good analogy.

When the programmers teach the bot to read a book and then write a coherent review, then I might be impressed. Right now, though, it seems to be a very elaborate party trick. Like an extremely good cover band who does note-perfect copies of the original act, but doesn't know how to write a song.

To stretch the analogy a bit further, this isn't simply doing covers. If it were a musician, I'd expect it to be able to write its own four chord song.
 
For me, it would be great to have a program where I can upload my data, feed it with some background information, results and conclusions, and have it write my articles in a consistent way for me. It would save me a lot of time, not having to worry about spelling and grammar, sentence structure, and such. It would save me even more time, not having to edit the work of students and less experienced post-docs. I (or my boss) might be willing to pay quite a bit of money for such a program.

And I can imagine how companies like newspapers (or better, news sites; who reads papers these days) would be very interested in being able to rapidly churn out large pieces of text, based on input from people in the field (or even better, based on data obtained by bots scanning the internet for hot, trending news topics), without too much editing. And if they can tweak the program to write in different styles, e.g. focused on different social, political, ... groups, and imagine they'd auto-translate their material (not unthinkable...), then they could cater to almost the whole world with only a few mouse-clicks.
Even those scenarios require raw data, though, which needs to be written from scratch. What you describe might be a very fancy grammar tool and a compiler - electronic editors, if you like, and yes, I get that. But the original content still has to be generated somehow - so these programs aren't generating original writing. They're not creative. It might be splitting hairs, but it's an important hair to split.

I certainly don't feel threatened as a writer of erotica. As Simon will attest, only EB could come up with Suzie, and only Simon could write her into a Daddy Bear story. And Simon is definitely the only one perverse enough to come up with "erotic" fish. It takes creativity to do that ;).
 
For me, it would be great to have a program where I can upload my data, feed it with some background information, results and conclusions, and have it write my articles in a consistent way for me. It would save me a lot of time, not having to worry about spelling and grammar, sentence structure, and such. It would save me even more time, not having to edit the work of students and less experienced post-docs. I (or my boss) might be willing to pay quite a bit of money for such a program.

It would indeed. I'm not sure if GPT is the right tool for that.

If you're writing something that has essentially the same structure every time, and you just want to change the names and numbers, GPT is overkill. There are already solutions that will do that kind of thing for you with much less computational overhead.

If you want something that writes intelligent analysis of the information you supply to it, GPT could quite likely generate something that has the right structure, and which looks like a well-written report on casual inspection. But it might not make sense when you read closely, because it's not based on any understanding of the truths that it's trying to describe - just "this is how humans put words together". As we can see, it still struggles with the difference between low- and high-alcohol beer.

(That said, some humans are so bad at communicating that it's not impossible GPT could do better...)
 
I think it would be cheaper and much more potentially successful to give one thousand monkeys typewriters. Shakespeare here we come!
 
Like an extremely good cover band who does note-perfect copies of the original act, but doesn't know how to write a song.

I would contend that an 'extremely good cover band' of silkstockinglover would do very well on Lit, and that a good cover band of romance novelists would do very well on kindle. There would still be room for a true breakout writer with novel ideas for a...eh...novel, but little space for an aspiring writer. While we may value the joy of hearing new music, covers get the crowd going.
 
From MIT Technology Review "I used an algorithm to help me write a story. Here’s what I learned". In this case the author used stats from other sci-fi stories to write a story that matched those stats. Seems like tying this to a text generator would be pretty straightforward. Closing paragraph:

An “algostory,” or any use of computation that goes inside the creative process, exists in a consciously eerie space between engineering and inspiration. But that eerie space is increasingly the space we already inhabit. Software can recast your photograph through an infinity of filters or swap out parts of the picture for others at the click of a button. It can generate images that look convincingly like the paintings of any era you choose. Now machines are encroaching on everyday language. The quality of predictive text forces a literary question on us every time we pick up a phone: How predictable are human beings? How much of what we think and feel and say is scripted by outside forces? How much of our language is ours? It’s been two years since Google’s voice technology, Google Duplex, passed the Turing test. Whether we want it or not, the machines are coming. The question is how literature will respond.
 
From MIT Technology Review "I used an algorithm to help me write a story. Here’s what I learned". In this case the author used stats from other sci-fi stories to write a story that matched those stats. Seems like tying this to a text generator would be pretty straightforward.

Reading through the article, there seem to be two different facets of AI involved.

One is producing metrics on things like abstractness, number of adverbs, % of text which is dialogue, etc. etc., and encouraging the author to write towards some target rating for each of these. I'm not convinced this is actually a very helpful thing to do, but assuming for the sake of argument that it is - I expect something like GPT-3 would do pretty well on this, given the right training corpus, and it probably wouldn't be hard to tweak it to get even closer to the target numbers.
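To give a sense of how mechanical such scoring can be, here's a rough sketch of two such metrics. The heuristics ("-ly" words as adverbs, double-quoted spans as dialogue) are deliberately crude stand-ins of my own, not the article's actual method:

```python
import re

def adverb_density(text):
    """Fraction of words ending in '-ly' (a crude adverb proxy --
    it misses 'well' and miscounts 'family', but shows the idea)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    return sum(w.lower().endswith("ly") for w in words) / len(words)

def dialogue_share(text):
    """Fraction of characters inside double-quoted spans."""
    quoted = sum(len(m) for m in re.findall(r'"[^"]*"', text))
    return quoted / len(text) if text else 0.0

sample = 'She spoke softly. "Hello," she said quickly.'
print(adverb_density(sample))   # 2 of 7 words end in -ly
print(dialogue_share(sample))
```

Writing "towards a target number" for metrics like these is then just an optimisation loop, which is part of why I'd expect a language model to hit the targets without much trouble.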

The other is plot generation. On that:

From the canon of stories that I’d provided, SciFiQ offered two plot instructions that seemed incompatible: the story had to be about a foreign planet, and it also had to take place on Earth. It took months to make sense of that, but eventually the premise of “Twinkle Twinkle” came to me. The story would involve people on Earth looking, through elaborate machines, at a distant planet. I never would have come up with that myself.

...

For “Twinkle Twinkle,” Hammond took the topic modeling output and converted it into manageable narrative rules. (For example: “The story should be set in a city. The protagonists should be seeing this city for the first time and should be impressed and dazzled by its scale.”) For “Krishna and Arjuna,” I went under the hood myself. The algorithm’s topic modeling process produced word clouds of the most common themes (see below).

This story is suspiciously light on detail on how SciFiQ produced these plot instructions, and on how much detail it actually gave. An AI that actually understands the meaning of the content of a story would be a huge achievement, with far-reaching ramifications. The fact that the modeling output had to be "converted into manageable narrative rules" by an English professor also leaves me wondering how much is actually the machine here and how much is the professor.

But even taken at face value, what's described here isn't actually plotting. It's just a set of prompts. Here's an example of the word cloud that it generated:

[Image: JA20_fiction_wordcloud-reverse.jpg]

Nothing in there even remotely resembles a plot. It's a bunch of things that could be put in a plot, but as the author acknowledges, it took months of human thought to figure out how to turn the computer's prompts into a story.
 
But even taken at face value, what's described here isn't actually plotting. It's just a set of prompts. Here's an example of the word cloud that it generated:

Nothing in there even remotely resembles a plot. It's a bunch of things that could be put in a plot, but as the author acknowledges, it took months of human thought to figure out how to turn the computer's prompts into a story.

I think the idea of AI writing fiction is way overblown at this point. Obviously, it could change, but I think that sort of change would require a development of something uncomfortably close to consciousness. That's not to say we won't get there, but I doubt it's imminent.

But the real reason for this comment is that I'm jealous of your word cloud. It's easier to view than the ones from the generator I use because the different colors help. (If I'm being honest, I really like it because it's prettier.) Would you mind posting a link?
 
I think the idea of AI writing fiction is way overblown at this point. Obviously, it could change, but I think that sort of change would require a development of something uncomfortably close to consciousness. That's not to say we won't get there, but I doubt it's imminent.

But the real reason for this comment is that I'm jealous of your word cloud. It's easier to view than the ones from the generator I use because the different colors help. (If I'm being honest, I really like it because it's prettier.) Would you mind posting a link?
Have you tried Worditout? It generates many different versions, counting the most frequent hundred words. If you don't like a view, it regenerates another one in a second or two.
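For anyone wanting to roll their own: the counting step behind any of these word cloud tools is just a frequency tally of the top N words; the layout and colouring are the hard (and, evidently, the prettier-or-not) part. A minimal sketch, with an invented stopword list:

```python
import re
from collections import Counter

def top_words(text, n=100,
              stopwords=frozenset({"the", "a", "and", "of"})):
    """Return the n most frequent words, ignoring a few stopwords --
    the raw input a word cloud renderer then sizes and places."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

print(top_words("the quick fox and the slow fox saw the hen", n=2))
```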
 
Have you tried Worditout? It generates many different versions, counting the most frequent hundred words. If you don't like a view, it regenerates another one in a second or two.

That's the one I currently use. I think the different colors in the one Bramble used make it easier to notice the smaller words.
 
But the real reason for this comment is that I'm jealous of your word cloud. It's easier to view than the ones from the generator I use because the different colors help. (If I'm being honest, I really like it because it's prettier.) Would you mind posting a link?

It's just an image from the article that astuffedshirt_perv linked to - I don't have a generator, sorry.
 
If you agree with the statement (or sentiment) that “everyone has a good book in them” - which is arguably true (everyone’s life story is a book) - the fact is that not everyone knows how to write, or even wants to write. Those who want to write have tools these days to help.

But re this idea of ‘bots’ needing material to work on: we have the tools, so assuming a person is willing to share, a good conversation could supply that material (think of good chat show hosts like Parkinson or, dare I say, Piers Morgan - whose ego needs no support, but who is effective at getting worthy life stories out of celebrities).

The buzzword these days is ‘content’, so there is scope to get a lot more of it from humans for a bot to usefully use. It’s the same concept as machine learning.

But to EB’s and Eon’s point: at what stage does a bot come up with original ideas, so that we end up with surprising stories? That might crack, or prove, another statement: that there is only a set number of distinct types of story plots to be told (I forget the number, but I think it’s about 7). And if that’s also the case, would true AI change that number, or is it more like physics, in that these are immutable truths (though truths as humans record them)?

Brutal One

Edit:-

The 7 plot types are:-

Overcoming the Monster

Rags to Riches

The Quest

Voyage and Return

Comedy

Tragedy

Rebirth

I have not really thought about it in Lit terms, but would all Lit’s stories fall under one of these 7? Maybe a topic for another thread.
 
I have not really thought about it in Lit terms, but would all Lit’s stories fall under one of these 7? Maybe a topic for another thread.

Kev's was a Rags to Riches story of a young man who couldn't get laid in a whorehouse. Then he went on The Quest; it started out as a Tragedy that became a Comedy; in Overcoming the Monster he discovered something. Gary helped; it was a Rebirth, his heterosexual Voyage out and Return as a leather daddy.

As to the original topic. Since they can't even sell reliable voice recognition and typing software, we are all probably...

You guys are probably safe for a while. Our tripe can be emulated, however.
 