Gender Guesser

PBAnnie (Really Experienced; joined Feb 9, 2005; 128 posts)

About Gender Guesser. In 2003, a team of researchers from the Illinois Institute of Technology and Bar-Ilan University in Israel (Shlomo Argamon, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni) developed a method to estimate gender from word usage. Their paper described a Bayesian network where weighted word frequencies and parts of speech could be used to estimate the gender of an author. Their approach made a distinction between fiction and non-fiction writing styles.

A simplified version of this work was implemented as the Gender Genie (no longer available). They showed that fewer words were needed and that writing styles varied based on the forum. For example, fiction and non-fiction differ from blogs (informal writing). Even though the genres differ, there are still gender-specific word frequencies.

In case anyone is looking for ways to make their writing appear to be written by the opposite sex, this tool and various papers written on the subject may be useful to you.

http://www.hackerfactor.com/GenderGuesser.php#Analyze

I tried it on something I wrote and it says it doesn't know what I am. Oh well...
 
Okay, so when I got to my laptop I decided maybe I'd, just for the hell of it, read the instructions on the page.

For fiction, use the "Formal" analysis.

So, "Tad's" output in chronological order, trying to be fair about who contributed to what:

Honeymoon with Mom - authors P, D and J
Genre: Formal
Female = 2263
Male = 1900
Difference = -363; 45.64%
Verdict: Weak FEMALE

Mothers Gone Wild - authors P, J and D
Genre: Formal
Female = 3483
Male = 2629
Difference = -854; 43.01%
Verdict: Weak FEMALE

The Mom Next Door - authors P and J
Genre: Formal
Female = 2759
Male = 2466
Difference = -293; 47.19%
Verdict: Weak FEMALE

Settling the Score - authors J and P
Genre: Formal
Female = 4923
Male = 3868
Difference = -1055; 43.99%
Verdict: Weak FEMALE

Hostile Makeover (in progress) - authors J and P
Genre: Formal
Female = 3200
Male = 2125
Difference = -1075; 39.9%
Verdict: FEMALE
Weak emphasis could indicate European.
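The summary lines can be reproduced from the two totals. In every result posted in this thread, Difference is Male minus Female, and the percentage is the male share of the combined score, truncated rather than rounded to two decimals (2466 / 5225 = 0.47196 is shown as 47.19%, not 47.20%). The verdict bands below are an assumption inferred from those same results (under 40% gave FEMALE, 40 to 50% gave Weak FEMALE); the male-side bands simply mirror them, and the page's real cut-offs and its separate "weak emphasis" check may work differently.

```javascript
// Reproduces the "Difference" and percentage lines from the two totals.
// The verdict cut-offs are ASSUMED (inferred from results in this thread),
// not read from the page's source.
function summarize(female, male) {
  const total = male + female;
  if (total === 0) throw new Error("no dictionary words matched");
  const difference = male - female; // negative leans female
  // Male share of the total, truncated (not rounded) to two decimals.
  const malePercent = Math.floor((male * 10000) / total) / 100;

  let verdict;
  if (malePercent < 40) verdict = "FEMALE";           // assumed cut-off
  else if (malePercent < 50) verdict = "Weak FEMALE"; // assumed cut-off
  else if (malePercent <= 60) verdict = "Weak MALE";  // assumed mirror of female side
  else verdict = "MALE";

  return { difference, malePercent, verdict };
}

// Totals from "Honeymoon with Mom" above:
// difference -363, malePercent 45.64, verdict "Weak FEMALE"
console.log(summarize(2263, 1900));
```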
 
Interesting. I got different results for different stories, which I guess is good, because it means my writing is adaptable. I was glad to see that the story I'm working on in first person female POV indicated "female."
 
The gender guesser has been around for a while. When I played with it before, it got mine right about half the time, maybe less.
 
"It also looks for weak emphasis -- used to distinguish European English from American English. In general, if the difference between male and female weight values is not significant (a "weak" score), then the author could be European. This is because the weight matrix is biased for distinguishing genders in American English."

Who are these Europeans of whom they speak? The French? The Germans? The Italians?
 
A Girl on the Bus
Genre: Formal
Female = 1134
Male = 731
Difference = -403; 39.19%
Verdict: FEMALE

A Girl on the Bus Part 6 (lots of sex in this chapter)
Genre: Formal
Female = 5083
Male = 2761
Difference = -2322; 35.19%
Verdict: FEMALE

Garter Belts and Cigarettes (the closest I've got to stroke)
Page 1
Genre: Formal
Female = 5685
Male = 3859
Difference = -1826; 40.43%
Verdict: Weak FEMALE

Page 2
Genre: Formal
Female = 3908
Male = 2499
Difference = -1409; 39%
Verdict: FEMALE

Well, there ya go. EB's slow-burn intimacy = just like a woman. I'm actually quite pleased about that.
 
I followed the link, and I can't find any empirical support for the guesser. Is there any reason to believe it has any accuracy? If so, I can't find it.
 
I tried a bunch of clips from stories I have posted here and the results were all over the place. The most common result for both Formal and Informal was "Weak Female," which my partner had to make a smart-ass remark about.

Probably just too many variables (personality, culture, age, education, life experience) that shape someone for anything like this to be really credible. But it was fun to try it out.
 
There was a thread not too long ago from someone having trouble writing dialog. This gender writing research won't help anyone create new dialog, but some of the results described in the paper (linked in an earlier post) could help us write dialog that's more convincingly coming from a male or female. One thing they pointed out is that women are more likely to use singular pronouns (I, you, she) and men are more likely to use plural pronouns (they, us, we).

One of the tables in that paper listed some of the strongest predictors for an author's gender. A bunch of it uses codes that may be familiar to linguists, but I am not one of those and would need to do more research to figure out what they mean.
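That singular-versus-plural finding is easy to check against your own dialog. This rough sketch just counts the two kinds of pronoun in a line. The pronoun lists are illustrative assumptions, not the paper's features; "you" is counted as singular, as in the post above, and contractions such as "I'm" are not split, so they aren't counted.

```javascript
// Illustrative pronoun lists (assumptions, not the paper's feature set).
const SINGULAR = new Set(["i", "me", "my", "you", "your", "she", "her", "he", "him"]);
const PLURAL = new Set(["we", "us", "our", "they", "them", "their"]);

// Count singular vs plural pronouns in a passage of dialog.
function pronounTally(text) {
  // Lowercase runs of letters/apostrophes; "I'm" stays as one token "i'm".
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  let singular = 0;
  let plural = 0;
  for (const w of words) {
    if (SINGULAR.has(w)) singular++;
    else if (PLURAL.has(w)) plural++;
  }
  return { singular, plural };
}

console.log(pronounTally("I told you she would call me."));
console.log(pronounTally("We said they should meet us there."));
```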
 
I ran two of my stories through, removing the "" as instructed. On both, they came out as follows.

Genre: Informal
Female = 10672
Male = 15294
Difference = 4622; 58.9%
Verdict: Weak MALE

Genre: Formal
Female = 10570
Male = 7645
Difference = -2925; 41.97%
Verdict: Weak FEMALE

As I read it, "Weak" means the guessing software isn't certain. I'm not sure what the difference is between the formal and informal analysis, or how it determines that, but it tells me I'm apparently a morphodite: female and male at the same time.
 
I checked out the article regarding the study.

It's interesting, but I'm not sure it has any meaningful validity.

1. The sample size seems low to me: 132 fiction stories by males and 132 by females.

2. It's not clear how they selected the articles. Without confidence in the manner in which articles were selected it's hard to have any confidence in the results.

3. I wish there was more analysis of what the supposed differences are. It indicates that women are more likely to use pronouns and men are more likely to use determiners (or articles, like "a" and "the"). But this could be a result of the sampling reflecting a bias, i.e., what if they just happened to choose a disproportionate number of articles by women with a lot of pronouns. I suspect it makes a big difference whether the story is told in first or third person.
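The point-of-view concern in point 3 can be illustrated with a quick sketch that measures pronouns and determiners per 100 words for two renderings of the same moment. The word lists are small, hand-picked assumptions (the paper describes using parts of speech, not lists like these), but the comparison shows how switching from first to third person can move both rates without the author changing.

```javascript
// Hand-picked word lists: assumptions for illustration only.
const PRONOUNS = new Set(["i", "me", "my", "you", "he", "him", "his", "she", "her", "we", "us", "they", "them"]);
const DETERMINERS = new Set(["a", "an", "the", "this", "that", "these", "those"]);

// Occurrences of each category per 100 words of the passage.
function ratesPer100(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  if (words.length === 0) return { pronouns: 0, determiners: 0 };
  const count = (set) => words.filter((w) => set.has(w)).length;
  return {
    pronouns: (100 * count(PRONOUNS)) / words.length,
    determiners: (100 * count(DETERMINERS)) / words.length,
  };
}

// The same action told in first person and in third person.
const firstPerson = "I grabbed my coat and I left the party.";
const thirdPerson = "The woman grabbed the coat and left the party.";
console.log(ratesPer100(firstPerson), ratesPer100(thirdPerson));
```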
 
I ran several more stories; the results are almost always the same, though male and female flip-flop between formal and informal. It never gives me just Man or Woman, only weak man or weak woman.

As to using "the woman" or "a woman," or "the man" or "a man," compared to "he" or "she," I try to switch that about so it isn't too repetitive. There is nothing more boring than every sentence starting with He, She, I, or We.
 
And that's fine, if it helps someone, that's all good. I probably shouldn't have said "credible." "Universal" would have been better.
 
I threw five stories into it. Three were mine and two were by female writers where I was the editor on the story.

On all three of mine, in the "informal genre" they all came up "weak male."

With the two female writers in "informal," both came up male!

I'll have to talk with my friend Eva_Adams about that :)
 
If you view the source code for the page (Ctrl-U), you can see what it's doing. This is the meat of it:

// positive=male, negative=female
var DictionaryInformal = new Array();
DictionaryInformal['actually']= -49;
DictionaryInformal['am']= -42;
DictionaryInformal['as']= 37;
DictionaryInformal['because']= -55;
DictionaryInformal['but']= -43;
DictionaryInformal['ever']= 21;
DictionaryInformal['everything']= -44;
DictionaryInformal['good']= 31;
DictionaryInformal['has']= -33;
DictionaryInformal['him']= -73;
DictionaryInformal['if']= 25;
DictionaryInformal['in']= 10;
DictionaryInformal['is']= 19;
DictionaryInformal['like']= -43;
DictionaryInformal['more']= -41;
DictionaryInformal['now']= 33;
DictionaryInformal['out']= -39;
DictionaryInformal['since']= -25;
DictionaryInformal['so']= -64;
DictionaryInformal['some']= 58;
DictionaryInformal['something']= 26;
DictionaryInformal['the']= 17;
DictionaryInformal['this']= 44;
DictionaryInformal['too']= -38;
DictionaryInformal['well']= 15;

Every time you use a word from that list, it checks against that "dictionary" and if it finds a match, it adds the appropriate score to either the maleness or the femaleness score. Every "well" adds 15 to the maleness, every "too" adds 38 to the femaleness, and so on. Then it combines those totals to get an overall score.
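The loop described above can be sketched in a few lines. This version assumes a simple lowercase tokenizer (runs of letters and apostrophes), which may not match the page's own, and copies only eight of the informal weights, so its totals won't match the page's output for real text.

```javascript
// A subset of the DictionaryInformal weights quoted above
// (positive = male, negative = female).
const informalWeights = new Map([
  ["actually", -49], ["because", -55], ["so", -64], ["too", -38],
  ["some", 58], ["this", 44], ["the", 17], ["well", 15],
]);

function scoreText(text, weights) {
  let male = 0;
  let female = 0;
  // Assumed tokenizer: lowercase runs of letters and apostrophes.
  for (const word of text.toLowerCase().match(/[a-z']+/g) ?? []) {
    const w = weights.get(word);
    if (w === undefined) continue; // word not in the dictionary
    if (w > 0) male += w;          // positive weights count toward "male"
    else female += -w;             // negative weights count toward "female"
  }
  return { female, male };
}

console.log(scoreText("Well, this is actually the best, so take some too.", informalWeights));
```

In that sample, "well", "this", "the" and "some" add 134 to the male side, while "actually", "so" and "too" add 151 to the female side.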

The page states what it's based on: "In 2003, a team of researchers ... developed a method to estimate gender from word usage. Their paper described a Bayesian network where weighted word frequencies and parts of speech could be used to estimate the gender of an author."

Basically: men use some words more often than women do, and vice versa, with the scores above being based on those discrepancies.

I suspect this kind of method would be confounded by genre: e.g. if you trained something like this on newspaper articles, and it so happened that most sports articles were written by men and most parenting articles by women, then you'd end up with something that flags sports as "male" and parenting as "female".
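A toy sketch shows how a topic can leak into the weights. It derives a weight for each word as the log ratio of its smoothed relative frequency in "male" versus "female" training text. That weighting scheme is an assumption for illustration, not the paper's Bayesian network or the page's actual numbers, and the training sentences are made up so that the male side is all sports and the female side all parenting.

```javascript
// Word counts over a list of documents (lowercase letter/apostrophe tokens).
function wordCounts(docs) {
  const counts = new Map();
  let total = 0;
  for (const doc of docs) {
    for (const w of doc.toLowerCase().match(/[a-z']+/g) ?? []) {
      counts.set(w, (counts.get(w) ?? 0) + 1);
      total++;
    }
  }
  return { counts, total };
}

// Positive weight = relatively more common in the "male" corpus.
function learnWeights(maleDocs, femaleDocs) {
  const m = wordCounts(maleDocs);
  const f = wordCounts(femaleDocs);
  const vocab = new Set([...m.counts.keys(), ...f.counts.keys()]);
  const weights = new Map();
  for (const w of vocab) {
    // Add-one (Laplace) smoothing so one-sided words don't give log(0).
    const pm = ((m.counts.get(w) ?? 0) + 1) / (m.total + vocab.size);
    const pf = ((f.counts.get(w) ?? 0) + 1) / (f.total + vocab.size);
    weights.set(w, Math.log(pm / pf));
  }
  return weights;
}

// Made-up training text: topic and author gender are perfectly confounded.
const weights = learnWeights(
  ["the team won the game", "the coach praised the team"],
  ["the baby slept through the night", "the toddler refused the nap"],
);
console.log(weights.get("team") > 0, weights.get("baby") < 0); // true true
```

Nothing about "team" or "baby" is inherently male or female here; the weights only record which topic each side happened to write about.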

In the end, it's a bit better than tossing a coin, but not enough so that you'd want to use it for more than entertainment purposes.
 
It's a small contribution to traffic analysis, and that's where the money is. If you think the CIA (or the Chinese) is watching you, it's just one more factor to consider when you anonymise your content.
 
Here's another article along the same theme of which words are more likely to be used by women vs. men:

https://languagelog.ldc.upenn.edu/nll/?p=13873

The sample data used in this article comes from recorded conversations instead of written works, which may be more useful for writing appropriately gendered fictional conversations, like we try to do in our stories. This dataset may not be any better or worse than the one used in the other paper I linked. The same questions apply: who were the participants, and what were their backgrounds, education level, English proficiency, geographical location, etc.? The dataset also captures a particular snapshot in time (2003), and specific word usage can grow or decline fairly quickly, which you may consider to be either groovy or rad, depending on your decade.

There is a link near the end of the article to the whole list of words they evaluated along with how often that word was used and a relative gender score for each word.

All I need now is a gender thesaurus to be able to pick words that may be more appropriate for a given character in a story. I'll keep looking for that mythical device...
 
Their greatest sex difference is in laughter.

Men don't laugh 99.4% of the time.
Women don't laugh 99.2% of the time.

I assume they've detected a statistically significant difference between men and women; large numbers can do that. But when you partition the variance, a reality test, the difference is vanishingly small and has no practical application.

That's what I was taught 50 years ago, when men were better at maths than women: partition the variance, and the difference between the mathematical ability of men and women vanishes.
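The partition-the-variance point can be checked with a little arithmetic. This sketch reads the figures above as laughter rates of 0.6% for men and 0.8% for women; the unit and the sample sizes are assumptions, since the article's counts aren't quoted here. For equal-sized groups, the z statistic from a standard two-proportion test grows with the sample size, while phi squared (the share of variance explained, equal to z squared divided by the total count for a 2-by-2 table) stays at roughly 0.014%.

```javascript
// Two-proportion z test plus phi squared (share of variance explained).
// p1, p2: laughter rates; n1, n2: hypothetical group sizes.
function compareProportions(p1, n1, p2, n2) {
  const pooled = (p1 * n1 + p2 * n2) / (n1 + n2);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  const z = (p2 - p1) / se;                      // significance grows with n
  const varianceExplained = (z * z) / (n1 + n2); // phi^2 = chi^2 / N for a 2x2 table
  return { z, varianceExplained };
}

// Men laugh 0.6% of the time, women 0.8% (from the figures quoted above).
for (const n of [1000, 1000000]) {
  console.log(n, compareProportions(0.006, n, 0.008, n));
}
```

With 1,000 observations per group the difference isn't significant (z is about 0.5); with a million per group, z is about 17, yet the variance explained hasn't moved.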

Don't expect this research to yield anything you could usefully use in your writing.
 
Interesting link.

I'm often interested in the writing differences between men and women, and whether one gender can pass as the other without anyone knowing.

That conversation is more prominent with sex scenes. Some women laugh at men writing sex. Others say you can't tell. That's in reference to mainstream novels, rather than sex stories.

I ran 3 stories I'm working on through that site and got mixed results: some weak female, some weak male.
 
I deliberately chose long sex scenes from each book to submit to the analyzer. I wonder if long dialogue passages would yield way different results.
 
It would be interesting to do a kind of male/female "Turing test" where men and women deliberately attempted to write like someone from the opposite gender and to see if they could fool readers. My guess is they could, and it wouldn't be that difficult. My guess, too (and all of this is just silly guessing) is that women writers could impersonate men writers better than vice versa. It's not hard to sound like a man.
 