Emotions (and lack thereof)
Jonathan: You [Sean and a different friend] communicate in Jane Austen-type missives, we [Sean and I] communicate in grunts?
Sean: haha not quite that bad.
Jonathan: I know, exaggeration. Yeah, that makes sense. I’m not sure I’ve ever really talked to you about emotions or that sort of stuff.
Sean: Nope :)
Sean: Fun-looking emacs error I just got: “Error in post-command-hook: (error Lisp nesting exceeds `max-lisp-eval-depth’)”
Jonathan: BTW, since we switched to jabber on athena [2010-09-12], “happy” has occurred three times in conversation between us. Once in “happy birthday”, once in reference to someone else, and once in reference to future Sean earlier. We really don’t talk emotions :P
Sean: I shudder to think of the parentheses then.
Jonathan: Prime example just there :P
Jonathan: Yeah, that’s in 120KB of text logs. That’s less than 1 in 10000.
Jonathan: Wiktionary reckons 1 in ~2300 words in TV scripts is “happy”. [...]
Sean: This means we have a quarter of the emotional capacity of the average person.
Sean’s estimate of emotional capacity is obviously facetious (that’s my story and I’m sticking to it), but I feel duty-bound to point out a couple of things about the use-of-happy ratio:
- I miscalculated the frequency for us. Given that “happy”, including the trailing space, is 6B (my logs are UTF-8 encoded), 3*6B / 120KB is actually closer to 1 in 7000.
- The (correct) 1 in 7000 figure is a fraction of the logs as a whole, not of their actual conversational content: the 120KB includes an average overhead (if Sean speaks as much as I do) of 21 bytes/line of timestamp and name. Looking at the complete log of this conversation, which is probably fairly typical, suggests that this overhead is about 1/4 of the total size, i.e. our estimate is now ~1 in 5000 bytes of conversation. I don’t know the exact methodology Wiktionary used, but if they were trying to get frequencies in spoken English, they probably ignored the directions in scripts and just used the speech, so 1 in 5000 is closer to a like-for-like comparison with Wiktionary’s 1 in 2300 figure.
- Wiktionary’s count was word-wise, mine was byte (~letter)-wise. A comparison needs to take into account the length of “happy” compared to a typical English word. So, I took the complete text of Little Brother by Cory Doctorow, and encoded it in UTF-8, the same as my logs. It came to 613425 bytes, and OpenOffice.org Writer’s wordcount function said 111825 words, which works out at about 5.5 bytes per word, spaces included. So actually, our 6 bytes of UTF-8 is a pretty typical English word, and we probably don’t need to adjust the numbers much (certainly not for this back-of-the-envelope calculation) to make this comparison like-for-like.
- But Wiktionary’s figure is spoken English. Ours is IM’d English. This means several things, but most relevant is probably smilies. :) is 3 bytes of UTF-8 (counting a trailing space, as with “happy”), so half an average word by our estimate. But what’s the equivalent in spoken English? Is it nothing, because the information would be passed by body language if we were in the same room, or tone of voice on the phone? This is certainly true for a lot of :Ps, and probably many other uses. Or is it actually a few or several words, because we might say “that’s really cool”, or “that’s great”, or “That’s pretty good, but not brilliant” (else we’d use :D on IM)? My guess is that IM in general, and particularly smilies, will tend overall to be slightly compressed compared to spoken language, simply because we can talk faster than we can type. So, it would be 1 in more than 5000 if Sean and I were talking. Trying to quantify this is beyond both the scope of this post and my capabilities at 1am.
- I’ve been saying Wiktionary’s figure is spoken English. But it’s English as spoken in scripted TV. How does this affect the frequency of “happy”? My guess is that the frequency of “happy” on TV is lower than in real spoken English. Why? Because one of the first things that’s drilled into us when we start to write anything longer than single sentences is that we must find “interesting” ways to write things, and a TV writer has more of a chance to pick a different word that better tells us what he wants to say. In spoken English (particularly casual spoken English of the sort that’s likely to include “happy” etc.), we tend to use simple words, because we’re pulling them out of our heads and saying them within a fraction of a second, and because if more detail is needed, we can judge that and elaborate easily. So, “happy” is likely to be used in spoken English where TV English uses “thrilled” or “delighted” (or doesn’t say it at all, because screen time is limited and it can be conveyed other ways). So, it’s likely that “happy” is one in fewer than 2300 words in normal spoken English.
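The arithmetic in the list above can be sketched in a few lines of Python. This is only a rough check of the quoted figures, assuming “120KB” means 120,000 bytes and taking the timestamp-and-name overhead as exactly one quarter of the log; the variable names are made up for illustration, and none of the fuzzier adjustments (smilies, TV versus real speech) are modelled.

```python
# Rough check of the "happy" frequency figures quoted in the list above.
# Assumptions (not from the logs themselves): "120KB" = 120,000 bytes,
# and overhead is exactly 1/4 of the log.

HAPPY_BYTES = len("happy ".encode("utf-8"))  # "happy" plus trailing space
OCCURRENCES = 3
LOG_BYTES = 120_000
OVERHEAD_FRACTION = 0.25

# Byte-wise frequency over the whole log, then over conversation only.
raw_one_in = LOG_BYTES / (OCCURRENCES * HAPPY_BYTES)
content_bytes = LOG_BYTES * (1 - OVERHEAD_FRACTION)
content_one_in = content_bytes / (OCCURRENCES * HAPPY_BYTES)

# Typical word length, from the byte and word counts given for the book.
BOOK_BYTES = 613_425
BOOK_WORDS = 111_825
bytes_per_word = BOOK_BYTES / BOOK_WORDS

# Unadjusted ratio against the 1-in-2300 figure for TV scripts.
WIKTIONARY_ONE_IN = 2300
ratio = content_one_in / WIKTIONARY_ONE_IN

print(f"whole log:         1 in {raw_one_in:.0f}")
print(f"conversation only: 1 in {content_one_in:.0f}")
print(f"bytes per word:    {bytes_per_word:.2f}")
print(f"unadjusted ratio:  1:{ratio:.1f}")
```

The unadjusted ratio comes out a little above 1:2, which is the starting point before the hand-waved corrections for IM compression and TV writing are applied.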
Overall, I’d guess that the adjusted ratio of Sean’s and my use of “happy” to normal spoken English is somewhere between 1:2.5 and 1:5. But that’s the back of a mental envelope with no data at 1am, so feel free to disagree by several orders of magnitude.
If you need evidence of my horrific nerdery, though, it’s that I bothered to think about all this and point it out :P
 I can’t post the file I used, because I stripped out the Creative Commons license (which we can safely assume has an average word length far greater than that of English spoken by humans), so if you want to verify my results you’ll have to download your own copy of the book (which you should do anyway, and read for long enough to make you go out and buy a dead-tree copy) and do it yourself. That should still be close enough for the back of an envelope though.
 This is complete rubbish with “said”, the example that’s most drilled into us in school. We don’t normally notice “said” in writing, but it’s helpful if we need to work things out. So actually, it’s better in almost all circumstances than anything we were told to use in primary school, because most of the time we want to know what’s being said, and how it’s being said (to the extent the verb tells us that) is obvious. Alternatives to “said” should generally only be used to draw attention to how something is said. (Although “asked” and “replied” are probably as good or better where appropriate, because they are common enough that the same not-noticing thing also happens.) Actually, I think generally, any verb for speech is a lot less common in good writing than in primary-school writing. But that’s what they should be teaching us, not “enquired”, “exclaimed”, ...