Un-Natural Language on Twitter
A common language emerges on Twitter: gibberish
Researchers recently analysed 75 million publicly available tweets sent by 189,000 Twitter users. Using network science algorithms, they found that it was possible to associate the language used in these tweets with communities of users with a common interest. The researchers claim that “by looking at the language someone uses, it is possible to predict which community he or she is likely to belong to, with up to 80 per cent accuracy”. Taken together, this suggests that a ‘shorthand’ has spontaneously emerged to enable members of various ‘Twitter tribes’ to rapidly locate and identify each other.
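To make the idea of predicting a community from vocabulary concrete, here is a minimal sketch in Python. It is not the researchers' method (they used network science algorithms). It is a simple word-frequency classifier with add-one smoothing, swapped in for illustration. The community names and example tweets are made up, and the function names `train` and `predict` are hypothetical.

```python
import math
from collections import Counter

# Toy, invented tweets per community; NOT data from the study.
# The community labels are hypothetical.
TRAINING = {
    "twilight_fans": [
        "so robsessed right now",
        "robsessed with the new trailer",
    ],
    "melbourne_locals": [
        "heaps of rain in melb today",
        "ty melb heaps of fun",
    ],
}


def train(tweets_by_community):
    """Count how often each word appears in each community's tweets."""
    return {
        community: Counter(
            word for tweet in tweets for word in tweet.lower().split()
        )
        for community, tweets in tweets_by_community.items()
    }


def predict(counts, tweet):
    """Return the community whose word counts best explain the tweet."""
    vocabulary = set().union(*counts.values())
    best, best_score = None, -math.inf
    for community, word_counts in counts.items():
        total = sum(word_counts.values())
        # Add-one smoothing so unseen words do not rule a community out.
        score = sum(
            math.log((word_counts[word] + 1) / (total + len(vocabulary)))
            for word in tweet.lower().split()
        )
        if score > best_score:
            best, best_score = community, score
    return best


counts = train(TRAINING)
print(predict(counts, "heaps of gratitude melb"))
```

Even this crude approach leans entirely on tribe-specific words such as “melb” and “robsessed”, which is why a classifier can guess the community without understanding what any of the words mean.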
Brands hoping for an e-Esperanto to help decipher the Twitterverse — perhaps Twittermultiverse is a better term — will be disappointed. The graphic below, from the report*, shows the scale of the challenge for anyone trying to extract real meaning. It’s unlikely that even the most sophisticated natural language data mining software can make anything coherent of “linq”, “twug”, “robsessed”, “melb”, “ty” (and these are words used by English speakers) or even of the relatively plainspeak “gratitude” and “heaps” (these are important words to the sub-group).
The shorthand lingo is not new or unique to Twitter. Special interest groups have always evolved their own secret codes: @masonicnetwork and #masonic are just an electronic evolution of the handshake.
What may be significant is that the ‘shorthand’, while unstructured, follows some kind of rules. In each tribe or sub-group, a high-level word acts like a hashtag; beneath it are secondary words associated with it. The high-level words, and to a lesser extent the secondary words, are almost always acronyms, slang or emoticons, which complicates text mining. It’s worth remembering that secret codes, from handshakes to hashtags, are intended to exclude outsiders as much as to include insiders.
*The study, by researchers at Royal Holloway, University of London, and Princeton University in New Jersey, is published in EPJ Data Science.