Last week Reuters, the Financial Times and the Huffington Post reported a rather sensationalistic finding published by an Italian entrepreneur and contract university professor who is well known in Italian social media circles. His research allegedly showed that “up to 46 percent of Twitter followers of companies with active profiles could be generated by robots, or bots”.
His method, described in a research paper, uses a point system making the following assumptions:
Characteristics associated with “human” behaviour worth one point:
· The profile contains a name
· The profile contains an image
· The profile contains a physical address
· The profile contains a biography
· The user has at least 30 followers
· The user has been added to a list by other users
· The user has written more than 50 posts
· The user has been geolocalised
· The profile contains a URL
· The user has been included in another user’s favourites
· The user uses punctuation in posts
· The user has used a hashtag in their posts at least once
· The user has used an iPhone to log in to Twitter
· The user has used Android to log in to Twitter
· The user has posted with Foursquare
· The user has posted with Instagram
· The user has used the Twitter.com website
· The user has written the userID of another user inside at least one post
· The user has a number of followers which, if doubled, is greater than the number they are following
· The user publishes content which does not just contain URLs
Characteristics associated with “human” behaviour worth two points:
· At least one post has been retweeted by other users
Characteristics associated with “bot” behaviour worth one point:
· For each characteristic on the “human” list which has not scored points, one “bot” point will be assigned, with the exception of the following:
- the user has logged in through different clients
- the user uses the website
- the user has used Android
- the user has used iPhone
- the user has posted with Foursquare
- the user has posted with Instagram
· The user uses only APIs
If any one characteristic of “human” behaviour is true, the corresponding “human” points will be assigned. If it is false, the corresponding “bot” points are assigned.
Conversely, for each “bot” behaviour characteristic, if it is true, “bot” points will be assigned. If it is false, “human” points are assigned.
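To make the point system concrete, here is a minimal sketch of how the rules above could be coded. It is my own reading of the description, not the author's implementation: the trait names are hypothetical labels I made up for the bullets, I assume that a profile with no retweeted post gets a single “bot” point, and I leave out the “different clients” exemption because it does not correspond to any item on the “human” list. The description gives no threshold for calling an account a bot, so the sketch only returns the two totals.

```python
# Hypothetical labels for the twenty one-point "human" characteristics.
ONE_POINT_HUMAN = [
    "has_name", "has_image", "has_address", "has_biography",
    "at_least_30_followers", "listed_by_others", "more_than_50_posts",
    "geolocalised", "has_url", "favourited_by_others", "uses_punctuation",
    "used_hashtag", "used_iphone", "used_android", "used_foursquare",
    "used_instagram", "used_website", "mentioned_other_user",
    "followers_doubled_exceed_following", "posts_not_only_urls",
]

# Characteristics that do NOT earn a "bot" point when they are missing.
EXEMPT_FROM_BOT_POINT = {
    "used_website", "used_android", "used_iphone",
    "used_foursquare", "used_instagram",
}


def score_profile(profile):
    """Return (human_points, bot_points) for a dict of trait -> bool."""
    human = 0
    bot = 0
    for trait in ONE_POINT_HUMAN:
        if profile.get(trait, False):
            human += 1
        elif trait not in EXEMPT_FROM_BOT_POINT:
            bot += 1

    # Two-point "human" characteristic: at least one post was retweeted.
    if profile.get("was_retweeted", False):
        human += 2
    else:
        bot += 1  # assumption: a missing two-point trait costs one bot point

    # The only explicit "bot" characteristic: the user uses only APIs.
    if profile.get("uses_only_apis", False):
        bot += 1
    else:
        human += 1
    return human, bot


# A lurker: real person, filled-in profile, reads via the website, never posts.
lurker = {
    "has_name": True,
    "has_image": True,
    "has_biography": True,
    "used_website": True,
}
print(score_profile(lurker))
```

Under these assumptions the lurker collects 5 “human” points against 13 “bot” points, simply for reading without writing.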
The algorithm based on the scoring system above was run on the followers of 13 international companies and 26 Italian ones, mostly from very different industry sectors.
I guess it is quite obvious that most of these measures are rather arbitrary and debatable, and that changing the scoring system would be both easy and plausible. For instance, are people who mostly lurk and do not write tweets any less human than those who are compulsive writers and retweeters? Does using a mobile device make somebody more human? Does using twitter.com rather than one of the many Twitter applications for PC and mobile devices make somebody more human?
Also, a sample of 39 companies, 26 of them Italian, is hardly representative.
Given all this, it is somewhat remarkable that this research got as much exposure as it did, for which one must clearly credit the professor-entrepreneur’s marketing skills. On the other hand, it proves what a former colleague of mine used to say: if you torture the data long enough, they will confess to anything.
After all, in a blog post, the author says:
“Now I wish that somebody will make the effort of re-processing the data I provided in my research, changing both method and algorithm, in order to get different results (with bigger or smaller numbers, it does not matter). My research has no presumption, unless opening a gate that – I hope – will be crossed, and possibly better, by others.”
Which is like saying that he is not entirely sure about what he published. While this supports the point about how data can be tortured, it also casts a shadow on what people who hold academic positions in this country consider to be research.