If you thought this blog post was perfectly fine – not Hemingway but OK – what rating would you give it on a 1-5 scale? It seems the standard practice would be to give it a “5” unless it was truly terrible.
I am doing data analysis on a survey, completed by thousands of respondents, with several 1-5 questions. As I try to extract meaningful results I am stymied by the culture of “reputation inflation” that has left almost all scores at 4 or 5. Out of 1.4 million 1-5 responses in my dataset, over 91% are 4 or 5 (with considerably more 5s than 4s). Only 0.2% gave a score of 1, and in my anecdotal experience half of those didn’t read the directions and thought 1 was the highest score.
When did 5 become “ok, fine, nothing I should complain about” instead of 3? If something truly is above and beyond, where does that leave me room on the scale to indicate it?
In some cases this situation is built into the interpretation of the ratings. Uber drivers need to maintain a minimum rating to continue driving for the firm – one that assumes reputation inflation. That cements the inflation since now anything less than a 5 may kick them onto the unemployment line. I received a product from Amazon with a card indicating how I should rate the company and reminding me that “5 star means no problem!”. Finding the best hotel on Trip Advisor or restaurant on Yelp means parsing whether it’s worth driving a bit further for a 4.3 instead of a 4.2.
In a culture where 5 means “OK”, all average ratings with a sufficient number of raters blend into shades of beige that strain the eyes of the user trying to tell them apart.
I recommend adding textual descriptions next to or instead of the numerical scale. For example:
“3: everything was perfectly fine”.
Or perhaps asking two questions that probe the above and below average situations separately. For example:
“Was anything about the service above and beyond expectations?” and “Were there any problems or issues with the service?”.
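As a minimal sketch of how this two-question approach could be scored (my own illustration, not an established scheme – the function names and the example response mix are assumptions), each yes/no pair collapses into a net value that keeps “above and beyond” and “had problems” distinguishable in a way a single inflated 1-5 score cannot:

```python
def net_score(above_and_beyond: bool, had_problems: bool) -> int:
    """Map the two yes/no answers onto a -1 / 0 / +1 net value."""
    return int(above_and_beyond) - int(had_problems)


def summarize(responses):
    """Return the share of clearly positive, clearly negative, and neutral responses."""
    scores = [net_score(a, p) for a, p in responses]
    n = len(scores)
    return {
        "net_positive": scores.count(1) / n,
        "net_negative": scores.count(-1) / n,
        "neutral": scores.count(0) / n,
    }


# Hypothetical example: 1,000 responses where most people report
# neither special praise nor any problems.
responses = (
    [(True, False)] * 150   # above and beyond, no problems
    + [(False, True)] * 50  # problems reported
    + [(False, False)] * 800  # perfectly fine, nothing to flag
)
print(summarize(responses))
```

Unlike an average of 5s, the negative share here stays visible instead of vanishing into the mean.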
But simplicity, impatience, habit, suspicion (of how ratings may be faked or used against the rater or ratee), and apathy conspire against changes that may produce a more accurate assessment.
Reputation inflation forces the poor folks trying to do data analysis on these ratings to pull out a statistical magnifying glass to discern better from worse, find correlations, and see trends. In one case my data has so many data points that one 4.81 is higher than another 4.81 with statistical significance at the 95% level – a difference apparent only in the third or fourth decimal place.
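To see how huge samples make such tiny differences statistically significant, here is a sketch of a large-sample two-mean z-test. The means, standard deviations, and group sizes below are illustrative assumptions, not the actual survey data:

```python
import math


def welch_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample z statistic for a difference in means (large-n approximation)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))


# Both groups round to "4.81", but differ in the third decimal place.
# Group sizes and standard deviations are made up for illustration.
z = welch_z(4.8149, 0.52, 700_000, 4.8123, 0.52, 700_000)
print(round(z, 2), two_sided_p(z) < 0.05)
```

With 700,000 responses per group, a gap of 0.0026 between two ratings that both display as 4.81 clears the 95% significance bar comfortably – which is exactly why such differences are invisible to users but real to the analyst.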
I can interpret this data, but it is harder to produce charts that help users see differences clearly. The data set would be more meaningful if there were clear ways to differentiate ratings of “well above average” and “slightly poor” products and services. The rating culture at this point attributes a “5” to both of those situations, diminishing the whole point of ratings in the first place.
So go ahead and rate this blog a 5. Did you think it was just “no problem”, or did you truly love it? I’ll never know.
Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.