Review scores. You know and love (or hate) them. They’re attached to most video game reviews, and they have a tendency to stir up controversy and debate. But they’re also highly subjective, and they’ve come under a lot of criticism, either for failing to meaningfully encapsulate a full review or for reducing video game criticism and gaming conversations to a pointless debate about a number.
In this two-part series of articles, I’m diving into the question of review scores to figure out what they mean, why we have them, and whether they’re worthwhile. In this first article, I simply want to find out what review scores are and how they’re used.
What on Earth Do Review Scores Mean?
It’s not an easy question to answer. What’s an “8/10”, for example? On its own, it doesn’t really mean anything. It’s not like the game took a multiple-choice test and got 80% of the answers right. It’s subjective (obviously), but more importantly, it’s relative to other scores. So for any given review site, it’s important to know how its scores line up with how you feel about those same games. This is something you can do for yourself fairly easily, and most gamers assuredly do.
But it’s also important to know how a particular site uses scores compared to other sites. A site might provide a brief guide vaguely explaining what a particular score means, but in reality you can only judge a site by which scores it hands out most often, and to which games.
To illustrate, here’s a breakdown of the review scores given by three of the largest gaming review websites over the past three months (April 22-June 22). Because Polygon and IGN both give scores with decimals, I’ve grouped all scores into whole-number ranges for ease of comparison. An “8+” here means anything from “8.0” to “8.9”. So here are some things we can discern from the information:
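The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each score is a number out of 10; the function names `bucket` and `summarize` are hypothetical, and the example scores are invented for demonstration, not the sites’ actual data.

```python
import math
from collections import Counter
from statistics import mean, median


def bucket(score: float) -> str:
    """Map a decimal score to its whole-number range, e.g. 8.7 -> '8+'.

    A perfect 10 gets its own label, since nothing lies above it.
    """
    whole = math.floor(score)
    return "10" if whole >= 10 else f"{whole}+"


def summarize(scores: list[float]) -> dict:
    """Count scores per range (lowest range first) and report median and average."""
    counts = Counter(bucket(s) for s in scores)
    ordered = dict(sorted(counts.items(), key=lambda kv: float(kv[0].rstrip("+"))))
    return {
        "counts": ordered,
        "median": median(scores),
        "average": mean(scores),
    }


# Hypothetical scores for illustration only; these are NOT real review data.
example = [9.0, 8.5, 8.0, 7.5, 7.0, 6.5, 10.0, 5.5]
print(summarize(example))
```

Grouping by whole number throws away the decimal, so a 7.9 and a 7.0 land in the same range; that loss of detail is the trade-off for being able to compare sites that score differently.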
Review Scores Are Consistent Across the Sites
The three sites are fairly consistent in how their scores cluster, and their median and average scores are close. While there is some variation, it’s clear that “10/10” games are elusive (GameSpot’s single 10 was Uncharted 4: A Thief’s End) and that the vast majority of games land between “6” and “9”.
This kind of universality of review scores isn’t surprising, especially in the era of Metacritic, which has become a major force in standardizing game review scores. The inherent assumption in Metacritic is that an “8/10” from GameSpot is the same as an “8/10” from IGN. If that’s not true, then Metacritic’s model of averaging those scores makes no sense. But the popularity of Metacritic and its importance in driving click-throughs creates a feedback loop: it pushes game reviewers to make sure that their review scores mesh with a universalized concept of what those scores mean.
Now, when reviewers give a game an “8/10”, they’re measuring it against the general concept of an “8/10” that operates across all mainstream gaming journalism. Metacritic further reinforces this model by making it clear which scores are good, mixed, or bad through its color-coding. In Metacritic’s world, there are three main categories: green (75+), yellow (50-74), and red (0-49). Thus a site that considers a “7/10” a genuinely strong score, and a “5/10” a decent game, is in a sense overruled and pushed to comply with the rest of the gaming world.
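To make the color-coding concrete, here is a rough sketch of how a 10-point score might be placed in the three bands listed above. It assumes a simple linear conversion to a 0-100 scale; Metacritic’s actual conversion process isn’t described here, and the function names are hypothetical.

```python
def to_metascale(score: float, out_of: float = 10) -> float:
    """Rescale a score to 0-100 (assumes a simple linear conversion)."""
    return score * 100 / out_of


def color_band(score_100: float) -> str:
    """Place a 0-100 score in the green/yellow/red bands described above."""
    if score_100 >= 75:
        return "green"
    if score_100 >= 50:
        return "yellow"
    return "red"


for s in (8.0, 7.0, 5.0, 4.0):
    print(s, color_band(to_metascale(s)))
```

Under these assumptions, a 7/10 converts to 70 and lands in yellow no matter what the reviewing site meant by it, which is exactly the kind of overruling described above.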
The More Reviews a Site Does, the Lower Its Median Score
GameSpot published the most reviews in the past three months, and also had both the lowest average and the lowest median review scores. Meanwhile, Polygon published fewer than half as many, and had the highest median. While three data points aren’t enough to be conclusive about this trend, it does make sense. Most review sites are going to review the big AAA titles because they drive the most traffic and get the most attention on sites like Metacritic. AAA games, by definition, have a fair amount of money and development resources behind them, so they tend to meet a minimum quality bar. They might not be great, but they’re unlikely to be truly broken or abysmal (although there are, of course, exceptions).
So if we look at Polygon’s 27 reviews, there’s a good chance most of them cover major releases. GameSpot and IGN reviewed far more, and while they likely reviewed most of the same games as Polygon, their extra reviews are probably of games with less financial backing: titles from small and indie developers, or smaller projects from major developers. There’s lots of gold here, but there are also plenty of buggy, broken, or simply poorly conceived products in this category. So it makes sense that the more of these a site reviews, the lower its average scores will skew.
Review Sites Tend to Use Only the Top of Their Scoring Scales
We’ve all become accustomed to this by now. Most sites have wide scoring scales, but it’s rare that a game gets less than half of the maximum. Because of this, it can often feel that the range of review scores is between 6 and 9, rather than 1 and 10. The three sites examined here are the same in this regard. We can again blame Metacritic in part, as it gives anything less than a 50/100 a “red” score, like you failed your math midterm.
But even Metacritic’s color-coding probably reflects a societally learned understanding of numeric ratings. My explanation for this use of the upper half of the scale, although I have no evidence for it, is that we’re trained by school. Pretty much all of us went through a school system that gave us some sort of grade. And even if that grade took the form of letters or some other coding system, we generally understood that it corresponded to a percentage grade. Getting an ‘A’ usually meant above 80%, and a ‘B’ meant 70-79%. Most importantly, though, an ‘F’ typically meant less than 50% (or 60%, depending). But it was effectively all one grade. It didn’t really matter whether you got a 16 or a 45: you failed. So half of the whole range of grades was just a ‘fail,’ and the non-failing grades occupied the other half.
I think we see this repeated in game review scores, where we only give a game less than a 5 if we consider it an utter failure. It then doesn’t really matter whether it’s a 2, 3, or 4: it’s just a bad game you shouldn’t play. And 1s, I assume, are reserved for the worst of the worst: games that are comically unplayable.
But there’s no reason review scores have to be this way. We seem to have accepted that a “7/10” is an okay game, one that has merit but also a lot of problems or limitations. There’s no reason this shouldn’t be, say, a “5/10”, which would make lower scores more meaningful while leaving more room for differentiation at the top. But again, because of school, we tend to think of 70% as an ‘OK’ grade, and we certainly don’t think of 50% that way.
So How Can We Interpret Review Scores?
Based on our discussion so far, here’s a breakdown of how I see review scores when normalized to a 1-10 scale:
10: One of the best games ever made, and should be played by anyone who enjoys video games in any way related to its genre. People who have not played this game can be asked if they even game, bro.
9: One of the best games of the year in its genre, and a contender for overall game of the year. Players who like games of this type should definitely pick it up. It comes highly recommended. The game goes beyond expectations to offer an experience that rises above the pack.
8: One of the most common review scores, and the lowest that can be considered a true recommendation. Either an excellent indie that differentiates itself strongly from others of its kind, or a AAA title that met all major expectations without going much beyond. Nothing here is revolutionary, but you can bet the game is both playable and fun, showcases either interesting new mechanics or very polished versions of existing mechanics, and is worth your investment if you like games of that type.
7: What we might call a perfectly average game, but actually a slight disappointment. Lots of games fall in this range. No game aspires to be a 7/10, so no matter what, some promises haven’t been fully delivered on, or some aspects weren’t executed properly. It could be a good game marred by too many bugs, a mediocre game overall, or a game with one or more elements so undercooked it undermines everything else. It’s therefore a buyer-beware score. The game is probably still fully playable, and will be enjoyable to the right audience, but its limitations hold it back from getting the clear recommendation that attaches to games that score 8 or higher. A game you might want to pick up on sale once you’ve cleared your backlog.
6: A “6” is a strange score. It’s often the lowest score one can give without calling the game an outright failure. But it’s clear that nothing in this range is recommended. It’s almost always a signal of a clear disappointment, of something that might have had potential but was executed far below that promise. A 6 means there is something there, some moments of fun or engagement or immersion, but they might be hard to find. There may be an audience out there who will enjoy it, but it’ll be a small and patient one devoted to games of its type. At this level, though, the game’s fundamental playability might be in question due to bugs or other issues such as server stability. Definitely a game to get on sale, if at all.
5: Don’t buy. A “5” is pretty much a failure. No matter how much potential the game might have had, it didn’t just miss delivering, it offered no compelling reason to choose this game over any other. GameSpot calls a 5 “mediocre” but this is misleading. A mediocre game might still be fun, playable, and enjoyable. I often make myself what I would consider a very mediocre dinner but still enjoy it. Games are meant to be enjoyed, so a mediocre game should still be worthwhile. A 5 isn’t. It’s dull, broken, or offensive. It probably just feels like work to get through. Only the most diehard fans of its genre or series will enjoy it.
4: Downright bad. You shouldn’t buy this game, even if you like games of this type. It might be riddled with bugs and nearly unplayable, or just plainly devoid of fun. Games at this level are typically doomed from conception, as they show few redeeming qualities. Still, there’s some game there, however pointless it may seem, and maybe it’s even pretty to look at. You might consider playing it for an hour or so if you got it for free, but even then, maybe not. Either way, it misses its mark badly.
3: Terrible. A total waste of time to play. Someone should pay you to play this game. It’s probably either too broken to complete in any meaningful way or simply devoid of interesting content.
2: Same as “3”, really, except maybe a bit more broken and unplayable.
1: Laughably bad. The kind of game one plays just for kicks or to make memes about. It has no redeeming qualities whatsoever, and it makes one wonder how anyone could possibly think it was socially acceptable to release it in the first place.