© Getty Images (representational photo)

The serious suspicion of numbers among cricket fans and analysts arises not from the shortcomings of statistics in telling the full story, but from the inability to look at data in a proper way, writes Arunabha Sengupta

The discussion was about Virat Kohli. Most discussions these days centre around him.

And there was this respected journalist with decades of sterling experience, who brought up the single Achilles Heel that remains in the Indian captain’s career. “What about his record in England?”

Of course Kohli is going there again. And there is time enough in his career to rectify this glitch in his numbers. However, while pointing that out I also raised a couple of questions.

“Yes, he has failed in England, but that was one series. However, similarly, Garry Sobers had a rather torrid time in New Zealand. (He averaged 15 there across two tours and 7 Tests). We don’t hold it against him, do we? Or Sunil Gavaskar in Sri Lanka (average of 37 in the single tour). It is a different matter if numbers indicate a definite shortcoming over a long time. For example, Rahul Dravid in South Africa, with an average of 29.71 across 5 tours and 11 Tests. But, one cannot really consider one tour as a chink in the armour, can one?”

Now, Sobers and Gavaskar are names that cannot be sullied. They belong to a past so far back in time that their few rough edges have been massaged by temporal gold dust along the way to produce gilt edged finished statues hovering between history and mythology. They cannot be faulted.

However, Kohli is too present-day, and hence his imperfections cannot be hidden through edited grainy highlight packages which show only the boundaries and not the snicks and close shaves. No matter how phenomenal his run-making is.

The response was not unexpected.

“It is wrong to judge cricketers through statistics. I’ve seen these players live, and I know the standard of play there was. I can vouch for the quality of Sunny and Garry. Virat is very good, but not a great yet.”

Well, no problem with the conclusion.

After all that is one man’s opinion. Scientifically, one can define what ‘great’ is and try to see whether Virat Kohli does fit into that category. Or perhaps it is better to wait before performing the exercise until he has called it a day; or, if and when he, as in the case of Don Bradman, Jack Hobbs, Sachin Tendulkar and their ilk, has every record under his belt.

But, then, I had reservations about the first statement made by this gentleman.

Unless we come up with comparable statistics, or rather scientific measures, which are robust enough to take care of variations between time and place, evaluation of greatness will depend on the questionable combination of individual gut-feel and admiration.

Someone growing up in the 1950s will have seen a Peter May in his adolescence, and even a Greg Chappell in his pomp will not be able to dislodge the man he had grown up admiring in his impressionable days.

The cycle is repeated over and over again with eras. The characters change … Peter May becomes Garry Sobers, and then Viv Richards … down to Sachin Tendulkar and Brian Lara … and further down to Virat Kohli and Steven Smith.

If one does not evaluate with proper measures, it all boils down to the era in which one grew up. And then opinion is pitted against opinion. Anecdotal recollections try hammering each other out of the way.

And if seeing a player in action is so important, then perhaps Jack Hobbs can no longer be considered a great batsman, since there remain precious few who have watched him bat. In fact the entire subject of history goes out of the window in that case.

A cricketer’s greatness cannot be a function of the impression he made on one eyewitness. What if a contemporary eyewitness has some other favourite? It is one man’s word against another’s.

To someone like me, a statistician by training and historian by trade, the argument is doubly objectionable.

In a sport so scrupulously documented, with every ball reflecting measurable outputs, the performance has to be measurable and comparable through data.

All this is rather complicated to put into words … and hence, as I was preparing my response, with proper care not to sound too blunt to the venerable journalist, another question flew in across the table.

“But, don’t you think numbers do a good job in indicating how good a player was?”

I was already preparing an answer along these lines, and hence I responded that as long as the measures are sophisticated enough, it is the only way to determine how good a player was.

“Of course, provided one looks beyond the simple stats like average and aggregate,” I said.

And the elderly journalist was delighted.

“Exactly what I said. We have to look beyond statistics.”

Well. Therein lies the rub.

For this gentleman, looking beyond average and aggregate essentially meant looking beyond statistics. Simplifying a bit further, to him statistics is equivalent to average and aggregate.

For most cricket followers, without the benefit of a mathematical or statistical background, statistical analysis boils down to putting a greater-than or less-than sign between two averages or aggregates.

That is the reason for the rant: “Numbers cannot show the full picture,” “statistics is like a bikini,” and so on.

It is not that numbers are limited. But the knowledge of numbers and statistics is limited among people subscribing to this school of thought.

Recently Abhishek Mukherjee, Chief Editor of CricketCountry, and I took part in a panel discussion hosted by CricketSoccer. There we discussed how different the formats of cricket are today, especially how much T20 varies from the other formats. With both of us having the benefit of post-graduate degrees in statistics, we used a statistical measure called Spearman’s Rank Correlation Coefficient to underline that T20 is in fact a different sport.

It is simple High School Statistics. But it was not readily understood by many.
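For readers who want to see the measure in action, here is a minimal sketch. It assumes the comparison is made by ranking the same players on their batting averages in two formats; the panel's actual method and data are not reproduced here, and the figures below are invented purely for illustration. A coefficient near 1 would mean the two formats rank players almost identically, while a value near 0 would mean success in one says little about success in the other.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical batting averages for the same six players (not real data).
test_averages = [52.1, 48.3, 45.0, 41.7, 38.2, 33.5]
t20_averages = [24.0, 31.5, 19.8, 35.2, 28.1, 30.4]
print(round(spearman(test_averages, t20_averages), 3))
```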

If one goes beyond high school and pursues statistical tools available in the graduate or post-graduate curriculum, there are endless ways that data can be analysed.

Mann-Whitney Tests, along with other non-parametric analyses, can actually determine whether the performance of one cricketer was significantly better than that of another.
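As a rough illustration of the idea, the sketch below computes the Mann-Whitney U statistics for two sets of innings scores and a two-sided p-value from the normal approximation. The scores are made up, the function names are my own, and the approximation ignores the correction for tied scores, so for small or heavily tied samples an exact test from a statistics package would be the safer choice.

```python
from statistics import NormalDist


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def mann_whitney(sample_a, sample_b):
    """Return (u_a, u_b, two_sided_p) using the normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    combined_ranks = ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(combined_ranks[:n_a])
    # u_a counts the (a, b) pairs in which a outscores b (ties count half).
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    mean_u = n_a * n_b / 2
    sd_u = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5  # no tie correction
    z = (min(u_a, u_b) - mean_u) / sd_u
    p_value = min(1.0, 2 * NormalDist().cdf(z))
    return u_a, u_b, p_value


# Hypothetical innings scores for two batsmen (not real data).
batsman_a = [12, 45, 3, 78, 101, 0, 56, 33, 64, 19]
batsman_b = [8, 22, 15, 40, 5, 27, 0, 31, 11, 49]
u_a, u_b, p_value = mann_whitney(batsman_a, batsman_b)
print(u_a, u_b, round(p_value, 3))
```

The test works on ranks rather than raw scores, which is why one freak double hundred cannot swing the verdict the way it can swing an average.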

Anderson-Darling Tests can be performed to check whether the economy rate of a bowler, when considered series by series, follows a normal distribution. If it does, one can find out whether he was significantly expensive in a particular series, at a statistically robust level of significance.
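A minimal sketch of that two-step check follows, under stated assumptions: the economy rates are invented, the function names are hypothetical, and the 5% critical value of roughly 0.752 for the adjusted statistic is the figure commonly tabulated for the case where mean and standard deviation are estimated from the data, which is worth verifying against a published table before relying on it.

```python
from math import log
from statistics import NormalDist, mean, stdev

# Commonly quoted 5% critical value for the adjusted statistic (an assumption
# to be checked against a published table).
CRITICAL_5_PERCENT = 0.752


def anderson_darling_normal(values):
    """Adjusted A-squared statistic for normality, parameters estimated from the data."""
    n = len(values)
    centre, spread = mean(values), stdev(values)
    standard_normal = NormalDist()
    cdf = [standard_normal.cdf((v - centre) / spread) for v in sorted(values)]
    total = sum(
        (2 * i + 1) * (log(cdf[i]) + log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a_squared = -n - total / n
    # Small-sample adjustment used when mean and sd are both estimated.
    return a_squared * (1 + 0.75 / n + 2.25 / n ** 2)


def z_scores(values):
    """Standardise each value against the sample mean and standard deviation."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]


# Hypothetical series-by-series economy rates for one bowler (not real data).
economy_by_series = [2.61, 2.74, 2.83, 2.90, 2.95, 3.02, 3.08, 3.15, 3.27, 3.96]

statistic = anderson_darling_normal(economy_by_series)
looks_normal = statistic < CRITICAL_5_PERCENT
print(round(statistic, 3), looks_normal)

# Only meaningful if normality was not rejected: flag series in the top 5% tail.
threshold = NormalDist().inv_cdf(0.95)
expensive = [
    rate
    for rate, z in zip(economy_by_series, z_scores(economy_by_series))
    if z > threshold
]
print(expensive)
```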

Kaplan-Meier estimates can be used to get a better estimate of batting averages, where some of the problems of not considering a ‘not out’ as a completed innings can be resolved.
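One way to read this, sketched below, is to treat every not-out innings as a right-censored observation: we know the batsman would have scored at least that many, but not how many more. The product-limit (Kaplan-Meier) estimate then gives the probability of surviving past each score, and the area under that curve serves as an estimated average. The innings are invented and the function names are my own. Note also that when the highest score in the list is itself a not-out, the area stops at that score and will understate the true mean.

```python
def kaplan_meier_average(innings):
    """Area under the Kaplan-Meier survival curve of runs scored.

    innings: list of (runs, dismissed) pairs; dismissed=False means not out.
    """
    scores = sorted({runs for runs, _ in innings})
    survival = 1.0  # estimated probability of still batting
    area = 0.0
    previous = 0
    for score in scores:
        # Add the rectangle between the previous score and this one.
        area += survival * (score - previous)
        at_risk = sum(1 for runs, _ in innings if runs >= score)
        dismissals = sum(1 for runs, out in innings if runs == score and out)
        survival *= 1 - dismissals / at_risk
        previous = score
    return area


def traditional_average(innings):
    """Conventional batting average: total runs divided by dismissals."""
    total_runs = sum(runs for runs, _ in innings)
    dismissals = sum(1 for _, out in innings if out)
    return total_runs / dismissals


# Hypothetical innings (not real data): (runs, dismissed?)
innings = [(12, True), (45, False), (3, True), (78, True),
           (101, False), (0, True), (56, True)]
print(round(kaplan_meier_average(innings), 2))
print(round(traditional_average(innings), 2))
```

With no not-outs in the list the two functions agree, which is a handy sanity check on the estimator.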

These only scratch the surface. The possibilities of using sophisticated statistical techniques on the rich reservoir of cricket data are endless.

However, they are not readily translatable in terms understandable to everyone.

And hence, statistics remain aggregate and average. And therefore they supposedly do not reflect the full picture.

Especially when numbers turn out to be counterintuitive, as they often are — mainly when fandom is involved. Heroes coming up short in analysis is unpardonable. That is why the scoreboard becomes an ass, and statistics becomes the refuge of ‘those who cannot understand the game.’ As opposed to ‘those who can understand the game enough to be blind to data because of heroes’.

And that is why a former cricketer can go around writing, “The numbers will intervene to suggest Walsh as a greater bowler — after all, he got over 500 Test wickets, while Ambrose had 405.”  And he still manages to walk free.

The problem is not that numbers cannot identify these details. The problem is that understanding of numbers is lacking in the majority.

But, unfortunately, there is hardly anything we can do about it.