Sunday, February 7, 2016

Book Review - How to Lie with Statistics - Darrell Huff

How to Lie with Statistics is an amazing book. Educational, thought-provoking, humorous -- this book is all of these and more. That the first edition was printed about 65 years ago makes it even more awe-inspiring. Almost everything the author explains in the 10 chapters about the devious tricks people use to mislead with statistics is still relevant and applicable today. If you do not have time to read the rest of this post and want one takeaway, it would be to pick up this book right away and read it cover to cover. You will definitely not regret it.

The introduction sets the tone for the rest of the book. It starts with an example where two polls, one by Gallup and another by a newspaper, come up with hugely different estimates of how many people in the US are familiar with the metric system. One said 33% and the other 98%. The author ascribes this anomaly to the massive sampling bias in the newspaper poll. There are other similar examples in the introduction, but a couple of statements about statistics are worth calling out and are reproduced below. Clearly, when it comes to statistics, there is usually more to it than meets the eye.
  • "So appealing in a fact minded culture, is employed to sensationalize, inflate, confuse and oversimplify"
  • "Well wrapped statistic is better than Hitler's big lie -- it misleads, yet it cannot be pinned on you"
The first chapter is titled "Sample with built-in bias". Let us suppose that there is a huge barrel full of beans of two different colours, red and white, and we need to find out the ratio. Given a sufficiently large sample drawn from the barrel and a count of the red and white beans in it, it is possible to estimate the ratio to a reasonable accuracy. But if there is a built-in bias, all bets are off. The author explains this with the following example. Let us suppose that the poll question is "Do you like questionnaires?" The results of this poll are likely to give percentages in the high 90s. That is not because a lot of people like questionnaires. It is because the people who dislike them are likely to fling the questionnaire into the nearest dustbin, so it never makes it to the response pile. Ignoring the people who did not respond is a guarantee of wrong interpretation. The rest of the chapter reinforces this point with several other interesting examples.
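To see how strongly non-response can skew such a poll, here is a minimal simulation sketch. All the numbers in it are made up for illustration and do not come from the book: it assumes only 30% of people actually like questionnaires, that 90% of those who like them return the form, and that only 2% of those who dislike them do.

```python
import random


def simulate_poll(population_size=10_000, true_like_rate=0.30,
                  respond_if_like=0.90, respond_if_dislike=0.02, seed=0):
    """Return the share of *returned* questionnaires that say 'yes, I like them'.

    All rates are hypothetical, chosen only to illustrate non-response bias.
    """
    rng = random.Random(seed)
    yes_returned = 0
    total_returned = 0
    for _ in range(population_size):
        likes = rng.random() < true_like_rate
        # People who dislike questionnaires mostly throw the form away.
        p_respond = respond_if_like if likes else respond_if_dislike
        if rng.random() < p_respond:
            total_returned += 1
            yes_returned += int(likes)
    return yes_returned / total_returned


observed = simulate_poll()
print(f"True share who like questionnaires: 30%")
print(f"Share among returned forms: {observed:.0%}")
```

Under these assumed rates the response pile shows a share in the mid 90s, even though the true figure is 30%. The gap comes entirely from who chose to respond, not from the sample size.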

"Well chosen average" is the title of the second chapter. This chapter deals with using the word average without revealing whether it is a mean/median/mode and how misleading it can get. The author uses the example of "average income of a neighborhoods" to explain this concept. Assuming that there are a few extremely rich households and a lot of low income households, the average (mean) could be misleading to describe this number. A median or a mode here would be better choice. There is also the urge to assume that distributions are bell shaped. But in reality, quite a few of these are "hockey stick" shaped and so an unqualified average is completely meaningless.

"Little figures that are not there" is the third chapter. The essence of this chapter is this: Well biased samples can be used to produce any result. So can random ones, if some of the underlying information related to the numbers are not published. For example, if the sample size is small enough and you try enough of them, you can manage to get it to produce a distorted figure. For example, on an average, we will get heads half the number of times and tails the other half. But if we just toss the coin only 5 times, is not improbable to get 5H or 4H and 1T. The law of averages will hold good only if we toss the coin sufficiently enough number of times. By masking the sample size, the numbers can be made to confess to anything. There are a bunch of other techniques in this chapter, including playing with words, displaying graphs without labeled axes etc that can be used quite effectively for manipulating the outcome.

Chapter 4 is titled "Much ado about nothing". This chapter explains tricks where people mislead by relying on the ordering of items alone, without looking at the absolute numbers. The classic example cited here is a study by an independent agency on the tobacco and nicotine contents of various brands. The study concluded that there is very little difference between the various brands. However, the brand at the bottom of the list still tried to exploit this by citing only its ranking and carefully not revealing the absolute numbers.

The Gee-Whiz Graph is the title of Chapter 5. When numbers in tabular form are taboo and words do not work, the author mentions that one can always mislead by drawing a picture. Consider line graphs. Let us presume that we need to depict the increase of some quantity from 20 to 22. If the vertical axis covers the full range from zero, the 10% increase is shown in its true proportion. But if you want to make it dramatic, just blow up the scale: draw the axis from 20 to 22 in steps of 0.2. Now the same change will appear to be a steep increase to the naive reader.
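The effect of the blown-up scale can be put into numbers without drawing anything. The sketch below computes what fraction of the chart's height the line climbs for a given choice of axis limits. The helper name and the specific axis limits are my own assumptions for illustration.

```python
def visual_rise_fraction(start, end, axis_min, axis_max):
    """Fraction of the plot's vertical extent covered by a move
    from `start` to `end` when the y-axis runs axis_min..axis_max."""
    return (end - start) / (axis_max - axis_min)


# Axis starting at zero: the 10% increase looks like a modest rise.
honest = visual_rise_fraction(20, 22, axis_min=0, axis_max=22)

# Axis truncated to 20..22: the same increase fills the whole chart.
dramatic = visual_rise_fraction(20, 22, axis_min=20, axis_max=22)

print(f"axis 0..22:  line climbs {honest:.0%} of the chart height")
print(f"axis 20..22: line climbs {dramatic:.0%} of the chart height")
```

The data is identical in both cases. Only the axis limits change, which is why a graph with an unlabeled or truncated axis deserves a second look.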

The one dimensional picture is the title of Chapter 6. Here the devices used are bar charts and pictorial graphs. Say you want to show that workers in country A earn twice as much as those in country B. A simple bar chart can show that honestly. However, to make it dramatic, depict it as a two-dimensional picture and double both the width and the height. The area then becomes 4 times as large instead of just two times, and that is the impression the reader takes away. Make the picture three dimensional and the apparent difference becomes 8 times.
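The arithmetic behind this trick is simply that scaling every side of a picture by a factor k multiplies its area by k squared and its volume by k cubed. A tiny sketch, with a function name of my own choosing:

```python
def apparent_ratio(linear_scale, dimensions):
    """How many times bigger a picture looks when every side is
    multiplied by `linear_scale` and it is drawn in `dimensions` dimensions."""
    return linear_scale ** dimensions


for dims, label in [(1, "bar height only"), (2, "flat picture"), (3, "3D picture")]:
    print(f"{label}: looks {apparent_ratio(2, dims)}x bigger")
```

Only the one-dimensional bar reports the real ratio of 2. The other two inflate it to 4 and 8 while the caption can still truthfully say "twice".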

Chapter 7 is titled The semi-attached figure. Quoting directly from the book, this chapter's trick is this -- "If you can't prove what you want to prove, demonstrate something else and pretend they are the same thing". The first example is the following claim by a drug company: Nostrum cures colds, because it kills 30K germs in 11 seconds in a test tube. The details are conveniently left out. An antiseptic that works in a test tube might not work the same way in humans. It possibly could not be used in the same concentration. There is no information on the kind of germ that it killed. Here is another interesting example, about an enlistment campaign by the US Navy. The campaign compared the mortality rates in the Navy and outside it. The death rate in the Navy was 9 per 1000. Over the same period, for civilians in NY, it was 16 per 1000. The campaign then went on to conclude that joining the Navy is probably even safer than being a civilian. What is not mentioned is this: the Navy is full of super fit people, while the civilian population includes infants, the old, the ill and so on. The numbers are just not comparable.

Chapter 8 is titled Post Hoc rides again. This is the well-known principle "Correlation does not mean causation". The author provides several examples where there may be a correlation, but it is not at all obvious which is the cause and which the effect. There are several cases where cause and effect are confusingly distorted, reversed and intermingled.

How to Statisticulate is the title of Chapter 9. Statisticulate is "Statistics" + "Manipulate". The chapter covers a bunch of other techniques that can be used to manipulate and mislead. The first tool the author uses here is the map. According to the author, maps expose a fine bag of variables where facts can be concealed and relationships distorted. Then there are percentages and percentiles. Index numbers are also critical for proper representation of data. This chapter has examples of how each of the above can be used to paint a picture completely different from reality.

Chapter 10, the last chapter, is titled "How to talk back to a statistic". The author suggests five questions that one should ask to avoid being misled by these devious mechanisms.

  • Who says so?
  • How does he know?
  • What is missing?
  • Did somebody change the subject?
  • Does it make sense?

In essence, this book is not only a great read, but also something one should get for one's library, to refer to time and again.







Saturday, January 9, 2016

Book Review - The Power of Habit: Why We Do What We Do in Life and Business by Charles Duhigg


The Power of Habit was a fairly enjoyable read and a good start to the new year. The prologue of the book is fairly captivating. It starts with the story of a 34-year-old woman who was struggling with obesity, debt, alcohol and a host of other issues, and who turned it all around, became fairly successful at work and even managed to run a marathon. The author quotes a Duke University finding -- "40% of the actions performed each day weren't actual decisions, but habits". It is by forming good habits and overwriting the old ones that we turn our lives around.

The book has three parts. The first deals with how habits emerge in individual lives, the second with the habits of organisations and the third with the habits of societies. All three sections are well laden with anecdotes and stories that make for interesting reading.

The first section starts with the story of Eugene Pauly, a man afflicted with a viral infection that damaged a part of his brain that was primarily responsible for memory. He could not remember any recent activity, which meant he did not know whether he had just finished his breakfast, which day of the week it was, or the name of a stranger who had just been introduced to him. While he was unable to map out the layout of either his house or his neighbourhood, he was able to visit the kitchen or walk around the block and get back to his house. How he was able to do it intrigued researchers, who hypothesised that he had formed habits that were helping him operate in an auto-pilot mode without thinking or making any decisions. Understanding how this "habit loop" forms, and its details, is critical for us to change our actions and behaviors. The habit loop, told simplistically, is just a loop of a cue, a routine and a reward. We perform a routine in response to a cue and reap a reward. By fiddling with the cue and the reward, you can change the routine, which could mean giving up smoking or any other addiction. There are many other stories in this section that reinforce this point, including how Febreze, an air freshener, became a hit after initial failures. There is also the mandatory sports story of how Dungy turned around the fortunes of the Tampa Bay Buccaneers and Indianapolis Colts football teams. The success of "Alcoholics Anonymous" as a movement is also quite interesting.

The second section, which deals with how habits are formed in businesses and transform them, was not as interesting as the first, possibly because I had less to take away from it. However, the story of how Paul O'Neill created new habits at Alcoa and used them to transform a company in decline and take it to new heights is clearly worth reading. One story that tries to explain how we muster enough willpower to get to the gym on some days and not others was very interesting. The author describes an experiment where one group of people was asked to eat cookies and another to eat radishes (and so resist the cookies), after which both groups were given an assignment that tested how much determination they had to persist with an arduous task. The people who had spent their willpower resisting cookies fared badly compared to the people who had eaten the cookies and consequently had much of their willpower intact. The author goes on to argue that, in the battle of willpower and self-improvement, people who acknowledge their weak moments and plan for them are more likely to succeed and come out unscathed than those who do not. There is also a section on how Target analyzes the spending patterns of its customers to identify and possibly shape their habits. Working in the ad-tech business, I did not find this revolutionary, but I presume it would be interesting to people who are not really familiar with Big Data and user analytics.

The third section talks about habits and societies. The primary narrative here is how the arrest of Rosa Parks snowballed into a big movement headed by Martin Luther King Jr. and others in the South, whereas similar events before it did not have that impact. There are some interesting stories here, but they were clearly not as appealing to me as the first section.

The last section deals with the philosophical question of whether we are responsible for our habits. There are two main stories here. One involves a husband who killed his wife while he was still asleep and had no idea what he was doing. The other is about a woman who gave in to gambling, squandered away her fortune and fell into infamy. While the former was acquitted in court, the latter was held responsible for her actions. The author opines that in the former case, the man had no prior notion that he might act in a certain way, while in the latter, the lady knew that she had fallen into a bad habit and could have taken steps to correct it, however difficult that might be.

The book concludes with a practical guide on how readers can try to change some of their ingrained habits by identifying the cue and the craving it tries to satisfy, so as to modify the routine between them. Identifying a cue is easier said than done, but the author lists some practical guidelines on how to look for them.

While I believe that the habit change guidelines do need to be given a fair trial, I think the book is a good read irrespective of the outcome of that exercise; the anecdotes and stories in this book are good to read and make good conversation material, if nothing else.