Think Stats: Exploratory Data Analysis


Learn to use Python programming to turn your raw data into knowledge: one of the most practical guides to the practice of Statistics – Think Stats!

The following is a review of the book Think Stats: Exploratory Data Analysis by Allen B. Downey.

Review

One of the most important skills of a Data Analyst, Data Scientist, or Machine Learning practitioner is the ability to efficiently inspect data, analyse it, and extract useful information from it. The book Think Stats teaches us to do this in a practical manner: it is of little use knowing the theory of statistics and probability if we do not know how to apply it to real data and solve relevant problems.

Think Stats will teach you how to perform statistical analysis computationally and apply descriptive statistics in Python, with very little mathematical baggage.

It does this from a first-principles approach, like Data Science from Scratch. What do we mean by this? To help us understand each concept and learn how to implement it properly, everything is coded from scratch: the author provides classes that implement each of the different analyses, like the mean of a Probability Mass Function, Root Mean Squared Error, and many more. The code can be found in the following GitHub repo.
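To give a flavour of what coding these ideas from scratch looks like, here is a minimal sketch of the two quantities mentioned above. It is not the author's code: the function names `make_pmf`, `pmf_mean`, and `rmse` are our own, chosen for illustration, and only the Python standard library is used.

```python
from collections import Counter
import math


def make_pmf(values):
    """Map each distinct value to its relative frequency (hypothetical helper)."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def pmf_mean(pmf):
    """Mean of a PMF: the sum of each value weighted by its probability."""
    return sum(value * prob for value, prob in pmf.items())


def rmse(estimates, actual):
    """Root mean squared error of a list of estimates around a known value."""
    squared_errors = [(estimate - actual) ** 2 for estimate in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


pmf = make_pmf([1, 2, 2, 3, 5])
print(pmf_mean(pmf))
print(rmse([2, 4], 3))
```

Writing these by hand makes the definitions explicit: the mean of a PMF is just a probability-weighted sum, and RMSE is the square root of the average squared error.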

While some argue that this approach is useless, since libraries like Pandas, NumPy, and Scikit-Learn already provide built-in functions, others, like us, enjoy this ground-up approach: we think it not only increases your coding expertise but also allows for a deeper understanding of what is going on.

In general, the book provides a mixture of Python and Statistics, so having some background in both will allow you to make the most of it. If you are new to Python and want to learn more before starting a book like Think Stats, we recommend introductory guides like Python Crash Course. Also, if you want to learn the most important basic statistical concepts in an easy and humorous manner, we recommend the book Bayesian Statistics the Fun Way.

Once you know Python and some basic statistics, you will be best placed to understand Think Stats and make the most of the value this book provides: it mixes both worlds, teaching statistics through coding, so you can consolidate the basic statistical concepts you already know while expanding them with more advanced and insightful lessons.

Think Stats is a great introduction to statistics for anyone with basic programming skills: it avoids the formulas and goes directly to the practical side of the subject matter. In it you will find a simple way to try each new concept quickly and practically through interactive Jupyter Notebook examples. We highly recommend running these notebooks and taking a code-as-you-go approach to reading this book.

Contents

The contents of the book are the following:

  1. Exploratory data analysis
  2. Distributions
  3. Probability mass functions
  4. Cumulative distribution functions
  5. Modeling distributions
  6. Probability density functions
  7. Relationships between variables
  8. Estimation
  9. Hypothesis testing
  10. Linear least squares
  11. Regression
  12. Time series analysis
  13. Survival analysis
  14. Analytic methods

As the goal of the author was to give the book as wide a reach as possible and to explain statistics in a very practical manner, he made the book available as a PDF here: Think Stats PDF. Take a look to see if you like it, and decide whether you want it in paperback or are happy with the digital version.

Also, you can find the official website for the second edition of the text here: Think Stats Official Website.

It is worth mentioning that each chapter includes exercises readers can do to develop and solidify their learning. When you write programs, you express your understanding in code, and while you are debugging the program, you are also correcting your understanding. As well as taking a code-as-you-go approach to Think Stats, we also recommend doing these exercises to test your knowledge in a non-guided manner.

The main idea behind the book is that some ideas that are hard to grasp mathematically are easy to understand by simulation. For example, we approximate p-values by running random simulations, which reinforces the meaning of the p-value. Because of this, the book is laid out in an eminently practical manner, with theory and practice going hand in hand: you not only understand the concepts at a theoretical level, but the coding actually makes that theoretical understanding easier to reach.
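As an illustration of approximating a p-value by simulation, here is a minimal sketch of a permutation test for a difference in means. This is our own example rather than code from the book: the function name `permutation_p_value`, its parameters, and the sample data are assumptions made for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(group_a, group_b, iterations=1000, seed=17):
    """Estimate a two-sided p-value for the difference in group means.

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle the pooled data and count how often a random split produces a
    difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n = len(group_a)
    count = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            count += 1
    return count / iterations


# Made-up data, for illustration only.
p = permutation_p_value([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])
print(p)
```

The returned value is simply the fraction of shuffles that look at least as extreme as the real data, which is exactly what a p-value means.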

This book is a good introduction to coding the concepts of statistics and can serve as a precursor to Machine Learning books like An Introduction to Statistical Learning or, for experts, The Elements of Statistical Learning, which teach more advanced topics and introduce you to Machine Learning using some of the statistical concepts you have learned.

Summary

Think Stats: Exploratory Data Analysis will take you through the entire process of exploratory data analysis and empirical probability in Python: from collecting data and generating different descriptive statistics to identifying patterns and testing hypotheses. You'll explore distributions, rules of probability, visualisation, and many other tools and concepts. In a few bullets, you will:

  • Develop an understanding of probability and statistics by writing and testing code
  • Run experiments to test statistical behavior, such as generating samples from several distributions
  • Use simulations to understand concepts that are hard to grasp mathematically
  • Import data from most sources with Python, rather than rely on data that's cleaned and formatted for statistics tools
  • Use statistical inference to answer questions about real-world data
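
As a small example of the kind of experiment listed above, the following sketch draws a sample from an exponential distribution and compares it with what theory predicts. It is our own illustration, not code from the book: the helper `empirical_cdf`, the rate of 2.0, and the sample size are all assumptions.

```python
import bisect
import random


def empirical_cdf(sample):
    """Return a function giving the fraction of the sample <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect.bisect_right(ordered, x) / n

    return cdf


rng = random.Random(42)  # fixed seed so the sketch is reproducible
rate = 2.0               # assumed rate parameter; theoretical mean is 1 / rate
sample = [rng.expovariate(rate) for _ in range(10_000)]

sample_mean = sum(sample) / len(sample)
cdf = empirical_cdf(sample)

print(sample_mean)  # should land close to 0.5
print(cdf(0.5))     # fraction of the sample at or below 0.5
```

Changing the rate or the sample size and rerunning is a quick way to see how closely a sample tracks the distribution it came from.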

You can find the book on Amazon if you decide to purchase a paperback copy here:

Think Stats: Exploratory Data Analysis
  • Downey, Allen (Author)
  • English (Publication Language)
  • 223 Pages - 11/25/2014 (Publication Date) - O'Reilly Media (Publisher)

Also, if you liked it, take a look at our review of Practical Statistics for Data Scientists, another really awesome book that covers statistics and probability in Python!