Ever wondered how to learn the necessary statistics for data science while escaping from boring, theoretical books? Read on to find out how!
The following is a review of the book Practical Statistics for Data Scientists by Peter Bruce, Andrew Bruce, and Peter Gedeck.
Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python is a book that covers statistics oriented specifically towards data scientists and Machine Learning engineers in a very practical, hands-on manner. Its second, improved edition just came out, so we thought we would review it.
If you are familiar with R or Python (Python is covered only in the second edition) and have had some previous exposure to statistics, this concise book will get you up and running with complex statistical methods, experiments, and procedures in these two programming languages. If you are new to programming, you can find some very good reviews of Python books here.
You will learn how to perform exploratory data analysis like a pro, how to sample properly, how to answer questions from a statistical perspective, how to use regression to predict outcomes and detect outliers, and how to apply some statistical supervised and unsupervised Machine Learning methods.
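To give a flavor of this hands-on style, here is a minimal sketch of our own (it is not taken from the book) that compares a few estimates of location on a sample with an outlier and then uses bootstrap resampling to estimate the standard error of the median. The helper names `trimmed_mean` and `bootstrap_std_error`, the default parameters, and the sample data are all made up for illustration, and only the Python standard library is used.

```python
import random
import statistics


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping the given fraction of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values cut per side
    trimmed = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(trimmed)


def bootstrap_std_error(values, statistic, n_resamples=1000, seed=0):
    """Estimate a statistic's standard error by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = [
        statistic(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)


# Hypothetical sample with one obvious outlier (illustrative data only).
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]

print("mean:", statistics.mean(sample))
print("median:", statistics.median(sample))
print("trimmed mean:", trimmed_mean(sample, 0.1))
print("bootstrap SE of median:",
      bootstrap_std_error(sample, statistics.median))
```

In this sample the single outlier pulls the mean well above the median and the trimmed mean, which is why robust estimates are useful when exploring data.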
This book is oriented more towards programmers who have some statistics knowledge than towards data scientists, who are probably already very familiar with these topics. If you are a programmer who wants to get into data science, or a Data Scientist who lacks statistical knowledge, then this book is a very good buy for you.
In a world full of statistics posts, threads on Stack Overflow, and books, Practical Statistics for Data Scientists can serve as a reference manual to keep nearby and consult when you need to refresh an old concept or check exactly how to implement something.
Overall, a great read and a fantastic reference manual for specific questions.
Book description and contents
This book is aimed at the data scientist with some familiarity with the R and/or Python programming languages, and with some prior (perhaps spotty or ephemeral) exposure to statistics. Two of the authors came to the world of data science from the world of statistics, and have some appreciation of the contribution that statistics can make to the art of data science.
At the same time, we are well aware of the limitations of traditional statistics instruction: statistics as a discipline is a century and a half old, and most statistics textbooks and courses are laden with the momentum and inertia of an ocean liner. All the methods in this book have some connection—historical or methodological—to the discipline of statistics. Methods that evolved mainly out of computer science, such as neural nets, are not included.
In all cases, this book gives code examples first in R and then in Python. In order to avoid unnecessary repetition, we generally show only output and plots created by the R code. We also skip the code required to load the required packages and data sets. You can find the complete code as well as the data sets for download at GitHub.
Two goals underlie this book:
- To lay out, in digestible, navigable, and easily referenced form, key concepts from statistics that are relevant to data science.
- To explain which concepts are important and useful from a data science perspective, which are less so, and why.
At the following link you can find the table of contents for the previous edition. The new one has some additional subsections and includes Python code alongside the R code.
About the book
Peter Bruce is the Founder and Chief Academic Officer of the Institute for Statistics Education at Statistics.com, which offers about 80 courses in statistics and analytics, roughly half of which are aimed at data scientists. He has authored or co-authored several books in statistics and analytics, and he earned his bachelor’s degree at Princeton and master’s degrees at Harvard and the University of Maryland.
Andrew Bruce, Principal Research Scientist at Amazon, has over 30 years of experience in statistics and data science in academia, government, and business. The co-author of Applied Wavelet Analysis with S-PLUS, he earned his bachelor’s degree at Princeton and a PhD in statistics at the University of Washington.
Peter Gedeck, Senior Data Scientist at Collaborative Drug Discovery, specializes in the development of machine learning algorithms to predict biological and physicochemical properties of drug candidates. Co-author of Data Mining for Business Analytics, he earned PhDs in Chemistry from the University of Erlangen-Nürnberg in Germany and in Mathematics from Fernuniversität Hagen, Germany.
Publications: first edition 2017, second edition 2020
Practical Statistics for Data Scientists is a book that is very well defined by its name: it is a very hands-on book for learning the most important statistical concepts and tools used in the data science world.
For people who are already confident with statistics, most of the topics will be familiar. However, the book still brings some fresh perspectives and insights, especially in helping readers gain a solid, step-by-step grasp of common algorithms and models in the data science toolkit. For programmers with little statistical knowledge, this book will teach them what statistics is all about and how it can be used to leverage the true power of their data.
With a book like this one, complemented with a Machine Learning book like “Hands-On Machine Learning with Scikit-Learn & TensorFlow”, you will be on your path to becoming an excellent Data Scientist if that is what you want.
For experts in Data Science or Machine Learning, the book can serve as a reference manual to keep nearby for looking up how to implement various algorithms. Overall, a good read that we certainly enjoyed. We hope you do too.
- Bruce, Peter (Author)
- English (Publication Language)
- 368 Pages - 06/02/2020 (Publication Date) - O'Reilly Media (Publisher)
We hope you liked the review. For more like it, check out our Data Analysis Books category.
Thanks for reading How to Learn Machine Learning and have an awesome day!