Want to handle data like a professional and become a data analysis master? Then this is the book for you.
The following is a review of the book Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney.
About the book: Using Python for Data Analysis and exploration
Python for Data Analysis is a book that aims to get its readers comfortable working with structured data in Python. It explains how to manipulate, process, clean, and efficiently crunch data in Python using the best-known libraries for the job: NumPy and Pandas.
After finishing it, you will have the tools to solve a wide range of data analysis problems and to build your own data-intensive applications, and you will know the steps of exploratory data analysis and how to carry them out.
Compared to other recommended texts like the Python Data Science Handbook, it has less on the visualisation side and more on the analysis, and it also covers topics like time series and more advanced Pandas features. Find the full review for this other book here.
Written by Wes McKinney, the main author of the famous Pandas library, this hands-on book is packed with practical case studies. It’s ideal for analysts who are new to Python and for Python programmers who are new to scientific computing. After finishing the book, you will have learned to:
- Use the IPython interactive shell as your primary development environment, and learn the basics of the Python programming language and Jupyter notebooks.
- Learn the basic features of NumPy, the most widely used library for numerical analysis and operations in Python.
- Become an expert at using the Pandas library to clean, transform, and combine data.
- Get some insight into how to visualise this data using Matplotlib.
- Learn what time series are and how to handle time-dependent data.
- Learn how to solve problems in web analytics, social sciences, finance, and economics through detailed examples.
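To give a flavour of the NumPy and time series skills in this list, here is a minimal sketch. It is not taken from the book: the readings and dates are invented for illustration, and it only assumes that NumPy and Pandas are installed.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings, invented for illustration only.
values = np.array([10.0, 12.0, 9.0, 15.0, 11.0, 14.0])

# Vectorized arithmetic: one expression instead of a Python loop.
centered = values - values.mean()

# Boolean indexing: keep only the readings above the mean.
above_mean = values[centered > 0]

# Attach dates to get a time series, then downsample and smooth it.
dates = pd.date_range("2024-01-01", periods=len(values), freq="D")
series = pd.Series(values, index=dates)

two_day_sum = series.resample("2D").sum()       # totals per two-day bin
rolling_mean = series.rolling(window=3).mean()  # three-day moving average

print(above_mean)
print(two_day_sum)
print(rolling_mean)
```

The first two entries of the rolling mean are missing because a three-day window is not yet full at those dates.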
Contents of Python for Data Analysis
- Preliminaries: Explanatory chapter with an introduction, description of the goals of the book, and how to install the needed tools.
- Python Language Basics, IPython, and Jupyter Notebooks: This chapter presents an introduction to the Python fundamentals and semantics.
- Built-in Data Structures, Functions, and Files: Data types, control flow, conditionals, and relevant built-in functions. The chapter finishes with how to handle files and how to interface with the operating system.
- NumPy Basics: Arrays and Vectorized Computation: This chapter explains the NumPy concepts you need to carry out complex mathematical operations. It explores the NumPy array and different indexing strategies, as well as sorting and boolean operations.
- Getting Started with Pandas: This chapter is similar to the previous one but with Pandas instead of NumPy. It introduces the Pandas Series, DataFrame, and Index objects and explains various ways to work with them and compute descriptive statistics.
- Data Loading, Storage, and File Formats: How to read and write text files; work with CSV, TSV, JSON, and HTML data; and do web scraping. It also explains how to interact with APIs and databases.
- Data Cleaning and Preparation: Chapter 7 uses Pandas to filter values, replace missing values, remove duplicates, detect and filter outliers, and sample randomly.
- Data Wrangling: Join, Combine, and Reshape: This chapter dives into hierarchical indexing, how to combine and merge DataFrames, and how to reshape and pivot them.
- Plotting and Visualization: How to use Matplotlib to visualise our data, how to plot with Pandas, and advanced plotting with Seaborn.
- Data Aggregation and Group Operations: Now we start getting into more complex material: the groupby functionality of Pandas DataFrames and how to aggregate data.
- Time Series: Date and time data types, the basics of time series, frequencies, periods, how to do arithmetic with dates, and lastly moving window functions.
- Advanced Pandas: What categorical data is, how to manipulate it, and advanced groupby functionality.
Lastly, Chapters 13 and 14 are dedicated to different modelling libraries in Python for statistical analysis like statsmodels, and Data Analysis examples on different datasets.
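As a rough idea of the cleaning, merging, and grouping tasks described in the chapters listed above, here is a small sketch. It is not an example from the book: the store and region tables are hypothetical, and filling the missing amount with 0 is just one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records, invented for illustration only.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B", "C"],
    "amount": [100.0, 100.0, np.nan, 250.0, 50.0, 80.0],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Cleaning: drop exact duplicate rows, then fill the missing amount with 0.
clean = sales.drop_duplicates().fillna({"amount": 0.0})

# Combining: a left join adds each store's region.
merged = clean.merge(regions, on="store", how="left")

# Aggregating: total and mean amount per region.
summary = merged.groupby("region")["amount"].agg(["sum", "mean"])
print(summary)
```

Whether a missing value should be filled, dropped, or interpolated depends on the dataset, so treat the `fillna` step as a placeholder for that decision.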
If you want, you can find a video review of the book here:
Also, here you can find the link to the O’Reilly page for this book.
Summary of Python for Data Analysis
Python for Data Analysis is a fantastic book for learning how to analyse data using Python, and it also works as a desk reference for looking up how to do specific tasks. It can be of great help to developers on a day-to-day basis. You will learn Pandas and NumPy from basic to advanced level, and learn how to handle all kinds of data.
If you work, or plan to work, with large amounts of data, Python for Data Analysis will save you many hours of work. It is very well written, clearly explained, and amazingly helpful.
If you want to, you can buy the book at the following link:
Python for Data Analysis
- McKinney, Wes (Author)
- English (Publication Language)
- 550 Pages - 10/31/2017 (Publication Date) - O'Reilly Media (Publisher)