Feature Engineering for Machine Learning

Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists


The following is a review of the book Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists by Amanda Casari and Alice Zheng, senior data scientists at Google and Amazon respectively.

Review of Feature Engineering for Machine Learning

You’ve probably heard the phrase a million times:

‘In a Machine Learning project, 80% is cleaning and transforming the data and 20% is model building and training’. This book is about that first 80%.

Feature engineering is a crucial step in any Data Science/ML pipeline. However, most texts are dedicated to model building and training (the 20% of the previous paragraph) and rarely cover feature engineering on its own.

With this very practical book, you’ll learn techniques for extracting and creating features out of raw data, so that your machine learning models can better understand them and therefore achieve better results.

Feature Engineering for Machine Learning covers the main techniques of today’s Data Science toolkit, along with hints on how to apply them well. Its wide variety of examples uses popular Python packages, including NumPy, pandas, scikit-learn, and Matplotlib.

In the text you will learn about the following topics:

  • Feature engineering for numeric data: filtering, binning, scaling, log transforms, and power transforms.
  • Natural text techniques: bag-of-words, n-grams, and phrase detection (check out our awesome NLP posts)
  • Frequency-based filtering and feature scaling for eliminating uninformative features
  • Encoding techniques of categorical variables, including feature hashing and bin-counting
  • Model-based feature engineering with Principal Component Analysis
  • The concept of model stacking, using k-means as a featurization technique
  • Image feature extraction with manual and deep-learning techniques
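To give a flavour of the numeric techniques in the first bullet, here is a minimal sketch of a log transform, fixed-width binning, and min-max scaling. It is our own illustration, not code from the book: the function names and sample numbers are made up, and we stick to plain Python so it runs without installing anything (the book itself leans on NumPy, pandas, and scikit-learn).

```python
import math


def log_transform(values):
    """Compress a heavy-tailed count feature with log10(1 + x)."""
    return [math.log10(1 + v) for v in values]


def fixed_width_bin(values, width):
    """Map each value to the index of its fixed-width bin."""
    return [int(v // width) for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw features: review counts (heavy-tailed) and ages.
review_counts = [0, 9, 99, 999]
ages = [3, 17, 25, 64]

print(log_transform(review_counts))   # counts squeezed onto a log scale
print(fixed_width_bin(ages, 10))      # ages grouped by decade
print(min_max_scale(ages))            # ages rescaled into [0, 1]
```

The log transform is the one to reach for when a count spans several orders of magnitude, while binning and scaling are more about putting features on a range your model can handle comfortably.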

This book tries to teach the intuition first, and the mathematics second. Instead of only discussing how something is done, it tries to teach the why, with the goal of providing the intuition behind the ideas, so that the reader may understand how and when to apply them.

Contents of Feature Engineering for Machine Learning

The contents of the book are the following:

  1. The Machine Learning Pipeline: Data, Tasks, Models and Features.
  2. Fancy Tricks with Simple Numbers: scalars, vectors, binarization, log transforms, feature scaling and normalization.
  3. Text Data: Flattening, Filtering, and Chunking: N-grams and bag of words.
  4. The Effects of Feature Scaling: From Bag-of-Words to Tf-Idf
  5. Categorical Variables: Counting Eggs in the Age of Robotic Chickens: one-hot and dummy encoding, categorical variable encoding types, and feature hashing.
  6. Dimensionality Reduction: Squashing the Data Pancake with PCA: PCA, its uses, advantages, and limitations.
  7. Nonlinear Featurization via K-Means Model Stacking: k-means clustering and how to use the results as features for machine learning models.
  8. Automating the Featurizer: Image Feature Extraction and Deep Learning: a full chapter on how to extract awesome features out of our image data.
  9. Back to the Feature: Building an Academic Paper Recommender: a very interesting final project to finish the book with.
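Chapters 3 and 4 are a good example of how the book builds one idea on top of another. The sketch below is ours, not the authors': it shows bag-of-words, n-grams, and one common tf-idf variant (raw count times log(N / document frequency)) in plain Python. The helper names and the tiny corpus are hypothetical, and in practice you would probably use scikit-learn's text vectorizers instead.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs):
    """Turn each document into a word -> count mapping."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    """Scale each word count by log(N / number of docs containing the word)."""
    bags = bag_of_words(docs)
    n_docs = len(bags)
    # Iterating a Counter yields each distinct word once per document.
    doc_freq = Counter(word for bag in bags for word in bag)
    return [
        {word: count * math.log(n_docs / doc_freq[word])
         for word, count in bag.items()}
        for bag in bags
    ]


# Hypothetical three-document corpus.
corpus = ["the cat sat", "the dog sat", "the cat ran"]

print(ngrams(tokenize(corpus[0]), 2))
print(bag_of_words(corpus)[0])
print(tf_idf(corpus)[1])
```

Notice that a word appearing in every document (here, "the") gets a tf-idf weight of zero, which is exactly the kind of uninformative feature this scaling is meant to push down.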

You can find the official website of the book here.

Summary of Feature Engineering for Machine Learning

This is a very easy-to-read book that covers an essential and often overlooked part of a Machine Learning pipeline. It contains very clear Python coding examples, and we suggest that you try them yourself and play around with the code, exploring what you can do with it.

You can find it on Amazon at the best price here:

Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists
  • Zheng, Alice (Author)
  • English (Publication Language)
  • 215 Pages - 05/08/2018 (Publication Date) - O'Reilly Media (Publisher)

It is pretty short and can be digested fairly quickly, so we recommend reading it after books like Python Machine Learning or Hands on Machine Learning with Python, Tensorflow and Keras. You should have some Python knowledge before tackling Feature Engineering for Machine Learning; if you don’t, we recommend texts like Python Crash Course or our awesome list of Python Programming online courses.

Once you’ve read it, you might want to deepen your statistical knowledge with texts like Practical Statistics for Data Scientists, or start learning some code architecture patterns with amazing books like Architecture Patterns with Python. Check out our reviews!


Thanks for reading How to Learn Machine Learning and have an awesome day!

Tags: Feature Engineering, Feature Engineering Python, Feature Engineering for Machine Learning, Feature Selection.