# Machine Learning: Representation

To train a model, you must choose the set of features that best represents the data.

Feature engineering means transforming raw data into a feature vector. Expect to spend a significant share of a machine learning project on feature engineering.
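
As a minimal sketch of that transformation, the Python below turns one raw record into a list of floats. The field names (num_rooms, num_bedrooms, street_name) and the street vocabulary are hypothetical. Numeric fields are copied as floats and the string field is one-hot encoded, which is one common choice rather than the only one.

```python
# Hypothetical vocabulary for a string field, for illustration only.
STREET_VOCAB = ["Main Street", "Shorebird Way", "Charleston Road"]


def to_feature_vector(record: dict) -> list:
    """Map one raw record (a dict) to a flat list of floats."""
    # Numeric fields are copied directly as floats.
    features = [float(record["num_rooms"]), float(record["num_bedrooms"])]
    # The string field is one-hot encoded against the fixed vocabulary;
    # a street outside the vocabulary maps to an all-zero block.
    one_hot = [1.0 if record["street_name"] == street else 0.0
               for street in STREET_VOCAB]
    return features + one_hot


raw = {"num_rooms": 6, "num_bedrooms": 3, "street_name": "Shorebird Way"}
print(to_feature_vector(raw))  # [6.0, 3.0, 0.0, 1.0, 0.0]
```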

**Properties of a good feature:**

A feature should have a non-zero value in more than a small handful of examples in the dataset; a value seen only once or twice gives the model too little evidence to learn from.
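
One way to check this property for a categorical feature is to count how many examples carry each value and flag the rare ones. The threshold of 5 below is an arbitrary assumption, and the columns are made up; choose a threshold that suits your dataset size.

```python
from collections import Counter


def rare_values(values, min_count=5):
    """Return the set of values seen in fewer than min_count examples."""
    counts = Counter(values)
    return {value for value, count in counts.items() if count < min_count}


# Hypothetical column: a unique ID is rare by construction and makes a poor feature.
house_ids = ["h1", "h2", "h3"]
print(sorted(rare_values(house_ids)))  # ['h1', 'h2', 'h3']

house_types = ["apartment"] * 10 + ["house"] * 8 + ["castle"]
print(rare_values(house_types))  # {'castle'}
```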

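Features should have a clear, obvious meaning. For example, a house's age expressed in years is easier to understand and sanity-check than the same age expressed in seconds.

Features shouldn't take on "magic" values, such as -1 to mean "missing" in a feature whose real values lie between 0 and 1. Instead, represent the missing case with a separate boolean feature that says whether the value is defined.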
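
A minimal sketch of replacing a magic value, assuming a hypothetical rating feature in which an upstream system writes -1 for "not rated". The function returns the cleaned value together with an is-defined flag; the fill value of 0.0 is an assumption, and the mean of the defined ratings is another common choice.

```python
MISSING = -1  # hypothetical sentinel an upstream system uses for "not rated"


def split_magic_value(rating, fill_value=0.0):
    """Replace a magic sentinel with two features: (value, is_defined).

    The is_defined flag lets the model tell a real rating apart from a
    filled-in one, so the sentinel never reaches the model as a number.
    """
    if rating == MISSING:
        return fill_value, 0.0
    return float(rating), 1.0


print(split_magic_value(0.82))  # (0.82, 1.0)
print(split_magic_value(-1))    # (0.0, 0.0)
```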

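The definition of a feature shouldn't change over time, so that a value means the same thing when the model is trained as when it later makes predictions.

A feature's distribution should not have extreme outliers.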
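
Clipping (also called capping) is one common way to limit outliers: every value outside a chosen range is replaced by the nearest bound. The feature name and the bounds 0.0 and 4.0 below are hypothetical; in practice you might pick the upper bound from a high quantile of the data.

```python
def clip(values, lower, upper):
    """Clamp each value into [lower, upper] so extreme outliers cannot dominate."""
    return [min(max(value, lower), upper) for value in values]


# Hypothetical feature with one extreme outlier (55.0).
rooms_per_person = [1.2, 2.0, 1.8, 55.0]
print(clip(rooms_per_person, 0.0, 4.0))  # [1.2, 2.0, 1.8, 4.0]
```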

**Good habits:**

Visualize: plot histograms and rank values from most to least common.
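
A plotting library such as matplotlib is the usual tool for histograms; the dependency-free sketch below instead ranks values from most to least common with collections.Counter and draws a text bar for each. The sample values are made up, and the bar width of 20 characters is an arbitrary default.

```python
from collections import Counter


def text_histogram(values, width=20):
    """Return one 'value | bar count' line per distinct value,
    ranked from most to least common."""
    ranked = Counter(values).most_common()
    if not ranked:
        return []
    top = ranked[0][1]  # the largest count sets the full bar width
    lines = []
    for value, count in ranked:
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"{str(value):>10} | {bar} {count}")
    return lines


for line in text_histogram(["cat", "dog", "cat", "bird", "cat", "dog"], width=6):
    print(line)
```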

Debug: Are there duplicate examples? Missing values? Outliers? Does the data agree with your dashboards? Are the training and validation data similar?
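
Some of these questions can be partly automated. This sketch assumes examples are Python dicts with hashable values and None marking a missing value. Comparing means is only a crude test of train/validation similarity, and the dashboard comparison still has to be done by hand.

```python
from statistics import mean


def debug_report(train, validation, feature):
    """Run basic sanity checks on lists of dict examples.

    Assumes dict values are hashable and None marks a missing value.
    """
    # Duplicate examples: count records identical to an earlier one.
    seen, duplicates = set(), 0
    for example in train:
        key = tuple(sorted(example.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Missing values for the chosen feature.
    missing = sum(1 for example in train if example.get(feature) is None)

    # Crude train/validation similarity check: compare the feature means.
    def feature_mean(rows):
        values = [row[feature] for row in rows if row.get(feature) is not None]
        return mean(values) if values else None

    return {
        "duplicates": duplicates,
        "missing": missing,
        "train_mean": feature_mean(train),
        "validation_mean": feature_mean(validation),
    }


train = [{"rooms": 3}, {"rooms": 3}, {"rooms": None}, {"rooms": 5}]
validation = [{"rooms": 4}, {"rooms": 6}]
print(debug_report(train, validation, "rooms"))
```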

Monitor: track feature quantiles and the number of examples over time.
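
A minimal monitoring sketch, assuming each example is tagged with a period label such as a month. It groups values by period and reports the example count and the quartile cut points from statistics.quantiles (Python 3.8 or later), so a shift from one period to the next stands out. The period labels and values are made up.

```python
from statistics import quantiles


def quantiles_by_period(rows, n=4):
    """Group (period, value) pairs by period and report, for each period,
    the example count and the n-quantile cut points."""
    grouped = {}
    for period, value in rows:
        grouped.setdefault(period, []).append(value)
    report = {}
    for period in sorted(grouped):
        values = grouped[period]
        report[period] = {
            "count": len(values),
            # Older Python versions need at least two data points here.
            "cuts": quantiles(values, n=n) if len(values) >= 2 else None,
        }
    return report


rows = [("2024-01", v) for v in [1, 2, 3, 4, 5]] + [("2024-02", 10)]
for period, stats in quantiles_by_period(rows).items():
    print(period, stats)
```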









Ashutosh
