Intro

Scikit-learn (imported as `sklearn`) is a Python library that implements many tools for machine learning and advanced data analysis.

Data transform

Scikit-learn provides a set of tools for data preprocessing. The following table lists some of these classes.

| Area of Preprocessing | Class Name | Description |
| --- | --- | --- |
| Scaling & Standardization | `StandardScaler` | Standardizes features by removing the mean and scaling to unit variance (\(z\)-score). |
| | `MinMaxScaler` | Scales features to a given range, typically \([0, 1]\). |
| | `MaxAbsScaler` | Scales each feature by its maximum absolute value, so that values fall in the range \([-1, 1]\). Useful for sparse data, because it does not shift or center the values. |
| | `RobustScaler` | Scales features using statistics that are robust to outliers (e.g., median and interquartile range). |
| Normalization | `Normalizer` | Normalizes each sample (row) individually to unit norm (e.g., \(l_1\) or \(l_2\) norm). |
| Missing Values Imputation | `SimpleImputer` | Imputes missing values using a simple strategy (e.g., mean, median, most frequent, or a constant value). |
| | `KNNImputer` | Imputes missing values using the K-Nearest Neighbors approach. |
| | `IterativeImputer` | Imputes missing values by modeling each feature that has missing values as a function of the other features and using that estimate for imputation. |
| Categorical Encoding | `OneHotEncoder` | Encodes categorical features as a one-hot (dummy) feature matrix. |
| | `OrdinalEncoder` | Encodes categorical features as ordinal integers. |
| Feature Transformation | `PolynomialFeatures` | Generates polynomial and interaction features (e.g., \(x_1^2, x_1 x_2, x_2^2\)). |
| | `FunctionTransformer` | Creates a transformer from an arbitrary function. |
| | `QuantileTransformer` | Transforms features using quantiles, forcing the data to follow a uniform or normal distribution. |
| | `PowerTransformer` | Applies power transforms (e.g., Yeo-Johnson or Box-Cox) to make data more Gaussian-like. |
| Discretization / Binning | `KBinsDiscretizer` | Bins continuous data into \(k\) discrete intervals. |
| Pipeline / Composition | `Pipeline` | Sequentially applies a list of transformers and a final estimator. |
| | `ColumnTransformer` | Applies different transformers to different subsets of features. |
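
To make the scaling entries in the table concrete, here is a minimal sketch that applies `StandardScaler` and `MinMaxScaler` to a small array. The toy data is invented purely for illustration; only the two scaler classes and their `fit_transform` method come from scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: 4 samples, 2 features on very different scales.
X = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
    [4.0, 400.0],
])

# fit_transform learns the per-feature statistics and applies them in one step.
X_std = StandardScaler().fit_transform(X)    # zero mean, unit variance per column
X_minmax = MinMaxScaler().fit_transform(X)   # each column rescaled to [0, 1]

print(X_std.mean(axis=0), X_std.std(axis=0))
print(X_minmax.min(axis=0), X_minmax.max(axis=0))
```

After scaling, both columns are directly comparable even though the raw values differ by two orders of magnitude. In practice the scaler is usually fitted on the training data only and then reused with `transform` on new data.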

For more details, check the Data Transform page.
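
As a rough illustration of how the composition classes from the table fit together, the sketch below combines `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` with `Pipeline` and `ColumnTransformer`. The DataFrame, its column names, and the step names are hypothetical; pandas is assumed to be installed so that columns can be selected by name.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data with missing numeric values and one categorical column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 33.0],
    "income": [40000.0, 52000.0, np.nan, 61000.0],
    "city": ["Paris", "Lima", "Paris", "Oslo"],
})

# Numeric columns: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each group of columns to its own transformer.
# sparse_threshold=0.0 asks for a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0.0,
)

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled numeric columns + 3 one-hot city columns
```

Wrapping the steps this way keeps imputation and scaling statistics tied to the data the transformer was fitted on, which helps avoid leaking information from test data into preprocessing.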