scikit-learn Expert study guide

Expert-level machine learning

In-depth knowledge of machine learning algorithms, including emerging trends and best practices.

Algorithm development

Ability to develop and implement custom machine learning algorithms tailored to specific problems.

Model deployment

Expertise in deploying machine learning models into production environments, including knowledge of MLOps.

Research & innovation

Ability to conduct independent research and contribute to the development of new methods or tools.

Strategic planning

Involvement in long-term planning and strategy development for data science initiatives within the organization.

Strategic vision

Strong understanding of the broader industry and market trends to shape the strategic direction of machine learning efforts.

Model diagnostics

Identify, troubleshoot, and resolve problems in other team members' machine learning pipelines.

Machine Learning concepts

Supervised and unsupervised learning (regression, classification, clustering, dimensionality reduction)
Types of model families (tree-based, linear, ensemble, neighbors)
Loss functions and surrogate losses
Splitting criteria in Decision Trees
Filter, wrapper and embedded methods for feature selection
Calibration (expected calibration error) vs ranking power (ROC AUC / Gini)
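
The difference between calibration and ranking power can be illustrated with a small simulation. The sketch below is not a scikit-learn API: `expected_calibration_error` is a hypothetical helper using equal-width bins, and the data are synthetic. It applies a strictly increasing transform to calibrated probabilities, which keeps the ordering (and so ROC AUC and Gini = 2 * AUC - 1) while changing the probability values themselves.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Hypothetical helper: equal-width binned ECE.

    Weighted mean over bins of |observed positive rate - mean predicted probability|.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin ids 0 .. n_bins - 1.
    bin_ids = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())
            ece += mask.mean() * gap  # weight by the share of samples in the bin
    return ece


rng = np.random.default_rng(0)
p = rng.uniform(0.0, 1.0, size=20_000)
y = rng.binomial(1, p)   # labels are drawn from p, so p is calibrated by construction
p_distorted = p ** 3     # strictly increasing: same ordering, different values

auc = roc_auc_score(y, p)
auc_distorted = roc_auc_score(y, p_distorted)
gini = 2 * auc - 1

ece = expected_calibration_error(y, p)
ece_distorted = expected_calibration_error(y, p_distorted)

print(f"AUC: {auc:.3f} vs {auc_distorted:.3f} (ranking unchanged)")
print(f"ECE: {ece:.3f} vs {ece_distorted:.3f} (calibration degraded)")
```

A model can therefore rank well and still output probabilities that should not be read as frequencies, which is why the two properties are evaluated separately.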

Data preprocessing

Loading parquet datasets
Extract information from plots, e.g.:
- decide which family of models may be the best fit
Data wrangling
- Combining data from multiple sources
- Adding new features or derived attributes (e.g. lagged features for time-based data)
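
As a minimal sketch of the two data wrangling items above, the example below merges two small tables and derives lagged features with pandas. The tables and column names (`sales`, `promo`, `sales_lag_1`) are made up for illustration.

```python
import pandas as pd

# Hypothetical daily sales series; column names are illustrative only.
df = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=6, freq="D"),
        "sales": [10, 12, 9, 15, 14, 20],
    }
)

# Hypothetical second source keyed on the same date column (Jan 3 and Jan 5).
promo = pd.DataFrame(
    {"date": pd.date_range("2024-01-03", periods=2, freq="2D"), "promo": [1, 1]}
)

# Combine the two sources; days without a promotion get 0.
df = df.merge(promo, on="date", how="left")
df["promo"] = df["promo"].fillna(0).astype(int)

# Lagged features: shift() only looks backwards, so no future value leaks in.
df = df.sort_values("date")
df["sales_lag_1"] = df["sales"].shift(1)
df["sales_lag_2"] = df["sales"].shift(2)

# The first rows have no history; drop them before fitting a model.
features = df.dropna().reset_index(drop=True)
print(features)
```

Sorting by time before calling `shift` matters: on unsorted data the "previous" row is not the previous time step.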

Model building and evaluation

Create your own estimator
- NearestCentroid
- Recommender systems
- Transformers
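
A custom estimator along the lines of the NearestCentroid item could look like the sketch below. `SimpleNearestCentroid` is a hypothetical toy class, not scikit-learn's own implementation; it only illustrates the usual conventions (hyperparameters stored untouched in `__init__`, learned attributes with a trailing underscore, `fit` returning `self`).

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.metrics import pairwise_distances_argmin
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class SimpleNearestCentroid(ClassifierMixin, BaseEstimator):
    """Toy classifier: predict the class whose mean vector is closest."""

    def __init__(self, metric="euclidean"):
        # __init__ only stores hyperparameters, with no validation or logic,
        # so that get_params / set_params / clone keep working.
        self.metric = metric

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Attributes learned from data end with an underscore.
        self.classes_ = np.unique(y)
        self.n_features_in_ = X.shape[1]
        self.centroids_ = np.vstack(
            [X[y == label].mean(axis=0) for label in self.classes_]
        )
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        nearest = pairwise_distances_argmin(X, self.centroids_, metric=self.metric)
        return self.classes_[nearest]


X, y = load_iris(return_X_y=True)
# Because the class follows the estimator API, it plugs into model selection tools.
scores = cross_val_score(SimpleNearestCentroid(), X, y, cv=5)
print(scores.mean())
```

Inheriting from `ClassifierMixin` supplies a default `score` (accuracy), and `BaseEstimator` supplies `get_params` and `set_params`, which is what lets the class work inside pipelines and grid searches.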
Metadata routing
Calibration plots with CalibrationDisplay and post-calibration with CalibratedClassifierCV
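
Post-calibration with CalibratedClassifierCV can be sketched as follows, on a synthetic dataset chosen only for illustration. To keep the example free of plotting dependencies it uses `calibration_curve`, which returns the points that `CalibrationDisplay.from_predictions` would draw; the choice of GaussianNB and isotonic calibration is an assumption, not a recommendation.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data with redundant features, a setting where naive Bayes
# tends to produce over-confident probabilities.
X, y = make_classification(
    n_samples=4000, n_features=20, n_informative=2, n_redundant=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = GaussianNB().fit(X_train, y_train)

# The calibrator is fitted with internal cross-validation on the training set,
# so the test set stays untouched for evaluation.
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

p_raw = raw.predict_proba(X_test)[:, 1]
p_cal = calibrated.predict_proba(X_test)[:, 1]

# Points of the reliability diagram: observed frequency vs mean prediction per bin.
prob_true, prob_pred = calibration_curve(y_test, p_cal, n_bins=10)

brier_raw = brier_score_loss(y_test, p_raw)
brier_cal = brier_score_loss(y_test, p_cal)
print(f"Brier score raw: {brier_raw:.3f}, calibrated: {brier_cal:.3f}")
```

With matplotlib installed, `CalibrationDisplay.from_predictions(y_test, p_cal, n_bins=10)` plots the same curve against the diagonal of perfect calibration.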

Model selection and validation

Model deployment

Understanding how to save and load trained models using joblib, pickle, or skops.
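
A minimal round trip with joblib might look like the sketch below. The pipeline and the file name `model.joblib` are placeholders chosen for illustration.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Persist the whole pipeline so preprocessing and model stay together.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "model.joblib"  # placeholder file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored pipeline should reproduce the original predictions.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(same_predictions)
```

joblib and pickle files can execute arbitrary code when loaded, so they should only be loaded from trusted sources, and ideally with the same library versions used for training. skops offers an alternative format aimed at safer loading.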

Interpretation of results and communication

Explainability and interpretability
- partial dependence plots: is a feature's effect on the target non-linear?
- permutation importance
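
Both tools can be tried on synthetic data where the ground truth is known. The sketch below assumes the Friedman #1 generator (only features 0 to 4 drive the target, and feature 2 enters quadratically) and a random forest; both are illustrative choices.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: features 0-4 are informative, features 5-9 are pure noise.
X, y = make_friedman1(n_samples=1500, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Permutation importance on held-out data: the drop in R^2 when one column is shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)
ranking = perm.importances_mean.argsort()[::-1]
print("features ranked by importance:", ranking)

# Partial dependence of the prediction on feature 2, averaged over the test set.
pd_result = partial_dependence(model, X_test, features=[2], kind="average")
curve = pd_result["average"][0]
mid = len(curve) // 2
print("curve at start / middle / end:", curve[0], curve[mid], curve[-1])
```

A curve that is lower in the middle than at both ends is the kind of non-linear effect a linear model would miss. Computing the importance on held-out data rather than training data avoids rewarding features the model merely overfitted.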
Debugging the methodology
- given a plot, give a diagnostic for the model
- identify pitfalls in the modeling process (e.g. feature selection performed inside or outside the pipeline)
- code comprehension and good practices
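
The feature selection pitfall mentioned above can be reproduced on pure noise, where any score clearly above chance must come from leakage. This is a sketch with arbitrary sizes (100 samples, 5000 features, 20 selected) and an arbitrary classifier.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))   # pure noise features
y = rng.integers(0, 2, size=100)   # labels independent of X

# Pitfall: the selector sees every label, including those of the future test folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(
    LogisticRegression(max_iter=1000), X_leaky, y, cv=5
).mean()

# Safer: selection is a pipeline step, so it is refit on each training fold only.
pipeline = make_pipeline(
    SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000)
)
honest_score = cross_val_score(pipeline, X, y, cv=5).mean()

print(f"selection outside the pipeline: {leaky_score:.2f}")
print(f"selection inside the pipeline:  {honest_score:.2f}")
```

Since the labels carry no signal, the pipeline version should stay near chance, while the leaky version typically looks far better than it has any right to. The same reasoning applies to scaling, imputation, and any other step fitted on data.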

Recommended training and resources