Scikit-learn Expert Practitioner
About
The Scikit-learn Expert Practitioner Certification is designed to ensure that our certified professionals possess both the conceptual understanding and practical skills of a senior data scientist. When applying for it, you should be proficient in a broad range of scikit-learn’s tools and functions, and possess skills in the following areas:
-
Expert-level machine learning
In-depth knowledge of machine learning algorithms, including emerging trends and best practices.
-
Algorithm development
Ability to develop and implement custom machine learning algorithms tailored to specific problems.
-
Model deployment
Expertise in deploying machine learning models into production environments, including knowledge of MLOps.
-
Research & innovation
Ability to conduct independent research and contribute to the development of new methods or tools.
-
Strategic planning
Involvement in long-term planning and strategy development for data science initiatives within the organization.
-
Strategic vision
Strong understanding of the broader industry and market trends to shape the strategic direction of machine learning efforts.
-
Model diagnostics
Ability to identify, troubleshoot, and resolve potential problems in the machine learning pipelines of other team members.
Program
Machine Learning concepts
-
Supervised and unsupervised learning (regression, classification, clustering, dimensionality reduction)
-
Types of model families (tree-based, linear, ensemble, neighbors)
-
Loss functions and surrogate loss
-
Splitting criteria in Decision Trees
-
Filter, wrapper and embedded methods for feature selection
- Calibration (expected calibration error) vs ranking power (ROC AUC / GINI)
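The distinction between calibration and ranking power can be illustrated with a small sketch. It assumes a synthetic dataset and a logistic regression model, both chosen only for illustration, and it uses the Brier score as a stand-in for a calibration-sensitive metric rather than the expected calibration error named above. A strictly increasing transform of the predicted probabilities leaves the ranking (ROC AUC, and therefore Gini) unchanged while altering the probability values themselves.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data, used here purely for illustration.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# Cubing is strictly increasing on [0, 1]: the ordering of the scores
# is preserved, but the values no longer match the model's probabilities.
distorted = proba ** 3

auc_original = roc_auc_score(y_test, proba)
auc_distorted = roc_auc_score(y_test, distorted)
gini_original = 2 * auc_original - 1  # Gini is a rescaling of ROC AUC

brier_original = brier_score_loss(y_test, proba)
brier_distorted = brier_score_loss(y_test, distorted)

print(f"ROC AUC: {auc_original:.3f} vs {auc_distorted:.3f}")
print(f"Brier score: {brier_original:.3f} vs {brier_distorted:.3f}")
```

A ranking metric alone therefore cannot tell whether predicted probabilities can be trusted as probabilities.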
Data preprocessing
- Loading parquet datasets
- Extract information from plots, e.g.:
- decide which family of models may be the best fit
- Data wrangling
- Combining data from multiple sources
- Adding new features or derived attributes (e.g. lagged features for time-based data)
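As a rough sketch of the data wrangling items above, the following combines two sources and derives lagged features with pandas. The tables, column names (`demand`, `temperature`) and values are hypothetical and invented for illustration.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=6, freq="D")

# Two hypothetical sources sharing a "date" key.
sales = pd.DataFrame({"date": dates, "demand": [10, 12, 15, 11, 9, 14]})
weather = pd.DataFrame(
    {"date": dates, "temperature": [3.0, 4.5, 2.0, 1.5, 5.0, 6.0]}
)

# Combine the sources; validate guards against accidental row duplication.
df = sales.merge(weather, on="date", how="left", validate="one_to_one")
df = df.set_index("date")

# shift(k) moves values k rows down, so each row only sees past values.
df["demand_lag_1"] = df["demand"].shift(1)
df["demand_lag_2"] = df["demand"].shift(2)
# Mean of the 3 previous observations, excluding the current one.
df["demand_roll_mean_3"] = df["demand"].shift(1).rolling(3).mean()

# The first rows have no complete history and contain NaN; drop them.
features = df.dropna()
print(features)
```

Shifting before computing the rolling mean keeps the current target value out of its own features, which would otherwise leak information.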
Model building and evaluation
- Create your own estimator
- NearestCentroid
- Recommender systems
- Transformers
- Metadata routing
- Calibration plots with CalibrationDisplay and post-calibration with CalibratedClassifierCV
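To make the "create your own estimator" item concrete, here is a minimal sketch of a custom transformer. `QuantileClipper` is a hypothetical class written for this illustration, not part of scikit-learn; it follows the usual conventions of storing constructor arguments unchanged in `__init__` and learned state in attributes ending with an underscore.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.utils.validation import check_array, check_is_fitted


class QuantileClipper(TransformerMixin, BaseEstimator):
    """Hypothetical transformer clipping each feature to learned quantiles."""

    def __init__(self, lower=0.05, upper=0.95):
        # Only store parameters here; no validation or computation.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = check_array(X)
        # Learned state lives in attributes with a trailing underscore.
        self.lower_bounds_ = np.quantile(X, self.lower, axis=0)
        self.upper_bounds_ = np.quantile(X, self.upper, axis=0)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self)
        X = check_array(X)
        return np.clip(X, self.lower_bounds_, self.upper_bounds_)


rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
X[0, 0] = 100.0  # inject an outlier
y = (X[:, 1] > 0).astype(int)

clipper = QuantileClipper(lower=0.05, upper=0.95)
X_clipped = clipper.fit_transform(X)

# Because it follows the estimator API, it composes with pipelines.
pipeline = make_pipeline(QuantileClipper(), LogisticRegression()).fit(X, y)
```

Inheriting from `BaseEstimator` provides `get_params` and `set_params`, which is what allows cloning and hyperparameter search to work with the custom class.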
Model selection and validation
- Performing hyperparameter tuning with proper scoring rules (calibration)
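A possible way to tune hyperparameters with proper scoring rules is to pass `neg_log_loss` or `neg_brier_score` as the scoring argument of a search. The sketch below assumes a synthetic dataset and an arbitrary grid over the `C` parameter of a logistic regression; both are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data, used here purely for illustration.
X, y = make_classification(n_samples=500, random_state=0)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid=param_grid,
    # Both scorers evaluate predicted probabilities, not hard labels.
    # They are negated so that "greater is better" holds.
    scoring={"log_loss": "neg_log_loss", "brier": "neg_brier_score"},
    refit="log_loss",  # pick the final model by log loss
    cv=5,
)
search.fit(X, y)

best_C = search.best_params_["C"]
print("Best C:", best_C)
print("Mean CV log loss:", -search.best_score_)
```

Selecting on accuracy instead would only reward the thresholded decision and would ignore how well the probabilities are calibrated.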
Model deployment
- Understanding how to save and load trained models using joblib, pickle or skops.
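A minimal sketch of a save and load round trip with joblib follows. The model, dataset and file name are arbitrary choices for illustration. Note that joblib and pickle files can execute arbitrary code when loaded, so they should only be loaded from trusted sources.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# A temporary directory keeps the example free of side effects.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.joblib"  # hypothetical file name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("Identical predictions:", same_predictions)
```

In practice the scikit-learn version used for training is usually recorded alongside the file, since loading a model with a different version is not guaranteed to work.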
Interpretation of results and communication
- Explainability and interpretability
- partial dependence plots: is the impact on the target non-linear?
- permutation importance
- Debugging the methodology
- given a plot, give a diagnostic for the model
- identify pitfalls in the modeling process (e.g. feature selection techniques inside or outside the pipeline)
- code comprehension and good practices
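The feature selection pitfall mentioned above can be sketched on pure noise, where no model should beat chance. The data shape, the number of selected features and the choice of `SelectKBest` are assumptions made for illustration. Selecting features on the full dataset before cross-validation lets the test folds influence the selection, whereas placing the selector inside a pipeline refits it on each training fold.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Pure noise: the features carry no information about the labels.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5000))
y = rng.randint(0, 2, size=100)

# Pitfall: the selector sees all labels, including future test folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(
    LogisticRegression(max_iter=1000), X_leaky, y, cv=5
).mean()

# Sound methodology: selection is refit within each training fold.
pipeline = make_pipeline(
    SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000)
)
honest_score = cross_val_score(pipeline, X, y, cv=5).mean()

print(f"Selection outside the pipeline: {leaky_score:.2f}")
print(f"Selection inside the pipeline:  {honest_score:.2f}")
```

An accuracy well above chance on random labels is a diagnostic in itself: it points to leakage in the methodology rather than to a good model.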
Recommended training and resources
Please have a look at the program to prepare for the certification on your own.
Successfully completing the official scikit-learn MOOC (Massive Open Online Course)—available in a static version on GitHub Pages or as the full interactive experience—as well as navigating some of the most prominent examples in the scikit-learn documentation will provide a foundation for successfully passing the Expert certification exam.
Here is a non-exhaustive list of examples to look at:
-
Features in Histogram Gradient Boosting Trees
-
Prediction Intervals for Gradient Boosting
-
Time-related feature engineering
-
Common pitfalls in the interpretation of coefficients
-
Failure of Machine Learning to infer causal effects
-
Permutation Importance with Multicollinear or Correlated Features
Please note, however, that the certification program covers a broader range of topics. Nothing beats experience and critical thinking when it comes to acing the test!
Register for an exam
Ready to take the exam? Create an account on Webassessor and schedule an upcoming exam date.