Scikit-learn Professional Practitioner
About
The Scikit-learn Professional Practitioner Certification is designed to ensure that our certified professionals possess both the conceptual understanding and the practical skills of a mid-level data scientist. When applying for it, you should be proficient in using scikit-learn’s tools and functions and have skills in the following areas:
- Advanced machine learning knowledge: Proficiency in a broad range of machine learning algorithms and the ability to select appropriate models for specific problems.
- Programming expertise: Strong coding skills in Python, with experience in optimizing code for performance and scalability.
- Data handling and engineering: Ability to handle large datasets, including data extraction, transformation, and loading processes.
- Feature engineering: Experience in creating and selecting features to improve model performance.
- Model tuning and optimization: Proficiency in hyperparameter tuning, model selection, and ensemble methods to improve model performance.
- Critical thinking: Ability to approach complex problems systematically and evaluate multiple solutions. This includes being able to diagnose possible issues in a model pipeline.
- Business expertise: Understanding of how machine learning projects align with business goals and the ability to translate technical results into actionable business insights.
Program
Machine learning concepts
- Supervised and unsupervised learning (regression, classification, clustering, dimensionality reduction)
- Types of model families (tree-based, linear, ensemble, neighbors)
- Regularization (L1, L2, Elastic Net)
- Hard and soft predictions in classification (predict vs. predict_proba)
- Impact of overfitting and underfitting on soft predictions
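The difference between hard and soft predictions, and how regularization affects the soft ones, can be sketched as follows. This is a minimal illustration, assuming a synthetic dataset from make_classification and a LogisticRegression model with its default L2 penalty; the dataset and the C values are illustrative choices, not part of the exam program.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic binary classification data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C is the inverse of the L2 regularization strength: smaller C means a stronger penalty
clf = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)

hard = clf.predict(X_test)        # hard predictions: one class label per sample
soft = clf.predict_proba(X_test)  # soft predictions: shape (n_samples, n_classes), rows sum to 1

# For this classifier, the hard prediction is the class with the highest probability
assert np.array_equal(hard, clf.classes_[soft.argmax(axis=1)])

# Strong regularization (small C) typically pulls probabilities toward 0.5,
# while weak regularization (large C) allows more extreme, overconfident probabilities
for C in (0.001, 100.0):
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    confidence = model.predict_proba(X_test).max(axis=1).mean()
    print(f"C={C}: mean max probability = {confidence:.3f}")
```

Comparing the mean maximum probability across C values is one simple way to see underfitting (probabilities squeezed toward 0.5) versus overfitting (probabilities pushed toward 0 or 1).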
Data preprocessing
- Loading Parquet datasets
- Visualizing data with intermediate plotting techniques (heatmaps, PCA)
- Identifying strongly correlated features
- Handling missing target values using label propagation
- Feature engineering using PolynomialFeatures, SplineTransformer, etc.
- Combining features with FeatureUnion
Model building and evaluation
- Linear models as baselines
- Handling correlation with regularization and feature selection
- Understanding bagging and boosting ensemble methods
- Correct choice of metrics (e.g., in the presence of outliers or in imbalanced settings)
Model selection and validation
- Broad understanding of cross-validation techniques (group structure, non-i.i.d. data, etc.)
- Performing hyperparameter tuning using GridSearchCV and RandomizedSearchCV
- Assessing the stability of optimal hyperparameters across splits with nested cross-validation
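Nested cross-validation, and a check of how stable the selected hyperparameters are across outer splits, can be sketched as below. This is a minimal example, assuming synthetic i.i.d. data, a LogisticRegression model, and an illustrative grid of C values; with grouped or otherwise non-i.i.d. data, KFold would be replaced by a suitable splitter such as GroupKFold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_validate

# Illustrative i.i.d. data, so plain shuffled KFold is appropriate here
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)  # used for hyperparameter tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)  # used for performance estimation

# The inner loop: a grid search over the regularization strength C
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=inner_cv,
)

# The outer loop: each outer training fold runs its own full grid search
results = cross_validate(search, X, y, cv=outer_cv, return_estimator=True)

# Inspect which C each outer fold selected to judge stability
best_Cs = [est.best_params_["C"] for est in results["estimator"]]
print(f"nested CV accuracy: {results['test_score'].mean():.3f}")
print(f"best C per outer fold: {best_Cs}")
```

If the selected C varies widely from one outer fold to the next, the optimum is not stable, and the single value picked by a non-nested search should be treated with caution. RandomizedSearchCV can be swapped in for GridSearchCV in the same structure.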
Interpretation of results & communication
- Visualizing model results using intermediate plotting techniques (e.g., with matplotlib or seaborn)
- Interpreting and communicating model outputs and performance metrics to non-technical stakeholders
Register for an exam
Ready to pass the exam? Create an account on Webassessor and schedule an upcoming exam date.