Scikit-learn Professional Practitioner

About

The Scikit-learn Professional Practitioner Certification is designed to ensure that our certified professionals possess both the conceptual understanding and the practical skills of a mid-level data scientist. Candidates should be proficient in using scikit-learn’s tools and functions and should also have skills in the following areas:

  • Advanced machine learning knowledge

    Proficiency in a broad range of machine learning algorithms and the ability to select appropriate models for specific problems.

  • Programming expertise

    Strong coding skills in Python, with experience in optimizing code for performance and scalability.

  • Data handling and engineering

    Ability to handle large datasets, including data extraction, transformation, and loading processes.

  • Feature engineering

    Experience in creating and selecting features to improve model performance.

  • Model tuning and optimization

    Proficiency in hyperparameter tuning, model selection, and ensemble methods to improve model performance.

  • Critical thinking

    Ability to approach complex problems systematically and evaluate multiple solutions. This includes being able to diagnose possible issues in a model pipeline.

  • Business expertise

    Understanding of how machine learning projects align with business goals and the ability to translate technical results into actionable business insights. 

Program

Machine learning concepts
  • Supervised and unsupervised learning (regression, classification, clustering, dimensionality reduction)
  • Types of model families (tree-based, linear, ensemble, neighbors)
  • Regularization (L1, L2, Elastic Net)
  • Hard and soft predictions in classification (predict vs predict_proba)
  • Impact of model overfitting and underfitting on soft predictions
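
As an illustration of the regularization and hard-vs-soft prediction topics above, here is a minimal sketch on synthetic data (assumptions: make_regression/make_classification data and the alpha values are arbitrary choices, not exam material). Ridge, Lasso and ElasticNet apply L2, L1 and mixed penalties respectively, and a fitted classifier exposes hard labels through predict and class probabilities through predict_proba.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import ElasticNet, Lasso, LogisticRegression, Ridge
from sklearn.model_selection import train_test_split

# Regularization on a synthetic regression task: 20 features, only 5 informative
X_reg, y_reg = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)
ridge = Ridge(alpha=1.0).fit(X_reg, y_reg)                    # L2 penalty
lasso = Lasso(alpha=1.0).fit(X_reg, y_reg)                    # L1 penalty
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X_reg, y_reg)  # mix of L1 and L2
for name, model in [("Ridge", ridge), ("Lasso", lasso), ("ElasticNet", enet)]:
    # Penalties with an L1 part can set coefficients exactly to zero; L2 only shrinks them
    print(f"{name}: {int(np.sum(model.coef_ == 0))} coefficients equal to zero")

# Hard vs soft predictions on a synthetic binary classification task
X_clf, y_clf = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_clf, y_clf, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # soft: shape (n_samples, n_classes), rows sum to 1
hard = clf.predict(X_test)         # hard: one class label per sample
print(proba[:3].round(3), hard[:3])

# For this binary classifier, the hard label is the class with the highest probability
print(np.array_equal(hard, clf.classes_[proba.argmax(axis=1)]))
```

Soft predictions carry the model's confidence, which is where overfitting (overconfident probabilities) and underfitting (probabilities pulled toward the class prior) tend to show up.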
Data preprocessing
  • Loading Parquet datasets
  • Visualizing data with intermediate plotting techniques (heatmaps, PCA projections)
  • Identifying strongly correlated features
  • Handling missing values in the target using label propagation
  • Feature engineering using PolynomialFeatures, SplineTransformer, etc.
  • Combining features with FeatureUnion
Model building and evaluation
  • Linear models as baselines
  • Handling correlated features with regularization and feature selection
  • Understanding of bagging and boosting ensemble methods
  • Correct choice of metrics (presence of outliers, imbalanced settings, etc.)
Model selection and validation
  • Broader understanding of cross-validation techniques (group structure, non-i.i.d. data, etc.)
  • Performing hyperparameter tuning using GridSearchCV and RandomizedSearchCV
  • Assessing the stability of optimal hyperparameters across splits with nested cross-validation
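
The nested cross-validation and group-structure items above can be sketched as follows (assumptions: synthetic regression data, hypothetical group labels standing in for, e.g., patients or sites, and an arbitrary alpha grid). The inner GridSearchCV tunes alpha on the outer training groups only, the outer loop scores the refitted model on unseen groups, and recording best_params_ per outer fold shows how stable the selected hyperparameter is. RandomizedSearchCV could replace GridSearchCV in the inner loop, with a distribution over alpha instead of a fixed grid.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, GroupKFold

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
# Hypothetical group labels: 20 groups of 10 samples each
groups = np.repeat(np.arange(20), 10)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
outer_cv = GroupKFold(n_splits=5)

outer_scores, best_alphas = [], []
for train_idx, test_idx in outer_cv.split(X, y, groups=groups):
    # Inner loop: tune alpha with group-aware splits on the outer training part only
    search = GridSearchCV(Ridge(), param_grid, cv=GroupKFold(n_splits=3))
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    # Outer loop: evaluate the refitted best model on groups never seen during tuning
    outer_scores.append(search.score(X[test_idx], y[test_idx]))
    best_alphas.append(search.best_params_["alpha"])

print("outer R^2 per fold:", np.round(outer_scores, 3))
print("selected alpha per fold:", best_alphas)
```

If the selected alpha varies widely across outer folds, the tuned value itself should not be over-interpreted, even when the outer scores look good.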
Interpretation of results & communication
  • Visualizing model results with intermediate plotting techniques (e.g., using matplotlib or seaborn)
  • Interpreting and communicating model outputs and performance metrics to non-technical stakeholders

Recommended training and resources

Please review the program above to prepare for the certification on your own.
Completing all modules of the official scikit-learn MOOC (Massive Open Online Course), available either as a static version on GitHub Pages or as the full interactive experience, will provide a strong foundation for passing the Professional certification exam.
Note, however, that the certification program covers a broader range of topics. For example, the MOOC covers neither feature selection nor unsupervised learning, both of which are part of the certification syllabus.

Register for an exam

Ready to take the exam? Create an account on Webassessor and schedule an upcoming exam date.