Scikit-learn Professional Practitioner

About

The Scikit-learn Professional Practitioner Certification is designed to ensure that our certified professionals have both the conceptual understanding and the practical skills of a mid-level data scientist. Applicants should be proficient with scikit-learn’s tools and functions and have skills in the following areas:

  • Advanced machine learning knowledge

    Proficiency in a broad range of machine learning algorithms and the ability to select appropriate models for specific problems.

  • Programming expertise

    Strong coding skills in Python, with experience in optimizing code for performance and scalability.

  • Data handling and engineering

    Ability to handle large datasets, including data extraction, transformation, and loading processes.

  • Feature engineering

    Experience in creating and selecting features to improve model performance.

  • Model tuning and optimization

    Proficiency in hyperparameter tuning, model selection, and ensemble methods to improve model performance.

  • Critical thinking

    Ability to approach complex problems systematically and evaluate multiple solutions. This includes being able to diagnose possible issues in a model pipeline.

  • Business expertise

    Understanding of how machine learning projects align with business goals and the ability to translate technical results into actionable business insights. 

Program

Machine learning concepts
  • Supervised and unsupervised learning (regression, classification, clustering, dimensionality reduction)
  • Types of model families (tree-based, linear, ensemble, neighbors)
  • Regularization (L1, L2, Elastic Net)
  • Hard and soft predictions in classification (predict vs predict_proba)
  • Model overfitting and underfitting impact on soft predictions
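
As an illustration of the hard versus soft prediction topic above, here is a minimal sketch. It assumes a synthetic dataset from `make_classification` and a `LogisticRegression` classifier; neither choice comes from the program itself, and any scikit-learn classifier exposing `predict_proba` could be substituted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative assumption, not program material).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression().fit(X, y)

# Hard predictions: one class label per sample.
hard = clf.predict(X[:5])

# Soft predictions: one probability per class per sample; each row sums to 1.
soft = clf.predict_proba(X[:5])

# For this classifier, the hard prediction is the class with the highest probability.
hard_from_soft = clf.classes_[np.argmax(soft, axis=1)]

print(hard)
print(soft.round(3))
```

Soft predictions carry the model's confidence, which is why overfitting and underfitting can distort them even when the hard labels look acceptable.
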
Data preprocessing
  • Loading parquet datasets
  • Visualizing data with intermediate plotting techniques (heatmaps, PCA)
  • Identifying strongly correlated features
  • Handling missing values in the target by using label propagation
  • Feature engineering using PolynomialFeatures, SplineTransformer, etc.
  • Combining features with FeatureUnion
Model building and evaluation
  • Linear models as baselines
  • Handling correlation with regularization and feature selection
  • Understanding of bagging and boosting ensemble methods
  • Correct choice of metrics (presence of outliers, imbalanced settings, etc.)
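
To illustrate why metric choice matters in imbalanced settings, here is a small sketch. The 95/5 class split and the use of `DummyClassifier` as a deliberately uninformative model are assumptions made for the example.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Imbalanced toy target: 95 negatives, 5 positives (illustrative assumption).
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((100, 1))  # features are irrelevant for a dummy model

# A model that always predicts the majority class.
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = clf.predict(X)

acc = accuracy_score(y, y_pred)           # 0.95: looks good, but is misleading
bal = balanced_accuracy_score(y, y_pred)  # 0.5: mean of per-class recall

print(f"accuracy={acc:.2f}, balanced accuracy={bal:.2f}")
```

Accuracy rewards the model for ignoring the minority class entirely, while balanced accuracy exposes that it never detects a positive sample.
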
Model selection and validation
  • Broader understanding of cross-validation techniques (group structure, non-i.i.d. data, etc.)
  • Performing hyperparameter tuning using GridSearchCV, RandomizedSearchCV
  • Stability of optimal hyperparameters across splits with nested cross-validation
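
One possible way to inspect hyperparameter stability with nested cross-validation is sketched below. The dataset, the `LogisticRegression` estimator, the grid of `C` values, and the fold counts are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_validate

# Synthetic data (illustrative assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop selects hyperparameters; outer loop estimates generalization.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=inner_cv,
)

# return_estimator=True keeps the fitted search object from each outer fold.
results = cross_validate(search, X, y, cv=outer_cv, return_estimator=True)

# If the selected C varies widely across outer folds, the optimum is unstable.
best_C_per_fold = [est.best_params_["C"] for est in results["estimator"]]
outer_scores = results["test_score"]

print(best_C_per_fold)
print(outer_scores.round(3))
```

`RandomizedSearchCV` could replace `GridSearchCV` in the same structure when the search space is too large to enumerate.
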
Interpretation of results & communication
  • Visualizing model results using intermediate plotting techniques (matplotlib, seaborn)
  • Interpreting and communicating model outputs and performance metrics to non-technical stakeholders

Coming soon

If you wish to get notified when the Scikit-learn Professional Practitioner Certification becomes available, please click on the "Get notified" button and fill in the form.