Data Science & Machine Learning Program

What You’ll Learn

Week 1: Project Setup & Python Basics

  • CRISP-ML(Q) framework and business problem definition (Churn Prediction)
  • Python environment setup (Anaconda, Jupyter, Google Colab)
  • Python fundamentals: variables, data types, operators, control flow (if-else, loops)
  • Writing and running Python scripts
  • Git/GitHub for version control

Week 2: SQL & Data Extraction

  • Database basics and SQL fundamentals (SELECT, WHERE, ORDER BY)
  • SQL JOINS and GROUP BY operations
  • Window functions with PARTITION BY
  • Connecting Python to databases and extracting data
  • Initial data profiling and documentation

Week 3: Python Data Structures & NumPy

  • Lists, tuples, sets, dictionaries and comprehensions
  • Python functions, lambda, map/reduce
  • NumPy arrays and operations
  • Array indexing, slicing, broadcasting
  • Mathematical operations for data analysis

Week 4: Pandas & EDA Fundamentals

  • Pandas DataFrames: loading, selecting, filtering data
  • Descriptive statistics (mean, median, mode, variance, std dev)
  • Groupby and pivot tables
  • Data quality assessment
  • Initial exploratory analysis on churn dataset

Week 5: Data Visualization & Correlation

  • Matplotlib and Seaborn for plotting
  • Univariate plots (histogram, boxplot, violin) and bivariate plots (scatter, line)
  • Covariance, Pearson and Spearman correlation
  • Correlation heatmaps and correlation vs causation
  • Complete EDA report with visual insights

Week 6: Data Cleaning

  • Missing value detection and imputation techniques
  • Error handling in Python (try-except, exceptions)
  • File handling and data I/O operations
  • Code documentation (docstrings) and modularization
  • Building reusable data cleaning functions

Week 7: Outlier Treatment & Feature Engineering

  • Outlier detection (Z-score, IQR, Isolation Forest)
  • Feature engineering from datetime and RFM analysis
  • Aggregation and time-based features
  • Mathematical transformations (log, sqrt, polynomial)
  • Creating interaction features

Week 8: Feature Encoding & Scaling

  • Categorical encoding (Label, One-Hot, Target, Frequency encoding)
  • Feature scaling (Standardization, Min-Max, Robust scaling)
  • Column standardization strategies
  • Creating preprocessed datasets
  • Feature transformation pipeline documentation

Week 9: Feature Selection & Math Foundations

  • Feature importance concepts and curse of dimensionality
  • Filter methods (correlation, variance threshold, chi-square)
  • Wrapper and embedded methods for selection
  • Matrix algebra: vectors, matrices, eigenvalues/eigenvectors
  • Train-test-validation split with stratified sampling

Week 10: ML Fundamentals & Metrics

  • Types of ML (Supervised, Unsupervised, Reinforcement)
  • Classification vs Regression and validation techniques (K-fold, Stratified CV)
  • Evaluation metrics (Accuracy, Precision, Recall, F1, ROC-AUC)
  • Imbalanced dataset handling (SMOTE, class weights)
  • Bias-variance tradeoff

Week 11: Probability & Linear Models

  • Probability basics, conditional probability, Bayes' rule
  • Probability distributions (Normal, Bernoulli, Binomial)
  • Linear regression and Logistic regression
  • Regularization (L1, L2, Elastic Net)
  • Model interpretation and coefficient analysis

Week 12: Distance & Probabilistic Models

  • K-Nearest Neighbors (KNN) algorithm and distance metrics
  • Optimal K selection and feature scaling impact
  • Naïve Bayes classifier (Gaussian, Multinomial, Bernoulli)
  • Model comparison and performance analysis
  • Selecting the best-performing model

Week 13: Tree-Based Models & SVMs

  • Decision Trees (CART, entropy, Gini impurity, pruning)
  • Support Vector Machines (linear, kernel trick, RBF)
  • Hyperparameter tuning for trees and SVMs
  • Decision boundary visualization
  • Comprehensive traditional ML model comparison

Week 14: Bagging & Random Forest

  • Ensemble learning principles and bagging technique
  • Random Forest algorithm and feature importance
  • Hyperparameter tuning (n_estimators, max_depth, max_features)
  • Out-of-bag error estimation
  • Voting classifiers (hard and soft voting)

Week 15: Boosting Methods

  • Boosting fundamentals (AdaBoost, Gradient Boosting)
  • XGBoost, LightGBM, CatBoost overview
  • Hyperparameter optimization (learning_rate, max_depth, regularization)
  • Stacking and blending ensembles
  • Ensemble method comparison

Week 16: Statistical Hypothesis Testing

  • Population vs sample, Central Limit Theorem
  • Confidence intervals and standard error
  • Hypothesis testing (null/alternative, p-values, significance)
  • Z-test and t-test for model comparison
  • Cross-validation strategies and model selection

Week 17: Model Evaluation & Deployment Prep

  • Converting ML metrics to business KPIs
  • ROI calculation and cost-benefit analysis
  • Model monitoring and A/B testing design
  • Model documentation and deployment artifacts
  • Final churn prediction model with business impact report

Week 18: Neural Networks Foundation

New Project: Customer Review Sentiment Analysis

  • Biological vs artificial neurons, Perceptron and limitations
  • Multilayer Perceptron (MLP) architecture
  • Activation functions (Sigmoid, Tanh, ReLU, Softmax)
  • Forward propagation, loss functions, backpropagation
  • TensorFlow/Keras for building neural networks

Week 19: Deep Learning Optimization

  • Gradient descent variants and optimizers (SGD, Adam, RMSprop)
  • Learning rate scheduling and batch size selection
  • Regularization (Dropout, Batch Normalization, Early Stopping)
  • Vanishing/exploding gradient problems
  • Hyperparameter tuning with TensorBoard monitoring

Week 20: NLP Fundamentals

  • Text preprocessing (tokenization, stemming, lemmatization, stop words)
  • Bag of Words and N-grams (unigram, bigram, trigram)
  • Count Vectorizer and TF-IDF vectorization
  • Text classification using TF-IDF + ML models
  • Sentiment analysis baseline model

Week 21: Word Embeddings & RNNs

  • Word2Vec (CBOW and Skip-gram) and pre-trained embeddings
  • Recurrent Neural Networks (RNN) and LSTM/GRU architectures
  • Backpropagation through time (BPTT)
  • Bidirectional RNNs for sequence modeling
  • LSTM-based sentiment analysis model

Week 22: Large Language Models (LLMs)

New Component: LLM-powered Chatbot

  • Transformer architecture and self-attention mechanism
  • Pre-trained LLMs (BERT, GPT) and Hugging Face Transformers
  • Fine-tuning BERT for sentiment classification
  • Prompt engineering and few-shot learning
  • Building a chatbot with LLMs and RAG (Retrieval-Augmented Generation)

Week 23: Time Series Fundamentals

New Project: Retail Sales Forecasting

  • Time series components (trend, seasonality, cyclical, residual)
  • Stationarity testing (ADF test) and differencing
  • Autocorrelation (ACF) and Partial Autocorrelation (PACF)
  • Time series decomposition methods
  • Sales data exploration and pattern identification

Week 24: Time Series Forecasting

  • ARIMA models (AR, MA, ARMA, ARIMA, SARIMA)
  • Model selection using AIC/BIC and statsmodels library
  • LSTM for time series forecasting
  • Multi-step ahead forecasting
  • Comparing statistical vs deep learning approaches

Week 25: Unsupervised Learning

New Project: Customer Segmentation

  • Clustering algorithms (K-Means, DBSCAN, Hierarchical)
  • Clustering evaluation metrics (Silhouette, Davies-Bouldin)
  • Principal Component Analysis (PCA) and covariance matrix
  • Singular Value Decomposition (SVD) and t-SNE
  • Customer segmentation with profiling and insights

Week 26: Capstone Project - Phase 1

  • Capstone project selection and business problem definition
  • Data collection from multiple sources
  • Comprehensive EDA and data quality assessment
  • Data cleaning, feature engineering, and feature selection
  • Train-test split preparation

Week 27: Capstone Project - Phase 2

  • Building baseline and advanced ML models
  • Hyperparameter tuning and model optimization
  • Cross-validation and statistical model selection
  • Business impact analysis and ROI calculation
  • Model documentation and deployment strategy

Week 28: Final Presentations & Portfolio

  • Complete project documentation and GitHub repository
  • Portfolio website/showcase development
  • Presentation preparation with storytelling
  • Final project presentations and Q&A
  • Career guidance (resume, LinkedIn, interview prep)

FAQ - Frequently Asked Questions

Got questions? Here are answers to some common queries about learning with Dataholic. If you need further help, our support team is always ready to assist.

We offer a range of programs in data analytics, data science, machine learning, and mechanical design, catering to beginners and advanced learners.

Yes! All our courses are designed for flexible self-paced learning, so you can study whenever and wherever it suits you.

No prior experience is required. Our courses start with foundational concepts and gradually move to advanced topics.

Course content, including videos, quizzes, and projects, is available on our easy-to-use web platform accessible via desktop or mobile browser.

Absolutely. Our support team is available to help with any technical or course-related queries to ensure a smooth learning experience.

We regularly update courses to include the latest industry developments and best practices.

Yes, a certificate of completion is awarded to students who successfully finish their courses.

We offer discounts of up to 70% for new learners, along with flexible payment options, to make learning affordable.

Yes, our platform includes forums and Q&A sessions to connect you with instructors and fellow students.