Data Science Training
August 1, 2021  August 1, 2025
Free Professional Data Science Training
Data science, also known as data-driven science, is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in various forms, both structured and unstructured, similar to data mining.
Data science is a “concept to unify statistics, data analysis and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.
Introduction to Data Science
 What is Data Science?
 Why now?
 Where is Data Science applicable?
Business Statistics
Introduction to statistics
Summarizing Data



Central Tendency measures – Mean, Median and Mode

Measures of Variability – Range, Interquartile Range, Standard Deviation and Variance

Measures of Shape – Skewness and Kurtosis

Covariance, Correlation

Data Visualization

Histograms

Pie charts

Bar Graphs

Box Plot

Probability Basics


Parametric and Non-parametric Statistical Tests



‘F’ Test

‘z’ Test

‘t’ Test

Chi-Square Test

Probability Distributions

Expected value and variance

Discrete and Continuous

Bernoulli Distribution

Binomial Distribution

Poisson Distribution

Normal Distribution

Exponential Distribution

Empirical Rule

Chebyshev’s Theorem
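As a small illustration of the Empirical Rule and Chebyshev's Theorem, the Python sketch below compares the exact share of a normal distribution lying within k standard deviations of the mean (roughly 68%, 95% and 99.7% for k = 1, 2, 3) with Chebyshev's distribution-free lower bound 1 - 1/k^2. It uses only the standard library's `statistics.NormalDist`; the function names are illustrative, not part of any course material.

```python
from statistics import NormalDist


def empirical_rule_coverage(k: float) -> float:
    """Exact share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mu=0, sigma=1
    return z.cdf(k) - z.cdf(-k)


def chebyshev_lower_bound(k: float) -> float:
    """Chebyshev: any distribution with finite variance has at least 1 - 1/k^2
    of its mass within k standard deviations (informative only for k > 1)."""
    return max(0.0, 1.0 - 1.0 / k**2)


for k in (1, 2, 3):
    print(f"k={k}: normal {empirical_rule_coverage(k):.4f}, "
          f"Chebyshev at least {chebyshev_lower_bound(k):.4f}")
```

For k = 2 the normal distribution puts about 0.9545 of its mass inside the interval, while Chebyshev guarantees only 0.75 for an arbitrary distribution; the gap is the price of making no assumption about shape.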


Sampling methods and Central Limit Theorem



Overview

Random sampling

Stratified sampling

Cluster sampling

Central Limit Theorem
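A quick simulation can make the Central Limit Theorem concrete. The sketch below draws repeated simple random samples (with replacement) from a deliberately skewed, hypothetical exponential population and checks that the sample means centre on the population mean with a spread close to sigma/sqrt(n). The population, the sample size of 50, and the helper name `sample_means` are assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [mean(rng.choices(population, k=sample_size)) for _ in range(n_samples)]


# A skewed, hypothetical population: exponential draws with rate 1
rng = random.Random(42)
population = [rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=2_000)
print("population mean:", round(mean(population), 3))
print("mean of sample means:", round(mean(means), 3))
print("std of sample means:", round(pstdev(means), 3),
      "vs sigma/sqrt(n):", round(pstdev(population) / 50 ** 0.5, 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped distribution even though the population itself is strongly right-skewed.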


Hypothesis Testing



Type I error

Type II error

Null and Alternative Hypotheses

Rejection and Acceptance Criteria

P-value


Confidence Intervals
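To tie together the null hypothesis, the P-value, the rejection criterion and the confidence interval, here is a sketch of a two-sided one-sample z-test. It assumes the population standard deviation is known (the usual z-test setting; with an estimated standard deviation a t-test would be used instead), and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test of H0: mu = mu0, with known population sigma."""
    se = sigma / sqrt(n)                          # standard error of the mean
    z = (sample_mean - mu0) / se                  # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P-value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for alpha = 0.05
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    reject_h0 = p_value < alpha                   # Type I error rate is alpha
    return z, p_value, ci, reject_h0


# Hypothetical data: n = 36, sample mean 103, known sigma 12, H0: mu = 100
z, p, ci, reject = one_sample_z_test(sample_mean=103, mu0=100, sigma=12, n=36)
print(f"z={z:.2f}, p={p:.4f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```

Here z = 1.5 and p is about 0.134, so H0 is not rejected at alpha = 0.05; consistently, the 95% interval (about 99.08 to 106.92) contains 100.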
ANOVA


Assumptions

One way

Two way
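One-way ANOVA compares several group means by splitting the total variation into between-group and within-group sums of squares. The sketch below computes the F statistic from those two pieces; the function name and the data are illustrative, and the test's assumptions (independent observations, roughly normal groups, equal variances) are not checked here.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic and its (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_stat, df_b, df_w)  # 27.0 2 6
```

If SciPy is available, `scipy.stats.f_oneway` computes the same statistic together with its p-value.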

Artificial Intelligence – Machine Learning Introduction
Introduction to Machine Learning

What is Machine Learning?

Statistics vs. Machine Learning

Types of Machine Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Artificial Intelligence – Supervised Machine Learning
Classification
 Nearest Neighbor Methods (k-NN)
 Logistic Regression
Tree-based Models – Decision Tree
 Basics
 Classification Trees
 Regression Trees
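Classification trees are grown by choosing, at each node, the split that makes the child nodes purest. A minimal sketch of that search, assuming Gini impurity as the criterion (as in CART; entropy is a common alternative) and a single numeric feature with hypothetical data; the function names are illustrative:

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (threshold, weighted impurity) of the best split x <= threshold."""
    best = (None, gini(ys))  # baseline: no split at all
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if weighted < best[1]:
            best = (t, weighted)
    return best


xs = [2.0, 3.0, 10.0, 19.0, 20.0]   # hypothetical feature values
ys = ["a", "a", "b", "b", "b"]      # hypothetical class labels
print(best_threshold(xs, ys))       # (3.0, 0.0)
```

A regression tree uses the same search but scores each split by the reduction in the squared error (variance) of the numeric target instead of Gini impurity; a full tree applies the search recursively to each child node.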
Probabilistic methods
 Bayes Rule
 Naïve Bayes
Regression Analysis
 Simple Linear Regression
 Assumptions
 Model development and interpretation
 Ordinary Least Squares (minimizing the sum of squared errors)
 Model validation
 Multiple Linear Regression
Regression Shrinkage Methods
 Lasso
 Ridge
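The effect of shrinkage is easiest to see with a single predictor. Assuming mean-centred data and an unpenalised intercept, least squares gives the slope Sxy/Sxx, ridge (minimising RSS + lam * b^2) gives Sxy/(Sxx + lam), and lasso (minimising RSS + lam * |b|) soft-thresholds Sxy by lam/2. The sketch below uses these one-variable closed forms on hypothetical data; the multi-predictor case needs matrix algebra or an iterative solver.

```python
from statistics import mean


def centred_sums(xs, ys):
    """Return Sxx and Sxy computed on mean-centred data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, sxy


def ols_slope(xs, ys):
    sxx, sxy = centred_sums(xs, ys)
    return sxy / sxx


def ridge_slope(xs, ys, lam):
    """Minimises RSS + lam * b**2 (intercept not penalised)."""
    sxx, sxy = centred_sums(xs, ys)
    return sxy / (sxx + lam)


def lasso_slope(xs, ys, lam):
    """Minimises RSS + lam * |b| (intercept not penalised): a soft-thresholded slope."""
    sxx, sxy = centred_sums(xs, ys)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]             # hypothetical data on the line y = 2x
print(ols_slope(xs, ys))          # 2.0
print(ridge_slope(xs, ys, 10))    # 1.0
print(lasso_slope(xs, ys, 10))    # 1.5
print(lasso_slope(xs, ys, 40))    # 0.0
```

In each case the intercept is then mean(ys) - b * mean(xs). Ridge only shrinks the slope toward zero, whereas a large enough lasso penalty sets it exactly to zero, which is why lasso doubles as a variable-selection method.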
Advanced Models – Black Box
 Support Vector Machine
 Neural Networks
Ensemble Models
 Bagging
 Boosting
 Random Forests
Optimization
 Gradient Descent (Batch and Stochastic)
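The difference between batch and stochastic gradient descent is when the parameters are updated: after averaging the gradient over all examples, or after each single example. The sketch below fits a line y ≈ w*x + b by minimising mean squared error both ways; the learning rate, epoch counts and noise-free hypothetical data are assumptions chosen so that both versions converge.

```python
import random


def batch_gd(xs, ys, lr=0.01, epochs=2000):
    """Batch gradient descent on mean squared error for y ≈ w*x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of (1/n) * sum((w*x + b - y)**2), computed over every example
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def stochastic_gd(xs, ys, lr=0.01, epochs=1000, seed=0):
    """Stochastic gradient descent: one update per example, in shuffled order."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            e = w * xs[i] + b - ys[i]  # error on a single example
            w -= lr * 2 * e * xs[i]
            b -= lr * 2 * e
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # hypothetical data generated by y = 2x + 1
print(batch_gd(xs, ys))
print(stochastic_gd(xs, ys))
```

Both runs should end close to w = 2, b = 1. On large datasets the stochastic version does far more updates per pass over the data, at the cost of noisier steps when the data do not fit exactly.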
Recommendation Systems
 Collaborative filtering
 User-based filtering
 Item-based filtering
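A minimal sketch of item-based collaborative filtering, assuming plain cosine similarity between items' rating vectors (adjusted cosine or Pearson correlation are common variants). A user's missing rating is predicted as a similarity-weighted average of the ratings that user gave to other items. The ratings dictionary and function names are hypothetical.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 1},
    "carol": {"A": 1, "B": 5, "D": 4},
}


def item_vector(item):
    """Ratings for one item, keyed by user (unrated entries are simply absent)."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (missing = 0)."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(user, item):
    """Similarity-weighted average of the user's ratings of other items."""
    target = item_vector(item)
    num = den = 0.0
    for other, rating in ratings[user].items():
        if other == item:
            continue
        s = cosine(target, item_vector(other))
        num += s * rating
        den += abs(s)
    return num / den if den else None


print(round(predict("alice", "D"), 2))
```

User-based filtering is the mirror image: compute similarities between users' rating vectors and average the ratings that similar users gave to the target item.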
Artificial Intelligence – Unsupervised Machine Learning

Association Rules (Market Basket Analysis)

Apriori

Cluster Analysis

Hierarchical clustering

K-Means clustering

Dimensionality Reduction

Principal Component Analysis

Discriminant Analysis (LDA/GDA)
Model Validation
Confusion Matrix

ROC Curve (AUC)

Gain and Lift Chart

Kolmogorov-Smirnov Chart

Root Mean Square Error (RMSE)

Cross Validation

Leave-one-out cross validation (LOOCV)

k-fold cross validation
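k-fold cross validation splits the data into k folds, trains on k - 1 of them and tests on the remaining one, rotating until every observation has been held out once; LOOCV is the special case k = n. The sketch below scores a deliberately trivial baseline model (predict the training-fold mean) by pooling the out-of-fold predictions into a single RMSE. The model, data and function names are assumptions for illustration.

```python
import random
from math import sqrt


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and yield (train_idx, test_idx) for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def cross_val_rmse(ys, k):
    """Out-of-fold RMSE of a baseline that predicts the training-fold mean."""
    predictions = [0.0] * len(ys)
    for train, test in k_fold_indices(len(ys), k):
        train_mean = sum(ys[i] for i in train) / len(train)
        for i in test:
            predictions[i] = train_mean
    return rmse(ys, predictions)


ys = [3.1, 2.9, 3.8, 4.2, 3.5, 2.7, 4.0, 3.3, 3.9, 3.6]  # hypothetical targets
print("5-fold CV RMSE:", round(cross_val_rmse(ys, k=5), 3))
print("LOOCV RMSE:", round(cross_val_rmse(ys, k=len(ys)), 3))  # LOOCV: k = n
```

Pooling squared errors across folds before taking the square root avoids averaging per-fold RMSEs, which for LOOCV (one point per fold) would silently turn into a mean absolute error.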
Artificial Intelligence – Natural Language Processing

Introduction to Natural Language Processing

Sentiment Analysis

Text Similarity
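Two simple text-similarity measures are Jaccard similarity (overlap of the documents' vocabularies) and cosine similarity of bag-of-words count vectors. The sketch below implements both with a deliberately naive regex tokenizer; real pipelines usually add stemming, stop-word removal or TF-IDF weighting, none of which is assumed here. Function names are illustrative.

```python
import re
from collections import Counter
from math import sqrt


def tokens(text):
    """Lower-case word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def jaccard(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def cosine_bow(a, b):
    """Cosine similarity of bag-of-words term-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


s1 = "The cat sat on the mat"
s2 = "The cat lay on the rug"
print(round(jaccard(s1, s2), 3), round(cosine_bow(s1, s2), 3))
```

The two sentences share 3 of 7 distinct words (Jaccard about 0.429), but cosine similarity is higher (0.75) because it also counts the repeated "the".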
Artificial Intelligence – Deep Learning

Deep Learning Introduction

Convolutional Neural Network

Recurrent Neural Network
R Programming Language
Introduction

R Overview

Installation of R and RStudio software

Important R Packages

Data Types in R – Vectors, Lists, Matrices, Arrays, Data Frames

Decision Making & Loops

if-else, while, for

next, break, tryCatch
Functions

Writing functions

Nested functions
Built-in functions

vapply, sapply, tapply, lapply, etc.

Data Preparation/Manipulation

Reading and Writing Data

Summary and structure of data

Exploring different datasets in R

Subsetting Data Frames

String manipulation in Data Frames

Handling Missing Values, Changing Data Types, Data Binning Techniques, Dummy Variables

Data Visualization using ggplot2

Basic charts – Histograms, Bar plots, Line graphs, Scatter plots etc.
NumPy, Pandas

Introduction to DataFrames

Converting R code into Python

SciPy
Machine Learning in Python
Beautiful Soup
Matplotlib