What do you need to study to become a Data Scientist?

Md Rana Mahmud
May 7, 2020

The Kaggle curriculum (https://www.kaggle.com/learn/overview) is a good place to start.

Its courses cover:

Python

Learn the most important language for data science.

Intro to Machine Learning

Learn the core ideas in machine learning, and build your first models.

Intermediate Machine Learning

Learn to handle missing values, non-numeric values, data leakage, and more. Your models will be more accurate and useful.

Data Visualization

Make great data visualizations. A great way to see the power of coding!

Pandas

Solve short hands-on challenges to perfect your data manipulation skills.

Feature Engineering

Discover the most effective way to improve your models.

Deep Learning

Use TensorFlow to take machine learning to the next level. Your new skills will amaze you.

Intro to SQL

Learn SQL for working with databases, using Google BigQuery to scale to massive datasets.

Advanced SQL

Take your SQL skills to the next level.

Geospatial Analysis

Create interactive maps, and discover patterns in geospatial data.

Microchallenges

Solve ultra-short challenges to build and test your skill.

Machine Learning Explainability

Extract human-understandable insights from any machine learning model.

Natural Language Processing

Distinguish yourself by learning to work with text data.

Intro to Game AI and Reinforcement Learning

Build your own video game bots, using classic algorithms and cutting-edge techniques.

Next, look at what the Udacity Data Scientist Nanodegree (https://www.udacity.com/course/data-scientist-nanodegree--nd025) teaches.

The Nanodegree prerequisites:

● Programming

○ Python Programming: Writing functions, logic, control flow, and building basic applications, as well as common data analysis libraries like NumPy and pandas

○ SQL programming: Querying databases using joins, aggregations, and subqueries

○ Comfortable using the Terminal, version control in Git, and using GitHub

● Probability and Statistics

○ Descriptive Statistics: Calculating measures of center and spread, estimating distributions

○ Inferential Statistics: Sampling distributions, hypothesis testing

○ Probability: Probability theory, conditional probability

● Mathematics

○ Calculus: Maximizing and minimizing algebraic equations

○ Linear Algebra: Matrix manipulation and multiplication

● Data wrangling

○ Accessing database, CSV, and JSON data

○ Data cleaning and transformations using pandas and scikit-learn

● Data visualization with matplotlib

○ Exploratory data analysis and visualization

○ Explanatory data visualizations and dashboards

● Machine Learning

○ Feature Engineering

○ Supervised Learning: Regression, classification, decision trees, random forest

○ Unsupervised Learning: PCA, Clustering
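To give a rough sense of the level these prerequisites imply, here is a minimal sketch in Python using only the standard library. It touches two of the items above: descriptive statistics (measures of center and spread) and a SQL query with a join, an aggregation, and a subquery. The sample numbers and the `customers` and `orders` tables are made up for illustration; they do not come from the Udacity material.

```python
import sqlite3
import statistics

# Hypothetical sample data, made up for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Descriptive statistics: measures of center and spread.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
spread = statistics.pstdev(scores)  # population standard deviation

# SQL: an in-memory SQLite database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 30.0), (3, 2, 5.0), (4, 3, 50.0);
""")

# Join customers to orders, aggregate spending per customer, and keep
# only customers whose total exceeds the average order amount (subquery).
query = """
SELECT c.name, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name
HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
ORDER BY total DESC
"""
big_spenders = conn.execute(query).fetchall()

print(mean_score, median_score, spread)
print(big_spenders)
```

If reading and writing code like this feels comfortable, the programming and statistics prerequisites are probably within reach.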

The core and extra curricula cover the following:

Software Engineering: Software engineering skills are increasingly important for data scientists. In this course, you’ll learn best practices for writing software. Then you’ll work on your software skills by coding a Python package and a web data dashboard.

Data Engineering: In data engineering for data scientists, you will practice building ETL, NLP, and machine learning pipelines. This will prepare you for the project with our industry partner.

Experimental Design & Recommendations: Learn to design experiments and analyze A/B test results. Explore approaches for building recommendation systems.
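As an illustration of what "analyze A/B test results" can mean in practice, here is a minimal sketch of a two-proportion z-test, one common way to compare conversion rates between two variants. It is not taken from the Udacity course; the function name and the visitor counts are hypothetical, and a real analysis would also consider sample-size planning and test assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a, conv_b: number of conversions in variants A and B
    n_a, n_b: number of visitors in variants A and B
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution:
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the observed difference is unlikely under the assumption that both variants convert at the same rate.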

Extra Curriculum:

Convolutional Neural Networks

Spark

Python for Data Analysis

SQL

Command Line Essentials

Git & GitHub

Linear Algebra

Practical Statistics

In summary, you need to learn all of the following:

Statistics and Probability

Programming Language: Python or R

Database Query Language: SQL

Machine Learning

Deep Learning

Software Engineering

Data Engineering

Experimental Design & Recommendations

Natural Language Processing

Data Visualization

Feature Engineering

Git and GitHub

Mathematics: Calculus, Linear Algebra

Spark

Command Line

Overall, becoming a data scientist takes sustained effort. It’s a long journey, and you need to stay focused, be persistent, and work very hard to get there.

Md Rana Mahmud

Sr. Data Scientist, Statistician, and Software Engineer. Loves programming and data science.