What do you need to study to become a Data Scientist?
Let's start with the Kaggle Learn curriculum at https://www.kaggle.com/learn/overview
It offers the following courses:
Python: Learn the most important language for data science.
Intro to Machine Learning: Learn the core ideas in machine learning, and build your first models.
Intermediate Machine Learning: Learn to handle missing values, non-numeric values, data leakage, and more. Your models will be more accurate and useful.
Data Visualization: Make great data visualizations. A great way to see the power of coding!
Pandas: Solve short hands-on challenges to perfect your data manipulation skills.
Feature Engineering: Discover the most effective way to improve your models.
Deep Learning: Use TensorFlow to take machine learning to the next level. Your new skills will amaze you.
Intro to SQL: Learn SQL for working with databases, using Google BigQuery to scale to massive datasets.
Advanced SQL: Take your SQL skills to the next level.
Geospatial Analysis: Create interactive maps, and discover patterns in geospatial data.
Microchallenges: Solve ultra-short challenges to build and test your skill.
Machine Learning Explainability: Extract human-understandable insights from any machine learning model.
Natural Language Processing: Distinguish yourself by learning to work with text data.
Intro to Game AI and Reinforcement Learning: Build your own video game bots, using classic algorithms and cutting-edge techniques.
Next, let's look at what the Udacity Data Scientist Nanodegree teaches, at https://www.udacity.com/course/data-scientist-nanodegree--nd025
The Nanodegree prerequisites are:
● Programming
○ Python Programming: Writing functions, logic, control flow, and building basic applications, as well as common data analysis libraries like NumPy and pandas
○ SQL programming: Querying databases using joins, aggregations, and subqueries
○ Comfortable using the Terminal, version control in Git, and using GitHub
● Probability and Statistics
○ Descriptive Statistics: Calculating measures of center and spread, estimating distributions
○ Inferential Statistics: Sampling distributions, hypothesis testing
○ Probability: Probability theory, conditional probability
● Mathematics
○ Calculus: Maximizing and minimizing algebraic equations
○ Linear Algebra: Matrix manipulation and multiplication
● Data wrangling
○ Accessing database, CSV, and JSON data
○ Data cleaning and transformations using pandas and scikit-learn
● Data visualization with matplotlib
○ Exploratory data analysis and visualization
○ Explanatory data visualizations and dashboards
● Machine Learning
○ Feature Engineering
○ Supervised Learning: Regression, classification, decision trees, random forest
○ Unsupervised Learning: PCA, Clustering
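To give a rough sense of the level these prerequisites imply, here is a minimal sketch that touches two of them: descriptive statistics (center and spread) and a SQL join with an aggregation. The tables, column names, and values are made up for illustration, and the sketch uses only Python's standard library (`statistics` and `sqlite3`) rather than pandas, so it runs without extra installs.

```python
import sqlite3
import statistics

# Hypothetical toy data (not from any real dataset).
orders = [("ana", 20.0), ("ana", 35.0), ("ben", 15.0), ("ben", 50.0), ("cy", 30.0)]
customers = [("ana", "EU"), ("ben", "US"), ("cy", "EU")]

# Descriptive statistics: measures of center and spread.
amounts = [amount for _, amount in orders]
mean_amount = statistics.mean(amounts)
median_amount = statistics.median(amounts)
stdev_amount = statistics.stdev(amounts)  # sample standard deviation

# SQL: join the two tables and aggregate order totals per region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.execute("CREATE TABLE customers (customer TEXT, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
totals_by_region = conn.execute(
    """
    SELECT c.region, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.customer = o.customer
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
conn.close()

print(mean_amount, median_amount, round(stdev_amount, 2))
print(totals_by_region)
```

If writing and reading something like this feels comfortable, the programming and statistics prerequisites are probably within reach.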
The core curriculum covers:
Software Engineering: Software engineering skills are increasingly important for data scientists. In this course, you’ll learn best practices for writing software. Then you’ll work on your software skills by coding a Python package and a web data dashboard.
Data Engineering: In data engineering for data scientists, you will practice building ETL, NLP, and machine learning pipelines. This will prepare you for the project with our industry partner.
Experimental Design & Recommendations: Learn to design experiments and analyze A/B test results. Explore approaches for building recommendation systems.
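As an illustration of what "analyzing A/B test results" can look like in practice, the sketch below runs a two-sided two-proportion z-test, one common way to compare conversion rates between two groups. This is an assumption about the kind of analysis involved, not the course's own method; the function name and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100 of 1000 visitors convert on page A, 130 of 1000 on page B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two pages is unlikely to be due to chance alone. Designing the experiment well (sample size, randomization, choice of metric) matters as much as this final calculation.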
Extra Curriculum:
Convolutional Neural Networks
Spark
Python for Data Analysis
SQL
Command Line Essentials
Git & GitHub
Linear Algebra
Practical Statistics
In summary, these are the topics you need to learn:
Statistics and Probability
Programming language: Python or R
Database Query Language: SQL
Machine Learning
Deep Learning
Software Engineering
Data Engineering
Experimental Design & Recommendations
Natural Language Processing
Data Visualization
Feature Engineering
Git and GitHub
Mathematics: Calculus, Linear Algebra
Spark
Command Line
Overall, becoming a data scientist takes sustained effort. It's a long journey, and you need to stay focused, be persistent, and work hard to get there.