You’ve probably read a lot, perhaps many textbooks, to prepare for the role, but you’re still not sure you’re going to get it. That’s understandable: a job in data science can be hard to land. One of the best ways to show your skills is to put a portfolio together. It will show your future bosses that you can make the most of the skills you’ve been learning.
We’ve written this article to give you five types of projects you can include in your portfolio.
Data Cleaning
As a rule of thumb, data scientists spend about 80% of their time on a project cleaning data. That makes it very important: if you show that you can clean data, you’ll show that you’re valuable to the team. For this project, you need to find messy data sets and clean them.
If Python is your favorite, Pandas is definitely what you need. For R, the dplyr package is what you’ll want to use. Just make sure you show that you can import data, join multiple datasets, detect anomalies and missing values, impute those missing values and provide data quality assurance.
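To make the checklist above concrete, here is a minimal sketch of a cleaning step in Pandas. The two tables, their column names and the `clean_orders` helper are all invented for illustration, and the 1.5 * IQR rule and median imputation are just one common choice among several, so treat it as a starting point rather than a recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical messy inputs, invented for this sketch.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "customer_id": [10, 11, 10, 12, 13, 11],
    "amount": [25.0, np.nan, 30.0, 27.5, 5000.0, 22.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", None],
})


def clean_orders(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Join, flag anomalies and impute missing values (illustrative helper)."""
    # Join the two datasets; a left join keeps orders without a matching customer.
    df = orders.merge(customers, on="customer_id", how="left")

    # Flag anomalies with the 1.5 * IQR rule (a common heuristic, not the only one).
    q1, q3 = df["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["amount_outlier"] = (df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)

    # Record which amounts were missing, then impute them with the median
    # of the rows that were not flagged as outliers.
    df["amount_imputed"] = df["amount"].isna()
    median_amount = df.loc[~df["amount_outlier"], "amount"].median()
    df["amount"] = df["amount"].fillna(median_amount)

    # Make missing categories explicit instead of leaving them null.
    df["region"] = df["region"].fillna("Unknown")
    return df


cleaned = clean_orders(orders, customers)

# Simple data quality checks: unique keys and no remaining missing values.
assert cleaned["order_id"].is_unique
assert cleaned[["amount", "region"]].notna().all().all()
print(cleaned)
```

Keeping the `amount_outlier` and `amount_imputed` flags, instead of silently overwriting values, lets a reviewer see exactly what the cleaning step changed.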
Exploratory Data Analysis
Exploratory data analysis (EDA for short) is another important part of data science. It is the process of generating questions and investigating them with the help of visualizations.
EDA is what lets you understand your data well enough to draw conclusions that drive business impact. It may surface interesting insights based on customer segments, or sales trends based on seasonal effects. Sometimes you’ll find things that weren’t even on your list of expected outcomes at the very beginning.
For Python, you can use Pandas and Matplotlib. For R, the ggplot2 package will be useful. For this project, you’ll want to show that you can formulate relevant questions for the investigation, identify trends and covariation between variables, and communicate the results effectively with visualizations (we’re talking about histograms, box-and-whisker plots and scatterplots).
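As a rough illustration of that question-then-plot loop, the sketch below uses Pandas and Matplotlib on synthetic data. The dataset, the column names (`ad_spend`, `sales`, `is_weekend`) and the `plot_overview` helper are assumptions made up for this example. In a real project the questions would come from your own data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data invented for this sketch: daily sales with a weekend bump.
rng = np.random.default_rng(seed=0)
days = pd.date_range("2023-01-01", periods=120, freq="D")
is_weekend = np.asarray(days.dayofweek >= 5)
ad_spend = rng.uniform(50, 150, size=len(days))
sales = 200 + 1.5 * ad_spend + 80 * is_weekend + rng.normal(0, 20, size=len(days))
df = pd.DataFrame({"date": days, "ad_spend": ad_spend, "sales": sales, "is_weekend": is_weekend})

# Question 1: do sales differ between weekdays and weekends?
weekday_sales = df.loc[~df["is_weekend"], "sales"]
weekend_sales = df.loc[df["is_weekend"], "sales"]
print(f"Mean weekday sales: {weekday_sales.mean():.1f}")
print(f"Mean weekend sales: {weekend_sales.mean():.1f}")

# Question 2: do sales move together with ad spend (covariation)?
corr = df["ad_spend"].corr(df["sales"])
print(f"Correlation between ad spend and sales: {corr:.2f}")


def plot_overview(df: pd.DataFrame):
    """Draw a histogram, a box plot and a scatterplot side by side."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    axes[0].hist(df["sales"], bins=20)
    axes[0].set(title="Distribution of daily sales", xlabel="Sales", ylabel="Days")

    groups = [
        df.loc[~df["is_weekend"], "sales"].to_numpy(),
        df.loc[df["is_weekend"], "sales"].to_numpy(),
    ]
    axes[1].boxplot(groups)
    axes[1].set_xticks([1, 2])
    axes[1].set_xticklabels(["Weekday", "Weekend"])
    axes[1].set(title="Sales by day type", ylabel="Sales")

    axes[2].scatter(df["ad_spend"], df["sales"], alpha=0.6)
    axes[2].set(title="Sales vs. ad spend", xlabel="Ad spend", ylabel="Sales")

    fig.tight_layout()
    return fig


fig = plot_overview(df)
```

Calling `fig.savefig("eda_overview.png")` afterwards writes the three panels to an image you can drop into a notebook or a slide.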
Interactive Data Visualizations
When talking about interactive data visualizations, we’re talking about tools like dashboards. They’re useful both for data science teams and for business-oriented end users.
Dashboards help data science teams collaborate and come to conclusions together. They also give business-oriented users an interactive tool of their own. These users tend to focus on strategic goals rather than on technical details.
If you use Python, Plotly and Bokeh are great for dashboards. For R, the Shiny package is the one for you. For this project, you’ll need to show that you can create helpful features and include metrics that are relevant to your customer’s needs. You also need to show that you can choose a sensible refresh rate and a logical layout that is easy to scan. You might also want to show that you can generate reports or trigger other automated actions.
Communication
Communication is also one of the most important skills in data science. It is what makes the difference between a good data scientist and a great one. It doesn’t matter how fancy your model is: if you cannot explain to people what the product is, you won’t be able to sell it. Notebooks and slides are both good tools for communication.
Make sure you show that you know who your target audience is, and present visualizations that are relevant to them. Also, keep in mind that you shouldn’t crowd your slides with information, and that your presentation needs to flow smoothly.