Today I’m going to share a step-by-step guide to using machine learning in your business to predict sales and other crucial metrics, completely free.
The best part? Your competition isn’t doing this, for one of two reasons. 1, they can’t afford to: hiring an in-house data science team costs at minimum $130,000 per year, and that’s a conservative estimate. Or 2, they don’t know how. If either of those situations describes you, don’t worry. I’m about to fix that for you right now.
Click here to see my GitHub containing the full code breakdown: https://github.com/1jamjam/Data-Science—Python-/blob/main/Predicting%20Profit%20with%20Prophet
Let’s dive in.
Step 1) Clarify Your Goals.
In this scenario we are forecasting sales, but the same process applies to any metric you may want to predict.
Step 2) Locate Your Data.
Where is your existing data being kept? In a spreadsheet? A Google doc? A csv file? Knowing this is crucial, because it determines what kind of transformations need to be done before you can start.
Step 3) Clean your data.
Transfer your data to a spreadsheet if you haven’t already and make the data as clean as possible before feeding it into the model. Remove any duplicate values, NaN (Not a number) values, and missing values. Since we are primarily covering machine learning in this tutorial, I will not cover data cleaning here. I will make a separate blog post on this topic.
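If you would rather do the three removals named above in Python instead of a spreadsheet, here is a minimal sketch. The data and the column names 'ds' and 'y' are made up for illustration; a real dataset may need more careful handling than simply dropping rows.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one duplicated row and one missing sales value
raw = pd.DataFrame({
    "ds": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03", "2024-01-04"],
    "y": [100.0, 120.0, 120.0, np.nan, 130.0],
})

clean = (
    raw.drop_duplicates()            # remove exact duplicate rows
       .dropna(subset=["ds", "y"])   # remove rows with a missing date or value
       .reset_index(drop=True)
)
print(clean)
```

Dropping rows is the bluntest option. Whether it is appropriate depends on why the values are missing in your data.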
Step 4) Set up your coding environment.
Go to https://colab.research.google.com/ for a free and easy to use environment where we will be running our code. For this example, we will be creating our predictive model using Python.
Step 5) Import Your Libraries.
Copy and paste this code into the first block:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from prophet import Prophet
For this task, we will be using Facebook’s Prophet model to create our predictions. If needed, you can look at the in-depth documentation here: Prophet | Forecasting at scale. (facebook.github.io)
Step 6) Import the dataset, split the data, and plot the results:
Copy and paste this code into the following blocks:
NOTE: Replace the file name, column names, and dates with the ones from your SPECIFIC dataset. Prophet expects the date column to be named 'ds' and the value column to be named 'y'.
First Block:
df = pd.read_csv('prophet.csv')
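If your own file uses different column names, rename them before going further, since Prophet looks for a date column called 'ds' and a value column called 'y'. A minimal sketch, where the names "Date" and "Sales" and the numbers are hypothetical stand-ins for your own data:

```python
import pandas as pd

# Hypothetical export with business-friendly column names
sales = pd.DataFrame({
    "Date": ["2024-04-29", "2024-04-30", "2024-05-01"],
    "Sales": [210, 190, 205],
})

# Rename to the column names Prophet expects and parse the dates
df_example = sales.rename(columns={"Date": "ds", "Sales": "y"})
df_example["ds"] = pd.to_datetime(df_example["ds"])
print(df_example.dtypes)
```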
Second Block:
split_date = '1-May-2024'
split_date = pd.to_datetime(split_date, format='%d-%b-%Y')
df = df.set_index(pd.to_datetime(df['ds']))
df_train = df.loc[df.index <= split_date].copy()
df_test = df.loc[df.index > split_date].copy()
Third Block:
plt.figure(figsize=(10, 6)) # Set the figure size
# Plot the training data
plt.plot(df_train.index, df_train['y'], label='Train Data', color='blue')
# Plot the test data
plt.plot(df_test.index, df_test['y'], label='Test Data', color='orange')
# Add labels, title, and legend
plt.xlabel('Date')
plt.ylabel('Values')
plt.title('Train vs Test Data')
plt.legend() # Show the 'Train Data' / 'Test Data' labels
plt.show()
At this point, you should see a plot that represents the train vs test parts of the data.
Step 7) Fit the training set to the model:
model = Prophet()
model.fit(df_train)
Step 8) Create 'Test' Predictions and Plot Them Against the Actual Data:
Copy this code (remember to make the replacements that are specific to your data).
df_test_fcst = model.predict(df_test)
df_test_fcst.head()
At this point you should see a printed dataframe that shows you the predicted values for the time period of the test set.
This dataframe will show you the trends, upper and lower bounds, and if you scroll all the way over to the column titled ‘yhat’, you will see the model predictions.
Step 9) Plot and Compare
First, plot the predictions that were just made:
fig, ax = plt.subplots(figsize=(10, 5))
fig = model.plot(df_test_fcst, ax=ax)
plt.show()
The black dots are actual values, whereas the blue line is the model’s predictions, and the light blue shaded area is the upper and lower bounds.
Now, we will compare our forecast to our actual recorded values.
# Comparing Forecast to Actuals
# Plot forecast along with actual values
f, ax = plt.subplots(figsize=(15, 5))
ax.scatter(df_test.index, df_test['y'], color='r')
fig = model.plot(df_test_fcst, ax=ax)
plt.show()
The black dots are your dataset’s actual values from your training set.
The red dots are your dataset’s actual values from your testing set.
The blue line is the model’s predictions, and the light blue shaded areas are the model’s upper and lower bounds for its predictions.
Cool, right? But you may be wondering: how do I know whether the model is accurate, and how can I measure it?
Here’s how we solve that problem:
Step 10) Evaluate the Model Using Error Metrics:
Copy this code (Adjust code for your situation)
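# Evaluating the model using Error Metrics
# Root Mean Squared Error (RMSE)
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error
import numpy as np
np.sqrt(mean_squared_error(y_true=df_test['y'],
                           y_pred=df_test_fcst['yhat']))
This is your model’s Root Mean Squared Error (RMSE). It takes the difference between each actual value and its predicted value, squares it, averages those squares (that average is the Mean Squared Error), and then takes the square root, which puts the result back in the same units as your data. In statistics, this helps you determine how far off you are from the exact value. In this case, we have an RMSE of 97. Generally speaking, the lower the RMSE the better, but the number only means something relative to the scale of your values: an error of 97 is small if typical values are in the thousands and large if they are in the hundreds. For our dataset, the model isn’t doing badly at all.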
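To make the arithmetic concrete, here is a small sketch on made-up numbers (not the tutorial’s dataset). It also shows the distinction between the two names: the mean of the squared errors is the Mean Squared Error, and taking its square root, as the np.sqrt call above does, gives the Root Mean Squared Error.

```python
import numpy as np

# Made-up actual and predicted values, for illustration only
actual = np.array([100.0, 110.0, 120.0, 130.0])
predicted = np.array([98.0, 113.0, 118.0, 135.0])

errors = actual - predicted   # 2, -3, 2, -5
mse = np.mean(errors ** 2)    # (4 + 9 + 4 + 25) / 4 = 10.5
rmse = np.sqrt(mse)           # back in the same units as the data

print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
```

Because RMSE is in the same units as the data, it is usually the easier of the two to explain to a non-technical audience.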
Now, we will use another metric called Mean Absolute Percentage Error (MAPE).
# Mean Absolute Percentage Error
mean_absolute_percentage_error(y_true=df_test['y'],
                               y_pred=df_test_fcst['yhat'])
In statistics, this number tells us by what percentage, on average, the model’s predictions miss the actual values. Note that scikit-learn returns it as a fraction, so a result of 0.03 means 3%. In this case, our model is off by 3%. In most cases, this is quite good.
Now, we get into what you’ve all been waiting for: predicting into the future.
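Here is the same calculation done by hand on made-up numbers, as a sketch of what the metric measures. The values are illustrative only. Keep in mind that the metric divides by the actual values, so it becomes unstable when actual values are zero or close to zero.

```python
import numpy as np

# Made-up actual and predicted values, for illustration only
actual = np.array([100.0, 200.0, 400.0])
predicted = np.array([110.0, 190.0, 400.0])

# Absolute error as a fraction of each actual value: 0.10, 0.05, 0.00
fractional_errors = np.abs((actual - predicted) / actual)
mape = np.mean(fractional_errors)

print(f"MAPE as a fraction: {mape:.4f}")
print(f"MAPE as a percent:  {mape:.2%}")
```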
Here’s how you do it:
Step 11) Create Your Future Dataframe and Forecast into the Future!
Create your future dataframe:
# Predicting Into the Future!!
future = model.make_future_dataframe(periods = 365, include_history = False)
forecast = model.predict(future)
forecast
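At last, you should be left with a dataframe that includes the dates into the future, the corresponding predictions under column ‘yhat’, and the upper and lower bounds of the model’s predictions. Keep in mind that the model was fit on df_train, so these 365 days begin right after the last date in the training set, not after the last date in your full dataset.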
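If you want to sanity-check the dates in your forecast, the sketch below builds the date range you should expect using pandas alone. It is an illustration, not Prophet’s own code, and it assumes daily data and a training set whose last date is the split date used earlier (1 May 2024).

```python
import pandas as pd

# Assumption: the last date in the training set is the split date
last_training_date = pd.Timestamp("2024-05-01")
periods = 365

# The forecast horizon starts the day after the last training date
expected_dates = pd.date_range(
    start=last_training_date + pd.Timedelta(days=1),
    periods=periods,
    freq="D",
)
print(expected_dates[0].date(), "to", expected_dates[-1].date())
```

If the first date in your forecast dataframe is earlier than you expected, it is usually because the model was fit on the training set rather than on all of your data.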
Voila!
And just like that, you have a model that can predict whatever values and metrics are key to your business.
As you’ve seen, Facebook Prophet is a powerful tool for making accurate sales predictions. However, implementing and fine-tuning this model can be complex and time-consuming, especially if you do not have a computer science background.
If you’re looking for repeatable and accurate sales forecasting results for your business, our team at Aster Analytics Consulting can help. I specialize in Business Analytics, Sales Forecasting, and Business-Case Machine Learning and can provide tailored solutions to optimize your sales forecasting, allowing you to forge forward in confidence.
Again, click here to see my GitHub containing the full code breakdown: https://github.com/1jamjam/Data-Science—Python-/blob/main/Predicting%20Profit%20with%20Prophet
Let’s turn your data into dollars. Book a free consultation with me by clicking the link below:
Consultation – Aster Analytics (aster-analytics.com)
Feel free to email me at the address below:
james.aster-analytics@outlook.com
Cheers,
James Gregory