Use Python to scrape financial data and analyze Rule of 40

Python is very useful for a data monkey like me because it can massively collect, clean, standardize and analyze data. In this blog post, I will go through the steps to download historical financial data for some publicly traded SaaS firms, then use such data to answer some interesting questions about how different SaaS firms achieve Rule of 40 (and Rule of 50).

First thing first, my data vendor is Alpha Vantage, a fintech company providing APIs for accessing financial market data. Their free API keys allow 25 calls per day (as of Jan 2024). The Basic Tier in premium subscription costs around $30 monthly.

Second, let’s briefly review Rule of 40

The "Rule of 40" is a financial performance metric used in the tech industry, especially in SaaS firms, to evaluate the balance between a company's revenue growth rate and its profitability margin.

Rule of 40 Metric = Revenue Growth Rate (%) + EBITDA Margin (%)

Here's how it works:

Revenue Growth Rate (%): This is the percentage increase in a company's revenue over a specific period, often measured year-over-year.
- To be more precise, it should be subscription revenue only as a lot of companies have revenue from multiple sources: SaaS, hardware, consulting/professional services, etc.. However, separating subscription revenue from other type of revenue can be very tough when companies don’t breakdown their revenue sources in financial filing. And even when they breakdown revenue by type, data vendors like Alpha Vantage likely does not scrape those in their database (due to inconsistent practices between firms)
- Some SaaS experts prefer ARR to Revenue due to the fact that ASC606 rules often drives Accrual Revenue (reported to SEC) to be significantly different from ARR. Again, due to the lack of consistent and standardized ARR data, I will use Revenue for this analysis
EBITDA Margin (%): This is EBITDA as a percentage of its total revenue. It measures a company's profitability before accounting for interest, taxes, depreciation, and amortization expenses.

The Rule of 40 suggests that a healthy and sustainable software company should have a combined score (revenue growth rate and EBITDA margin) of at least 40%. It is used as a guideline to assess whether a company is achieving a balanced growth and profitability strategy. A score above 40% indicates that the company is effectively managing growth while maintaining profitability. It suggests that the company is in a healthy financial position and is likely to be an attractive investment or acquisition target.

Scrape financial data from alpha vantage

Special Note: I have noticed that some tickers don’t have accurate financial data when comparing the results scraped from Alpha Vantage to 10-Q or 10-K filings. I emailed Alpha Vantage about those discrepancies. Thus, readers should only view charts and tables below to understand the “big picture”, not to trust every single data point scraped from the internet.

First, let’s import needed libraries and download some sample data from Alpha Vantage using their API key. Then we look at different reports to determine which one contains data needed for our analysis

import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

# Set the global display format for numbers with commas
pd.options.display.float_format = '{:,.0f}'.format

# API Key to access data fron alpha vantage
Eliza_API_key = 'ELIZA-BUI'
example_ticker = 'SNOW'

# get url link & request access
url = f'https://www.alphavantage.co/query?function=INCOME_STATEMENT&symbol={example_ticker}&apikey={Eliza_API_key}'
r = requests.get(url)
data = r.json()

# Check data's keys
data.keys()

# Check Annual Report
annual_df = pd.DataFrame(data['annualReports'])
annual_df

# Check Quarterly Report
quarterly_df = pd.DataFrame(data['quarterlyReports'])
quarterly_df

**Notes:**

- Annual report stops at the last time a full year's result is reported, which can be delayed by up to 12 months, as in the case of SNOW here. At the time of this analysis, Jan 2024, the last annual data available is from Jan 2023.

- In addition, some companies are new IPOs, making full-year historical data limited

- Thus, I will use Quarterly reports in this analysis to get more and most updated data points from quarterly filings

—> For all in tents and purposes, Annual Revenue is calculated by summing the last twelve month/last four quarterly reported revenue. So is EBITDA. That way, all metrics used in this analysis are still annual metrics, but based on latest rolling 4 quarters

NOW, we scrape the data vendor website for financial data of all tickers in our list

# Function to fetch financial data from Alpha Vantage for a list of tickers and store it in a DataFrame
def fetch_quarterly_financial_data(ticker_list, api_key, function='INCOME_STATEMENT'):
    
    # empty dictionary to save data
    consolidated_quarterly_dict = {}
    
    for ticker in ticker_list:
        try:
            # Construct the URL for the Alpha Vantage API request
            url = f'https://www.alphavantage.co/query?function={function}&symbol={ticker}&apikey={api_key}'
            
            # Make the API request
            response = requests.get(url)
            
            # Check if the request was successful (status code 200)
            if response.status_code == 200:
                data = response.json()
                
                # Extract the relevant financial data from the API response
                if 'quarterlyReports' in data.keys():
                    # put data in dataframe
                    quarterly_df = pd.DataFrame(data['quarterlyReports'])
                    
                    # Store the quarterly data in the dictionary with the ticker symbol as the key
                    consolidated_quarterly_dict[ticker] = quarterly_df
                
                else:
                    print(f"No quarterly data found for {ticker}.")
                    
            else:
                print(f"Failed to fetch data for {ticker}. Status code: {response.status_code}")
        
        except Exception as e:
            print(f"Error fetching data for {ticker}: {str(e)}")
            
    # return dictionary
    return consolidated_quarterly_dict

# Declare list of ticker
SaaS_tickers = ["ADBE", "CRM", "BASE", "FICO", "MDB", "TEAM", "ASAN", "DDOG", "SNOW", "ZS", "CRWD", "DASH", 
                "ADP", "CYBR", "DOCU"]

# Fetch financial data for the SaaS tickers
consolidated_quarterly_dict = fetch_quarterly_financial_data(SaaS_tickers, Eliza_API_key)

# Preview consolidated dataframe
display(consolidated_quarterly_dict.keys())
display(len(consolidated_quarterly_dict.keys()))

2. Process data to analyze Rule of 40 performance

# Copy extracted data to a new dictionary to perform data manipulation without the need to recall API
con_quarterly_dict = consolidated_quarterly_dict

# Function to convert all columns in a dataframe except some from exclusion list
def convert_columns_to_float_except(df, columns_to_exclude):
    # Create a dictionary to specify data types for each column
    data_types = {column: 'float' for column in df.columns if column not in columns_to_exclude}

    # Replace 'None' with 'NaN' in all columns using map
    df = df.apply(lambda col: col.map(lambda x: np.nan if x == 'None' else x))

    # Convert columns to 'float' using the astype method
    df = df.astype(data_types)
    
    return df

# Iterate through the dictionary and apply the function to each DataFrame
for ticker, quarterly_df in con_quarterly_dict.items():
    
    # Convert fiscal date ending to datetime
    quarterly_df['fiscalDateEnding'] = pd.to_datetime(quarterly_df['fiscalDateEnding'])

    # List of columns to exclude
    excluded_columns = ['fiscalDateEnding', 'reportedCurrency']
    
    # Run function to convert all columns to float type except those specified to be excluded above
    quarterly_df = convert_columns_to_float_except(quarterly_df, excluded_columns)

    # Add EBITDA Margin
    quarterly_df['EBITDA Margin %'] = quarterly_df['ebitda']/quarterly_df['totalRevenue'] * 100

    # Add Calendar Year col
    quarterly_df['Calendar Year'] = quarterly_df['fiscalDateEnding'].dt.year

    # Add Quarter col
    quarterly_df['Calendar Quarter'] = quarterly_df['fiscalDateEnding'].dt.quarter
   
    # Sort the DataFrame by 'fiscalDateEnding' in ascending order if it's not already sorted.
    quarterly_df.sort_values(by='fiscalDateEnding', inplace=True)

    # Calculate the sum of the last 4 quarters' revenue using rolling and sum
    quarterly_df['Last 4Q Revenue'] = quarterly_df['totalRevenue'].rolling(window=4).sum()
    
    # Convert to (mil) unit
    quarterly_df['Last 4Q Revenue (mil)'] = quarterly_df['Last 4Q Revenue'] / (10**6)
      
    # Calculate the sum of the last 4 quarters' EBITDA using rolling and sum
    quarterly_df['Last 4Q EBITDA'] = quarterly_df['ebitda'].rolling(window=4).sum()
    
    # Convert to (mil) unit
    quarterly_df['Last 4Q EBITDA (mil)'] = quarterly_df['Last 4Q EBITDA'] / (10**6)

    # Calculate the YoY revenue growth with a lag of 4 quarters
    quarterly_df['YoY Rolling 4Q Revenue Growth'] = quarterly_df['Last 4Q Revenue'].pct_change(periods=4) * 100
    
    # Calculate LTM/ last 4Q EBITDA Margin using Last 4Q EBITDA/Revenue
    quarterly_df['Rolling 4Q EBITDA Margin %'] = quarterly_df['Last 4Q EBITDA']/quarterly_df['Last 4Q Revenue'] * 100

    # Calculate YoY revenue growth and EBITDA margin SUM
    quarterly_df['YoY Rev Growth + EBITDA Margin'] = (quarterly_df['YoY Rolling 4Q Revenue Growth'] + 
                                                                    quarterly_df['Rolling 4Q EBITDA Margin %'])
    
    # Calculate if metrics pass Rule of 40 and Rule of 50
    quarterly_df['Rule40_Pass'] = quarterly_df['YoY Rev Growth + EBITDA Margin'].apply(lambda x: True if x >= 40 else False)
    quarterly_df['Rule50_Pass'] = quarterly_df['YoY Rev Growth + EBITDA Margin'].apply(lambda x: True if x >= 50 else False)
   
    # Assign the modified quarterly_df back to the dictionary using the same key
    con_quarterly_dict[ticker] = quarterly_df

# preview a sample ticker
con_quarterly_dict['SNOW']

Then we extract data that are relevant to the goal of analyzing Rule of 40 and focus on period 2020-2023

# Convert the dictionary values into a single DataFrame
full_df = pd.concat(con_quarterly_dict.values(), keys=con_quarterly_dict.keys())

# Reset the index for a cleaner DataFrame
full_df.reset_index(level=0, inplace=True)
full_df.rename(columns={'level_0': 'Ticker'}, inplace=True)

# Display the resulting DataFrame
#display(full_df.info())

# List columns of interest
selected_columns = ['Ticker','fiscalDateEnding', 'Calendar Year', 'Calendar Quarter', 'Last 4Q Revenue (mil)', 
                    'Last 4Q EBITDA (mil)', 'YoY Rolling 4Q Revenue Growth', 'Rolling 4Q EBITDA Margin %', 
                    'YoY Rev Growth + EBITDA Margin', 'Rule40_Pass', 'Rule50_Pass']

# Extracted only related data for stocks in period 2020-2023
focused_df = full_df[full_df['Calendar Year'] >= 2020][selected_columns]

# Display the resulting DataFrame
display(focused_df.info())

# Check for infinite values in 'YoY Rev Growth + EBITDA Margin' column
is_inf = np.isinf(focused_df['YoY Rev Growth + EBITDA Margin'])

# Replace infinite values with NaN
focused_df.loc[is_inf, 'YoY Rev Growth + EBITDA Margin'] = np.nan

# Check for infinite values in 'YoY Rev Growth + EBITDA Margin' column
print(focused_df[is_inf])

3. seek answers to analytical questions

Q1: How often each stock achieves Rule of 40 vs Rule of 50 within 4 years period from 2020 to 2023?

# Name year of interest
latest_year = 2023

# Group by 'Ticker' and calculate the sum of 'Rule40_Pass' and 'Rule50_Pass' for each group
pass_counts = focused_df.groupby('Ticker')[['Rule40_Pass', 'Rule50_Pass']].sum().reset_index()

# Rename columns for clarity
pass_counts.rename(columns={'Rule40_Pass': 'Count_Rule40_Pass', 'Rule50_Pass': 'Count_Rule50_Pass'}, inplace=True)

# Filter the DataFrame for the year 2023
data_latest_year = focused_df[focused_df['Calendar Year'] == latest_year]

# Find the latest fiscalDateEnding for each ticker in 2023
latest_dates = data_latest_year.groupby('Ticker')['fiscalDateEnding'].max().reset_index()

# Merge the latest_dates DataFrame with data_latest_year to get the corresponding 'YoY Rev Growth + EBITDA Margin' values
latest_values_latest_year = latest_dates.merge(data_latest_year, on=['Ticker', 'fiscalDateEnding'], how='left')

# Rename columns
latest_values_latest_year.rename(columns={'fiscalDateEnding': 'Last Filing fiscalDateEnding'}, inplace=True)

# Merge the pass_counts DataFrame with the calculated averages
pass_counts = pass_counts.merge(latest_values_latest_year[['Ticker', 'Last Filing fiscalDateEnding', 
                                                           'Last 4Q Revenue (mil)', 'Last 4Q EBITDA (mil)', 
                                                           'YoY Rolling 4Q Revenue Growth',
                                                           'Rolling 4Q EBITDA Margin %', 
                                                           'YoY Rev Growth + EBITDA Margin']],
                                                           on=['Ticker'], how='left')

display(pass_counts)

# Create a scatter plot using Seaborn
plt.figure(figsize=(10, 5))
scatter = sns.scatterplot(data=pass_counts, x="Count_Rule40_Pass", y="YoY Rev Growth + EBITDA Margin",
                          hue="Count_Rule50_Pass", 
                          palette="deep", 
                          #style="Count_Rule50_Pass",
                          #size ="Count_Rule50_Pass", 
                          )

plt.xlabel('Number of Qs achieving Rule40 in 2020-2023')
plt.ylabel('YoY Rev Growth + EBITDA Margin')
plt.title('Scatter Plot with Color Based on Count_Rule50_Pass')

# Move the legend to the upper right corner
scatter.legend(title='Number of Quarters achieving Rule50', loc='lower right')

# Add ticker names to the data points
for i, row in pass_counts.iterrows():
    plt.annotate(row['Ticker'], (row['Count_Rule40_Pass'], row['YoY Rev Growth + EBITDA Margin'] + 2))

plt.show()

ADBE and CRWD are the clear leaders with all quarters achieving Rule of 40 and 12 and 14 quarters achieving Rule of 50 (respectively)

DDOG takes second place with 12 quarters achieving Rule of 40 and Rule of 50
CRM, FICO, SNOW, DOCU had some good performing quarters in the past
Other stocks didn't make it to the high-performance clubs

Q2: Charting Revenue Growth vs EBITDA Margin and Rule of 40

# Create a scatter plot using Seaborn
plt.figure(figsize=(10, 6))
scatter = sns.scatterplot(data=pass_counts, x="YoY Rolling 4Q Revenue Growth", y="Rolling 4Q EBITDA Margin %",
                          hue = "Count_Rule40_Pass", 
                          #palette = "deep",
                          size = "Count_Rule40_Pass",
                          sizes = (50, 200)
                          )

# Add the line representing x + y = 40
y_line = 40 - pass_counts["YoY Rolling 4Q Revenue Growth"]
plt.plot(pass_counts["YoY Rolling 4Q Revenue Growth"], y_line, label='Rev Growth + EBITDA Margin = 40%', linestyle='-', color='red')

plt.xlabel(f'YoY Rolling 4Q Revenue Growth - latest filing in {latest_year}')
plt.ylabel(f'Rolling 4Q EBITDA Margin % - latest filing in {latest_year}')
plt.title('Revenue Growth vs EBITDA Margin and Rule of 40')

# Move the legend to the upper right corner
scatter.legend(title='Number of Quarters achieving Rule40', loc='upper right', fontsize='10', bbox_to_anchor=(1.2, 1.0))

# Add ticker names to the data points
for i, row in pass_counts.iterrows():
    plt.annotate(row['Ticker'], (row["YoY Rolling 4Q Revenue Growth"], row['Rolling 4Q EBITDA Margin %'] + 2))

plt.show()

Revenue Growth rate is somewhat negatively correlated with EBITDA Margin, representing the trade-off between topline growth and profitability margin

ADBE, CRWD, FICO are tickers staying above the 40% line in the latest filing of 2023, ADBE and CRWD in particular have achieved Rule of 40 in all latest 16 quarters
ZS has never achieved Rule of 40 during 2020-2023, but in the latest filing of 2023, it comes very close, at around 38%
CRM and DDOG had been in the "Rule of 40" club in many prior quarters, but are currently not in the club in the latest filing of 2023

Q3: Which stocks achieve Rule of 50 in any quarter of each year between 2020 - 2023?

# Group the data by 'Ticker' and 'Calendar Year' and count the number of quarters meeting Rule 40
rule50_counts = focused_df.groupby(['Ticker', 'Calendar Year'])['Rule50_Pass'].sum().reset_index()

# Calculate the average 'YoY Rev Growth + EBITDA Margin' for each Ticker and Year
average_yoy_margin = focused_df.groupby(['Ticker', 'Calendar Year'])['YoY Rev Growth + EBITDA Margin'].mean().reset_index()

# Merge the 'rule50_counts' DataFrame with the calculated averages
rule50_counts = rule50_counts.merge(average_yoy_margin, on=['Ticker', 'Calendar Year'], how='left')

# Rename the new column to indicate it's the average
rule50_counts.rename(columns={'YoY Rev Growth + EBITDA Margin': 'Average of Rev Growth and EBITDA Margin within CY'}, inplace=True)

# Filter stocks where each year has at least 1 quarter meeting Rule of 50
rule50_pass = rule50_counts[rule50_counts['Rule50_Pass'] > 0]

display(rule50_pass)

ADBE, CRWD and DDOG passed Rule of 50 in at least 3 quarters in each year between 2020 and 2022. Although they fell out of the "Rule of 50" club in 2023, when we zoom into the details below, we can see that ADBE essentially still passed Rule of 50

CRM passed Rule of 50 in 2020-2021 but fell out in 2022 and 2023 which reflects the ongoing layoff/restructuring plan with the firm
SNOW passed Rule of 50 only in 2022

# Filter the data for each year and create scatter plots
years = [2020, 2021, 2022, 2023]
fig, axes = plt.subplots(1, 4, figsize=(10, 4))

# Define the common ylim based on the range of data across all years
common_ylim = (rule50_pass['Average of Rev Growth and EBITDA Margin within CY'].min() - 5,
               rule50_pass['Average of Rev Growth and EBITDA Margin within CY'].max() + 5)

for i, year in enumerate(years):
    data_year = rule50_pass[rule50_pass['Calendar Year'] == year]
    ax = axes[i]
    
    sns.scatterplot(x='Rule50_Pass', y='Average of Rev Growth and EBITDA Margin within CY', data=data_year, s=25, ax=ax)
    ax.set(xlabel=None)
    ax.set(ylabel=None)
    ax.set_title(f'Calendar Year {year}')
    ax.set_xticks([1, 2, 3, 4])
    
    # Set common ylim for all subplots
    ax.set_ylim(common_ylim)

    for j, row in data_year.iterrows():
        ax.text(x=row['Rule50_Pass'], 
                y=row['Average of Rev Growth and EBITDA Margin within CY'] + 2, 
                s=row['Ticker'], va='top')

# Set x-axis and y-axis labels for the entire frame
fig.text(0.5, -0.05, 'Count of Quarters within each CY that pass Rule of 50', ha='center')
fig.text(-0.05, 0.5, 'Average of Rev Growth and EBITDA Margin', va='center', rotation='vertical')
        
# Adjust layout
plt.tight_layout()
plt.show()
    

Q4: Which stocks achieve Rule of 40 in at least 2 quarters in each of the last 4 years: 2020 - 2023?

# Group the data by 'Ticker' and 'Calendar Year' and count the number of quarters meeting Rule 40
rule40_counts = focused_df.groupby(['Ticker', 'Calendar Year'])['Rule40_Pass'].sum().reset_index()

# Calculate the average 'YoY Rev Growth + EBITDA Margin' for each Ticker and Year
average_yoy_margin = focused_df.groupby(['Ticker', 'Calendar Year'])['YoY Rev Growth + EBITDA Margin'].mean().reset_index()

# Merge the 'rule40_counts' DataFrame with the calculated averages
rule40_counts = rule40_counts.merge(average_yoy_margin, on=['Ticker', 'Calendar Year'], how='left')

# Rename the new column to indicate it's the average
rule40_counts.rename(columns={'YoY Rev Growth + EBITDA Margin': 'Average of Rev Growth and EBITDA Margin within CY'}, inplace=True)

# Filter stocks where each year has at least 2 quarters meeting Rule 40
rule40_pass = rule40_counts[rule40_counts['Rule40_Pass'] >= 2]

display(rule40_pass)

Of the 15 stocks we are analyzing, 7 stocks achieve Rule of 40 in at least 2 quarters of each year listed above.

ADBE and CRWD are the leaders with 4 years all achieving Rule of 40 in all 4 quarters
CRM and DDOG passed Rule of 40 in 3 years 2020-2022 but failed in 2023
DOCU passed Rule of 40 in 2 years 2021-2022 but failed in 2023
FICO passed Rule of 40 in year 2021 and 2023
SNOW passed Rule of 40 in 2022

# Filter the data for each year and create scatter plots
years = [2020, 2021, 2022, 2023]
fig, axes = plt.subplots(1, 4, figsize=(10, 4))

# Define the common ylim based on the range of data across all years
common_ylim = (rule40_pass['Average of Rev Growth and EBITDA Margin within CY'].min() - 5,
               rule40_pass['Average of Rev Growth and EBITDA Margin within CY'].max() + 5)

for i, year in enumerate(years):
    data_year = rule40_pass[rule40_pass['Calendar Year'] == year]
    ax = axes[i]
    
    sns.scatterplot(x='Rule40_Pass', y='Average of Rev Growth and EBITDA Margin within CY', data=data_year, s=25, ax=ax)
    ax.set(xlabel=None)
    ax.set(ylabel=None)
    ax.set_title(f'Calendar Year {year}')
    ax.set_xticks([1, 2, 3, 4])
    
    # Set common ylim for all subplots
    ax.set_ylim(common_ylim)

    for j, row in data_year.iterrows():
        ax.text(x=row['Rule40_Pass'], 
                y=row['Average of Rev Growth and EBITDA Margin within CY'] + 2, 
                s=row['Ticker'], va='top')

# Set x-axis and y-axis labels for the entire frame
fig.text(0.5, -0.05, 'Count of Quarters within each CY that pass Rule of 50', ha='center')
fig.text(-0.05, 0.5, 'Average of Rev Growth and EBITDA Margin', va='center', rotation='vertical')
        
# Adjust layout
plt.tight_layout()
plt.show()

Part II of this analysis we will take a look at 5 individual stocks to see the trade-off between Revenue growth and EBITDA margin to achieve rule of 40/50

Link to part II in Finance section: https://www.eliza-bui.com/finance/different-paths-to-rule-of-40-analysis-of-5-stocks

Link to Notebook in my Github repository:

https://github.com/ExcellentBee/Learning-Everyday/blob/main/Scrape%20data%20from%20AV%20to%20analyze%20Rule%20of%2040.ipynb

Scrape financial data from alpha vantage