Use yfinance to Download Historical Data from Yahoo Finance
In today's digital age, leveraging technology to streamline financial analysis is not just a convenience; it's a necessity. Reading the three financial statements and calculating fundamental ratios is Finance 101 for any FP&A professional like me. However, instead of relying solely on manual Excel spreadsheets, we have a powerful ally at our disposal: Python. In this blog post, we will use the Python library yfinance to automate downloading historical data from Yahoo Finance, then use other standard libraries such as pandas and seaborn to standardize formats, calculate related metrics, and create charts.
Explore yfinance
yfinance is a free, open-source Python library created by Ran Aroussi for downloading data from Yahoo Finance. Its functions are pretty straightforward and beginner-friendly.
Full documentation can be found here: https://pypi.org/project/yfinance/
First, install yfinance if this is your first time using the library. Then import the libraries we will use throughout.
```python
# Install the package the first time you use it
# pip install yfinance

# Import libraries
import yfinance as yf
import pandas as pd
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mpdates

# Set the global display format for floats: thousands separators, no decimals
pd.options.display.float_format = '{:,.0f}'.format
```
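The display option above relies on Python's format-spec mini-language: '{:,.0f}'.format adds thousands separators and rounds to zero decimal places. A quick standard-library check shows its effect on a large dollar figure and on a margin-style percentage. Note that, under this global setting, percentage columns like the margins computed later will also be displayed as whole numbers (the sample values are arbitrary).

```python
# Same callable that is assigned to pd.options.display.float_format above
fmt = '{:,.0f}'.format

print(fmt(1234567890.0))  # thousands separators, no decimals
print(fmt(18.27))         # a margin-style value loses its decimals

# A spec with one decimal place keeps more detail for percentages
print('{:,.1f}'.format(18.27))
```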
Creating a yf.Ticker(ticker_name) object lets us take a quick look at essential information about the company. Let's use TSLA as an example.
```python
# Look at Tesla's company info
Stock_Ticker = 'TSLA'
Tesla = yf.Ticker(Stock_Ticker)
Tesla.info
```
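Tesla.info returns a plain Python dictionary, and it can be long. Below is a minimal sketch for pulling out just a few fields. The key names ('longName', 'sector', 'industry', 'marketCap') are assumptions based on what Yahoo commonly returns, and the available keys can differ by ticker and yfinance version, so the hypothetical helper summarize_info uses dict.get to tolerate missing keys. The sample dictionary is placeholder data standing in for Tesla.info.

```python
def summarize_info(info, keys=("longName", "sector", "industry", "marketCap")):
    """Return only the requested keys from a yfinance-style info dict.

    Missing keys map to None instead of raising KeyError.
    """
    return {key: info.get(key) for key in keys}


# Hypothetical stand-in for Tesla.info (placeholder values, not real data)
sample_info = {
    "longName": "Example Corp",
    "sector": "Example Sector",
    "marketCap": 1_000_000_000,
    "currency": "USD",
}

summary = summarize_info(sample_info)
print(summary)
```

With live data, the same call would be summarize_info(Tesla.info).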
The {ticker}.quarterly_financials attribute shows the most recent quarters of data (the last four at the time of writing).
```python
# Get the last 4 quarters of financial data
Tesla.quarterly_financials
```
The {ticker}.income_stmt attribute shows the line items of the annual income statement, but only the last three years of data are available.
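Both quarterly_financials and income_stmt come back as tables with line items as rows and period-end dates as columns, which is why the code later transposes them. The transpose step can be sketched with plain dictionaries: the nested dict below is hypothetical data shaped like {line_item: {period: value}}, and the helper flips it to {period: {line_item: value}}, mirroring what DataFrame.transpose() does to the table.

```python
def transpose(table):
    """Flip {row: {column: value}} into {column: {row: value}}."""
    flipped = {}
    for row, columns in table.items():
        for column, value in columns.items():
            flipped.setdefault(column, {})[row] = value
    return flipped


# Hypothetical statement: rows are line items, columns are fiscal year-ends
statement = {
    "Total Revenue": {"2022-12-31": 100.0, "2023-12-31": 120.0},
    "Net Income": {"2022-12-31": 10.0, "2023-12-31": 15.0},
}

by_period = transpose(statement)
print(by_period["2023-12-31"])
```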
We will calculate some basic income statement ratios from four main line items: Total Revenue, Gross Profit, EBITDA, and Net Income.
```python
# Get historical annual financial data for the last 3 years
tesla_financials = pd.DataFrame(Tesla.income_stmt.loc[['Total Revenue', 'Gross Profit', 'EBITDA', 'Net Income']])

# Transpose so line items become columns, which makes the data easier to manipulate
tesla_financials = tesla_financials.transpose()
tesla_financials
```
```python
# Convert to millions and give the columns clearer titles
tesla_financials['Revenue (mil)'] = tesla_financials['Total Revenue'] / (10 ** 6)
tesla_financials['Gross Profit (mil)'] = tesla_financials['Gross Profit'] / (10 ** 6)
tesla_financials['EBITDA (mil)'] = tesla_financials['EBITDA'] / (10 ** 6)
tesla_financials['Net Income (mil)'] = tesla_financials['Net Income'] / (10 ** 6)

# Add columns for Gross Margin, EBITDA Margin, and Net Profit Margin (% of revenue)
tesla_financials['Gross Margin %'] = tesla_financials['Gross Profit'] / tesla_financials['Total Revenue'] * 100
tesla_financials['EBITDA Margin %'] = tesla_financials['EBITDA'] / tesla_financials['Total Revenue'] * 100
tesla_financials['Net Profit Margin %'] = tesla_financials['Net Income'] / tesla_financials['Total Revenue'] * 100

# Keep only the derived columns in a new DataFrame; .copy() avoids a
# SettingWithCopyWarning when tesla_fin is modified in place later
tesla_fin = tesla_financials[['Revenue (mil)', 'Gross Profit (mil)', 'EBITDA (mil)', 'Net Income (mil)',
                              'Gross Margin %', 'EBITDA Margin %', 'Net Profit Margin %']].copy()
tesla_fin
```
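Each margin column above is the same ratio, a line item divided by Total Revenue and multiplied by 100, while the "(mil)" columns simply divide by 10**6. A standard-library sketch for a single period makes the arithmetic explicit; the figures are made up, not Tesla's, and income_ratios is a hypothetical helper name.

```python
def income_ratios(revenue, gross_profit, ebitda, net_income):
    """Scale line items to millions and compute margins as % of revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero to compute margins")
    return {
        "Revenue (mil)": revenue / 10**6,
        "Gross Profit (mil)": gross_profit / 10**6,
        "Gross Margin %": gross_profit / revenue * 100,
        "EBITDA Margin %": ebitda / revenue * 100,
        "Net Profit Margin %": net_income / revenue * 100,
    }


# Hypothetical figures in raw dollars (not real company data)
ratios = income_ratios(
    revenue=50_000_000_000,
    gross_profit=10_000_000_000,
    ebitda=7_500_000_000,
    net_income=5_000_000_000,
)
for name, value in ratios.items():
    print(f"{name}: {value:,.1f}")
```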
```python
# Move the fiscal year-end dates from the index into a regular column
tesla_fin.reset_index(inplace=True)
# reset_index names that column 'index'; rename it to 'Date'
tesla_fin.rename(columns={'index': 'Date'}, inplace=True)
# Make sure the dates are datetime values
tesla_fin['Date'] = pd.to_datetime(tesla_fin['Date'])
# Sort by 'Date' in ascending order so each row is compared with the prior year
tesla_fin = tesla_fin.sort_values(by='Date')

# Calculate YoY growth
tesla_fin['YoY Revenue Growth %'] = tesla_fin['Revenue (mil)'].pct_change() * 100
tesla_fin['YoY Gross Profit Growth %'] = tesla_fin['Gross Profit (mil)'].pct_change() * 100
tesla_fin['YoY EBITDA Growth %'] = tesla_fin['EBITDA (mil)'].pct_change() * 100
tesla_fin['YoY Rev Growth + EBITDA Margin'] = tesla_fin['YoY Revenue Growth %'] + tesla_fin['EBITDA Margin %']
tesla_fin
```
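pct_change() compares each row with the previous one, so the result is a year-over-year figure only because the rows are one fiscal year apart and were sorted oldest to newest first; the earliest year has no prior value and shows NaN. The last column adds YoY revenue growth to EBITDA margin, which resembles the "Rule of 40" heuristic for growth companies (growth % plus margin % of at least 40), though the notebook doesn't label it that way. A standard-library sketch with hypothetical revenue and margin figures:

```python
def pct_change(values):
    """Mimic pandas Series.pct_change() * 100 for a list of numbers.

    The first element has no predecessor, so it is None (pandas gives NaN).
    Previous values are assumed to be non-zero.
    """
    changes = [None]
    for previous, current in zip(values, values[1:]):
        changes.append((current / previous - 1) * 100)
    return changes


# Hypothetical annual figures, already sorted oldest -> newest
revenue = [100.0, 150.0, 180.0]   # revenue in millions
ebitda_margin = [10.0, 12.0, 15.0]  # EBITDA margin in %

growth = pct_change(revenue)
# Growth % + EBITDA margin %, as in the 'YoY Rev Growth + EBITDA Margin' column
combined = [None if g is None else g + m for g, m in zip(growth, ebitda_margin)]
print(growth)
print(combined)
```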
With impressive YoY revenue growth above 50%, it's no wonder TSLA is a hot (and hotly debated) stock worth paying attention to.
3. Analyze and Chart Stock Price Performance
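```python
# Look at stock price performance from Jan 1, 2023 to today
end_date = datetime.datetime.now()
start_date = pd.to_datetime('2023-01-01')

# Download historical daily stock price data
stock_data = yf.download(Stock_Ticker, start=start_date, end=end_date)

# Newer yfinance versions return MultiIndex columns (price field, ticker) even for
# a single ticker; keep only the price-field level so 'Close' and 'Volume' are plain columns
if isinstance(stock_data.columns, pd.MultiIndex):
    stock_data.columns = stock_data.columns.get_level_values(0)

# Reset index to have Date as a column
stock_data.reset_index(inplace=True)

# Take a look at the data
stock_data.info()
```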
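The start date above is hard-coded, so the window grows beyond two years as time passes. For a rolling window instead, a small helper like the hypothetical years_back below computes a start date a given number of calendar years before an end date, falling back to Feb 28 when the end date is Feb 29. This is a sketch, not part of the original notebook; its result could be passed as start= to yf.download.

```python
import datetime


def years_back(end, years):
    """Return the date `years` calendar years before `end`.

    Feb 29 has no counterpart in non-leap years, so it falls back to Feb 28.
    Works for both datetime.date and datetime.datetime values.
    """
    try:
        return end.replace(year=end.year - years)
    except ValueError:
        return end.replace(year=end.year - years, day=28)


end_date = datetime.date(2024, 2, 29)
start_date = years_back(end_date, 2)
print(start_date, end_date)
```

In the notebook this would look like start_date = years_back(datetime.datetime.now(), 2).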
Next, we can build a combo chart that shows both the daily trading volume (bars) and the closing price (line) for TSLA.
```python
# Draw daily trading volume as a matplotlib bar chart on the primary y-axis
fig, ax1 = plt.subplots(figsize=(10, 5))
ax1.bar(stock_data['Date'], stock_data['Volume'], color='gray', alpha=0.3, label='Volume')

# Create a secondary y-axis and draw the closing price as a Seaborn line plot
ax2 = ax1.twinx()
sns.lineplot(x='Date', y='Close', data=stock_data, color='blue', label='Closing Price', ax=ax2)

# Set labels and title
ax1.set_xlabel('Date')
ax1.set_ylabel('Volume', color='gray')
ax2.set_ylabel('Closing Price', color='blue')
ax1.set_title(f'{Stock_Ticker} Stock Prices and Volume ({start_date.date()} to {end_date.date()})')

# Put a major tick on the first day of each month and format the dates
locator = mpdates.MonthLocator(bymonthday=1)
formatter = mpdates.DateFormatter('%Y-%m-%d')
ax1.xaxis.set_major_locator(locator)
ax1.xaxis.set_major_formatter(formatter)

# Rotate the x-axis tick labels for readability
ax1.tick_params(axis='x', rotation=50)

# Show the plot
plt.show()
```
We can repeat a similar process for other stock tickers and for other financial metrics from the income statement, balance sheet ({ticker}.balance_sheet), or cash flow statement ({ticker}.cashflow).
The full notebook can be found in my GitHub repository:
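To repeat the analysis for several tickers, the per-ticker steps can be wrapped in a loop. The sketch below keeps the fetching step injectable: fetch_income_items is a hypothetical callable you would implement with yfinance (for example, by reading the latest column of the needed rows from yf.Ticker(symbol).income_stmt), and the fake fetcher here returns made-up numbers for made-up tickers so the loop runs offline.

```python
def latest_margins(tickers, fetch_income_items):
    """Compute latest-period margins (%) for each ticker.

    fetch_income_items(symbol) must return a dict with 'Total Revenue',
    'Gross Profit' and 'Net Income' for the most recent period.
    """
    results = {}
    for symbol in tickers:
        items = fetch_income_items(symbol)
        revenue = items["Total Revenue"]
        results[symbol] = {
            "Gross Margin %": items["Gross Profit"] / revenue * 100,
            "Net Profit Margin %": items["Net Income"] / revenue * 100,
        }
    return results


# Hypothetical offline fetcher with made-up figures (not real company data)
FAKE_DATA = {
    "AAA": {"Total Revenue": 200.0, "Gross Profit": 50.0, "Net Income": 20.0},
    "BBB": {"Total Revenue": 400.0, "Gross Profit": 160.0, "Net Income": 40.0},
}

margins = latest_margins(["AAA", "BBB"], FAKE_DATA.__getitem__)
for symbol, row in margins.items():
    print(symbol, row)
```

Keeping the fetcher separate from the calculation makes the ratio logic easy to check without network access.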
https://github.com/ExcellentBee/Learning-Everyday/blob/main/Scrape%20Financial%20Data%20from%20Yahoo%20Finance.ipynb