How to Read and Write the HTML webpage using Pandas

Market Watch

MarketWatch is a website that provides financial information, business news, analysis, and stock market data

In [1]:
import pandas as pd
In [2]:
# URL to access the data. We can dynamically change the url based on the requirement
url = "https://www.marketwatch.com/tools/marketsummary" 

Note: Table tag alone can be accessed with the help of read_html method

In [3]:
# Reading the URL 
marketwatch_df = pd.read_html(url)  
In [4]:
# Accessing the US Market Overview table -> table with index 0(zero)
marketwatch_df[0]
Out[4]:
Index Last Chng. % Chng.
0 Dow Jones Industrial Average /zigman2/quotes/... 26655.00 -17.34 -0.07%
1 NASDAQ Composite Index /zigman2/quotes/210598... 10681.00 177.76 +1.69%
2 S&P 500 Index /zigman2/quotes/210599714/realtime 3240.00 15.21 +0.47%
3 Global Dow Realtime USD /zigman2/quotes/21059... 2967.00 9.03 +0.31%
4 Dow Jones Utility Average... /zigman2/quotes/... 817.46 -11.14 -1.34%
5 NYSE Composite Index /zigman2/quotes/21059803... 12388.00 -14.81 -0.12%
6 NYSE American Composite Index /zigman2/quotes... 2033.00 -4.56 -0.22%
7 Russell 2000 Index /zigman2/quotes/210598147/... 1464.00 -9.27 -0.63%
8 PHLX Semiconductor Index /zigman2/quotes/2105... 2086.00 16.71 +0.81%
9 Gold Continuous Contract /zigman2/quotes/2100... 1817.00 7.00 +0.39%
10 CBOE 30 Year Treasury Bond... /zigman2/quotes... 13.24 -0.05 -0.38%
11 CBOE 10 Year Treasury Note... /zigman2/quotes... 6.25 -0.03 -0.48%
In [5]:
# Accessing the NYSE Issues table -> table with index 1
marketwatch_df[1]
Out[5]:
0 1 2
0 Issues NYSE Nasdaq
1 Advancing 1259 1602
2 Declining 1716 1684
3 Unchanged 66 70
4 Total: 3041 3356
5 Issues at: Issues at: Issues at:
6 52-Week High 101 150
7 52-Week Low 8 13
8 Volume: Volume: Volume:
9 Advancing 1.17B 1.58B
10 Declining 1.13B 984.56M
11 Unchanged 14.57M 11.75M
12 Total: 2.31B 2.58B
In [6]:
# Accessing the NYSE Volumes table -> table with index 2
marketwatch_df[2]
Out[6]:
0 1 2 3 4
0 Most Active by Volume on 7/20/2020 Most Active by Volume on 7/20/2020 Most Active by Volume on 7/20/2020 Most Active by Volume on 7/20/2020 Most Active by Volume on 7/20/2020
1 Symbol Last Chng. % Chng. Volume
2 DNR 0.25 +0.02 +10.64% 121.32M
3 NIO 12.58 +1.49 +13.44% 113.93M
4 JE 0.56 +0.21 +59.32% 104.02M
5 AZN 59.16 -1.94 -3.18% 48.95M
6 F 6.70 -0.11 -1.54% 48.31M
7 BAC 23.59 +0.37 +1.57% 37.18M
8 NCLH 15.33 +0.06 +0.39% 31.77M
9 GE 6.91 -0.16 -2.22% 30.05M
10 SPY 323.20 +1.48 +0.46% 26.93M
11 XLF 23.91 -0.04 -0.19% 25.80M

Exporting all the data to Excel File

In [7]:
# Assigning the data frame to the respective variables
US_Market_Overview_df = marketwatch_df[0]
NYSE_Issues_df = marketwatch_df[1]
NYSE_Volume_df = marketwatch_df[2]
In [8]:
# Exporting all the data frame to an excel file. 
writer = pd.ExcelWriter('Market_Watch.xlsx', engine='xlsxwriter')

# Position of the dataframes in the worksheet.
US_Market_Overview_df.to_excel(writer, sheet_name='overview',index=False,header=None) 
NYSE_Issues_df.to_excel(writer, sheet_name='issues',index=False,header=None)
NYSE_Volume_df.to_excel(writer, sheet_name='volumes',index=False,header=None)

writer.save()