How to get cryptocurrency prices using Python (and various tools)

Here’s another post based on a question from Quora:

Can I export the contents of an HTML table to Excel or MySQL via Selenium/Python?

No, you can’t export a table from HTML to Excel or MySQL using Selenium with Python.

But you’re in luck! You just asked the wrong question.

It’s like if you had asked: “Can I build a bookshelf using my DeWalt Table Saw?”

The answer is, of course, still no, but a reasonable person would say that your table saw is one of the tools you could use, but you’re also going to need a hammer, and some nails — or a screwdriver and screws — and maybe some paint or varnish. And a couple clamps to hold it all to together while you’re building it.

Unfortunately, I am not a reasonable person. But I’ll show you how to do it anyway.

You can use Selenium with Python to extract content from an HTML table. You can then use other tools (with Python) to import that content into an Excel spreadsheet or MySQL database.

For example, I’ll fetch cryptocurrency prices from CoinMarketCap using Selenium:

# get the HTML table data using Selenium
from selenium import webdriver

url = "https://coinmarketcap.com/"
table_locator = "xpath", "//table"

driver = webdriver.Chrome()
driver.get(url)

table_webelement = driver.find_element(*table_locator)
table_html = table_webelement.get_attribute("outerHTML")

In addition to Selenium, I’d recommend using Pandas DataFrames to export to Excel — both because it’s easier than working with openpyxl directory (or xlwt, or xlsxwriter, or one of several other lower level libraries) and because pandas has all sorts of other great data manipulation features that might come in handy. (And it looks great on a resume!)

Here’s how you can use Python to read an HTML table directly into a Pandas Dataframe and then export it to a Microsoft Excel Spreadsheet using DataFrame.to_excel()

# load the HTML table to Pandas DataFrame
import pandas

dataframes = pandas.read_html(table_html)

# get the first and only table on the page
table_dataframe = dataframes[0] 

# export data to Excel
table_dataframe.to_excel("data.xlsx")

Here is the resulting spreadsheet:

Excel Spreadsheet with Most Recent Cryptocurrency Pricing

Or you can export the Dataframe to a database.

Here, we use MySQL Connector with SQL Alchemy to append our results from the HTML table to a table named “prices” into the MariaDB “test” database. If the table does not exist, it creates it.

Using Pandas Datataframe.to_sql()

# export data to MySQL (or MariaDB)
import sqlalchemy
from mysql import connector

conn = sqlalchemy.create_engine("mysql+mysqlconnector://username:password@localhost/test")

table_dataframe.to_sql(con=conn, name="prices", if_exists='append', index=False)

And our resulting table will look like this:

MYSQL Table with Most Recent Cryptocurrency Pricing

There are also other tools you may be able to use:

requests HTTP library instead of Selenium if you don’t need to manipulate the browser to fetch data

# get prices using requests
import requests

response = requests.get("https://coinmarketcap.com/")

BeautifulSoup4 to parse the HTML into a List of Lists

# parse data using Beauitful Soup
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.content, "html.parser")
table_soup = soup.find("table")

headings = [th.text.strip() for th in table_soup.find_all("th")]

rows = []
for tr in table_soup.find_all('tr', limit=11):
    row = []
    for td in tr.find_all('td'):
        row.append(td.text.strip())
    rows.append(row)

csv to parse into a CSV instead of Excel format (a CSV file can also be loaded directly into MySQL without Pandas).

#export data to csv file
import csv 
 
with open("data.csv", mode="w") as csvfile: 
    writer = csv.writer(csvfile) 
    writer.writerow(headings) 
    writer.writerows(rows) 

Of course , you can also load your data List into pandas as well:

table_dataframe = pandas.DataFrame(rows) 

Here is a gist showing all the code together:

pip install selenium
pip install pandas
pip install openpyxl
pip install sqlalchemy
pip install mysql-connector-python
pip install beautifulsoup4
pip install requests
view raw dependencies hosted with ❤ by GitHub
# get prices using requests
import requests
response = requests.get("https://coinmarketcap.com/")
# parse data using Beauitful Soup
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.content, "html.parser")
table_soup = soup.find("table")
headings = [th.text.strip() for th in table_soup.find_all("th")]
rows = []
for tr in table_soup.find_all('tr', limit=11):
row = []
for td in tr.find_all('td'):
row.append(td.text.strip())
rows.append(row)
# save data using CSV
import csv
with open("data.csv", mode="w") as csvfile:
writer = csv.writer(csvfile)
writer.writerow(headings)
writer.writerows(rows)
from selenium import webdriver
url = "https://coinmarketcap.com/"
table_locator = "xpath", "//table"
driver = webdriver.Chrome()
driver.get(url)
table_webelement = driver.find_element(*table_locator)
table_html = table_webelement.get_attribute("outerHTML")
# load the HTML table to Pandas DataFrame
import pandas
dataframes = pandas.read_html(table_html)
table_dataframe = dataframes[0] # get the first and only table on the page
# export data to Excel
table_dataframe.to_excel("data.xlsx")
# export data to MySQL (or MariaDB)
import sqlalchemy
from mysql import connector
conn = sqlalchemy.create_engine("mysql+mysqlconnector://username:password@localhost/test")
table_dataframe.to_sql(con=conn, name="prices", if_exists='append', index=False)

Scheduling tests to monitor websites

If you have access to your crontab you can set a Selenium script to run periodically. If you don’t have cron, you can use a VM (with Vagrant) or Container (with Docker) to get it.

Cron is available on Linux & Unix systems. On Windows, you can use Task Scheduler. On Mac, there is launchd, but it also includes cron (which wraps launchd).

You could also set up a job to run on a schedule using a continuous integration server such as Jenkins. Or write a simple, long running script that runs in the background and sleeps between executions.

I have a service that runs Selenium tests and monitoring for my clients, and use both cron and Jenkins for executing test runs regularly. I also have event-triggered tasks that can be triggered by a checkin or user request.

Each line represents a task with schedule in the following format:

#minute   #hour     #day      #month    #weekday  #command

# perform a task every weekday morning at 7am
*         7         *         *         1-5       wakeup.sh

# perform a task every hour
@hourly python selenium-monitor.py

You can edit crontab to create a task by typing crontab -e

You can view your crontab by typing crontab -l

If you just want to repeat your task within your script while it’s running, you can add a sleep statement and loop (either over an interval or until you kill the script).

#!/usr/bin/env python

from time import sleep
from selenium import webdriver

sites = ['https://google.com', 'https://bing.com', 'https://duck.com']

interval = 60 #seconds
iterations = 10 #times

def poll_site(url):
	driver = webdriver.Chrome()
	driver.get(url)
	title = driver.title
	driver.quit()
	return title

while (iterations > 0):
	for url in sites:
		print(poll_site(url))
	sleep(interval)
	iterations -= 1

See the example code on github:

#!/usr/bin/env python
from time import sleep
from selenium import webdriver
sites = ['https://google.com', 'https://bing.com', 'https://duck.com']
interval = 60 #seconds
iterations = 10 #times
def poll_site(url):
driver = webdriver.Chrome()
driver.get(url)
title = driver.title
driver.quit()
return title
while (iterations > 0):
for url in sites:
print(poll_site(url))
sleep(interval)
iterations -= 1

Originally posted on Quora:

https://www.quora.com/How-can-I-schedule-simple-website-test-scripts-Selenium-to-run-regularly-like-Cron-jobs-and-notify-me-if-it-fails-for-free/answer/Aaron-Evans-56