Web Scraping in Python
These are my notes on web scraping in Python.
Libraries
cloudflare-scrape - a Python library to bypass Cloudflare's anti-bot page
Requests-HTML - combines Requests and PyQuery to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible
Snippets
Scrape a web page behind a login
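from requests_html import HTMLSession

# One session is used for both requests so the login cookies are kept.
session = HTMLSession()

login_page = session.post(
    "https://example.com/login.php",
    data={
        "username": "myles",
        "password": "areallygoodpassword",
    },
)

# Stop early if the login request failed (HTTP status 400 or above).
if not login_page.ok:
    raise Exception("Login request failed")

secret_page = session.get(
    "https://example.com/admin/index.php",
    cookies=login_page.cookies,
)

if not secret_page.ok:
    raise Exception("Could not load the page behind the login")

Loop through a Description List element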
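The notes do not include a snippet for looping through a description list, so here is a minimal sketch of one way it could be done. It uses only the standard library's `html.parser` rather than Requests-HTML, so it runs without extra dependencies. The class name `DescriptionListParser` and the sample markup are assumptions made for illustration, and the sketch assumes each `<dd>` belongs to the most recent `<dt>`.

```python
from html.parser import HTMLParser


class DescriptionListParser(HTMLParser):
    """Collect (term, description) pairs from <dt>/<dd> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []     # finished (term, description) tuples
        self._tag = None    # "dt" or "dd" while inside one, else None
        self._term = ""     # text of the most recent <dt>
        self._text = []     # text chunks of the element being read

    def handle_starttag(self, tag, attrs):
        # Start collecting text when a term or description opens.
        if tag in ("dt", "dd"):
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a <dt> or <dd>.
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Ignore closing tags other than the <dt>/<dd> being read.
        if tag != self._tag:
            return
        text = "".join(self._text).strip()
        if tag == "dt":
            self._term = text
        else:
            self.pairs.append((self._term, text))
        self._tag = None


# Hypothetical sample markup, standing in for a scraped page.
SAMPLE_HTML = """
<dl>
  <dt>Name</dt><dd>Requests-HTML</dd>
  <dt>Language</dt><dd>Python</dd>
</dl>
"""

parser = DescriptionListParser()
parser.feed(SAMPLE_HTML)

# Loop through the collected term/description pairs.
for term, description in parser.pairs:
    print(f"{term}: {description}")
```

In a real scrape, the text of the fetched response (for example `secret_page.text`) could be passed to `parser.feed()` in place of `SAMPLE_HTML`.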