Web Scraping in Python

These are my notes on web scraping in Python.

Libraries

cloudflare-scrape - A Python library to bypass Cloudflare's anti-bot page.
Requests-HTML - Combines Requests and PyQuery to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.

Snippets

Scrape a web page behind a login

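from requests_html import HTMLSession

# A session keeps cookies between requests, so the login carries over.
session = HTMLSession()

# Submit the login form.
login_page = session.post(
  "https://example.com/login.php",
  data={
    "username": "myles",
    "password": "areallygoodpassword"
  }
)

# .ok is False for 4xx and 5xx responses.
if not login_page.ok:
  raise Exception("Login request failed")

# The session already holds the login cookies; passing them explicitly is optional.
secret_page = session.get(
  "https://example.com/admin/index.php",
  cookies=login_page.cookies
)

if not secret_page.ok:
  raise Exception("Could not fetch the protected page")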
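
The same flow can be wrapped in a small helper. This is only a sketch: `fetch_behind_login` and its parameters are names I made up for illustration, and it assumes the session object behaves like `requests.Session` (which `HTMLSession` subclasses), so cookies set at login are reused automatically.

```python
def fetch_behind_login(session, login_url, page_url, credentials):
    """Log in with `session`, then return the response for `page_url`.

    Hypothetical helper: `session` is assumed to behave like
    requests.Session, so cookies from the login response are stored on
    the session and sent with the follow-up request.
    """
    login_response = session.post(login_url, data=credentials)
    # raise_for_status() raises an exception on 4xx/5xx responses.
    login_response.raise_for_status()

    page_response = session.get(page_url)
    page_response.raise_for_status()
    return page_response


# Usage (needs network access and the requests_html package):
# from requests_html import HTMLSession
# page = fetch_behind_login(
#     HTMLSession(),
#     "https://example.com/login.php",
#     "https://example.com/admin/index.php",
#     {"username": "myles", "password": "areallygoodpassword"},
# )
```

A successful status code does not prove the login worked, since many sites answer a failed login with a 200 page containing an error message. It may be worth checking the returned page for something only a logged-in user would see.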

Loop through a Description List element

from pyquery import PyQuery as pq

doc = pq("""<dl>
    <dt>First name</dt>
    <dd>Dolores</dd>
    <dt>Last name</dt>
    <dd>Abernathy</dd>
    <dt>ID number</dt>
    <dd>CH465517080</dd>
    <dt>Status</dt>
    <dd>Conscious</dd>
    <dt>Park</dt>
    <dd>Westworld</dd>
    <dt>Narrative Role</dt>
    <dd>Rancher's daughter</dd>
</dl>""")

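data = {}

# Pair each <dt> with the <dd> that follows it: zip() draws twice from the
# same iterator, so elements are consumed two at a time in document order.
elements = iter(doc.find("dt, dd"))
for dt_el, dd_el in zip(elements, elements):
    data[dt_el.text] = dd_el.text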
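
The pairing trick in the loop is easier to see on a plain list. A minimal sketch, using a made-up helper name `pairs` and made-up sample data rather than anything from PyQuery:

```python
def pairs(items):
    """Group a flat sequence into consecutive (first, second) tuples."""
    iterator = iter(items)
    # Both zip() arguments are the same iterator, so each tuple consumes
    # two items. A trailing item without a partner is silently dropped.
    return list(zip(iterator, iterator))


flat = ["First name", "Dolores", "Last name", "Abernathy", "Park"]
result = dict(pairs(flat))
print(result)  # {'First name': 'Dolores', 'Last name': 'Abernathy'}
```

Note that the unpaired "Park" disappears without an error. The same would happen to a `<dt>` that has no matching `<dd>`, so this approach assumes the list strictly alternates terms and descriptions.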
