Wiki
  • Glossary
  • License
  • Myles' Wiki
  • Meta
  • Status
  • Android
    • Fire OS
  • Computer Science
    • Artificial Intelligence
    • Machine Learning
  • Cooking
    • Recipies
      • Desserts
        • Peanut Butter Swirl Brownies
  • Dat Protocol
  • Databases
    • MySQL
    • Postgres
  • DevOps
    • Ansible
    • Docker
  • Graphic Design
    • Adobe Illustrator
    • Design Systems
    • Pen Plotters
    • SVG
    • Zine
  • iOS
  • Linux
  • Lists
    • Books to Read :open_book:
    • Film to Watch :film_projector:
    • TV Shows to Binge :television:
    • Video Games to Play :joystick:
  • Pentesting
    • Metasploit
    • nmap Cheat Sheet
  • Productivity
  • Programming
    • CSS
    • GitHub
    • Go
    • GraphQL
    • Methodology
    • R
    • Ruby
    • Data Science
      • Organizing Data Science Projects
    • JavaScript
      • Node.js
      • Vue.js
        • Nuxt.js
    • PHP
      • Laravel
      • WordPress
    • Python
      • Anaconda
      • Celery
      • django
      • Jupyter
      • pandas
      • Useful Regular Expression
      • Wagtail
      • Web Scraping in Python
    • Static Website Generators
      • Hugo
      • Jekyll
      • VuePress
  • Raspberry Pi
  • Selfhosted
  • Setup
    • Android
    • Bag
    • iOS Applications
    • macOS Setup
    • Microsoft Windows Setup
  • Startup
  • Text Editors
    • Visual Studio Code
  • UNIX
  • User Experience (UX)
  • Windows
Powered by GitBook
On this page
  • Libaries
  • Links
  • Notes
  • Snippets
  • Connect to a SQLite database
  • Using a SQLAlchemy engine to connect to a database
  • Python compatible column names with slugify
  • Read CSV file with all cells as strings
  • Traspose DataFrame and view all rows
  • Convert a column from continuous to categorical
  • Read a CSV file data in chunk size
  • Pandas/SQL Rosetta Stone
  1. Programming
  2. Python

pandas

PreviousJupyterNextUseful Regular Expression

Last updated 2 years ago

pandas is a library for data manipulation and analysis.

Panda Eating Bamboo

Libaries

  • Pandaral·lel - A simple and efficient tool to parallelize your pandas operations on all your CPUs on Linux & macOS -

  • Pandas Profiling - Generates profile reports from a pandas DataFrame -

Links

Notes

Snippets

Connect to a SQLite database

import pandas as pd
import sqlite

conn = sqlite3.connect("database.sqlite")
df = pd.read_sql_query("SELECT * FROM table_name;", conn)

df.head()

Using a SQLAlchemy engine to connect to a database

import pandas as pd
import numpy as np
from sqlalchemy import create_engine

engine = create_engine("postgresql:///database")

df = pd.read_sql_query("SELECT * FROM table;", con=engine)

df.head()

Python compatible column names with slugify

from slugify import slugify

df.columns = [slugify(c, separator="_", to_lower=True) for c in df.columns]

Read CSV file with all cells as strings

>>> df = pd.read_csv("data/source/example.csv", dtype=str)
>>> df.dtypes
ID                         object
NAME                       object
SALARY                     object

Traspose DataFrame and view all rows

>>> with pd.option_context("display.max_rows", None):
...    print(df.head(1).transpose())
ID                         1
NAME       Myles Braithwaite
SALARY               $10,000

Convert a column from continuous to categorical

>>> df["age_groups"] = pd.cut(
...     df.age,
...     bins=[0, 18, 65, 99],
...     labels=["child", "adult", "elderly"]
... )

Read a CSV file data in chunk size

Sometimes a CSV is just to large for the memory on your computer. You can tell the argument chunksize how many rows of data you would like to load.

df_chunks = pd.read_csv("data/source/data.csv", chunksize=10000)

df_chunks[0].head()

If you would like to load the scale down the data and load the it into one pd.DataFrame:

def chunk_preprocessing(df):
    """Preprocess a chunk of the data before analysis.

    Arguments
    ————————-
    df : `pd.DataFrame`
        The chunk of the data source that needs to be processed.
    """
    pass


df_chunks_processed = [
    chunk_preprocessing(df)
    for df in df_chunks
]

df_concat = pd.concat(df_chunks_processed)

Pandas/SQL Rosetta Stone

IN / pandas.DataFrame.isin

SELECT *
FROM table
WHERE
    city IN (
        "Toronto",
        "Richmond Hill"
    );
# City is ether Toronto or Richmond Hill:
df[df['city'].isin(['Toronto', 'Richmond Hill'])]

# City is not Markdale or Oakville:
df[~df['city'].isin(['Markdale', 'Oakville'])]

- a pandas.DataFrame-based ORM -

- speed up your Pandas workflows by changing a single line of code -

- a Bokeh plotting backend for Pandas and GeoPandas -

- up to date remote data access for pandas -

- is a Pandas DataFrame Styler class that helps you create report quality tables -

by Jay Alammar, 29 October 2018.

by Chris Moffitt, 20 November 2018.

by Akshar Raaj, 4 August 2019.

by Benjamin Cooley, 10 October 2018.

by Chris Moffitt, 8 April 2015.

is a function to normalize structured JSON into a flat dataframe. Useful for working with data that comes from an JSON API.

Usually I'm dealing with data from external sources that don't have pretty columns names. I like to use to convert them to Python compatible keys.

Kevin Markham () -

See the pandas documentation for more information on .

jardin
Modin
:megaphone:
Pandas Bokeh
pandas-datareader
PrettyPandas
A Gentle Visual Intro to Data Analysis in Python Using Pandas
Building a Repeatable Data Analysis Process with Jupyter Notebooks
Gaining a solid understanding of Pandas series
How to make a gif map using Python, Geopandas and Matplotlib
Improve panda’s Excel Output
pandas.io.json.json_normalize
slugify
justmarkham
https://twitter.com/justmarkham/status/1146040449678925824
pandas.DataFrame.isin
Python