pandas

pandas is a Python library for data manipulation and analysis.

Libraries

  • jardin - a pandas.DataFrame-based ORM

  • Modin - speed up your Pandas workflows by changing a single line of code

  • Pandaral·lel - A simple and efficient tool to parallelize your pandas operations on all your CPUs on Linux & macOS

  • Pandas Bokeh - a Bokeh plotting backend for Pandas and GeoPandas

  • pandas-datareader - up to date remote data access for pandas

  • Pandas Profiling - Generates profile reports from a pandas DataFrame

  • PrettyPandas - a Pandas DataFrame Styler class that helps you create report-quality tables

Notes

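  • pandas.io.json.json_normalize is a function to normalize semi-structured JSON into a flat DataFrame. Useful for working with data that comes from a JSON API. Since pandas 1.0 it is available as pandas.json_normalize, and the pandas.io.json import path is deprecated.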
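A minimal sketch of what the flattening looks like, assuming pandas 1.0 or later (where the function is exposed as pd.json_normalize). The records are made up for illustration and stand in for a JSON API response.

```python
import pandas as pd

# Hypothetical nested records, shaped like a typical JSON API response.
records = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}},
    {"id": 2, "user": {"name": "Grace", "address": {"city": "New York"}}},
]

# Nested keys become dot-separated column names such as "user.address.city".
df = pd.json_normalize(records)
print(df.columns.tolist())
```

The sep argument changes the separator if dots in column names are inconvenient, for example pd.json_normalize(records, sep="_").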

Snippets

Connect to a SQLite database
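One way this could look, using only the standard library sqlite3 module and pd.read_sql_query. The in-memory database, the users table and its rows are assumptions made so the example runs on its own; point sqlite3.connect at a real file path instead.

```python
import sqlite3

import pandas as pd

# ":memory:" is a throwaway database; use a file path for a real one.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, created only so the query has data to read.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Grace",)])
conn.commit()

# pandas accepts a sqlite3 connection directly for SQLite databases.
df = pd.read_sql_query("SELECT id, name FROM users ORDER BY id", conn)
conn.close()

print(df)
```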

Using a SQLAlchemy engine to connect to a database
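A sketch of the engine-based approach, assuming SQLAlchemy is installed. The in-memory SQLite URL and the users table are placeholders; swap in the connection URL for your own database.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder URL: an in-memory SQLite database.
engine = create_engine("sqlite:///:memory:")

# Hypothetical data, written first so there is something to read back.
df_in = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Grace"]})
df_in.to_sql("users", con=engine, index=False)

# The engine can be passed anywhere pandas expects a connection.
df_out = pd.read_sql_query("SELECT id, name FROM users ORDER BY id", engine)
print(df_out)
```

Passing the engine rather than a raw connection lets pandas open and close connections itself.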

Python compatible column names with slugify

Usually I'm dealing with data from external sources that don't have pretty column names. I like to use slugify to convert them to Python-compatible keys.
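A rough sketch of the idea, assuming the python-slugify package (several packages on PyPI install a module named slugify, so this is a guess at which one is meant). The column names are invented examples.

```python
import pandas as pd
from slugify import slugify  # assumption: the python-slugify package

# Hypothetical frame with the kind of headers external sources produce.
df = pd.DataFrame(
    {"First Name": ["Ada"], "Total (USD)": [10], "E-mail Address": ["ada@example.com"]}
)

# separator="_" gives underscores instead of the default hyphens,
# so the results work as attribute names and keyword arguments.
df.columns = [slugify(col, separator="_") for col in df.columns]
print(df.columns.tolist())
```

A header that starts with a digit would still not be a valid identifier after slugifying, so those need a prefix added by hand.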

Read CSV file with all cells as strings
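A small sketch using the dtype argument of pd.read_csv. The CSV text is inline sample data so the example runs without a file; with a real file, pass its path instead of the StringIO object.

```python
import io

import pandas as pd

# Inline sample data; note the leading zero that numeric parsing would drop.
csv_text = "zip_code,amount\n01234,10\n98765,2.50\n"

# dtype=str stops pandas from inferring numeric types for any column.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df)
```

Empty cells are still read as NaN with this setting. Adding keep_default_na=False reads them as empty strings instead.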

Transpose DataFrame and view all rows
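A possible version of this, using DataFrame.T and pd.option_context. The wide frame is generated for illustration; the display.max_rows option is what keeps pandas from truncating the output.

```python
import pandas as pd

# Hypothetical wide frame: 100 columns, 2 rows.
df = pd.DataFrame({f"col_{i}": [i, i * 2] for i in range(100)})

# Transposing turns the 100 columns into 100 rows.
transposed = df.T

# Setting display.max_rows to None lifts the row limit inside this block only.
with pd.option_context("display.max_rows", None):
    rendered = repr(transposed)

print(rendered)
```

pd.set_option("display.max_rows", None) does the same for the whole session, which is handy in a notebook.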

Convert a column from continuous to categorical

Kevin Markham (justmarkham) - https://twitter.com/justmarkham/status/1146040449678925824
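A sketch of one common way to do this, with pd.cut. The column name, bin edges and labels are illustrative assumptions, not values taken from the linked tweet.

```python
import pandas as pd

# Hypothetical continuous column.
df = pd.DataFrame({"age": [5, 17, 30, 64, 80]})

# Bins are right-inclusive by default: (0, 18], (18, 65], (65, 99].
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 65, 99],
    labels=["child", "adult", "elderly"],
)
print(df)
```

Values outside the outermost bin edges become NaN, so the edges need to cover the full range of the data.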

Read a CSV file in chunks

Sometimes a CSV is just too large for the memory on your computer. The chunksize argument of pd.read_csv sets how many rows of data to load at a time.
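A minimal sketch of chunked reading. The CSV is a ten-row inline sample so the example runs without a file; a real file would call for a much larger chunksize.

```python
import io

import pandas as pd

# Inline sample data: a header plus ten rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of one DataFrame.
chunk_sizes = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))

print(chunk_sizes)  # [4, 4, 2]
```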

If you would like to scale down the data and load it into one pd.DataFrame, filter each chunk as it is read and combine the results with pd.concat.
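One way to sketch the scale-down step: filter every chunk, then concatenate what is left. The sample data and the filter (keeping even ids) are placeholders for whatever reduction fits your data.

```python
import io

import pandas as pd

# Inline sample data: a header plus ten rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# Only the filtered rows of each chunk are kept in memory.
df = pd.concat(
    (chunk[chunk["id"] % 2 == 0] for chunk in reader),
    ignore_index=True,
)
print(df)
```

The usecols argument of pd.read_csv is another way to shrink the data, by loading only the columns you need.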

Pandas/SQL Rosetta Stone

IN / pandas.DataFrame.isin
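A sketch of the SQL IN pattern in pandas. The table and column names are invented. The filter below uses isin on a single column (Series.isin); DataFrame.isin does the same membership test element-wise across the whole frame.

```python
import pandas as pd

# Hypothetical data standing in for a "cities" table.
df = pd.DataFrame(
    {"city": ["Austin", "Boston", "Chicago", "Denver"], "visits": [3, 1, 4, 2]}
)

# SQL: SELECT * FROM cities WHERE city IN ('Austin', 'Denver');
selected = df[df["city"].isin(["Austin", "Denver"])]

# SQL: SELECT * FROM cities WHERE city NOT IN ('Austin', 'Denver');
excluded = df[~df["city"].isin(["Austin", "Denver"])]

print(selected)
print(excluded)
```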

See the pandas documentation for more information on pandas.DataFrame.isin.
