Organizing Data Science Projects


Last updated 2 years ago

This is a living document[^1] on how I organize and manage my projects.

[^1]: A document that is continually edited and updated.

Folder Structure

project/
├── Pipfile
├── Pipfile.lock
├── archive/
├── data/
│   ├── interim/
│   ├── output/
│   └── source/
├── 000_Dashboard.ipynb
├── 001_Clean_Data.ipynb
└── shared.py
  • Pipfile & Pipfile.lock declare and lock any required third-party Python libraries.

  • I use the archive/ directory to store Jupyter Notebooks that didn't really work out. I find it incredibly useful to keep old work, and an average Jupyter Notebook file costs less than a penny of disk space.

  • data/ contains three folders: source/ for the original data, interim/ for data frames that have been manipulated during the analysis, and output/ for the final data.

  • I use the 000_Dashboard.ipynb notebook as a place to run quick statistics on the data set. I think of it as a FAQ for the data.

  • The Python file shared.py is used for things that can be shared across all notebooks.
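Path handling is one thing that fits naturally in shared.py. The following is a minimal sketch, assuming the folder layout above and that notebooks run from the project root. The names DATA_SUBDIRS, data_dir and ensure_data_dirs are made up for illustration and are not taken from a real project.

```python
"""shared.py: helpers shared by every notebook (illustrative sketch)."""
from pathlib import Path

# The three data folders described above.
DATA_SUBDIRS = ("source", "interim", "output")


def data_dir(kind: str, root: Path = Path(".")) -> Path:
    """Return the path of data/<kind>/ under the project root."""
    if kind not in DATA_SUBDIRS:
        raise ValueError(f"unknown data folder: {kind!r}")
    return Path(root) / "data" / kind


def ensure_data_dirs(root: Path = Path(".")) -> list[Path]:
    """Create data/source, data/interim and data/output if they are missing."""
    paths = [data_dir(kind, root) for kind in DATA_SUBDIRS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

A notebook such as 001_Clean_Data.ipynb could then do `from shared import data_dir` and build paths like `data_dir("source") / "some_file.csv"` (a hypothetical file name) instead of repeating string paths in every notebook.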

Sharing

I like to use Git and GitHub for sharing projects, but recently my average dataset has grown larger than Git and GitHub can handle. I'm thinking of moving to Git for source control of Jupyter Notebooks and Python files, and S3 for hosting data files.
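The S3 half of that idea could look roughly like the sketch below. It is only a sketch, not a setup I have settled on: it assumes the third-party boto3 library is installed and AWS credentials are already configured, and the function names, bucket name and key prefix are hypothetical.

```python
from pathlib import Path


def s3_key(path: Path, data_root: Path, prefix: str = "") -> str:
    """Map a local file under data_root to an S3 object key."""
    relative = path.relative_to(data_root).as_posix()
    prefix = prefix.strip("/")
    return f"{prefix}/{relative}" if prefix else relative


def upload_data(bucket: str, data_root: Path = Path("data"), prefix: str = "") -> list[str]:
    """Upload every file under data_root to the bucket and return the keys used."""
    import boto3  # assumption: boto3 is installed and credentials are configured

    s3 = boto3.client("s3")
    keys = []
    for path in sorted(data_root.rglob("*")):
        if path.is_file():
            key = s3_key(path, data_root, prefix)
            s3.upload_file(str(path), bucket, key)
            keys.append(key)
    return keys
```

Calling `upload_data("my-bucket", prefix="my-project")` (both names hypothetical) would mirror data/ into the bucket. Adding data/ to .gitignore would then keep Git tracking only the notebooks and Python files.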
