Python Data Wrangling

Master data wrangling in Python: Attend a two-day boot camp blending seminars and hands-on exercises to integrate Python fundamentals with practical data manipulation.

Modules/Weeks

1

Weekly Effort

16 hours

Discipline

Format

Cost

See external site

Course Description

The Python Data Wrangling Boot Camp is a two-day intensive course that combines concept-focused seminars with hands-on exercises pairing Python fundamentals with practical data wrangling.

  • Learn techniques to efficiently load and explore datasets in Python for insightful data analysis.
  • Acquire methods to join, reconcile, and clean messy datasets for accurate analysis and interpretation.
  • Gain proficiency in conducting basic statistical analyses such as linear and logistic regression to derive meaningful insights from data.
  • Develop the ability to craft exploratory visualizations to uncover patterns and trends within datasets for deeper understanding.

Course Prerequisites

Participants must have (or create) an unrestricted Google account for working with sample notebooks (via Google Colab) and data sets.

What You Will Learn

Python is one of the world's most popular programming languages. It is versatile enough to create sophisticated data visualizations and powerful enough to run complex machine learning models. Python was also designed to be easy to learn and use, making it an excellent tool for anyone looking to strengthen their data gathering and analytical skills.

This two-day course introduces the Python programming language and demonstrates how to use it for essential data wrangling, manipulation, and cleaning tasks on real-world biomedical data. Bringing together scalable methods and popular libraries for data manipulation, basic statistical analysis, and visualization, this boot camp gives participants the tools and background they need to get started with Python for data work. Through hosted notebooks, participants will leave the workshop with functioning code that they can apply to their own data sets. Participants will receive orienting videos before the real-time sessions so they can familiarize themselves with the Jupyter Notebook/Google Colab environment; all code samples will be available in this format for participant use.

By the end of the workshop, participants will be able to:

  • Load and explore data sets in Python
  • Join, reconcile and otherwise clean up messy data sets
  • Do basic statistical analyses, including linear and logistic regression
  • Render exploratory visualizations
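To give a feel for the first three outcomes, here is a minimal sketch of loading, joining, cleaning, and fitting a straight line to a small data set. It is not taken from the course materials: the two CSV tables, their column names, and the helper functions are all hypothetical, and it uses only the Python standard library so that it runs without extra installs. The course itself mentions "popular libraries" without naming them, so this sketch should not be read as the course's own approach.

```python
import csv
import io

# Hypothetical sample data, invented purely for illustration.
PATIENTS_CSV = """patient_id,age
1,34
2,51
3,
4,45
5,60
"""

VISITS_CSV = """patient_id,systolic_bp
1,118
2,135
3,122
4,128
6,140
"""


def load_rows(text):
    """Load CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Join two lists of row dicts, keeping rows whose key is in both."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def clean(rows, numeric_columns):
    """Drop rows with missing values and convert the given columns to float."""
    cleaned = []
    for row in rows:
        if any(row[col].strip() == "" for col in numeric_columns):
            continue  # skip incomplete records
        converted = {col: float(row[col]) for col in numeric_columns}
        cleaned.append({**row, **converted})
    return cleaned


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


patients = load_rows(PATIENTS_CSV)
visits = load_rows(VISITS_CSV)
joined = inner_join(patients, visits, "patient_id")
tidy = clean(joined, ["age", "systolic_bp"])

ages = [row["age"] for row in tidy]
pressures = [row["systolic_bp"] for row in tidy]
slope, intercept = fit_line(ages, pressures)
print(f"rows joined: {len(joined)}, rows after cleaning: {len(tidy)}")
print(f"fitted line: systolic_bp = {slope:.3f} * age + {intercept:.3f}")
```

The join keeps only patients that appear in both tables, and the cleaning step then drops the record with a missing age before the fit. Logistic regression and plotting are left out here because they are impractical without third-party libraries.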

Instructors

Susan McGregor
Associate Research Scholar

Susan McGregor is a Research Scholar at Columbia University’s Data Science Institute, where she also co-chairs its Center for Data, Media & Society. McGregor’s research is centered on security and privacy issues affecting journalists and media organizations. Her current projects include NSF-funded work to provide readers with stronger guarantees about digital media by integrating cryptographic signatures into digital publishing workflows, an effort to develop novel classifiers for detecting abusive and harassing speech targeting journalists on Twitter, and a project that uses artificial intelligence and computer vision to help journalists recognize unfamiliar political graphics when reporting in the field. McGregor joins the Data Science Institute from the School of Journalism, where she developed the school’s first data journalism curriculum and served as a primary academic advisor for its dual-degree program in Journalism & Computer Science. She is the author of two forthcoming books: Information Security Essentials: A Guide for Reporters, Editors and Newsroom Leaders, due out from Columbia University Press in early 2021, and Practical Python Data Wrangling and Data Quality, to be published by O’Reilly Media in summer 2021.

Prior to her work at Columbia, McGregor spent several years as the Senior Programmer on the News Graphics team at the Wall Street Journal. She was named a 2010 Gerald Loeb Award winner for her work on WSJ’s “What They Know” series, and was a finalist for the Scripps Howard Foundation National Journalism Awards for Web Reporting in 2007. Her work has also been nominated for two Webby awards, in 2011 and 2015. She has published a range of academic papers in leading peer-reviewed security and privacy conferences exploring how these issues manifest in and impact the work of journalists. Her research and development work in this and related areas has received support from the National Science Foundation, the Knight Foundation, Google, and multiple schools and offices of Columbia University. 

In addition to her technical and academic work, McGregor is actively interested in how the arts can help stimulate critical thinking and introduce new perspectives around technology issues, occasionally creating small prototypes and installations. She holds a master’s degree in Educational Communication and Technology from NYU and a bachelor’s degree in Interactive Information Design from Harvard University.