Decision Making and Reinforcement Learning

Learn how to model decision-making problems under uncertainty as Markov decision processes (MDPs) and solve them with dynamic programming algorithms in this introductory course on reinforcement learning.

Modules/Weeks

9

Weekly Effort

8-10 hours

Discipline

Format

Cost

Free
$20 certificate (optional)

Course Description

  • Explore the foundational concepts of sequential decision making and reinforcement learning, and see how they apply to different kinds of problems.
  • Model preferences through utility theory, study multi-armed bandit problems and different approaches to evaluating feedback, and move to associative settings with finite Markov decision processes (MDPs).
  • Learn dynamic programming algorithms, how partial observability is modeled with partially observable MDPs (POMDPs), and how POMDPs are solved with online planning methods.
  • Understand the reinforcement learning problem, including Monte Carlo methods and temporal-difference learning, and study a class of n-step temporal-difference methods through algorithms and practical examples.

Free Enrollment with Optional Certificate

This course is available at no cost and includes full access to all instructional materials, videos, and assessments. Learners who successfully complete all course requirements will have the option to purchase a verified certificate of completion for $20.

Course Prerequisites

  • Foundational understanding of data structures and algorithms, including graphs, trees, recursion, and complexity analysis
  • Familiarity with basic optimization concepts from calculus
  • Understanding of basic probability, including expectation, bias, and variance
  • Proficiency in Python

What You Will Learn

By the end of this course, learners will be able to:

  • Map qualitative preferences to quantitative utilities and model sequential decision problems as multi-armed bandit problems or Markov decision processes.

  • Implement dynamic programming algorithms to obtain optimal policies.

  • Model partial observability with belief MDPs and implement online planning for situations where not all relevant information is available.

  • Understand and implement basic reinforcement learning algorithms using Monte Carlo and temporal-difference methods, and understand how these methods can be generalized.
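
As a rough illustration of the dynamic programming outcome above, the sketch below runs value iteration on a tiny MDP. It is not taken from the course materials: the two-state problem, the `TRANSITIONS` table, and the function names are all hypothetical and chosen only to keep the example short.

```python
# Hypothetical two-state, two-action MDP used only for illustration.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (reward 0) or move to 1 (reward 1)
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: stay (reward 2) or move to 0 (reward 0)
]


def q_values(transitions, values, state, gamma):
    """Expected discounted return of each action in `state` under `values`."""
    return [
        sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
        for outcomes in transitions[state]
    ]


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality backup until values change by less than tol."""
    n_states = len(transitions)
    values = [0.0] * n_states
    while True:
        new_values = [max(q_values(transitions, values, s, gamma)) for s in range(n_states)]
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # Extract the greedy policy: the best action index in each state.
    policy = []
    for s in range(n_states):
        q = q_values(transitions, values, s, gamma)
        policy.append(q.index(max(q)))
    return values, policy


values, policy = value_iteration(TRANSITIONS)
```

With a discount factor of 0.9, this toy problem converges to values near 19 for state 0 and 20 for state 1, and the greedy policy is to move from state 0 and stay in state 1.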

Course Outline

Module 1: Introduction to decision making and reinforcement learning

Module 2: Decision making and utility theory

Module 3: Bandit problems

Module 4: Markov decision processes

Module 5: Dynamic programming

Module 6: Partially observable Markov decision processes

Module 7: Monte Carlo methods

Module 8: Temporal-difference learning

Module 9: Reinforcement learning - generalization

Instructors

Tony Dear
Lecturer in Discipline

Tony Dear is interested in the intersection of robotics, locomotion, and machine and reinforcement learning. He is particularly interested in systems for which traditional planning methods may not be suitable due to problem complexity, but whose structure may be amenable to new methods in reinforcement learning (RL) or deep RL. His goal is to make such methods work on real, physical robots, especially in the realm of locomotion.

Dear received his BS in Electrical Engineering and Computer Science from UC Berkeley in 2012. He subsequently received his MS in 2015 and PhD in 2018, both in Robotics from Carnegie Mellon University. He joined Columbia as a faculty member in 2018 and is currently faculty director of the Bridge to MS Program in Computer Science.

Please note that there are no instructors or course assistants actively monitoring this course.