Fall 2026
Foundations of Machine Learning

Many of today's most useful programs, the ones that recommend your next video, beat grandmasters at chess, unlock your phone with your face, and answer essay questions in full paragraphs, learn from data instead of being told what to do line by line. This class is about how that works.

A clean definition comes from Tom Mitchell (1997):

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

This is a general statement about learning: it describes you learning to ride a bike just as well as it describes a neural network learning to translate French. In this course we focus on the case where the experience is data. The recipe is deceptively simple: show the machine many examples of the task, and hope it picks up patterns that generalize to examples it has never seen. Want to tell cats from non-cats in a photo? Feed it a few thousand pictures labeled "cat" or "not cat," and measure how often it gets new ones right. By the end of this course you will know every piece of machinery needed to actually build that classifier.
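To make the recipe concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are synthetic 2-D points rather than photos, and the learner is a simple nearest-centroid rule, not a method this course prescribes. In Mitchell's terms, the task T is labeling a point 0 or 1, the experience E is the labeled training set, and the performance measure P is accuracy on examples the program has never seen.

```python
import random

def make_examples(n, rng):
    """Generate n labeled 2-D points (synthetic, for illustration only).

    Label 1 points cluster near (2, 2); label 0 points cluster near (-2, -2).
    Labels alternate so both classes are always present.
    """
    examples = []
    for i in range(n):
        label = i % 2
        center = 2.0 if label == 1 else -2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        examples.append((point, label))
    return examples

def fit_centroids(examples):
    """'Learn' from experience E by averaging the training points of each class."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (x, y), label in examples:
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    """Task T: assign the label of the closest class centroid."""
    def sq_dist(c):
        return (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, examples):
    """Performance measure P: fraction of examples labeled correctly."""
    correct = sum(predict(centroids, p) == label for p, label in examples)
    return correct / len(examples)

rng = random.Random(0)            # fixed seed so the run is repeatable
train = make_examples(200, rng)   # examples the program learns from
test = make_examples(200, rng)    # fresh examples it has never seen

centroids = fit_centroids(train)
test_accuracy = accuracy(centroids, test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The important design choice is that accuracy is measured on `test`, not on `train`: only performance on unseen examples says anything about generalization.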

We focus on classical methods: linear and logistic regression, naive Bayes, support vector machines, k-means clustering, and Gaussian mixture models. These came before the deep learning boom, and people sometimes treat them as old news. They aren't. Two reasons. First, they are still genuinely powerful, and on many real problems they beat fancier models. Second, every big idea you need to understand modern AI, such as loss functions, optimization, generalization, and regularization, shows up here in a clean form. Skip these and deep learning looks like magic. Learn these and it looks like the obvious next step.

Along the way you will learn how to represent data, how to write down what it means to "fit" a model, how to actually fit it by nudging numbers in the right direction, and how to tell whether your model has truly learned something or just memorized the answers. That last skill, distinguishing real understanding from memorization, turns out to be the same question that sits at the heart of how systems like ChatGPT and Claude work, and of whether we should trust them.
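As a small preview of those steps, the sketch below fits a straight line by gradient descent in plain Python. It is an illustration under stated assumptions, not course material: the data are synthetic samples of a hypothetical line y = 3x + 1 with noise, the loss is mean squared error, and the learning rate and step count are arbitrary choices. "Representing data" is the list of (x, y) pairs, "what it means to fit" is the loss function, "nudging numbers" is the update to `w` and `b`, and the held-out loss is the check against memorization.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Noisy samples of the hypothetical line y = 3x + 1 (illustrative only)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data

def mse(w, b, data):
    """Loss: mean squared error of the line y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

train = make_data(100)      # data used to fit the model
held_out = make_data(100)   # data used only to check generalization

w, b = 0.0, 0.0             # start from an uninformed guess
learning_rate = 0.1         # assumed step size, chosen for this toy problem

for step in range(500):
    # Gradient of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
    grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
    # Nudge each number a little in the direction that lowers the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

train_loss = mse(w, b, train)
held_out_loss = mse(w, b, held_out)
print(f"w={w:.2f}, b={b:.2f}")
print(f"train loss={train_loss:.4f}, held-out loss={held_out_loss:.4f}")
```

If the held-out loss were much larger than the training loss, that gap would be the warning sign of memorization rather than learning.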

This class will meet virtually via Zoom on Wednesdays from 4:30 to 6:00 PM and Thursdays from 4:30 to 6:30 PM (San Francisco time), from September 23 to December 3.

Applications for Fall 2026 are due July 19. After that, we will continue to accept applications on a rolling basis while spots remain.

Prerequisites: Single-variable calculus, basic probability (including conditional probability), some linear algebra (matrix multiplication, dot products), and basic Python programming skills.